From anjan.purkayastha at gmail.com  Mon Mar  3 12:31:11 2008
From: anjan.purkayastha at gmail.com (ANJAN PURKAYASTHA)
Date: Mon, 3 Mar 2008 12:31:11 -0500
Subject: [Bioperl-l] problem with Bio::Tools::EMBOSS
Message-ID: <f03f40920803030931r7c97159j2d6a4b74e2917184@mail.gmail.com>

hi
i tried to use the perl wrappers for EMBOSS with:

use lib "/Users/anjan/perl_directory/bioperl-1.5.2_102/";
use Bio::Factory::EMBOSS;

however it seems that Bio::Factory::EMBOSS cannot be found in the bioperl
directory mentioned above.

so i tried to install Bio::Factory::EMBOSS from the cpan website. i got the
attached error message.

any ideas on what i need to do to  make this work?
all advice will be appreciated.

tia,

anjan


-- 
ANJAN PURKAYASTHA, PhD.
Senior Computational Biologist
==========================

1101 King Street, Suite 310,
Alexandria, VA 22314.
703.518.8040 (office)
703.740.6939 (mobile)

email:
anjan at vbi.vt.edu;
anjan.purkayastha at gmail.com

http://www.vbi.vt.edu

==========================
-------------- next part --------------
A non-text attachment was scrubbed...
Name: emboss_install_error_message.rtf
Type: application/rtf
Size: 123212 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20080303/08ee41b2/attachment-0001.rtf>

From cjfields at uiuc.edu  Mon Mar  3 13:54:06 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 3 Mar 2008 12:54:06 -0600
Subject: [Bioperl-l] problem with Bio::Tools::EMBOSS
In-Reply-To: <f03f40920803030931r7c97159j2d6a4b74e2917184@mail.gmail.com>
References: <f03f40920803030931r7c97159j2d6a4b74e2917184@mail.gmail.com>
Message-ID: <AC7CFEBA-5699-45A0-9426-8885AEA78216@uiuc.edu>

You'll need to install bioperl-run.  Bio::Factory::EMBOSS is in  
bioperl-run, not the main bioperl distribution (aka bioperl-core).

chris

On Mar 3, 2008, at 11:31 AM, ANJAN PURKAYASTHA wrote:

> hi
> i tried to use the perl wrappers for EMBOSS with:
>
> use lib "/Users/anjan/perl_directory/bioperl-1.5.2_102/";
> use Bio::Factory::EMBOSS;
>
> however it seems that Bio::Factory::EMBOSS cannot be found in the  
> bioperl
> directory mentioned above.
>
> so i tried to install Bio::Factory::EMBOSS from the cpan website. i  
> got the
> attached error message.
>
> any ideas on what i need to do to  make this work?
> all advice will be appreciated.
>
> tia,
>
> anjan

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From David.Messina at sbc.su.se  Mon Mar  3 14:34:20 2008
From: David.Messina at sbc.su.se (Dave Messina)
Date: Mon, 3 Mar 2008 20:34:20 +0100
Subject: [Bioperl-l] problem with Bio::Tools::EMBOSS
In-Reply-To: <f03f40920803030931r7c97159j2d6a4b74e2917184@mail.gmail.com>
References: <f03f40920803030931r7c97159j2d6a4b74e2917184@mail.gmail.com>
Message-ID: <628aabb70803031134g3263e94fge7131f8862434b23@mail.gmail.com>

Hi Anjan,

Bio::Factory::EMBOSS is not part of the BioPerl core distribution, but
rather part of bioperl-run. For some reason CPAN went for the old (1.4)
version of bioperl-run rather than the current 1.5.2.

And indeed, I seem to run into the same problem:
cpan> d /bioperl/

Distribution    BIRNEY/bioperl-1.2.1.tar.gz
Distribution    BIRNEY/bioperl-1.2.2.tar.gz
Distribution    BIRNEY/bioperl-1.2.3.tar.gz
Distribution    BIRNEY/bioperl-1.2.tar.gz
Distribution    BIRNEY/bioperl-1.4.tar.gz
Distribution    BIRNEY/bioperl-db-0.1.tar.gz
Distribution    BIRNEY/bioperl-ext-1.4.tar.gz
Distribution    BIRNEY/bioperl-gui-0.7.tar.gz
Distribution    BIRNEY/bioperl-run-1.2.2.tar.gz
Distribution    BIRNEY/bioperl-run-1.4.tar.gz
Distribution    BOZO/Fry-Lib-BioPerl-0.15.tar.gz
Distribution    CRAFFI/Bundle-BioPerl-2.1.8.tar.gz
12 items found

but when I ask in a different way the right distributions show up. [Sendu,
any idea what's going on here?]

cpan> ls
SENDU
 5919092 2007-02-14 SENDU/bioperl-1.5.2_102.tar.gz
  320154 2006-12-06 SENDU/bioperl-db-1.5.2_100.tar.gz
   99082 2006-12-06 SENDU/bioperl-network-1.5.2_100.tar.gz
  942093 2006-12-06 SENDU/bioperl-run-1.5.2_100.tar.gz

So try doing

cpan> install SENDU/bioperl-run-1.5.2_100.tar.gz

Or if CPAN refuses to cooperate, you can grab it from here:
http://www.bioperl.org/wiki/Getting_BioPerl#Bioperl_1.5.2.2C_Developer_Release


Dave

From arareko at campus.iztacala.unam.mx  Mon Mar  3 14:25:14 2008
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Mon, 03 Mar 2008 13:25:14 -0600
Subject: [Bioperl-l] problem with Bio::Tools::EMBOSS
In-Reply-To: <f03f40920803030931r7c97159j2d6a4b74e2917184@mail.gmail.com>
References: <f03f40920803030931r7c97159j2d6a4b74e2917184@mail.gmail.com>
Message-ID: <47CC509A.10306@campus.iztacala.unam.mx>

Hi Anjan,

It looks like you are using the latest BioPerl developer release
(bioperl-1.5.2_102) from CPAN. To have Bio::Factory::EMBOSS available,
you should install the latest BioPerl-run as well
(bioperl-run-1.5.2_100). After you install it, you'll have to modify
your 'use lib' pragma for your script to work as you expect:

use lib "/Users/anjan/perl_directory/bioperl-run-1.5.2_100/";
use Bio::Factory::EMBOSS;
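
Note that Bio::Factory::EMBOSS still needs the core modules (Bio::Root::Root
and friends), so if the core distribution is only unpacked rather than
installed, both directories have to be on @INC. A minimal sketch, assuming
the two directory names used in this thread; the eval/require check and the
$have_emboss flag are just an illustrative way to confirm the module is
visible:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Both unpacked distributions go on @INC. The paths are the ones quoted in
# this thread and are assumptions about where the tarballs were extracted.
use lib (
    "/Users/anjan/perl_directory/bioperl-1.5.2_102/",
    "/Users/anjan/perl_directory/bioperl-run-1.5.2_100/",
);

# require inside eval, so a missing module is reported instead of aborting.
my $have_emboss = eval { require Bio::Factory::EMBOSS; 1 } ? 1 : 0;

if ($have_emboss) {
    print "Bio::Factory::EMBOSS loaded from $INC{'Bio/Factory/EMBOSS.pm'}\n";
}
else {
    print "Bio::Factory::EMBOSS not found; check the 'use lib' paths\n";
}
```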

Hope this helps.

Regards,
Mauricio.


ANJAN PURKAYASTHA wrote:
> hi
> i tried to use the perl wrappers for EMBOSS with:
> 
> use lib "/Users/anjan/perl_directory/bioperl-1.5.2_102/";
> use Bio::Factory::EMBOSS;
> 
> however it seems that Bio::Factory::EMBOSS cannot be found in the bioperl
> directory mentioned above.
> 
> so i tried to install Bio::Factory::EMBOSS from the cpan website. i got the
> attached error message.
> 
> any ideas on what i need to do to  make this work?
> all advice will be appreciated.
> 
> tia,
> 
> anjan

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM


From cjfields at uiuc.edu  Mon Mar  3 15:05:16 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 3 Mar 2008 14:05:16 -0600
Subject: [Bioperl-l] problem with Bio::Tools::EMBOSS
In-Reply-To: <628aabb70803031134g3263e94fge7131f8862434b23@mail.gmail.com>
References: <f03f40920803030931r7c97159j2d6a4b74e2917184@mail.gmail.com>
	<628aabb70803031134g3263e94fge7131f8862434b23@mail.gmail.com>
Message-ID: <43EC247B-EC01-483D-82B1-D861590A141A@uiuc.edu>


On Mar 3, 2008, at 1:34 PM, Dave Messina wrote:

> Hi Anjan,
>
> Bio::Factory::EMBOSS is not part of the BioPerl core distribution, but
> rather part of bioperl-run. For some reason CPAN went for the old  
> (1.4)
> version of bioperl-run rather than the current 1.5.2.
>
> And indeed, I seem to run into the same problem:
> cpan> d /bioperl/
>
> Distribution    BIRNEY/bioperl-1.2.1.tar.gz
> Distribution    BIRNEY/bioperl-1.2.2.tar.gz
> Distribution    BIRNEY/bioperl-1.2.3.tar.gz
> Distribution    BIRNEY/bioperl-1.2.tar.gz
> Distribution    BIRNEY/bioperl-1.4.tar.gz
> Distribution    BIRNEY/bioperl-db-0.1.tar.gz
> Distribution    BIRNEY/bioperl-ext-1.4.tar.gz
> Distribution    BIRNEY/bioperl-gui-0.7.tar.gz
> Distribution    BIRNEY/bioperl-run-1.2.2.tar.gz
> Distribution    BIRNEY/bioperl-run-1.4.tar.gz
> Distribution    BOZO/Fry-Lib-BioPerl-0.15.tar.gz
> Distribution    CRAFFI/Bundle-BioPerl-2.1.8.tar.gz
> 12 items found
>
> but when I ask in a different way the right distributions show up.  
> [Sendu,
> any idea what's going on here?]

It's marked as a developer release, which I think requires a full path  
(as you have below) and not just the package name.

chris

> cpan> ls
> SENDU
> 5919092 2007-02-14 SENDU/bioperl-1.5.2_102.tar.gz
>  320154 2006-12-06 SENDU/bioperl-db-1.5.2_100.tar.gz
>   99082 2006-12-06 SENDU/bioperl-network-1.5.2_100.tar.gz
>  942093 2006-12-06 SENDU/bioperl-run-1.5.2_100.tar.gz
>
> So try doing
>
> cpan> install SENDU/bioperl-run-1.5.2_100.tar.gz
>
> Or if CPAN refuses to cooperate, you can grab it from here:
> http://www.bioperl.org/wiki/Getting_BioPerl#Bioperl_1.5.2.2C_Developer_Release
>
>
> Dave


From anjan.purkayastha at gmail.com  Mon Mar  3 14:57:33 2008
From: anjan.purkayastha at gmail.com (ANJAN PURKAYASTHA)
Date: Mon, 3 Mar 2008 14:57:33 -0500
Subject: [Bioperl-l] problem with Bio::Tools::EMBOSS
In-Reply-To: <47CC509A.10306@campus.iztacala.unam.mx>
References: <f03f40920803030931r7c97159j2d6a4b74e2917184@mail.gmail.com>
	<47CC509A.10306@campus.iztacala.unam.mx>
Message-ID: <f03f40920803031157oda9e0ffubc3bfb9070022238@mail.gmail.com>

guys,
thanks! i got bioperl-run to work.
next question, let's say i want to run the palindrome program in emboss
using the bioperl wrapper.   now, palindrome takes in a list of parameter
values- these are fed into emboss as a key-value hash. where do i find the
correct names of the keys to create the input hash?
tia.

anjan

On Mon, Mar 3, 2008 at 2:25 PM, Mauricio Herrera Cuadra <
arareko at campus.iztacala.unam.mx> wrote:

> Hi Anjan,
>
> It looks like you are using the latest BioPerl developer release
> (bioperl-1.5.2_102) from CPAN, to have Bio::Factory::EMBOSS available
> then you should try installing the latest BioPerl-run as well
> (bioperl-run-1.5.2_100). After you install it, you'll have to modify
> your 'use lib' pragma for your script to work as you expect:
>
> use lib "/Users/anjan/perl_directory/bioperl-run-1.5.2_100/";
> use Bio::Factory::EMBOSS;
>
> Hope this helps.
>
> Regards,
> Mauricio.




From Daniel.Gerlach at medecine.unige.ch  Tue Mar  4 03:48:15 2008
From: Daniel.Gerlach at medecine.unige.ch (Daniel Gerlach)
Date: Tue, 04 Mar 2008 09:48:15 +0100
Subject: [Bioperl-l] Bio::TreeIO rises error "Weak references are not
 implemented in the version of perl"
Message-ID: <47CD0CCF.4060306@medecine.unige.ch>

Hello,

Trying to run Bio::TreeIO by this command:

perl -e 'use Bio::TreeIO'

I get the following error:

Weak references are not implemented in the version of perl at 
/usr/lib/perl5/site_perl/5.8.8/Bio/Tree/Node.pm line 76
BEGIN failed--compilation aborted at 
/usr/lib/perl5/site_perl/5.8.8/Bio/Tree/Node.pm line 76.
Compilation failed in require at 
/usr/lib/perl5/site_perl/5.8.8/Bio/TreeIO/TreeEventBuilder.pm line 65.
BEGIN failed--compilation aborted at 
/usr/lib/perl5/site_perl/5.8.8/Bio/TreeIO/TreeEventBuilder.pm line 65.
Compilation failed in require at 
/usr/lib/perl5/site_perl/5.8.8/Bio/TreeIO.pm line 77.
BEGIN failed--compilation aborted at 
/usr/lib/perl5/site_perl/5.8.8/Bio/TreeIO.pm line 77.
Compilation failed in require at -e line 1.
BEGIN failed--compilation aborted at -e line 1.

I am running perl v5.8.8 on Fedora 8 on a 64bit machine. I installed a 
recent version of bioperl around 5 months ago. Any suggestions as to why 
this module can't be loaded correctly?

Greetings, Daniel

From bix at sendu.me.uk  Tue Mar  4 06:55:32 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 04 Mar 2008 11:55:32 +0000
Subject: [Bioperl-l] Bio::TreeIO rises error "Weak references are not
 implemented in the version of perl"
In-Reply-To: <47CD0CCF.4060306@medecine.unige.ch>
References: <47CD0CCF.4060306@medecine.unige.ch>
Message-ID: <47CD38B4.1070200@sendu.me.uk>

Daniel Gerlach wrote:
> Hello,
> 
> Trying to run Bio::TreeIO by this command:
> 
> perl -e 'use Bio::TreeIO'
> 
> I get the following error:
> 
> Weak references are not implemented in the version of perl 
> [...]
> I am running perl v5.8.8 on Fedora 8 on a 64bit machine. I installed a 
> recent version of bioperl around 5 months ago. Any suggestions as to why 
> this module can't be loaded correctly?

Redhat/Fedora apparently has Perl issues.

First try installing the latest version of Scalar::Util yourself:

perl -MCPAN -e shell
force install Scalar::Util

If that doesn't work, you'll have to download and compile Perl yourself 
from source (don't use Fedora's installation system).
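
To check whether the installed Scalar::Util actually supports weak
references (before and after reinstalling it), something like the following
can be run. This is only a diagnostic sketch; the variable names are made up
for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Scalar::Util ();

print "perl $], Scalar::Util ", (Scalar::Util->VERSION || 'unknown'), "\n";

# weaken() needs the compiled (XS) part of Scalar::Util. Where it is missing
# the call dies, and the eval below catches that.
my $target  = { name => 'node' };
my $copy    = $target;
my $weak_ok = eval {
    Scalar::Util::weaken($copy);
    Scalar::Util::isweak($copy) ? 1 : 0;
} || 0;

if ($weak_ok) {
    print "weak references work\n";
}
else {
    my $why = $@ || "isweak() returned false\n";
    print "weak references are NOT available: $why";
}
```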

From apapanicolaou at ice.mpg.de  Tue Mar  4 07:03:27 2008
From: apapanicolaou at ice.mpg.de (Alexie Papanicolaou)
Date: Tue, 04 Mar 2008 13:03:27 +0100
Subject: [Bioperl-l] Bio/SearchIO/Writer/GbrowseGFF.pm
Message-ID: <47CD3A8F.9050902@ice.mpg.de>

hello all,

1) I was wondering if you know what this error means and have 
time to help...

Use of uninitialized value in concatenation (.) or string at 
/usr/local/share/perl/5.8.8/Bio/SearchIO/Writer/GbrowseGFF.pm line 287

line 287 is
            else {
                $tags{'Target'} = "$prefix:$seqname $qpmax $qpmin";
            }

this is the header
# $Id: GbrowseGFF.pm,v 1.15.4.1 2006/10/02 23:10:27 sendu Exp $
#
# BioPerl module Bio::SearchIO::Writer::GbrowseGFF.pm


this is how I call it... ( 2.6.18-6-amd64, x86_64, perl, v5.8.8, 
bioperl: tried with both 1.5.2_102 from cvs and checked out svn version 
today)

use Bio::SearchIO::Writer::GbrowseGFF;
use Bio::SearchIO;
if ($program eq "blastn") {
    # one GFF file per query
    #my $out_gff = Bio::SearchIO->new(-writer => $writer_gff,
    my $out_gff = Bio::SearchIO->new(-output_format => 'GbrowseGFF',
                                     -output_cigar  => 1,
                                     -output_signif => 1,
                                     -file          => ">$infile.$query.blast.gff");
    # one GFF file collecting all queries (opened in append mode)
    #my $out_gff_whole = Bio::SearchIO->new(-writer => $writer_gff,
    my $out_gff_whole = Bio::SearchIO->new(-output_format => 'GbrowseGFF',
                                           -output_cigar  => 1,
                                           -output_signif => 1,
                                           -file          => ">>$infile.blast.gff");
    $out_gff->write_result($result);
    $out_gff_whole->write_result($result);
}


Where $result is a blast result...

The aim is to parse a multi-query blast report and split it into 
different queries and make another file with all the queries. I'm sure 
i'm forgetting something but I can't figure what...

The GFF file is produced, but I do get the error above...

2) Finally, there is a small bug but I don't think it comes from this 
module? The id attribute is printed out e.g iD=match_sequence31 with iD 
wrongly capitalised...

many thanks for your time
alexie

-- 
--
Alexie Papanicolaou
Entomology
Max Planck Institute for Chemical Ecology
Hans Knoell Str 8
Jena 07745
Germany
Email apapanicolaou at ice.mpg.de
Tel +493641571561


From apapanicolaou at ice.mpg.de  Tue Mar  4 07:04:16 2008
From: apapanicolaou at ice.mpg.de (Alexie Papanicolaou)
Date: Tue, 04 Mar 2008 13:04:16 +0100
Subject: [Bioperl-l] Gbrowse.pm followup
Message-ID: <47CD3AC0.4080801@ice.mpg.de>

Oh the iD bug is fixed in the svn developer branch.

ta
a



From cjfields at uiuc.edu  Tue Mar  4 08:16:04 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 4 Mar 2008 07:16:04 -0600
Subject: [Bioperl-l] Bio/SearchIO/Writer/GbrowseGFF.pm
In-Reply-To: <47CD3A8F.9050902@ice.mpg.de>
References: <47CD3A8F.9050902@ice.mpg.de>
Message-ID: <4A68AA28-E508-4257-86E1-393CA9B74082@uiuc.edu>

I have run into a number of problems with the GbrowseGFF module myself  
(I think I committed the ID fix, actually).  It works but needs  
revision and needs better conformity with GFF3.

You can post (1) as a bug and we'll look into it when we can.  It's  
possible (depending on how extensive the fix is) this may have to wait  
until 1.7.

chris

On Mar 4, 2008, at 6:03 AM, Alexie Papanicolaou wrote:

> hello all,
>
> 1) I was wondering if you would you know what this error means and  
> had time to help...
>
> Use of uninitialized value in concatenation (.) or string at /usr/ 
> local/share/perl/5.8.8/Bio/SearchIO/Writer/GbrowseGFF.pm line 287
>
> line 287 is
>          else {
>              $tags{'Target'} = "$prefix:$seqname $qpmax $qpmin";
>          }
>
> this is the header
> # $Id: GbrowseGFF.pm,v 1.15.4.1 2006/10/02 23:10:27 sendu Exp $
> #
> # BioPerl module Bio::SearchIO::Writer::GbrowseGFF.pm
>
>
> this is how I call it... ( 2.6.18-6-amd64, x86_64, perl, v5.8.8,  
> bioperl: tried with both 1.5.2_102 from cvs and checked out svn  
> version today)
>
> use Bio::SearchIO::Writer::GbrowseGFF;
> use Bio::SearchIO;
> if ($program eq "blastn"){
> #my $out_gff  = new Bio::SearchIO(-writer => $writer_gff,
> my $out_gff = new Bio::SearchIO(-output_format  => 'GbrowseGFF',
>                      -output_cigar   => 1,
>                      -output_signif  => 1,
>                      -file   => ">$infile.$query.blast.gff");
> #my $out_gff_whole  = new Bio::SearchIO(-writer => $writer_gff,
> my $out_gff_whole = new Bio::SearchIO(-output_format  => 'GbrowseGFF',
>                      -output_cigar   => 1,
>                      -output_signif  => 1,
>                         -file   => ">>$infile.blast.gff");
> $out_gff->write_result($result);
> $out_gff_whole->write_result($result);
> }
>
>
>
> Where $result is a blast result...
>
> The aim is to parse a multi-query blast report and split it into  
> different queries and make another file with all the queries. I'm  
> sure i'm forgetting something but I can't figure what...
>
> The GFF file is produced, but I do get the error above...
>
> 2) Finally, there is a small bug but I don't think it comes from  
> this module? The id attribute is printed out e.g iD=match_sequence31  
> with iD wrongly capitalised...
>
> many thanks for your time
> alexie

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From Daniel.Gerlach at medecine.unige.ch  Tue Mar  4 07:35:03 2008
From: Daniel.Gerlach at medecine.unige.ch (Daniel Gerlach)
Date: Tue, 04 Mar 2008 13:35:03 +0100
Subject: [Bioperl-l] Remove columns containing more than 75% gaps in an
	alignment
References: <200502151616.j1FGGnKr023827@portal.open-bio.org>
Message-ID: <47CD41F7.2000401@medecine.unige.ch>

Hello,

Is it possible to remove only columns containing e.g. more than 75% gaps 
from an alignment? I was thinking of

$aln2 = $aln->remove_gaps('-'[,$all_gaps_columns])

This would allow me to remove all gaps or gap-only columns but not using 
a threshold.

Greetings, Daniel

From Daniel.Gerlach at medecine.unige.ch  Tue Mar  4 08:46:33 2008
From: Daniel.Gerlach at medecine.unige.ch (Daniel Gerlach)
Date: Tue, 04 Mar 2008 14:46:33 +0100
Subject: [Bioperl-l] branch length score - total length of the spanning
	subtree
Message-ID: <47CD52B9.5060906@medecine.unige.ch>

Hello,

I would like to use bioperl to calculate a branch length score for a 
given set of nodes and a tree. I know how to get the total branch length 
by using $tree->total_branch_length, but how could I get the length of 
the subtree spanning some given nodes which are dispersed over the whole 
tree (a subset of nodes from the tree which are not monophyletic)?

Greetings, Daniel

From bix at sendu.me.uk  Tue Mar  4 09:37:53 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 04 Mar 2008 14:37:53 +0000
Subject: [Bioperl-l] branch length score - total length of the spanning
 subtree
In-Reply-To: <47CD52B9.5060906@medecine.unige.ch>
References: <47CD52B9.5060906@medecine.unige.ch>
Message-ID: <47CD5EC1.2020103@sendu.me.uk>

Daniel Gerlach wrote:
> Hello,
> 
> I would like to use bioperl to calculate a branch length score for a 
> given set of nodes and a tree. I know how to get the total branch length 
> by using $tree->total_branch_length, but how could I get the length of 
> the subtree spanning some given nodes which are dispersed over the whole 
> tree (a subset of nodes from the tree which are not monophyletic)?

One 'cheat' way of doing it might be to use splice(-keep_ids => 
\@node_ids) or similar, then run total_branch_length() on that. No idea 
if it will actually give you the right answer though. Let us know! :)

From bix at sendu.me.uk  Tue Mar  4 09:26:10 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 04 Mar 2008 14:26:10 +0000
Subject: [Bioperl-l] Remove columns containing more than 75% gaps in an
 alignment
In-Reply-To: <47CD41F7.2000401@medecine.unige.ch>
References: <200502151616.j1FGGnKr023827@portal.open-bio.org>
	<47CD41F7.2000401@medecine.unige.ch>
Message-ID: <47CD5C02.8060306@sendu.me.uk>

Daniel Gerlach wrote:
> Hello,
> 
> Is it possible to remove only columns containing e.g. more than 75% gaps 
> from an alignment? I was thinking at
> 
> $aln2 = $aln->remove_gaps('-'[,$all_gaps_columns])
> 
> This would allow me to remove all gaps or gap-only columns but not using 
> a threshold.

Well, you can use gap_col_matrix() to decide which columns you don't 
want, and then use remove_columns().
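
The column-picking step could look roughly like this. To keep the sketch
self-contained it works on plain aligned strings rather than a
Bio::SimpleAlign object, and the helper name gappy_column_ranges() and the
toy alignment are illustrative assumptions. With BioPerl, the per-column gap
information would come from gap_col_matrix() and the resulting ranges would
be handed to remove_columns() (check the Bio::SimpleAlign documentation for
the exact argument form in your version):

```perl
use strict;
use warnings;

# Return [start, end] ranges (0-based, inclusive) of columns whose gap
# fraction is strictly greater than $max_gap_fraction.
sub gappy_column_ranges {
    my ($rows, $max_gap_fraction, $gap_char) = @_;
    return () unless @$rows;
    $gap_char = '-' unless defined $gap_char;
    my $nseq = @$rows;
    my $ncol = length $rows->[0];
    my @ranges;
    for my $col (0 .. $ncol - 1) {
        # count the sequences that have a gap in this column
        my $gaps = grep { substr($_, $col, 1) eq $gap_char } @$rows;
        next unless $gaps / $nseq > $max_gap_fraction;
        if (@ranges && $ranges[-1][1] == $col - 1) {
            $ranges[-1][1] = $col;          # extend the current run
        }
        else {
            push @ranges, [$col, $col];     # start a new run
        }
    }
    return @ranges;
}

# Toy alignment: columns 2, 5 and 6 are gaps in at least 4 of 5 sequences.
my @alignment = (
    'AC-GT--A',
    'AC-GT-TA',
    'A--GT--A',
    'ACTGT--A',
    'AC-GT--A',
);
my @ranges = gappy_column_ranges(\@alignment, 0.75);
print join(', ', map { "$_->[0]-$_->[1]" } @ranges), "\n";   # prints "2-2, 5-6"
```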

From hlapp at gmx.net  Tue Mar  4 10:24:13 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 4 Mar 2008 10:24:13 -0500
Subject: [Bioperl-l] Bio/SearchIO/Writer/GbrowseGFF.pm
In-Reply-To: <47CD3A8F.9050902@ice.mpg.de>
References: <47CD3A8F.9050902@ice.mpg.de>
Message-ID: <87808BE4-B6A3-4C7F-A6DC-42ED2686375B@gmx.net>


On Mar 4, 2008, at 7:03 AM, Alexie Papanicolaou wrote:

> Use of uninitialized value in concatenation (.) or string at /usr/ 
> local/share/perl/5.8.8/Bio/SearchIO/Writer/GbrowseGFF.pm line 287
>
> line 287 is
>            else {
>                $tags{'Target'} = "$prefix:$seqname $qpmax $qpmin";
>            }


Note that this is a warning, not an error. However, if none of  
$prefix, $seqname, $qpmax, $qpmin should ever be undefined (or empty,  
which is what an undefined value interpolates as) at this position,  
then there is a real problem, and it lies before the above line.
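
To illustrate the difference: the interpolation below triggers the same kind
of warning, but the program carries on, with the undefined value rendered as
an empty string. The variable values are invented for the example, and the
defined() check is just one possible way to find out which value is missing:

```perl
use strict;
use warnings;

my @caught;
local $SIG{__WARN__} = sub { push @caught, @_ };   # collect warnings instead of printing them

# $seqname is deliberately left undefined
my ($prefix, $seqname, $qpmax, $qpmin) = ('Sequence', undef, 120, 15);

my $target = "$prefix:$seqname $qpmax $qpmin";     # warns, but does not die
print "Target=$target\n";
print "warnings caught: ", scalar(@caught), "\n";

# Report which of the interpolated values were undefined.
my %value = (prefix => $prefix, seqname => $seqname,
             qpmax  => $qpmax,  qpmin   => $qpmin);
my @missing = grep { !defined $value{$_} } sort keys %value;
print "undefined: @missing\n";
```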

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From hlapp at gmx.net  Tue Mar  4 11:02:02 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 4 Mar 2008 11:02:02 -0500
Subject: [Bioperl-l] branch length score - total length of the spanning
	subtree
In-Reply-To: <47CD5EC1.2020103@sendu.me.uk>
References: <47CD52B9.5060906@medecine.unige.ch> <47CD5EC1.2020103@sendu.me.uk>
Message-ID: <BC04FC01-B12E-43A0-9A8A-B461667BCAA8@gmx.net>


On Mar 4, 2008, at 9:37 AM, Sendu Bala wrote:

> Daniel Gerlach wrote:
>> Hello,
>> I would like to use bioperl to calculate a branch length score for  
>> a given set of nodes and a tree. I know how to get the total  
>> branch length by using $tree->total_branch_length, but how could I  
>> get the length of the subtree spanning some given nodes which are  
>> dispersed over the whole tree (a subset of nodes from the tree  
>> which are not monophyletic)?
>
> One 'cheat' way of doing it might be to use splice(-keep_ids =>  
> \@node_ids) or similar, then run total_branch_length() on that. No  
> idea if it will actually give you the right answer though. Let us  
> know! :)

Related to that, will contract_linear_paths() actually do the right  
thing and adjust branch lengths if it removes internal nodes with  
outdegree 1?

Rutger - does Bio::Phylo handle this correctly?

	-hilmar

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From Daniel.Gerlach at medecine.unige.ch  Tue Mar  4 11:12:53 2008
From: Daniel.Gerlach at medecine.unige.ch (Daniel Gerlach)
Date: Tue, 04 Mar 2008 17:12:53 +0100
Subject: [Bioperl-l] branch length score - total length of the spanning
 subtree
In-Reply-To: <47CD5EC1.2020103@sendu.me.uk>
References: <47CD52B9.5060906@medecine.unige.ch> <47CD5EC1.2020103@sendu.me.uk>
Message-ID: <47CD7505.5080105@medecine.unige.ch>

Hello,

Thanks for the quick answer. I tried:

use Bio::TreeIO;
my $treeio = Bio::TreeIO->new(-format => 'newick',
                              -fh     => \*DATA);
my $tree = $treeio->next_tree;
print $tree->total_branch_length,"\n";
# node ids quoted rather than left as barewords
$tree->splice(-keep_id => [qw(A B E)]);
print $tree->total_branch_length,"\n";

__DATA__
(((A:5,B:5)x:2,(C:4,D:4)y:1)z:3,E:10);

Which gives me the message "MSG: After splicing, the original root was 
removed but there are multiple candidates for the new root!" however the 
root E was not removed.

If I do it the complementary way by splicing out all unwanted nodes - 
splice(-remove_id => [C,D]) - I get what I want:

34
25
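
As a cross-check on those numbers, the length of the subtree spanning a set
of nodes can also be computed by hand: walk from each chosen node towards
the root, count how many chosen nodes sit below each branch, and sum the
branches that separate some chosen nodes from the others. The sketch below
hard-codes the example tree as a parent table instead of using Bio::Tree,
and spanning_length() is a made-up helper name (it assumes the ids passed in
are distinct):

```perl
use strict;
use warnings;

# node => [parent, branch length to parent]; the root itself has no entry.
# This is (((A:5,B:5)x:2,(C:4,D:4)y:1)z:3,E:10); written out by hand.
my %tree = (
    A => ['x', 5],    B => ['x', 5],
    C => ['y', 4],    D => ['y', 4],
    x => ['z', 2],    y => ['z', 1],
    z => ['root', 3], E => ['root', 10],
);

sub spanning_length {
    my ($tree, @ids) = @_;
    my %below;   # branch (named by its child node) => chosen nodes at or below it
    for my $id (@ids) {
        for (my $n = $id; exists $tree->{$n}; $n = $tree->{$n}[0]) {
            $below{$n}++;
        }
    }
    my $total = 0;
    for my $n (keys %below) {
        # a branch with every chosen node below it lies outside the spanning subtree
        $total += $tree->{$n}[1] if $below{$n} < @ids;
    }
    return $total;
}

my $all = 0;
$all += $_->[1] for values %tree;
print "$all\n";                                      # 34, the total branch length
print spanning_length(\%tree, qw(A B E)), "\n";      # 25
```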

Greetings, Daniel

Sendu Bala wrote:
> Daniel Gerlach wrote:
>> Hello,
>>
>> I would like to use bioperl to calculate a branch length score for a 
>> given set of nodes and a tree. I know how to get the total branch 
>> length by using $tree->total_branch_length, but how could I get the 
>> length of the subtree spanning some given nodes which are dispersed 
>> over the whole tree (a subset of nodes from the tree which are not 
>> monophyletic)?
> 
> One 'cheat' way of doing it might be to use splice(-keep_ids => 
> \@node_ids) or similar, then run total_branch_length() on that. No idea 
> if it will actually give you the right answer though. Let us know! :)

From bix at sendu.me.uk  Tue Mar  4 11:37:47 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 04 Mar 2008 16:37:47 +0000
Subject: [Bioperl-l] branch length score - total length of the spanning
 subtree
In-Reply-To: <BC04FC01-B12E-43A0-9A8A-B461667BCAA8@gmx.net>
References: <47CD52B9.5060906@medecine.unige.ch> <47CD5EC1.2020103@sendu.me.uk>
	<BC04FC01-B12E-43A0-9A8A-B461667BCAA8@gmx.net>
Message-ID: <47CD7ADB.6050808@sendu.me.uk>

Hilmar Lapp wrote:
> 
> On Mar 4, 2008, at 9:37 AM, Sendu Bala wrote:
> 
>> Daniel Gerlach wrote:
>>> Hello,
>>> I would like to use bioperl to calculate a branch length score for a 
>>> given set of nodes and a tree. I know how to get the total branch 
>>> length by using $tree->total_branch_length, but how could I get the 
>>> length of the subtree spanning some given nodes which are dispersed 
>>> over the whole tree (a subset of nodes from the tree which are not 
>>> monophyletic)?
>>
>> One 'cheat' way of doing it might be to use splice(-keep_ids => 
>> \@node_ids) or similar, then run total_branch_length() on that. No 
>> idea if it will actually give you the right answer though. Let us 
>> know! :)
> 
> Related to that, will contract_linear_paths() actually do the right 
> thing and adjust branch lengths if it removes internal nodes with 
> outdegree 1?

I think ultimately it boils down to remove_Descendent() being called as 
appropriate which does the branch length alteration. From a glance I 
can't answer your question with certainty, but it 'should' do the right 
thing. It needs to be tested; when I implemented these things I was only 
concerned with tree topology, not branch lengths or anything else.


From David.Messina at sbc.su.se  Tue Mar  4 15:47:06 2008
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 4 Mar 2008 21:47:06 +0100
Subject: [Bioperl-l] problem with Bio::Tools::EMBOSS
In-Reply-To: <f03f40920803031157oda9e0ffubc3bfb9070022238@mail.gmail.com>
References: <f03f40920803030931r7c97159j2d6a4b74e2917184@mail.gmail.com>
	<47CC509A.10306@campus.iztacala.unam.mx>
	<f03f40920803031157oda9e0ffubc3bfb9070022238@mail.gmail.com>
Message-ID: <628aabb70803041247l6c5a52d6n5cab24e7059f15fb@mail.gmail.com>

> where do i find the
> correct names of the keys to create the input hash?


I've never used this module, but from a quick look at the code it appears to
pass on any parameters to palindrome.

I'm guessing you've already done this, but have you tried using the
parameter names and values that palindrome itself asks for?
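
For what it's worth, a run might look roughly like the sketch below. The
hash keys are simply palindrome's own command-line qualifiers (as listed by
'palindrome -help' or 'tfm palindrome'); the file names and numeric values
are placeholders, and the sketch assumes both bioperl-run and EMBOSS are
installed. If they are not, it just reports why nothing was run:

```perl
use strict;
use warnings;

my $outfile = 'input.pal';          # placeholder output file name

# Keys are EMBOSS qualifier names with a leading dash; values are placeholders.
my %params = (
    '-sequence'      => 'input.fasta',
    '-minpallen'     => 10,
    '-maxpallen'     => 100,
    '-gaplimit'      => 100,
    '-nummismatches' => 0,
    '-outfile'       => $outfile,
);

my $ran = eval {
    require Bio::Factory::EMBOSS;
    my $factory    = Bio::Factory::EMBOSS->new();
    my $palindrome = $factory->program('palindrome')
        or die "palindrome is not available from this EMBOSS installation\n";
    $palindrome->run(\%params);     # parameters are passed as a hash reference
    1;
};

if ($ran) {
    print "palindrome finished; see $outfile\n";
}
else {
    print "palindrome was not run: $@\n";
}
```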


Dave

From cjfields at uiuc.edu  Tue Mar  4 16:34:21 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 4 Mar 2008 15:34:21 -0600
Subject: [Bioperl-l] OBDA, Bio::DB::Flat, and bioperl
Message-ID: <FDC728F3-F96C-4072-B448-2F7BF0D11922@uiuc.edu>

I don't know what the current status is for OBDA, but we have several  
bugs listed for Bio::DB::Flat which need someone versed in OBDA to  
look at them (they are all interrelated):

http://bugzilla.open-bio.org/show_bug.cgi?id=2336
http://bugzilla.open-bio.org/show_bug.cgi?id=2337
http://bugzilla.open-bio.org/show_bug.cgi?id=2338
http://bugzilla.open-bio.org/show_bug.cgi?id=2339

If anyone has any input I would greatly appreciate it.  I have been  
trying to stomp as many bugs as possible so we can work on a new  
release.

chris

From bosborne11 at verizon.net  Tue Mar  4 16:42:05 2008
From: bosborne11 at verizon.net (Brian Osborne)
Date: Tue, 04 Mar 2008 16:42:05 -0500
Subject: [Bioperl-l] OBDA, Bio::DB::Flat, and bioperl
In-Reply-To: <FDC728F3-F96C-4072-B448-2F7BF0D11922@uiuc.edu>
References: <FDC728F3-F96C-4072-B448-2F7BF0D11922@uiuc.edu>
Message-ID: <BAD3A0C8-69C4-45FF-A666-527289428A58@verizon.net>

Chris,

I'll take a look at them this weekend.

Brian O.


On Mar 4, 2008, at 4:34 PM, Chris Fields wrote:

> I don't know what the current status is for OBDA, but we have  
> several bugs listed for Bio::DB::Flat which need someone versed in  
> OBDA to look at them (they are all interrelated):
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=2336
> http://bugzilla.open-bio.org/show_bug.cgi?id=2337
> http://bugzilla.open-bio.org/show_bug.cgi?id=2338
> http://bugzilla.open-bio.org/show_bug.cgi?id=2339
>
> If anyone has any input I would greatly appreciate it.  I have been  
> trying to stomp as many bugs as possible so we can work on a new  
> release.
>
> chris


From anjan.purkayastha at gmail.com  Tue Mar  4 18:52:09 2008
From: anjan.purkayastha at gmail.com (ANJAN PURKAYASTHA)
Date: Tue, 4 Mar 2008 18:52:09 -0500
Subject: [Bioperl-l] problem with Bio::Tools::EMBOSS
In-Reply-To: <628aabb70803041247l6c5a52d6n5cab24e7059f15fb@mail.gmail.com>
References: <f03f40920803030931r7c97159j2d6a4b74e2917184@mail.gmail.com>
	<47CC509A.10306@campus.iztacala.unam.mx>
	<f03f40920803031157oda9e0ffubc3bfb9070022238@mail.gmail.com>
	<628aabb70803041247l6c5a52d6n5cab24e7059f15fb@mail.gmail.com>
Message-ID: <f03f40920803041552u49a68a64gf31b90cd0159ce19@mail.gmail.com>

guys,
thanks for all your input.
i went to the following site:
http://www.koders.com/perl/fid5F28A3DDD453F0DB4995B7DDF304B02DBBACE0A0.aspx?s=calculate

they have the key names for most of the emboss programs.

thanks,
anjan

On Tue, Mar 4, 2008 at 3:47 PM, Dave Messina <David.Messina at sbc.su.se>
wrote:

>
> where do i find the
> > correct names of the keys to create the input hash?
>
>
>
> I've never used this module, but from a quick look at the code it appears
> to pass on any parameters to palindrome.
>
> I'm guessing you've already done this, but have you tried using the
> parameter names and values that palindrome itself asks for?
>
>
> Dave
>
>



From staffa at niehs.nih.gov  Wed Mar  5 18:43:30 2008
From: staffa at niehs.nih.gov (Staffa, Nick (NIH/NIEHS))
Date: Wed, 05 Mar 2008 18:43:30 -0500
Subject: [Bioperl-l] SeqIO
Message-ID: <C3F49A52.88AD%staffa@niehs.nih.gov>

So the Howto says that Bio::SeqIO will read almost any known format,
including GCG.
So I created a GCG file with SeqLab and tried to print out its sequence as a
string (I guessed at the way to get the sequence string):

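#!/usr/bin/perl -w
use strict;
$| = 1;    # unbuffered output
use Bio::SeqIO;

# Require at least one input file on the command line.
my $number_of_files = @ARGV;
if (!$number_of_files) { print "no files entered\n"; exit; }

foreach my $file (@ARGV) {
    # No -format is given, so SeqIO has to work out the format by itself.
    my $seqio_object = Bio::SeqIO->new(-file => $file);
    my $seq_object   = $seqio_object->next_seq;
    my $sequence     = $seq_object->seq;
    print "$sequence\n";
    # windowscore() is defined elsewhere in the program (not shown).
    my $status = &windowscore($sequence);
}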

But what it returned was the entire contents of the file with no format
decoding. Have I been deluded?

NewDNALength:810March5,200818:26Type:NCheck:3368..1TGTTCGAATTCCGTGCGGTCCACCT
CCCCTAGGAGCTCAGTGGGCTGGTT51GGATTCCGTGCCATCCCGGCAGGGCAGAGCCTCGGGAGGGGGCGAAGGT
T101GCCCGGGGCCGTGCGCTGGGTGCTGCTGCTGCGGTGGCGGCGGCGGTGCC151TGCGGTTGCAGCGGCTGCT
GGGGTTGCGCGTGGAAACCGCGCCCCGCACT201TGCGGCGGGCGAGCCCATCGCGCCGTAGTACAGGTGCAGAGC
GCTGGGGG251GCGCCAGGATCCCCGGCATCGCAGGGCCCGAGGGGTCCGGCCCCACTCGC301ATGGGGCCAGCG
GGCGGCTCTACGGACACTGCATAGTCCGAGACTGGAGC351GTAAGTGTAGGTGCCGGCCGCCGGGCAGTCCCCTG
GCAGCGGGGCTGCAA401AGAAAGCCGGGTCCTGCTCCACGCCATCCAGCGGGGATGTGTCCGGAGTG451GGCAG
AGGGTAGCCGTCGAGCGCGGGAGCGCCCAGTCCCTGGCAGTCCCG501ATAGTGGGGGCCCATGTGCGGAGACATC
AGCGGAGGACCGGCCGGATAGC551CCGGCTCCGGGAAAGGCAGACCCAGGCCATCCATGGCCACGCGGCCGCCC6
01TCGGGACCAAGCGCGCCGGCCTGGGGCTCGACGAGAGCGTGCAGGAAGCC651TCCCTCCACCCGCTTCATGCG
CTTCACCTGCTTGCGCCGCCGCGGCCGGT701ACTTGTAGTTGGGGTGGTCCTGCATATGCTGCACGCGCAGCCGC
TCGGCC751TCTTCCACGAAGGGCCGCTTCTCTGCCAAGGTCAACGCCTTCCAAGACTT801GCCTGCAGGG


Nick Staffa 
Telephone: 919-316-4569  (NIEHS: 6-4569)
Scientific Computing Support Group
NIEHS Information Technology Support Services Contract
(Science Task Monitor: Roy W. Reter (reter at niehs.nih.gov)
National Institute of Environmental Health Sciences
National Institutes of Health
Research Triangle Park, North Carolina


From cjfields at uiuc.edu  Wed Mar  5 21:22:53 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 5 Mar 2008 20:22:53 -0600
Subject: [Bioperl-l] SeqIO
In-Reply-To: <C3F49A52.88AD%staffa@niehs.nih.gov>
References: <C3F49A52.88AD%staffa@niehs.nih.gov>
Message-ID: <C057CBE1-87E5-4195-94D6-60E706511F4E@uiuc.edu>

I thought the GCG format changed somewhere along the way, but maybe I'm
wrong?  Regardless, you'll have to post this as a bug (along with an
example file).

Also, kind of odd that the sequence data wasn't checked...

chris


Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From jason at bioperl.org  Wed Mar  5 21:33:48 2008
From: jason at bioperl.org (Jason Stajich)
Date: Wed, 5 Mar 2008 18:33:48 -0800
Subject: [Bioperl-l] SeqIO
In-Reply-To: <C057CBE1-87E5-4195-94D6-60E706511F4E@uiuc.edu>
References: <C3F49A52.88AD%staffa@niehs.nih.gov>
	<C057CBE1-87E5-4195-94D6-60E706511F4E@uiuc.edu>
Message-ID: <797E42FC-C59F-4431-BAF1-11D3FAE9F9D0@bioperl.org>

You should probably try specifying the format explicitly first, as in
(-format => 'gcg')

-j


From bosborne11 at verizon.net  Wed Mar  5 21:01:07 2008
From: bosborne11 at verizon.net (Brian Osborne)
Date: Wed, 05 Mar 2008 21:01:07 -0500
Subject: [Bioperl-l] SeqIO
In-Reply-To: <C3F49A52.88AD%staffa@niehs.nih.gov>
References: <C3F49A52.88AD%staffa@niehs.nih.gov>
Message-ID: <19DC527F-3D34-4F3E-9B4C-D2C6011A2C8F@verizon.net>

Nick,

Take a look at the GCG files that are used in the SeqIO tests:

bioperl-live//t/data/test.gcg
bioperl-live//t/data/test_badlf.gcg

Does the file that you created have a format like the format in those  
files? I'm guessing you're going to say 'yes', from the looks of your  
output.

Brian O.




From staffa at niehs.nih.gov  Wed Mar  5 22:09:11 2008
From: staffa at niehs.nih.gov (Staffa, Nick (NIH/NIEHS))
Date: Wed, 05 Mar 2008 22:09:11 -0500
Subject: [Bioperl-l] SeqIO
In-Reply-To: <797E42FC-C59F-4431-BAF1-11D3FAE9F9D0@bioperl.org>
Message-ID: <C3F4CA87.8074%staffa@niehs.nih.gov>

Verily,
One interpretation of the docs might be: will read any format if the format
is specified.
I was hoping that I could write a program for which one needn't specify the
format. It'd be more user-friendly and useful.


On 3/5/08 9:33 PM, "Jason Stajich" <jason at bioperl.org> wrote:

> probably you should try specifying the format explicitly first- as in
> (-format => 'gcg')
> 
> -j


From cjfields at uiuc.edu  Wed Mar  5 22:44:14 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 5 Mar 2008 21:44:14 -0600
Subject: [Bioperl-l] SeqIO
In-Reply-To: <1B139406-897F-496F-8709-FCAAD4EFEDE3@verizon.net>
References: <C3F49A52.88AD%staffa@niehs.nih.gov>
	<C057CBE1-87E5-4195-94D6-60E706511F4E@uiuc.edu>
	<1B139406-897F-496F-8709-FCAAD4EFEDE3@verizon.net>
Message-ID: <9146DF9D-C0D6-4F18-9B7E-7BB42FCE0737@uiuc.edu>

Heh, good one!  Though Jason may have worked out the issue (not  
indicating the format explicitly).  Would be worth looking at the  
tested files.

As for dinosaurs, well I can't talk ...

chris

On Mar 5, 2008, at 8:49 PM, Brian Osborne wrote:

> Chris,
>
> Many many years ago, when dinosaurs roamed the earth, only about  
> half of the formats had their own tests. A primitive being saw this  
> and created simple tests for all the 'missing' formats. His thought  
> probably was 'this is better than nothing'. In fact this being  
> assumed that GCG was an outdated and unused format, even as long ago  
> as that time was.
>
> The origins of so much of what we now know as 'Bioperl' are  
> frequently mysterious, or incomprehensible to modern day humans...
>
> Brian O.
>
> On Mar 5, 2008, at 9:22 PM, Chris Fields wrote:
>
>> Also, kind of odd that the sequence data wasn't checked...


From bosborne11 at verizon.net  Wed Mar  5 21:49:26 2008
From: bosborne11 at verizon.net (Brian Osborne)
Date: Wed, 05 Mar 2008 21:49:26 -0500
Subject: [Bioperl-l] SeqIO
In-Reply-To: <C057CBE1-87E5-4195-94D6-60E706511F4E@uiuc.edu>
References: <C3F49A52.88AD%staffa@niehs.nih.gov>
	<C057CBE1-87E5-4195-94D6-60E706511F4E@uiuc.edu>
Message-ID: <1B139406-897F-496F-8709-FCAAD4EFEDE3@verizon.net>

Chris,

Many many years ago, when dinosaurs roamed the earth, only about half  
of the formats had their own tests. A primitive being saw this and  
created simple tests for all the 'missing' formats. His thought  
probably was 'this is better than nothing'. In fact this being assumed  
that GCG was an outdated and unused format, even as long ago as that  
time was.

The origins of so much of what we now know as 'Bioperl' are frequently  
mysterious, or incomprehensible to modern day humans...

Brian O.

On Mar 5, 2008, at 9:22 PM, Chris Fields wrote:

> Also, kind of odd that the sequence data wasn't checked...


From cjfields at uiuc.edu  Wed Mar  5 22:54:15 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 5 Mar 2008 21:54:15 -0600
Subject: [Bioperl-l] SeqIO
In-Reply-To: <C3F4CA87.8074%staffa@niehs.nih.gov>
References: <C3F4CA87.8074%staffa@niehs.nih.gov>
Message-ID: <67C6AE9D-3934-4717-A97A-4C31DB4F7E33@uiuc.edu>

You can leave off the format, but you must append the correct file  
extension for the parser to determine the correct format ('.gcg' for  
GCG, for example).  There is also Bio::Tools::GuessSeqFormat though it  
doesn't cover all formats.
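
For what it's worth, the content-based route could look roughly like the
sketch below. It assumes only the documented Bio::Tools::GuessSeqFormat
constructor and its guess() method, plus Bio::SeqIO with an explicit
-format; the helper name open_seqio_guessing_format is made up for this
example. Treat it as a starting point, not a tested recipe for every format.

```perl
use strict;
use warnings;
use Bio::SeqIO;
use Bio::Tools::GuessSeqFormat;

# Hypothetical helper: open a sequence file, guessing the format from the
# file's content instead of its extension. Dies if no guess can be made.
sub open_seqio_guessing_format {
    my ($file) = @_;
    my $format = Bio::Tools::GuessSeqFormat->new(-file => $file)->guess;
    die "Could not guess the format of $file\n" unless defined $format;
    return Bio::SeqIO->new(-file => $file, -format => $format);
}

# Print every sequence of every file named on the command line.
foreach my $file (@ARGV) {
    my $seqio = open_seqio_guessing_format($file);
    while (my $seq = $seqio->next_seq) {
        print $seq->seq, "\n";
    }
}
```

Since the guesser doesn't cover all formats, a program like this would
probably still want an option that lets the user name the format outright.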

chris

On Mar 5, 2008, at 9:09 PM, Staffa, Nick (NIH/NIEHS) wrote:

> Verily,
> One interpretation of the docs might be: will read any format if the  
> format
> is specified.
> I was hoping that I could write a program that one needn't specify  
> format.
> It'd be more user-friendly and useful.

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From ewijaya at gmail.com  Thu Mar  6 03:16:25 2008
From: ewijaya at gmail.com (Edward Wijaya)
Date: Thu, 6 Mar 2008 16:16:25 +0800
Subject: [Bioperl-l] BioPerl Module to Parse Transfac Flat File Database
Message-ID: <3521d3670803060016r35ac720ar9f2190631ddaf629@mail.gmail.com>

Dear experts,

Is there one? The TRANSFAC text file contains entries like the one below.
In particular, we wish to capture the PWM for each transcription
factor.

Regards,
Edward

__BEGIN__
VV  TRANSFAC MATRIX TABLE, Release 11.1 - licensed - 2007-03-31, (C) Biobase GmbH
XX
//
AC  M00001
XX
ID  V$MYOD_01
XX
DT  19.10.1992 (created); ewi.
DT  22.10.1997 (updated); dbo.
CO  Copyright (C), Biobase GmbH.
XX
NA  MyoD
XX
DE  myoblast determination gene product
XX
BF  T00526; MyoD; Species: mouse, Mus musculus.
BF  T09177; MyoD; Species: mouse, Mus musculus.
XX
P0      A      C      G      T
01      1      2      2      0      S
02      2      1      2      0      R
03      3      0      1      1      A
04      0      5      0      0      C
05      5      0      0      0      A
06      0      0      4      1      G
07      0      1      4      0      G
08      0      0      0      5      T
09      0      0      5      0      G
10      0      1      2      2      K
11      0      2      0      3      Y
12      1      0      3      1      G

....etc....

From watashi at post.com  Thu Mar  6 07:06:42 2008
From: watashi at post.com (Masa Masa)
Date: Thu, 6 Mar 2008 07:06:42 -0500
Subject: [Bioperl-l] failure of add_seqfeature
Message-ID: <20080306120642.4800F16427A@ws1-4.us4.outblaze.com>

Dear experts,

Would anybody know why the following code generates this error:


------------- EXCEPTION  -------------
MSG: Bio::SeqFeature::Generic=HASH(0x94583c0) is not contained within parent feature, and expansion is not valid
STACK Bio::SeqFeature::Generic::add_SeqFeature /usr/lib/perl5/site_perl/5.8.0/Bio/SeqFeature/Generic.pm:767
STACK toplevel test.pl:118

--------------------------------------
15616 15693
79568 83016

=================


use Bio::Graphics;                                                                    
use Bio::SeqFeature::Generic;                                                         
use Bio::SeqIO;                         


my $bsg = 'Bio::SeqFeature::Generic';                                                 
                                                                                      
my $unseqfea = $bsg->new( -start=>$from[$i], -end=>$to[$i], -display_name=>'U');      
                                                                                      
for (my $i=0; $i < @from; $i++) {                                                     
print "$from[$i] $to[$i]\n";                                                            
    $unseqfea->add_SeqFeature($bsg->new(-start=>$from[$i],-end=>$to[$i]));            
    if ($i > 10) {                                                                    
       exit;                                                                                 
    }
}                



From heikki at sanbi.ac.za  Thu Mar  6 07:20:03 2008
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Thu, 6 Mar 2008 14:20:03 +0200
Subject: [Bioperl-l] SeqIO
In-Reply-To: <C3F4CA87.8074%staffa@niehs.nih.gov>
References: <C3F4CA87.8074%staffa@niehs.nih.gov>
Message-ID: <200803061420.04123.heikki@sanbi.ac.za>


Nick,

This is the regex that Bio::Tools::GuessSeqFormat uses to identify a gcg file:

/Length: .*Type: .*Check: .*\.\.$/

It matches the second line of a GCG file. If the first line matches some
other format's regex, this one will not be evaluated.
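
As a rough illustration of the pattern above, here is a small stand-alone
Perl sketch. It copies only the regex quoted in this message; the two header
lines are made-up examples (assumptions, not taken from a real file), and
looks_like_gcg is a hypothetical helper name, not part of
Bio::Tools::GuessSeqFormat.

```perl
use strict;
use warnings;

# The pattern quoted above from Bio::Tools::GuessSeqFormat.
my $gcg_re = qr/Length: .*Type: .*Check: .*\.\.$/;

# Hypothetical header lines, for illustration only.
my $with_spaces    = "NewDNA  Length: 810  March 5, 2008 18:26  Type: N  Check: 3368  ..";
my $without_spaces = "NewDNALength:810March5,200818:26Type:NCheck:3368..";

# Return 1 if the line matches the GCG header pattern, 0 otherwise.
sub looks_like_gcg {
    my ($line) = @_;
    return $line =~ $gcg_re ? 1 : 0;
}

for my $line ($with_spaces, $without_spaces) {
    printf "%-8s %s\n", (looks_like_gcg($line) ? 'match' : 'no match'), $line;
}
```

The pattern expects a space after each colon, so only the first string
matches; a header with its whitespace stripped, like the second one, does
not.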

Let us know,

	-Heikki

On Thursday 06 March 2008 05:09:11 Staffa, Nick (NIH/NIEHS) wrote:
> Verily,
> One interpretation of the docs might be: will read any format if the format
> is specified.
> I was hoping that I could write a program that one needn't specify format.
> It'd be more user-friendly and useful.


-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Senior Scientist    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________

From bix at sendu.me.uk  Thu Mar  6 08:07:21 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 06 Mar 2008 13:07:21 +0000
Subject: [Bioperl-l] BioPerl Module to Parse Transfac Flat File Database
In-Reply-To: <3521d3670803060016r35ac720ar9f2190631ddaf629@mail.gmail.com>
References: <3521d3670803060016r35ac720ar9f2190631ddaf629@mail.gmail.com>
Message-ID: <47CFEC89.1000705@sendu.me.uk>

Edward Wijaya wrote:
> Dear experts,
> 
> Is there any? The TRANSFAC text file which contain entry like this.
> Especially we wich to capture the PWM for each of the Transcription
> factor.

Yes; I've written a module to do this, I just haven't committed it yet 
because certain things aren't quite right in terms of the API. But to 
just grab the PWM it should work fine. If you want I can email you the 
modules.
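
In the meantime, the count rows of an entry like the one Edward posted can
be pulled out with plain Perl. The sketch below is not the module mentioned
above, only a hand-rolled illustration: the function name
parse_transfac_matrices is made up, and it assumes that each entry has an
AC line, that matrix rows start with a position number followed by the A, C,
G and T counts, and that "//" ends an entry.

```perl
use strict;
use warnings;

# Hypothetical helper: collect the matrix rows of each TRANSFAC entry.
# Returns a hash ref: accession => array ref of [A, C, G, T] count rows.
sub parse_transfac_matrices {
    my (@lines) = @_;
    my (%pwm, $accession);
    for my $line (@lines) {
        my @fields = split ' ', $line;
        next unless @fields;
        if ($fields[0] eq 'AC') {
            $accession = $fields[1];    # start of a new entry
        }
        elsif ($fields[0] eq '//') {
            undef $accession;           # end of the current entry
        }
        elsif (defined $accession && $fields[0] =~ /^\d+$/ && @fields >= 5) {
            # Position row: keep the four counts, drop the consensus letter.
            push @{ $pwm{$accession} }, [ @fields[1 .. 4] ];
        }
    }
    return \%pwm;
}

# A few lines copied from the entry quoted earlier in this thread.
my @sample = (
    'AC  M00001',
    'ID  V$MYOD_01',
    'P0      A      C      G      T',
    '01      1      2      2      0      S',
    '02      2      1      2      0      R',
    '//',
);

my $pwm = parse_transfac_matrices(@sample);
for my $ac (sort keys %$pwm) {
    print "$ac\n";
    print join("\t", @$_), "\n" for @{ $pwm->{$ac} };
}
```

Rows are kept as raw counts; turning them into frequencies or log-odds is
left to whatever uses the result.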

From sdavis2 at mail.nih.gov  Thu Mar  6 08:40:25 2008
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Thu, 6 Mar 2008 08:40:25 -0500
Subject: [Bioperl-l] BioPerl Module to Parse Transfac Flat File Database
In-Reply-To: <47CFEC89.1000705@sendu.me.uk>
References: <3521d3670803060016r35ac720ar9f2190631ddaf629@mail.gmail.com>
	<47CFEC89.1000705@sendu.me.uk>
Message-ID: <264855a00803060540u1f3d0f92pbab13349595a0eb3@mail.gmail.com>

On Thu, Mar 6, 2008 at 8:07 AM, Sendu Bala <bix at sendu.me.uk> wrote:
> Edward Wijaya wrote:
>  > Dear experts,
>  >
>  > Is there any? The TRANSFAC text file which contain entry like this.
>  > Especially we wich to capture the PWM for each of the Transcription
>  > factor.
>
>  Yes; I've written a module to do this, I just haven't committed it yet
>  because certain things aren't quite right in terms of the API. But to
>  just grab the PWM it should work fine. If you want I can email you the
>  modules.

I believe there is a set of non-BioPerl modules called TFBS.  See
here (although I'm not sure this is the most up-to-date site):

http://tfbs.genereg.net/

Sean

From David.Messina at sbc.su.se  Thu Mar  6 09:55:24 2008
From: David.Messina at sbc.su.se (Dave Messina)
Date: Thu, 6 Mar 2008 15:55:24 +0100
Subject: [Bioperl-l] failure of add_seqfeature
In-Reply-To: <20080306120642.4800F16427A@ws1-4.us4.outblaze.com>
References: <20080306120642.4800F16427A@ws1-4.us4.outblaze.com>
Message-ID: <628aabb70803060655k5245296etf5ee2f31755230d3@mail.gmail.com>

Hi Masa,

Could you give us a little more information?

A complete test case (the code you included doesn't run because, for example,
the @from array doesn't exist) and an input file would be helpful, as well as
the version of BioPerl you are using.

Dave

From staffa at niehs.nih.gov  Thu Mar  6 10:23:34 2008
From: staffa at niehs.nih.gov (Staffa, Nick (NIH/NIEHS))
Date: Thu, 06 Mar 2008 10:23:34 -0500
Subject: [Bioperl-l] SeqIO
In-Reply-To: <200803061420.04123.heikki@sanbi.ac.za>
Message-ID: <C3F576A6.88C9%staffa@niehs.nih.gov>

Here's the scoop:
When I use Jason's suggestion, (-format => 'gcg'),
my program works without complaint on the original file, which looks like:
!!NA_SEQUENCE 1.0
   NewDNA  Length: 810  March 5, 2008 18:26  Type: N  Check: 3368  ..

       1  TGTTCGAATT CCGTGCGGTC CACCTCCCCT AGGAGCTCAG TGGGCTGGTT
etc.

BUT if I remove the first line to test Bio::Tools::GuessSeqFormat
(the result should be retro-GCG format (before version 11?)),
my program runs, but there IS a complaint:
Use of uninitialized value in scalar chomp at
/usr/lib/perl5/site_perl/5.8.5/Bio/SeqIO/gcg.pm line 118, <GEN0> line 1.
BUT
if I remove (-format => 'gcg'), I get no complaint, but the sequence
returned still has its numbers embedded. This affects my calculations.

Thanks, at least I know what my options are.
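
As a quick check outside BioPerl, here is a minimal plain-Perl sketch that
applies the pattern Heikki posted (quoted below) to the header line of my
file. It only tests the regex itself; it does not reproduce how
Bio::Tools::GuessSeqFormat reads a file or in what order it tries formats.
The variable and function names are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Header line copied from the GCG file shown above.
my $gcg_header = '   NewDNA  Length: 810  March 5, 2008 18:26  Type: N  Check: 3368  ..';

# A sequence line from the same file, which should not look like a header.
my $seq_line = '       1  TGTTCGAATT CCGTGCGGTC CACCTCCCCT AGGAGCTCAG TGGGCTGGTT';

# The pattern quoted in this thread for recognising a GCG header line.
my $gcg_re = qr/Length: .*Type: .*Check: .*\.\.$/;

# Hypothetical helper: returns 1 if the line matches the pattern, else 0.
sub looks_like_gcg {
    my ($line) = @_;
    return $line =~ $gcg_re ? 1 : 0;
}

for my $line ($gcg_header, $seq_line) {
    printf "%-3s %s\n", (looks_like_gcg($line) ? 'yes' : 'no'), $line;
}
```

If the header line matches here, the pattern itself is not what fails when
the "!!NA_SEQUENCE" line is removed.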


Nick Staffa 
Telephone: 919-316-4569  (NIEHS: 6-4569)
Scientific Computing Support Group
NIEHS Information Technology Support Services Contract
(Science Task Monitor: Roy W. Reter (reter at niehs.nih.gov)
National Institute of Environmental Health Sciences
National Institutes of Health
Research Triangle Park, North Carolina


On 3/6/08 7:20 AM, "Heikki Lehvaslaiho" <heikki at sanbi.ac.za> wrote:

> 
> Nick,
> 
> This is the regex that Bio::Tools::GuessSeqFormat uses to identify a gcg file:
> 
> /Length: .*Type: .*Check: .*\.\.$/
> 
> It is the second  line in GCG file. If first line matches to some other format
> regex, this will not not be evaluated.
> 
> Let us know,
> 
> -Heikki
> 
> On Thursday 06 March 2008 05:09:11 Staffa, Nick (NIH/NIEHS) wrote:
>> Verily,
>> One interpretation of the docs might be: will read any format if the format
>> is specified.
>> I was hoping that I could write a program that one needn't specify format.
>> It'd be more user-friendly and useful.
>> 
>> On 3/5/08 9:33 PM, "Jason Stajich" <jason at bioperl.org> wrote:
>>> probably you should try specifying the format explicitly first- as in
>>> (-format => 'gcg')
>>> 
>>> -j
>>> 
>>> On Mar 5, 2008, at 6:22 PM, Chris Fields wrote:
>>>> I thought GCG format changed somewhere along the way but I maybe
>>>> I'm wrong?  Regardless, you'll have to post this as a bug (along
>>>> with an example file).
>>>> 
>>>> Also, kind of odd that the sequence data wasn't checked...
>>>> 
>>>> chris
>>>> 
>>>> On Mar 5, 2008, at 5:43 PM, Staffa, Nick (NIH/NIEHS) wrote:
>>>>> So the Howto says that Bio::SeqIO will read almost any known format
>>>>> including GCG.
>>>>> So I create a GCG file with Seqlab and try to printout its
>>>>> sequence as a
>>>>> string. ( I did guess at the way to get the sequence string:
>>>>> 
>>>>> #!/usr/bin/perl -w
>>>>> use strict;
>>>>> $| = 1;
>>>>> use Bio::SeqIO;
>>>>> my $number_of_files = @ARGV;
>>>>> if(!$number_of_files){print "no files entered\n";exit:}
>>>>> foreach my $file (@ARGV){
>>>>> my $seqio_object = Bio::SeqIO->new(-file => $file);
>>>>> my $seq_object = $seqio_object->next_seq;
>>>>> my $sequence = $seq_object->seq;
>>>>> print "$sequence\n";
>>>>> my $status = &windowscore($sequence);
>>>>> }
>>>>> 
>>>>> But what it returned was the entire contents of the file with no
>>>>> format
>>>>> decoding. Have I been deluded?
>>>>> 
>>>>> NewDNALength:810March5,200818:26Type:NCheck:
>>>>> 3368..1TGTTCGAATTCCGTGCGGTCCACCT
>>>>> CCCCTAGGAGCTCAGTGGGCTGGTT51GGATTCCGTGCCATCCCGGCAGGGCAGAGCCTCGGGAGGGGG
>>>>> CGAAGGT
>>>>> T101GCCCGGGGCCGTGCGCTGGGTGCTGCTGCTGCGGTGGCGGCGGCGGTGCC151TGCGGTTGCAGC
>>>>> GGCTGCT
>>>>> GGGGTTGCGCGTGGAAACCGCGCCCCGCACT201TGCGGCGGGCGAGCCCATCGCGCCGTAGTACAGGT
>>>>> GCAGAGC
>>>>> GCTGGGGG251GCGCCAGGATCCCCGGCATCGCAGGGCCCGAGGGGTCCGGCCCCACTCGC301ATGGG
>>>>> GCCAGCG
>>>>> GGCGGCTCTACGGACACTGCATAGTCCGAGACTGGAGC351GTAAGTGTAGGTGCCGGCCGCCGGGCAG
>>>>> TCCCCTG
>>>>> GCAGCGGGGCTGCAA401AGAAAGCCGGGTCCTGCTCCACGCCATCCAGCGGGGATGTGTCCGGAGTG4
>>>>> 51GGCAG
>>>>> AGGGTAGCCGTCGAGCGCGGGAGCGCCCAGTCCCTGGCAGTCCCG501ATAGTGGGGGCCCATGTGCGG
>>>>> AGACATC
>>>>> AGCGGAGGACCGGCCGGATAGC551CCGGCTCCGGGAAAGGCAGACCCAGGCCATCCATGGCCACGCGG
>>>>> CCGCCC6
>>>>> 01TCGGGACCAAGCGCGCCGGCCTGGGGCTCGACGAGAGCGTGCAGGAAGCC651TCCCTCCACCCGCT
>>>>> TCATGCG
>>>>> CTTCACCTGCTTGCGCCGCCGCGGCCGGT701ACTTGTAGTTGGGGTGGTCCTGCATATGCTGCACGCG
>>>>> CAGCCGC
>>>>> TCGGCC751TCTTCCACGAAGGGCCGCTTCTCTGCCAAGGTCAACGCCTTCCAAGACTT801GCCTGCA
>>>>> GGG
>>>>> 
> 
> 


From hlapp at gmx.net  Thu Mar  6 10:26:52 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 6 Mar 2008 10:26:52 -0500
Subject: [Bioperl-l] failure of add_seqfeature
In-Reply-To: <20080306120642.4800F16427A@ws1-4.us4.outblaze.com>
References: <20080306120642.4800F16427A@ws1-4.us4.outblaze.com>
Message-ID: <6BD917FC-803E-471B-A0C4-219286E53C47@gmx.net>

It seems you are adding subfeatures with a location that is not  
within their parent feature location. If that's indeed what you want  
to do, add the 'EXPAND' argument.

Excerpted from the POD of Bio::SeqFeature::Generic:

         Usage   : $feat->add_SeqFeature($subfeat);
                   $feat->add_SeqFeature($subfeat,'EXPAND')
         Function: adds a SeqFeature into the subSeqFeature array.
                   with no 'EXPAND' qualifer, subfeat will be tested
                   as to whether it lies inside the parent, and throw
                   an exception if not.

                   If EXPAND is used, the parent's start/end/strand will
                   be adjusted so that it grows to accommodate the new
                   subFeature
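
To make the difference concrete, here is a plain-Perl model of the check
the POD describes. It is NOT BioPerl's implementation: features are bare
hashes, strand is ignored, and add_sub_feature is a made-up name. It
assumes the parent was built from the first coordinate pair in your
output, which is a guess, since $i is not yet set when the parent is
created in your code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Model of the containment test: die unless the subfeature lies inside
# the parent, or grow the parent when 'EXPAND' is passed.
sub add_sub_feature {
    my ($parent, $sub, $mode) = @_;
    my $inside = $sub->{start} >= $parent->{start}
              && $sub->{end}   <= $parent->{end};
    if (!$inside) {
        die "subfeature is not contained within parent feature\n"
            unless defined $mode && $mode eq 'EXPAND';
        # EXPAND: adjust the parent so that it covers the subfeature.
        $parent->{start} = $sub->{start} if $sub->{start} < $parent->{start};
        $parent->{end}   = $sub->{end}   if $sub->{end}   > $parent->{end};
    }
    push @{ $parent->{subs} }, $sub;
    return $parent;
}

# Coordinates taken from the output in the quoted message.
my @from = (15616, 79568);
my @to   = (15693, 83016);

my $strict_parent = { start => $from[0], end => $to[0], subs => [] };
my $expand_parent = { start => $from[0], end => $to[0], subs => [] };

for my $i (0 .. $#from) {
    # Without EXPAND, a pair outside the parent is rejected.
    my $ok = eval {
        add_sub_feature($strict_parent, { start => $from[$i], end => $to[$i] });
        1;
    };
    print "strict: $from[$i]-$to[$i] ", ($ok ? "added\n" : "rejected: $@");

    # With EXPAND, the parent grows instead.
    add_sub_feature($expand_parent, { start => $from[$i], end => $to[$i] }, 'EXPAND');
}
print "expanded parent: $expand_parent->{start}-$expand_parent->{end}\n";
```

In the real API the equivalent is the second form in the usage above,
$feat->add_SeqFeature($subfeat,'EXPAND').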


On Mar 6, 2008, at 7:06 AM, Masa Masa wrote:

> Dear experts,
>
> Would anybody know why the following codes generate an error of:
>
>
> ------------- EXCEPTION  -------------
> MSG: Bio::SeqFeature::Generic=HASH(0x94583c0) is not contained  
> within parent feature, and expansion is not valid
> STACK Bio::SeqFeature::Generic::add_SeqFeature /usr/lib/perl5/ 
> site_perl/5.8.0/Bio/SeqFeature/Generic.pm:767
> STACK toplevel test.pl:118
>
> --------------------------------------
> 15616 15693
> 79568 83016
>
> =================
>
>
> use Bio::Graphics;
> use Bio::SeqFeature::Generic;
> use Bio::SeqIO;
>
>
> my $bsg = 'Bio::SeqFeature::Generic';
>
> my $unseqfea = $bsg->new( -start=>$from[$i], -end=>$to[$i], - 
> display_name=>'U');
>
> for (my $i=0; $i < @from; $i++) {
> print "$from[$i] $to[$i]\n";
>     $unseqfea->add_SeqFeature($bsg->new(-start=>$from[$i],-end=>$to 
> [$i]));
>     if ($i > 10) {
>        exit;
>     }
> }
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From bix at sendu.me.uk  Thu Mar  6 10:41:49 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 06 Mar 2008 15:41:49 +0000
Subject: [Bioperl-l] BioPerl Module to Parse Transfac Flat File Database
In-Reply-To: <264855a00803060540u1f3d0f92pbab13349595a0eb3@mail.gmail.com>
References: <3521d3670803060016r35ac720ar9f2190631ddaf629@mail.gmail.com>	
	<47CFEC89.1000705@sendu.me.uk>
	<264855a00803060540u1f3d0f92pbab13349595a0eb3@mail.gmail.com>
Message-ID: <47D010BD.4000801@sendu.me.uk>

Sean Davis wrote:
> On Thu, Mar 6, 2008 at 8:07 AM, Sendu Bala <bix at sendu.me.uk> wrote:
>> Edward Wijaya wrote:
>>  > Dear experts,
>>  >
>>  > Is there any? The TRANSFAC text file which contain entry like this.
>>  > Especially we wich to capture the PWM for each of the Transcription
>>  > factor.
>>
>>  Yes; I've written a module to do this, I just haven't committed it yet
>>  because certain things aren't quite right in terms of the API. But to
>>  just grab the PWM it should work fine. If you want I can email you the
>>  modules.
> 
> I believe there are a set of non-bioperl modules called TFBS.  See
> here (although I'm not sure this is the most up-to-date site):
> 
> http://tfbs.genereg.net/

I believe it's out of date enough to not work on the latest Transfac
data, though I haven't tried it to confirm.

At any rate, the Transfac (Pro) database is pretty strange and 
complicated, and the TFBS modules certainly don't let you access 
everything in the way you might want or expect.

From cain.cshl at gmail.com  Thu Mar  6 11:43:35 2008
From: cain.cshl at gmail.com (Scott Cain)
Date: Thu, 06 Mar 2008 11:43:35 -0500
Subject: [Bioperl-l] anonymous cvs?
Message-ID: <1204821815.6689.7.camel@frissell>

Hi All,

So now that the transition to svn is complete (and I like it), should
anonymous cvs still be working?  I believe there was discussion about
keeping it going via mirroring, and I hope that is the case.  It will
make life a little easier for people who want to do automated installs
of GBrowse and would like to use the installer script to get bioperl via
anon cvs.  If anon cvs is no longer available, does anyone have
suggestions for the best route to take for getting command line svn on
Windows?

Thanks,
Scott

-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   cain.cshl at gmail.com
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory


From cain.cshl at gmail.com  Thu Mar  6 11:48:08 2008
From: cain.cshl at gmail.com (Scott Cain)
Date: Thu, 06 Mar 2008 11:48:08 -0500
Subject: [Bioperl-l] anonymous cvs?
In-Reply-To: <1204821815.6689.7.camel@frissell>
References: <1204821815.6689.7.camel@frissell>
Message-ID: <1204822088.6689.8.camel@frissell>

I should have mentioned that I tried it and it is not currently working:

$ cvs -d :pserver:cvs at code.open-bio.org:/home/repository/bioperl checkout bioperl-live 
can't create temporary directory /tmp/cvs-serv32067
No space left on device


On Thu, 2008-03-06 at 11:43 -0500, Scott Cain wrote:
> Hi All,
> 
> So now that the transition to svn is complete (and I like it), should
> anonymous cvs still be working?  I believe there was discussion about
> keeping it going via mirroring, and I hope that is the case.  It will
> make life a little easier for people who want to do automated installs
> of GBrowse and would like to use the installer script to get bioperl via
> anon cvs.  If anon cvs is no longer available, does anyone have
> suggestions for the best route to take for getting command line svn on
> Windows?
> 
> Thanks,
> Scott
> 
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory


From Marc.Logghe at ablynx.com  Thu Mar  6 11:22:10 2008
From: Marc.Logghe at ablynx.com (Marc Logghe)
Date: Thu, 6 Mar 2008 17:22:10 +0100
Subject: [Bioperl-l] SeqIO
In-Reply-To: <C3F576A6.88C9%staffa@niehs.nih.gov>
Message-ID: <03C512635899144083CADB0EE22201890172A836@alpaca.lan.ablynx.com>

Hi Nick,
I don't think you should leave out the -format option. You have to leave
it in but the format should be provided by the B::T::GuessSeqFormat
object.
Something like:

#!/usr/bin/perl
use strict;
use Bio::SeqIO;
use Bio::Tools::GuessSeqFormat;

$| = 1;
my $number_of_files = @ARGV;
if(!$number_of_files){print "no files entered\n";exit;}
foreach my $file (@ARGV){
  my $guesser = Bio::Tools::GuessSeqFormat->new(-file => $file);
  my $seqio_object = Bio::SeqIO->new(-file   => $guesser->file,
                                     -format => $guesser->guess);
  my $seq_object = $seqio_object->next_seq;
  my $sequence = $seq_object->seq;
  print "$sequence\n";
}

HTH,
Marc




From stefan.kirov at bms.com  Thu Mar  6 10:51:25 2008
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Thu, 06 Mar 2008 10:51:25 -0500
Subject: [Bioperl-l] BioPerl Module to Parse Transfac Flat File Database
In-Reply-To: <47D010BD.4000801@sendu.me.uk>
References: <3521d3670803060016r35ac720ar9f2190631ddaf629@mail.gmail.com>
	<47CFEC89.1000705@sendu.me.uk>
	<264855a00803060540u1f3d0f92pbab13349595a0eb3@mail.gmail.com>
	<47D010BD.4000801@sendu.me.uk>
Message-ID: <47D012FD.7090600@bms.com>

Sendu Bala wrote:
> Sean Davis wrote:
>> On Thu, Mar 6, 2008 at 8:07 AM, Sendu Bala <bix at sendu.me.uk> wrote:
>>> Edward Wijaya wrote:
>>> > Dear experts,
>>> >
>>> > Is there any? The TRANSFAC text file which contain entry like this.
>>> > Especially we wich to capture the PWM for each of the Transcription
>>> > factor.
>>>
>>> Yes; I've written a module to do this, I just haven't committed it yet
>>> because certain things aren't quite right in terms of the API. But to
>>> just grab the PWM it should work fine. If you want I can email you the
>>> modules.
>>
>> I believe there are a set of non-bioperl modules called TFBS. See
>> here (although I'm not sure this is the most up-to-date site):
>>
>> http://tfbs.genereg.net/
>
> I believe it's out of date enough to not work on the latest Transfac
> data, though I haven't used tried to confirm.
>
> At any rate, the Transfac (Pro) database is pretty strange and
> complicated, and the TFBS modules certainly don't let you access
> everything in the way you might want or expect.
>
Also be careful: there is a difference between a PFM (position frequency
matrix, raw counts) and a PWM (position weight matrix, log-odds scores).
Most programs I have encountered will, when producing a PWM, assume a
uniform background (0.25 for each base) unless you specify your own.
You may be comfortable with that, but you should definitely be aware
of it.
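
A minimal plain-Perl sketch of why the background matters. It uses one
common log-odds formulation with a pseudocount; this is an assumption for
illustration, not the formula used by TFBS, Transfac, or any particular
program. The counts, the AT-rich background and the function name are all
made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Convert one PFM column (raw counts) into PWM log2-odds scores.
# $bg holds the background frequency per base; $pseudo is a pseudocount
# spread over the bases in proportion to the background, so that a
# count of zero does not lead to log(0).
sub pfm_column_to_pwm {
    my ($counts, $bg, $pseudo) = @_;
    my $total = 0;
    $total += $_ for values %$counts;
    my %score;
    for my $base (keys %$counts) {
        my $freq = ($counts->{$base} + $pseudo * $bg->{$base})
                 / ($total + $pseudo);
        $score{$base} = log($freq / $bg->{$base}) / log(2);
    }
    return \%score;
}

# Hypothetical column built from 10 sites, mostly A.
my %column  = (A => 8, C => 0, G => 1, T => 1);
my %uniform = (A => 0.25, C => 0.25, G => 0.25, T => 0.25);
my %at_rich = (A => 0.35, C => 0.15, G => 0.15, T => 0.35);

my $pwm_uniform = pfm_column_to_pwm(\%column, \%uniform, 1);
my $pwm_at_rich = pfm_column_to_pwm(\%column, \%at_rich, 1);

for my $base (sort keys %column) {
    printf "%s  uniform: %6.2f   AT-rich: %6.2f\n",
        $base, $pwm_uniform->{$base}, $pwm_at_rich->{$base};
}
```

The same counts give a lower score for A against the AT-rich background
than against the uniform one, which is the effect to watch out for.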

From jay at jays.net  Thu Mar  6 12:03:51 2008
From: jay at jays.net (Jay Hannah)
Date: Thu, 06 Mar 2008 11:03:51 -0600
Subject: [Bioperl-l] anonymous cvs?
In-Reply-To: <1204821815.6689.7.camel@frissell>
References: <1204821815.6689.7.camel@frissell>
Message-ID: <47D023F7.4000803@jays.net>

Scott Cain wrote:
> It will make life a little easier for people who want to do automated installs
> of GBrowse and would like to use the installer script to get bioperl via
> anon cvs.

Those installer scripts can't use anon SVN instead?

> If anon cvs is no longer available, does anyone have
> suggestions for the best route to take for getting command line svn on
> Windows?
>   

At $work our Windows guys use GUIs for both CVS (repo dead this summer) 
and SVN. Are there command-line (MS-DOS?) CVS clients for Windows? And 
there isn't an SVN equivalent?

j
http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah


From whs at ebi.ac.uk  Thu Mar  6 12:08:51 2008
From: whs at ebi.ac.uk (William Spooner)
Date: Thu, 6 Mar 2008 17:08:51 +0000
Subject: [Bioperl-l] anonymous cvs?
In-Reply-To: <1204821815.6689.7.camel@frissell>
References: <1204821815.6689.7.camel@frissell>
Message-ID: <07E3119E-0354-4E93-9980-3CB2B26DF2BE@ebi.ac.uk>

This will be important for Ensembl as well. As far as I know all of  
their install docs refer to BioPerl's anonymous CVS.

On 6 Mar 2008, at 16:43, Scott Cain wrote:

> Hi All,
>
> So now that the transition to svn is complete (and I like it), should
> anonymous cvs still be working?  I believe there was discussion about
> keeping it going via mirroring, and I hope that is the case.  It will
> make life a little easier for people who want to do automated installs
> of GBrowse and would like to use the installer script to get bioperl  
> via
> anon cvs.  If anon cvs is no longer available, does anyone have
> suggestions for the best route to take for getting command line svn on
> Windows?
>
> Thanks,
> Scott
>
> -- 
> ------------------------------------------------------------------------
> Scott Cain, Ph. D.                                   cain.cshl at gmail.com
> GMOD Coordinator (http://www.gmod.org/)                      
> 216-392-3087
> Cold Spring Harbor Laboratory
>
>

---
William Spooner
Visiting Scientist
whs at ebi.ac.uk


From MEC at stowers-institute.org  Thu Mar  6 11:58:57 2008
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Thu, 6 Mar 2008 10:58:57 -0600
Subject: [Bioperl-l] BioPerl Module to Parse Transfac Flat File Database
In-Reply-To: <47D010BD.4000801@sendu.me.uk>
References: <3521d3670803060016r35ac720ar9f2190631ddaf629@mail.gmail.com>
	<47CFEC89.1000705@sendu.me.uk>
	<264855a00803060540u1f3d0f92pbab13349595a0eb3@mail.gmail.com>
	<47D010BD.4000801@sendu.me.uk>
Message-ID: <BD62CBAC4395B94096109020651BE2EC11CE393C@exchmb-02.stowers-institute.org>

We use TFBS all the time against data coming from a recent local
install of TRANSFAC(r) Professional 11.1 (2007-03-31).

The most recent release is 11.4 (2007-12-14).

TFBS::* has the nice advantage that you can interoperate Transfac PWMs
with other (say, Jaspar) matrices and/or simple consensus sequence
patterns, and it COULD be fairly easily extended to allow
interoperation with other sources, say cisRED.  "One interface to rule
them all" - bwa ha ha.

However, if you DO have a locally installed Transfac (Pro) ($$) and
want to use just that, then you should know that you can also call
their `match` routines from the unix command line (though this is not
documented, to my knowledge).  I can supply my cheat sheet or otherwise
advise if desired.  Also, if you go this way, I've written the requisite
TFMatchOut2GFF to convert TRANSFAC match's output to GFF, which I could
release if asked.

If you decide to use TFBS::*, I have written a command-line wrapper
for the TFBS perl modules that might give you a leg up.  I could
release that too, if useful.

But I agree: if I recall, TFBS::* was dropped from ongoing active
development due to issues with data access policies.  And I think the
modules no longer work with remotely hosted Transfac.  They did a few
years ago; I think I tested a while ago and found that they no longer do.

Malcolm Cook
Stowers Institute for Medical Research - Kansas City, Missouri


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Sendu Bala
> Sent: Thursday, March 06, 2008 9:42 AM
> To: Sean Davis
> Cc: bioperl-l at lists.open-bio.org; Edward Wijaya
> Subject: Re: [Bioperl-l] BioPerl Module to Parse Transfac
> Flat File Database
>
> Sean Davis wrote:
> > On Thu, Mar 6, 2008 at 8:07 AM, Sendu Bala <bix at sendu.me.uk> wrote:
> >> Edward Wijaya wrote:
> >>  > Dear experts,
> >>  >
> >>  > Is there any? The TRANSFAC text file which contain
> entry like this.
> >>  > Especially we wich to capture the PWM for each of the
> >> Transcription  > factor.
> >>
> >>  Yes; I've written a module to do this, I just haven't
> committed it
> >> yet  because certain things aren't quite right in terms of
> the API.
> >> But to  just grab the PWM it should work fine. If you want I can
> >> email you the  modules.
> >
> > I believe there are a set of non-bioperl modules called TFBS.  See
> > here (although I'm not sure this is the most up-to-date site):
> >
> > http://tfbs.genereg.net/
>
> I believe it's out of date enough to not work on the latest
> Transfac data, though I haven't used tried to confirm.
>
> At any rate, the Transfac (Pro) database is pretty strange
> and complicated, and the TFBS modules certainly don't let you
> access everything in the way you might want or expect.
>


From cjfields at uiuc.edu  Thu Mar  6 12:10:35 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 6 Mar 2008 11:10:35 -0600
Subject: [Bioperl-l] anonymous cvs?
In-Reply-To: <1204821815.6689.7.camel@frissell>
References: <1204821815.6689.7.camel@frissell>
Message-ID: <84E60454-2B09-4F77-9BD6-4B9150304B2D@uiuc.edu>

BioPerl CVS is no longer being updated; you have to use Subversion to  
grab the latest (we have anon. svn set up for this).  We discussed  
syncing svn commits over to cvs but found it way too problematic and  
decided to make a clean break.

The best option I can think of as a replacement (so everyone isn't
dependent on installing svn to get GBrowse and bioperl-live) is to set
up a cron job which drops a bioperl-live archive into bioperl.org/DIST
or bioperl.org/SRC.  We have already talked about doing this for
nightly builds from svn main trunk; we can probably set that up on our
end.  Would that be feasible as a fallback in case svn isn't present?

The subversion project page has information on Windows versions:

http://subversion.tigris.org/project_packages.html

chris

On Mar 6, 2008, at 10:43 AM, Scott Cain wrote:

> Hi All,
>
> So now that the transition to svn is complete (and I like it), should
> anonymous cvs still be working?  I believe there was discussion about
> keeping it going via mirroring, and I hope that is the case.  It will
> make life a little easier for people who want to do automated installs
> of GBrowse and would like to use the installer script to get bioperl  
> via
> anon cvs.  If anon cvs is no longer available, does anyone have
> suggestions for the best route to take for getting command line svn on
> Windows?
>
> Thanks,
> Scott
>
> -- 
> ------------------------------------------------------------------------
> Scott Cain, Ph. D.                                   cain.cshl at gmail.com
> GMOD Coordinator (http://www.gmod.org/)                      
> 216-392-3087
> Cold Spring Harbor Laboratory
>
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cain.cshl at gmail.com  Thu Mar  6 12:22:29 2008
From: cain.cshl at gmail.com (Scott Cain)
Date: Thu, 06 Mar 2008 12:22:29 -0500
Subject: [Bioperl-l] anonymous cvs?
In-Reply-To: <84E60454-2B09-4F77-9BD6-4B9150304B2D@uiuc.edu>
References: <1204821815.6689.7.camel@frissell>
	<84E60454-2B09-4F77-9BD6-4B9150304B2D@uiuc.edu>
Message-ID: <1204824149.6689.14.camel@frissell>

Hi Chris,

I think a nightly generated tarball would be sufficient for my use.  We
used anon cvs to get the latest bioperl and then threw it away once it
was installed, so a tarball is just as good, if not better, since users
wouldn't need to install svn.  Not needing to install svn is a good thing
for all my users, since I think many distributions do not supply it by
default.

Thanks,
Scott


On Thu, 2008-03-06 at 11:10 -0600, Chris Fields wrote:
> BioPerl CVS is no longer being updated; you have to use Subversion to  
> grab the latest (we have anon. svn set up for this).  We discussed  
> syncing svn commits over to cvs but found it way too problematic and  
> decided to make a clean break.
> 
> The best option I can think of as a replacement (so everyone isn't  
> dependent on installing svn to get Gbrowse and bioperl-live) is to get  
> a cron job set up which drops a bioperl-live archive into bioperl.org/ 
> DIST or bioperl.org/SRC.  We have already talked about doing this for  
> nightly builds from svn main trunk; we can probably set that up on our  
> end.  Would that be feasible as a fallback in case svn isn't present?
> 
> The subversion project page has information on Windows versions:
> 
> http://subversion.tigris.org/project_packages.html
> 
> chris
> 
> On Mar 6, 2008, at 10:43 AM, Scott Cain wrote:
> 
> > Hi All,
> >
> > So now that the transition to svn is complete (and I like it), should
> > anonymous cvs still be working?  I believe there was discussion about
> > keeping it going via mirroring, and I hope that is the case.  It will
> > make life a little easier for people who want to do automated installs
> > of GBrowse and would like to use the installer script to get bioperl  
> > via
> > anon cvs.  If anon cvs is no longer available, does anyone have
> > suggestions for the best route to take for getting command line svn on
> > Windows?
> >
> > Thanks,
> > Scott
> >
> > -- 
> > ------------------------------------------------------------------------
> > Scott Cain, Ph. D.                                   cain.cshl at gmail.com
> > GMOD Coordinator (http://www.gmod.org/)                      
> > 216-392-3087
> > Cold Spring Harbor Laboratory
> >
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
> 
> 
> 
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   cain.cshl at gmail.com
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory


From cain.cshl at gmail.com  Thu Mar  6 12:28:13 2008
From: cain.cshl at gmail.com (Scott Cain)
Date: Thu, 06 Mar 2008 12:28:13 -0500
Subject: [Bioperl-l] anonymous cvs?
In-Reply-To: <47D023F7.4000803@jays.net>
References: <1204821815.6689.7.camel@frissell>  <47D023F7.4000803@jays.net>
Message-ID: <1204824493.6689.19.camel@frissell>

Hi Jay,

It could use anon svn, though svn is considerably less ubiquitous, so it
effectively adds another prerequisite.

For cvs, the GUI WinCVS provides command line cvs as well.  I was
wondering if there was an easy to install equivalent for svn, though it
may be moot for me if the powers that be will provide a nightly
tarball :-)

Scott

On Thu, 2008-03-06 at 11:03 -0600, Jay Hannah wrote:
> Scott Cain wrote:
> > It will make life a little easier for people who want to do automated installs
> > of GBrowse and would like to use the installer script to get bioperl via
> > anon cvs.
> 
> Those installer scripts can't use anon SVN instead?
> 
> > If anon cvs is no longer available, does anyone have
> > suggestions for the best route to take for getting command line svn on
> > Windows?
> >   
> 
> At $work our Windows guys use GUIs for both CVS (repo dead this summer) 
> and SVN. Are there command-line (MS-DOS?) CVS clients for Windows? And 
> there isn't an SVN equivalent?
> 
> j
> http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory


From cjfields at uiuc.edu  Thu Mar  6 12:28:36 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 6 Mar 2008 11:28:36 -0600
Subject: [Bioperl-l] anonymous cvs?
In-Reply-To: <1204824149.6689.14.camel@frissell>
References: <1204821815.6689.7.camel@frissell>
	<84E60454-2B09-4F77-9BD6-4B9150304B2D@uiuc.edu>
	<1204824149.6689.14.camel@frissell>
Message-ID: <ADC4C24A-76D7-4390-9794-F1D218CAD8B9@uiuc.edu>

I'm working on the nightly build script now and will post back when  
everything is set up.

chris

On Mar 6, 2008, at 11:22 AM, Scott Cain wrote:

> Hi Chris,
>
> I think a nightly generated tarball would be sufficient for my use.   
> We
> used anon cvs to get the lastest bioperl and then threw it away once  
> it
> was installed, so a tarball is just as good,if not better, since users
> wouldn't need to install svn.  Not needing to install svn is good  
> thing
> for all my users, since I think many distributions do not supply it by
> default.
>
> Thanks,
> Scott
>
>
>
> On Thu, 2008-03-06 at 11:10 -0600, Chris Fields wrote:
>> BioPerl CVS is no longer being updated; you have to use Subversion to
>> grab the latest (we have anon. svn set up for this).  We discussed
>> syncing svn commits over to cvs but found it way too problematic and
>> decided to make a clean break.
>>
>> The best option I can think of as a replacement (so everyone isn't
>> dependent on installing svn to get Gbrowse and bioperl-live) is to  
>> get
>> a cron job set up which drops a bioperl-live archive into  
>> bioperl.org/
>> DIST or bioperl.org/SRC.  We have already talked about doing this for
>> nightly builds from svn main trunk; we can probably set that up on  
>> our
>> end.  Would that be feasible as a fallback in case svn isn't present?
>>
>> The subversion project page has information on Windows versions:
>>
>> http://subversion.tigris.org/project_packages.html
>>
>> chris
>>
>> On Mar 6, 2008, at 10:43 AM, Scott Cain wrote:
>>
>>> Hi All,
>>>
>>> So now that the transition to svn is complete (and I like it),  
>>> should
>>> anonymous cvs still be working?  I believe there was discussion  
>>> about
>>> keeping it going via mirroring, and I hope that is the case.  It  
>>> will
>>> make life a little easier for people who want to do automated  
>>> installs
>>> of GBrowse and would like to use the installer script to get bioperl
>>> via
>>> anon cvs.  If anon cvs is no longer available, does anyone have
>>> suggestions for the best route to take for getting command line  
>>> svn on
>>> Windows?
>>>
>>> Thanks,
>>> Scott
>>>
>>> -- 
>>> ------------------------------------------------------------------------
>>> Scott Cain, Ph. D.                                   cain.cshl at gmail.com
>>> GMOD Coordinator (http://www.gmod.org/)
>>> 216-392-3087
>>> Cold Spring Harbor Laboratory
>>>
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>> Christopher Fields
>> Postdoctoral Researcher
>> Lab of Dr. Robert Switzer
>> Dept of Biochemistry
>> University of Illinois Urbana-Champaign
>>
>>
>>
> -- 
> ------------------------------------------------------------------------
> Scott Cain, Ph. D.                                   cain.cshl at gmail.com
> GMOD Coordinator (http://www.gmod.org/)                      
> 216-392-3087
> Cold Spring Harbor Laboratory
>
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Thu Mar  6 15:38:22 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 6 Mar 2008 14:38:22 -0600
Subject: [Bioperl-l] anonymous cvs?
In-Reply-To: <ADC4C24A-76D7-4390-9794-F1D218CAD8B9@uiuc.edu>
References: <1204821815.6689.7.camel@frissell>
	<84E60454-2B09-4F77-9BD6-4B9150304B2D@uiuc.edu>
	<1204824149.6689.14.camel@frissell>
	<ADC4C24A-76D7-4390-9794-F1D218CAD8B9@uiuc.edu>
Message-ID: <2F746C5B-902C-4510-AEA3-2C46D4F51E7A@uiuc.edu>

Okay, I have set up nightly builds for bioperl-live, db, network, and  
run here:

http://www.bioperl.org/DIST/nightly_builds/
ftp://ftp.open-bio.org/pub/bioperl/DIST/nightly_builds

At the moment this is running via a crontab off a script in my portal  
account, retrieving everything via anon. svn and bundling it up into  
zip and tarball archives.  I would like to set it up to grab  
everything off dev but I don't want to mess with my ssh setup, so if  
anyone has ideas there...

The script also adds a CHANGELOG file (last 10 commits) and removes  
the .svn directories prior to bundling.  The archive name has the  
subversion revision number and date included; md5 checksums are in the  
SIGNATURES file.  I'll check on it again tomorrow to make sure cron  
ran it.
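
The packaging half of such a nightly job could be sketched in shell
roughly as follows.  This is only an illustration, not the script that
is actually running: the function name bundle_export, the way the
archive name is passed in, and the layout of SIGNATURES (plain md5sum
output, one line per archive) are all assumptions.  It strips the .svn
directories from a checked-out tree, writes a tarball, and appends an
md5 checksum.

```shell
#!/bin/sh
# Hypothetical helper, not the real nightly-build script.
# Usage: bundle_export SRC_DIR ARCHIVE_BASENAME OUT_DIR
bundle_export() {
    be_src=$1
    be_name=$2
    be_out=$3

    # Remove Subversion metadata directories without descending into them.
    find "$be_src" -type d -name .svn -prune -exec rm -rf {} + &&
    mkdir -p "$be_out" &&
    # Archive the tree under its own directory name.
    tar -czf "$be_out/$be_name.tar.gz" \
        -C "$(dirname "$be_src")" "$(basename "$be_src")" &&
    # Record the checksum (assumes GNU md5sum; SIGNATURES format is a guess).
    ( cd "$be_out" && md5sum "$be_name.tar.gz" >> SIGNATURES )
}
```

A caller might run something like
bundle_export ./bioperl-live "bioperl-live-r$rev-$(date +%Y%m%d)" ./nightly_builds
where $rev would come from the working copy (svnversion prints it) and
a CHANGELOG could be written beforehand with svn log --limit 10.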

We can probably set up automated PPM builds as well; might be worth  
testing down the road (we need a way to set defaults for Build args  
prior to getting that running).

chris

On Mar 6, 2008, at 11:28 AM, Chris Fields wrote:

> I'm working on the nightly build script now and will post back when  
> everything is set up.
>
> chris
>
> On Mar 6, 2008, at 11:22 AM, Scott Cain wrote:
>
>> Hi Chris,
>>
>> I think a nightly generated tarball would be sufficient for my  
>> use.  We
>> used anon cvs to get the lastest bioperl and then threw it away  
>> once it
>> was installed, so a tarball is just as good,if not better, since  
>> users
>> wouldn't need to install svn.  Not needing to install svn is good  
>> thing
>> for all my users, since I think many distributions do not supply it  
>> by
>> default.
>>
>> Thanks,
>> Scott
>>
>>
>>
>> On Thu, 2008-03-06 at 11:10 -0600, Chris Fields wrote:
>>> BioPerl CVS is no longer being updated; you have to use Subversion  
>>> to
>>> grab the latest (we have anon. svn set up for this).  We discussed
>>> syncing svn commits over to cvs but found it way too problematic and
>>> decided to make a clean break.
>>>
>>> The best option I can think of as a replacement (so everyone isn't
>>> dependent on installing svn to get Gbrowse and bioperl-live) is to  
>>> get
>>> a cron job set up which drops a bioperl-live archive into  
>>> bioperl.org/
>>> DIST or bioperl.org/SRC.  We have already talked about doing this  
>>> for
>>> nightly builds from svn main trunk; we can probably set that up on  
>>> our
>>> end.  Would that be feasible as a fallback in case svn isn't  
>>> present?
>>>
>>> The subversion project page has information on Windows versions:
>>>
>>> http://subversion.tigris.org/project_packages.html
>>>
>>> chris
>>>
>>> On Mar 6, 2008, at 10:43 AM, Scott Cain wrote:
>>>
>>>> Hi All,
>>>>
>>>> So now that the transition to svn is complete (and I like it),  
>>>> should
>>>> anonymous cvs still be working?  I believe there was discussion  
>>>> about
>>>> keeping it going via mirroring, and I hope that is the case.  It  
>>>> will
>>>> make life a little easier for people who want to do automated  
>>>> installs
>>>> of GBrowse and would like to use the installer script to get  
>>>> bioperl
>>>> via
>>>> anon cvs.  If anon cvs is no longer available, does anyone have
>>>> suggestions for the best route to take for getting command line  
>>>> svn on
>>>> Windows?
>>>>
>>>> Thanks,
>>>> Scott
>>>>
>>>> -- 
>>>> ------------------------------------------------------------------------
>>>> Scott Cain, Ph. D.                                   cain.cshl at gmail.com
>>>> GMOD Coordinator (http://www.gmod.org/)
>>>> 216-392-3087
>>>> Cold Spring Harbor Laboratory
>>>>
>>>>
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>> Christopher Fields
>>> Postdoctoral Researcher
>>> Lab of Dr. Robert Switzer
>>> Dept of Biochemistry
>>> University of Illinois Urbana-Champaign
>>>
>>>
>>>
>> -- 
>> ------------------------------------------------------------------------
>> Scott Cain, Ph. D.                                   cain.cshl at gmail.com
>> GMOD Coordinator (http://www.gmod.org/)                      
>> 216-392-3087
>> Cold Spring Harbor Laboratory
>>
>>
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Thu Mar  6 16:48:37 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 6 Mar 2008 15:48:37 -0600
Subject: [Bioperl-l] Nightly build archives now available
Message-ID: <BAB6978A-D50E-4C4A-9FC9-EAAD557B935F@uiuc.edu>

We now have nightly bundled archives for bioperl-live, bioperl-db,  
bioperl-run, and bioperl-network running; these will be updated ~ 1:00  
am every night.

http://www.bioperl.org/DIST/nightly_builds/
ftp://ftp.open-bio.org/pub/bioperl/DIST/nightly_builds

The archives are date-stamped and also have the Subversion revision,  
just in case one wanted to ensure they get the correct version for the  
bug fix.  They also contain a CHANGELOG file for the last 10 revisions  
(if there are any).  These are currently derived off the anon. svn  
repository.

chris

From David.Messina at sbc.su.se  Thu Mar  6 18:50:04 2008
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 7 Mar 2008 00:50:04 +0100
Subject: [Bioperl-l] Nightly build archives now available
In-Reply-To: <BAB6978A-D50E-4C4A-9FC9-EAAD557B935F@uiuc.edu>
References: <BAB6978A-D50E-4C4A-9FC9-EAAD557B935F@uiuc.edu>
Message-ID: <628aabb70803061550s24d7d8cfhf80495ea970a6c19@mail.gmail.com>

Very slick and well-thought-out, Chris -- nice job!


Dave

From hlapp at gmx.net  Thu Mar  6 19:06:41 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 6 Mar 2008 19:06:41 -0500
Subject: [Bioperl-l] Nightly build archives now available
In-Reply-To: <BAB6978A-D50E-4C4A-9FC9-EAAD557B935F@uiuc.edu>
References: <BAB6978A-D50E-4C4A-9FC9-EAAD557B935F@uiuc.edu>
Message-ID: <E39ADEB2-1FE7-4024-8188-EFF98F17BF52@gmx.net>

Awesome - thanks for doing this, Chris! -hilmar

On Mar 6, 2008, at 4:48 PM, Chris Fields wrote:

> We now have nightly bundled archives for bioperl-live, bioperl-db,  
> bioperl-run, and bioperl-network running; these will be updated ~  
> 1:00 am every night.
>
> http://www.bioperl.org/DIST/nightly_builds/
> ftp://ftp.open-bio.org/pub/bioperl/DIST/nightly_builds
>
> The archives are date-stamped and also have the Subversion  
> revision, just in case one wanted to ensure they get the correct  
> version for the bug fix.  They also contain a CHANGELOG file for  
> the last 10 revisions (if there are any).  These are currently  
> derived off the anon. svn repository.
>
> chris
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From staffa at niehs.nih.gov  Thu Mar  6 18:27:31 2008
From: staffa at niehs.nih.gov (Staffa, Nick (NIH/NIEHS))
Date: Thu, 06 Mar 2008 18:27:31 -0500
Subject: [Bioperl-l] SeqIO
In-Reply-To: <03C512635899144083CADB0EE22201890172A836@alpaca.lan.ablynx.com>
Message-ID: <C3F5E813.8903%staffa@niehs.nih.gov>

Thanks.
I really appreciate all the interest given and help generated.
That sure sounds like a great idea, but I think
Bio::Tools::GuessSeqFormat needs more RIGOR before it declares itself.
Is there a substitute?
It works great with
>> !!NA_SEQUENCE 1.0
>>    NewDNA  Length: 810  March 5, 2008 18:26  Type: N  Check: 3368  ..
>> 
>>        1  TGTTCGAATT CCGTGCGGTC CACCTCCCCT AGGAGCTCAG TGGGCTGGTT
>> et c.

as seen in:
gir.niehs.nih.gov> CGwindows.pl TestDNA.seq.org | more
guesser guesses gcg
TGTTCGAATTCCGTGCGGTCCACCTCCCCTAGGAGCTCAGTGGGCTGGTTGGATTCCGTGCCATCCCGGCAGGGCA
GAGCCTCGGGA  et c.
(yes, I added
my $file_type = $guesser->guess;
print "guesser guesses $file_type\n";
)

BUT
when applied to a genbank sequence passed thru the Seqlab editor and turned
into GCG, to wit:
!!NA_SEQUENCE 1.0
LOCUS       HSPGK2G      1911 bp    DNA             PRI       12-SEP-1993
DEFINITION  Human testis-specific PGK-2 gene for phosphoglycerate kinase
            (ATP:3-phospho-D-glycerate 1-phosphotransferase, EC 2.7.2.3).
ACCESSION   X05246 Y00261
...
...
BASE COUNT      583 a    367 c    442 g    519 t
ORIGIN

 HSPGK2G  Length: 1911  August 24, 1998 10:56  Type: N  Check: 4156  ..

       1  GCCCCTCAAC AGCAAGTTGG TTCTTCAGCA TTAAGATCCA GGTGTCAGCC
et c.

It thinks it is a flawed PIR:

gir.niehs.nih.gov> CGwindows.pl hspgk2g.seq | more
guesser guesses pir

------------- EXCEPTION  -------------
MSG: PIR stream read attempted without leading '>P1;' [ !!NA_SEQUENCE 1.0
LOCUS       HSPGK2G      1911 bp    DNA             PRI       12-SEP-1993


Must look at why guesser is thinking PIR.


On 3/6/08 11:22 AM, "Marc Logghe" <Marc.Logghe at ablynx.com> wrote:

> Hi Nick,
> I don't think you should leave out the -format option. You have to leave
> it in but the format should be provided by the B::T::GuessSeqFormat
> object.
> Something like:
> 
> #!/usr/bin/perl
> use strict;
> use Bio::SeqIO;
> use Bio::Tools::GuessSeqFormat;
> 
> $| = 1;
> my $number_of_files = @ARGV;
> if(!$number_of_files){print "no files entered\n";exit;}
> foreach my $file (@ARGV){
>   my $guesser = Bio::Tools::GuessSeqFormat->new(-file => $file);
>   my $seqio_object = Bio::SeqIO->new(-file => $guesser->file, -format =>
> $guesser->guess);
>   my $seq_object = $seqio_object->next_seq;
>   my $sequence = $seq_object->seq;
>   print "$sequence\n";
> }
> 
> HTH,
> Marc
> 
> 
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>> bounces at lists.open-bio.org] On Behalf Of Staffa, Nick (NIH/NIEHS)
>> Sent: donderdag 6 maart 2008 16:24
>> To: Heikki Lehvaslaiho; bioperl-l at lists.open-bio.org
>> Cc: Chris Fields
>> Subject: Re: [Bioperl-l] SeqIO
>> 
>> Here's the scoop:
>> When I use Jason's suggestion, (-format => 'gcg'),
>> My program works without complaint on the original file that looks like:
>> !!NA_SEQUENCE 1.0
>>    NewDNA  Length: 810  March 5, 2008 18:26  Type: N  Check: 3368  ..
>> 
>>        1  TGTTCGAATT CCGTGCGGTC CACCTCCCCT AGGAGCTCAG TGGGCTGGTT
>> et c.
>> 
>> BUT if I remove the first line to test Bio::Tools::GuessSeqFormat,
>> (which should be retro-gcg format (before version 11?)),
>> my program runs, but there IS a complaint:
>> Use of uninitialized value in scalar chomp at
>> /usr/lib/perl5/site_perl/5.8.5/Bio/SeqIO/gcg.pm line 118, <GEN0> line 1.
>> BUT
>> If I remove (-format => 'gcg'),  I get no complaint, but the sequence
>> returned still has its numbers imbedded. This effects my calculations.
>> 
>> Thanks, at least i know what my options are.
>> 
>> 
>> 
>> Nick Staffa
>> Telephone: 919-316-4569  (NIEHS: 6-4569)
>> Scientific Computing Support Group
>> NIEHS Information Technology Support Services Contract
>> (Science Task Monitor: Roy W. Reter (reter at niehs.nih.gov)
>> National Institute of Environmental Health Sciences
>> National Institutes of Health
>> Research Triangle Park, North Carolina
>


From cjfields at uiuc.edu  Thu Mar  6 23:32:39 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 6 Mar 2008 22:32:39 -0600
Subject: [Bioperl-l] Nightly build archives now available
In-Reply-To: <628aabb70803061550s24d7d8cfhf80495ea970a6c19@mail.gmail.com>
References: <BAB6978A-D50E-4C4A-9FC9-EAAD557B935F@uiuc.edu>
	<628aabb70803061550s24d7d8cfhf80495ea970a6c19@mail.gmail.com>
Message-ID: <5A67E3A9-9997-4A6B-AB07-8403D5FF388E@uiuc.edu>

I would like to get automated PPM builds set up as well but I think we  
have to rework some Build.PL stuff to get that going.

The next thing is to set up a regular script to check test/POD coverage.

chris

On Mar 6, 2008, at 5:50 PM, Dave Messina wrote:

> Very slick and well-thought-out, Chris -- nice job!
>
>
> Dave

From Marc.Logghe at ablynx.com  Fri Mar  7 04:04:35 2008
From: Marc.Logghe at ablynx.com (Marc Logghe)
Date: Fri, 7 Mar 2008 10:04:35 +0100
Subject: [Bioperl-l] SeqIO
In-Reply-To: <C3F5E813.8903%staffa@niehs.nih.gov>
Message-ID: <03C512635899144083CADB0EE22201890172A938@alpaca.lan.ablynx.com>

Ahh, my reply did not make much sense when I took a new look. I was the
one who learnt something here :-)
Did not know that Bio::SeqIO was already using B::T::GuessSeqFormat
under the hood. Learnt as well that you have to be careful with the
filename extension because this seems to have precedence.
Regards,
Marc


> -----Original Message-----
> From: Staffa, Nick (NIH/NIEHS) [mailto:staffa at niehs.nih.gov]
> Sent: vrijdag 7 maart 2008 0:28
> To: Marc Logghe; Heikki Lehvaslaiho; bioperl-l at lists.open-bio.org
> Cc: Chris Fields
> Subject: Re: [Bioperl-l] SeqIO
> 
> Thanks
> I really appreciate all the interest given and help generated.
> that sure sounds like a great idea, but i think
> Bio::Tools::GuessSeqFormat needs more RIGOR before it declares itself.
> Is there a substitute?
> It works great with
> >> !!NA_SEQUENCE 1.0
> >>    NewDNA  Length: 810  March 5, 2008 18:26  Type: N  Check: 3368  ..
> >>
> >>        1  TGTTCGAATT CCGTGCGGTC CACCTCCCCT AGGAGCTCAG TGGGCTGGTT
> >> et c.
> 
> as seen in:
> gir.niehs.nih.gov> CGwindows.pl TestDNA.seq.org | more
> guesser guesses gcg
> TGTTCGAATTCCGTGCGGTCCACCTCCCCTAGGAGCTCAGTGGGCTGGTTGGATTCCGTGCCATCCCGGCAGGGCA
> GAGCCTCGGGA  et c.
> (yes, I added
> my $file_type = $guesser->guess;
> print "guesser guesses $file_type\n";
> )
> 
> BUT
> when applied to a genbank sequence passed thru the Seqlab editor and
> turned
> into GCG, to wit:
> !!NA_SEQUENCE 1.0
> LOCUS       HSPGK2G      1911 bp    DNA             PRI       12-SEP-1993
> DEFINITION  Human testis-specific PGK-2 gene for phosphoglycerate kinase
>             (ATP:3-phospho-D-glycerate 1-phosphotransferase, EC 2.7.2.3).
> ACCESSION   X05246 Y00261
> ...
> ...
> BASE COUNT      583 a    367 c    442 g    519 t
> ORIGIN
> 
>  HSPGK2G  Length: 1911  August 24, 1998 10:56  Type: N  Check: 4156  ..
> 
>        1  GCCCCTCAAC AGCAAGTTGG TTCTTCAGCA TTAAGATCCA GGTGTCAGCC
> et c.
> 
> It thinks it is a flawed PIR:
> 
> gir.niehs.nih.gov> CGwindows.pl hspgk2g.seq | more
> guesser guesses pir
> 
> ------------- EXCEPTION  -------------
> MSG: PIR stream read attempted without leading '>P1;' [ !!NA_SEQUENCE 1.0
> LOCUS       HSPGK2G      1911 bp    DNA             PRI       12-SEP-1993
> 
> 
> Must look at why guesser is thinking PIR.
> 
> 
> 
> 
> On 3/6/08 11:22 AM, "Marc Logghe" <Marc.Logghe at ablynx.com> wrote:
> 
> > Hi Nick,
> > I don't think you should leave out the -format option. You have to leave
> > it in but the format should be provided by the B::T::GuessSeqFormat
> > object.
> > Something like:
> >
> > #!/usr/bin/perl
> > use strict;
> > use Bio::SeqIO;
> > use Bio::Tools::GuessSeqFormat;
> >
> > $| = 1;
> > my $number_of_files = @ARGV;
> > if(!$number_of_files){print "no files entered\n";exit;}
> > foreach my $file (@ARGV){
> >   my $guesser = Bio::Tools::GuessSeqFormat->new(-file => $file);
> >   my $seqio_object = Bio::SeqIO->new(-file => $guesser->file, -format =>
> > $guesser->guess);
> >   my $seq_object = $seqio_object->next_seq;
> >   my $sequence = $seq_object->seq;
> >   print "$sequence\n";
> > }
> >
> > HTH,
> > Marc
> >
> >
> >> -----Original Message-----
> >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> >> bounces at lists.open-bio.org] On Behalf Of Staffa, Nick (NIH/NIEHS)
> >> Sent: donderdag 6 maart 2008 16:24
> >> To: Heikki Lehvaslaiho; bioperl-l at lists.open-bio.org
> >> Cc: Chris Fields
> >> Subject: Re: [Bioperl-l] SeqIO
> >>
> >> Here's the scoop:
> >> When I use Jason's suggestion, (-format => 'gcg'),
> >> My program works without complaint on the original file that looks like:
> >> !!NA_SEQUENCE 1.0
> >>    NewDNA  Length: 810  March 5, 2008 18:26  Type: N  Check: 3368  ..
> >>
> >>        1  TGTTCGAATT CCGTGCGGTC CACCTCCCCT AGGAGCTCAG TGGGCTGGTT
> >> et c.
> >>
> >> BUT if I remove the first line to test Bio::Tools::GuessSeqFormat,
> >> (which should be retro-gcg format (before version 11?)),
> >> my program runs, but there IS a complaint:
> >> Use of uninitialized value in scalar chomp at
> >> /usr/lib/perl5/site_perl/5.8.5/Bio/SeqIO/gcg.pm line 118, <GEN0> line 1.
> >> BUT
> >> If I remove (-format => 'gcg'),  I get no complaint, but the sequence
> >> returned still has its numbers imbedded. This effects my calculations.
> >>
> >> Thanks, at least i know what my options are.
> >>
> >>
> >>
> >> Nick Staffa
> >> Telephone: 919-316-4569  (NIEHS: 6-4569)
> >> Scientific Computing Support Group
> >> NIEHS Information Technology Support Services Contract
> >> (Science Task Monitor: Roy W. Reter (reter at niehs.nih.gov)
> >> National Institute of Environmental Health Sciences
> >> National Institutes of Health
> >> Research Triangle Park, North Carolina
> >


From bix at sendu.me.uk  Fri Mar  7 05:32:01 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 07 Mar 2008 10:32:01 +0000
Subject: [Bioperl-l] Nightly build archives now available
In-Reply-To: <5A67E3A9-9997-4A6B-AB07-8403D5FF388E@uiuc.edu>
References: <BAB6978A-D50E-4C4A-9FC9-EAAD557B935F@uiuc.edu>	<628aabb70803061550s24d7d8cfhf80495ea970a6c19@mail.gmail.com>
	<5A67E3A9-9997-4A6B-AB07-8403D5FF388E@uiuc.edu>
Message-ID: <47D119A1.10408@sendu.me.uk>

Chris Fields wrote:
> I would like to get automated PPM builds set up as well but I think we 
> have to rework some Build.PL stuff to get that going.

What's the hold-up on that front?

From heikki at sanbi.ac.za  Fri Mar  7 06:09:25 2008
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Fri, 7 Mar 2008 13:09:25 +0200
Subject: [Bioperl-l] BioSQL V1.0.0 released
Message-ID: <200803071309.25294.heikki@sanbi.ac.za>

BIOSQL V1.0.0 RELEASED
http://news.open-bio.org/archives/2008_03.html#000094


Congratulations, Hilmar!

		-Heikki

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Senior Scientist    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________

From cjfields at uiuc.edu  Fri Mar  7 08:53:50 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 7 Mar 2008 07:53:50 -0600
Subject: [Bioperl-l] Nightly build archives now available
In-Reply-To: <47D119A1.10408@sendu.me.uk>
References: <BAB6978A-D50E-4C4A-9FC9-EAAD557B935F@uiuc.edu>	<628aabb70803061550s24d7d8cfhf80495ea970a6c19@mail.gmail.com>
	<5A67E3A9-9997-4A6B-AB07-8403D5FF388E@uiuc.edu>
	<47D119A1.10408@sendu.me.uk>
Message-ID: <C71F44BE-CB80-4DED-9F18-8CE80836CFD6@uiuc.edu>

I haven't tried it out yet, to tell the truth.  The worry I have is  
prompting during the build process for database tests, networking, etc.

I have looked for it, but couldn't determine whether we have a way to  
run 'perl Build.PL' and bypass prompts with passed arguments.  The  
only one I could find was 'network', for network tests.

Scott Cain and I have corresponded about this before, i.e. it would be  
nice to have boolean flags for each prompt (prereqs, database tests,  
scripts, network, etc).  For nightly PPMs I would forgo tests and  
include scripts.

chris

On Mar 7, 2008, at 4:32 AM, Sendu Bala wrote:

> Chris Fields wrote:
>> I would like to get automated PPM builds set up as well but I think  
>> we have to rework some Build.PL stuff to get that going.
>
> What's the hold-up on that front?
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From cjfields at uiuc.edu  Fri Mar  7 08:22:27 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 7 Mar 2008 07:22:27 -0600
Subject: [Bioperl-l] BioSQL V1.0.0 released
In-Reply-To: <200803071309.25294.heikki@sanbi.ac.za>
References: <200803071309.25294.heikki@sanbi.ac.za>
Message-ID: <7558F8C6-FE40-4BAE-BA6A-D5039B10F350@uiuc.edu>

Same here.  Great news!

chris

On Mar 7, 2008, at 5:09 AM, Heikki Lehvaslaiho wrote:

> BIOSQL V1.0.0 RELEASED
> http://news.open-bio.org/archives/2008_03.html#000094
>
>
> Congratulations, Hilmar!
>
> 		-Heikki
>
> -- 
> ______ _/      _/_____________________________________________________
>      _/      _/
>     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
>    _/_/_/_/_/  Senior Scientist    skype: heikki_lehvaslaiho
>   _/  _/  _/  SANBI, South African National Bioinformatics Institute
>  _/  _/  _/  University of Western Cape, South Africa
>     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
> ___ _/_/_/_/_/________________________________________________________
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From bix at sendu.me.uk  Fri Mar  7 09:10:08 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 07 Mar 2008 14:10:08 +0000
Subject: [Bioperl-l] Nightly build archives now available
In-Reply-To: <C71F44BE-CB80-4DED-9F18-8CE80836CFD6@uiuc.edu>
References: <BAB6978A-D50E-4C4A-9FC9-EAAD557B935F@uiuc.edu>	<628aabb70803061550s24d7d8cfhf80495ea970a6c19@mail.gmail.com>
	<5A67E3A9-9997-4A6B-AB07-8403D5FF388E@uiuc.edu>
	<47D119A1.10408@sendu.me.uk>
	<C71F44BE-CB80-4DED-9F18-8CE80836CFD6@uiuc.edu>
Message-ID: <47D14CC0.8000104@sendu.me.uk>

Chris Fields wrote:
> I haven't tried it out yet, to tell the truth.  The worry I have is 
> prompting during the build process for database tests, networking, etc.
> 
> I have looked for it, but couldn't determine whether we have a way to 
> run 'perl Build.PL' and bypass prompts with passed arguments.  The only 
> one I could find was 'network', for network tests.
> 
> Scott Cain and I have corresponded about this before, i.e. it would be 
> nice to have boolean flags for each prompt (prereqs, database tests, 
> scripts, network, etc).  For nightly PPMs I would forego tests and 
> include scripts.

I don't quite understand how you're making the nightlys right now, but 
you should be using the dist actions:

http://www.bioperl.org/wiki/Making_a_BioPerl_release

Ie.

One time (and one time only):
perl Build.PL (it doesn't matter how you answer the questions)

Then every night:
./Build dist
./Build ppmdist

You then upload the resulting .tar.gz and .zip files.


Only if Build.PL or ModuleBuildBioperl are updated might you need to:
./Build realclean
perl Build.PL
again. But this should be a rare event and even more rarely would it be 
/required/ (probably never).

From bix at sendu.me.uk  Fri Mar  7 09:19:36 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 07 Mar 2008 14:19:36 +0000
Subject: [Bioperl-l] Nightly build archives now available
In-Reply-To: <47D14CC0.8000104@sendu.me.uk>
References: <BAB6978A-D50E-4C4A-9FC9-EAAD557B935F@uiuc.edu>	<628aabb70803061550s24d7d8cfhf80495ea970a6c19@mail.gmail.com>
	<5A67E3A9-9997-4A6B-AB07-8403D5FF388E@uiuc.edu>
	<47D119A1.10408@sendu.me.uk>
	<C71F44BE-CB80-4DED-9F18-8CE80836CFD6@uiuc.edu>
	<47D14CC0.8000104@sendu.me.uk>
Message-ID: <47D14EF8.5090107@sendu.me.uk>

Sendu Bala wrote:
> Chris Fields wrote:
>> I haven't tried it out yet, to tell the truth.  The worry I have is 
>> prompting during the build process for database tests, networking, etc.
>>
>> I have looked for it, but couldn't determine whether we have a way to 
>> run 'perl Build.PL' and bypass prompts with passed arguments.  The 
>> only one I could find was 'network', for network tests.
>>
>> Scott Cain and I have corresponded about this before, i.e. it would be 
>> nice to have boolean flags for each prompt (prereqs, database tests, 
>> scripts, network, etc).  For nightly PPMs I would forego tests and 
>> include scripts.
> 
> I don't quite understand how you're making the nightlys right now, but 
> you should be using the dist actions:
> 
> http://www.bioperl.org/wiki/Making_a_BioPerl_release
> 
> Ie.
> 
> One time (and one time only):
> perl Build.PL (it doesn't matter how you answer the questions)
> 
> Then every night:
> ./Build dist
> ./Build ppmdist
> 
> You then upload the resulting .tar.gz and .zip files.

Ah, having uploaded the various archives you'll have to manually delete 
them before running the dist action the next night, otherwise dist will 
ask you if you want to overwrite them. Apart from that, dist asks no 
questions.
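
A minimal sketch of the nightly job described above, assuming a checkout
in which 'perl Build.PL' has already been run once by hand. The function
names and the DRY_RUN switch are hypothetical, introduced only for
illustration; the archive patterns follow the .tar.gz and .zip names
mentioned earlier.

```shell
# run CMD...: execute a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# nightly_dist CHECKOUT_DIR: update the checkout and rebuild the archives.
# The body runs in a subshell so the caller's working directory is unchanged.
nightly_dist() (
    cd "$1" || exit 1
    run svn update &&
    # Remove last night's archives so 'dist' has nothing to ask about.
    run rm -f ./*.tar.gz ./*.zip &&
    run ./Build dist &&
    run ./Build ppmdist
)
```

From cron this might be called as 'nightly_dist /path/to/bioperl-live';
setting DRY_RUN=1 prints the commands without running them.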


From cjfields at uiuc.edu  Fri Mar  7 09:28:36 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 7 Mar 2008 08:28:36 -0600
Subject: [Bioperl-l] Nightly build archives now available
In-Reply-To: <47D14CC0.8000104@sendu.me.uk>
References: <BAB6978A-D50E-4C4A-9FC9-EAAD557B935F@uiuc.edu>	<628aabb70803061550s24d7d8cfhf80495ea970a6c19@mail.gmail.com>
	<5A67E3A9-9997-4A6B-AB07-8403D5FF388E@uiuc.edu>
	<47D119A1.10408@sendu.me.uk>
	<C71F44BE-CB80-4DED-9F18-8CE80836CFD6@uiuc.edu>
	<47D14CC0.8000104@sendu.me.uk>
Message-ID: <FFE7B6A1-FF8C-4C84-994B-6D9AAD46748E@uiuc.edu>


On Mar 7, 2008, at 8:10 AM, Sendu Bala wrote:

> Chris Fields wrote:
>> I haven't tried it out yet, to tell the truth.  The worry I have is  
>> prompting during the build process for database tests, networking,  
>> etc.
>> I have looked for it, but couldn't determine whether we have a way  
>> to run 'perl Build.PL' and bypass prompts with passed arguments.   
>> The only one I could find was 'network', for network tests.
>> Scott Cain and I have corresponded about this before, i.e. it would  
>> be nice to have boolean flags for each prompt (prereqs, database  
>> tests, scripts, network, etc).  For nightly PPMs I would forego  
>> tests and include scripts.
>
> I don't quite understand how you're making the nightlys right now,  
> but you should be using the dist actions:
>
> http://www.bioperl.org/wiki/Making_a_BioPerl_release
>
> Ie.
>
> One time (and one time only):
> perl Build.PL (it doesn't matter how you answer the questions)
>
> Then every night:
> ./Build dist
> ./Build ppmdist
>
> You then upload the resulting .tar.gz and .zip files.
>
>
> Only if Build.PL or ModuleBuildBioperl are updated might you need to:
> ./Build realclean
> perl Build.PL
> again. But this should be a rare event and even more rarely would it  
> be /required/ (probably never).

I'm not making a distribution; the archives are merely cleaned up svn  
checkouts (no .svn directories).  This is essentially what the  
net_install script would get when installing GBrowse using the 'dev'  
option, except you don't need to install Subversion to get updates.   
Also, at this point we don't have an analogous 'Download tarball'  
setting for browsable svn either, so this is a suitable alternative.

Again, I don't want to deal with prompts while running a cron job  
(this is a bash script), particularly if I can't guarantee the number  
of prompts or the prompting order won't change down the line.  If we  
can set up a way around that using passed args to Build.PL then it  
would make life much easier and we could automate 'Build dist', 'Build  
ppmdist', 'Build testcover', etc.

chris

From bix at sendu.me.uk  Fri Mar  7 09:54:41 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 07 Mar 2008 14:54:41 +0000
Subject: [Bioperl-l] Nightly build archives now available
In-Reply-To: <FFE7B6A1-FF8C-4C84-994B-6D9AAD46748E@uiuc.edu>
References: <BAB6978A-D50E-4C4A-9FC9-EAAD557B935F@uiuc.edu>	<628aabb70803061550s24d7d8cfhf80495ea970a6c19@mail.gmail.com>
	<5A67E3A9-9997-4A6B-AB07-8403D5FF388E@uiuc.edu>
	<47D119A1.10408@sendu.me.uk>
	<C71F44BE-CB80-4DED-9F18-8CE80836CFD6@uiuc.edu>
	<47D14CC0.8000104@sendu.me.uk>
	<FFE7B6A1-FF8C-4C84-994B-6D9AAD46748E@uiuc.edu>
Message-ID: <47D15731.2050000@sendu.me.uk>

Chris Fields wrote:
> On Mar 7, 2008, at 8:10 AM, Sendu Bala wrote:
>> One time (and one time only):
>> perl Build.PL (it doesn't matter how you answer the questions)
>>
>> Then every night:
>> ./Build dist
>> ./Build ppmdist
>>
>> You then upload the resulting .tar.gz and .zip files.
>>
>>
>> Only if Build.PL or ModuleBuildBioperl are updated might you need to:
>> ./Build realclean
>> perl Build.PL
>> again. But this should be a rare event and even more rarely would it 
>> be /required/ (probably never).
> 
> I'm not making a distribution; the archives are merely cleaned up svn 
> checkouts (no .svn directories).  This is essentially what the 
> net_install script would get when installing GBrowse using the 'dev' 
> option, except you don't need to install Subversion to get updates.  
> Also, at this point we don't have an analogous 'Download tarball' 
> setting for browsable svn either, so this is a suitable alternative.

The dist action does what you want. I did a diff on the most recent 
nightly build and the .tar.gz produced by the dist action of a checkout 
of revision 14603:

$ diff -r bioperl-1.5.2_100 bioperl-live
diff -r bioperl-1.5.2_100/Bio/SeqIO/chaos.pm bioperl-live/Bio/SeqIO/chaos.pm
2c2
< # $Date: 2007-06-14 15:16:21 +0100 (Thu, 14 Jun 2007) $
---
 > # $Date: 2007-06-14 10:16:21 -0400 (Thu, 14 Jun 2007) $
Only in bioperl-live/Bio/Tools: WebBlat.pm
Only in bioperl-live: CHANGELOG
Only in bioperl-1.5.2_100: MANIFEST
Only in bioperl-1.5.2_100: META.yml
diff -r bioperl-1.5.2_100/Makefile.PL bioperl-live/Makefile.PL
1,31c1,30
< # Note: this file was auto-generated by Module::Build::Compat version 0.03
[snip]
---
 > #!/usr/bin/perl -w
 >
 > # This is a stub that simply tells you to use Build.PL instead
[snip]
Only in bioperl-live: bioperl.lisp
diff -r bioperl-1.5.2_100/maintenance/cvs2cl_by_file.pl bioperl-live/maintenance/cvs2cl_by_file.pl
29c29
< ## $Date: 2006-11-30 15:57:16 +0000 (Thu, 30 Nov 2006) $
---
 > ## $Date: 2006-11-30 10:57:16 -0500 (Thu, 30 Nov 2006) $

I don't know what's going on with the date differences, but for a file 
found in a folder called '/DIST/nightly_builds/', you want the MANIFEST 
and META.yml files. You also want the Compat version of Makefile.PL since 
we haven't yet moved to forcing people to use Build.PL.

'./Build dist' does the right thing.


> Again, I don't want to deal with prompts while running a cron job (this 
> is a bash script), particularly if I can't guarantee the number of 
> prompts or the prompting order won't change down the line.  If we can 
> set up a way around that using passed args to Build.PL then it would 
> make life much easier and we could automate 'Build dist', 'Build 
> ppmdist', 'Build testcover', etc.

Again, you only need to run 'perl Build.PL' once and answer the 
questions only once. Then you can svn update and run the actions with no 
more questions to answer. This isn't a problem that needs to be solved. 
It is /supposed/ to be this way. It's ready to use! Please make use of 
it; it's one of the (many) reasons I moved Bioperl over to Build.PL in 
the first place.


From cjfields at uiuc.edu  Fri Mar  7 10:29:11 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 7 Mar 2008 09:29:11 -0600
Subject: [Bioperl-l] Nightly build archives now available
In-Reply-To: <47D15731.2050000@sendu.me.uk>
References: <BAB6978A-D50E-4C4A-9FC9-EAAD557B935F@uiuc.edu>	<628aabb70803061550s24d7d8cfhf80495ea970a6c19@mail.gmail.com>
	<5A67E3A9-9997-4A6B-AB07-8403D5FF388E@uiuc.edu>
	<47D119A1.10408@sendu.me.uk>
	<C71F44BE-CB80-4DED-9F18-8CE80836CFD6@uiuc.edu>
	<47D14CC0.8000104@sendu.me.uk>
	<FFE7B6A1-FF8C-4C84-994B-6D9AAD46748E@uiuc.edu>
	<47D15731.2050000@sendu.me.uk>
Message-ID: <7AFB30F0-4810-4B62-B64A-C712EC4A3872@uiuc.edu>


On Mar 7, 2008, at 8:54 AM, Sendu Bala wrote:

> Chris Fields wrote:
>> On Mar 7, 2008, at 8:10 AM, Sendu Bala wrote:
>>> One time (and one time only):
>>> perl Build.PL (it doesn't matter how you answer the questions)
>>>
>>> Then every night:
>>> ./Build dist
>>> ./Build ppmdist
>>>
>>> You then upload the resulting .tar.gz and .zip files.
>>>
>>>
>>> Only if Build.PL or ModuleBuildBioperl are updated might you need  
>>> to:
>>> ./Build realclean
>>> perl Build.PL
>>> again. But this should be a rare event and even more rarely would  
>>> it be /required/ (probably never).
>> I'm not making a distribution; the archives are merely cleaned up  
>> svn checkouts (no .svn directories).  This is essentially what the  
>> net_install script would get when installing GBrowse using the  
>> 'dev' option, except you don't need to install Subversion to get  
>> updates.  Also, at this point we don't have an analogous 'Download  
>> tarball' setting for browsable svn either, so this is a suitable  
>> alternative.
>
> The dist action does what you want. I did a diff on the most recent  
> nightly build and the .tar.gz produced by the dist action of a  
> checkout of revision 14603:
>
> $ diff -r bioperl-1.5.2_100 bioperl-live
> diff -r bioperl-1.5.2_100/Bio/SeqIO/chaos.pm bioperl-live/Bio/SeqIO/ 
> chaos.pm
> 2c2
> < # $Date: 2007-06-14 15:16:21 +0100 (Thu, 14 Jun 2007) $
> ---
> > # $Date: 2007-06-14 10:16:21 -0400 (Thu, 14 Jun 2007) $
> Only in bioperl-live/Bio/Tools: WebBlat.pm
> Only in bioperl-live: CHANGELOG
> Only in bioperl-1.5.2_100: MANIFEST
> Only in bioperl-1.5.2_100: META.yml
> diff -r bioperl-1.5.2_100/Makefile.PL bioperl-live/Makefile.PL
> 1,31c1,30
> < # Note: this file was auto-generated by Module::Build::Compat  
> version 0.03
> [snip]
> ---
> > #!/usr/bin/perl -w
> >
> > # This is a stub that simply tells you to use Build.PL instead
> [snip]
> Only in bioperl-live: bioperl.lisp
> diff -r bioperl-1.5.2_100/maintenance/cvs2cl_by_file.pl bioperl-live/ 
> maintenance/cvs2cl_by_file.pl
> 29c29
> < ## $Date: 2006-11-30 15:57:16 +0000 (Thu, 30 Nov 2006) $
> ---
> > ## $Date: 2006-11-30 10:57:16 -0500 (Thu, 30 Nov 2006) $
>
> I don't know what's going on with the date differences, but for a  
> file found in a folder called '/DIST/nightly_builds/', you want the  
> MANIFEST and META.yml files. You also want the Compat version of  
> Build.PL since we haven't yet moved to forcing people to use Build.PL.
>
> './Build dist' does the right thing.
>
>
>> Again, I don't want to deal with prompts while running a cron job  
>> (this is a bash script), particularly if I can't guarantee the  
>> number of prompts or the prompting order won't change down the  
>> line.  If we can set up a way around that using passed args to  
>> Build.PL then it would make life much easier and we could automate  
>> 'Build dist', 'Build ppmdist', 'Build testcover', etc.
>
> Again, you only need to run 'perl Build.PL' once and answer the  
> questions only once. Then you can svn update and run the actions  
> with no more questions to answer. This isn't a problem that needs to  
> be solved. It is /supposed/ to be this way. It's ready to use!  
> Please make use of it; it's one of the (many) reasons I moved  
> Bioperl over to Build.PL in the first place.

Then set it up the way you want.  I give up.

chris

From bix at sendu.me.uk  Fri Mar  7 10:43:44 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 07 Mar 2008 15:43:44 +0000
Subject: [Bioperl-l] Nightly build archives now available
In-Reply-To: <7AFB30F0-4810-4B62-B64A-C712EC4A3872@uiuc.edu>
References: <BAB6978A-D50E-4C4A-9FC9-EAAD557B935F@uiuc.edu>	<628aabb70803061550s24d7d8cfhf80495ea970a6c19@mail.gmail.com>
	<5A67E3A9-9997-4A6B-AB07-8403D5FF388E@uiuc.edu>
	<47D119A1.10408@sendu.me.uk>
	<C71F44BE-CB80-4DED-9F18-8CE80836CFD6@uiuc.edu>
	<47D14CC0.8000104@sendu.me.uk>
	<FFE7B6A1-FF8C-4C84-994B-6D9AAD46748E@uiuc.edu>
	<47D15731.2050000@sendu.me.uk>
	<7AFB30F0-4810-4B62-B64A-C712EC4A3872@uiuc.edu>
Message-ID: <47D162B0.5070402@sendu.me.uk>

Chris Fields wrote:
> 
> On Mar 7, 2008, at 8:54 AM, Sendu Bala wrote:
>
>> Again, you only need to run 'perl Build.PL' once and answer the 
>> questions only once. Then you can svn update and run the actions with 
>> no more questions to answer. This isn't a problem that needs to be 
>> solved. It is /supposed/ to be this way. It's ready to use! Please 
>> make use of it; it's one of the (many) reasons I moved Bioperl over to 
>> Build.PL in the first place.
> 
> Then set it up the way you want.  I give up.

I really don't understand that response. I have merely informed you how 
Build.PL and the actions work, since you didn't know. I have informed 
you it already does what you want in terms of automation; there's 
nothing to wait for, no more work to do. I have requested you use it, 
since there is little value in duplicating code and effort.

Now that you have the information, you can make an informed choice as to 
how to proceed, based on your needs. If you have good reasons for 
sticking with your current nightly build process, by all means stick 
with it.

Mainly I just wanted to make clear (as a general point for anyone 
interested) that the questions asked by Build.PL aren't an issue or 
obstacle in terms of automating builds or tests.

From cjfields at uiuc.edu  Fri Mar  7 11:15:31 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 7 Mar 2008 10:15:31 -0600
Subject: [Bioperl-l] Nightly build archives now available
In-Reply-To: <47D162B0.5070402@sendu.me.uk>
References: <BAB6978A-D50E-4C4A-9FC9-EAAD557B935F@uiuc.edu>	<628aabb70803061550s24d7d8cfhf80495ea970a6c19@mail.gmail.com>
	<5A67E3A9-9997-4A6B-AB07-8403D5FF388E@uiuc.edu>
	<47D119A1.10408@sendu.me.uk>
	<C71F44BE-CB80-4DED-9F18-8CE80836CFD6@uiuc.edu>
	<47D14CC0.8000104@sendu.me.uk>
	<FFE7B6A1-FF8C-4C84-994B-6D9AAD46748E@uiuc.edu>
	<47D15731.2050000@sendu.me.uk>
	<7AFB30F0-4810-4B62-B64A-C712EC4A3872@uiuc.edu>
	<47D162B0.5070402@sendu.me.uk>
Message-ID: <AAF0B249-AD91-475A-8A30-4BB3A2CCF951@uiuc.edu>

On Mar 7, 2008, at 9:43 AM, Sendu Bala wrote:

> Chris Fields wrote:
>> On Mar 7, 2008, at 8:54 AM, Sendu Bala wrote:
>>
>>> Again, you only need to run 'perl Build.PL' once and answer the  
>>> questions only once. Then you can svn update and run the actions  
>>> with no more questions to answer. This isn't a problem that needs  
>>> to be solved. It is /supposed/ to be this way. It's ready to use!  
>>> Please make use of it; it's one of the (many) reasons I moved  
>>> Bioperl over to Build.PL in the first place.
>> Then set it up the way you want.  I give up.
>
> I really don't understand that response. I have merely informed you  
> how Build.PL and the actions work, since you didn't know. I have  
> informed you it already does what you want in terms of automation;  
> there's nothing to wait for, no more work to do. I have requested  
> you use it, since there is little value in duplicating code and  
> effort.
>
> Now that you have the information, you can make an informed choice  
> as to how to proceed, based on your needs. If you have good reasons  
> for sticking with your current nightly build process, by all means  
> stick with them.
>
> Mainly I just wanted to make clear (as a general point for anyone  
> interested) that the questions asked by Build.PL aren't an issue or  
> obstacle in terms of automating builds or tests.

It doesn't come across that way; it comes off as pretty  
condescending.  And please don't assume I lack experience with how  
Module::Build works (I have used 'Build ppmdist' and 'Build testcover'  
quite a few times recently, and the next item on my agenda is to fix  
the various issues with Build.PL and database checking, which you  
already know).

So my response is pretty simple; if you feel the need to use  
'Build.PL' to make nightlies, then by all means set it up.  I find it  
much harder to work with the current Build process in an automated way  
using a bash script, so I work around it.  If it makes you happier we  
can switch the directory over to 'nightly_checkouts', but I think  
that's just mincing semantics.

Okay, it's pretty obvious we're not on the same page here.  I'll go  
through it carefully so you understand the problem:

1)  I am running a 'svn co' on anon. svn for the various distros to a  
temp directory.  This is done using a bash script.  If I attempt to  
change into the distribution directory and run 'perl Build.PL' from  
the bash script, I immediately run into permissions issues and several  
odd things:

Checking prerequisites...
  - ERROR: Bio::Root::Version is not installed
(I think you ran Build.PL directly, so will use CPAN to install  
prerequisites on demand)
CPAN: Storable loaded ok
Going to read /root/.cpan/Metadata
   Database was generated on Tue, 05 Feb 2008 11:30:54 GMT
Warning: You are not allowed to write into directory "/root/.cpan/ 
sources/authors".
     I'll continue, but if you encounter problems, they may be due
     to insufficient permissions.
CPAN: LWP::UserAgent loaded ok
Fetching with LWP:
   ftp://mirror.hiwaay.net/CPAN/authors/01mailrc.txt.gz
LWP failed with code[500] message[Cannot write to '/root/.cpan/sources/ 
authors/01mailrc.txt.gz-8678': Permission denied]
Fetching with Net::FTP:
   ftp://mirror.hiwaay.net/CPAN/authors/01mailrc.txt.gz
Cannot open Local file /root/.cpan/sources/authors/01mailrc.txt.gz:  
Permission denied
....

2) I suspect, even if I worked around permissions and set up the job  
as root or admin and worked out why it can't find  
'Bio::Root::Version' (?!?), this would still be a terrific pain in the  
*** to deal with as the Build.PL process is expecting answers for each  
and every prompt, and the process differs for each distribution.  Yes,  
I could set something up to deal with that in the script.  No, I  
will not do that as any additions or changes to prompts could break/ 
hang the script or (worse) silently change what the archive contains.   
Hence my indication that passing flags to 'perl Build.PL' would be a  
nice way to work around that.  For that I haven't heard a response, so  
I assume that functionality isn't there (or am I assuming incorrectly?).

So, from where I stand, even if using Build.PL is the /proper/ way to  
do it, it doesn't work as expected using an automated process (i.e.  
cron).  Make sense?

chris


From matthewehodges at gmail.com  Fri Mar  7 11:16:47 2008
From: matthewehodges at gmail.com (Matt)
Date: Fri, 7 Mar 2008 16:16:47 +0000 (UTC)
Subject: [Bioperl-l] Reciprocal blast
Message-ID: <loom.20080307T160014-878@post.gmane.org>

Dear experts,

I want to do a best reciprocal blastp of a fasta protein dataset against the 
protein models of various species, also in fasta format. The aim is to have an 
output showing presence/absence. I think this is possible to do using 
Perl, but I'm very much a beginner so any help with this would be greatly 
appreciated.
Thanks
Matt


From bix at sendu.me.uk  Fri Mar  7 12:34:17 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 07 Mar 2008 17:34:17 +0000
Subject: [Bioperl-l] Nightly build archives now available
In-Reply-To: <AAF0B249-AD91-475A-8A30-4BB3A2CCF951@uiuc.edu>
References: <BAB6978A-D50E-4C4A-9FC9-EAAD557B935F@uiuc.edu>	<628aabb70803061550s24d7d8cfhf80495ea970a6c19@mail.gmail.com>
	<5A67E3A9-9997-4A6B-AB07-8403D5FF388E@uiuc.edu>
	<47D119A1.10408@sendu.me.uk>
	<C71F44BE-CB80-4DED-9F18-8CE80836CFD6@uiuc.edu>
	<47D14CC0.8000104@sendu.me.uk>
	<FFE7B6A1-FF8C-4C84-994B-6D9AAD46748E@uiuc.edu>
	<47D15731.2050000@sendu.me.uk>
	<7AFB30F0-4810-4B62-B64A-C712EC4A3872@uiuc.edu>
	<47D162B0.5070402@sendu.me.uk>
	<AAF0B249-AD91-475A-8A30-4BB3A2CCF951@uiuc.edu>
Message-ID: <47D17C99.9050009@sendu.me.uk>

Chris Fields wrote:
> 1)  I am running a 'svn co' on anon. svn for the various distros to a 
> temp directory.

Is it important that you do a fresh co every night? Why not do a co once 
and then do a 'svn update' every night? This is the crux of the 
problem: if you choose to simply update, then you only have to get 
'perl Build.PL' to work once.


> If I attempt to change into the distribution directory and run 'perl Build.PL' from the 
> bash script, I immediately run into permissions issues and several odd 
> things:
> 
> Checking prerequisites...
>  - ERROR: Bio::Root::Version is not installed
> (I think you ran Build.PL directly, so will use CPAN to install 
> prerequisites on demand)
> CPAN: Storable loaded ok
> Going to read /root/.cpan/Metadata
>   Database was generated on Tue, 05 Feb 2008 11:30:54 GMT
> Warning: You are not allowed to write into directory 
> "/root/.cpan/sources/authors".
[snip]

I'm assuming this is on portal? The CPAN setup for users is a little 
broken. You need to create /home/yourusername/.cpan/CPAN/MyConfig.pm

$CPAN::Config->{cpan_home} = "/home/yourusername/.cpan/"

Then you can run and configure cpan correctly and install Bundle::CPAN. 
Some of the zlib stuff failed to install for me, but that doesn't seem 
to matter.
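
As a rough sketch, the per-user config file above could be created from
the shell like this. The function name is hypothetical, the trailing
'1;' is only a conventional safeguard so the file returns a true value
when loaded, and the rest of the CPAN configuration is still assumed to
be done interactively afterwards.

```shell
# write_cpan_myconfig HOME_DIR: create HOME_DIR/.cpan/CPAN/MyConfig.pm
# with cpan_home pointing at HOME_DIR/.cpan/ (refuses to overwrite).
write_cpan_myconfig() {
    home_dir=$1
    config="$home_dir/.cpan/CPAN/MyConfig.pm"
    if [ -e "$config" ]; then
        echo "refusing to overwrite $config" >&2
        return 1
    fi
    mkdir -p "$home_dir/.cpan/CPAN" || return 1
    # Unquoted heredoc: $home_dir expands, \$CPAN stays a literal Perl variable.
    cat > "$config" <<EOF
\$CPAN::Config->{cpan_home} = "$home_dir/.cpan/";
1;
EOF
}
```

Running 'write_cpan_myconfig "$HOME"' and then starting cpan should let
the remaining settings be filled in under the user's own directory.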

Of course, I guess it makes sense for root to just install all of 
Bioperl's prereqs anyway, so that testing can be automated in the future.

Anyway, once you have cpan happy 'perl Build.PL' will run fine. Answer 
'n' to everything and then your cron job just has to call './Build dist'.


> 2) I suspect, even if I worked around permissions and set up the job as 
> root or admin and worked out why it can't find 'Bio::Root::Version' 
> (?!?), this would still be a terrific pain in the *** to deal with as 
> the Build.PL process is expecting answers for each and every prompt, and 
> the process differs for each distribution.

You won't be running Build.PL in the cron job.


> passing flags to 'perl Build.PL' would be a nice way to work around 
> that.  For that I haven't heard a response, so I assume that 
> functionality isn't there (or am I assuming incorrectly?).

It isn't AFAIK, but my point is that it doesn't need to be (for this 
particular use-case at least).


> So, from where I stand, even if using Build.PL is the /proper/ way to do 
> it, it doesn't work as expected using an automated process (i.e. cron).  
> Make sense?

Only if you can't run 'svn update' instead of 'svn co' each night.

From cjfields at uiuc.edu  Fri Mar  7 13:00:52 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 7 Mar 2008 12:00:52 -0600
Subject: [Bioperl-l] Nightly build archives now available
In-Reply-To: <47D17C99.9050009@sendu.me.uk>
References: <BAB6978A-D50E-4C4A-9FC9-EAAD557B935F@uiuc.edu>	<628aabb70803061550s24d7d8cfhf80495ea970a6c19@mail.gmail.com>
	<5A67E3A9-9997-4A6B-AB07-8403D5FF388E@uiuc.edu>
	<47D119A1.10408@sendu.me.uk>
	<C71F44BE-CB80-4DED-9F18-8CE80836CFD6@uiuc.edu>
	<47D14CC0.8000104@sendu.me.uk>
	<FFE7B6A1-FF8C-4C84-994B-6D9AAD46748E@uiuc.edu>
	<47D15731.2050000@sendu.me.uk>
	<7AFB30F0-4810-4B62-B64A-C712EC4A3872@uiuc.edu>
	<47D162B0.5070402@sendu.me.uk>
	<AAF0B249-AD91-475A-8A30-4BB3A2CCF951@uiuc.edu>
	<47D17C99.9050009@sendu.me.uk>
Message-ID: <BC3ACFAC-F203-4C12-BA1F-6AA762F31BF0@uiuc.edu>


On Mar 7, 2008, at 11:34 AM, Sendu Bala wrote:

> Chris Fields wrote:
>> 1)  I am running a 'svn co' on anon. svn for the various distros to  
>> a temp directory.
>
> Is it important that you do a fresh co every night? Why not do a co  
> once   and then do a 'svn update' every night? This is the crux of  
> the problems: if you choose to simply update, then you only have to  
> get 'perl Build.PL' to work once.

Unless you update Build.PL (which will happen as the distributions  
grow).  Then you need to rerun 'perl Build.PL'.  It seems safer to run  
that each time with a 'pass-through' flag for automated builds.

>> If I attempt to change into the distribution directory and run  
>> 'perl Build.PL' from the bash script, I immediately run into  
>> permissions issues and several odd things:
>> Checking prerequisites...
>> - ERROR: Bio::Root::Version is not installed
>> (I think you ran Build.PL directly, so will use CPAN to install  
>> prerequisites on demand)
>> CPAN: Storable loaded ok
>> Going to read /root/.cpan/Metadata
>>  Database was generated on Tue, 05 Feb 2008 11:30:54 GMT
>> Warning: You are not allowed to write into directory "/root/.cpan/ 
>> sources/authors".
> [snip]
>
> I'm assuming this is on portal? The CPAN setup for users is a little  
> broken. You need to create /home/yourusername/.cpan/CPAN/MyConfig.pm
>
> $CPAN::Config->{cpan_home} = "/home/yourusername/.cpan/"
>
> Then you can run and configure cpan correctly and install  
> Bundle::CPAN. Some of the zlib stuff failed to install for me, but  
> that doesn't seem to matter.
>
> Of course, I guess it makes sense for root to just install all of  
> Bioperl's prereqs anyway, so that testing can be automated in the  
> future.
>
> Anyway, once you have cpan happy 'perl Build.PL' will run fine.  
> Answer 'n' to everything and then your cron job just has to call './ 
> Build dist'.

I agree about setting up the prereqs.  I could also (as mentioned  
before) set this up as root.  However, if we go this route we need to  
have 'perl Build.PL' included in the process in order to ensure a  
clean build process each time and to prevent the script from breaking  
whenever someone decides to change Build.PL.

>> 2) I suspect, even if I worked around permissions and set up the  
>> job as root or admin and worked out why it can't find  
>> 'Bio::Root::Version' (?!?), this would still be a terrific pain in  
>> the *** to deal with as the Build.PL process is expecting answers  
>> for each and every prompt, and the process differs for each  
>> distribution.
>
> You won't be running Build.PL in the cron job.

See above.  I don't want to set up something automated which can't be  
maintained in the long term.

>> passing flags to 'perl Build.PL' would be a nice way to work around  
>> that.  For that I haven't heard a response, so I assume that  
>> functionality isn't there (or am I assuming incorrectly?).
>
> It isn't AFAIK, but my point is that it doesn't need to be (for this  
> particular use-case at least).

See above.  There are very good reasons to allow this (and the  
functionality has been requested before, particularly from the GMOD  
crowd).  If I can pass in a single flag (for instance, --defaults,  
which just uses the default arg for each prompt) then it would make  
it /much/ easier.
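
Until such a flag exists, one possible stopgap is the
PERL_MM_USE_DEFAULT environment variable, which Module::Build's own
prompt() and y_n() methods honor by taking each default without asking.
Whether every Bioperl prompt goes through those methods is an assumption
that would need checking against Build.PL and ModuleBuildBioperl; the
function name below is hypothetical.

```shell
# configure_unattended CHECKOUT_DIR: run Build.PL without waiting on prompts.
# Assumption: all prompts use Module::Build's prompt()/y_n(), which take
# their defaults when PERL_MM_USE_DEFAULT is true.
configure_unattended() (
    cd "$1" || exit 1
    # stdin is closed as well, so a stray read cannot hang a cron job.
    PERL_MM_USE_DEFAULT=1 perl Build.PL </dev/null
)
```

A prompt that has no default would still fail rather than hang, which
for a cron job is arguably the safer of the two outcomes.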

>> So, from where I stand, even if using Build.PL is the /proper/ way  
>> to do it, it doesn't work as expected using an automated process  
>> (i.e. cron).  Make sense?
>
> Only if you can't run 'svn update' instead of 'svn co' each night.

I think a single co with updates is feasible (I can do that with the  
current setup; just run the initial co, copy the directory over to a  
temp copy, then go about my business).
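
The copy-and-clean step could look roughly like the following. This is
only a sketch of the idea, not the actual nightly script: the function
name, archive name, and directory arguments are all hypothetical.

```shell
# snapshot CHECKOUT_DIR NAME OUT_DIR: write OUT_DIR/NAME.tar.gz containing
# a copy of the checkout with every .svn directory removed.
snapshot() (
    src=$1
    name=$2
    out=$3
    tmp=$(mktemp -d) || exit 1
    cp -R "$src" "$tmp/$name" &&
    # -prune stops find from descending into directories it is deleting.
    find "$tmp/$name" -type d -name .svn -prune -exec rm -rf {} + &&
    tar -czf "$out/$name.tar.gz" -C "$tmp" "$name"
    status=$?
    rm -rf "$tmp"
    exit "$status"
)
```

Because the working checkout itself is never modified, a later
'svn update' in it keeps working.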

I'll leave the nightly build setup as is for now and work on getting  
Build.PL working (something we need anyway for Devel::Cover and  
Pod::Coverage work).

chris

From David.Messina at sbc.su.se  Fri Mar  7 13:14:38 2008
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 7 Mar 2008 19:14:38 +0100
Subject: [Bioperl-l] Reciprocal blast
In-Reply-To: <loom.20080307T160014-878@post.gmane.org>
References: <loom.20080307T160014-878@post.gmane.org>
Message-ID: <628aabb70803071014y6e08a4f9va083ebc8a33439cb@mail.gmail.com>

Hey Matt,

Your question is a little beyond the scope of this mailing list. I don't
know what your bioinformatics background is, but in my experience it's best
to get started hands-on, either in a class or with someone you can sit down
and work through it with. You'll have a million questions, and a mailing
list isn't really suitable for that.

That being said, I would run the blasts on the command-line, parse out the
best hits with BioPerl, and then use hashes to identify mutual best hits.

Briefly, you have two datasets A & B. Format each dataset into a blast
database using xdformat or formatdb. Run two blasts, one with A as query and
B as database and then one with B as query and A as database. The two output
files, each containing multiple Blast reports, can then be processed with
Bio::SearchIO to extract the best hit for each protein.

Read this tutorial for help with that:
http://www.bioperl.org/wiki/HOWTO:SearchIO

Once you get the best hit for each protein, then you can use Perl to find
every instance where two proteins, one from each set, are each other's best
hit. One way would be to create two hashes, one for each set, with query
proteins as keys and best hits as values, and then step through to find the
reciprocal bests.
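
To illustrate just the last step, here is a small sketch that swaps the
Bio::SearchIO parsing for tabular BLAST output and awk arrays in place
of Perl hashes. It assumes both searches were saved as 12-column
tab-separated tables (for example with the '-m 8' option of legacy
blastall), with the query in column 1, the subject in column 2, and the
bit score in column 12. The function and file names are hypothetical.

```shell
# reciprocal_best A_VS_B.tab B_VS_A.tab: print "a<TAB>b" for every pair
# where b is a's top-scoring hit and a is b's top-scoring hit.
reciprocal_best() {
    awk -F'\t' '
        {
            # side 1 = first file (A as query), side 2 = second file.
            side = (FILENAME == ARGV[1]) ? 1 : 2
            key = side SUBSEP $1
            # Keep the highest bit score seen for each query.
            if (!(key in score) || $12 + 0 > score[key]) {
                score[key] = $12 + 0
                best[key] = $2
            }
        }
        END {
            for (key in best) {
                split(key, part, SUBSEP)
                if (part[1] != "1") continue
                a = part[2]
                b = best[key]
                back = 2 SUBSEP b
                if ((back in best) && best[back] == a) print a "\t" b
            }
        }
    ' "$1" "$2" | sort
}
```

Proteins from set A that never appear in the output have no reciprocal
best hit in B, which gives the presence/absence call per species.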


Dave

From jay at jays.net  Fri Mar  7 13:51:35 2008
From: jay at jays.net (Jay Hannah)
Date: Fri, 07 Mar 2008 12:51:35 -0600
Subject: [Bioperl-l] Reciprocal blast
In-Reply-To: <628aabb70803071014y6e08a4f9va083ebc8a33439cb@mail.gmail.com>
References: <loom.20080307T160014-878@post.gmane.org>
	<628aabb70803071014y6e08a4f9va083ebc8a33439cb@mail.gmail.com>
Message-ID: <47D18EB7.3060906@jays.net>

Dave Messina wrote:
> Your question is a little beyond the scope of this mailing list. I don't
> know what your bioinformatics background is, but in my experience it's best
> to get started hands-on, either in a class or with someone you can sit down
> and work through it with. You'll have a million questions, and a mailing
> list isn't really suitable for that.
>
> That being said, I would run the blasts on the command-line, parse out the
> best hits with BioPerl, and then use hashes to identify mutual best hits.
>   

Hi Matt,

If you're a glutton for punishment and want to see a ball of Perl that 
automates and tracks stats across my version of "reciprocal blasts" 
(mine is called cross_blast()), help yourself:

   svn checkout svn://vc.jays.net/seqlab seqlab

I abandoned my maiden voyage into bioinformatics, called "SeqLab," as a 
stand-alone entity when the subsequent thousand tasks I worked turned 
out to be unrelated to all the software I had built so far. My naive 
grand unification vision for all of bioinformatics didn't quite work out 
as I had planned. -laugh-  Nowadays I just cherry-pick solutions out of 
its guts on demand.  :)

I'm happy to field any questions you have about that code, if it helps 
you any.

Cheers,

j
http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah


From jay at jays.net  Fri Mar  7 14:43:03 2008
From: jay at jays.net (Jay Hannah)
Date: Fri, 07 Mar 2008 13:43:03 -0600
Subject: [Bioperl-l] Reciprocal blast
In-Reply-To: <47D18EB7.3060906@jays.net>
References: <loom.20080307T160014-878@post.gmane.org>	<628aabb70803071014y6e08a4f9va083ebc8a33439cb@mail.gmail.com>
	<47D18EB7.3060906@jays.net>
Message-ID: <47D19AC7.1060907@jays.net>

Jay Hannah wrote:
> I'm happy to field any questions you have about that code, if it helps 
> you any.

I created a wiki page since I stopped paying the bill on the 
"seqlab.net" domain:   :)

http://clab.ist.unomaha.edu/CLAB/index.php/SeqLab_%28Perl%29

Cheers,

j


From cain.cshl at gmail.com  Fri Mar  7 15:17:29 2008
From: cain.cshl at gmail.com (Scott Cain)
Date: Fri, 07 Mar 2008 15:17:29 -0500
Subject: [Bioperl-l] Nightly build archives now available
In-Reply-To: <BAB6978A-D50E-4C4A-9FC9-EAAD557B935F@uiuc.edu>
References: <BAB6978A-D50E-4C4A-9FC9-EAAD557B935F@uiuc.edu>
Message-ID: <1204921049.6467.9.camel@frissell>

Hi Chris,

Thanks much for this.  I have one observation though: both the http and
ftp directories are empty except for a log file :-/

Also, I saw that you mentioned the 'accept the defaults' option I asked
about in January.  I did implement that on Build.PL at the exact time
that the transition from cvs to svn was happening, so it never got
committed back.  Hopefully I still have it :-)  I'll look around and
commit it when I find it.

Scott

On Thu, 2008-03-06 at 15:48 -0600, Chris Fields wrote:
> We now have nightly bundled archives for bioperl-live, bioperl-db,  
> bioperl-run, and bioperl-network running; these will be updated ~ 1:00  
> am every night.
> 
> http://www.bioperl.org/DIST/nightly_builds/
> ftp://ftp.open-bio.org/pub/bioperl/DIST/nightly_builds
> 
> The archives are date-stamped and also have the Subversion revision,  
> just in case one wanted to ensure they get the correct version for the  
> bug fix.  They also contain a CHANGELOG file for the last 10 revisions  
> (if there are any).  These are currently derived off the anon. svn  
> repository.
> 
> chris
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory


From cjfields at uiuc.edu  Fri Mar  7 15:25:01 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 7 Mar 2008 14:25:01 -0600
Subject: [Bioperl-l] Nightly build archives now available
In-Reply-To: <1204921049.6467.9.camel@frissell>
References: <BAB6978A-D50E-4C4A-9FC9-EAAD557B935F@uiuc.edu>
	<1204921049.6467.9.camel@frissell>
Message-ID: <BCE20903-01C1-4AC3-B8F6-89AC68F6803E@uiuc.edu>

I was testing a few things earlier using 'Build dist' which tanked the  
old archives.  I reran the script manually so everything should be up  
now.

If you have the default setting implemented for Build.PL that would be  
great.  There is a lingering minor issue with Data::Dumper error  
output via perl 5.10, but beyond that it should be fine.

chris

On Mar 7, 2008, at 2:17 PM, Scott Cain wrote:

> Hi Chris,
>
> Thanks much for this.  I have one observation though: both the http  
> and
> ftp directories are empty except for a log file :-/
>
> Also, I saw that you mentioned the 'accept the defaults' option I  
> asked
> about in January.  I did implement that on Build.PL at the exact time
> that the transition from cvs to svn was happening, so it never got
> committed back.  Hopefully I still have it :-)  I'll look around and
> commit it when I find it.
>
> Scott
>
> On Thu, 2008-03-06 at 15:48 -0600, Chris Fields wrote:
>> We now have nightly bundled archives for bioperl-live, bioperl-db,
>> bioperl-run, and bioperl-network running; these will be updated ~  
>> 1:00
>> am every night.
>>
>> http://www.bioperl.org/DIST/nightly_builds/
>> ftp://ftp.open-bio.org/pub/bioperl/DIST/nightly_builds
>>
>> The archives are date-stamped and also have the Subversion revision,
>> just in case one wanted to ensure they get the correct version for  
>> the
>> bug fix.  They also contain a CHANGELOG file for the last 10  
>> revisions
>> (if there are any).  These are currently derived off the anon. svn
>> repository.
>>
>> chris
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> -- 
> ------------------------------------------------------------------------
> Scott Cain, Ph. D.                                         cain at cshl.edu
> GMOD Coordinator (http://www.gmod.org/)                      
> 216-392-3087
> Cold Spring Harbor Laboratory
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From N.Haigh at sheffield.ac.uk  Fri Mar  7 16:01:43 2008
From: N.Haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Fri,  7 Mar 2008 21:01:43 +0000
Subject: [Bioperl-l] Nightly build archives now available
In-Reply-To: <BC3ACFAC-F203-4C12-BA1F-6AA762F31BF0@uiuc.edu>
References: <BAB6978A-D50E-4C4A-9FC9-EAAD557B935F@uiuc.edu>
	<628aabb70803061550s24d7d8cfhf80495ea970a6c19@mail.gmail.com>
	<5A67E3A9-9997-4A6B-AB07-8403D5FF388E@uiuc.edu>
	<47D119A1.10408@sendu.me.uk>
	<C71F44BE-CB80-4DED-9F18-8CE80836CFD6@uiuc.edu>
	<47D14CC0.8000104@sendu.me.uk>
	<FFE7B6A1-FF8C-4C84-994B-6D9AAD46748E@uiuc.edu>
	<47D15731.2050000@sendu.me.uk>
	<7AFB30F0-4810-4B62-B64A-C712EC4A3872@uiuc.edu>
	<47D162B0.5070402@sendu.me.uk>
	<AAF0B249-AD91-475A-8A30-4BB3A2CCF951@uiuc.edu>
	<47D17C99.9050009@sendu.me.uk>
	<BC3ACFAC-F203-4C12-BA1F-6AA762F31BF0@uiuc.edu>
Message-ID: <1204923703.47d1ad37a614a@webmail.shef.ac.uk>

Quoting Chris Fields <cjfields at uiuc.edu>:

-- snip --

> 
> I'll leave the nightly build setup as is for now and work on getting  
> Build.PL working (something we need anyway for Devel::Cover and  
> Pod::Coverage work).
> 

One of the test metrics of Devel::Cover is Pod::Coverage ... no need to have a separate Pod::Coverage test :o)

Nath

From cain.cshl at gmail.com  Fri Mar  7 17:25:53 2008
From: cain.cshl at gmail.com (Scott Cain)
Date: Fri, 07 Mar 2008 17:25:53 -0500
Subject: [Bioperl-l] Nightly build archives now available
In-Reply-To: <BCE20903-01C1-4AC3-B8F6-89AC68F6803E@uiuc.edu>
References: <BAB6978A-D50E-4C4A-9FC9-EAAD557B935F@uiuc.edu>
	<1204921049.6467.9.camel@frissell>
	<BCE20903-01C1-4AC3-B8F6-89AC68F6803E@uiuc.edu>
Message-ID: <1204928753.6467.19.camel@frissell>

OK, I added my 'accept the defaults' option.  Use it like this:

  perl Build.PL --accept 1

Scott

On Fri, 2008-03-07 at 14:25 -0600, Chris Fields wrote:
> I was testing a few things earlier using 'Build dist' which tanked the  
> old archives.  I reran the script manually so everything should be up  
> now.
> 
> If you have the default setting implemented for Build.PL that would be  
> great.  There is a lingering minor issue with Data::Dumper error  
> output via perl 5.10, but beyond that it should be fine.
> 
> chris
> 
> On Mar 7, 2008, at 2:17 PM, Scott Cain wrote:
> 
> > Hi Chris,
> >
> > Thanks much for this.  I have one observation though: both the http  
> > and
> > ftp directories are empty except for a log file :-/
> >
> > Also, I saw that you mentioned the 'accept the defaults' option I  
> > asked
> > about in January.  I did implement that on Build.PL at the exact time
> > that the transition from cvs to svn was happening, so it never got
> > committed back.  Hopefully I still have it :-)  I'll look around and
> > commit it when I find it.
> >
> > Scott
> >
> > On Thu, 2008-03-06 at 15:48 -0600, Chris Fields wrote:
> >> We now have nightly bundled archives for bioperl-live, bioperl-db,
> >> bioperl-run, and bioperl-network running; these will be updated ~  
> >> 1:00
> >> am every night.
> >>
> >> http://www.bioperl.org/DIST/nightly_builds/
> >> ftp://ftp.open-bio.org/pub/bioperl/DIST/nightly_builds
> >>
> >> The archives are date-stamped and also have the Subversion revision,
> >> just in case one wanted to ensure they get the correct version for  
> >> the
> >> bug fix.  They also contain a CHANGELOG file for the last 10  
> >> revisions
> >> (if there are any).  These are currently derived off the anon. svn
> >> repository.
> >>
> >> chris
> >> _______________________________________________
> >> Bioperl-l mailing list
> >> Bioperl-l at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > -- 
> > ------------------------------------------------------------------------
> > Scott Cain, Ph. D.                                         cain at cshl.edu
> > GMOD Coordinator (http://www.gmod.org/)                      
> > 216-392-3087
> > Cold Spring Harbor Laboratory
> >
> 
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
> 
> 
> 
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   cain.cshl at gmail.com
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory


From n.haigh at sheffield.ac.uk  Sat Mar  8 07:55:39 2008
From: n.haigh at sheffield.ac.uk (Nathan S Haigh)
Date: Sat, 08 Mar 2008 12:55:39 +0000
Subject: [Bioperl-l] Nightly build archives now available
In-Reply-To: <AAF0B249-AD91-475A-8A30-4BB3A2CCF951@uiuc.edu>
References: <BAB6978A-D50E-4C4A-9FC9-EAAD557B935F@uiuc.edu>	<628aabb70803061550s24d7d8cfhf80495ea970a6c19@mail.gmail.com>	<5A67E3A9-9997-4A6B-AB07-8403D5FF388E@uiuc.edu>	<47D119A1.10408@sendu.me.uk>	<C71F44BE-CB80-4DED-9F18-8CE80836CFD6@uiuc.edu>	<47D14CC0.8000104@sendu.me.uk>	<FFE7B6A1-FF8C-4C84-994B-6D9AAD46748E@uiuc.edu>	<47D15731.2050000@sendu.me.uk>	<7AFB30F0-4810-4B62-B64A-C712EC4A3872@uiuc.edu>	<47D162B0.5070402@sendu.me.uk>
	<AAF0B249-AD91-475A-8A30-4BB3A2CCF951@uiuc.edu>
Message-ID: <47D28CCB.50507@sheffield.ac.uk>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Chris Fields wrote:

- -- snip --

> 1)  I am running a 'svn co' on anon. svn for the various distros to a
> temp directory.  This is done using a bash script.  If I attempt to
> change into the distribution directory and run 'perl Build.PL' from the
> bash script, I immediately run into permissions issues and several odd
> things:
> 

- -- snip --

Hi Chris,

Do you need to do any svn commands after the checkout? If not, you can do "svn export" instead:
http://svnbook.red-bean.com/en/1.0/re10.html

This basically recursively gets the URL specified without the .svn dirs. However, you then won't be able to run any svn commands on it, as it won't be
a working copy....save bandwidth and possible post processing to delete all the .svn dirs.

Nath
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFH0ozL9gTv6QYzVL4RAkvcAJ9eSosx3+YWfbBg/KT6+HZrbweGSgCguLCe
ZYtTxSi5q6iiR+sVGDQEZ68=
=uFNP
-----END PGP SIGNATURE-----

From nm249 at cornell.edu  Sat Mar  8 11:48:44 2008
From: nm249 at cornell.edu (Naama Menda)
Date: Sat, 08 Mar 2008 11:48:44 -0500
Subject: [Bioperl-l] Bio::Ontology::OntologyI
Message-ID: <A042F0A0-F007-45C2-A838-7666A85F5836@cornell.edu>

Hi Hilmar,

I have a loading script that uses Bio::Ontology::OntologyI for
parsing obo files and loading terms into a chado schema.
I'm trying to find all relationship types, and it seems that the
parser looks at the distinct relationship types used by the terms in
the file,
but not at the '[Typedef]' fields (I used 'get_predicate_terms()').
This is important for storing the relationships in the right
context; for example, all relationship types defined by the Sequence
Ontology should be stored in
the chado schema using the SO cv_id, while other relationship types,
not defined as Typedef in the obo file, should be stored using the
'relationship' cv_id.
Without a way to parse Typedefs, I also cannot use Bio::Ontology for
parsing the OBO_REL file (http://www.obofoundry.org/ro/ro.obo).

Is there another function in Bio::Ontology that handles Typedefs? If  
not can one be added?

Thanks!
-Naama Menda

From bix at sendu.me.uk  Sat Mar  8 18:30:40 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Sat, 08 Mar 2008 23:30:40 +0000
Subject: [Bioperl-l] Nightly build archives now available
In-Reply-To: <1204928753.6467.19.camel@frissell>
References: <BAB6978A-D50E-4C4A-9FC9-EAAD557B935F@uiuc.edu>	<1204921049.6467.9.camel@frissell>	<BCE20903-01C1-4AC3-B8F6-89AC68F6803E@uiuc.edu>
	<1204928753.6467.19.camel@frissell>
Message-ID: <47D321A0.9010209@sendu.me.uk>

Scott Cain wrote:
> OK, I added my 'accept the defaults' option.  Use it like this:
> 
>   perl Build.PL --accept 1

Thanks for that Scott, but can you revert and have another go at that 
commit, because you ended up wiping out the recent commits by Chris and 
myself.

Also, rather than individually alter the Bioperl-specific methods like 
choose_scripts(), is there perhaps a cleaner way to catch every prompt, 
perhaps by overriding prompt() itself? Other questions may get added in 
the future, and some existing questions aren't immediately obvious, so 
it would be nice to be sure an automated process like a cron job /never/ 
gets asked a question.
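
A rough sketch of the "override prompt() once" idea follows. The classes
My::Build::Base and My::Build::Quiet are hypothetical stand-ins, not the
actual Bioperl build code, and the 'accept' key is an assumed name for the
option. The sketch only shows that when every question funnels through
prompt(), a single override silences all of them.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the real build class; only the shape matters.
package My::Build::Base;

sub new {
    my ($class, %args) = @_;
    return bless {%args}, $class;
}

# Interactive prompt: ask on STDIN, fall back to the default on empty input.
sub prompt {
    my ($self, $message, $default) = @_;
    print "$message [$default] ";
    my $answer = <STDIN>;
    $answer = '' unless defined $answer;
    chomp $answer;
    return length $answer ? $answer : $default;
}

# A project-specific question that goes through prompt().
sub choose_scripts {
    my ($self) = @_;
    return $self->prompt('Install scripts? y/n', 'n');
}

package My::Build::Quiet;
our @ISA = ('My::Build::Base');

# Single override: with 'accept' set, return the default and never read STDIN.
sub prompt {
    my ($self, $message, $default) = @_;
    return $default if $self->{accept};
    return $self->SUPER::prompt($message, $default);
}

package main;

my $build = My::Build::Quiet->new(accept => 1);
print "scripts: ", $build->choose_scripts, "\n";
```

Questions added later are covered automatically as long as they call
prompt(). It may also be worth checking whether the build framework's own
prompt() already honours an environment setting such as PERL_MM_USE_DEFAULT
before adding a custom switch.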

From hlapp at gmx.net  Sun Mar  9 17:37:01 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sun, 9 Mar 2008 17:37:01 -0400
Subject: [Bioperl-l] Bio::Ontology::OntologyI
In-Reply-To: <47D2C36C.2020802@cornell.edu>
References: <47D2C36C.2020802@cornell.edu>
Message-ID: <DA1ACCA1-02CA-4662-A50F-DF714A8E05CB@gmx.net>

Naama -

it is the OntologyIO::obo parser that omits the typedefs. Parsing  
rather than skipping those could be added to the code; I also once  
started and almost completed a project to integrate the go-perl .obo  
parser into the Bio::OntologyIO framework, but the final touches fell  
victim to moving jobs and the ensuing upheaval.
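
To illustrate what parsing rather than skipping could involve, here is a
minimal stand-alone sketch that collects [Typedef] stanzas from OBO text.
parse_typedefs() is a hypothetical helper, not a Bio::OntologyIO method,
and the sample stanzas are made up for the example. It keeps only the last
value of a repeated tag and strips trailing "! comment" text naively.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): returns a hashref of
# typedef id => { tag => value } for every [Typedef] stanza in the text.
sub parse_typedefs {
    my ($obo_text) = @_;
    my (%typedefs, $current, $in_typedef);
    for my $line (split /\n/, $obo_text) {
        $line =~ s/\s+$//;                      # drop trailing whitespace
        if ($line =~ /^\[(\w+)\]$/) {           # stanza header, e.g. [Term]
            $in_typedef = ($1 eq 'Typedef');
            $current    = undef;
            next;
        }
        next unless $in_typedef;
        next unless $line =~ /^([\w-]+):\s*(.*)$/;
        my ($tag, $value) = ($1, $2);
        $value =~ s/\s*!.*$//;                  # naive trailing-comment strip
        if ($tag eq 'id') {
            $current = $value;
            $typedefs{$current} = {};
        }
        elsif (defined $current) {
            $typedefs{$current}{$tag} = $value; # last value wins
        }
    }
    return \%typedefs;
}

# Made-up sample input for illustration only.
my $obo = <<'OBO';
format-version: 1.2

[Term]
id: EX:0000001
name: example term

[Typedef]
id: part_of
name: part of
is_transitive: true

[Typedef]
id: derives_from
name: derives from
OBO

my $typedefs = parse_typedefs($obo);
print join(', ', sort keys %$typedefs), "\n";
```

The ids collected this way could then decide which cv a relationship type
belongs to: those defined as Typedefs in the file, versus everything else.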

If all you need to do is parse a .obo-formatted ontology and  
traverse it in some way, go-perl might have all you need. If you  
need more than that, could you elaborate?

	-hilmar

On Mar 8, 2008, at 11:48 AM, Naama Menda wrote:

> Hi Hilmar,
>
> I have a loading script that uses Bio::Ontology::OntologyI  for  
> parsing  obo files and loading terms  into chado schema.
> I'm trying to find all relationship types, and it seems that the  
> parser looks at the distinct relationship types used by the terms  
> in the file,
> but not at the ' [Typedef] ' fields (I used 'get_predicate_terms()' ).
> This is important for storing the relationships in the right  
> context , for example all relationships types defined by Sequence  
> Ontology should be stored in
> the chado schema using the SO cv_id, while other relationship  
> types, not defined as Typedef in the obo file, should be stored  
> using the 'relationship' cv_id.
> Without a way to parse Typedefs, I also cannot use Bio::Ontology  
> for parsing OBO_REL file (http://www.obofoundry.org/ro/ro.obo).
>
> Is there another function in Bio::Ontology that handles Typedefs?  
> If not can one be added?
>
> Thanks!
> -Naama Menda

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From naama.menda at gmail.com  Sun Mar  9 21:34:05 2008
From: naama.menda at gmail.com (Naama Menda)
Date: Sun, 9 Mar 2008 21:34:05 -0400
Subject: [Bioperl-l] Bio::Ontology::OntologyI
Message-ID: <48F99F4E-F17B-4000-8460-9F2CB9E0D75A@gmail.com>

My main problem is that go-perl does not handle updates, so if I want  
to update GO I need an empty schema. We find it more complicated to  
re-load our annotations than to update cvterms and their related data.
Our loading script compares an existing load of an ontology to the  
obo file and updates/inserts/deletes accordingly.
We are now in the process of committing this code to GMOD, and  
thought this would be a good opportunity for adding the Typedef  
parsing option.

Thanks,
-Naama


On Sun, Mar 9, 2008 at 5:37 PM, Hilmar Lapp <hlapp at gmx.net> wrote:
Naama -

it is the OntologyIO::obo parser that omits the typedefs. Parsing
rather than skipping those could be added to the code; I also once
started and almost completed a project to integrate the go-perl .obo
parser into the Bio::OntologyIO framework, but the final touches fell
victim to moving jobs and the ensuing upheaval.

If all you need to do is parse a .obo-formatted ontology and
traverse it in some way, go-perl might have all you need. If you
need more than that, could you elaborate?

        -hilmar

On Mar 8, 2008, at 11:48 AM, Naama Menda wrote:

 > Hi Hilmar,
 >
 > I have a loading script that uses Bio::Ontology::OntologyI  for
 > parsing  obo files and loading terms  into chado schema.
 > I'm trying to find all relationship types, and it seems that the
 > parser looks at the distinct relationship types used by the terms
 > in the file,
 > but not at the ' [Typedef] ' fields (I used 'get_predicate_terms()' ).
 > This is important for storing the relationships in the right
 > context , for example all relationships types defined by Sequence
 > Ontology should be stored in
 > the chado schema using the SO cv_id, while other relationship
 > types, not defined as Typedef in the obo file, should be stored
 > using the 'relationship' cv_id.
 > Without a way to parse Typedefs, I also cannot use Bio::Ontology
 > for parsing OBO_REL file (http://www.obofoundry.org/ro/ro.obo).
 >
 > Is there another function in Bio::Ontology that handles Typedefs?
 > If not can one be added?
 >
 > Thanks!
 > -Naama Menda

--
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From hlapp at gmx.net  Sun Mar  9 22:13:15 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sun, 9 Mar 2008 22:13:15 -0400
Subject: [Bioperl-l] Bio::Ontology::OntologyI
In-Reply-To: <e6c79c380803091834u534bfb9p9fbf41cb06485e83@mail.gmail.com>
References: <47D2C36C.2020802@cornell.edu>
	<DA1ACCA1-02CA-4662-A50F-DF714A8E05CB@gmx.net>
	<e6c79c380803091834u534bfb9p9fbf41cb06485e83@mail.gmail.com>
Message-ID: <7412B724-57B8-4851-9E71-806722DE8A76@gmx.net>


On Mar 9, 2008, at 9:34 PM, Naama Menda wrote:

> My main problem is that go-perl does not handle updates, so if I  
> want to update GO I need an empty schema. We find it more  
> complicated to re-load our annotations than to update cvterms and  
> their related data.
> Our loading script compares an existing load of an ontology to the  
> obo file and updates/inserts/deletes accordingly.

load_ontology.pl in bioperl-db should have all this functionality,  
though of course that doesn't give you the typedef support (yet).

>
> We are now in the process of committing this code to GMOD

Cool - obviously load_ontology.pl doesn't work off of Chado but  
instead uses BioSQL as the schema (though the ontology model is  
*very* similar between the two).

BTW please keep the Bioperl list in the loop, others may have insight  
too or be interested in the information.

	-hilmar

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From hlapp at gmx.net  Sun Mar  9 22:43:13 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sun, 9 Mar 2008 22:43:13 -0400
Subject: [Bioperl-l] Bio::Ontology::OntologyI
In-Reply-To: <e6c79c380803091926q293b2c7aq8a6b07c56c25ac42@mail.gmail.com>
References: <47D2C36C.2020802@cornell.edu>
	<DA1ACCA1-02CA-4662-A50F-DF714A8E05CB@gmx.net>
	<e6c79c380803091834u534bfb9p9fbf41cb06485e83@mail.gmail.com>
	<7412B724-57B8-4851-9E71-806722DE8A76@gmx.net>
	<e6c79c380803091926q293b2c7aq8a6b07c56c25ac42@mail.gmail.com>
Message-ID: <741AD3F0-4EE6-4CF9-A95F-53749AEB0FDA@gmx.net>


On Mar 9, 2008, at 10:26 PM, Naama Menda wrote:
>
> On Sun, Mar 9, 2008 at 10:13 PM, Hilmar Lapp <hlapp at gmx.net> wrote:
>
> On Mar 9, 2008, at 9:34 PM, Naama Menda wrote:
>
>> My main problem is that go-perl does not handle updates, so if I  
>> want to update GO I need an empty schema. We find it more  
>> complicated to re-load our annotations than to update cvterms and  
>> their related data.
>> Our loading script compares an existing load of an ontology to the  
>> obo file and updates/inserts/deletes accordingly.
>
> load_ontology.pl in bioperl-db should have all this functionality,  
> though of course that doesn't give you the typedef support (yet).
>
> Will you add this support to obo.pm? I had a look at it and it  
> seems easy to implement.
> Will there be a patch? Or in the next Bioperl release?

If you have ideas for how to implement this we'd be thrilled if you  
can provide a patch.

Most changes in BioPerl happen because of, and by, people who have an itch  
to scratch. Seems like this one is right up your alley?

I'd in principle be interested in doing this too but can't give any  
promises as to when I might have time (unless I need it myself :)

>
>
>>
>> We are now in the process of committing this code to GMOD
>
> Cool - obviously load_ontology.pl doesn't work off of Chado but  
> instead uses BioSQL as the schema (though the ontology model is  
> *very* similar between the two).
>
> We  store ontologies in Chado, and that was the reason for writing  
> a new loader.

Looking at it, it seems you wrote a whole new language binding? Did  
you find it too difficult to build on one of the existing ones (which  
use Class::DBI if I recall correctly, though Scott will have the  
details here)?

	-hilmar

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From naama.menda at gmail.com  Sun Mar  9 22:26:04 2008
From: naama.menda at gmail.com (Naama Menda)
Date: Sun, 9 Mar 2008 22:26:04 -0400
Subject: [Bioperl-l] Bio::Ontology::OntologyI
In-Reply-To: <7412B724-57B8-4851-9E71-806722DE8A76@gmx.net>
References: <47D2C36C.2020802@cornell.edu>
	<DA1ACCA1-02CA-4662-A50F-DF714A8E05CB@gmx.net>
	<e6c79c380803091834u534bfb9p9fbf41cb06485e83@mail.gmail.com>
	<7412B724-57B8-4851-9E71-806722DE8A76@gmx.net>
Message-ID: <e6c79c380803091926q293b2c7aq8a6b07c56c25ac42@mail.gmail.com>

On Sun, Mar 9, 2008 at 10:13 PM, Hilmar Lapp <hlapp at gmx.net> wrote:

>
> On Mar 9, 2008, at 9:34 PM, Naama Menda wrote:
>
> My main problem is that go-perl does not handle updates, so if I want to
> update GO I need an empty schema. We find it more complicated to re-load our
> annotations than to update cvterms and their related data.
> Our loading script compares an existing load of an ontology to the obo
> file and updates/inserts/deletes accordingly.
>
>
> load_ontology.pl in bioperl-db should have all this functionality, though
> of course that doesn't give you the typedef support (yet).
>

Will you add this support to obo.pm? I had a look at it and it seems easy to
implement.
Will there be a patch? Or in the next Bioperl release?

>
>
> We are now in the process of committing this code to GMOD
>
>
> Cool - obviously load_ontology.pl doesn't work off of Chado but instead
> uses BioSQL as the schema (though the ontology model is *very* similar
> between the two).
>

We  store ontologies in Chado, and that was the reason for writing a new
loader.

>
> BTW please keep the Bioperl list in the loop, others may have insight too
> or be interested in the information.
>
> -hilmar
>
> --
>
Thanks!
-Naama

> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================
>
>
>
>

From akarger at CGR.Harvard.edu  Mon Mar 10 09:33:49 2008
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Mon, 10 Mar 2008 09:33:49 -0400
Subject: [Bioperl-l] Reciprocal blast
References: <loom.20080307T160014-878@post.gmane.org>
Message-ID: <72AF30DC2881964CB911FD08E57157E7367BD5@lsdiv-msxbe-001.nucleus.harvard.edu>

<plug>
There's a cut & paste protocol for Reciprocal best hit blast at

http://sysbio.harvard.edu/csb/resources/computational/scriptome/UNIX/Protocols/Sequences.html

Let me know if you need to tweak things.
</plug>

-Amir Karger 

> -----Original Message-----
> From: Matt [mailto:matthewehodges at gmail.com] 
> Sent: Friday, March 07, 2008 11:17 AM
> To: bioperl-l at bioperl.org
> Subject: [Bioperl-l] Reciprocal blast
> 
> Dear experts,
> 
> I want to do a best reciprocal blastp of a fasta protein 
> dataset against the 
> protein models of various species also in fasta format. The 
> aim is o have an 
> output showing presence/not presence. I think this is 
> possible to do using 
> perl, but i'm very much a beginner so any help in this would 
> be greatly 
> appreciated.
> Thanks
> Matt
> 
> 
> 


From Daniel.Gerlach at medecine.unige.ch  Mon Mar 10 12:13:39 2008
From: Daniel.Gerlach at medecine.unige.ch (Daniel Gerlach)
Date: Mon, 10 Mar 2008 17:13:39 +0100
Subject: [Bioperl-l] Bio::TreeIO - tree object to string
Message-ID: <47D55E33.8060205@medecine.unige.ch>

Dear all,

This is a very basic question. I have a tree object in $tree and want to 
save its newick representation in a variable as a string:

my $out = new Bio::TreeIO(-fh => $tree_string, -format => 'newick');
$out->write_tree($tree);
print $tree_string;

Unfortunately this does not work: it prints the newick tree to 
stdout plus the message "Use of uninitialized value in print at ...". It 
also prints the tree to stdout if I remove the line "print 
$tree_string". The variable $tree_string seems to be empty.

D.

From naama.menda at gmail.com  Mon Mar 10 11:09:12 2008
From: naama.menda at gmail.com (Naama Menda)
Date: Mon, 10 Mar 2008 11:09:12 -0400
Subject: [Bioperl-l] Bio::Ontology::OntologyI
In-Reply-To: <741AD3F0-4EE6-4CF9-A95F-53749AEB0FDA@gmx.net>
References: <47D2C36C.2020802@cornell.edu>
	<DA1ACCA1-02CA-4662-A50F-DF714A8E05CB@gmx.net>
	<e6c79c380803091834u534bfb9p9fbf41cb06485e83@mail.gmail.com>
	<7412B724-57B8-4851-9E71-806722DE8A76@gmx.net>
	<e6c79c380803091926q293b2c7aq8a6b07c56c25ac42@mail.gmail.com>
	<741AD3F0-4EE6-4CF9-A95F-53749AEB0FDA@gmx.net>
Message-ID: <e6c79c380803100809s5ad9648eq976610910e9432f1@mail.gmail.com>

On Sun, Mar 9, 2008 at 10:43 PM, Hilmar Lapp <hlapp at gmx.net> wrote:

>
> On Mar 9, 2008, at 10:26 PM, Naama Menda wrote:
>
>
> On Sun, Mar 9, 2008 at 10:13 PM, Hilmar Lapp <hlapp at gmx.net> wrote:
>
> >
> > On Mar 9, 2008, at 9:34 PM, Naama Menda wrote:
> >
> > My main problem is that go-perl does not handle updates, so if I want to
> > update GO I need an empty schema. We find it more complicated to re-load our
> > annotations than to update cvterms and their related data.
> > Our loading script compares an existing load of an ontology to the obo
> > file and updates/inserts/deletes accordingly.
> >
> >
> > load_ontology.pl in bioperl-db should have all this functionality,
> > though of course that doesn't give you the typedef support (yet).
> >
>
> Will you add this support to obo.pm? I had a look at it and it seems easy
> to implement.
> Will there be a patch? Or in the next Bioperl release?
>
>
> If you have ideas for how to implement this we'd be thrilled if you can
> provide a patch.
>
> Most changes in BioPerl happen because of, and by, people who have an itch to
> scratch. Seems like this one is right up your alley?
>
> I'd in principle be interested in doing this too but can't give any
> promises as to when I might have time (unless I need it myself :)
>

I'll try to provide a patch for this. I'll let you know how it goes..

>
>
>
> >
> > We are now in the process of committing this code to GMOD
> >
> >
> > Cool - obviously load_ontology.pl doesn't work off of Chado but instead
> > uses BioSQL as the schema (though the ontology model is *very* similar
> > between the two).
> >
>
> We  store ontologies in Chado, and that was the reason for writing a new
> loader.
>
>
> Looking at it, it seems you wrote a whole new language binding? Did you
> find it too difficult to build on one of the existing ones (which use
> Class::DBI if I recall correctly, though Scott will have the details here)?
>

We already had most of the classes. We use these for other code at SGN,
since it's all OO perl. All we needed to do was add some methods and
accessors to our Chado classes, and write a loading script that stores the
ontology from the db and the ontology from the file in hashrefs, compares the
two, and inserts/updates accordingly. Our main concern was updating
pre-loaded ontologies (for new ontology files GMOD's make-ontologies works
great!)
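
The hashref comparison described above can be sketched roughly as follows.
This is not the SGN loader: diff_terms() is a hypothetical helper, the
accessions and names are invented, and each hashref is assumed to map a
term accession to its name.

```perl
use strict;
use warnings;

# Hypothetical helper: compare terms already in the db with terms in the
# file (both accession => name) and report what needs to change.
sub diff_terms {
    my ($in_db, $in_file) = @_;
    my (@insert, @update, @delete);
    for my $acc (sort keys %$in_file) {
        if (!exists $in_db->{$acc}) {
            push @insert, $acc;                 # new in the file
        }
        elsif ($in_db->{$acc} ne $in_file->{$acc}) {
            push @update, $acc;                 # present, but name changed
        }
    }
    for my $acc (sort keys %$in_db) {
        push @delete, $acc unless exists $in_file->{$acc};  # gone from file
    }
    return { insert => \@insert, update => \@update, delete => \@delete };
}

# Invented example data.
my %in_db   = ('EX:0001' => 'alpha', 'EX:0002' => 'beta',    'EX:0009' => 'old');
my %in_file = ('EX:0001' => 'alpha', 'EX:0002' => 'beta v2', 'EX:0003' => 'gamma');

my $plan = diff_terms(\%in_db, \%in_file);
for my $action (qw(insert update delete)) {
    print "$action: @{ $plan->{$action} }\n";
}
```

A real loader would compare more than the name (definitions, synonyms,
relationships) and would probably mark terms obsolete rather than delete
them, so that existing annotations keep pointing at something.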


> -hilmar
>
> --
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================
>
>
>
> -Naama

From clarsen at vecna.com  Mon Mar 10 11:56:55 2008
From: clarsen at vecna.com (Christopher Larsen)
Date: Mon, 10 Mar 2008 11:56:55 -0400 (EDT)
Subject: [Bioperl-l] Reciprocal best blast hits / Orthology
Message-ID: <49819.64.47.82.110.1205164615.squirrel@mail.vecna.com>

Matt, Dave,

Regarding reciprocal best blast hits, yes -- it's beyond the list, and here's
how and where to go. It seems what you are looking for is actually an
ortholog search. If so, there is more to it than reciprocity and
ranking -- other groups are using phylo trees, bootstrap values, etc.
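
For reference, the reciprocity step on its own is small. The sketch below
assumes the top hit per query has already been extracted from the two
BLAST runs into hashes (query id => best subject id). The helper name
reciprocal_best_hits() and the protein ids are made up, and nothing here
addresses paralogs or the tree-based refinements mentioned above.

```perl
use strict;
use warnings;

# Hypothetical helper: given best hits A->B and B->A, keep only the pairs
# in which each sequence is the other's top hit.
sub reciprocal_best_hits {
    my ($a_to_b, $b_to_a) = @_;
    my %rbh;
    for my $query (keys %$a_to_b) {
        my $hit = $a_to_b->{$query};
        next unless defined $hit && defined $b_to_a->{$hit};
        $rbh{$query} = $hit if $b_to_a->{$hit} eq $query;
    }
    return \%rbh;
}

# Invented best-hit tables for illustration.
my %a_to_b = (protA1 => 'protB7', protA2 => 'protB3', protA3 => 'protB3');
my %b_to_a = (protB7 => 'protA1', protB3 => 'protA3');

my $rbh = reciprocal_best_hits(\%a_to_b, \%b_to_a);

# Presence/absence report, one line per query protein.
for my $query (sort keys %a_to_b) {
    printf "%s\t%s\n", $query,
        exists $rbh->{$query} ? "present ($rbh->{$query})" : 'absent';
}
```

Here protA2 is reported as absent because its best hit, protB3, prefers
protA3. That is exactly the kind of case where paralogy makes a plain
reciprocal test misleading.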

Perhaps check out David Roos and Chris Stoeckert's work, OrthoMCL, which
is written in perl. Their group is quite helpful as well.

http://orthomcl.cbil.upenn.edu/cgi-bin/OrthoMclWeb.cgi

The perl installs locally and will help you identify a homolog that
should have the same enzyme function or cellular role. Importantly, it
attends to paralogy and pseudogenes as well, so you don't step
into a pit. The site explains more.

The point is, you're on the right track, but there's a group that's been
through what you are doing and can supply you with a working
implementation that's very robust and uses BioPerl modules already, so you
don't have to scratch up some code. Also, you can check out INPARANOID for
the same reasons. Having just been through this, I'm just trying to lead
you to where we went. Right now we point OrthoMCL at a whole folder of
proteomes (*.faa) and it groups them accordingly.

Brian O: Don't know if this folds well into your MCL wiki page or not,
apologies.

If the group wants some post-processing code that shows the
presence/absence of proteins in any one group perhaps we can help too as
there are a few things written that take the raw output directly.

Cheers,

Chris L

=========================
Message: 5

Dear experts,

I want to do a best reciprocal blastp of a fasta protein dataset against the
protein models of various species also in fasta format. The aim is to have an
output showing presence/not presence. I think this is possible to do using
perl, but i'm very much a beginner so any help in this would be greatly
appreciated.

Thanks

Matt

-- 
Christopher Larsen, Ph.D.
Senior Scientist
Research Grants Manager
Vecna Technologies
5004 Lehigh Ave
College Park, MD 20740
240-737-1625

From Kevin.M.Brown at asu.edu  Mon Mar 10 12:17:11 2008
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Mon, 10 Mar 2008 09:17:11 -0700
Subject: [Bioperl-l] Bio::TreeIO - tree object to string
In-Reply-To: <47D55E33.8060205@medecine.unige.ch>
References: <47D55E33.8060205@medecine.unige.ch>
Message-ID: <1A4207F8295607498283FE9E93B775B4048A0825@EX02.asurite.ad.asu.edu>

You need to pass in either a filehandle or a path to an output file;
otherwise you are going to see the behavior you are getting.

open my $tree_fh, ">", "TreeFile.txt" or die "Cannot open TreeFile.txt: $!";
my $out = new Bio::TreeIO(-fh => $tree_fh, -format => 'newick');

OR

# the leading ">" opens the file for writing rather than reading
my $out = new Bio::TreeIO(-file => ">TreeFile.txt", -format => 'newick');

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> Daniel Gerlach
> Sent: Monday, March 10, 2008 9:14 AM
> To: bioperl-l at portal.open-bio.org
> Subject: [Bioperl-l] Bio::TreeIO - tree object to string
> 
> Dear all,
> 
> This is a very basic question. I have a tree object in $tree 
> and want to 
> save its newick representation in a variable as a string:
> 
> my $out = new Bio::TreeIO(-fh => $tree_string, -format => 'newick');
> $out->write_tree($tree);
> print $tree_string;
> 
> Unfortunately this does not work: it prints the newick tree to 
> stdout plus the message "Use of uninitialized value in print 
> at ...". It 
> also prints the tree to stdout if I remove the line "print 
> $tree_string". The variable $tree_string seems to be empty.
> 
> D.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 


From bix at sendu.me.uk  Mon Mar 10 12:51:59 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 10 Mar 2008 16:51:59 +0000
Subject: [Bioperl-l] Bio::TreeIO - tree object to string
In-Reply-To: <47D55E33.8060205@medecine.unige.ch>
References: <47D55E33.8060205@medecine.unige.ch>
Message-ID: <47D5672F.6000709@sendu.me.uk>

Daniel Gerlach wrote:
> Dear all,
> 
> This is a very basic question. I have a tree object in $tree and want to 
> save its newick representation in a variable as a string:
> 
> my $out = new Bio::TreeIO(-fh => $tree_string, -format => 'newick');
> $out->write_tree($tree);
> print $tree_string;
> 
> Unfortunately this does not work and he prints out the newick tree on 
> stdout plus the message "Use of uninitialized value in print at ...". He 
> also prints out the tree on the stdout if I remove the line "print 
> $tree_string". The variable $tree_string seems to be empty.

The -fh argument is supposed to be a file handle, not a string. You can 
use whatever standard Perl method you like for attaching a filehandle to 
a scalar.

Eg.
my $tree_string = '';
open(my $fake_fh, "+<", \$tree_string);
my $out = new Bio::TreeIO(-fh => $fake_fh, -format => 'newick');
$out->write_tree($tree);
print $tree_string;

Alternatively, my $tree_string = $tree->simplify_to_leaves_string() 
might give you what you want.
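
The in-memory filehandle approach can be wrapped in a small helper. This is
only a minimal sketch in core Perl: capture_to_string is a made-up name, and
the print inside the callback stands in for the TreeIO write_tree() call, so
nothing here depends on BioPerl being installed.

```perl
use strict;
use warnings;

# Run $writer with a filehandle that writes into an in-memory string,
# then return everything that was written.
sub capture_to_string {
    my ($writer) = @_;
    my $buffer = '';
    open(my $fh, '>', \$buffer)
        or die "Cannot open in-memory filehandle: $!";
    $writer->($fh);
    # The writer may already have closed the handle; only close it if it
    # is still open (fileno is undef for a closed handle).
    close($fh) if defined fileno($fh);
    return $buffer;
}

# Stand-in for the tree writer: anything that prints to $fh works.
my $newick = capture_to_string(sub {
    my ($fh) = @_;
    print {$fh} "((A:1,B:1):2,C:3);\n";
});
print $newick;
```

With BioPerl, the callback would presumably create the Bio::TreeIO object with
-fh => $fh and call write_tree($tree) on it, exactly as in the snippet above.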

From stephan.rosecker at ish.de  Mon Mar 10 12:27:57 2008
From: stephan.rosecker at ish.de (stephan.rosecker)
Date: Mon, 10 Mar 2008 17:27:57 +0100
Subject: [Bioperl-l] how to get unigene-cluster with bio-db
Message-ID: <web-15326627@mail-be-01.mail01.ish.de>

Dear list,

I am trying to understand how to fetch a unigene cluster with the help of bio-db and a
local biosql-db, but without success.
I have transferred "Hs.data" with the help of "bp_load_seqdatabase.pl".

#!/usr/local/bin/perl -w

use strict;
use Bio::DB::BioDB;
use Bio::DB::Query::BioQuery;

my $db = Bio::DB::BioDB->new(
         -database => 'biosql',
         -user   => 'postgres',
         -pass   => 'foo',
         -dbname   => 'bioseqdb',
         -host     => 'foo.bar',
         -port     => 5435,    # optional
         -driver   => 'Pg'
);

my $query = Bio::DB::Query::BioQuery->new();

$query->datacollections(
          ["Bio::PrimarySeqI c::subject",
          "Bio::PrimarySeqI p::object",
         "Bio::PrimarySeqI<=>Bio::ClusterI<=>Bio::Ontology::TermI"]);
$query->where(["p.accession_number = 'Hs.2'"]);

my $adp = $db->get_object_adaptor('Bio::PrimarySeqI');
my $adp2 = $db->get_object_adaptor('Bio::ClusterI');
my $qres = $adp->find_by_query($query);
my $qres2 = $adp2->find_by_query($query);

while(my $pseq = $qres->next_object()) {
         print $pseq->accession_number,"\n";
}
while(my $pseq = $qres2->next_object()) {
         print $pseq->accession_number,"\n";
}

Maybe this way is wrong.
Hope you can help me.

stephan
  

From hlapp at gmx.net  Mon Mar 10 22:53:47 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 10 Mar 2008 22:53:47 -0400
Subject: [Bioperl-l] how to get unigene-cluster with bio-db
In-Reply-To: <web-15326627@mail-be-01.mail01.ish.de>
References: <web-15326627@mail-be-01.mail01.ish.de>
Message-ID: <A95440B6-AFFA-4F45-B90F-B74D3C0770C7@gmx.net>

Stephan -

what is the result that you are getting? Do you receive an error? Or  
simply nothing?

(BTW note that the object adaptor that you let the query execute will  
determine what kind of object you get in return. Hence, I'd  
expect your $qres2 to return Bio::ClusterI compliant objects, not  
Bio::PrimarySeqI ones. That is probably not at the root of the  
problem here, though.)

	-hilmar

On Mar 10, 2008, at 12:27 PM, stephan.rosecker wrote:

> Dear list,
>
> I try to understand how to fetch unigene-cluster with help of bio- 
> db and a local biosql-db, but without success.
> I have transfered "Hs.data" with help of "bp_load_seqdatabase.pl".
>
> #!/usr/local/bin/perl -w
>
> use strict;
> use Bio::DB::BioDB;
> use Bio::DB::Query::BioQuery;
>
> my $db = Bio::DB::BioDB->new(
>         -database => 'biosql',
>         -user   => 'postgres',
>         -pass   => 'foo',
>         -dbname   => 'bioseqdb',
>         -host     => 'foo.bar',
>         -port     => 5435,    # optional
>         -driver   => 'Pg'
> );
>
> my $query = Bio::DB::Query::BioQuery->new();
>
> $query->datacollections(
>          ["Bio::PrimarySeqI c::subject",
>          "Bio::PrimarySeqI p::object",
>         "Bio::PrimarySeqI<=>Bio::ClusterI<=>Bio::Ontology::TermI"]);
> $query->where(["p.accession_number = 'Hs.2'"]);
>
> my $adp = $db->get_object_adaptor('Bio::PrimarySeqI');
> my $adp2 = $db->get_object_adaptor('Bio::ClusterI');
> my $qres = $adp->find_by_query($query);
> my $qres2 = $adp2->find_by_query($query);
>
> while(my $pseq = $qres->next_object()) {
>         print $pseq->accession_number,"\n";
> }
> while(my $pseq = $qres2->next_object()) {
>         print $pseq->accession_number,"\n";
> }
>
> Maybe this way is wrong.
> Hope you can help me.
>
> stephan
>  _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From hlapp at gmx.net  Mon Mar 10 23:17:01 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 10 Mar 2008 23:17:01 -0400
Subject: [Bioperl-l] Bio::Ontology::OntologyI
In-Reply-To: <e6c79c380803100809s5ad9648eq976610910e9432f1@mail.gmail.com>
References: <47D2C36C.2020802@cornell.edu>
	<DA1ACCA1-02CA-4662-A50F-DF714A8E05CB@gmx.net>
	<e6c79c380803091834u534bfb9p9fbf41cb06485e83@mail.gmail.com>
	<7412B724-57B8-4851-9E71-806722DE8A76@gmx.net>
	<e6c79c380803091926q293b2c7aq8a6b07c56c25ac42@mail.gmail.com>
	<741AD3F0-4EE6-4CF9-A95F-53749AEB0FDA@gmx.net>
	<e6c79c380803100809s5ad9648eq976610910e9432f1@mail.gmail.com>
Message-ID: <D355F5D0-756C-4CED-A341-496B23DD5A8B@gmx.net>


On Mar 10, 2008, at 11:09 AM, Naama Menda wrote:

>
>
> On Sun, Mar 9, 2008 at 10:43 PM, Hilmar Lapp <hlapp at gmx.net> wrote:
>
> On Mar 9, 2008, at 10:26 PM, Naama Menda wrote:
>>
>> On Sun, Mar 9, 2008 at 10:13 PM, Hilmar Lapp <hlapp at gmx.net> wrote:
>>
>> On Mar 9, 2008, at 9:34 PM, Naama Menda wrote:
>>
>>> My main problem is that go-perl does not handle updates, so if I  
>>> want to update GO I need an empty schema. We find it more  
>>> complicated to re-load our annotations than to update cvterms and  
>>> their related data.
>>> Our loading script compares an existing load of an ontology to  
>>> the obo file and updates/inserts/deletes accordingly.
>>
>> load_ontology.pl in bioperl-db should have all this functionality,  
>> though of course that doesn't give you the typedef support (yet).
>>
>> Will you add this support to obo.pm? I had a look at it and it  
>> seems easy to implement.
>> Will there be a patch? Or in the next Bioperl release?
>
> If you have ideas for how to implement this we'd be thrilled if you  
> can provide a patch.
>
> Most changes in BioPerl happen because and by people who have an  
> itch to scratch. Seems like this one is right down your alley?
>
> I'd in principle be interested in doing this too but can't give any  
> promises as to when I might have time (unless I need it myself :)
>
> I'll try to provide a patch for this. I'll let you know how it goes..

That'd be awesome! Don't hesitate to let us know if you hit bumps.

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From stephan.rosecker at ish.de  Tue Mar 11 07:08:08 2008
From: stephan.rosecker at ish.de (stephan.rosecker)
Date: Tue, 11 Mar 2008 12:08:08 +0100
Subject: [Bioperl-l] time-consuming bp_load_seqdatabase.pl
Message-ID: <web-15341068@mail-be-01.mail01.ish.de>

Dear list,

I have started the "bp_load_seqdatabase.pl" script from the "bioperl-db-1.5.2_100"
package with the unigene "Hs.data" file. It runs on a 7-processor machine with 15GB RAM.
The DBMS is postgres on a similar machine.
The BioSQL core schema is v1.0.0.

The job has been running since Friday.

./bp_load_seqdatabase.pl --host foo --port 5435 --dbname bioseqdb --dbuser foo --dbpass bar --driver Pg --format ClusterIO::unigene ../ncbi/Hs.data

Is it normal that it takes so long?
What are your experiences?

best regards
stephan

From diriano at uni-potsdam.de  Tue Mar 11 07:55:54 2008
From: diriano at uni-potsdam.de (Diego Mauricio Riaño Pachón)
Date: Tue, 11 Mar 2008 12:55:54 +0100
Subject: [Bioperl-l] problem with SearchIO and writer
Message-ID: <47D6734A.5060103@uni-potsdam.de>

Dear all,

I have a small problem parsing a BLAST report with SearchIO and using 
TextResultWriter.

I have a large file with several BLAST results, I instantiate SearchIO as:

my $searchio = new Bio::SearchIO ('-format' => 'blast', '-file' => 
"$blastFile");

##then I extract each individual report as
while (my $result = $searchio->next_result){
   my $writertxt = Bio::SearchIO::Writer::TextResultWriter->new();
   my $blastresult=$writertxt->to_string($result);
   while (my $hit = $result->next_hit){
     print $hit->name."\n";
   }
}

From diriano at uni-potsdam.de  Tue Mar 11 09:31:35 2008
From: diriano at uni-potsdam.de (diriano at uni-potsdam.de)
Date: Tue, 11 Mar 2008 14:31:35 +0100
Subject: [Bioperl-l] problem with SearchIO and writer
Message-ID: <1205242295.47d689b7186ad@webmail.uni-potsdam.de>

Dear all,

I have a small problem parsing a BLAST report with SearchIO and using
TextResultWriter.

I have a large file with several BLAST results, I instantiate SearchIO as:

my $searchio = new Bio::SearchIO ('-format' => 'blast', '-file' =>
"$blastFile");

##then I extract each individual report as
while (my $result = $searchio->next_result){
  my $writertxt = Bio::SearchIO::Writer::TextResultWriter->new();
  my $blastresult=$writertxt->to_string($result);
  while (my $hit = $result->next_hit){
    print $hit->name."\n";
  }
}
-- 

Diego Mauricio Riaño-Pachón
Biologist
Institute of Biology and Biochemistry
University of Potsdam
Karl-Liebknecht-Str. 24-25
Haus 20
14476 Golm
Germany
Tel:0331/977-2809
http://www.geocities.com/dmrp.geo/

From diriano at uni-potsdam.de  Tue Mar 11 10:25:19 2008
From: diriano at uni-potsdam.de (diriano at uni-potsdam.de)
Date: Tue, 11 Mar 2008 15:25:19 +0100
Subject: [Bioperl-l] problem with SearchIO and writer
Message-ID: <1205245519.47d6964fcaa48@webmail.uni-potsdam.de>

Dear all,

Please excuse my previous e-mail, it was incomplete, here it is again:

I have a small problem parsing a BLAST report with SearchIO and using
TextResultWriter.

I have a large file with several BLAST results, I instantiate SearchIO as:

my $searchio = new Bio::SearchIO ('-format' => 'blast', '-file' =>
"$blastFile");

##then I extract each individual report as
while (my $result = $searchio->next_result){
  my $writertxt = Bio::SearchIO::Writer::TextResultWriter->new();
  my $blastresult=$writertxt->to_string($result);
  while (my $hit = $result->next_hit){
    print $hit->name."\n"; #I will do further processing of the HSPs
  }
}

But I do not get any output.  It works if I comment the lines referencing the
$writertxt.  The problem is that I need to extract the whole report to later
insert it ($blastresult) into a database. But I also need to process each hit
and hsp.

Any idea how can I accomplish this?  Any help will be greatly appreciated.

Have a nice day,

Diego
-- 

Diego Mauricio Riano Pachon
Biologist
Institute of Biology and Biochemistry
University of Potsdam
Karl-Liebknecht-Str. 24-25
Haus 20
14476 Golm
Germany
Tel:0331/977-2809
http://www.geocities.com/dmrp.geo/

From sac at bioperl.org  Tue Mar 11 15:04:48 2008
From: sac at bioperl.org (Steve Chervitz)
Date: Tue, 11 Mar 2008 12:04:48 -0700
Subject: [Bioperl-l] BioSQL V1.0.0 released
In-Reply-To: <7558F8C6-FE40-4BAE-BA6A-D5039B10F350@uiuc.edu>
References: <200803071309.25294.heikki@sanbi.ac.za>
	<7558F8C6-FE40-4BAE-BA6A-D5039B10F350@uiuc.edu>
Message-ID: <8f200b4c0803111204m2ca45782w34baa0499d690cb5@mail.gmail.com>

Ditto. Thanks for biting the bullet, Hilmar.

BTW, I put together a little compendium/review of various bioinformatics
data models a few months ago, where I mention BioSQL among others, but I
never really announced it:

http://biodatamodel.org/

I thought about wikifying it to get the community involved in maintaining
it, but haven't gotten around to it yet.

Feedback is welcome.

Cheers,
Steve

On Fri, Mar 7, 2008 at 6:22 AM, Chris Fields <cjfields at uiuc.edu> wrote:

> Same here.  Great news!
>
> chris
>
> On Mar 7, 2008, at 5:09 AM, Heikki Lehvaslaiho wrote:
>
> > BIOSQL V1.0.0 RELEASED
> > http://news.open-bio.org/archives/2008_03.html#000094
> >
> >
> > Congratulations, Hilmar!
> >
> >               -Heikki
> >
> > --
> > ______ _/      _/_____________________________________________________
> >      _/      _/
> >     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
> >    _/_/_/_/_/  Senior Scientist    skype: heikki_lehvaslaiho
> >   _/  _/  _/  SANBI, South African National Bioinformatics Institute
> >  _/  _/  _/  University of Western Cape, South Africa
> >     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
> > ___ _/_/_/_/_/________________________________________________________
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From cain.cshl at gmail.com  Tue Mar 11 15:29:41 2008
From: cain.cshl at gmail.com (Scott Cain)
Date: Tue, 11 Mar 2008 15:29:41 -0400
Subject: [Bioperl-l] Nightly build archives now available
In-Reply-To: <47D321A0.9010209@sendu.me.uk>
References: <BAB6978A-D50E-4C4A-9FC9-EAAD557B935F@uiuc.edu>
	<1204921049.6467.9.camel@frissell>
	<BCE20903-01C1-4AC3-B8F6-89AC68F6803E@uiuc.edu>
	<1204928753.6467.19.camel@frissell>  <47D321A0.9010209@sendu.me.uk>
Message-ID: <1205263781.6220.37.camel@frissell>

Hi Sendu,

Sorry about that; I diffed the version I had with what was in svn, but
apparently didn't look closely at the results.  Do you happen to know
the best way of reverting with svn?

After it gets reverted one way or the other, I agree that overriding
prompt to include another argument is a good way to go.  That way the
value of $accept can be passed to it and it just does the right thing,
regardless of when/where it is getting called.  I'll do that.
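
To make the idea concrete, here is a rough sketch of such an override. It is
not the actual Bioperl build code: Demo::Build is a stand-in base class (the
real one would be Module::Build, whose prompt() takes a message and a default
as far as I know), and the third $accept argument is the hypothetical extra
argument described above.

```perl
use strict;
use warnings;

# Stand-in for the real build class; it asks on STDIN like a normal prompt.
package Demo::Build;
sub new { my ($class) = @_; return bless {}, $class }
sub prompt {
    my ($self, $message, $default) = @_;
    print "$message [$default] ";
    my $answer = <STDIN>;
    $answer = '' unless defined $answer;
    chomp $answer;
    return length $answer ? $answer : $default;
}

# Subclass whose prompt() takes one more argument. When $accept is true
# the default is returned immediately and STDIN is never read.
package Demo::Build::Quiet;
our @ISA = ('Demo::Build');
sub prompt {
    my ($self, $message, $default, $accept) = @_;
    return $default if $accept;
    return $self->SUPER::prompt($message, $default);
}

package main;
my $accept = 1;    # would come from "--accept 1" on the command line
my $build  = Demo::Build::Quiet->new;
my $choice = $build->prompt('Install scripts?', 'a', $accept);
print "Chose: $choice\n";
```

Because every question goes through prompt(), questions added later would be
covered as well, which is the point Sendu raised for cron jobs.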

Scott

On Sat, 2008-03-08 at 23:30 +0000, Sendu Bala wrote:
> Scott Cain wrote:
> > OK, I added my 'accept the defaults' option.  Use it like this:
> > 
> >   perl Build.PL --accept 1
> 
> Thanks for that Scott, but can you revert and have another go at that 
> commit, because you ended up wiping out the recent commits by Chris and 
> myself.
> 
> Also, rather than individually alter the Bioperl-specific methods like 
> choose_scripts(), is there perhaps a cleaner way to catch every prompt, 
> perhaps by overriding prompt() itself? Other questions may get added in 
> the future, and some existing questions aren't immediately obvious, so 
> it would be nice to be sure an automated process like a cron job /never/ 
> gets asked a question.
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   cain.cshl at gmail.com
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory


From hlapp at gmx.net  Tue Mar 11 17:34:30 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 11 Mar 2008 17:34:30 -0400
Subject: [Bioperl-l] time-consuming bp_load_seqdatabase.pl
In-Reply-To: <web-15341068@mail-be-01.mail01.ish.de>
References: <web-15341068@mail-be-01.mail01.ish.de>
Message-ID: <B4761ACF-F3D2-42A4-88D9-6C9790C4BAFB@gmx.net>

It won't be fast, as it will create roughly 6 million bioentries in your  
database. However, a job that has been running since Friday sounds on the high end.

The first step I recommend doing when running into this kind of  
situation is checking the CPU load that the script generates,  
compared to the load generated by the database server. If the  
script's CPU load is significantly less than ~10% then it is likely  
that your database is too slow.

There are various possible reasons why it may be too slow, ranging  
from limited resources, to grossly suboptimal configuration. If your  
database is running on the same 15GB server then resources should not  
be an issue (assuming that you don't have a totally antiquated CPU  
there). You might still want to check the PostgreSQL config file,  
though. What I would suspect though is that you didn't VACUUM the  
database before and/or during the load. That will make the indexes  
used for lookup increasingly slow as a large amount of data accumulates.
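
If you would rather vacuum from Perl than from psql, something along these
lines might do. It is a sketch under assumptions: vacuum_tables is a made-up
helper, the table list is only my guess at which BioSQL tables grow most
during a load, and the handle is assumed to be a DBI handle opened with
AutoCommit on, since PostgreSQL will not run VACUUM inside a transaction
block.

```perl
use strict;
use warnings;

# Issue VACUUM ANALYZE for each named table through a DBI-style handle.
# Returns the list of tables that were processed.
sub vacuum_tables {
    my ($dbh, @tables) = @_;
    my @done;
    for my $table (@tables) {
        # Table names are interpolated into SQL, so accept plain words only.
        die "Suspicious table name '$table'" unless $table =~ /\A\w+\z/;
        $dbh->do("VACUUM ANALYZE $table");
        push @done, $table;
    }
    return @done;
}

# With a real connection this would look roughly like (not run here):
#   use DBI;
#   my $dbh = DBI->connect('dbi:Pg:dbname=bioseqdb;host=foo;port=5435',
#                          'foo', 'bar',
#                          { AutoCommit => 1, RaiseError => 1 });
#   vacuum_tables($dbh, qw(bioentry dbxref term));
```

Running this from a second session every so often while the load is going
should have the same effect as typing VACUUM ANALYZE by hand.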

Does this ring a bell?

	-hilmar

On Mar 11, 2008, at 7:08 AM, stephan.rosecker wrote:

> Dear list,
>
> I have started the "bp_load_seqdatabase.pl" script from the  
> "bioperl-db-1.5.2_100" package with the unigene
> "Hs.data". It runs on a 7 processot machine with 15GB ram. The DBMS  
> is postgres on a similar machine.
> BioSQL core schema is v1.0.0..
>
> The job runs since friday.
>
> ./bp_load_seqdatabase.pl --host foo --port 5435 --dbname bioseqdb -- 
> dbuser foo --dbpass bar --driver Pg --format ClusterIO::unigene ../ 
> ncbi/Hs.data
>
> Is it normal that it takes so long?
> What are your experiences?
>
> best regards
> stephan
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From vuhlhorn at ramapo.edu  Wed Mar 12 14:51:50 2008
From: vuhlhorn at ramapo.edu (Victoria Lyn Uhlhorn)
Date: Wed, 12 Mar 2008 14:51:50 -0400 (EDT)
Subject: [Bioperl-l] HOWTO:Trees module
Message-ID: <20080312145150.ASS93757@msg-1.mail.ramapo.edu>

The following script:

#!/usr/bin/perl -w

use CGI ':standard';
use Bio::Perl;
use Bio::Align::ProteinStatistics;
use Bio::Tree::DistanceFactory;
use Bio::TreeIO;

print header;

print start_html(-bgcolor=>"pink", -title=>('Phylogenetic Tree'),
                 -style=>{-src=>$style}, -class=>Ltitle), p(), 'Tree';

print start_form, hr;

my $alnio= Bio::AlignIO->new(-file => '/Users/glitterchix4u/Sites/CGI-bin/HepatitisSerineProt.clustalw',
                             -format => 'clustalw');
my $profactory = Bio::Tree::DistanceFactory->new(-method => 'NJ');
my $stats = Bio::Align::ProteinStatistics->new;
my $treeout = Bio::TreeIO->new(-format => 'newick');

my $tree;
while(my $aln = $alnio->next_aln)
{
	my $mat = $stats->distance(-method => 'Kimura',
	                           -align => $aln);
	$tree = $profactory->make_tree($mat);
	$treeout->write_tree($tree);
}

#$treeout->print_tree($tree);
print "Tree is: ", $tree->size;

print end_form;
print end_html;

How do I print the tree?  I'm having a hard time printing the tree out.

From bix at sendu.me.uk  Wed Mar 12 19:20:01 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 12 Mar 2008 23:20:01 +0000
Subject: [Bioperl-l] HOWTO:Trees module
In-Reply-To: <20080312145150.ASS93757@msg-1.mail.ramapo.edu>
References: <20080312145150.ASS93757@msg-1.mail.ramapo.edu>
Message-ID: <47D86521.1010705@sendu.me.uk>

Victoria Lyn Uhlhorn wrote:
> my $treeout = Bio::TreeIO->new(-format => 'newick');

> 	$treeout->write_tree($tree);

> How do I print the tree?  I'm having a hard time printing the tree out.

Your TreeIO will write its trees to the file or filehandle you give it. 
But you haven't given it one. Give it one and write_tree() will then 
cause the tree to be 'printed' there.

If you want the trees stored in a string so you can print() them, there 
are ways to open a filehandle onto a scalar variable.

From hlapp at gmx.net  Thu Mar 13 18:51:13 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 13 Mar 2008 18:51:13 -0400
Subject: [Bioperl-l] [BioSQL-l] Loading sequences with novel NCBI taxon
	id
In-Reply-To: <320fb6e00803130806w46148bacm54c3ead9a50b038f@mail.gmail.com>
References: <320fb6e00803130806w46148bacm54c3ead9a50b038f@mail.gmail.com>
Message-ID: <32EB5B0C-4CC8-4C33-9F41-5D4465B6AC48@gmx.net>

(this is more of a bioperl question than a biosql one)

The load_ncbi_taxonomy.pl script is designed to update the taxon  
tables in a non-disruptive way, and if there weren't many changes  
shouldn't actually take that long (except that recalculating the  
nested set values may take a couple of minutes).

Bioperl-db will store the taxon information it finds in the  
Bio::Species object if it can't locate the taxon by lookup, and will  
not raise an error. The problem with this is that it relies on the  
Bio::SeqIO parser to have gotten the species and lineage information  
correct, which is sometimes a wrong assumption for exotic species.  
Most often the error will not manifest itself at the time of storing  
the erroneously parsed information, but when it is re-retrieved and  
used to populate a Bio::Species object.

For the SymAtlas project we had this situation (new species in  
sequence updates that the last NCBI taxonomy update hadn't yet  
brought in) quite regularly. I wrote a SQL script that would fix those  
'haphazard' additions such that load_ncbi_taxonomy would update them  
to their correct values come the next NCBI taxonomy update. I can  
send you the script (it would be for the Oracle version), but I'm not  
sure this is a widely viable strategy.

	-hilmar

On Mar 13, 2008, at 11:06 AM, Peter wrote:

> Dear list,
>
> One of the unresolved issues with Biopython's BioSQL interface is
> dealing with the NCBI taxon ID when loading sequences into the
> database.
>
> As I understand it, ideally before loading any sequences, the user
> will have loaded in the entire NCBI taxonomy using the
> load_ncbi_taxonomy.pl script, as I described here:
> http://biopython.org/wiki/BioSQL#NCBI_Taxonomy
>
> When a new sequence is added to the database with a known taxon id,
> there is no problem.  But what happens if it's a recently sequenced organism
> which isn't defined yet in the BioSQL taxonomy tables?  Could/should
> the user re-run load_ncbi_taxonomy.pl, and then load in their new
> sequence?
>
> Right now in Biopython, due to what appears to have been intended as a
> short-term hack, we simply don't record the taxon id at all (!), and I
> would like to fix this (bug 2422).
> http://bugzilla.open-bio.org/show_bug.cgi?id=2422
>
> How do BioPerl et al deal with this issue?  Do they try and update the
> taxonomy tables using the available information in the new record's
> annotation (i.e. the new taxon id and the species name)?  Do they
> lookup the NCBI taxonomy definition via the internet?  Do they throw
> an error and halt?
>
> Thanks,
>
> Peter
> (Biopython)
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biosql-l

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From hlapp at gmx.net  Thu Mar 13 19:41:43 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 13 Mar 2008 19:41:43 -0400
Subject: [Bioperl-l] [BioSQL-l] Loading sequences with novel NCBI taxon
	id
In-Reply-To: <320fb6e00803131613o20eae2b7y325814ef26d2738f@mail.gmail.com>
References: <320fb6e00803130806w46148bacm54c3ead9a50b038f@mail.gmail.com>
	<32EB5B0C-4CC8-4C33-9F41-5D4465B6AC48@gmx.net>
	<320fb6e00803131613o20eae2b7y325814ef26d2738f@mail.gmail.com>
Message-ID: <CEA4F4E7-A66B-4C62-AE32-511E177BC485@gmx.net>


On Mar 13, 2008, at 7:13 PM, Peter wrote:

> On Thu, Mar 13, 2008 at 10:51 PM, Hilmar Lapp <hlapp at gmx.net> wrote:
>> [...]
>>  The load_ncbi_taxonomy.pl script is designed to update the taxon
>>  tables in a non-disruptive way, and if there weren't many changes
>>  shouldn't actually take that long (except that recalculating the
>>  nested set values may take a couple of minutes).
>
> Do you think when faced with a novel taxon id, Biopython/BioPerl/...
> could write some minimal taxonomy entry (without any guess work based
> on the species name), in order to record the sequence's taxon

This is what Bioperl-db does. There isn't any guesswork. If  
Bio::Species has lineage information it will also insert the lineage  
information, though.

> - and then running an improved load_ncbi_taxonomy.pl at a later  
> date would
> sort out the proper taxonomy?

If I remember correctly, the script makes (and hence expects) the  
primary key and the NCBI taxonomy ID to be identical. If your loading  
procedure can achieve that already then load_ncbi_taxonomy.pl should  
pick them up and fix them. You can try that by loading the taxonomy  
through the script, then arbitrarily choose a taxon, create a stub  
bioentry for it and set its taxon_id foreign key to the chosen  
taxon,  change its taxon_name.name to some bogus value (for the  
'scientific name' class, for example) (and feel free to change the  
left_id and right_id values in taxon too), and rerun the script. It  
should fix the change you made, and your bioentry should still point  
to the same taxon (because its primary key did not change, and did  
not get deleted either; otherwise the bioentry would now have a null  
value in the foreign key).

The Bioperl-db way of storing things does not give control over  
primary key assignment to Bioperl-db, so the database will assign it.

> [...]
>>  For the SymAtlas project we had this situation (new species in
>>  sequence updates that the last NCBI taxonomy update hadn't yet
>>  brought in) quite regularly. I wrote a SQL script would fix those
>>  'haphazard' additions such that load_ncbi_taxonomy would update them
>>  to their correct values come the next NCBI taxonomy update. I can
>>  send you the script (it would be for the Oracle version), but I'm  
>> not
>>  sure this is a widely viable strategy.
>
> So this wasn't integrated with load_ncbi_taxonomy.pl at all?

No, but now that you say it I don't see any reason why I couldn't.  
Maybe that's just what I should do.

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From biopython at maubp.freeserve.co.uk  Thu Mar 13 19:13:32 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 13 Mar 2008 23:13:32 +0000
Subject: [Bioperl-l] [BioSQL-l] Loading sequences with novel NCBI taxon
	id
In-Reply-To: <32EB5B0C-4CC8-4C33-9F41-5D4465B6AC48@gmx.net>
References: <320fb6e00803130806w46148bacm54c3ead9a50b038f@mail.gmail.com>
	<32EB5B0C-4CC8-4C33-9F41-5D4465B6AC48@gmx.net>
Message-ID: <320fb6e00803131613o20eae2b7y325814ef26d2738f@mail.gmail.com>

On Thu, Mar 13, 2008 at 10:51 PM, Hilmar Lapp <hlapp at gmx.net> wrote:
> (this is more of a bioperl question than a biosql one)

Well, yes and no.  And I'm not subscribed to the Bioperl list, nor the
BioJava one, nor the BioRuby one.

>  The load_ncbi_taxonomy.pl script is designed to update the taxon
>  tables in a non-disruptive way, and if there weren't many changes
>  shouldn't actually take that long (except that recalculating the
>  nested set values may take a couple of minutes).

Do you think when faced with a novel taxon id, Biopython/BioPerl/...
could write some minimal taxonomy entry (without any guess work based
on the species name), in order to record the sequence's taxon - and
then running an improved load_ncbi_taxonomy.pl at a later date would
sort out the proper taxonomy?

>  Bioperl-db will store the taxon information it finds in the
>  Bio::Species object if it can't locate the taxon by lookup, and will
>  not raise an error. The problem with this is that it relies on the
>  Bio::SeqIO parser to have gotten the species and lineage information
>  correct, which is sometimes a wrong assumption for exotic species.
>  Most often the error will not manifest itself at the time of storing
>  the erroneously parsed information, but when it is re-retrieved and
>  used to populate a Bio::Species object.

This is what I would like to avoid with Biopython.

>  For the SymAtlas project we had this situation (new species in
>  sequence updates that the last NCBI taxonomy update hadn't yet
>  brought in) quite regularly. I wrote a SQL script would fix those
>  'haphazard' additions such that load_ncbi_taxonomy would update them
>  to their correct values come the next NCBI taxonomy update. I can
>  send you the script (it would be for the Oracle version), but I'm not
>  sure this is a widely viable strategy.

So this wasn't integrated with load_ncbi_taxonomy.pl at all?

Peter

From hlapp at gmx.net  Fri Mar 14 00:00:40 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Fri, 14 Mar 2008 00:00:40 -0400
Subject: [Bioperl-l] bioperl basics
In-Reply-To: <20080313.195825.6855.0@webmail20.vgs.untd.com>
References: <20080313.195825.6855.0@webmail20.vgs.untd.com>
Message-ID: <0A104B1F-315F-418C-A6DA-84FF04CC438C@gmx.net>

John - you want to send this to the BioPerl mailing list at
bioperl-l at lists.open-bio.org.

Your problem really is a Perl problem though, not BioPerl. The most  
likely cause is that you don't have the Cache::FileCache module  
installed, so that's what I would do.

The answer to your question for how to change @INC is using -I on the  
command line, 'use lib' in your script, or set the PERL5LIB  
environment variable.
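
A small sketch of the 'use lib' variant, in case it helps. The directory is a
made-up example; replace it with wherever Cache/FileCache.pm actually ended up
on your machine.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical location of the privately installed modules.
# 'use lib' puts this directory at the front of @INC at compile time,
# so any later 'use Some::Module' looks there first.
use lib '/home/john/perl5/lib';

# Equivalent alternatives that need no change to the script:
#   perl -I/home/john/perl5/lib myscript.pl
#   export PERL5LIB=/home/john/perl5/lib     (in your shell startup file)

print "Perl will search these directories:\n";
print "  $_\n" for @INC;
```

Note that @INC is a Perl array rather than an environment variable, which is
why it has to be changed through one of these three routes.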

	-hilmar

On Mar 14, 2008, at 1:58 AM, mrphysh at juno.com wrote:

> I am a molecular biologist studying bioinformatics from a Perl  
> background and making progress.  I am realizing that without  
> tapping into the existing infrastructure, I will be writing code  
> for ever.  Bioperl is the path for me.  I am moving forward.
>
> the error I encounter is
>
> can't locate Cache/FileCache in @INC (@INC contains /etc/perl/
> /usr/local/lib/perl/5.8.8 .....)    and so forth.
>
> I found the files in a home directory.  I must have told the  
> install to put them there...?
>
>
> anyway:  How do I edit this environmental variable..... @INC.  I  
> cannot find anything in my book.
>
> thanks
> john brigham
>
>
> I will be writing code for years and need to tap into the
>
>
>
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biosql-l

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From markjschreiber at gmail.com  Fri Mar 14 09:48:38 2008
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Fri, 14 Mar 2008 21:48:38 +0800
Subject: [Bioperl-l] [BioSQL-l] Loading sequences with novel NCBI taxon
	id
In-Reply-To: <CEA4F4E7-A66B-4C62-AE32-511E177BC485@gmx.net>
References: <320fb6e00803130806w46148bacm54c3ead9a50b038f@mail.gmail.com>
	<32EB5B0C-4CC8-4C33-9F41-5D4465B6AC48@gmx.net>
	<320fb6e00803131613o20eae2b7y325814ef26d2738f@mail.gmail.com>
	<CEA4F4E7-A66B-4C62-AE32-511E177BC485@gmx.net>
Message-ID: <93b45ca50803140648s5098a7d0sec621f448ef03040@mail.gmail.com>

From memory, BioJava will add it if it is not already in there. If the
taxid can be found then the system connects you with whatever is in
that taxid; it doesn't overwrite it.

This has two curious side effects. Because the details associated with
a taxid sometimes change (eg common name changes a lot) you can get
connected to an outdated version (if your record is newer than your
NCBI taxonomy) or you can get connected with a version that is newer
than your record which means when you round-trip you don't get
complete identity.
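
To make that behaviour concrete, here is a toy find-or-add sketch. It is
not BioJava's code: the hash stands in for the taxon table, and the
function name, the taxid 999999 and the stored names are all invented for
illustration.

```perl
use strict;
use warnings;

# Stand-in for the taxon table: taxid => stored details (toy data).
my %taxon_table = (
    9606 => { scientific_name => 'Homo sapiens', common_name => 'man' },
);

# Return the stored entry for a taxid. The details carried by the record
# are used only when the taxid is missing; existing rows are never overwritten.
sub find_or_add_taxon {
    my ($taxid, $details_from_record) = @_;
    $taxon_table{$taxid} = { %$details_from_record }
        unless exists $taxon_table{$taxid};
    return $taxon_table{$taxid};
}

# A record whose common name differs from what the table already holds:
my $linked = find_or_add_taxon(9606,
    { scientific_name => 'Homo sapiens', common_name => 'human' });
# The sequence is linked to the table's version, so writing it back out
# would not reproduce the common name that was in the record.
print "stored common name: $linked->{common_name}\n";

# A record with a taxid the table has never seen gets a new row.
find_or_add_taxon(999999, { scientific_name => 'Example species' });
print "rows in table: ", scalar(keys %taxon_table), "\n";
```

The first call shows the round-trip mismatch: the record and the table
disagree, and the table wins in either direction.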

For compatibility across the projects some kind of consensus would be good.

- Mark

On Fri, Mar 14, 2008 at 7:41 AM, Hilmar Lapp <hlapp at gmx.net> wrote:
>
>
> On Mar 13, 2008, at 7:13 PM, Peter wrote:
>
> > On Thu, Mar 13, 2008 at 10:51 PM, Hilmar Lapp <hlapp at gmx.net> wrote:
> >> [...]
>
> >>  The load_ncbi_taxonomy.pl script is designed to update the taxon
> >>  tables in a non-disruptive way, and if there weren't many changes
> >>  shouldn't actually take that long (except that recalculating the
> >>  nested set values may take a couple of minutes).
> >
> > Do you think when faced with a novel taxon id, Biopython/BioPerl/...
> > could write some minimal taxonomy entry (without any guess work based
> > on the species name), in order to record the sequence's taxon
>
> This is what Bioperl-db does. There isn't any guesswork. If
> Bio::Species has lineage information it will also insert the lineage
> information, though.
>
>
> > - and then running an improved load_ncbi_taxonomy.pl at a later
> > date would
> > sort out the proper taxonomy?
>
> If I remember correctly, the script makes (and hence expects) the
> primary key and the NCBI taxonomy ID to be identical. If your loading
> procedure can achieve that already then load_ncbi_taxonomy.pl should
> pick them up and fix them. You can try that by loading the taxonomy
> through the script, then arbitrarily choose a taxon, create a stub
> bioentry for it and set its taxon_id foreign key to the chosen
> taxon,  change its taxon_name.name to some bogus value (for the
> 'scientific name' class, for example) (and feel free to change the
> left_id and right_id values in taxon too), and rerun the script. It
> should fix the change you made, and your bioentry should still point
> to the same taxon (because its primary key did not change, and did
> not get deleted either; otherwise the bioentry would now have a null
> value in the foreign key).
>
> The Bioperl-db way of storing things does not give control over
> primary key assignment to Bioperl-db, so the database will assign it.
>
> > [...]
>
> >>  For the SymAtlas project we had this situation (new species in
> >>  sequence updates that the last NCBI taxonomy update hadn't yet
> >>  brought in) quite regularly. I wrote a SQL script that would fix those
> >>  'haphazard' additions such that load_ncbi_taxonomy would update them
> >>  to their correct values come the next NCBI taxonomy update. I can
> >>  send you the script (it would be for the Oracle version), but I'm
> >> not
> >>  sure this is a widely viable strategy.
> >
> > So this wasn't integrated with load_ncbi_taxonomy.pl at all?
>
> No, but now that you say it I don't see any reason why I couldn't.
> Maybe that's just what I should do.
>
>        -hilmar
>
> --
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================
>
>
>
> _______________________________________________
>
>
>
> BioSQL-l mailing list
> BioSQL-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biosql-l
>

From cjfields at uiuc.edu  Fri Mar 14 10:31:09 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 14 Mar 2008 09:31:09 -0500
Subject: [Bioperl-l] [BioSQL-l] Loading sequences with novel NCBI taxon
	id
In-Reply-To: <93b45ca50803140648s5098a7d0sec621f448ef03040@mail.gmail.com>
References: <320fb6e00803130806w46148bacm54c3ead9a50b038f@mail.gmail.com>
	<32EB5B0C-4CC8-4C33-9F41-5D4465B6AC48@gmx.net>
	<320fb6e00803131613o20eae2b7y325814ef26d2738f@mail.gmail.com>
	<CEA4F4E7-A66B-4C62-AE32-511E177BC485@gmx.net>
	<93b45ca50803140648s5098a7d0sec621f448ef03040@mail.gmail.com>
Message-ID: <CE3675B2-2AFD-46AA-A348-16C9FEA51E0E@uiuc.edu>

The counter to that situation (using new sequences with old tax
info) would be to update the NCBI taxonomy regularly, particularly
just before adding new sequences.  Hilmar mentioned that once tax is
loaded it doesn't take as long to update, so you could set up a cron
job to update regularly.

I remember someone mentioning weekly or monthly updates on the list
quite a while ago, but I'm unsure how often NCBI updates tax
information (i.e. with every release, monthly, weekly, etc).  I can
see instances popping up where you used an up-to-date taxonomy but
a new sequence contains a tax ID not present in it.  I think bioperl-db
handles these but I'm not sure what the other Bio* projects do.

chris

On Mar 14, 2008, at 8:48 AM, Mark Schreiber wrote:

> From memory BioJava will add it if it is not already in there. If the
> taxid can be found then the system connects you with whatever is in
> that taxid, it doesn't overwrite it.
>
> This has two curious side effects. Because the details associated with
> a taxid sometimes change (eg common name changes a lot) you can get
> connected to an outdated version (if your record is newer than your
> NCBI taxonomy) or you can get connected with a version that is newer
> than your record which means when you round-trip you don't get
> complete identity.
>
> For compatibility across the projects some kind of consensus would  
> be good.
>
> - Mark
> On Fri, Mar 14, 2008 at 7:41 AM, Hilmar Lapp <hlapp at gmx.net> wrote:
>>
>>
>> On Mar 13, 2008, at 7:13 PM, Peter wrote:
>>
>>> On Thu, Mar 13, 2008 at 10:51 PM, Hilmar Lapp <hlapp at gmx.net> wrote:
>>>> [...]
>>
>>>> The load_ncbi_taxonomy.pl script is designed to update the taxon
>>>> tables in a non-disruptive way, and if there weren't many changes
>>>> shouldn't actually take that long (except that recalculating the
>>>> nested set values may take a couple of minutes).
>>>
>>> Do you think when faced with a novel taxon id, Biopython/BioPerl/...
>>> could write some minimal taxonomy entry (without any guess work  
>>> based
>>> on the species name), in order to record the sequence's taxon
>>
>> This is what Bioperl-db does. There isn't any guesswork. If
>> Bio::Species has lineage information it will also insert the lineage
>> information, though.
>>
>>
>>> - and then running an improved load_ncbi_taxonomy.pl at a later
>>> date would
>>> sort out the proper taxonomy?
>>
>> If I remember correctly, the script makes (and hence expects) the
>> primary key and the NCBI taxonomy ID to be identical. If your loading
>> procedure can achieve that already then load_ncbi_taxonomy.pl should
>> pick them up and fix them. You can try that by loading the taxonomy
>> through the script, then arbitrarily choose a taxon, create a stub
>> bioentry for it and set its taxon_id foreign key to the chosen
>> taxon,  change its taxon_name.name to some bogus value (for the
>> 'scientific name' class, for example) (and feel free to change the
>> left_id and right_id values in taxon too), and rerun the script. It
>> should fix the change you made, and your bioentry should still point
>> to the same taxon (because its primary key did not change, and did
>> not get deleted either; otherwise the bioentry would now have a null
>> value in the foreign key).
>>
>> The Bioperl-db way of storing things does not give control over
>> primary key assignment to Bioperl-db, so the database will assign it.
>>
>>> [...]
>>
>>>> For the SymAtlas project we had this situation (new species in
>>>> sequence updates that the last NCBI taxonomy update hadn't yet
>>>> brought in) quite regularly. I wrote a SQL script that would fix those
>>>> 'haphazard' additions such that load_ncbi_taxonomy would update  
>>>> them
>>>> to their correct values come the next NCBI taxonomy update. I can
>>>> send you the script (it would be for the Oracle version), but I'm
>>>> not
>>>> sure this is a widely viable strategy.
>>>
>>> So this wasn't integrated with load_ncbi_taxonomy.pl at all?
>>
>> No, but now that you say it I don't see any reason why I couldn't.
>> Maybe that's just what I should do.
>>
>>       -hilmar
>>
>> --
>> ===========================================================
>> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
>> ===========================================================
>>
>>
>>
>> _______________________________________________
>>
>>
>>
>> BioSQL-l mailing list
>> BioSQL-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biosql-l
>>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From markjschreiber at gmail.com  Fri Mar 14 20:56:37 2008
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Sat, 15 Mar 2008 08:56:37 +0800
Subject: [Bioperl-l] [BioSQL-l] Loading sequences with novel NCBI taxon
	id
In-Reply-To: <CE3675B2-2AFD-46AA-A348-16C9FEA51E0E@uiuc.edu>
References: <320fb6e00803130806w46148bacm54c3ead9a50b038f@mail.gmail.com>
	<32EB5B0C-4CC8-4C33-9F41-5D4465B6AC48@gmx.net>
	<320fb6e00803131613o20eae2b7y325814ef26d2738f@mail.gmail.com>
	<CEA4F4E7-A66B-4C62-AE32-511E177BC485@gmx.net>
	<93b45ca50803140648s5098a7d0sec621f448ef03040@mail.gmail.com>
	<CE3675B2-2AFD-46AA-A348-16C9FEA51E0E@uiuc.edu>
Message-ID: <93b45ca50803141756m3d7f022cnb57bd39f37270682@mail.gmail.com>

I agree. A regular update would be best.

Of course if your BioSQL db is limited to one or a few organisms you can
just keep a fragment of the db.

- Mark

On Fri, Mar 14, 2008 at 10:31 PM, Chris Fields <cjfields at uiuc.edu> wrote:

> The counter to that perspective (using new sequences with old tax
> info) would be to regularly update NCBI taxonomy, particularly in
> circumstances prior to adding new sequences.  Hilmar mentioned that
> once tax is loaded it doesn't take as long to update, so you could set
> up a cron job to update regularly.
>
> I remember someone mentioning weekly or monthly updates on the list
> quite a while ago, but I'm unsure how often NCBI updates tax
> information (i.e. with every release, monthly, weekly, etc).  I can
> see instances popping up where you used an up-to-date taxonomy but
> a new sequence contains a tax ID not present.  I think bioperl-db
> handles these but I'm not sure what other Bio* do.
>
> chris
>
> On Mar 14, 2008, at 8:48 AM, Mark Schreiber wrote:
>
> > From memory BioJava will add it if it is not already in there. If the
> > taxid can be found then the system connects you with whatever is in
> > that taxid, it doesn't overwrite it.
> >
> > This has two curious side effects. Because the details associated with
> > a taxid sometimes change (eg common name changes a lot) you can get
> > connected to an outdated version (if your record is newer than your
> > NCBI taxonomy) or you can get connected with a version that is newer
> > than your record which means when you round-trip you don't get
> > complete identity.
> >
> > For compatibility across the projects some kind of consensus would
> > be good.
> >
> > - Mark
> > On Fri, Mar 14, 2008 at 7:41 AM, Hilmar Lapp <hlapp at gmx.net> wrote:
> >>
> >>
> >> On Mar 13, 2008, at 7:13 PM, Peter wrote:
> >>
> >>> On Thu, Mar 13, 2008 at 10:51 PM, Hilmar Lapp <hlapp at gmx.net> wrote:
> >>>> [...]
> >>
> >>>> The load_ncbi_taxonomy.pl script is designed to update the taxon
> >>>> tables in a non-disruptive way, and if there weren't many changes
> >>>> shouldn't actually take that long (except that recalculating the
> >>>> nested set values may take a couple of minutes).
> >>>
> >>> Do you think when faced with a novel taxon id, Biopython/BioPerl/...
> >>> could write some minimal taxonomy entry (without any guess work
> >>> based
> >>> on the species name), in order to record the sequence's taxon
> >>
> >> This is what Bioperl-db does. There isn't any guesswork. If
> >> Bio::Species has lineage information it will also insert the lineage
> >> information, though.
> >>
> >>
> >>> - and then running an improved load_ncbi_taxonomy.pl at a later
> >>> date would
> >>> sort out the proper taxonomy?
> >>
> >> If I remember correctly, the script makes (and hence expects) the
> >> primary key and the NCBI taxonomy ID to be identical. If your loading
> >> procedure can achieve that already then load_ncbi_taxonomy.pl should
> >> pick them up and fix them. You can try that by loading the taxonomy
> >> through the script, then arbitrarily choose a taxon, create a stub
> >> bioentry for it and set its taxon_id foreign key to the chosen
> >> taxon,  change its taxon_name.name to some bogus value (for the
> >> 'scientific name' class, for example) (and feel free to change the
> >> left_id and right_id values in taxon too), and rerun the script. It
> >> should fix the change you made, and your bioentry should still point
> >> to the same taxon (because its primary key did not change, and did
> >> not get deleted either; otherwise the bioentry would now have a null
> >> value in the foreign key).
> >>
> >> The Bioperl-db way of storing things does not give control over
> >> primary key assignment to Bioperl-db, so the database will assign it.
> >>
> >>> [...]
> >>
> >>>> For the SymAtlas project we had this situation (new species in
> >>>> sequence updates that the last NCBI taxonomy update hadn't yet
> >>>> brought in) quite regularly. I wrote a SQL script that would fix those
> >>>> 'haphazard' additions such that load_ncbi_taxonomy would update
> >>>> them
> >>>> to their correct values come the next NCBI taxonomy update. I can
> >>>> send you the script (it would be for the Oracle version), but I'm
> >>>> not
> >>>> sure this is a widely viable strategy.
> >>>
> >>> So this wasn't integrated with load_ncbi_taxonomy.pl at all?
> >>
> >> No, but now that you say it I don't see any reason why I couldn't.
> >> Maybe that's just what I should do.
> >>
> >>       -hilmar
> >>
> >> --
> >> ===========================================================
> >> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> >> ===========================================================
> >>
> >>
> >>
> >> _______________________________________________
> >>
> >>
> >>
> >> BioSQL-l mailing list
> >> BioSQL-l at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/biosql-l
> >>
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>
>

From charles-listes+bioperl at plessy.org  Mon Mar 17 00:13:11 2008
From: charles-listes+bioperl at plessy.org (Charles Plessy)
Date: Mon, 17 Mar 2008 13:13:11 +0900
Subject: [Bioperl-l] Are all recommended modules equally important ?
Message-ID: <20080317041311.GA3784@kunpuu.plessy.org>

Dear Bioperl developers,

In the Debian Project, we distribute packages for Bioperl and need to
express their dependencies on other Perl modules at the "Depends",
"Recommends" and "Suggests" levels. For the moment, everything that is
listed in the "recommends" hash of Build.PL is "Recommended" by our
Debian package. This means that these modules are installed by default
when installing Bioperl, but that users can force their removal if needed.

Being "Recommended" also means in Debian that if the recommended module
is not available, then the Debian bioperl package will not meet our
internal quality criteria for being part of our stable release.
Therefore I would like to know whether you think that some of the modules
recommended by Bioperl through the "recommends" hash of Build.PL are less
important than others, i.e. that we can just "Suggest" them in our
dependency system. "Suggested" packages are not installed by default.

The complete definition of the meaning of "Depends", "Recommends" and
"Suggests" for Debian packages can be found in the section 7.2 of the
Debian policy:

http://www.debian.org/doc/debian-policy/ch-relationships.html

Debian distributes versions 1.4 and 1.5.2 of Bioperl, and is considering
using 1.5.2 in its next stable release. We welcome your comments on this
as well.

The Debian package for Bioperl 1.4: http://packages.debian.org/lenny/bioperl
and for Bioperl 1.5.2: http://packages.debian.org/sid/bioperl

(A copy of this email has been sent to the mailing list of the
Debian-Med project).

Have a nice day,

-- 
Charles Plessy
http://charles.plessy.org
Wakō, Saitama, Japan

From David.Messina at sbc.su.se  Mon Mar 17 11:38:28 2008
From: David.Messina at sbc.su.se (Dave Messina)
Date: Mon, 17 Mar 2008 16:38:28 +0100
Subject: [Bioperl-l] Are all recommended modules equally important ?
In-Reply-To: <20080317041311.GA3784@kunpuu.plessy.org>
References: <20080317041311.GA3784@kunpuu.plessy.org>
Message-ID: <628aabb70803170838i53a49aacuf507880d518326d@mail.gmail.com>

Hi Charles,

Thanks for your note.

All of the BioPerl 'recommended' modules involve optional functionality, so
I would think all of them would map to 'suggested' under Debian so they
won't be installed by default.

For everyone else, this is the list of recommended modules he's talking
about:
Ace
Class::AutoClass
Clone
Convert::Binary::C
Data::Stag::XMLWriter
GD
GD::SVG
Graph
HTML::Entities
HTML::Parser
HTTP::Request::Common
LWP::UserAgent
PostScript::TextBlock
Set::Scalar
SOAP::Lite
Spreadsheet::ParseExcel
Storable
SVG
SVG::Graph
Text::Shellwords
URI::Escape
XML::DOM::XPath
XML::Parser
XML::Parser::PerlSAX
XML::SAX
XML::SAX::Writer
XML::Twig
XML::Writer


> Debian distributes versions 1.4 and 1.5.2 of Bioperl, but considers
> using 1.5.2 in its next stable release. We welcome your comments on this
> as well.
>

I think the consensus here would be that 1.5.2 is the appropriate version of
Bioperl to use in the next stable release of Debian. Although we've started
to work toward Bioperl 1.6, that release will be at least a few months off,
and 1.4, while technically our most recent 'stable' release, is waaay out of
date.


Dave

From mrphysh at juno.com  Mon Mar 17 18:27:21 2008
From: mrphysh at juno.com (mrphysh at juno.com)
Date: Mon, 17 Mar 2008 22:27:21 GMT
Subject: [Bioperl-l] bioperl email list
Message-ID: <20080317.162721.27257.1@webmail19.vgs.untd.com>

Hello bioperl people.

I am a Perl programmer/molecular biologist/nice guy.  I am wandering around within the bioinformatics arena and making progress.

I am realizing that I will be writing code forever unless I can tap into the existing infrastructure.  For me that appears to be bioperl.

I would like to be part of the bioperl community.  I subscribed to the bioperl list and got a return email, but have never received any mail from the list since.  Did I do something wrong?  Could you look into this please?

The truth is:  these objects are blowing me away and I need help.

John S. Brigham
13810 Braun Drive
Golden, Colorado 80401
303-216-0994
mrphysh at juno.com


From hlapp at gmx.net  Mon Mar 17 23:44:11 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 17 Mar 2008 23:44:11 -0400
Subject: [Bioperl-l] Are all recommended modules equally important ?
In-Reply-To: <628aabb70803170838i53a49aacuf507880d518326d@mail.gmail.com>
References: <20080317041311.GA3784@kunpuu.plessy.org>
	<628aabb70803170838i53a49aacuf507880d518326d@mail.gmail.com>
Message-ID: <F7A93439-501D-420A-BD68-442E3587CB71@gmx.net>

On Mar 17, 2008, at 11:38 AM, Dave Messina wrote:

> Hi Charles,
>
> Thanks for your note.
>
> All of the BioPerl 'recommended' modules involve optional  
> functionality, so
> I would think all of them would map to 'suggested' under Debian so  
> they
> won't be installed by default.

I would probably elevate LWP to 'recommended.' Other than that I agree.

	-hilmar

>
> For everyone else, this is the list of recommended modules he's  
> talking
> about:
> Ace
> Class::AutoClass
> Clone
> Convert::Binary::C
> Data::Stag::XMLWriter
> GD
> GD::SVG
> Graph
> HTML::Entities
> HTML::Parser
> HTTP::Request::Common
> LWP::UserAgent
> PostScript::TextBlock
> Set::Scalar
> SOAP::Lite
> Spreadsheet::ParseExcel
> Storable
> SVG
> SVG::Graph
> Text::Shellwords
> URI::Escape
> XML::DOM::XPath
> XML::Parser
> XML::Parser::PerlSAX
> XML::SAX
> XML::SAX::Writer
> XML::Twig
> XML::Writer
>
>
>
>> Debian distributes versions 1.4 and 1.5.2 of Bioperl, but considers
>> using 1.5.2 in its next stable release. We welcome your comments  
>> on this
>> as well.
>>
>
> I think the consensus here would be that 1.5.2 is the appropriate  
> version of
> Bioperl to use in the next stable release of Debian. Although we've  
> started
> to work toward Bioperl 1.6, that release will be at least a few  
> months off,
> and 1.4, while technically our most recent 'stable' release, is  
> waaay out of
> date.
>
>
>
> Dave
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From bix at sendu.me.uk  Tue Mar 18 05:29:10 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 18 Mar 2008 09:29:10 +0000
Subject: [Bioperl-l] Are all recommended modules equally important ?
In-Reply-To: <F7A93439-501D-420A-BD68-442E3587CB71@gmx.net>
References: <20080317041311.GA3784@kunpuu.plessy.org>	<628aabb70803170838i53a49aacuf507880d518326d@mail.gmail.com>
	<F7A93439-501D-420A-BD68-442E3587CB71@gmx.net>
Message-ID: <47DF8B66.8020509@sendu.me.uk>

Hilmar Lapp wrote:
> On Mar 17, 2008, at 11:38 AM, Dave Messina wrote:
> 
>> Hi Charles,
>>
>> Thanks for your note.
>>
>> All of the BioPerl 'recommended' modules involve optional 
>> functionality, so
>> I would think all of them would map to 'suggested' under Debian so they
>> won't be installed by default.
> 
> I would probably elevate LWP to 'recommended.' Other than that I agree.

I looked at the most-used external modules. These are used 6 times or more:

   Data::Dumper => used 55 times
   Carp => used 51 times
   IO::String => used 25 times
   Symbol => used 19 times
   File::Spec => used 17 times
   HTTP::Request::Common => used 17 times
   POSIX => used 12 times
   DB_File => used 11 times
   Fcntl => used 11 times
   IO::File => used 11 times
   Exporter => used 10 times
   File::Temp => used 9 times
   Dumpvalue => used 8 times
   LWP::UserAgent => used 8 times
   Scalar::Util => used 8 times
   URI::Escape => used 8 times
   File::Basename => used 6 times
   File::Path => used 6 times
   XML::Writer => used 6 times

I can never remember how to figure out which of those are included with
perl 5.6.1.

If we do want to promote anything to recommended, I suppose it would be
those above, except maybe XML::Writer.
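
One way to check which of these ship with perl is the Module::CoreList
module. This is only a sketch: it assumes Module::CoreList is installed
(it is on CPAN) and looks at a handful of the modules above rather than
the whole list.

```perl
use strict;
use warnings;
use Module::CoreList;

# A few of the modules from the list above.
my @modules = qw(Data::Dumper Carp IO::String File::Spec LWP::UserAgent XML::Writer);

for my $module (@modules) {
    # first_release() returns the perl version that first bundled the
    # module, or undef if the module has never shipped with perl.
    my $first = Module::CoreList->first_release($module);
    if (!defined $first) {
        print "$module: not bundled with perl\n";
    }
    elsif ($first <= 5.006001) {
        # Assumes the module was not dropped again before 5.6.1.
        print "$module: bundled by perl 5.6.1 (first in $first)\n";
    }
    else {
        print "$module: bundled only from perl $first\n";
    }
}
```

Anything reported as not bundled is a genuine external dependency and a
candidate for the "Recommends" or "Suggests" discussion.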

I also agree with everything Dave said; if it's easier everything can be 
'suggested'.


(I reckon most if not all of the Data::Dumper and Carp usages should be 
removed)

From David.Messina at sbc.su.se  Tue Mar 18 10:30:02 2008
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 18 Mar 2008 15:30:02 +0100
Subject: [Bioperl-l] bioperl email list
In-Reply-To: <20080317.162721.27257.1@webmail19.vgs.untd.com>
References: <20080317.162721.27257.1@webmail19.vgs.untd.com>
Message-ID: <628aabb70803180730g2aae7ae0u18a9e6a360c725fa@mail.gmail.com>

Hi John,

Welcome to BioPerl!


>  I subscribed to the bioperl list and got a return email, but have never
> received an email.


As I'm sure you know, most of the time when email doesn't show up, it's
because it's been filtered as spam. I'm assuming you've already checked
for this, though.

If that's not it, then I suggest trying to log in to the mailing list server
here:
http://bioperl.org/mailman/listinfo/bioperl-l

Log in by entering your email address (the one you subscribed with) in the
last field on that page and clicking the "Unsubscribe or edit options"
button.

On the next page that comes up, type your password in the first field on the
page.

If for some reason you haven't been subscribed to the list properly, then
you will get an error here. Otherwise, you will be taken to your membership
configuration page. There you can verify, among other options, that mail
delivery is enabled.


> The truth is:  these object are blowing me away and I need help.


BioPerl does have a bit of a learning curve, but fortunately there are some
good tutorials that should help you to get started. If you haven't already,
visit the HOWTO section of bioperl.org. Check out the one on BioPerl for
beginners, and then you might follow up with the SeqIO and SearchIO HOWTOs
which cover how to read and write sequences and sequence alignment program
output.

Also, there's lots of great example code in the examples folder of the
BioPerl distribution. I find looking at how other people use BioPerl is very
helpful in understanding what objects are used for what.

Finally, I'll plug the BioPerl Deobfuscator, which is a class browser for
BioPerl and available at: http://bioperl.org/cgi-bin/deob_interface.cgi

BioPerl classes tend to have multiple levels of inheritance, and the
Deobfuscator lets you see all of the methods available to objects of a given
class.


Dave

From bix at sendu.me.uk  Tue Mar 18 11:32:25 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 18 Mar 2008 15:32:25 +0000
Subject: [Bioperl-l] Priorities for a bioperl-1.6 release
In-Reply-To: <OF9C92ED19.F51E3401-ON852573F0.006310A3-852573F0.006383D1@gsk.com>
References: <OF9C92ED19.F51E3401-ON852573F0.006310A3-852573F0.006383D1@gsk.com>
Message-ID: <47DFE089.1070304@sendu.me.uk>

aaron.j.mackey at gsk.com wrote:
>> Or is the split intended to be 'core' == "anything and everything
>> that was in 1.4", '????' == "everything else"? In which case,
>> what's a good name for "modules created after 1.4"? 'crust'? ;)
> 
> Nah, "icing".
> 
> a module "use" map might be very useful to help identify "core" vs.
> other layers of mantle/crust/icing.
> 
> http://www.perlmonks.org/?node_id=87329 
> http://search.cpan.org/src/NEILB/pmusage-1.2/

Thanks for those. Neither could quite cope with BioPerl, but I've munged
them together and hacked up 'module_usage.pl' which I've just committed
to the maintenance directory of bioperl-live.

module_usage.pl ../Bio

Produces:
  *warning, may crash your browser; download it and view in a dedicated
image viewer*
http://bix.sendu.me.uk/files/module_usage.jpeg
http://bix.sendu.me.uk/files/module_usage.txt

First I considered what modules each BioPerl package (aka class, module)
'uses' (what modules does it load via 'use', 'require' or inherit from
via 'use base', excluding external (non-BioPerl) modules), then grouped
together packages that have identical usage. The graph shows all the
groups with more than one member as nodes and edges from them pointing
to the individual packages that they use. The set of those individual
packages pointed to by groups also have edges showing their
use-relationship to other members of the set (only). Members of the set
are also shaded in red. The saturation of the shade indicates how many
packages use that package (so dark red packages are used a lot).
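
The grouping step can be sketched roughly as follows. This is not the
actual module_usage.pl; the package names and their source text are toy
data invented for the example, and the real script reads the .pm files
from disk.

```perl
use strict;
use warnings;

# Toy stand-ins for module source code (hypothetical packages).
my %source = (
    'Bio::Toy::ParserA' => "use Bio::Toy::Align;\nuse base qw(Bio::Toy::IO);\nuse Carp;\n",
    'Bio::Toy::ParserB' => "use base qw(Bio::Toy::IO);\nrequire Bio::Toy::Align;\n",
    'Bio::Toy::Align'   => "use base qw(Bio::Toy::Root);\n",
);

# Return the sorted Bio:: modules a piece of code loads via 'use',
# 'require' or 'use base'. External modules such as Carp are ignored.
sub bioperl_modules_used {
    my ($code) = @_;
    my %used;
    for my $line (split /\n/, $code) {
        if ($line =~ /^\s*use\s+base\s+(.+?);/) {
            my $parents = $1;   # copy before matching again
            $used{$_} = 1 for $parents =~ /(Bio::[\w:]+)/g;
        }
        elsif ($line =~ /^\s*(?:use|require)\s+(Bio::[\w:]+)/) {
            $used{$1} = 1;
        }
    }
    return sort keys %used;
}

my %group;     # usage list (as a string) => packages sharing that usage
my %used_by;   # module => number of packages that use it
for my $package (sort keys %source) {
    my @used = bioperl_modules_used($source{$package});
    $used_by{$_}++ for @used;
    push @{ $group{ join ',', @used } }, $package;
}

# Only groups with more than one member become nodes in the graph.
for my $usage (sort keys %group) {
    my @members = @{ $group{$usage} };
    next unless @members > 1;
    print join(', ', @members), " => $usage\n";
}

# The per-module counts correspond to the saturation of the red shading.
print "$_ is used by $used_by{$_} package(s)\n" for sort keys %used_by;
```

With the toy data, ParserA and ParserB load the same two modules and so
end up in one group, while Align stays on its own and is left out.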

(I had to simplify in this way because otherwise GraphViz bailed on me.
If anyone can come up with nicer simplification/visualisation systems,
please do! It's important to note that there is a lot of information loss
in my scheme, so you can't rely on the graph alone.)

Getting to the question on how to decide what is 'core' and on what
basis to split things up, first consider the darker red packages. Next
consider how many groups point to it. Finally consider the membership of
those groups: are they all highly related, or are they from different
'parts' of BioPerl?

For example, Bio::Graphics::Glyph::generic is dark red and has 3 groups
pointing to it, but all the members of those groups are
Bio::Graphics::Glyph*. You could imagine that Bio::Graphics::Glyph (or
Bio::Graphics?) could be split off cleanly if desired and not kept in
core. Bio::SimpleAlign, on the other hand, whilst not being quite as
dark a red, has 7 attached groups with members from Bio::AlignIO,
Bio::Search and Bio::Tools. You could easily argue it is more
fundamental to BioPerl and should be in core. In turn, the things that
Bio::SimpleAlign points to would also have to be in core.

I haven't done any full analysis along these lines and leave it as an
exercise for the interested reader for now ;)


Chris Fields wrote:
> http://www.bioperl.org/wiki/Talk:Proposed_1.6_core_modules
> 
> I'm pretty flexible on any of that; it's a proposal only and I think
> some of it may be wrongheaded, but hey, I'm willing to take a few
> rotten tomatoes.  The key issue is we should try to work out what we
> mean by 'core' or the core library.  I have a rather extreme view of
> it as being the bare essentials without external, non-perl core
> dependencies (only SeqI/PrimarySeqI, AlignI, AnnotationI, SeqFeatureI
> and required modules for those classes) but I'm sure others would
> lump in parsers, DB functionality, etc.  I basically suggest placing
> those (and any stable but potentially non-core code) in a
> 'bioperl-main', with any unstable or untested code going into a
> 'bioperl-unstable'.

My thoughts are along these lines:
# I agree that core should have no external dependencies
# I agree that it might mostly be interfaces
# It should represent a framework with all the interfaces (that have
   stable APIs), directory structure and base classes that everything
   else relies on
# It might not do much useful bioinformatics, but provides just about
   everything needed for a dev to create a new module that does


> In essence, bioperl-main would require core and resemble a stable
> release; bioperl-unstable would require bioperl-main (and core) and
> resemble a dev release.  Not sure how versioning would go or if this
> is a viable option at all, but it's worth discussing.

# I agree that this 3-way split seems reasonable
# bioperl-main would consist primarily of the 'leaves' of the module
   tree, mostly parsers and the like which, whilst 'stable' and tested
   should still be split away from core because the data sources they
   parse could change format slightly
# bioperl-unstable, better bioperl-bleed, would feature brand-new
   stuff, be it new parsers for totally new formats, new APIs that do
   something not thought of before etc. When they are complete, bug-free
   and have stood the test of time they get moved into bioperl-main.
   (It is not a place for all new commits; bug fixes to something in
   bioperl-main would be committed to bioperl-main)
# The current splits (bioperl-run, bioperl-network etc.) do not get
   their own core and bleed variant. Anything they need for core
   functionality would enter the single bioperl-core, anything new
   would enter the single bioperl-bleed, and anything stable would
   be in their own bioperl-[package]

Discuss :)

From snoze.pa at gmail.com  Tue Mar 18 14:27:47 2008
From: snoze.pa at gmail.com (snoze pa)
Date: Tue, 18 Mar 2008 13:27:47 -0500
Subject: [Bioperl-l] BioSQL V1.0.0 released
In-Reply-To: <8f200b4c0803111204m2ca45782w34baa0499d690cb5@mail.gmail.com>
References: <200803071309.25294.heikki@sanbi.ac.za>
	<7558F8C6-FE40-4BAE-BA6A-D5039B10F350@uiuc.edu>
	<8f200b4c0803111204m2ca45782w34baa0499d690cb5@mail.gmail.com>
Message-ID: <10f848910803181127u4acc307akbf9ec8513349e311@mail.gmail.com>

Thanks, Hilmar. I am still wondering if my old problem was fixed. It is
related to what happens when the NR database mixes files from different
databases.

On Tue, Mar 11, 2008 at 2:04 PM, Steve Chervitz <sac at bioperl.org> wrote:

> Ditto. Thanks for biting the bullet, Hilmar.
>
> BTW, I put together a little compendium/review of various bioinformatics
> data models a few months ago, where I mention BioSQL among others, but I
> never really announced it:
>
> http://biodatamodel.org/
>
> It thought about wikifying it to get the community involved in maintaining
> it, but haven't gotten around to it yet.
>
> Feedback is welcome.
>
> Cheers,
> Steve
>
> On Fri, Mar 7, 2008 at 6:22 AM, Chris Fields <cjfields at uiuc.edu> wrote:
>
> > Same here.  Great news!
> >
> > chris
> >
> > On Mar 7, 2008, at 5:09 AM, Heikki Lehvaslaiho wrote:
> >
> > > BIOSQL V1.0.0 RELEASED
> > > http://news.open-bio.org/archives/2008_03.html#000094
> > >
> > >
> > > Congratulations, Hilmar!
> > >
> > >               -Heikki
> > >
> > > --
> > > ______ _/      _/_____________________________________________________
> > >      _/      _/
> > >     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
> > >    _/_/_/_/_/  Senior Scientist    skype: heikki_lehvaslaiho
> > >   _/  _/  _/  SANBI, South African National Bioinformatics Institute
> > >  _/  _/  _/  University of Western Cape, South Africa
> > >     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
> > > ___ _/_/_/_/_/________________________________________________________
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l at lists.open-bio.org
> > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> > Christopher Fields
> > Postdoctoral Researcher
> > Lab of Dr. Robert Switzer
> > Dept of Biochemistry
> > University of Illinois Urbana-Champaign

From darin.london at duke.edu  Tue Mar 18 14:16:58 2008
From: darin.london at duke.edu (darin.london at duke.edu)
Date: Tue, 18 Mar 2008 13:16:58 -0500
Subject: [Bioperl-l] BOSC 2008 Announcement and Call For Submissions
Message-ID: <200803181816.m2IIGwOL007248@tenero.duhs.duke.edu>


BOSC 2008 Call for Abstracts

The 9th annual Bioinformatics Open Source Conference (BOSC 2008) will take place in Toronto, Ontario, Canada, as one of several Special Interest Group (SIG) meetings occurring in conjunction with the 16th annual Intelligent Systems for Molecular Biology Conference (ISMB 2008).

The Bioinformatics Open Source Conference (BOSC) is sponsored by the Open Bioinformatics Foundation (O|B|F), a non-profit group dedicated to promoting the practice and philosophy of Open Source software development within the biological research community. Many Open Source bioinformatics packages are widely used by the research community across many application areas and form a cornerstone in enabling research in the genomic and post-genomic era. Open source bioinformatics software has facilitated rapid innovation and dissemination of new computational methods as well as informatics infrastructure. Since the work of the Open Source Bioinformatics Community represents some of the most cutting edge of Bioinformatics in general, the overall theme for the conference this year is "Tackling Hard Problems with Emerging Technologies". Topics under this umbrella include cyberinfrastructure, grid computing and workflow management and discovery, and visualization. We will also have a series of update talks about the main Open Source Bioinformatics Software suites.

One of the hallmarks of BOSC is the coming together of the open source developer community in one location. A face-to-face meeting of this community creates synergy where participants can work together to create use cases, prototype working code, or run bootcamps for developers from other projects as short, informal, and hands-on tutorials in new software packages and emerging technologies. In short, BOSC is not just a conference for presentations of completed work, but is a dynamic meeting where collaborative work gets done.

This year, BOSC is accepting abstract submissions on the conference theme "Tackling Hard Problems with Emerging Technologies". The conference theme reflects that there are new technologies emerging on both the scientific front (new sequencing technologies, etc.) and the IT front (workflows, mashup/web 2.0, improvements in all of the major programming languages, etc.), which may allow the open source community to solve problems that were previously intractable. Abstracts may be submitted for the following topics.

1. Cyberinfrastructure - We are interested in presentations on topics dealing with the development of infrastructure on the web to facilitate software and data re-use (mashups, or traditional), interoperability and inter-process communication, system/service discovery, and data movement and modeling in distributed systems. This may include peer-to-peer systems of data transfer, Web Services, various flavors of data representation (SOAP, JSON, XML, others), and technologies commonly referred to under the Web 2.0 paradigm (e.g. folksonomies/tagging, user-based content generation, content feeds, and Social Networking).

2. Grid Computing and Workflow Management and Discovery - We particularly invite talks that report progress in making workflow systems easier to use and on how to do distributed-collaborative research, e.g. workflows that encompass the coordination of systems running in different parts of the world.

3. Visualization - Visualization is a maturing area of open source software development. We particularly invite talks that demonstrate innovative visualization systems in the context of workflows.

4. Open Source Software - Speakers will present talks on the use, development, or philosophy of open source software in bioinformatics.

5. Bio* Open Source Project Updates - We invite abstracts from the representatives of the open source projects sponsored by or affiliated to the O|B|F (see Projects).


Please consult the official BOSC 2008 website at http://www.open-bio.org/wiki/Upcoming_BOSC_conference  for all updates and extra information.

Submission Process:
All abstracts must be submitted through our Open Conference Systems site (http://events.open-bio.org/BOSC2008/openconf.php).
The form will ask for a small Abstract Text to be pasted into it, and a full paper.  The small Abstract Text should be a summary, while the longer abstract should provide more details, including the open-source license requirement details.
Full-length abstracts are limited to one page with one inch (2.5 cm) margins on the top, sides, and bottom.  The full-length abstract should include the title, authors, and affiliations.  We prefer your abstract to be in PDF format, although plain t

Important Dates:
May 11: Abstract submission deadline.
June 2: Notification of accepted talks.
June 4: Early registration discount cut-off.
July 18-19: BOSC 2008!

We hope to see you at BOSC 2008!

Kam Dahlquist and Darin London
BOSC 2008 Co-organizers

			 
From hlapp at gmx.net  Tue Mar 18 15:07:54 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 18 Mar 2008 15:07:54 -0400
Subject: [Bioperl-l] BioSQL V1.0.0 released
In-Reply-To: <10f848910803181127u4acc307akbf9ec8513349e311@mail.gmail.com>
References: <200803071309.25294.heikki@sanbi.ac.za>
	<7558F8C6-FE40-4BAE-BA6A-D5039B10F350@uiuc.edu>
	<8f200b4c0803111204m2ca45782w34baa0499d690cb5@mail.gmail.com>
	<10f848910803181127u4acc307akbf9ec8513349e311@mail.gmail.com>
Message-ID: <0B1635C3-0BD4-449C-9C52-FD8D07E9D669@gmx.net>

Can you point me to the bug report or a mailing list thread?

	-hilmar

On Mar 18, 2008, at 2:27 PM, snoze pa wrote:

> Thanks hilmar. I am still wondering if my old problem was fixed. It is
> related to when NR databases mixes files from different databases.
>
> On Tue, Mar 11, 2008 at 2:04 PM, Steve Chervitz <sac at bioperl.org>  
> wrote:
>
>> Ditto. Thanks for biting the bullet, Hilmar.
>>
>> BTW, I put together a little compendium/review of various  
>> bioinformatics
>> data models a few months ago, where I mention BioSQL among others,  
>> but I
>> never really announced it:
>>
>> http://biodatamodel.org/
>>
>> It thought about wikifying it to get the community involved in  
>> maintaining
>> it, but haven't gotten around to it yet.
>>
>> Feedback is welcome.
>>
>> Cheers,
>> Steve
>>
>> On Fri, Mar 7, 2008 at 6:22 AM, Chris Fields <cjfields at uiuc.edu>  
>> wrote:
>>
>>> Same here.  Great news!
>>>
>>> chris
>>>
>>> On Mar 7, 2008, at 5:09 AM, Heikki Lehvaslaiho wrote:
>>>
>>>> BIOSQL V1.0.0 RELEASED
>>>> http://news.open-bio.org/archives/2008_03.html#000094
>>>>
>>>>
>>>> Congratulations, Hilmar!
>>>>
>>>>               -Heikki
>>>>
>>>> --
>>>> ______ _/      _/ 
>>>> _____________________________________________________
>>>>      _/      _/
>>>>     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
>>>>    _/_/_/_/_/  Senior Scientist    skype: heikki_lehvaslaiho
>>>>   _/  _/  _/  SANBI, South African National Bioinformatics  
>>>> Institute
>>>>  _/  _/  _/  University of Western Cape, South Africa
>>>>     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
>>>> ___ _/_/_/_/_/ 
>>>> ________________________________________________________
>>>
>>> Christopher Fields
>>> Postdoctoral Researcher
>>> Lab of Dr. Robert Switzer
>>> Dept of Biochemistry
>>> University of Illinois Urbana-Champaign

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From snoze.pa at gmail.com  Tue Mar 18 16:33:08 2008
From: snoze.pa at gmail.com (snoze pa)
Date: Tue, 18 Mar 2008 15:33:08 -0500
Subject: [Bioperl-l] NCBI taxonomy database
Message-ID: <10f848910803181333g49d1567dl99d76daf8ef88cc1@mail.gmail.com>

Dear Users,
How can I use the NCBI taxonomy database in BioPerl? Any suggestions?
Thanks in advance,
s

From aaron.j.mackey at gsk.com  Tue Mar 18 12:23:41 2008
From: aaron.j.mackey at gsk.com (aaron.j.mackey at gsk.com)
Date: Tue, 18 Mar 2008 12:23:41 -0400
Subject: [Bioperl-l] Priorities for a bioperl-1.6 release
In-Reply-To: <47DFE089.1070304@sendu.me.uk>
Message-ID: <OF5F812950.DB7A9F1E-ON85257410.00593111-85257410.005A0FAB@gsk.com>

Very cool.  I can envision this being printed as a laminated poster to put 
up next to the periodic table of Perl Elements (
http://www.ozonehouse.com/mark/blog/code/PeriodicTable.html)

One GraphViz trick you could try would be to group Bio::X::* (nodes and 
your collection groups sharing common Bio::X:: prefixes) together as 
subgraphs; that should quickly show you which edges go outside of the 
various "domains", and which are entirely self contained.

You could also try to distinguish "use base" relationships (i.e. 
inheritance) vs. "use Bio::X" (delegation, composition, etc.) vs. "require 
Bio::X" wrapped in an eval (optional use if available) by various edge 
colorings -- this might help to further break things up if we can guess at 
the intended "use" of any Bio::X by Bio::Y.
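
As a rough sketch of both ideas (the edge list, the colour choices and the
sub names below are made up for illustration and are not taken from
module_usage.pl), something like this would write a DOT file with one
cluster per Bio::X prefix and one edge colour per kind of relationship:

```perl
use strict;
use warnings;

# Hypothetical input: [ user, used, kind ], kind being 'base', 'use' or 'eval'.
my @edges = (
    [ 'Bio::AlignIO::fasta',          'Bio::AlignIO',     'base' ],
    [ 'Bio::AlignIO::fasta',          'Bio::SimpleAlign', 'use'  ],
    [ 'Bio::Search::Hit::GenericHit', 'Bio::SimpleAlign', 'eval' ],
);

# Arbitrary colour per relationship kind (an assumption of this sketch).
my %colour = ( base => 'green', use => 'blue', eval => 'gray' );

# "Domain" of a package: Bio::AlignIO::fasta -> Bio::AlignIO.
sub domain {
    my ($pkg) = @_;
    my @parts = split /::/, $pkg;
    return @parts > 2 ? join( '::', @parts[ 0, 1 ] ) : $pkg;
}

# Build the DOT text: one subgraph cluster per domain, then coloured edges.
sub edges_to_dot {
    my (@list) = @_;
    my %members;    # domain => { package => 1 }
    for my $e (@list) {
        $members{ domain($_) }{$_} = 1 for @{$e}[ 0, 1 ];
    }
    my $dot = "digraph module_usage {\n";
    for my $dom ( sort keys %members ) {
        ( my $id = $dom ) =~ s/\W+/_/g;
        $dot .= sprintf "  subgraph \"cluster_%s\" {\n    label=\"%s\";\n", $id, $dom;
        $dot .= sprintf "    \"%s\";\n", $_ for sort keys %{ $members{$dom} };
        $dot .= "  }\n";
    }
    for my $e (@list) {
        my ( $from, $to, $kind ) = @$e;
        my $c = $colour{$kind} || 'black';
        $dot .= sprintf "  \"%s\" -> \"%s\" [color=%s];\n", $from, $to, $c;
    }
    return $dot . "}\n";
}

print edges_to_dot(@edges);
```

Edges that leave a cluster box are then the cross-"domain" dependencies;
the output can be fed to dot as usual.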

-Aaron

"Sendu Bala" <bix at sendu.me.uk> wrote on 03/18/2008 11:32:25 AM:

> aaron.j.mackey at gsk.com wrote:
> >> Or is the split intended to be 'core' == "anything and everything
> >> that was in 1.4", '????' == "everything else"? In which case,
> >> what's a good name for "modules created after 1.4"? 'crust'? ;)
> > 
> > Nah, "icing".
> > 
> > a module "use" map might be very useful to help identify "core" vs.
> > other layers of mantle/crust/icing.
> > 
> > http://www.perlmonks.org/?node_id=87329 
> > http://search.cpan.org/src/NEILB/pmusage-1.2/
> 
> Thanks for those. Neither could quite cope with BioPerl, but I've munged
> them together and hacked up 'module_usage.pl' which I've just committed
> to the maintenance directory of bioperl-live.
> 
> module_usage.pl ../Bio
> 
> Produces:
>   *warning, may crash your browser; download it and view in a dedicated
> image viewer*
> http://bix.sendu.me.uk/files/module_usage.jpeg
> http://bix.sendu.me.uk/files/module_usage.txt
> 
> First I considered what modules each BioPerl package (aka class, module)
> 'uses' (what modules does it load via 'use', 'require' or inherit from
> via 'use base', excluding external (non-BioPerl) modules), then grouped
> together packages that have identical usage. The graph shows all the
> groups with more than one member as nodes and edges from them pointing
> to the individual packages that they use. The set of those individual
> packages pointed to by groups also have edges showing their
> use-relationship to other members of the set (only). Members of the set
> are also shaded in red. The saturation of the shade indicates how many
> packages use that package (so dark red packages are used a lot).
> 
> (I had to simplify in this way because otherwise GraphViz bailed on me.
> If anyone can come with nicer simplification/visualisation systems,
> please do! It's important to note that there is lots of information loss
> in my scheme, so you can't rely on the graph alone.)
> 
> Getting to the question on how to decide what is 'core' and on what
> basis to split things up, first consider the darker red packages. Next
> consider how many groups point to it. Finally consider the membership of
> those groups: are they all highly related, or are they from different
> 'parts' of BioPerl?
> 
> For example, Bio::Graphics::Glyph::generic is dark red and has 3 groups
> pointing to it, but all the members of those groups are
> Bio::Graphics::Glyph*. You could imagine that Bio::Graphics::Glyph (or
> Bio::Graphics?) could be split off cleanly if desired and not kept in
> core. Bio::SimpleAlign, on the other hand, whilst not being quite as
> dark a red, has 7 attached groups with members from Bio::AlignIO,
> Bio::Search and Bio::Tools. You could easily argue it is more
> fundamental to BioPerl and should be in core. In turn, the things that
> Bio::SimpleAlign points to would also have to be in core.
> 
> I haven't done any full analysis along these lines and leave as an
> exercise for the interested reader for now ;)
> 
> 
> Chris Fields wrote:
> > http://www.bioperl.org/wiki/Talk:Proposed_1.6_core_modules
> > 
> > I'm pretty flexible on any of that; it's a proposal only and I think
> > some of it may be wrongheaded, but hey, I'm willing to take a few
> > rotten tomatoes.  The key issue is we should try to work out what we
> > mean by 'core' or the core library.  I have a rather extreme view of
> > it as being the bare essentials without external, non-perl core
> > dependencies (only SeqI/PrimarySeqI, AlignI, AnnotationI, SeqFeatureI
> > and required modules for those classes) but I'm sure others would
> > lump in parsers, DB functionality, etc.  I basically suggest placing
> > those (and any stable but potentially non-core code) in a
> > 'bioperl-main', with any unstable or untested code going into a
> > 'bioperl-unstable'.
> 
> My thoughts are along these lines:
> # I agree that core should have no external dependencies
> # I agree that it might mostly be interfaces
> # It should represent a framework with all the interfaces (that have
>    stable APIs), directory structure and base classes that everything
>    else relies on
> # It might not do much useful bioinformatics, but provides just about
>    everything needed for a dev to create a new module that does
> 
> 
> > In essence, bioperl-main would require core and resemble a stable
> > release; bioperl-unstable would require bioperl-main (and core) and
> > resemble a dev release.  Not sure how versioning would go or if this
> > is a viable option at all, but it's worth discussing.
> 
> # I agree that this 3-way split seems reasonable
> # bioperl-main would consist primarily of the 'leaves' of the module
>    tree, mostly parsers and the like which, whilst 'stable' and tested
>    should still be split away from core because the data sources they
>    parse could change format slightly
> # bioperl-unstable, better bioperl-bleed, would feature brand-new
>    stuff, be it new parsers for totally new formats, new APIs that do
>    something not thought of before etc. When they are complete, bug-free
>    and have stood the test of time they get moved into bioperl-main.
>    (It is not a place for all new commits; bug fixes to something in
>    bioperl-main would be committed to bioperl-main)
> # The current splits (bioperl-run, bioperl-network etc.) do not get
>    their own core and bleed variant. Anything they need for core
>    functionality would enter the single bioperl-core, anything new
>    would enter the single bioperl-bleed, and anything stable would
>    be in their own bioperl-[package]
> 
> Discuss :)
> 


From David.Messina at sbc.su.se  Tue Mar 18 17:23:18 2008
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 18 Mar 2008 22:23:18 +0100
Subject: [Bioperl-l] NCBI taxonomy database
In-Reply-To: <10f848910803181333g49d1567dl99d76daf8ef88cc1@mail.gmail.com>
References: <10f848910803181333g49d1567dl99d76daf8ef88cc1@mail.gmail.com>
Message-ID: <628aabb70803181423g305db155r9d66c114f38c64b6@mail.gmail.com>

Hi snoze,

I think you will want to take a look at the docs for the Bio::Taxon module
http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Taxon.html


and these scripts:

scripts/taxa/local_taxonomydb_query.PLS
<http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/scripts/taxa/local_taxonomydb_query.PLS?rev=HEAD&cvsroot=bioperl&content-type=text/vnd.viewcvs-markup>
  Script that accesses a local taxonomy database and retrieves species
  or TaxonIDs <http://www.bioperl.org/wiki/TaxonID>.

scripts/taxa/query_entrez_taxa.PLS
<http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/scripts/taxa/query_entrez_taxa.PLS?rev=HEAD&cvsroot=bioperl&content-type=text/vnd.viewcvs-markup>
  Demonstrates how to retrieve the NCBI <http://www.bioperl.org/wiki/NCBI>
  TaxonID <http://www.bioperl.org/wiki/TaxonID> for a given species. Also
  retrieves the TaxonID for a given accession number
  <http://www.bioperl.org/wiki/Accession_number>.

scripts/taxa/taxid4species.PLS
<http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/scripts/taxa/taxid4species.PLS?rev=HEAD&cvsroot=bioperl&content-type=text/vnd.viewcvs-markup>
  Retrieves the NCBI <http://en.wikipedia.org/wiki/NCBI> TaxonID
  <http://www.bioperl.org/wiki/TaxonID> for a given species.

Dave

From alexl at users.sourceforge.net  Wed Mar 19 04:32:38 2008
From: alexl at users.sourceforge.net (Alex Lancaster)
Date: Wed, 19 Mar 2008 01:32:38 -0700
Subject: [Bioperl-l] Priorities for a bioperl-1.6 release
In-Reply-To: <47DFE089.1070304@sendu.me.uk> (Sendu Bala's message of "Tue\,
	18 Mar 2008 15\:32\:25 +0000")
References: <OF9C92ED19.F51E3401-ON852573F0.006310A3-852573F0.006383D1@gsk.com>
	<47DFE089.1070304@sendu.me.uk>
Message-ID: <42eja7azbt.fsf@allele2.eebweb.arizona.edu>

>>>>> "SB" == Sendu Bala  writes:

[...]

SB> # I agree that this 3-way split seems reasonable # bioperl-main
SB> would consist primarily of the 'leaves' of the module tree, mostly
SB> parsers and the like which, whilst 'stable' and tested should
SB> still be split away from core because the data sources they parse
SB> could change format slightly # bioperl-unstable, better
SB> bioperl-bleed, would feature brand-new stuff, be it new parsers
SB> for totally new formats, new APIs that do something not thought of
SB> before etc. When they are complete, bug-free and have stood the
SB> test of time they get moved into bioperl-main.  (It is not a place
SB> for all new commits; bug fixes to something in bioperl-main would
SB> be committed to bioperl-main) # The current splits (bioperl-run,
SB> bioperl-network etc.) do not get their own core and bleed
SB> variant. Anything they need for core functionality would enter the
SB> single bioperl-core, anything new would enter the single
SB> bioperl-bleed, and anything stable would be in their own
SB> bioperl-[package]

SB> Discuss :)

While on the subject of how to split up the bioperl package, spare a
thought for downstream package maintainers.  The Fedora package for the
bioperl "core" that I now maintain is currently a single package which
makes it easy to get reviewed, included in the distribution and
updated/maintained.  (bioperl-run is a separate package).

While I agree that bioperl is now perhaps a little too monolithic, I
think splitting it up in a too fine-grained manner like CPAN might
go too far the other way.  For Fedora, each package would then need to
be reviewed and updated separately.  Similar issues might apply for
other distros (such as Debian/Ubuntu).

I think something similar to the three-way split proposed sounds like
a good compromise, so long as a "basic" user of Bioperl can install
most of the functionality in the current "bioperl" package in (at
most) 2-3 packages.

One model to look at might be the gstreamer model which has a "core"
(gstreamer) and "gstreamer-plugins-base", "gstreamer-plugins-good",
"gstreamer-plugins-bad" and "gstreamer-plugins-ugly" modules for
plugins, see:

http://gstreamer.net/

Alex

From charles-listes+bioperl at plessy.org  Wed Mar 19 06:01:59 2008
From: charles-listes+bioperl at plessy.org (Charles Plessy)
Date: Wed, 19 Mar 2008 19:01:59 +0900
Subject: [Bioperl-l] Are all recommended modules equally important ?
In-Reply-To: <47DF8B66.8020509@sendu.me.uk>
References: <20080317041311.GA3784@kunpuu.plessy.org>
	<628aabb70803170838i53a49aacuf507880d518326d@mail.gmail.com>
	<F7A93439-501D-420A-BD68-442E3587CB71@gmx.net>
	<47DF8B66.8020509@sendu.me.uk>
Message-ID: <20080319100159.GD29304@kunpuu.plessy.org>

On Tue, Mar 18, 2008 at 09:29:10AM +0000, Sendu Bala wrote:
> 
> I looked at the most used external modules. Used 6 times or more:
> 
>   Data::Dumper => used 55 times
>   Carp => used 51 times
>   IO::String => used 25 times
>   Symbol => used 19 times
>   File::Spec => used 17 times
>   HTTP::Request::Common => used 17 times
>   POSIX => used 12 times
>   DB_File => used 11 times
>   Fcntl => used 11 times
>   IO::File => used 11 times
>   Exporter => used 10 times
>   File::Temp => used 9 times
>   Dumpvalue => used 8 times
>   LWP::UserAgent => used 8 times
>   Scalar::Util => used 8 times
>   URI::Escape => used 8 times
>   File::Basename => used 6 times
>   File::Path => used 6 times
>   XML::Writer => used 6 times

Dear Sendu,

Thanks a lot for this analysis! We will downgrade all modules except
those you listed to the priority 'Suggested'. In terms of the Debian
package, it means keeping only libio-string-perl, libwww-perl,
liburi-perl and libxml-writer-perl in our 'Recommends' field, as the
others are provided by our perl package itself.

Thanks a lot for the advice,

-- 
Charles Plessy
Debian-Med packaging team
Wakō, Saitama, Japan

From bix at sendu.me.uk  Wed Mar 19 09:27:11 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 19 Mar 2008 13:27:11 +0000
Subject: [Bioperl-l] Priorities for a bioperl-1.6 release
In-Reply-To: <OF5F812950.DB7A9F1E-ON85257410.00593111-85257410.005A0FAB@gsk.com>
References: <OF5F812950.DB7A9F1E-ON85257410.00593111-85257410.005A0FAB@gsk.com>
Message-ID: <47E114AF.8030801@sendu.me.uk>

aaron.j.mackey at gsk.com wrote:
> One GraphViz trick you could try would be to group Bio::X::* (nodes and 
> your collection groups sharing common Bio::X:: prefixes) together as 
> subgraphs; that should quickly show you which edges go outside of the 
> various "domains", and which are entirely self contained.

Not quite sure if I used the 'trick' you were thinking of, but I now 
'cluster' them as you describe. It's no longer quite as attractively 
proportioned, but I suppose it's more useful :)


> you could also try to distinguish "use base" relationships (i.e. 
> inheritance) vs. "use Bio::X" (delegation, composition, etc.) vs. "require 
> Bio::X" wrapped in an eval (optional use if available) by various edge 
> colorings -- this might help to further break things up if we can guess at 
> the intended "use" of any Bio::X by Bio::Y.

I haven't distinguished the eval require cases, but now edges are green 
for inheritance and blue for use/require.
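
For anyone who wants to experiment, here is a minimal sketch of how that
classification could be done with plain regexes. It is not the code in
module_usage.pl; the sub name classify_deps and the exact patterns are
assumptions, and it only looks at one statement per line:

```perl
use strict;
use warnings;

# Edge colours as described above: green = inheritance, blue = use/require.
my %edge_colour = ( base => 'green', use => 'blue', require => 'blue' );

# Scan Perl source text and return { module => kind } for Bio:: modules,
# where kind is 'base', 'use' or 'require'. Inheritance takes priority.
sub classify_deps {
    my ($source) = @_;
    my %deps;
    for my $line ( split /\n/, $source ) {
        next if $line =~ /^\s*#/;    # skip comment lines
        if ( $line =~ /^\s*use\s+base\s+(.+?);/ ) {
            # e.g. "use base qw(Bio::Root::Root Bio::SeqI);"
            my $list = $1;
            $deps{$_} = 'base' for $list =~ /(Bio::[\w:]+)/g;
        }
        elsif ( $line =~ /^\s*use\s+(Bio::[\w:]+)/ ) {
            $deps{$1} ||= 'use';
        }
        elsif ( $line =~ /\brequire\s+(Bio::[\w:]+)/ ) {
            # also catches "eval { require Bio::X; }" on a single line
            $deps{$1} ||= 'require';
        }
    }
    return \%deps;
}

my $deps = classify_deps(<<'END');
use base qw(Bio::Root::Root Bio::SeqI);
use Bio::PrimarySeq;
eval { require Bio::DB::GenBank; };
# use Bio::NotReallyUsed;
END

for my $module ( sort keys %$deps ) {
    printf "%-20s %-8s %s\n", $module, $deps->{$module},
        $edge_colour{ $deps->{$module} };
}
```

Telling an eval-wrapped require apart from a plain one would need a bit
more context than a single line, which is why this sketch lumps them
together.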


I updated the jpeg:
*warning, may crash your browser; download it and view in a dedicated
  image viewer*
http://bix.sendu.me.uk/files/module_usage.jpeg


If someone wants to mess with the script so it will output a sane ps 
file for conversion to pdf, please do so. I can't figure out how to get 
it to work correctly.

From Jorge.DUARTE at biogemma.com  Wed Mar 19 11:32:44 2008
From: Jorge.DUARTE at biogemma.com (Jorge.DUARTE at biogemma.com)
Date: Wed, 19 Mar 2008 16:32:44 +0100
Subject: [Bioperl-l] how to go from AlignIO to Variation ?
Message-ID: <OF678A6861.0EF7D6FC-ONC1257411.0054F38B-C1257411.00558060@LGLimagrain.com>

Dear Bioperl-users,

Could someone give me a hint on how to find SNPs in alignments using
BioPerl objects?

I found several modules capable of representing sequence variations,
but could not understand how to go from an "Align" object to a "Variation"
object.

Any help would be much appreciated,

Thanks,

Jorge.

--- 
Jorge Duarte
Bioinformatics Software Engineer
BIOGEMMA
Z.I. Du Brézet
8, Rue des Frères Lumière
63028 CLERMONT FERRAND Cedex 2
FRANCE
Tel : +33 (0)4 73 39 60 73
Fax : +33 (0)4 73 39 60 71
E-mail : jorge.duarte at biogemma.com

From avilella at gmail.com  Wed Mar 19 12:59:47 2008
From: avilella at gmail.com (Albert Vilella)
Date: Wed, 19 Mar 2008 16:59:47 +0000
Subject: [Bioperl-l] how to go from AlignIO to Variation ?
In-Reply-To: <OF678A6861.0EF7D6FC-ONC1257411.0054F38B-C1257411.00558060@LGLimagrain.com>
References: <OF678A6861.0EF7D6FC-ONC1257411.0054F38B-C1257411.00558060@LGLimagrain.com>
Message-ID: <358f4d650803190959h744f33f5ha345897565a071b0@mail.gmail.com>

Hi Jorge,

(good to see an ex-EBI in the bioperl-ml :-) )

You can use the method aln_to_population in Bio::PopGen::Utilities:

my $pop = Bio::PopGen::Utilities->aln_to_population($aln);

http://www.bioperl.org/wiki/HOWTO:PopGen#Allele_data_from_Alignments_using_Bio::AlignIO_and_Bio::PopGen::Utilities
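
If it helps to see the underlying idea without any BioPerl objects, here
is a plain-Perl sketch that reports the polymorphic columns of an
alignment. It does not use aln_to_population; the sub name variable_sites
and the toy sequences are made up for illustration, and it assumes all
rows have the same length:

```perl
use strict;
use warnings;

# Return a list of [ column, { base => count } ] for every alignment
# column in which more than one base is observed (1-based columns).
sub variable_sites {
    my (@seqs) = @_;
    my @sites;
    my $len = length $seqs[0];
    for my $i ( 0 .. $len - 1 ) {
        my %count;
        for my $s (@seqs) {
            my $base = uc substr( $s, $i, 1 );
            next if $base eq '-' or $base eq 'N';    # ignore gaps and unknowns
            $count{$base}++;
        }
        push @sites, [ $i + 1, \%count ] if scalar( keys %count ) > 1;
    }
    return @sites;
}

# Toy alignment, one string per aligned sequence.
my @aln = qw(ACGTACGT ACGTTCGT ACCTACGT AC-TACGT);

for my $site ( variable_sites(@aln) ) {
    my ( $pos, $count ) = @$site;
    my @alleles = map { sprintf '%s=%d', $_, $count->{$_} } sort keys %$count;
    print "column $pos: @alleles\n";
}
```

With a real Bio::SimpleAlign you would take the strings from the aligned
sequence objects instead of a hard-coded list.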

Cheers,

    Albert.

On Wed, Mar 19, 2008 at 3:32 PM,  <Jorge.DUARTE at biogemma.com> wrote:
> Dear Bioperl-users,
>
>  could someone give me a hint on how to find SNPs in alignments using
>  bioperl objects ?
>
>  I found several modules capable of representing Sequence Variations,
>  but could not understand how to go from an "Align" object to a "Variation"
>  object.
>
>  Any help would be much appreciated,
>
>  Thanks,
>
>  Jorge.
>
>  ---
>  Jorge Duarte
>  Bioinformatics Software Engineer
>  BIOGEMMA
>  Z.I. Du Brézet
>  8, Rue des Frères Lumière
>  63028 CLERMONT FERRAND Cedex 2
>  FRANCE
>  Tel : +33 (0)4 73 39 60 73
>  Fax : +33 (0)4 73 39 60 71
>  E-mail : jorge.duarte at biogemma.com


From Jorge.DUARTE at biogemma.com  Wed Mar 19 11:22:57 2008
From: Jorge.DUARTE at biogemma.com (Jorge.DUARTE at biogemma.com)
Date: Wed, 19 Mar 2008 16:22:57 +0100
Subject: [Bioperl-l] Using Bioperl book
Message-ID: <OF134946F0.4C537CC5-ONC1257411.00542F3C-C1257411.00549AFA@LGLimagrain.com>

Hello,

I just found a listing on Amazon for a book, "Using Bioperl", published
on the 1st of March 2008, but it is no longer available.

Does anyone know how to get it?

Many thanks,

Jorge.

--- 
Jorge Duarte
Bioinformatics Software Engineer
BIOGEMMA
Z.I. Du Brézet
8, Rue des Frères Lumière
63028 CLERMONT FERRAND Cedex 2
FRANCE
Tel : +33 (0)4 73 39 60 73
Fax : +33 (0)4 73 39 60 71
E-mail : jorge.duarte at biogemma.com

*****************************************************************
       For any support request, please include
BIOGEMMA_BioInfo_Service or bioinfo at biogemma.com
         among the recipients on first contact
*****************************************************************

From jason at bioperl.org  Wed Mar 19 13:54:16 2008
From: jason at bioperl.org (Jason Stajich)
Date: Wed, 19 Mar 2008 10:54:16 -0700
Subject: [Bioperl-l] Using Bioperl book
In-Reply-To: <OF134946F0.4C537CC5-ONC1257411.00542F3C-C1257411.00549AFA@LGLimagrain.com>
References: <OF134946F0.4C537CC5-ONC1257411.00542F3C-C1257411.00549AFA@LGLimagrain.com>
Message-ID: <0AFC1124-EE57-4168-BBF3-98845F0138C9@bioperl.org>

It's probably more than 6 months out. We still haven't finished
writing it, as life and work continue to intrude on book writing.

-jason
On Mar 19, 2008, at 8:22 AM, Jorge.DUARTE at biogemma.com wrote:

> Hello,
>
> i just found on amazon something about a book "Using Bioperl",  
> published
> on the 1st of March 2008 but which is no more available.
>
> Does anyone know how to get it ?
>
> Many thanks,
>
> Jorge.
>
> ---
> Jorge Duarte
> Bioinformatics Software Engineer
> BIOGEMMA
> Z.I. Du Brézet
> 8, Rue des Frères Lumière
> 63028 CLERMONT FERRAND Cedex 2
> FRANCE
> Tel : +33 (0)4 73 39 60 73
> Fax : +33 (0)4 73 39 60 71
> E-mail : jorge.duarte at biogemma.com


From smarkel at accelrys.com  Wed Mar 19 13:53:53 2008
From: smarkel at accelrys.com (Scott Markel)
Date: Wed, 19 Mar 2008 10:53:53 -0700
Subject: [Bioperl-l] Using Bioperl book
In-Reply-To: <OF134946F0.4C537CC5-ONC1257411.00542F3C-C1257411.00549AFA@LGLimagrain.com>
References: <OF134946F0.4C537CC5-ONC1257411.00542F3C-C1257411.00549AFA@LGLimagrain.com>
Message-ID: <OF4972A434.60C53CF4-ON88257411.006214B5-88257411.00625152@accelrys.com>

Jorge,

This is a book that Jason Stajich, Ewan Birney, and I are writing.
We're behind.  So it's not that the book is no longer available,
but that it's not yet available.  Hopefully later this year or
early in 2009.

Scott

Scott Markel, Ph.D.
Principal Bioinformatics Architect  email:  smarkel at accelrys.com
Accelrys (SciTegic R&D)             mobile: +1 858 205 3653
10188 Telesis Court, Suite 100      voice:  +1 858 799 5603
San Diego, CA 92121                 fax:    +1 858 799 5222
USA                                 web:    http://www.accelrys.com

http://www.linkedin.com/in/smarkel
Board of Directors: International Society for Computational Biology
Co-chair: ISCB Publications Committee
Associate Editor: PLoS Computational Biology
Editorial Board: Briefings in Bioinformatics


bioperl-l-bounces at lists.open-bio.org wrote on 19.03.2008 08:22:57:

> Hello,
> 
> i just found on amazon something about a book "Using Bioperl", published
> on the 1st of March 2008 but which is no more available.
> 
> Does anyone know how to get it ?
> 
> Many thanks,
> 
> Jorge.
> 
> --- 
> Jorge Duarte
> Bioinformatics Software Engineer
> BIOGEMMA
> Z.I. Du Brézet
> 8, Rue des Frères Lumière
> 63028 CLERMONT FERRAND Cedex 2
> FRANCE
> Tel : +33 (0)4 73 39 60 73
> Fax : +33 (0)4 73 39 60 71
> E-mail : jorge.duarte at biogemma.com
> 
> *****************************************************************
>        For any support request, please include
> BIOGEMMA_BioInfo_Service or bioinfo at biogemma.com
>          among the recipients on first contact
> *****************************************************************


From me at hongyu.org  Thu Mar 20 14:54:53 2008
From: me at hongyu.org (Hongyu Zhang)
Date: Thu, 20 Mar 2008 11:54:53 -0700 (PDT)
Subject: [Bioperl-l] Bio::DB::GenBank module
Message-ID: <501654.68882.qm@web51412.mail.re2.yahoo.com>

Dear all,

It seems that some of the important methods in the Bio::DB::GenBank module, such as get_Seq_by_acc(), have been discontinued. The corresponding methods have empty content underneath their names. How come?
 
Best,

Hongyu Zhang, Ph.D.
 Ceres Inc., Thousand Oaks, CA
Cell: 805-405-5394
Fax: 866-447-8750


From joseph.fass at gmail.com  Thu Mar 20 18:10:33 2008
From: joseph.fass at gmail.com (Joseph Fass)
Date: Thu, 20 Mar 2008 15:10:33 -0700
Subject: [Bioperl-l] bug in Bio::SeqIO::fastq or Bio::Seq::SeqWithQuality?
Message-ID: <195e25cb0803201510i4411fd1ctd1fdff40aecd12c7@mail.gmail.com>

I've written code to trim a certain number of bases (and, possibly,
associated qualities) from fasta (or fastq) format sequences, using:

$seq->seq($seq->subseq($a+1,$len-$b));
and, if it's fastq:
$seq->qual($seq->subqual($a+1,$len-$b));
where:
$len = $seq->length; # defined before changing $seq->seq
$a is the number of bases to trim off the beginning of the sequence
$b is the number of bases to trim off the end of the sequence

The code works for sequences, but for qualities I get a trimmed series of
quality characters that is the correct length and is at the correct
position, but has a number of characters (equal to $a) at the *end* of the
series changed to '!' ... i.e.:

@fake header 1
tcggacaatatatat
+
fjasfiojeq%!@%@

becomes:

@fake header 1 trimmed by 4 at beginning and 3 at end
acaatata
+fake header 1 trimmed by 4 at beginning and 3 at end
fioj!!!!

Since the relevant section of code is short, I'll post it:

my $in = Bio::SeqIO->new(-file => "<$opt_i", -format => $format);
my $out = Bio::SeqIO->new(-file=> ">$opt_o", -format => $format);
my $seq_length;
while (my $seq = $in->next_seq()) {
  $seq->desc($seq->desc()." trimmed by $opt_b at beginning and $opt_e at end");
  $seq_length = $seq->length;
  $seq->seq($seq->subseq($opt_b+1,$seq_length-$opt_e));
  if ($format eq 'fastq') {
    # if fastq, trim qualities then write out in fastq format
    $seq->qual($seq->subqual($opt_b+1,$seq_length-$opt_e));
    $out->write_fastq($seq); }
  else {$out->write_seq($seq);} # just write out sequence in fasta format
}

Why should the same process work for ->seq and ->subseq but not ->qual and
->subqual?  Please enlighten me ...


-- 
Joseph Fass
jnfass -at- gmail.com (personal) || joseph.fass -at- gmail.com(professional)
970.227.5928 (c) || 530.752.2698 (w)
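
A minimal sketch of the trimming arithmetic described in the message above, written in plain Perl on bare strings rather than on BioPerl objects. The helper name trim_ends is hypothetical; the input is the "fake header 1" record from the message. The first half shows the quality string one would expect after trimming 4 characters from the start and 3 from the end. The second half is only a guess at the mechanism and is not confirmed anywhere in this thread: if the stored qualities had already been cut to the new sequence length before subqual() was called, the requested positions past the end would have nothing to return, and filling them with '!' (quality 0 in Sanger-style fastq) reproduces the reported "fioj!!!!".

```perl
use strict;
use warnings;

# Hypothetical helper: drop $trim_start characters from the front and
# $trim_end characters from the back of a string.
sub trim_ends {
    my ($string, $trim_start, $trim_end) = @_;
    my $keep = length($string) - $trim_start - $trim_end;
    return '' if $keep <= 0;
    return substr($string, $trim_start, $keep);
}

# The example record from the message.
my $bases = 'tcggacaatatatat';
my $quals = 'fjasfiojeq%!@%@';
my ($trim_start, $trim_end) = (4, 3);

# Expected result: both strings trimmed with the same coordinates.
my $trimmed_bases = trim_ends($bases, $trim_start, $trim_end);
my $trimmed_quals = trim_ends($quals, $trim_start, $trim_end);
print "$trimmed_bases\n";   # acaatata
print "$trimmed_quals\n";   # fiojeq%!

# ASSUMPTION (a guess, not taken from the BioPerl source): the stored
# qualities were shortened to the new sequence length first, and the
# missing positions were then filled with '!'.
my $new_length   = length($trimmed_bases);            # 8
my $cut_quals    = substr($quals, 0, $new_length);    # fjasfioj
my $survivors    = substr($cut_quals, $trim_start);   # fioj
my $padded_quals = $survivors . ('!' x ($new_length - length($survivors)));
print "$padded_quals\n";    # fioj!!!!
```

If that guess holds, fetching both substrings before assigning either one (or calling qual() before seq()) might sidestep the problem; this is something to test, not a diagnosis.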

From hlapp at gmx.net  Thu Mar 20 18:49:41 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 20 Mar 2008 18:49:41 -0400
Subject: [Bioperl-l] [BioSQL-l] postgres 8.3 will not cast text to
	integer any longer
In-Reply-To: <5095.156.83.1.251.1206041052.squirrel@webmail.xs4all.nl>
References: <5095.156.83.1.251.1206041052.squirrel@webmail.xs4all.nl>
Message-ID: <0F80B40B-0232-4367-8433-992588B6E71B@gmx.net>

Hi Erik, thanks for the report. Given the error message, it looks  
more like the integer (which in reality is a string) can't be  
automatically converted to a string.

That would be equally interesting, though. DBI I thought used to bind  
all parameters as string by default, but maybe that has changed?

The parameter values are indeed all bound generically (and the query  
is created dynamically too), and I'm leaving it up to the DBD drivers  
to do the "Right Thing". I could obviously force everything into type  
string, but that is likely to have its own repercussions on various  
RDBMSs.

So could you file this as a bug report on bugzilla.open-bio.org  
(category bioperl-db, this is actually not a BioSQL problem), and run  
the following test on your 8.3 instance (which minor version actually?):

CREATE TABLE t1 (a varchar(10), b text, c integer);

SELECT * from t1 WHERE a = 1;
SELECT * from t1 WHERE b = 1;
SELECT * from t1 WHERE c = '1';

INSERT INTO t1 (a,b,c) VALUES ('a','b',1);

SELECT * from t1 WHERE a = 1;
SELECT * from t1 WHERE b = 1;
SELECT * from t1 WHERE c = '1';

SELECT * from t1 WHERE a = 1::text;
SELECT * from t1 WHERE b = 1::text;
SELECT * from t1 WHERE c = integer '1';

DROP TABLE t1;

These all work fine on my 8.1.4 instance.

	-hilmar

On Mar 20, 2008, at 3:24 PM, Erik wrote:
> Hi,
>
> (latest BioSQL, bioperl-db, and bioperl-live installed.)
>
> Postgres 8.3 will not auto-cast text (='character
> varying') to integer any longer, which causes test
> t/16odba.t to fail:
>
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: error while executing query in
> Bio::DB::BioSQL::SeqAdaptor::find_by_query: ERROR:
> operator does not exist: character varying = integer
> LINE 1: ...eq.taxon_id FROM bioentry seq WHERE
> seq.identifier = 5456929
>
> It seems likely to cause many similar statements to fail;
> how should this be solved?
>
> I tried to fix it but I couldn't find the place where the
> statement/clauses are put together.
>
>
> Thanks,
>
> Erik Rijkers
>
>
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biosql-l

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From er at xs4all.nl  Thu Mar 20 19:30:03 2008
From: er at xs4all.nl (Erik)
Date: Fri, 21 Mar 2008 00:30:03 +0100 (CET)
Subject: [Bioperl-l] [BioSQL-l] postgres 8.3 will not cast text to
 integer any longer
Message-ID: <15786.156.83.1.157.1206055803.squirrel@webmail.xs4all.nl>

On Thu, March 20, 2008 23:49, Hilmar Lapp wrote:
> Hi Erik, thanks for the report. Given the error message,
> it looks
> more like the integer (which in reality is a string) can't
> be automatically converted to a string.

you are right, of course :)


Here is the postgres 8.3.1 result of your sql statements:

CREATE TABLE t1 (a varchar(10), b text, c integer);

SELECT * from t1 WHERE a = 1;   -- fails in 8.3.1
SELECT * from t1 WHERE b = 1;	  -- fails in 8.3.1
SELECT * from t1 WHERE c = '1'; -- ok

INSERT INTO t1 (a,b,c) VALUES ('a','b',1);

SELECT * from t1 WHERE a = 1;	  -- fails in 8.3.1
SELECT * from t1 WHERE b = 1;	  -- fails in 8.3.1
SELECT * from t1 WHERE c = '1'; -- ok

SELECT * from t1 WHERE a = 1::text;     -- ok
SELECT * from t1 WHERE b = 1::text;     -- ok
SELECT * from t1 WHERE c = integer '1'; -- ok

The failure is always (virtually) the same:
ERROR:  operator does not exist: character varying = integer
LINE 1: SELECT * from t1 WHERE a = 1;
                                 ^
HINT:  No operator matches the given name and argument
type(s). You might need to add explicit type casts.


Then there is the cast function: for instance, I can let
the test in t/16odba.t proceed faultlessly with

 $seq = $biodb->get_Seq_by_id( "cast(5456929 as text)" );


I am also doubtful/curious as to how this would affect the
various loading scripts which I was going to use - I want
to set up a GBrowse with human/mouse/flybase sequence
annotation to show ChipSeq data against.

But one thing at a time, I guess...


> So could you file this as a bug report on
> bugzilla.open-bio.org
> (category bioperl-db, this is actually not a BioSQL
> problem),

I'll make an entry in bugzilla/bioperl-db.


Thanks for your quick reply!


Erik Rijkers


From David.Messina at sbc.su.se  Thu Mar 20 19:39:49 2008
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 21 Mar 2008 00:39:49 +0100
Subject: [Bioperl-l] Bio::DB::GenBank module
In-Reply-To: <501654.68882.qm@web51412.mail.re2.yahoo.com>
References: <501654.68882.qm@web51412.mail.re2.yahoo.com>
Message-ID: <628aabb70803201639y33df19a6ib83967c33dd90b7f@mail.gmail.com>

Hi Hongyu,

Those methods are inherited. get_Seq_by_acc(), for example, comes from
Bio::DB::WebDBSeqI.

The BioPerl Deobfuscator is one way to see where the methods a given class
has are actually coded. Here's the Deobfuscator view of
Bio::DB::GenBank.<http://bioperl.org/cgi-bin/deob_interface.cgi?Search=Search&module=Bio%3A%3ADB%3A%3AGenBank&sort_order=by+method&search_string=Bio%3A%3ADB%3A%3AGenBank>


Dave
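
A minimal sketch of why a class can answer a method call without containing the method's code, using two hypothetical packages (My::WebDB and My::GenBank) as stand-ins for Bio::DB::WebDBSeqI and Bio::DB::GenBank. The accession string is a placeholder and no BioPerl code is involved; the defining_package helper is also hypothetical and relies only on the core B module to report where a method was actually compiled.

```perl
use strict;
use warnings;
use B ();

# Hypothetical stand-in for the base class that holds the real code.
package My::WebDB;
sub new            { my ($class) = @_; return bless {}, $class; }
sub get_Seq_by_acc { my ($self, $acc) = @_; return "sequence for $acc"; }

# Hypothetical stand-in for the subclass: no get_Seq_by_acc() body here,
# the method is found through @ISA at call time.
package My::GenBank;
our @ISA = ('My::WebDB');

package main;

# Report the package in which a class's method was actually defined.
sub defining_package {
    my ($class, $method) = @_;
    my $code = $class->can($method) or return;
    return B::svref_2object($code)->GV->STASH->NAME;
}

my $db     = My::GenBank->new;
my $result = $db->get_Seq_by_acc('ACC0001');   # placeholder accession
my $origin = defining_package('My::GenBank', 'get_Seq_by_acc');

print "$result\n";   # sequence for ACC0001
print "$origin\n";   # My::WebDB
```

The same can()-based lookup should work against any loaded class, which makes it a quick command-line alternative when a web tool is not at hand.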

From hlapp at gmx.net  Thu Mar 20 20:34:42 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 20 Mar 2008 20:34:42 -0400
Subject: [Bioperl-l] postgres 8.3 will not cast text to integer any
	longer
In-Reply-To: <15786.156.83.1.157.1206055803.squirrel@webmail.xs4all.nl>
References: <15786.156.83.1.157.1206055803.squirrel@webmail.xs4all.nl>
Message-ID: <987C9C0E-840B-44AD-B3E9-0FC2809FF4F4@gmx.net>


On Mar 20, 2008, at 7:30 PM, Erik wrote:
> Here is the postgres 8.3.1 result of your sql statements:
>
> CREATE TABLE t1 (a varchar(10), b text, c integer);
>
> SELECT * from t1 WHERE a = 1;   -- fails in 8.3.1
> SELECT * from t1 WHERE b = 1;	  -- fails in 8.3.1
> SELECT * from t1 WHERE c = '1'; -- ok
>
> [...]
> The failure is always (virtually) the same:
> ERROR:  operator does not exist: character varying = integer
> LINE 1: SELECT * from t1 WHERE a = 1;
>                                  ^
> HINT:  No operator matches the given name and argument
> type(s). You might need to add explicit type casts.


So it's indeed the backend that changed behavior. It's actually  
documented as I see now:

http://www.postgresql.org/docs/8.3/static/release-8-3.html

scroll to section E.2.2. Migration to Version 8.3, E.2.2.1. General,  
and the first item there:

<quote>
Non-character data types are no longer automatically cast to TEXT  
(Peter, Tom)

Previously, if a non-character value was supplied to an operator or  
function that requires text input, it was automatically cast to text,  
for most (though not all) built-in data types. This no longer  
happens: an explicit cast to text is now required for all non- 
character-string types.
</quote>

I can see the arguments there but this will prevent upgrading to 8.3  
for many many applications, and the comments from the Pg developers  
('fix your SQL to use casts') that I've seen there on the mailing  
lists are just not helpful. Fixing SQL for many legacy  
applications is just not an option.

In the case of Bioperl-db it's very non-trivial, because all of a  
sudden we would be changing from a hands-off and let-the-driver- 
figure-it-out approach to forcing types everywhere.

So I think at this point with this change I have to declare Bioperl- 
db officially incompatible with PostgreSQL 8.3+ until we've found a  
solution to this, which is too bad because it seems 8.3 has some  
really nice performance features added.

One possible solution might be to create a CAST in the database  
(namely the one that was taken away, restoring behavior to pre-8.3).  
Another possibility is to move the parameter binding method into the  
driver adaptor which would then delegate to the DBI method but would  
be overridden for the PostgreSQL adapter to force all bindings to  
type string.

Which leads me back to the surprise observation that the parameter  
was bound as an integer in the first place, when DBD::Pg used to bind  
everything as string unless you told it otherwise. Which DBD::Pg  
version is it that you are using? I would suspect (or hope) that  
maybe there is soon an update release of DBD::Pg that fixes this  
problem by going back to binding everything as string by default (and  
as the tests show PostgreSQL will still convert strings to integer if  
necessary).

Depending on what I (or can someone else update us on this?) find out  
for the DBD::Pg plans, I'll probably start looking into moving the  
parameter binding into the driver adapters. Though it does feel  
pathetic that this is now also not transparent between drivers.

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From er at xs4all.nl  Thu Mar 20 20:51:43 2008
From: er at xs4all.nl (Erik)
Date: Fri, 21 Mar 2008 01:51:43 +0100 (CET)
Subject: [Bioperl-l] postgres 8.3 will not cast text to integer any
	longer
Message-ID: <4483.156.83.1.157.1206060703.squirrel@webmail.xs4all.nl>

On Fri, March 21, 2008 01:34, Hilmar Lapp wrote:
>
> So I think at this point with this change I have to
> declare Bioperl-
> db officially incompatible with PostgreSQL 8.3+ until
> we've found a
> solution to this, which is too bad because it seems 8.3
> has some
> really nice performance features added.

Pg 8.3 is indeed very noticeably faster, and it has other
excellent new features like full text indexing. (This also
means that downgrading is not really an option.)


> Which DBD::Pg version is it that you are using?

DBD::Pg 2.3.0


Thanks,

Erik Rijkers


From hlapp at gmx.net  Thu Mar 20 21:36:50 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 20 Mar 2008 21:36:50 -0400
Subject: [Bioperl-l] postgres 8.3 will not cast text to integer any
	longer
In-Reply-To: <4483.156.83.1.157.1206060703.squirrel@webmail.xs4all.nl>
References: <4483.156.83.1.157.1206060703.squirrel@webmail.xs4all.nl>
Message-ID: <071CB899-AB3E-40B8-9477-82AE98DB88B1@gmx.net>


On Mar 20, 2008, at 8:51 PM, Erik wrote:
> On Fri, March 21, 2008 01:34, Hilmar Lapp wrote:
>>
>> So I think at this point with this change I have to declare  
>> Bioperl-db officially incompatible with PostgreSQL 8.3+ until  
>> we've found a solution to this, which is too bad because it seems  
>> 8.3 has some really nice performance features added.
>
> Pg 8.3 is indeed very noticeably faster, and it has other
> excellent new features like full text indexing. (This also
> means that downgrading is not really an option.)

Right, I saw that too. It is, however, just migrated from what was a  
contrib module before, so downgrading and using the contrib module is  
an option.

Furthermore, folding these new features together with a behavior  
change that is backwards incompatible was a choice the PostgreSQL  
people made, not we.

We also aren't doing poor typing that deserves fixing; we're just not  
doing any typing by treating everything as a string. This is the Perl  
paradigm.

At this point it's actually unclear to me how this new behavior is  
compatible with untyped scripting languages unless you know the type  
of each column that you're binding a value for, because if you  
actually force typecasts to string for everything you get an error if  
an integer is indeed what's needed.

I'm wondering what I'm missing.

	-hilmar

BTW what does the following query yield on your 8.3.1 database:

select s.typname as source, t.typname as target, f.proname as function,
       c.castcontext
from pg_cast c, pg_type s, pg_type t, pg_proc f
where c.castsource = s.oid and c.casttarget = t.oid and c.castfunc = f.oid
  and t.typname = 'text';

On my 8.1.4 database I get:

   source    | target | function | castcontext
-------------+--------+----------+-------------
  bpchar      | text   | text     | i
  char        | text   | text     | i
  name        | text   | text     | i
  int8        | text   | text     | i
  int2        | text   | text     | i
  int4        | text   | text     | i
  oid         | text   | text     | i
  float4      | text   | text     | i
  float8      | text   | text     | i
  macaddr     | text   | text     | e
  cidr        | text   | text     | e
  inet        | text   | text     | e
  date        | text   | text     | i
  time        | text   | text     | i
  timestamp   | text   | text     | i
  timestamptz | text   | text     | i
  interval    | text   | text     | i
  timetz      | text   | text     | i
  numeric     | text   | text     | i
(19 rows)

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From greg at turnstep.com  Thu Mar 20 22:41:10 2008
From: greg at turnstep.com (Greg Sabino Mullane)
Date: Fri, 21 Mar 2008 02:41:10 -0000
Subject: [Bioperl-l] [BioSQL-l] postgres 8.3 will not cast text to
	integer any longer
In-Reply-To: <987C9C0E-840B-44AD-B3E9-0FC2809FF4F4@gmx.net>
Message-ID: <19ecb7a297f64722c4f63f10ed2ebdce@biglumber.com>


-----BEGIN PGP SIGNED MESSAGE-----
Hash: RIPEMD160


> Which leads me back to the surprise observation that the parameter
> was bound as an integer in the first place, when DBD::Pg used to bind
> everything as string unless you told it otherwise. Which DBD::Pg
> version is it that you are using? I would suspect (or hope) that
> maybe there is soon an update release of DBD::Pg that fixes this
> problem by going back to binding everything as string by default (and
> as the tests show PostgreSQL will still convert strings to integer if
> necessary).
>
> Depending on what I (or can someone else update us on this?) find out
> for the DBD::Pg plans, I'll probably start looking into moving the
> parameter binding into the driver adapters. Though it does feel
> pathetic that this is now also not transparent between drivers.

What you are probably looking for is already there, namely:

$dbh->{pg_server_prepare} = 0;

There's good reasons for the casting enforcement in 8.3, although I've
been a sharp critic of the change, and certainly of the suddeness
of it. Another solution to consider is adding the casts back in:

http://people.planetpostgresql.org/peter/index.php?/archives/2008/03.html
(the March 4th entry)

- --
Greg Sabino Mullane greg at turnstep.com
PGP Key: 0x14964AC8 200803202237
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
-----BEGIN PGP SIGNATURE-----

iEYEAREDAAYFAkfjIBYACgkQvJuQZxSWSsiamwCdEbNrC4F4oU7AGHrbHAm1YNXG
HbUAoIRJtGW4brvMKklxZYG6pusbcTqf
=Zawx
-----END PGP SIGNATURE-----


From David.Messina at sbc.su.se  Fri Mar 21 04:36:16 2008
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 21 Mar 2008 09:36:16 +0100
Subject: [Bioperl-l] bug in Bio::SeqIO::fastq or
	Bio::Seq::SeqWithQuality?
In-Reply-To: <195e25cb0803201510i4411fd1ctd1fdff40aecd12c7@mail.gmail.com>
References: <195e25cb0803201510i4411fd1ctd1fdff40aecd12c7@mail.gmail.com>
Message-ID: <628aabb70803210136p11de495p26d0ffaebbc3370e@mail.gmail.com>

Hi Joseph,

This looks like a bug; I saw the same thing here. Could you please submit
this to the bug tracker <http://www.bioperl.org/wiki/Bugs> along with your
test code?


Thanks,
Dave

From hlapp at gmx.net  Fri Mar 21 08:52:39 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Fri, 21 Mar 2008 08:52:39 -0400
Subject: [Bioperl-l] [BioSQL-l] postgres 8.3 will not cast text to
	integer any longer
In-Reply-To: <19ecb7a297f64722c4f63f10ed2ebdce@biglumber.com>
References: <19ecb7a297f64722c4f63f10ed2ebdce@biglumber.com>
Message-ID: <C24DE5CA-F433-48A1-BF08-A6D056A2EBCE@gmx.net>

Hi Greg - thanks for your email, it's very helpful.

On Mar 20, 2008, at 10:41 PM, Greg Sabino Mullane wrote:
>>
>> Depending on what I (or can someone else update us on this?) find out
>> for the DBD::Pg plans, I'll probably start looking into moving the
>> parameter binding into the driver adapters. Though it does feel
>> pathetic that this is now also not transparent between drivers.
>
> What you are probably looking for is already there, namely:
>
> $dbh->{pg_server_prepare} = 0;

So disabling server-side prepares will leave values quoted? Having  
server-side prepares would be very useful though, especially for  
Bioperl-db with its many lookup queries that all use similar  
parameter values.

>
> There's good reasons for the casting enforcement in 8.3

I do understand that, but it's also a sharp contrast to other RDBMSs  
that doesn't make it easier for people to choose Pg when they  
should, and doesn't help with writing cross-platform database applications  
either.

> although I've been a sharp critic of the change, and certainly of  
> the suddeness
> of it. Another solution to consider is adding the casts back in:
>
> http://people.planetpostgresql.org/peter/index.php?/archives/2008/03.html
> (the March 4th entry)


Thanks for this, that helps a lot.

Do you have links to some of the key threads showing what rationale  
went into the decision? (Or should I just search for your name?) I'd  
like to read up on that first before pouring more oil into the fire.  
I suspect that many of those who made the decision are never faced  
with needing to write cross-RDBMS code.

Also, I wonder why this wasn't made a configurable option so it can  
be disabled by a simple config file change (such as the move away  
from automatic OID columns). But obviously this is the wrong list for  
discussing this (though Bioperl-db *is* one of those pieces of  
software that must be cross-RDBMS).

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From baucom at msg.ucsf.edu  Fri Mar 21 16:13:00 2008
From: baucom at msg.ucsf.edu (Albion Baucom)
Date: Fri, 21 Mar 2008 13:13:00 -0700
Subject: [Bioperl-l] SearchIO Performance
Message-ID: <AE9E136C-6938-41DF-A973-3F3A5F4CBA77@msg.ucsf.edu>

Hi. I am pretty new to BioPerl, and have a question about performance  
with regard to Blast (nucleotide) file parsing. My Blast result files  
usually have close to 100 or more sequence hits. Each sequence is  
about 1400 nucleotides long.

After profiling code I wrote, I find that calling the next_result()  
function after creating a search object takes substantially longer  
than non-OO, quick and dirty code I am using to parse the same Blast  
files.

What is substantially longer? Well, the existing code takes about 0.25  
seconds, and the BioPerl call takes about 4.5 seconds. I find that to  
be a dramatic difference, and that kind of time difference becomes  
significant when I have to parse 30 Blast files in a row. I understand  
that SearchIO is parsing the entire file and storing it all for easy  
retrieval later, and maybe this time penalty is what I have to pay for  
that convenience and organization.

I am just wondering if there is anything other than writing custom  
code based on BioPerl to speed this up. Something I might not be aware  
of that I can do ahead of time, or during parsing, to limit what is  
parsed, or facilitate the parsing process. For instance, is there a  
way to "look ahead" and simply parse alignments that meet a specific  
expectancy cutoff?

I confess I have not read the documentation thoroughly (although  
obviously enough to make it do what I want), but am certainly willing  
to do so if someone can point me in the right direction.

Thanks

Albion

From jason at bioperl.org  Fri Mar 21 17:40:00 2008
From: jason at bioperl.org (Jason Stajich)
Date: Fri, 21 Mar 2008 14:40:00 -0700
Subject: [Bioperl-l] SearchIO Performance
In-Reply-To: <AE9E136C-6938-41DF-A973-3F3A5F4CBA77@msg.ucsf.edu>
References: <AE9E136C-6938-41DF-A973-3F3A5F4CBA77@msg.ucsf.edu>
Message-ID: <8448D3AD-82BF-4471-9346-C27DDE95DB4D@bioperl.org>


On Mar 21, 2008, at 1:13 PM, Albion Baucom wrote:

> Hi. I am pretty new to BioPerl, and have a question about  
> performance with regard to Blast (nucleotide) file parsing. My  
> Blast result files usually have close to 100 or more sequence hits.  
> Each sequence is about 1400 nucleotides long.
>
> After profiling code I wrote, I find that calling the next_result()  
> function after creating a search object takes substantially longer  
> than non-OO, quick and dirty code I am using to parse the same  
> Blast files.
>
> What is substantially longer? Well, the existing code takes about  
> 0.25 seconds, and the BioPerl call takes about 4.5 seconds. I find  
> that to be a dramatic difference, and that kind of time difference  
> becomes significant when I have to parse 30 Blast files in a row. I  
> understand that SearchIO is parsing the entire file and storing it  
> all for easy retrieval later, and maybe this time penalty is what I  
> have to pay for that convenience and organization.
>
> I am just wondering if there is anything other than writing custom  
> code based on BioPerl to speed this up. Something I might not be  
> aware of that I can do ahead of time, or during parsing, to limit  
> what is parsed, or facilitate the parsing process. For instance, is  
> there a way to "look ahead" and simply parse alignments that meet a  
> specific expectancy cutoff?
>
> I confess I have not read the documentation thoroughly (although  
> obviously enough to make it do what I want), but am certainly  
> willing to do so if someone can point me in the right direction.
>
We are quite aware of the speed issues.  This is discussed on the  
wiki in brief detail.
http://bioperl.org/wiki/Why_BioPerl_is_slow

It boils down to the object creation not the parsing (relatively  
speaking).  It takes a while because we're creating a lot of objects  
under the hood for each alignment.  Sendu has written a pull parser  
that doesn't require creation of all the objects until the user  
requests them.
As I've said in the past, if someone wrote a SearchIO event listener  
that created lightweight objects (or just hashes) instead, this would  
also provide a substantial speedup.

In the fall I did some experimentation with array-based instead of  
hash-based feature objects and got a pretty decent speedup as well, but  
I just haven't had any time to roll out a more substantial  
prototype.  For the inner loops of things it may make sense to  
substitute a less flexible but super-fast object.

I always advocate thinking about what your needs are - if you just  
want start/stop of alignments, you can grab this out of a blast  
format  table with the -m9 (NCBI) or --mformat =3 (WUBLAST) and you  
can write a fast parser that uses 'split'.
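
A minimal sketch of such a split-based parser, assuming the standard 12-column NCBI tabular layout (query id, subject id, % identity, alignment length, mismatches, gap openings, q. start, q. end, s. start, s. end, e-value, bit score) with '#' comment lines as produced by -m 9. The field names, the parse_tabular_line helper, the sample rows and the 1e-10 cutoff are all made up for illustration; in practice the lines would be read from a report file.

```perl
use strict;
use warnings;

# Column order of NCBI BLAST tabular output; the names are our own labels.
my @FIELDS = qw(query subject pct_identity aln_length mismatches gap_opens
                q_start q_end s_start s_end evalue bit_score);

# Turn one tab-delimited line into a hash ref; skip comments, blank
# lines and anything that does not have exactly 12 columns.
sub parse_tabular_line {
    my ($line) = @_;
    chomp $line;
    return if $line =~ /^#/ or $line !~ /\S/;
    my @values = split /\t/, $line;
    return unless @values == @FIELDS;
    my %hit;
    @hit{@FIELDS} = @values;
    return \%hit;
}

# Made-up sample rows standing in for a real report.
my @report = (
    "# Fields: Query id, Subject id, % identity, alignment length, ...",
    join("\t", qw(query1 subjA 98.50 1400 21 0 1 1400 101 1500 0.0 2500)),
    join("\t", qw(query1 subjB 85.00 200 30 0 10 209 55 254 1e-40 170)),
    join("\t", qw(query1 subjC 80.00 50 10 0 300 349 900 949 0.002 40.1)),
);

# Keep only hits at or below an expectation cutoff.
my $max_evalue = 1e-10;
my @kept;
for my $line (@report) {
    my $hit = parse_tabular_line($line) or next;
    next if $hit->{evalue} > $max_evalue;
    push @kept, $hit;
}

for my $hit (@kept) {
    print join("\t", @{$hit}{qw(subject q_start q_end s_start s_end evalue)}), "\n";
}
```

Plain hashes are used here on purpose: no objects are built per hit, which is the cost the message above identifies as dominant.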


> Thanks
>
> Albion


From er at xs4all.nl  Fri Mar 21 17:43:47 2008
From: er at xs4all.nl (Erik)
Date: Fri, 21 Mar 2008 22:43:47 +0100 (CET)
Subject: [Bioperl-l] [BioSQL-l] postgres 8.3 - load_seqdatabase.pl /
 swissprot
Message-ID: <16589.156.83.1.157.1206135827.squirrel@webmail.xs4all.nl>

Hi,

PostgreSQL 8.3.1
DBD::Pg 2.3.0
perl 5.8.8

(The following error may have to do with the 8.3 problems
that I reported yesterday (bug 2472) - I don't know)

 I ran biosql-schema/scripts/load_ncbi_taxonomy.pl without
problem.

Then I ran scripts/biosql/load_seqdatabase.pl as:

perl scripts/biosql/load_seqdatabase.pl \
  -driver Pg \
  -dbuser xxxxxxx \
  -dbname bioseqdb \
  -namespace swissprot \
  -format swiss \
   /DATA/ms/ftp.ebi.ac.uk/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.dat

It took two hours to load 26504 records (7%) of
uniprot_sprot.dat (is it expected to be so slow?), then
failed with:

Could not store Q2UXW0:
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: create: object (Bio::Species) failed to insert or to
be found by unique key
STACK: Error::throw
STACK: Bio::Root::Root::throw
/home/aardvark/bin/perl/lib/site_perl/5.8.8/Bio/Root/Root.pm:357
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create
/home/aardvark/bin/perl/lib/site_perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:206
STACK: Bio::DB::Persistent::PersistentObject::create
/home/aardvark/bin/perl/lib/site_perl/5.8.8/Bio/DB/Persistent/PersistentObject.pm:244
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create
/home/aardvark/bin/perl/lib/site_perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:169
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store
/home/aardvark/bin/perl/lib/site_perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251
STACK: Bio::DB::Persistent::PersistentObject::store
/home/aardvark/bin/perl/lib/site_perl/5.8.8/Bio/DB/Persistent/PersistentObject.pm:271
STACK: scripts/biosql/load_seqdatabase.pl:630
-----------------------------------------------------------


I don't know if this is directly related to the 8.3
casting problems I reported yesterday (bug 2472), or a
separate Bio::Species issue


regards,

Erik Rijkers


From bix at sendu.me.uk  Fri Mar 21 19:17:59 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 21 Mar 2008 23:17:59 +0000
Subject: [Bioperl-l] SearchIO Performance
In-Reply-To: <8448D3AD-82BF-4471-9346-C27DDE95DB4D@bioperl.org>
References: <AE9E136C-6938-41DF-A973-3F3A5F4CBA77@msg.ucsf.edu>
	<8448D3AD-82BF-4471-9346-C27DDE95DB4D@bioperl.org>
Message-ID: <47E44227.3050002@sendu.me.uk>

Jason Stajich wrote:
> 
> On Mar 21, 2008, at 1:13 PM, Albion Baucom wrote:
> 
>> Hi. I am pretty new to BioPerl, and have a question about performance 
>> with regard to Blast (nucleotide) file parsing.
[...]
>> What is substantially longer? Well, the existing code takes about 0.25 
>> seconds, and the BioPerl call takes about 4.5 seconds. I find that to 
>> be a dramatic difference, and that kind of time difference becomes 
>> significant when I have to parse 30 Blast files in a row. I understand 
>> that SearchIO is parsing the entire file and storing it all for easy 
>> retrieval later, and maybe this time penalty is what I have to pay for 
>> that convenience and organization.
[...]
> Sendu has written a pull parser that 
> doesn't require creation of all the objects until the user requests them.
> As I've said in the past, if someone wrote SearchIO event-listener that 
> created lightweight objects (or just hashes) instead this would also 
> provide a substantial speedup.

Yeah, you'll need BioPerl 1.5.2 (or the latest from svn) and to set the 
format to 'blast_pull'. Depending on the circumstances and with thoughtful 
usage, you can see orders-of-magnitude speedups.

http://doc.bioperl.org/bioperl-live/Bio/SearchIO/blast_pull.html

The only disadvantage compared to the normal parser is that the pull parser 
currently only supports NCBI BLASTN and BLASTP.

From hlapp at gmx.net  Sat Mar 22 14:18:45 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 22 Mar 2008 14:18:45 -0400
Subject: [Bioperl-l] Call for Student Applications - NESCent participates in
	the Google Summer of Code
In-Reply-To: <0025B440-EF1E-4632-9DB4-B98489BF3550@duke.edu>
Message-ID: <5AC4F213-8D88-41C6-B380-59B2EF7831F0@gmx.net>

Hi all - just wanted to draw your attention to our Google Summer of  
Code participation this year. One of the projects deals directly with  
BioPerl, another one builds on BioSQL (and could be implemented  
taking advantage of BioPerl or Bio::Phylo, or Biojava).

Cheers,

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================

Phyloinformatics Summer of Code 2008
http://phyloinformatics.net/Phyloinformatics_Summer_of_Code_2008

*** Please disseminate this announcement widely to appropriate students
at your institution ***

The National Evolutionary Synthesis Center (NESCent:
http://www.nescent.org/) is participating in 2008 for the second year as a
mentoring organization in the Google Summer of Code
(http://code.google.com/soc). Through this program, Google provides
undergraduate, masters, and PhD students with a unique opportunity to
obtain hands-on experience writing and extending open-source software
under the mentorship of experienced developers from around the world.

Our goal in participating is to train future researchers and  
developers to not only have awareness and understanding of the value  
of open-source and collaboratively developed software, but also to  
gain the programming and remote collaboration skills needed to  
successfully contribute to such projects. Students will receive a  
stipend from Google, and may work from their home, or home  
institution, for the duration of the 3 month program. Students will  
each have one or more dedicated mentors with expertise in  
phylogenetic methods and open-source software development.

NESCent is particularly targeting students interested in both  
evolutionary biology and software development. Project ideas (see URL  
below) range from visualizing phylogenetic data in R, to development  
of a Mesquite module, web-services for phylogenetic data providers or  
geophylogeny mashups, implementing phyloXML support, navigating  
databases of networks, topology queries for PhyloCode registries, to  
phylogenetic tree mining in a MapReduce framework, and more.

The project ideas are flexible and many can be adjusted in scope to  
match the skills of the student. If the program sounds interesting to  
you but you are unsure whether you have the necessary skills, please  
email the mentors at the address below.  We will work with you to  
find a project that fits your interests and skills.

INQUIRIES:
Email any questions, including self-proposed project ideas, to  
phylosoc {at}
nescent {dot} org.

TO APPLY:
Apply on-line at the Google Summer of Code website
(http://code.google.com/soc/2008), where you will also find GSoC program
rules and eligibility requirements.  The 1-week application period for
students opens on Monday March 24th and runs through Monday, March  
31st, 2008.

Hilmar Lapp and Todd Vision
US National Evolutionary Synthesis Center

=====
URLs:
=====

2008 NESCent Phyloinformatics Summer of Code:
http://phyloinformatics.net/Phyloinformatics_Summer_of_Code_2008

Eligibility requirements:
http://code.google.com/opensource/gsoc/2008/faqs.html#0.1_eligibility

Stipends:
http://code.google.com/opensource/gsoc/2008/faqs.html#0.1_administrivia

To sign up for quarterly NESCent newsletters with announcements about
upcoming programs at the Center:
http://www.nescent.org/about/contact.php


From hlapp at gmx.net  Sat Mar 22 15:30:07 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 22 Mar 2008 15:30:07 -0400
Subject: [Bioperl-l] postgres 8.3 - load_seqdatabase.pl / swissprot
In-Reply-To: <16589.156.83.1.157.1206135827.squirrel@webmail.xs4all.nl>
References: <16589.156.83.1.157.1206135827.squirrel@webmail.xs4all.nl>
Message-ID: <14191FB6-A8DF-4F60-9FF7-FDAA8F3974B8@gmx.net>

Hi Erik,

I suspect that's a separate Bio::Species issue. If you query your
BioSQL database for the existence of the taxon:

SELECT * FROM taxon WHERE ncbi_taxon_id = 326939;

do you get a result? If not, then for some reason the taxon wasn't  
yet included in the NCBI taxonomy that you loaded.

If yes, then somehow BioPerl didn't properly parse out the taxon ID
from the record. There should have been another message preceding the
error below, could you post that too? Otherwise, can you rerun with
--printerror as a command line argument?

Note also that you can always specify --safe to go past any loading  
error. In fact that's what I recommend doing unless you want to debug  
why a particular record doesn't load.

BTW I would recommend that you restore the CASTs that were removed in  
Pg 8.3; otherwise you may hit random issues in Bioperl-db whenever a  
parameter value for a string-type column happens to be a number.  
(taxon.ncbi_taxon_id is of type integer)

See http://people.planetpostgresql.org/peter/index.php?/archives/18-Readding-implicit-casts-in-PostgreSQL-8.3.html

as per Greg's email.

	-hilmar

On Mar 21, 2008, at 5:43 PM, Erik wrote:
> Hi,
>
> PostgreSQL 8.3.1
> DBD::Pg 2.3.0
> perl 5.8.8
>
> (The following error may have to do with the 8.3 problems
> that I reported yesterday (bug 2472) - I don't know)
>
>  I ran biosql-schema/scripts/load_ncbi_taxonomy.pl without
> problem.
>
> Then I ran scripts/biosql/load_seqdatabase.pl as:
>
> perl scripts/biosql/load_seqdatabase.pl \
>   -driver Pg \
>   -dbuser xxxxxxx \
>   -dbname bioseqdb \
>   -namespace swissprot \
>   -format swiss \
>    /DATA/ms/ftp.ebi.ac.uk/pub/databases/uniprot/current_release/ 
> knowledgebase/complete/uniprot_sprot.dat
>
> It took two hours to load 26504 records (7%) of
> uniprot_sprot.dat (is it expected to be so slow?), then
> failed with:
>
> Could not store Q2UXW0:
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: create: object (Bio::Species) failed to insert or to
> be found by unique key
> STACK: Error::throw
> STACK: Bio::Root::Root::throw
> /home/aardvark/bin/perl/lib/site_perl/5.8.8/Bio/Root/Root.pm:357
> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create
> /home/aardvark/bin/perl/lib/site_perl/5.8.8/Bio/DB/BioSQL/ 
> BasePersistenceAdaptor.pm:206
> STACK: Bio::DB::Persistent::PersistentObject::create
> /home/aardvark/bin/perl/lib/site_perl/5.8.8/Bio/DB/Persistent/ 
> PersistentObject.pm:244
> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create
> /home/aardvark/bin/perl/lib/site_perl/5.8.8/Bio/DB/BioSQL/ 
> BasePersistenceAdaptor.pm:169
> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store
> /home/aardvark/bin/perl/lib/site_perl/5.8.8/Bio/DB/BioSQL/ 
> BasePersistenceAdaptor.pm:251
> STACK: Bio::DB::Persistent::PersistentObject::store
> /home/aardvark/bin/perl/lib/site_perl/5.8.8/Bio/DB/Persistent/ 
> PersistentObject.pm:271
> STACK: scripts/biosql/load_seqdatabase.pl:630
> -----------------------------------------------------------
>
>
> I don't know if this is directly related to the 8.3
> casting problems I reported yesterday (bug 2472), or a
> separate Bio::Species issue
>
>
> regards,
>
> Erik Rijkers
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From hlapp at gmx.net  Sat Mar 22 16:01:51 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 22 Mar 2008 16:01:51 -0400
Subject: [Bioperl-l] [BioSQL-l] postgres 8.3 - load_seqdatabase.pl /
	swissprot
In-Reply-To: <16589.156.83.1.157.1206135827.squirrel@webmail.xs4all.nl>
References: <16589.156.83.1.157.1206135827.squirrel@webmail.xs4all.nl>
Message-ID: <69D3EA33-810B-40EA-8687-752FA1A34FBF@gmx.net>

Forgot to respond to this:

On Mar 21, 2008, at 5:43 PM, Erik wrote:
> It took two hours to load 26504 records (7%) of uniprot_sprot.dat  
> (is it expected to be so slow?)


When I last loaded those regularly it was a bit faster (~5 seqs/s),
but your rate is in a ballpark that wouldn't raise a red flag for me.

BTW you can make it print statistics using the --logchunk N option,  
where N is the number of seqs after which you want the current count  
and the #recs/s printed.
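
For a rough sense of scale, the figures quoted above (26504 records in
two hours, about 7% of the file) can be turned into a rate and a
projected total load time. This is only back-of-the-envelope arithmetic:
the 7% figure is approximate, so the projection is too, and the variable
names are made up for illustration.

```perl
use strict;
use warnings;

# Figures quoted in this thread; the fraction is approximate.
my $records_loaded = 26504;
my $elapsed_sec    = 2 * 60 * 60;
my $fraction_done  = 0.07;

my $rate      = $records_loaded / $elapsed_sec;    # records per second
my $est_total = $records_loaded / $fraction_done;  # rough number of records in the file
my $est_hours = $est_total / $rate / 3600;         # projected time for a full load

printf "rate: %.2f recs/s\n",              $rate;
printf "estimated total records: %.0f\n",  $est_total;
printf "estimated full load: %.1f hours\n", $est_hours;
```

The same division is what a per-chunk "#recs/s" figure boils down to:
records processed divided by the seconds elapsed.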

You may get it to be faster if you tune the database (e.g., make sure  
there is enough memory for index reorganization, transaction log and  
tablespace datafile are on separate disks, etc; fiddling with the  
query optimizer has probably little effect as almost all queries are  
simple lookups or inserts).

That all said, the strength of load_seqdatabase.pl isn't speed. It  
doesn't make use of any bulk upload optimizations, and therefore the  
initial load of a very large database will take its time. The power  
is more in subsequent updates where you can configure what you want  
to happen, and during which the database is never in an inconsistent  
state, so it can run in the background.

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From er at xs4all.nl  Sat Mar 22 16:34:14 2008
From: er at xs4all.nl (Erik)
Date: Sat, 22 Mar 2008 21:34:14 +0100 (CET)
Subject: [Bioperl-l] postgres 8.3 - load_seqdatabase.pl / swissprot
Message-ID: <5975.156.83.1.157.1206218054.squirrel@webmail.xs4all.nl>

On Sat, March 22, 2008 20:30, Hilmar Lapp wrote:

> SELECT * FROM taxon WHERE ncbi_taxon_id = 326939;

No, I don't seem to have that particular id, although I
ran the ncbi load script yesterday just before the sprot. 
Btw, in the meantime I figured out that it was a parsing
error choking on an unexpected period.

You asked for preceding errors, but there were none.

I have now restarted the same uniprot_sprot.dat load with
--safe, which if I understand you correctly will just skip
any non-parsable records.


And wrt the postgres 8.3 casting:

I only added the first cast of the list of Peter Eisentraut:

CREATE FUNCTION pg_catalog.text(integer) RETURNS text
STRICT IMMUTABLE LANGUAGE SQL AS 'SELECT
textin(int4out($1));'; --added 20080322
CREATE CAST (integer AS text) WITH FUNCTION
pg_catalog.text(integer) AS IMPLICIT; --added 20080322

I hope eventually a more durable solution will be found -
I fear this reinstalling of old casting functionality will
generate unexpected problems of its own.

But it seems a good intermediary solution; with it, the
previously failing t/16odba.t succeeds...


Thank you,

Erik Rijkers


From hlapp at gmx.net  Sat Mar 22 17:16:18 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 22 Mar 2008 17:16:18 -0400
Subject: [Bioperl-l] postgres 8.3 - load_seqdatabase.pl / swissprot
In-Reply-To: <5975.156.83.1.157.1206218054.squirrel@webmail.xs4all.nl>
References: <5975.156.83.1.157.1206218054.squirrel@webmail.xs4all.nl>
Message-ID: <3C253027-5A2B-4C0C-9DF6-A0DA84CC96A8@gmx.net>


On Mar 22, 2008, at 4:34 PM, Erik wrote:
> On Sat, March 22, 2008 20:30, Hilmar Lapp wrote:
>
>> SELECT * FROM taxon WHERE ncbi_taxon_id = 326939;
>
> No, I don't seem to have that particular id, although I
> ran the ncbi load script yesterday just before the sprot.

Odd. It's on the NCBI taxonomy browser. Maybe it was just added the
other day?

> Btw, in the meantime I figured out that it was a parsing
> error choking on an unexpected period.

Do you want to report that to the BioPerl category on
bugzilla.open-bio.org?

>
> You asked for preceding errors, but there were none.
>
> I have now restarted the same uniprot_sprot.dat load with
> --safe, which if I understand you correctly will just skip
> any non-parsable records.

And all records that cause some other database error when inserting.  
Note that if you didn't erase the previously loaded records, you will  
either need to choose a new namespace, or, better, use the --lookup  
and --noupdate flags.

	-hilmar

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From mrphysh at juno.com  Sat Mar 22 18:39:22 2008
From: mrphysh at juno.com (mrphysh at juno.com)
Date: Sat, 22 Mar 2008 22:39:22 GMT
Subject: [Bioperl-l] these objects are pretty cool
Message-ID: <20080322.163922.21808.1@webmail01.vgs.untd.com>


I am starting to understand how to use the objects.  

I am the sort who wants to understand how things work, at least on some level.  I think my understanding would be increased with knowledge of the actual contents of the object.  My book says they are hashes and that makes sense:  field-value.......field-value.......field-value 

But as far as I can tell they cannot be taken apart like a regular hash.

How can I print out the contents?  this makes sense to me:
xxxxxxxxxxxxxxxx
use Bio::SeqIO;

#these objects were made for file input...conversion...file output

while ( my $seq = $out->next_seq() ) {print "$seq\n"; }                        

#or..how about this?

foreach ( my $seqq = $in->next_seq() ) {print "$seqq\n"; }   
xxxxxxxxxxxxxx
this gives back:

Bio::Seq::RichSeq=HASH(0x860dcdc)
Bio::Seq=HASH(0x85f5a20)

Is there some way to slice up the object and look at the parts?

John 


From jason at bioperl.org  Sat Mar 22 18:46:46 2008
From: jason at bioperl.org (Jason Stajich)
Date: Sat, 22 Mar 2008 15:46:46 -0700
Subject: [Bioperl-l] these objects are pretty cool
In-Reply-To: <20080322.163922.21808.1@webmail01.vgs.untd.com>
References: <20080322.163922.21808.1@webmail01.vgs.untd.com>
Message-ID: <DA99BA50-B604-4DE1-BE43-194E91236F5F@bioperl.org>

it's got methods that you need to call to get the data.  Did you try  
looking at any of the howtos - they discuss this sort of thing.

http://bioperl.org/wiki/HOWTOs

-jason
On Mar 22, 2008, at 10:39 PM, mrphysh at juno.com wrote:

>
> I am starting to understand how to use the objects.
>
> I am the sort who wants to understand how things work, at least on  
> some level.  I think my understanding wold be increased with  
> knowledge of the actual contents of the object.  My book says they  
> are hashes and that makes sense:  field-value.......field- 
> value.......field-value
>
> But as far as I can tell they cannot be taken apart like a regular  
> hash.
>
> How can I print out the contents?  this makes sense to me:
> xxxxxxxxxxxxxxxx
> use Bio::SeqIO;
>
> #these objects were made for file input...conversion...file output
>
> while ( my $seq = $out->next_seq() ) {print "$seq\n"; }
>
> #or..how about this?
>
> foreach ( my $seqq = $in->next_seq() ) {print "$seqq\n"; }
> xxxxxxxxxxxxxx
> this gives back:
>
> Bio::Seq::RichSeq=HASH(0x860dcdc)
> Bio::Seq=HASH(0x85f5a20)
>
> Is there some way to slice up the object and look at the parts?
>
> John
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From robfsouza at gmail.com  Sat Mar 22 19:11:49 2008
From: robfsouza at gmail.com (Robson Francisco de Souza)
Date: Sat, 22 Mar 2008 20:11:49 -0300
Subject: [Bioperl-l] these objects are pretty cool
In-Reply-To: <DA99BA50-B604-4DE1-BE43-194E91236F5F@bioperl.org>
References: <20080322.163922.21808.1@webmail01.vgs.untd.com>
	<DA99BA50-B604-4DE1-BE43-194E91236F5F@bioperl.org>
Message-ID: <af6a4f100803221611w393e2dd1r965c11afd45978aa@mail.gmail.com>

Hi John,

The methods usually are designed to provide simple means to access the
object's internal data structure. They are intended to hide the
internal data structure, replacing the need to understand it by a
binding to the documented class methods.

If the documentation does not satisfy you and you still want to take a
look at its data structure, try dumping the object with a code like

use Bio::SeqIO;
use Data::Dumper;
while ( my $seq = $out->next_seq() ) { print Dumper($seq),"\n"; }

and check out chapters four and five of Programming Perl.
Best,
Robson

PS: watch out for lots of printed output...
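
To see why printing an object gives something like
Bio::Seq=HASH(0x...), here is a minimal sketch using a made-up class.
Toy::Seq is hypothetical and not part of BioPerl; it only mimics the
general pattern of a blessed hash with accessor methods, which is an
assumption about how the real classes are laid out internally.

```perl
use strict;
use warnings;
use Data::Dumper;

# Toy::Seq is a made-up stand-in for illustration, NOT a BioPerl class.
package Toy::Seq;

sub new {
    my ($class, %args) = @_;
    # The object is just a hash reference tied to a class name by bless.
    my $self = { _id => $args{-id}, _seq => $args{-seq} };
    return bless $self, $class;
}

# Accessor methods: the documented way to get at the data.
sub display_id { return $_[0]{_id} }
sub seq        { return $_[0]{_seq} }

package main;

my $obj = Toy::Seq->new(-id => 'demo1', -seq => 'ATGCATGC');

print "$obj\n";                       # class name plus memory address
print $obj->display_id, "\n";         # via an accessor
print join(', ', sort keys %$obj), "\n";  # it can be treated as a plain hash

$Data::Dumper::Sortkeys = 1;          # stable key order in the dump
print Dumper($obj);
```

Key names inside real BioPerl objects are internal details and may
change between releases, so a dump is best treated as a debugging aid
rather than something to build code on.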

2008/3/22, Jason Stajich <jason at bioperl.org>:
> it's got methods that you need to call to get the data.  Did you try
>  looking at any of the howtos - they discuss this sort of thing.
>
>  http://bioperl.org/wiki/HOWTOs
>
>
>  -jason
>
> On Mar 22, 2008, at 10:39 PM, mrphysh at juno.com wrote:
>
>  >
>  > I am starting to understand how to use the objects.
>  >
>  > I am the sort who wants to understand how things work, at least on
>  > some level.  I think my understanding wold be increased with
>  > knowledge of the actual contents of the object.  My book says they
>  > are hashes and that makes sense:  field-value.......field-
>  > value.......field-value
>  >
>  > But as far as I can tell they cannot be taken apart like a regular
>  > hash.
>  >
>  > How can I print out the contents?  this makes sense to me:
>  > xxxxxxxxxxxxxxxx
>  > use Bio::SeqIO;
>  >
>  > #these objects were made for file input...conversion...file output
>  >
>  > while ( my $seq = $out->next_seq() ) {print "$seq\n"; }
>  >
>  > #or..how about this?
>  >
>  > foreach ( my $seqq = $in->next_seq() ) {print "$seqq\n"; }
>  > xxxxxxxxxxxxxx
>  > this gives back:
>  >
>  > Bio::Seq::RichSeq=HASH(0x860dcdc)
>  > Bio::Seq=HASH(0x85f5a20)
>  >
>  > Is there some way to slice up the object and look at the parts?
>  >
>  > John
>  >
>  >
>  >
>  > _______________________________________________
>  > Bioperl-l mailing list
>  > Bioperl-l at lists.open-bio.org
>  > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>  _______________________________________________
>  Bioperl-l mailing list
>  Bioperl-l at lists.open-bio.org
>  http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From er at xs4all.nl  Sat Mar 22 19:36:13 2008
From: er at xs4all.nl (Erik)
Date: Sun, 23 Mar 2008 00:36:13 +0100 (CET)
Subject: [Bioperl-l] postgres 8.3 - load_seqdatabase.pl / swissprot
Message-ID: <12669.156.83.1.157.1206228973.squirrel@webmail.xs4all.nl>

Hi Hilmar,

> either need to choose a new namespace, or, better, use the
> --lookup and --noupdate flags.

scripts/biosql/load_seqdatabase.pl is now churning along
to load uniprot_sprot.dat.  I'll try to gather up the
rejected records for further inspection / parser
improvement.

The next thing is performance, it's really intolerably
slow, and I don't think the database is the bottleneck -
isn't it more likely bioperl object heaviness?  I get
continuous near 100% load for 1 cpu (this machine has 2
cpus).

I could give it 10 or more processors; I am thinking I
could cut up the input into 10 (or more) chunks.

Is there anything specific in bioperl/biosql that knows
how to use multiple cores?
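
The chunking idea could look roughly like the sketch below. It assumes
Swiss-Prot style records, each terminated by a line containing only
"//", and Unix line endings; the function name and the chunk file
naming are made up for illustration. Whether several loader processes
can safely run against the same database at once is a separate
question that this sketch does not address.

```perl
use strict;
use warnings;

# Deal whole records round-robin into $n_chunks output files.
# Returns the number of records written.
sub split_flatfile {
    my ($infile, $n_chunks, $prefix) = @_;

    local $/ = "\n//\n";    # read one complete record per <$in>

    open my $in, '<', $infile or die "Cannot read $infile: $!";

    my @out;
    for my $i (0 .. $n_chunks - 1) {
        open my $fh, '>', "$prefix.$i" or die "Cannot write $prefix.$i: $!";
        push @out, $fh;
    }

    my $count = 0;
    while (my $record = <$in>) {
        next unless $record =~ /\S/;    # ignore trailing whitespace
        print { $out[ $count % $n_chunks ] } $record;
        $count++;
    }

    close $_ for $in, @out;
    return $count;
}

# Hypothetical usage: perl split_flatfile.pl uniprot_sprot.dat 10 sprot_chunk
if (@ARGV == 3) {
    my $n = split_flatfile(@ARGV);
    print "wrote $n records\n";
}
```

Round-robin dealing keeps the chunks close to equal in record count
without needing to know the total number of records in advance.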

thank you very much for your help

Erik Rijkers


From hlapp at gmx.net  Sat Mar 22 21:40:55 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 22 Mar 2008 21:40:55 -0400
Subject: [Bioperl-l] postgres 8.3 - load_seqdatabase.pl / swissprot
In-Reply-To: <12669.156.83.1.157.1206228973.squirrel@webmail.xs4all.nl>
References: <12669.156.83.1.157.1206228973.squirrel@webmail.xs4all.nl>
Message-ID: <1B55060F-534D-4732-B428-4863DD098654@gmx.net>


On Mar 22, 2008, at 7:36 PM, Erik wrote:
> The next thing is performance, it's really intolerably
> slow, and I don't think the database is the bottleneck -
> isn't it more likely bioperl object heaviness?  I get
> continuous near 100% load for 1 cpu (this machine has 2
> cpus).


Is the database on the same machine? If yes, and a significant
fraction (~30-50% or even more) of the load is generated by the perl
script, rather than almost everything coming from the postmaster,
then indeed the database is not the bottleneck.

Of course, the bioperl object creation overhead takes a toll too. I  
would be surprised though if BioPerl can't parse more than 3.6  
records/s on a modern CPU; you can convince yourself of that though  
by writing a simple script along the lines of the following and see  
how fast that goes:

use Bio::SeqIO;

my $seqio = Bio::SeqIO->new(-file   => '<uniprot_sprot.dat',
                            -format => 'swiss');
my $n = 0;
while (my $seq = $seqio->next_seq) {
	$n++;
	# print something every 5,000 sequences or so
	print STDERR "$n sequences parsed\n" if $n % 5000 == 0;
}

But maybe load_seqdatabase.pl or even BioSQL or BioPerl aren't  
suitable for your use-case?

	-hilmar

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cjfields at uiuc.edu  Sun Mar 23 10:09:56 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 23 Mar 2008 09:09:56 -0500
Subject: [Bioperl-l] Using Bioperl book
In-Reply-To: <0AFC1124-EE57-4168-BBF3-98845F0138C9@bioperl.org>
References: <OF134946F0.4C537CC5-ONC1257411.00542F3C-C1257411.00549AFA@LGLimagrain.com>
	<0AFC1124-EE57-4168-BBF3-98845F0138C9@bioperl.org>
Message-ID: <4C401D4F-064C-43F9-A37C-14FA65A96657@uiuc.edu>

Maybe something to discuss at BOSC?

chris

On Mar 19, 2008, at 12:54 PM, Jason Stajich wrote:

> it's probably more than 6 months out. We still haven't finished  
> writing it as life and work continues to intrude on book writing.
>
> -jason
> On Mar 19, 2008, at 8:22 AM, Jorge.DUARTE at biogemma.com wrote:
>
>> Hello,
>>
>> i just found on amazon something about a book "Using Bioperl",  
>> published
>> on the 1st of March 2008 but which is no more available.
>>
>> Does anyone know how to get it ?
>>
>> Many thanks,
>>
>> Jorge.
>>
>> ---
>> Jorge Duarte
>> Bioinformatics Software Engineer
>> BIOGEMMA
>> Z.I. Du Brézet
>> 8, Rue des Frères Lumière
>> 63028 CLERMONT FERRAND Cedex 2
>> FRANCE
>> Tel : +33 (0)4 73 39 60 73
>> Fax : +33 (0)4 73 39 60 71
>> E-mail : jorge.duarte at biogemma.com
>>
>> *****************************************************************
>>       For any support request, please include
>> BIOGEMMA_BioInfo_Service or bioinfo at biogemma.com
>>         among the recipients on first contact
>> *****************************************************************
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Sun Mar 23 10:17:56 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 23 Mar 2008 09:17:56 -0500
Subject: [Bioperl-l] Priorities for a bioperl-1.6 release
In-Reply-To: <47DFE089.1070304@sendu.me.uk>
References: <OF9C92ED19.F51E3401-ON852573F0.006310A3-852573F0.006383D1@gsk.com>
	<47DFE089.1070304@sendu.me.uk>
Message-ID: <FC757880-6FD8-44A5-97BE-F5F37A738C2B@uiuc.edu>


On Mar 18, 2008, at 10:32 AM, Sendu Bala wrote:

> aaron.j.mackey at gsk.com wrote:
>>> Or is the split intended to be 'core' == "anything and everything
>>> that was in 1.4", '????' == "everything else"? In which case,
>>> what's a good name for "modules created after 1.4"? 'crust'? ;)
>> Nah, "icing".
>> a module "use" map might be very useful to help identify "core" vs.
>> other layers of mantle/crust/icing.
>> http://www.perlmonks.org/?node_id=87329 http://search.cpan.org/src/NEILB/pmusage-1.2/
>
> Thanks for those. Neither could quite cope with BioPerl, but I've  
> munged
> them together and hacked up 'module_usage.pl' which I've just  
> committed
> to the maintenance directory of bioperl-live.
>
> module_usage.pl ../Bio
>
> Produces:
> *warning, may crash your browser; download it and view in a dedicated
> image viewer*
> http://bix.sendu.me.uk/files/module_usage.jpeg
> http://bix.sendu.me.uk/files/module_usage.txt
>
> ...
>
> I haven't done any full analysis along these lines and leave as an
> exercise for the interested reader for now ;)

I'm coming into this late (just got back) but I agree, this would be  
very useful.  Your updates based on Aaron's comments help quite a bit.
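
For readers who want a feel for what such a usage map involves, here
is a minimal sketch. It is not Sendu's module_usage.pl; it is a crude
line-based scan that assumes dependencies appear on a single "use",
"require" or "use base" line, and the function names are made up. It
skips POD so that SYNOPSIS examples are not counted.

```perl
use strict;
use warnings;
use File::Find;

# Map each .pm file under $dir to the set of Bio:: modules it loads.
sub module_usage {
    my ($dir) = @_;
    my %uses;
    find(sub {
        return unless /\.pm$/ && -f $_;
        my $file = $File::Find::name;
        open my $fh, '<', $_ or return;
        my $in_pod = 0;
        while (my $line = <$fh>) {
            last if $line =~ /^__END__/;
            if ($line =~ /^=cut\b/)    { $in_pod = 0; next }
            if ($line =~ /^=[a-zA-Z]/) { $in_pod = 1; next }
            next if $in_pod;
            next unless $line =~ /^\s*(?:use|require)\b/;
            # Collect every Bio::Something name on the line.
            $uses{$file}{$_} = 1 for $line =~ /\b(Bio(?:::\w+)+)/g;
        }
        close $fh;
    }, $dir);
    return \%uses;
}

# Count how many files load each module.
sub usage_counts {
    my ($uses) = @_;
    my %count;
    for my $file (keys %$uses) {
        $count{$_}++ for keys %{ $uses->{$file} };
    }
    return \%count;
}

# Hypothetical usage: perl usage_sketch.pl ../Bio
if (@ARGV) {
    my $count = usage_counts(module_usage($ARGV[0]));
    printf "%5d  %s\n", $count->{$_}, $_
        for sort { $count->{$b} <=> $count->{$a} || $a cmp $b } keys %$count;
}
```

Modules near the top of such a list are natural candidates for
whatever ends up being called 'core', which is the point of building
the map in the first place.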

> Chris Fields wrote:
>> http://www.bioperl.org/wiki/Talk:Proposed_1.6_core_modules
>> I'm pretty flexible on any of that; it's a proposal only and I think
>> some of it may be wrongheaded, but hey, I'm willing to take a few
>> rotten tomatoes.  The key issue is we should try to work out what we
>> mean by 'core' or the core library.  I have a rather extreme view of
>> it as being the bare essentials without external, non-perl core
>> dependencies (only SeqI/PrimarySeqI, AlignI, AnnotationI, SeqFeatureI
>> and required modules for those classes) but I'm sure others would
>> lump in parsers, DB functionality, etc.  I basically suggest placing
>> those (and any stable but potentially non-core code) in a
>> 'bioperl-main', with any unstable or untested code going into a
>> 'bioperl-unstable'.
>
> My thoughts are along these lines:
> # I agree that core should have no external dependencies
> # I agree that it might mostly be interfaces
> # It should represent a framework with all the interfaces (that have
>  stable APIs), directory structure and base classes that everything
>  else relies on
> # It might not do much useful bioinformatics, but provides just about
>  everything needed for a dev to create a new module that does

Yes, that's essentially the idea.

>> In essence, bioperl-main would require core and resemble a stable
>> release; bioperl-unstable would require bioperl-main (and core) and
>> resemble a dev release.  Not sure how versioning would go or if this
>> is a viable option at all, but it's worth discussing.
>
> # I agree that this 3-way split seems reasonable
> # bioperl-main would consist primarily of the 'leaves' of the module
>  tree, mostly parsers and the like which, whilst 'stable' and tested
>  should still be split away from core because the data sources they
>  parse could change format slightly
> # bioperl-unstable, better bioperl-bleed, would feature brand-new
>  stuff, be it new parsers for totally new formats, new APIs that do
>  something not thought of before etc. When they are complete, bug-free
>  and have stood the test of time they get moved into bioperl-main.
>  (It is not a place for all new commits; bug fixes to something in
>  bioperl-main would be committed to bioperl-main)
> # The current splits (bioperl-run, bioperl-network etc.) do not get
>  their own core and bleed variant. Anything they need for core
>  functionality would enter the single bioperl-core, anything new
>  would enter the single bioperl-bleed, and anything stable would
>  be in their own bioperl-[package]
>
> Discuss :)

We can work on updating the plan via the wiki as well as the mail  
list.  I find it easier to track; we can always link back to the mail  
list when needed.

http://www.bioperl.org/wiki/Proposed_1.6_core_modules
http://www.bioperl.org/wiki/Talk:Proposed_1.6_core_modules

chris

From er at xs4all.nl  Sun Mar 23 14:16:05 2008
From: er at xs4all.nl (Erik)
Date: Sun, 23 Mar 2008 19:16:05 +0100 (CET)
Subject: [Bioperl-l] postgres 8.3 - load_seqdatabase.pl / swissprot
Message-ID: <8382.156.83.1.157.1206296165.squirrel@webmail.xs4all.nl>

On Sun, March 23, 2008 02:40, Hilmar Lapp wrote:
> But maybe load_seqdatabase.pl or even BioSQL or BioPerl
> aren't suitable for your use-case?

well, that may turn out to be the case, but I'm not quite
deterred yet.

I am in a situation like many others, I think: microarray,
mass spec, and chipseq (Solexa) data all need
annotation, and while it is easy to retrieve some useful
records from public data sources (entrez, ensembl &
biomart, etc.), it is not so easy to have such high
atomicity in the locally stored annotation data that
fine-grained filtering and sorting on a sql level becomes
possible.  I hope the bioperl parsers, together with the
biosql schema, will give SQL access to all or most data
bits.

And I understand GBrowse can run on top of BioSQL/Pg too,
albeit somewhat preliminary; this is another usage I will
need.

btw, should not all those references to postgres 7.3 be
upgraded to something newer, like 8.2.7 (maybe not yet 8.3
heh) ?  7.3 is not supported anymore by the pg project.

Sprot loaded in 20 hours. Only 170 were rejected - not too
bad.

Thanks,

Erik Rijkers


From hlapp at gmx.net  Sun Mar 23 15:22:46 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sun, 23 Mar 2008 15:22:46 -0400
Subject: [Bioperl-l] postgres 8.3 - load_seqdatabase.pl / swissprot
In-Reply-To: <8382.156.83.1.157.1206296165.squirrel@webmail.xs4all.nl>
References: <8382.156.83.1.157.1206296165.squirrel@webmail.xs4all.nl>
Message-ID: <EF4031EF-FBB5-493E-8D3C-8530E2ADFC20@gmx.net>


On Mar 23, 2008, at 2:16 PM, Erik wrote:
> On Sun, March 23, 2008 02:40, Hilmar Lapp wrote:
>> But maybe load_seqdatabase.pl or even BioSQL or BioPerl
>> aren't suitable for your use-case?
>
> well, that may turn out to be the case, but I'm not quite
> deterred yet.
>
> I am in a situation like many others, I think: microarray,
> mass spec, and chipseq (Solexa) data all need
> annotation,and while it is easy to retrieve some useful
> records from public data sources (entrez, ensembl &
> biomart, etc.), it is not so easy to have such high
> atomicity in the locally stored annotation data that
> fine-grained filtering and sorting on a sql level becomes
> possible.  I hope the bioperl parsers, together with the
> biosql schema, will give SQL access to all or most data
> bits.

If you mean annotation by data bits then yes, it should be fairly  
normalized (possibly more normalized than you want, in fact).

Also, using BioSQL as the sequence and sequence annotation model
add-on to some other database holding your lab data is what many others
have used it for too.

>
> And I understand GBrowse can run on top of BioSQL/Pg too,
> albeit somewhat preliminary; this is another usage I will
> need.

It can, though keep in mind that that's not the use-case it (BioSQL)
was built for. If you need to have rapid access to genome intervals
with 10s of thousands of features and their annotation, you'll have
to start thinking about a more de-normalized data store to run this
off of, such as populating a native GBrowse GFF store.

>
> btw, should not all those references to postgres 7.3 be
> upgraded to something newer, like 8.2.7 (maybe not yet 8.3
> heh) ?  7.3 is not supported anymore by the pg project.

Oops, indeed. Where are they?

>
> Sprot loaded in 20 hours. Only 170 were rejected - not too
> bad.

That's great. Would be nice if you can provide some rough summary as  
to why they were rejected (if that's obvious), such as taxon errors,  
or other errors.

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From sac at bioperl.org  Sun Mar 23 18:20:43 2008
From: sac at bioperl.org (Steve Chervitz)
Date: Sun, 23 Mar 2008 15:20:43 -0700
Subject: [Bioperl-l] HitTableWriter error
In-Reply-To: <088B032A-E2EE-4E88-8F4C-A206D2230910@leicester.ac.uk>
References: <088B032A-E2EE-4E88-8F4C-A206D2230910@leicester.ac.uk>
Message-ID: <8f200b4c0803231520o5082f9f5mf7be8cd061faa98f@mail.gmail.com>

Hi Owen

Sorry, I don't have time to look into this right now, but two thoughts:

1) The ResultTableWriter is intended to convert standard blast reports into
a tabular format. So if you already have tabular results, you are sort of
using it "off label", but in principle, it should work.

2) The iteration method is only available to BlastHit objects, since it is
only relevant to PSI-blast results. The fact that you got this error when
working with blast results indicates that the parser did not generate the
correct object type for your hits, using the GenericHit when it should have
used BlastHit. This is just a hunch and would be worth following up on. Can
you submit this as a bug report?

Thanks,
Steve


On Thu, Mar 20, 2008 at 4:56 AM, Owen Lancaster <ol8 at leicester.ac.uk> wrote:

> Hello
>
> I hope you don't mind me emailing you but I have come across a problem
> when trying to use HitTableWriter. The error can be seen below - the
> input for the script is the BLAST tabular output (specified with the -
> m 8 option) from a blastn search.
>
> If you have any idea what the problem might be I would much appreciate
> it! Hope you can help...
>
> Thanks
>
> Owen
>
>
> Using default column map.
>
> ------------- EXCEPTION  -------------
> MSG: Trouble in ResultTableWriter::_set_row_data_func() eval: Can't
> locate object method "iteration" via package
> "Bio::Search::Hit::GenericHit" at (eval 97) line 1, <GEN1> line 2.
>
>
>
> STACK Bio::SearchIO::Writer::ResultTableWriter::__ANON__ /Library/Perl/
> 5.8.8/Bio/SearchIO/Writer/ResultTableWriter.pm:328
> STACK Bio::SearchIO::Writer::HitTableWriter::to_string /Library/Perl/
> 5.8.8/Bio/SearchIO/Writer/HitTableWriter.pm:268
> STACK Bio::SearchIO::write_result /Library/Perl/5.8.8/Bio/SearchIO.pm:
> 331
> STACK Bio::SearchIO::blast::write_result /Library/Perl/5.8.8/Bio/
> SearchIO/blast.pm:2208
> STACK toplevel ./generate_discordant_tails.pl:62
>
> --------------------------------------
>
>

From greg at turnstep.com  Sun Mar 23 20:42:36 2008
From: greg at turnstep.com (Greg Sabino Mullane)
Date: Mon, 24 Mar 2008 00:42:36 -0000
Subject: [Bioperl-l] [BioSQL-l] postgres 8.3 will not cast text to
	integer any longer
In-Reply-To: <C24DE5CA-F433-48A1-BF08-A6D056A2EBCE@gmx.net>
Message-ID: <4ab14dcc59d7566b55ba87027055e9fd@biglumber.com>


-----BEGIN PGP SIGNED MESSAGE-----
Hash: RIPEMD160


>> Depending on what I (or can someone else update us on this?) find out
>> for the DBD::Pg plans, I'll probably start looking into moving the
>> parameter binding into the driver adapters. Though it does feel
>> pathetic that this is now also not transparent between drivers.
>
> What you are probably looking for is already there, namely:
>
> $dbh->{pg_server_prepare} = 0;

> So disabling server-side prepares will leave values quoted? Having
> server-side prepares would be very useful though, especially for
> Bioperl-db with its many lookup queries that all use similar
> parameter values.

Yes, it forces DBD::Pg to do the quoting itself, which basically means
that everything is shipped to the server as a single SQL string, and
no placeholders are used. In the grand scheme of things, the speed
difference is not large for most queries. Certainly one way would be
to turn this on for 8.3 and above, and slowly migrate the queries/schema
over time.
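
As a rough sketch of "turn this on for 8.3 and above": the helper below
decides from a numeric server version whether to switch server-side
prepares off, and a second routine applies it after connecting. The sub
names are illustrative, the DSN and credentials are placeholders, and it
assumes DBD::Pg's pg_server_version attribute reports versions in the
form 80300 for 8.3.0.

```perl
use strict;
use warnings;

# Decide whether to turn off server-side prepares, given a numeric
# server version (assumed form: 80300 for 8.3.0, 80207 for 8.2.7).
sub disable_server_prepare_for {
    my ($server_version) = @_;
    return $server_version >= 80300 ? 1 : 0;
}

# Connect and apply the setting. DBI is loaded lazily so the helper
# above can be used without a database. $dsn, $user and $pass are
# placeholders supplied by the caller.
sub connect_with_workaround {
    my ($dsn, $user, $pass) = @_;
    require DBI;
    my $dbh = DBI->connect($dsn, $user, $pass, { RaiseError => 1 });
    if (disable_server_prepare_for($dbh->{pg_server_version})) {
        $dbh->{pg_server_prepare} = 0;   # DBD::Pg then quotes values itself
    }
    return $dbh;
}
```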

>> There's good reasons for the casting enforcement in 8.3

> I do understand that, but it's also a sharp contrast to other RDBMSs
> that doesn't make it easier for people to choose Pg when they
> should, and doesn't help writing cross-platform database applications
> either.

I'm not overly familiar with how other databases treat this, but I've
heard DB2 can be a stickler about this too. I've not dug into the bioperl
code in a while, to be honest, so I'm not sure what sort of queries we're
talking about. Certainly long-term the code and schema should move away
from implicit casting. Maybe a better short-term solution is adding
the more obvious casts (e.g. text<->int) back in.

> Do you have links to some of the key threads showing what rationale
> went into the decision? (Or should I just search for your name?) I'd
> like to read up on that first before pouring more oil into the fire.
> I suspect that many of those who made the decision are never faced
> with needing to write cross-RDBMS code.
>
> Also, I wonder why this wasn't made a configurable option so it can
> be disabled by a simple config file change (such as the move away
> from automatic OID columns). But obviously this is the wrong list for
> discussing this (though Bioperl-db *is* one of those pieces of
> software that must be cross-RDBMS).

I did ask about that, and was told it would not have been easy to do so.
But I agree, a phasing in period (heck, even a warning) would have been
nice. Feel free to pour some oil on the fire, I think this is one of
many apps that has been affected. (I've run across two other major
cross-DB apps (Interchange and MediaWiki) that are struggling with the
same pain. I managed to painfully fix the latter, but the former is way
too complex to tackle at the moment).

I could not find the thread(s?) I weighed in on, but you can find some
relevant discussions by googling "strict-typing benefits grokbase"

- --
Greg Sabino Mullane greg at turnstep.com
PGP Key: 0x14964AC8 200803232039
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
-----BEGIN PGP SIGNATURE-----

iEYEAREDAAYFAkfm+NAACgkQvJuQZxSWSsi4ogCdGNWvCJIzXxb+YKzdm6wwxQMv
p3AAnizkWXoo/rvxv4KVdC8tD0vF87k3
=dNYi
-----END PGP SIGNATURE-----


From er at xs4all.nl  Sun Mar 23 20:45:49 2008
From: er at xs4all.nl (Erik)
Date: Mon, 24 Mar 2008 01:45:49 +0100 (CET)
Subject: [Bioperl-l] postgres 8.3 - load_seqdatabase.pl / swissprot
Message-ID: <19067.156.83.1.157.1206319549.squirrel@webmail.xs4all.nl>

On Sun, March 23, 2008 20:22, Hilmar Lapp wrote:
>
> On Mar 23, 2008, at 2:16 PM, Erik wrote:
>> Sprot loaded in 20 hours. Only 170 were rejected - not
>> too bad.
>
> That's great. Would be nice if you can provide some rough
> summary as to why they were rejected (if that's obvious),
> such as taxon errors, or other errors.

see

   http://bugzilla.open-bio.org/show_bug.cgi?id=2474

So I think one easy improvement will be to enlarge that 
varchar(40) column, dbxref.accession. See the following:

select
	dbname
,	accession
,	length(accession)
from 	dbxref
where accession ~ 'Cyc'
order by length(accession) desc
limit 100

patch attached. (which will probably get bug 2389 resolved)

It seems to me bioentry.accession (maybe identifier too?)
needs a similar enlargement.
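
Before reloading, it may help to see how many accessions would overflow
the current column width. A minimal sketch, with an illustrative sub
name and invented sample values (the limit of 40 is the varchar(40)
mentioned above):

```perl
use strict;
use warnings;

# Return the accessions that would not fit into a varchar column
# of the given width.
sub too_long_accessions {
    my ($max_len, @accessions) = @_;
    return grep { length($_) > $max_len } @accessions;
}

# Invented example values, for illustration only.
my @sample  = ('P12345', 'X' x 41, 'Y' x 40);
my @rejects = too_long_accessions(40, @sample);
print scalar(@rejects), " accession(s) exceed 40 characters\n";
```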


thanks,

Erikjan
-------------- next part --------------
A non-text attachment was scrubbed...
Name: biosqldb-pl.sql.diff
Type: application/octet-stream
Size: 535 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20080324/a564d7ee/attachment.obj>

From Russell.Smithies at agresearch.co.nz  Wed Mar 26 22:13:16 2008
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Thu, 27 Mar 2008 15:13:16 +1300
Subject: [Bioperl-l] Bioinformatician wanted
Message-ID: <D5DBA313349A4B458528BE63B387F36C06A09345@imail.agresearch.co.nz>

Dear colleagues,

 
It would be appreciated if you could bring the following position to the
attention of potential candidates.

 
Bioinformatician wanted


Many of the world's most amazing scientific discoveries are the result
of someone 'taking a closer look'. It's this inquisitive nature and
relentless search for answers that fuels scientific advancement. And
it's also what we'd like you to apply to us, right now. Mind you, you
won't need to look too hard to discover that AgResearch is the best
place to break all new ground in your career.  


At first glance you'll see we are New Zealand's largest research
institute -world leaders in pastoral research working at the
leading-edge of innovation. Scratch the surface a little more and you'll
find all the diversity and intellectual challenge a Bioinformatician
could ask for. This is a highly collaborative role where you'll be
involved in everything from the analysis of genomic data to the design,
development, implementation and testing of bioinformatics tools.
Knowledge sharing is a pivotal component of our success, so you can also
look forward to acting in consultant capacity (both internally and
externally) and the autonomy to contribute to scientific publications. 

 
Our people are at the pinnacle in their professions, so with your
biological background, higher qualification in bioinformatics or
computing and your experience in contributing bioinformatics expertise
to research groups, you'll not only fit right in, you'll hit the ground
running. An outstanding communicator, time manager and relationship
builder, you'll also come to us with a thorough knowledge of Unix,
pipeline-development, web based technologies and scripting and
programming languages. 

 
AgResearch is a unique organisation at the forefront of our field, and
as far as your future's concerned, that makes us well worth a closer
look. There are many benefits waiting to be discovered here, so isn't it
time you experienced them? The job description is available online and
applications are invited at www.agresearch.co.nz/recruitment/ Reference
AGR661, or contact Nauman Maqbool for further information.

 
Applications close 11 April 2008.

 
Regards,
Russell


Russell Smithies 
Bioinformatics Applications Developer 
T +64 3 489 9085 
E  russell.smithies at agresearch.co.nz 
AgResearch Farming Food and Health. First 
Te Ahuwhenua, Te Kai me te Whai Ora. Tuatahi 
Invermay  Research Centre 
Puddle Alley, 
Mosgiel, 
New Zealand 
T  +64 3 489 3809   
F  +64 3 489 9174  
www.agresearch.co.nz 


=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================


From Marc.Logghe at ablynx.com  Thu Mar 27 09:26:24 2008
From: Marc.Logghe at ablynx.com (Marc Logghe)
Date: Thu, 27 Mar 2008 14:26:24 +0100
Subject: [Bioperl-l] Bio::Search::HSP::GenericHSP::seq_inds
Message-ID: <03C512635899144083CADB0EE222018901800955@alpaca.lan.ablynx.com>

Hi all,

I am a little bit confused about the above mentioned seq_inds() method.
At first, I had the impression that the method returns an array of
positions in the hsp (hit or query) sequence.

At least that is what one would expect looking at the example usage in
the HOWTOs (http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_the_methods
second code block).

Am I correct in believing you can only do this if your hsp query stretch
starts at position 1 of the query sequence?

I think seq_inds() returns a list of positions relative to the query/hit
sequence. So, the code shown in the HOWTO is a kind of special case.

However, I do not understand how seq_inds() is dealing with gaps.

An example. If you blast the worm protein ZK822.4 against swissprot
using blastp at ncbi you get this hsp as top:

 
>sp|Q5VT52|K0460_HUMAN Uncharacterized protein KIAA0460
Length=1461
 
 Score = 35.8 bits (81),  Expect = 0.48, Method: Composition-based stats.
 Identities = 22/55 (40%), Positives = 32/55 (58%), Gaps = 3/55 (5%)
 
Query  402  IAVEEETKTTKKNKKQ-QQQANKNKNKNKKK--TTIAPEAAIDANIAAEVHTQVL  453
            +A+ E   TT K +KQ ++  NK  NK  KK  T+  P+AA+ + I AE  +Q L
Sbjct  139  VALREALSTTFKTQKQLKENLNKQPNKQWKKSQTSTNPKAALKSKIVAEFRSQAL  193

 
Now, if you call seq_inds(query => 'gap') on that particular hsp object,
you get these positions: 417, 431, 432. Obviously, there is no gap in
the original query sequence at these positions. 
How do you have to read these numbers ? Remark also that for instance
417 is the res just in front of the gap.

Regards,

Marc

 
From bix at sendu.me.uk  Thu Mar 27 10:46:35 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 27 Mar 2008 14:46:35 +0000
Subject: [Bioperl-l] Bio::Search::HSP::GenericHSP::seq_inds
In-Reply-To: <03C512635899144083CADB0EE222018901800955@alpaca.lan.ablynx.com>
References: <03C512635899144083CADB0EE222018901800955@alpaca.lan.ablynx.com>
Message-ID: <47EBB34B.8010606@sendu.me.uk>

Marc Logghe wrote:
> Hi all,
> 
> I am a little bit confused about the above mentioned seq_inds() method.
> At first, I had the impression that the method returns an array of
> positions in the hsp (hit or query) sequence.

Yes...


> At least that is what one would expect looking at the example usage in
> the HOWTOs (http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_the_methods
> second code block).
> 
> Am I correct in believing you can only do this if your hsp query stretch
> starts at position 1 of the query sequence?

No...


> Query  402  IAVEEETKTTKKNKKQ-QQQANKNKNKNKKK--TTIAPEAAIDANIAAEVHTQVL  453
>             +A+ E   TT K +KQ ++  NK  NK  KK  T+  P+AA+ + I AE  +Q L
> Sbjct  139  VALREALSTTFKTQKQLKENLNKQPNKQWKKSQTSTNPKAALKSKIVAEFRSQAL  193
>
> Now, if you call seq_inds(query => 'gap') on that particular hsp object,
> you get these positions: 417, 431, 432. Obviously, there is no gap in
> the original query sequence at these positions. 
> How do you have to read these numbers ? Remark also that for instance
> 417 is the res just in front of the gap.

Its purpose is to let you know the position in query or subject 
coordinates where something interesting happened in the alignment. So 
seq_inds(query => 'gap') is telling you all the places that a gap starts 
in the alignment in terms of the query coordinates. Hence 417 etc.


(Actually, does 432 make sense? Shouldn't it be 431 twice?)

From Marc.Logghe at ablynx.com  Thu Mar 27 11:09:56 2008
From: Marc.Logghe at ablynx.com (Marc Logghe)
Date: Thu, 27 Mar 2008 16:09:56 +0100
Subject: [Bioperl-l] Bio::Search::HSP::GenericHSP::seq_inds
In-Reply-To: <47EBB34B.8010606@sendu.me.uk>
References: <03C512635899144083CADB0EE222018901800955@alpaca.lan.ablynx.com>
	<47EBB34B.8010606@sendu.me.uk>
Message-ID: <03C512635899144083CADB0EE2220189018009E1@alpaca.lan.ablynx.com>

Hi Sendu, Chris

> > At least that is what one would expect looking at the example usage in
> > the HOWTOs (http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_the_methods
> > second code block).
> >
> > Am I correct in believing you can only do this if your hsp query stretch
> > starts at position 1 of the query sequence?
> 
> No...

<HOWTO>
# put all the conserved matches in query strand into an array
my @str_array = split "",$hsp->query_string;
foreach ( $hsp->seq_inds('query','conserved') ){
  push @conserved,$str_array[$_ - 1];
}
</HOWTO>

$hsp->query_string will return
'IAVEEETKTTKKNKKQ-QQQANKNKNKNKKK--TTIAPEAAIDANIAAEVHTQVL'

In my example using the 'gap' class (instead of 'conserved'), seq_inds()
will return 417, 431 and 432. The corresponding off-by-one indices (416,
430 and 431) do not exist in @str_array, which holds only the 55
characters of the HSP query string.
Therefore, I still think the HOWTO shows a special case where the hsp
query sequence starts at 1 (compared to 402 in my particular example). 
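
To make the point concrete, here is a minimal pure-Perl sketch of
looking up residues by sequence coordinate when the HSP does not start
at 1. The sub name residues_at is illustrative and not part of BioPerl;
it takes the gapped HSP string and the coordinate of its first residue,
and treats '-' as a gap that does not consume a coordinate.

```perl
use strict;
use warnings;

# Map sequence coordinates (such as those returned by a call like
# seq_inds('query', ...)) to the residues in an aligned string.
# $aligned is the gapped HSP string, $start the coordinate of its
# first residue. Gap characters ('-') do not consume a coordinate.
sub residues_at {
    my ($aligned, $start, @positions) = @_;
    my %residue_for;
    my $coord = $start;
    for my $char (split //, $aligned) {
        next if $char eq '-';
        $residue_for{$coord++} = $char;
    }
    return map { $residue_for{$_} } @positions;
}

my $query = 'IAVEEETKTTKKNKKQ-QQQANKNKNKNKKK--TTIAPEAAIDANIAAEVHTQVL';
print join(',', residues_at($query, 402, 402, 417, 431, 453)), "\n";   # I,Q,K,L
```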


> 
> 
> > Query  402  IAVEEETKTTKKNKKQ-QQQANKNKNKNKKK--TTIAPEAAIDANIAAEVHTQVL  453
> >             +A+ E   TT K +KQ ++  NK  NK  KK  T+  P+AA+ + I AE  +Q L
> > Sbjct  139  VALREALSTTFKTQKQLKENLNKQPNKQWKKSQTSTNPKAALKSKIVAEFRSQAL  193
> >
> > Now, if you call seq_inds(query => 'gap') on that particular hsp object,
> > you get these positions: 417, 431, 432. Obviously, there is no gap in
> > the original query sequence at these positions.
> > How do you have to read these numbers ? Remark also that for instance
> > 417 is the res just in front of the gap.
> 
> Its purpose is to let you know the position in query or subject
> coordinates where something interesting happened in the alignment. So
> seq_inds(query => 'gap') is telling you all the places that a gap starts
> in the alignment in terms of the query coordinates. Hence 417 etc.

So, this means you have to interpret that as a gap is coming after 417 ?

> 
> 
> (Actually, does 432 make sense? Shouldn't it be 431 twice?)
Don't know, depends on how you have to 'read' this.
Thanks for looking into this.
Regards,
Marc


From cjfields at uiuc.edu  Thu Mar 27 11:05:59 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 27 Mar 2008 10:05:59 -0500
Subject: [Bioperl-l] Bio::Search::HSP::GenericHSP::seq_inds
In-Reply-To: <03C512635899144083CADB0EE222018901800955@alpaca.lan.ablynx.com>
References: <03C512635899144083CADB0EE222018901800955@alpaca.lan.ablynx.com>
Message-ID: <489252B3-2255-45C3-9219-A8F8A0978B89@uiuc.edu>

According to the GenericHSP::seq_inds() POD, seq_inds() reports  
residue positions (indices) for the query/subject based on identity/ 
conservation, i.e. these are from the original sequence positions as  
determined by the HSP data, not alignment column positions.  'gaps'  
should be reported at the position prior to where a gap is inserted.   
However I think something is getting borked when the gap length is  
longer than one, so I would partially qualify this as a bug.

Example: When I ran this using bioperl-live it gives a different set  
of gaps indices which appear to be correct.  I reran the BLASTP using  
the web form using your query against swissprot and parsed it.  I got  
slightly different results for the BLAST report (probably differences  
in the query sequence):

 >gi|74746888|sp|Q5VT52|K0460_HUMAN Uncharacterized protein KIAA0460
Length=1461

  Score = 35.8 bits (81),  Expect = 0.47, Method: Composition-based stats.
  Identities = 22/55 (40%), Positives = 32/55 (58%), Gaps = 3/55 (5%)

Query  394  IAVEEETKTTKKNKKQ-QQQANKNKNKNKKK--TTIAPEAAIDANIAAEVHTQVL  445
             +A+ E   TT K +KQ ++  NK  NK  KK  T+  P+AA+ + I AE  +Q L
Sbjct  139  VALREALSTTFKTQKQLKENLNKQPNKQWKKSQTSTNPKAALKSKIVAEFRSQAL  193

.....

seq_inds('query' => 'gaps') reports 409,423, and 424, which is  
partially correct, e.g. there is a gap inserted after position 409 and  
423 in the query.  However, no gap is present after 424; I think this  
occurs b/c the gap length is 2.  The other HSPs report similar problems.

chris

P.S. Just saw that Sendu posted; I agree, seq. positions with gap  
lengths > 1 should be repeated.  Should be easy to fix that.

On Mar 27, 2008, at 8:26 AM, Marc Logghe wrote:

> Hi all,
>
> I am a little bit confused about the above mentioned seq_inds() method.
> At first, I had the impression that the method returns an array of
> positions in the hsp (hit or query) sequence.
>
> At least that is what one would expect looking at the example usage in
> the HOWTOs (http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_the_methods
> second code block).
>
> Am I correct in believing you can only do this if your hsp query stretch
> starts at position 1 of the query sequence?
>
> I think seq_inds() returns a list of positions relative to the query/hit
> sequence. So, the code shown in the HOWTO is a kind of special case.
>
> However, I do not understand how seq_inds() is dealing with gaps.
>
> An example. If you blast the worm protein ZK822.4 against swissprot
> using blastp at ncbi you get this hsp as top:
>
>
>
>> sp|Q5VT52|K0460_HUMAN Uncharacterized protein KIAA0460
> Length=1461
>
> Score = 35.8 bits (81),  Expect = 0.48, Method: Composition-based
> stats.
> Identities = 22/55 (40%), Positives = 32/55 (58%), Gaps = 3/55 (5%)
>
> Query  402  IAVEEETKTTKKNKKQ-QQQANKNKNKNKKK--TTIAPEAAIDANIAAEVHTQVL  453
>             +A+ E   TT K +KQ ++  NK  NK  KK  T+  P+AA+ + I AE  +Q L
> Sbjct  139  VALREALSTTFKTQKQLKENLNKQPNKQWKKSQTSTNPKAALKSKIVAEFRSQAL  193
>
>
>
> Now, if you call seq_inds(query => 'gap') on that particular hsp object,
> you get these positions: 417, 431, 432. Obviously, there is no gap in
> the original query sequence at these positions.
> How do you have to read these numbers ? Remark also that for instance
> 417 is the res just in front of the gap.
>
> Regards,
>
> Marc
>
>
>
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Thu Mar 27 12:04:20 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 27 Mar 2008 11:04:20 -0500
Subject: [Bioperl-l] Bio::Search::HSP::GenericHSP::seq_inds
In-Reply-To: <03C512635899144083CADB0EE2220189018009E1@alpaca.lan.ablynx.com>
References: <03C512635899144083CADB0EE222018901800955@alpaca.lan.ablynx.com>
	<47EBB34B.8010606@sendu.me.uk>
	<03C512635899144083CADB0EE2220189018009E1@alpaca.lan.ablynx.com>
Message-ID: <FF47D72B-7CBA-491B-9C9B-9B32FD2AA8BE@uiuc.edu>


On Mar 27, 2008, at 10:09 AM, Marc Logghe wrote:

> Hi Sendu, Chris
>
>>> At least that is what one would expect looking at the example usage in
>>> the HOWTOs (http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_the_methods
>>> second code block).
>>>
>>> Am I correct in believing you can only do this if your hsp query stretch
>>> starts at position 1 of the query sequence?
>>
>> No...
>
> <HOWTO>
> # put all the conserved matches in query strand into an array
> my @str_array = split "",$hsp->query_string;
> foreach ( $hsp->seq_inds('query','conserved') ){
>  push @conserved,$str_array[$_ - 1];
> }
> </HOWTO>
>
> $hsp->query_string will return
> 'IAVEEETKTTKKNKKQ-QQQANKNKNKNKKK--TTIAPEAAIDANIAAEVHTQVL'
>
> In my example using the 'gap' class (instead of 'conserved'),  
> @str_array
> will contain 417, 431 and 432. The off-by-one indices do not exist in
> that array.
> Therefore, I still think the howto shows a special case where the hsp
> query sequence starts at 1 (compared to 402 in my particular example).

We'll have to look at it; it should probably be clarified particularly  
in reference to 'gaps' and use of seq positions vs. HSP (or alignment)  
positions.

Think of it this way; seq_inds() takes 'identical', 'conserved', etc.,  
all of which refer to the original positions (indices) of the sequence  
which fall into the particular category asked for.  In these cases we  
are using the coordinates for query/hit directly from the HSP info in  
the report.  This is done with the express purpose of mapping  
attributes back to the original sequence, be it the query or subject.

Gaps, however, are tricky, since sequence coordinates refer to  
residues (not gaps) when using BLAST.  In this case we use the  
sequence position prior to the gap to note where a gap is inserted.   
The previous results, then, would be wrong as there is no gap inserted  
after 432.  I just committed a fix which just repeats the position  
based on the number of gaps.
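
The convention described here (report the residue position just before
each gap, repeated once per gap character) can be sketched in plain
Perl. This is an illustration of the convention only, not the code that
was committed; gap_positions is a made-up name, and the start coordinate
is passed in by hand.

```perl
use strict;
use warnings;

# For each gap character in the aligned string, report the coordinate
# of the residue immediately before it. A gap of length n therefore
# yields the same coordinate n times.
sub gap_positions {
    my ($aligned, $start) = @_;
    my @gaps;
    my $last = $start - 1;          # coordinate of the previous residue
    for my $char (split //, $aligned) {
        if ($char eq '-') { push @gaps, $last }
        else              { $last++ }
    }
    return @gaps;
}

my $query = 'IAVEEETKTTKKNKKQ-QQQANKNKNKNKKK--TTIAPEAAIDANIAAEVHTQVL';
print join(',', gap_positions($query, 402)), "\n";   # 417,431,431
```

With a start of 394, as in the rerun report, the same string gives
409,423,423.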

>>> Query  402  IAVEEETKTTKKNKKQ-QQQANKNKNKNKKK--TTIAPEAAIDANIAAEVHTQVL  453
>>>             +A+ E   TT K +KQ ++  NK  NK  KK  T+  P+AA+ + I AE  +Q L
>>> Sbjct  139  VALREALSTTFKTQKQLKENLNKQPNKQWKKSQTSTNPKAALKSKIVAEFRSQAL  193
>>>
>>> Now, if you call seq_inds(query => 'gap') on that particular hsp object,
>>> you get these positions: 417, 431, 432. Obviously, there is no gap in
>>> the original query sequence at these positions.
>>> How do you have to read these numbers ? Remark also that for instance
>>> 417 is the res just in front of the gap.
>>
>> Its purpose is to let you know the position in query or subject
>> coordinates where something interesting happened in the alignment. So
>> seq_inds(query => 'gap') is telling you all the places that a gap starts
>> in the alignment in terms of the query coordinates. Hence 417 etc.
>
> So, this means you have to interpret that as a gap is coming after 417 ?

Yes.

>> (Actually, does 432 make sense? Shouldn't it be 431 twice?)
> Don't know, depends on how you have to 'read' this.
> Thanks for looking into this.
> Regards,
> Marc

Repeating the position based on the number of gaps is now the default  
in bioperl-live.  Just working on fixing problems with collapsing  
numbers and tests and everything should be fine.

chris

From hiekeen at gmail.com  Sat Mar 29 12:09:18 2008
From: hiekeen at gmail.com (Jinyan Huang)
Date: Sun, 30 Mar 2008 00:09:18 +0800
Subject: [Bioperl-l] Gene Id converts.
Message-ID: <fb5dae380803290909o6523f71hc479570846594f3@mail.gmail.com>

Hi,

I have a list of GenBank accession IDs. I want to convert these IDs to NCBI IDs.

For example:

>From NM_011917 to 2919914.

How can I do it?

Thanks


-- 
Best regards,
Jinyan Huang (ekeen)
School of Life Sciences and Technology, 1302 Room
Tongji University
Siping Road 1239, Shanghai 200092
P.R. China
Tel :0086-21-65981041
Msn: hiekeen at hotmail.com
eMail: hiekeen at gmail.com

From cjfields at uiuc.edu  Sat Mar 29 13:42:50 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 29 Mar 2008 12:42:50 -0500
Subject: [Bioperl-l] Gene Id converts.
In-Reply-To: <fb5dae380803290909o6523f71hc479570846594f3@mail.gmail.com>
References: <fb5dae380803290909o6523f71hc479570846594f3@mail.gmail.com>
Message-ID: <FC7E8E14-83E7-4611-868B-0D1390488C82@uiuc.edu>

There are the GenBank LiveLists (updated every Sunday), which have  
accession/version/UID mappings for nuc and protein GenBank records.  I  
haven't used it personally but it's worth a look:

ftp://ftp.ncbi.nih.gov/genbank/livelists/
There is also gene2accession, which contains mappings between  
accession and UID (though this is more EntrezGene-related, I believe):
ftp://ftp.ncbi.nih.gov/gene/DATA/

Both have documentation detailing formats.  I would recommend using  
one of the above two on a local database setup if you plan on  
converting a large number of accessions.
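
For a local setup, the simplest form is an in-memory lookup built from a
tab-delimited mapping file. The sketch below makes no claim about the
actual column layout of either file: the column indices are arguments
that have to be taken from the format documentation, build_lookup is an
illustrative name, and the sample rows are invented.

```perl
use strict;
use warnings;

# Build an accession => identifier lookup from tab-delimited lines.
# $acc_col and $id_col are zero-based column indices; which columns
# hold the accession and the identifier is an assumption that must be
# checked against the format documentation for the file being used.
sub build_lookup {
    my ($lines, $acc_col, $id_col) = @_;
    my %id_for;
    for my $line (@$lines) {
        next if $line =~ /^#/;               # skip header/comment lines
        chomp(my $copy = $line);
        my @fields = split /\t/, $copy;
        my ($acc, $id) = @fields[$acc_col, $id_col];
        next unless defined $acc && defined $id && $acc ne '-';
        (my $bare = $acc) =~ s/\.\d+$//;     # drop the version suffix
        $id_for{$bare} = $id;
    }
    return \%id_for;
}

# Invented rows for illustration only; not real database content.
my @rows = ("#accession.version\tuid\n", "XX_000001.2\t111\n", "XX_000002.1\t222\n");
my $lookup = build_lookup(\@rows, 0, 1);
print "$lookup->{XX_000001}\n";
```

For files too large to hold in memory, the same parsing loop could feed
a database table instead of a hash.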

Bio::DB::EUtilities can also do this but is web-based via eutils.   
There are a couple of stub examples in the Cookbook HOWTO under  
'efetch' on converting accessions to UID (and vice versa), though note  
there is no one-to-one correspondence.  You can also convert UIDs to  
accessions using 'esummary' but the converse (accession to GI)  
requires, strangely, using efetch to grab the UIDs first, then  
re-retrieving the accessions via esummary for one-to-one correspondence.
http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook

chris

On Mar 29, 2008, at 11:09 AM, Jinyan Huang wrote:

> Hi,
>
> I have a list of gene bank accession id. I want to convert these ids  
> to NCBI id.
>
> For example:
>
>> From NM_011917 to 2919914.
>
> How can I do it?
>
> Thanks
>
>
> -- 
> Best regards,
> Jinyan Huang (ekeen)
> School of Life Sciences and Technology, 1302 Room
> Tongji University
> Siping Road 1239, Shanghai 200092
> P.R. China
> Tel :0086-21-65981041
> Msn: hiekeen at hotmail.com
> eMail: hiekeen at gmail.com

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From pradel.d at gmail.com  Mon Mar 31 10:35:26 2008
From: pradel.d at gmail.com (Damien Pradel)
Date: Mon, 31 Mar 2008 16:35:26 +0200
Subject: [Bioperl-l] Bio::SeqIO - Error in ID value collection
Message-ID: <7db1df730803310735n5777aa08s50d763a62c74d050@mail.gmail.com>

Hello,

I use the SeqIO module to parse EMBL files.
Unfortunately I ran into a problem: the ID was not recognised, and instead of
the ID value I get "unknown_id".

To solve this problem I modified the file embl.pm, located in the
SeqIO directory, at line 189 as follows:

if( $line =~ /^ID\s+(\S+)\s+\S+\;\s+([^;]+)\;\s+(\S+)\;/ ) {
       ($name,$mol,$div) = ($1,$2,$3);
   }
   unless( defined $name && length($name) ) {
       $name = "unknown_id";
   }

to:
if( $line =~ /^ID\s+(.+?)\;\s+([^;]+)\;\s+(\S+)\;/ ) {
       ($name,$mol,$div) = ($1,$2,$3);
   }
   unless( defined $name && length($name) ) {
       $name = "unknown_id";
   }

With this modification, the ID value is correctly collected.
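
The two patterns can be compared side by side on sample ID lines. This
is only a sketch: the two lines below are assumed examples of the older
and newer EMBL ID layouts, not lines from the files in question, and
id_name is an illustrative helper.

```perl
use strict;
use warnings;

# Apply an ID-line pattern and return the first capture (the name),
# falling back to "unknown_id" in the same way as the snippet above.
sub id_name {
    my ($line, $pattern) = @_;
    my ($name) = $line =~ $pattern;
    return (defined $name && length $name) ? $name : 'unknown_id';
}

my $original = qr/^ID\s+(\S+)\s+\S+\;\s+([^;]+)\;\s+(\S+)\;/;
my $modified = qr/^ID\s+(.+?)\;\s+([^;]+)\;\s+(\S+)\;/;

# Sample ID lines assumed to follow the older and the newer EMBL layouts.
my $old_style = 'ID   AB000263   standard; RNA; PRI; 368 BP.';
my $new_style = 'ID   X56734; SV 1; linear; mRNA; STD; PLN; 1859 BP.';

for my $line ($old_style, $new_style) {
    printf "%-10s %s\n", 'original:', id_name($line, $original);
    printf "%-10s %s\n", 'modified:', id_name($line, $modified);
}
```

On these sample lines the modified pattern recovers X56734 from the
newer layout, where the original yields unknown_id. On the older layout,
however, its first capture also swallows the data class ("AB000263
standard"), and on the newer layout the second and third captures pick
up "SV 1" and "linear" rather than molecule type and division, so it is
worth testing against both kinds of file.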

Hope it will help.

Damien

From golharam at umdnj.edu  Mon Mar 31 15:31:56 2008
From: golharam at umdnj.edu (Ryan Golhar)
Date: Mon, 31 Mar 2008 15:31:56 -0400
Subject: [Bioperl-l] Very basic implementation of GenBank XML SeqIO module
Message-ID: <47F13C2C.4070909@umdnj.edu>

I have a (very) basic SAX implementation of a SeqIO module to parse 
GenBank XML records.  Right now, it only reads in basic information 
regarding the sequence and the sequence itself.

It does not yet parse the features table.  Should I submit it to be 
included in bioperl or wait until I implement more for the features 
table?  I'm not sure when I'll get around to it though

Ryan

From cjfields at uiuc.edu  Mon Mar 31 16:05:51 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 31 Mar 2008 15:05:51 -0500
Subject: [Bioperl-l] Very basic implementation of GenBank XML SeqIO
	module
In-Reply-To: <47F13C2C.4070909@umdnj.edu>
References: <47F13C2C.4070909@umdnj.edu>
Message-ID: <4A3D5CD8-13D7-4CBF-B89A-CE81B8804C61@uiuc.edu>

You can submit it either to me directly or to bugzilla (start a new  
bug report as an enhancement request, then attach the relevant files).

Does it have a test suite available?  If not, you should try setting  
one up:

http://www.bioperl.org/wiki/HOWTO:Writing_BioPerl_Tests

chris

On Mar 31, 2008, at 2:31 PM, Ryan Golhar wrote:

> I have a (very) basic SAX implementation of a SeqIO module to parse  
> GenBank XML records.  Right now, it only reads in basic information  
> regarding the sequence and the sequence itself.
>
> It does not yet parse the features table.  Should I submit it to be  
> included in bioperl or wait until I implement more for the features  
> table?  I'm not sure when I'll get around to it though
>
> Ryan


From cjfields at uiuc.edu  Mon Mar 31 19:58:44 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 31 Mar 2008 18:58:44 -0500
Subject: [Bioperl-l] Bio::SeqIO - Error in ID value collection
In-Reply-To: <7db1df730803310735n5777aa08s50d763a62c74d050@mail.gmail.com>
References: <7db1df730803310735n5777aa08s50d763a62c74d050@mail.gmail.com>
Message-ID: <629B8FAF-3A1F-41E2-BFF1-A709DDE56A09@uiuc.edu>

The parser no longer has this line; it has been updated to work with  
both old and new format EMBL.  You might want to try updating from  
Subversion or install the nightly build.

http://bioperl.org/DIST/nightly_builds/

chris

On Mar 31, 2008, at 9:35 AM, Damien Pradel wrote:
> Hello,
>
> I use the SeqIO module in order to  parse EMBL files.
> Unfortunately I got a problem: the ID was not recognised because  
> instead of
> the ID value  I get the answer "unknown_id" ...
>
> So to solve this problem I have modified the file embl.pm located in
> directory SeqIO at the line 189 as follow :
>
> if( $line =~ /^ID\s+(\S+)\s+\S+\;\s+([^;]+)\;\s+(\S+)\;/ ) {
>       ($name,$mol,$div) = ($1,$2,$3);
>   }
>   unless( defined $name && length($name) ) {
>       $name = "unknown_id";
>   }
>
> in :
> if( $line =~ /^ID\s+(.+?)\;\s+([^;]+)\;\s+(\S+)\;/ ) {
>       ($name,$mol,$div) = ($1,$2,$3);
>   }
>   unless( defined $name && length($name) ) {
>       $name = "unknown_id";
>   }
>
> With this modification, the ID value is correctly collected.
>
> Hope it will help.
>
> Damien

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From dfog22 at hotmail.com  Wed Mar 26 10:13:25 2008
From: dfog22 at hotmail.com (MathGon)
Date: Wed, 26 Mar 2008 07:13:25 -0700 (PDT)
Subject: [Bioperl-l]  File concatenation
Message-ID: <16301515.post@talk.nabble.com>


For my first post, I will introduce myself. I'm a PhD student in microbiology
focusing on horizontal gene transfer in hyperthermophilic Archaea.

I retrieve a GenBank file for each contig of an unfinished genome. I want to
produce a single GenBank file by concatenation. I didn't manage to find such
a script and I'm not trained enough in Perl to write it...
Do you have another solution or a script for me?

Best regards...


From anjan.purkayastha at gmail.com  Mon Mar  3 12:31:11 2008
From: anjan.purkayastha at gmail.com (ANJAN PURKAYASTHA)
Date: Mon, 3 Mar 2008 12:31:11 -0500
Subject: [Bioperl-l] problem with Bio::Tools::EMBOSS
Message-ID: <f03f40920803030931r7c97159j2d6a4b74e2917184@mail.gmail.com>

hi
i tried to use the perl wrappers for EMBOSS with:

use lib "/Users/anjan/perl_directory/bioperl-1.5.2_102/";
use Bio::Factory::EMBOSS;

however it seems that Bio::Factory::EMBOSS cannot be found in the bioperl
directory mentioned above.

so i tried to install Bio::Factory::EMBOSS from the cpan website. i got the
attached error message.

any ideas on what i need to do to  make this work?
all advice will be appreciated.

tia,

anjan


-- 
ANJAN PURKAYASTHA, PhD.
Senior Computational Biologist
==========================

1101 King Street, Suite 310,
Alexandria, VA 22314.
703.518.8040 (office)
703.740.6939 (mobile)

email:
anjan at vbi.vt.edu;
anjan.purkayastha at gmail.com

http://www.vbi.vt.edu

==========================
-------------- next part --------------
A non-text attachment was scrubbed...
Name: emboss_install_error_message.rtf
Type: application/rtf
Size: 123212 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20080303/08ee41b2/attachment-0002.rtf>

From cjfields at uiuc.edu  Mon Mar  3 13:54:06 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 3 Mar 2008 12:54:06 -0600
Subject: [Bioperl-l] problem with Bio::Tools::EMBOSS
In-Reply-To: <f03f40920803030931r7c97159j2d6a4b74e2917184@mail.gmail.com>
References: <f03f40920803030931r7c97159j2d6a4b74e2917184@mail.gmail.com>
Message-ID: <AC7CFEBA-5699-45A0-9426-8885AEA78216@uiuc.edu>

You'll need to install bioperl-run.  Bio::Factory::EMBOSS is in  
bioperl-run, not the main bioperl distribution (aka bioperl-core).

chris

On Mar 3, 2008, at 11:31 AM, ANJAN PURKAYASTHA wrote:

> hi
> i am tried to use the perl wrappers for EMBOSS with:
>
> use lib "/Users/anjan/perl_directory/bioperl-1.5.2_102/";
> use Bio::Factory::EMBOSS;
>
> however it seems that Bio::Factory::EMBOSS cannot be found in the  
> bioperl
> directory mentioned above.
>
> so i tried to install Bio::Factory::EMBOSS from the cpan website. i  
> got the
> attached error message.
>
> any ideas on what i need to do to  make this work?
> all advice will be appreciated.
>
> tia,
>
> anjan
>
>
> -- 
> ANJAN PURKAYASTHA, PhD.
> Senior Computational Biologist
> ==========================
>
> 1101 King Street, Suite 310,
> Alexandria, VA 22314.
> 703.518.8040 (office)
> 703.740.6939 (mobile)
>
> email:
> anjan at vbi.vt.edu;
> anjan.purkayastha at gmail.com
>
> http://www.vbi.vt.edu
>
> ==========================
> <emboss_install_error_message.rtf>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From David.Messina at sbc.su.se  Mon Mar  3 14:34:20 2008
From: David.Messina at sbc.su.se (Dave Messina)
Date: Mon, 3 Mar 2008 20:34:20 +0100
Subject: [Bioperl-l] problem with Bio::Tools::EMBOSS
In-Reply-To: <f03f40920803030931r7c97159j2d6a4b74e2917184@mail.gmail.com>
References: <f03f40920803030931r7c97159j2d6a4b74e2917184@mail.gmail.com>
Message-ID: <628aabb70803031134g3263e94fge7131f8862434b23@mail.gmail.com>

Hi Anjan,

Bio::Factory::EMBOSS is not part of the BioPerl core distribution, but
rather part of bioperl-run. For some reason CPAN went for the old (1.4)
version of bioperl-run rather than the current 1.5.2.

And indeed, I seem to run into the same problem:
cpan> d /bioperl/

Distribution    BIRNEY/bioperl-1.2.1.tar.gz
Distribution    BIRNEY/bioperl-1.2.2.tar.gz
Distribution    BIRNEY/bioperl-1.2.3.tar.gz
Distribution    BIRNEY/bioperl-1.2.tar.gz
Distribution    BIRNEY/bioperl-1.4.tar.gz
Distribution    BIRNEY/bioperl-db-0.1.tar.gz
Distribution    BIRNEY/bioperl-ext-1.4.tar.gz
Distribution    BIRNEY/bioperl-gui-0.7.tar.gz
Distribution    BIRNEY/bioperl-run-1.2.2.tar.gz
Distribution    BIRNEY/bioperl-run-1.4.tar.gz
Distribution    BOZO/Fry-Lib-BioPerl-0.15.tar.gz
Distribution    CRAFFI/Bundle-BioPerl-2.1.8.tar.gz
12 items found

but when I ask in a different way the right distributions show up. [Sendu,
any idea what's going on here?]

cpan> ls
SENDU
 5919092 2007-02-14 SENDU/bioperl-1.5.2_102.tar.gz
  320154 2006-12-06 SENDU/bioperl-db-1.5.2_100.tar.gz
   99082 2006-12-06 SENDU/bioperl-network-1.5.2_100.tar.gz
  942093 2006-12-06 SENDU/bioperl-run-1.5.2_100.tar.gz

So try doing

cpan> install SENDU/bioperl-run-1.5.2_100.tar.gz

Or if CPAN refuses to cooperate, you can grab it from here:
http://www.bioperl.org/wiki/Getting_BioPerl#Bioperl_1.5.2.2C_Developer_Release


Dave


From arareko at campus.iztacala.unam.mx  Mon Mar  3 14:25:14 2008
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Mon, 03 Mar 2008 13:25:14 -0600
Subject: [Bioperl-l] problem with Bio::Tools::EMBOSS
In-Reply-To: <f03f40920803030931r7c97159j2d6a4b74e2917184@mail.gmail.com>
References: <f03f40920803030931r7c97159j2d6a4b74e2917184@mail.gmail.com>
Message-ID: <47CC509A.10306@campus.iztacala.unam.mx>

Hi Anjan,

It looks like you are using the latest BioPerl developer release
(bioperl-1.5.2_102) from CPAN. To have Bio::Factory::EMBOSS available,
you should install the latest BioPerl-run as well
(bioperl-run-1.5.2_100). After you install it, you'll have to modify
your 'use lib' pragma for your script to work as you expect:

use lib "/Users/anjan/perl_directory/bioperl-run-1.5.2_100/";
use Bio::Factory::EMBOSS;

Hope this helps.

Regards,
Mauricio.
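
If both distributions are used straight from their unpacked directories (an assumption; the thread does not say whether the core modules are also installed system-wide), both directories probably need to be on @INC, because the bioperl-run modules load modules from the core distribution. A minimal sketch using the paths quoted in this thread:

```perl
use strict;
use warnings;

# 'use lib' accepts a list; both unpacked directories are added to @INC.
use lib qw(
    /Users/anjan/perl_directory/bioperl-1.5.2_102
    /Users/anjan/perl_directory/bioperl-run-1.5.2_100
);

# Show which of the two directories ended up on @INC.
my @on_inc = grep { m{/bioperl-(?:run-)?1\.5\.2_10[02]$} } @INC;
print "$_\n" for @on_inc;

# With both directories present on disk, this line can follow:
# use Bio::Factory::EMBOSS;
```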


ANJAN PURKAYASTHA wrote:
> hi
> i am tried to use the perl wrappers for EMBOSS with:
> 
> use lib "/Users/anjan/perl_directory/bioperl-1.5.2_102/";
> use Bio::Factory::EMBOSS;
> 
> however it seems that Bio::Factory::EMBOSS cannot be found in the bioperl
> directory mentioned above.
> 
> so i tried to install Bio::Factory::EMBOSS from the cpan website. i got the
> attached error message.
> 
> any ideas on what i need to do to  make this work?
> all advice will be appreciated.
> 
> tia,
> 
> anjan
> 
> 
> 
> 
> ------------------------------------------------------------------------
> 

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM


From cjfields at uiuc.edu  Mon Mar  3 15:05:16 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 3 Mar 2008 14:05:16 -0600
Subject: [Bioperl-l] problem with Bio::Tools::EMBOSS
In-Reply-To: <628aabb70803031134g3263e94fge7131f8862434b23@mail.gmail.com>
References: <f03f40920803030931r7c97159j2d6a4b74e2917184@mail.gmail.com>
	<628aabb70803031134g3263e94fge7131f8862434b23@mail.gmail.com>
Message-ID: <43EC247B-EC01-483D-82B1-D861590A141A@uiuc.edu>


On Mar 3, 2008, at 1:34 PM, Dave Messina wrote:

> Hi Anjan,
>
> Bio::Factory::EMBOSS is not part of the BioPerl core distribution, but
> rather part of bioperl-run. For some reason CPAN went for the old  
> (1.4)
> version of bioperl-run rather than the current 1.5.2.
>
> And indeed, I seem to run into the same problem:
> cpan> d /bioperl/
>
> Distribution    BIRNEY/bioperl-1.2.1.tar.gz
> Distribution    BIRNEY/bioperl-1.2.2.tar.gz
> Distribution    BIRNEY/bioperl-1.2.3.tar.gz
> Distribution    BIRNEY/bioperl-1.2.tar.gz
> Distribution    BIRNEY/bioperl-1.4.tar.gz
> Distribution    BIRNEY/bioperl-db-0.1.tar.gz
> Distribution    BIRNEY/bioperl-ext-1.4.tar.gz
> Distribution    BIRNEY/bioperl-gui-0.7.tar.gz
> Distribution    BIRNEY/bioperl-run-1.2.2.tar.gz
> Distribution    BIRNEY/bioperl-run-1.4.tar.gz
> Distribution    BOZO/Fry-Lib-BioPerl-0.15.tar.gz
> Distribution    CRAFFI/Bundle-BioPerl-2.1.8.tar.gz
> 12 items found
>
> but when I ask in a different way the right distributions show up.  
> [Sendu,
> any idea what's going on here?]

It's marked as a developer release, which I think requires a full path  
(as you have below) and not just the package name.

chris

> cpan> ls
> SENDU
> 5919092 2007-02-14 SENDU/bioperl-1.5.2_102.tar.gz
>  320154 2006-12-06 SENDU/bioperl-db-1.5.2_100.tar.gz
>   99082 2006-12-06 SENDU/bioperl-network-1.5.2_100.tar.gz
>  942093 2006-12-06 SENDU/bioperl-run-1.5.2_100.tar.gz
>
> So try doing
>
> cpan> install SENDU/bioperl-run-1.5.2_100.tar.gz
>
> Or if CPAN refuses to cooperate, you can grab it from here:
> http://www.bioperl.org/wiki/Getting_BioPerl#Bioperl_1.5.2.2C_Developer_Release
>
>
> Dave


From anjan.purkayastha at gmail.com  Mon Mar  3 14:57:33 2008
From: anjan.purkayastha at gmail.com (ANJAN PURKAYASTHA)
Date: Mon, 3 Mar 2008 14:57:33 -0500
Subject: [Bioperl-l] problem with Bio::Tools::EMBOSS
In-Reply-To: <47CC509A.10306@campus.iztacala.unam.mx>
References: <f03f40920803030931r7c97159j2d6a4b74e2917184@mail.gmail.com>
	<47CC509A.10306@campus.iztacala.unam.mx>
Message-ID: <f03f40920803031157oda9e0ffubc3bfb9070022238@mail.gmail.com>

guys,
thanks! i got bioperl-run to work.
next question, let's say i want to run the palindrome program in emboss
using the bioperl wrapper.   now, palindrome takes in a list of parameter
values- these are fed into emboss as a key-value hash. where do i find the
correct names of the keys to create the input hash?
tia.

anjan

On Mon, Mar 3, 2008 at 2:25 PM, Mauricio Herrera Cuadra <
arareko at campus.iztacala.unam.mx> wrote:

> Hi Anjan,
>
> It looks like you are using the latest BioPerl developer release
> (bioperl-1.5.2_102) from CPAN, to have Bio::Factory::EMBOSS available
> then you should try installing the latest BioPerl-run as well
> (bioperl-run-1.5.2_100). After you install it, you'll have to modify
> your 'use lib' pragma for your script to work as you expect:
>
> use lib "/Users/anjan/perl_directory/bioperl-run-1.5.2_100/";
> use Bio::Factory::EMBOSS;
>
> Hope this helps.
>
> Regards,
> Mauricio.
>
>
> ANJAN PURKAYASTHA wrote:
> > hi
> > i am tried to use the perl wrappers for EMBOSS with:
> >
> > use lib "/Users/anjan/perl_directory/bioperl-1.5.2_102/";
> > use Bio::Factory::EMBOSS;
> >
> > however it seems that Bio::Factory::EMBOSS cannot be found in the
> bioperl
> > directory mentioned above.
> >
> > so i tried to install Bio::Factory::EMBOSS from the cpan website. i got
> the
> > attached error message.
> >
> > any ideas on what i need to do to  make this work?
> > all advice will be appreciated.
> >
> > tia,
> >
> > anjan
> >
> >
> >
> >
> > ------------------------------------------------------------------------
> >
>
> --
> MAURICIO HERRERA CUADRA
> arareko at campus.iztacala.unam.mx
> Laboratorio de Genética
> Unidad de Morfofisiología y Función
> Facultad de Estudios Superiores Iztacala, UNAM
>
>
>


-- 
ANJAN PURKAYASTHA, PhD.
Senior Computational Biologist
==========================

1101 King Street, Suite 310,
Alexandria, VA 22314.
703.518.8040 (office)
703.740.6939 (mobile)

email:
anjan at vbi.vt.edu;
anjan.purkayastha at gmail.com

http://www.vbi.vt.edu

==========================


From Daniel.Gerlach at medecine.unige.ch  Tue Mar  4 03:48:15 2008
From: Daniel.Gerlach at medecine.unige.ch (Daniel Gerlach)
Date: Tue, 04 Mar 2008 09:48:15 +0100
Subject: [Bioperl-l] Bio::TreeIO rises error "Weak references are not
 implemented in the version of perl"
Message-ID: <47CD0CCF.4060306@medecine.unige.ch>

Hello,

Trying to run Bio::TreeIO by this command:

perl -e 'use Bio::TreeIO'

I get the following error:

Weak references are not implemented in the version of perl at 
/usr/lib/perl5/site_perl/5.8.8/Bio/Tree/Node.pm line 76
BEGIN failed--compilation aborted at 
/usr/lib/perl5/site_perl/5.8.8/Bio/Tree/Node.pm line 76.
Compilation failed in require at 
/usr/lib/perl5/site_perl/5.8.8/Bio/TreeIO/TreeEventBuilder.pm line 65.
BEGIN failed--compilation aborted at 
/usr/lib/perl5/site_perl/5.8.8/Bio/TreeIO/TreeEventBuilder.pm line 65.
Compilation failed in require at 
/usr/lib/perl5/site_perl/5.8.8/Bio/TreeIO.pm line 77.
BEGIN failed--compilation aborted at 
/usr/lib/perl5/site_perl/5.8.8/Bio/TreeIO.pm line 77.
Compilation failed in require at -e line 1.
BEGIN failed--compilation aborted at -e line 1.

I am running perl v5.8.8 on Fedora 8 on a 64bit machine. I installed a 
recent version of bioperl around 5 months ago. Any suggestions of why 
this module can't be loaded correctly?

Greetings, Daniel


From bix at sendu.me.uk  Tue Mar  4 06:55:32 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 04 Mar 2008 11:55:32 +0000
Subject: [Bioperl-l] Bio::TreeIO rises error "Weak references are not
 implemented in the version of perl"
In-Reply-To: <47CD0CCF.4060306@medecine.unige.ch>
References: <47CD0CCF.4060306@medecine.unige.ch>
Message-ID: <47CD38B4.1070200@sendu.me.uk>

Daniel Gerlach wrote:
> Hello,
> 
> Trying to run Bio::TreeIO by this command:
> 
> perl -e 'use Bio::TreeIO'
> 
> I get the following error:
> 
> Weak references are not implemented in the version of perl 
> [...]
> I am running perl v5.8.8 on Fedora 8 on a 64bit machine. I installed a 
> recent version of bioperl around 5 month ago. Any suggestions of why 
> this module can't be loaded correctly?

Redhat/Fedora apparently has Perl issues.

First try installing the latest version of Scalar::Util yourself:

perl -MCPAN -e shell
force install Scalar::Util

If that doesn't work, you'll have to download and compile Perl yourself 
from source (don't use Fedora's installation system).
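
Before and after reinstalling, a quick check along these lines can show whether the Scalar::Util that Perl actually loads provides a working weaken(). This is a minimal sketch; the helper name weaken_works is made up for illustration.

```perl
use strict;
use warnings;
use Scalar::Util ();   # empty import list: importing weaken could itself die

# True when weaken() can be called. The eval catches both a missing
# function and a "Weak references are not implemented" exception.
sub weaken_works {
    my $ref  = [];
    my $copy = $ref;
    return eval { Scalar::Util::weaken($copy); 1 } ? 1 : 0;
}

printf "Scalar::Util %s loaded from %s: weaken %s\n",
    Scalar::Util->VERSION, $INC{'Scalar/Util.pm'},
    weaken_works() ? 'works' : 'is NOT available';
```

The printed path also shows which copy of the module is picked up, which helps when a vendor copy shadows a freshly installed one.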


From apapanicolaou at ice.mpg.de  Tue Mar  4 07:03:27 2008
From: apapanicolaou at ice.mpg.de (Alexie Papanicolaou)
Date: Tue, 04 Mar 2008 13:03:27 +0100
Subject: [Bioperl-l] Bio/SearchIO/Writer/GbrowseGFF.pm
Message-ID: <47CD3A8F.9050902@ice.mpg.de>

hello all,

1) I was wondering if you know what this error means and have 
time to help...

Use of uninitialized value in concatenation (.) or string at 
/usr/local/share/perl/5.8.8/Bio/SearchIO/Writer/GbrowseGFF.pm line 287

line 287 is
            else {
                $tags{'Target'} = "$prefix:$seqname $qpmax $qpmin";
            }

this is the header
# $Id: GbrowseGFF.pm,v 1.15.4.1 2006/10/02 23:10:27 sendu Exp $
#
# BioPerl module Bio::SearchIO::Writer::GbrowseGFF.pm


this is how I call it... ( 2.6.18-6-amd64, x86_64, perl, v5.8.8, 
bioperl: tried with both 1.5.2_102 from cvs and checked out svn version 
today)

use Bio::SearchIO::Writer::GbrowseGFF;
use Bio::SearchIO;
if ($program eq "blastn"){
#my $out_gff  = new Bio::SearchIO(-writer => $writer_gff,
 my $out_gff = new Bio::SearchIO(-output_format  => 'GbrowseGFF',
                        -output_cigar   => 1,
                        -output_signif  => 1,
                        -file   => ">$infile.$query.blast.gff");
#my $out_gff_whole  = new Bio::SearchIO(-writer => $writer_gff,
 my $out_gff_whole = new Bio::SearchIO(-output_format  => 'GbrowseGFF',
                        -output_cigar   => 1,
                        -output_signif  => 1,
                           -file   => ">>$infile.blast.gff");
$out_gff->write_result($result);
$out_gff_whole->write_result($result);
}


Where $result is a blast result...

The aim is to parse a multi-query blast report and split it into 
different queries and make another file with all the queries. I'm sure 
i'm forgetting something but I can't figure what...

The GFF file is produced, but I do get the error above...

2) Finally, there is a small bug but I don't think it comes from this 
module? The id attribute is printed out e.g iD=match_sequence31 with iD 
wrongly capitalised...

many thanks for your time
alexie

-- 
--
Alexie Papanicolaou
Entomology
Max Planck Institute for Chemical Ecology
Hans Knoell Str 8
Jena 07745
Germany
Email apapanicolaou at ice.mpg.de
Tel +493641571561


From apapanicolaou at ice.mpg.de  Tue Mar  4 07:04:16 2008
From: apapanicolaou at ice.mpg.de (Alexie Papanicolaou)
Date: Tue, 04 Mar 2008 13:04:16 +0100
Subject: [Bioperl-l] Gbrowse.pm followup
Message-ID: <47CD3AC0.4080801@ice.mpg.de>

Oh the iD bug is fixed in the svn developer branch.

ta
a

-- 
--
Alexie Papanicolaou
Entomology
Max Planck Institute for Chemical Ecology
Hans Knoell Str 8
Jena 07745
Germany
Email apapanicolaou at ice.mpg.de
Tel +493641571561


From cjfields at uiuc.edu  Tue Mar  4 08:16:04 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 4 Mar 2008 07:16:04 -0600
Subject: [Bioperl-l] Bio/SearchIO/Writer/GbrowseGFF.pm
In-Reply-To: <47CD3A8F.9050902@ice.mpg.de>
References: <47CD3A8F.9050902@ice.mpg.de>
Message-ID: <4A68AA28-E508-4257-86E1-393CA9B74082@uiuc.edu>

I have run into a number of problems with the GbrowseGFF module myself  
(I think I committed the ID fix, actually).  It works but needs  
revision and needs better conformity with GFF3.

You can post (1) as a bug and we'll look into it when we can.  It's  
possible (depending on how extensive the fix is) this may have to wait  
until 1.7.

chris

On Mar 4, 2008, at 6:03 AM, Alexie Papanicolaou wrote:

> hello all,
>
> 1) I was wondering if you would you know what this error means and  
> had time to help...
>
> Use of uninitialized value in concatenation (.) or string at /usr/ 
> local/share/perl/5.8.8/Bio/SearchIO/Writer/GbrowseGFF.pm line 287
>
> line 287 is
>          else {
>              $tags{'Target'} = "$prefix:$seqname $qpmax $qpmin";
>          }
>
> this is the header
> # $Id: GbrowseGFF.pm,v 1.15.4.1 2006/10/02 23:10:27 sendu Exp $
> #
> # BioPerl module Bio::SearchIO::Writer::GbrowseGFF.pm
>
>
> this is how I call it... ( 2.6.18-6-amd64, x86_64, perl, v5.8.8,  
> bioperl: tried with both 1.5.2_102 from cvs and checked out svn  
> version today)
>
> use Bio::SearchIO::Writer::GbrowseGFF;
> use Bio::SearchIO;
> if ($program eq "blastn"){
> #my $out_gff  = new Bio::SearchIO(-writer => $writer_gff,
> my $out_gff = new Bio::SearchIO(-output_format  => 'GbrowseGFF',
>                      -output_cigar   => 1,
>                      -output_signif  => 1,
>                      -file   => ">$infile.$query.blast.gff");
> #my $out_gff_whole  = new Bio::SearchIO(-writer => $writer_gff,
> my $out_gff_whole = new Bio::SearchIO(-output_format  => 'GbrowseGFF',
>                      -output_cigar   => 1,
>                      -output_signif  => 1,
>                         -file   => ">>$infile.blast.gff");
> $out_gff->write_result($result);
> $out_gff_whole->write_result($result);
> }
>
>
>
> Where $result is a blast result...
>
> The aim is to parse a multi-query blast report and split it into  
> different queries and make another file with all the queries. I'm  
> sure i'm forgetting something but I can't figure what...
>
> The GFF file is produced, but I do get the error above...
>
> 2) Finally, there is a small bug but I don't think it comes from  
> this module? The id attribute is printed out e.g iD=match_sequence31  
> with iD wrongly capitalised...
>
> many thanks for your time
> alexie
>
> -- 
> --
> Alexie Papanicolaou
> Entomology
> Max Planck Institute for Chemical Ecology
> Hans Knoell Str 8
> Jena 07745
> Germany
> Email apapanicolaou at ice.mpg.de
> Tel +493641571561
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From Daniel.Gerlach at medecine.unige.ch  Tue Mar  4 07:35:03 2008
From: Daniel.Gerlach at medecine.unige.ch (Daniel Gerlach)
Date: Tue, 04 Mar 2008 13:35:03 +0100
Subject: [Bioperl-l] Remove columns containing more than 75% gaps in an
	alignment
References: <200502151616.j1FGGnKr023827@portal.open-bio.org>
Message-ID: <47CD41F7.2000401@medecine.unige.ch>

Hello,

Is it possible to remove only columns containing e.g. more than 75% gaps 
from an alignment? I was thinking of

$aln2 = $aln->remove_gaps('-'[,$all_gaps_columns])

This would allow me to remove all gaps or gap-only columns but not using 
a threshold.

Greetings, Daniel


From Daniel.Gerlach at medecine.unige.ch  Tue Mar  4 08:46:33 2008
From: Daniel.Gerlach at medecine.unige.ch (Daniel Gerlach)
Date: Tue, 04 Mar 2008 14:46:33 +0100
Subject: [Bioperl-l] branch length score - total length of the spanning
	subtree
Message-ID: <47CD52B9.5060906@medecine.unige.ch>

Hello,

I would like to use bioperl to calculate a branch length score for a 
given set of nodes and a tree. I know how to get the total branch length 
by using $tree->total_branch_length, but how could I get the length of 
the subtree spanning some given nodes which are dispersed over the whole 
tree (a subset of nodes from the tree which are not monophyletic)?

Greetings, Daniel


From bix at sendu.me.uk  Tue Mar  4 09:37:53 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 04 Mar 2008 14:37:53 +0000
Subject: [Bioperl-l] branch length score - total length of the spanning
 subtree
In-Reply-To: <47CD52B9.5060906@medecine.unige.ch>
References: <47CD52B9.5060906@medecine.unige.ch>
Message-ID: <47CD5EC1.2020103@sendu.me.uk>

Daniel Gerlach wrote:
> Hello,
> 
> I would like to use bioperl to calculate a branch length score for a 
> given set of nodes and a tree. I know how to get the total branch length 
> by using $tree->total_branch_length, but how could I get the length of 
> the subtree spanning some given nodes which are dispersed over the whole 
> tree (a subset of nodes from the tree which are not monophyletic)?

One 'cheat' way of doing it might be to use splice(-keep_ids => 
\@node_ids) or similar, then run total_branch_length() on that. No idea 
if it will actually give you the right answer though. Let us know! :)


From bix at sendu.me.uk  Tue Mar  4 09:26:10 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 04 Mar 2008 14:26:10 +0000
Subject: [Bioperl-l] Remove columns containing more than 75% gaps in an
 alignment
In-Reply-To: <47CD41F7.2000401@medecine.unige.ch>
References: <200502151616.j1FGGnKr023827@portal.open-bio.org>
	<47CD41F7.2000401@medecine.unige.ch>
Message-ID: <47CD5C02.8060306@sendu.me.uk>

Daniel Gerlach wrote:
> Hello,
> 
> Is it possible to remove only columns containing e.g. more than 75% gaps 
> from an alignment? I was thinking at
> 
> $aln2 = $aln->remove_gaps('-'[,$all_gaps_columns])
> 
> This would allow me to remove all gaps or gap-only columns but not using 
> a threshold.

Well, you can use gap_col_matrix() to decide which columns you don't 
want, and then use remove_columns().
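
The column-selection step can be sketched in plain Perl, independent of BioPerl. The function name gappy_column_ranges and the toy alignment are assumptions for illustration; the function returns 0-based [start, end] pairs for runs of columns whose gap fraction exceeds the threshold.

```perl
use strict;
use warnings;

# Return [start, end] ranges (0-based, inclusive) of columns whose gap
# fraction is strictly greater than $threshold.
sub gappy_column_ranges {
    my ($threshold, @rows) = @_;
    my $length = length $rows[0];
    my @ranges;
    for my $col (0 .. $length - 1) {
        my $gaps = grep { substr($_, $col, 1) eq '-' } @rows;
        next unless $gaps / @rows > $threshold;
        if (@ranges && $ranges[-1][1] == $col - 1) {
            $ranges[-1][1] = $col;        # extend the current run
        } else {
            push @ranges, [$col, $col];   # start a new run
        }
    }
    return @ranges;
}

# Toy alignment: five aligned sequences of equal length.
my @rows = (
    'AC--GT-A',
    'A---GT-A',
    'AC---T-A',
    'A-T-GT-A',
    'A---GTCA',
);
my @ranges = gappy_column_ranges(0.75, @rows);
print join(', ', map { "[$_->[0],$_->[1]]" } @ranges), "\n";   # [2,3], [6,6]
```

With a Bio::SimpleAlign object, ranges in this shape could presumably be handed on as $aln->remove_columns(@ranges); check the documentation of your installed version for the exact argument format and column numbering.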


From hlapp at gmx.net  Tue Mar  4 10:24:13 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 4 Mar 2008 10:24:13 -0500
Subject: [Bioperl-l] Bio/SearchIO/Writer/GbrowseGFF.pm
In-Reply-To: <47CD3A8F.9050902@ice.mpg.de>
References: <47CD3A8F.9050902@ice.mpg.de>
Message-ID: <87808BE4-B6A3-4C7F-A6DC-42ED2686375B@gmx.net>


On Mar 4, 2008, at 7:03 AM, Alexie Papanicolaou wrote:

> Use of uninitialized value in concatenation (.) or string at /usr/ 
> local/share/perl/5.8.8/Bio/SearchIO/Writer/GbrowseGFF.pm line 287
>
> line 287 is
>            else {
>                $tags{'Target'} = "$prefix:$seqname $qpmax $qpmin";
>            }


Note that this is a warning, not an error. However, if none of  
$prefix, $seqname, $qpmax, $qpmin is supposed to be undefined (or equal  
to an empty string, which is what an undefined value interpolates as) at  
this position, then there is a problem (and it lies before the above line).

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================
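
The warning-versus-error distinction can be reproduced in a few lines of plain Perl. In this sketch the variable names mirror the line quoted from GbrowseGFF.pm, but the values are made up, and $qpmin is deliberately left undefined.

```perl
use strict;
use warnings;

# Collect warnings instead of printing them, so they can be inspected.
my @warnings;
$SIG{__WARN__} = sub { push @warnings, $_[0] };

# Hypothetical values; $qpmin is deliberately undefined.
my ($prefix, $seqname, $qpmax, $qpmin) = ('Sequence', 'contig1', 120, undef);
my $target = "$prefix:$seqname $qpmax $qpmin";

# The string is still built: the undefined value becomes an empty string.
print "Target = '$target'\n";
print "warnings raised: ", scalar(@warnings), "\n";
print $warnings[0];
```

The program keeps running and the output is produced, which matches the observation that the GFF file is written; the open question is why one of the four values is undefined by the time that line is reached.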


From hlapp at gmx.net  Tue Mar  4 11:02:02 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 4 Mar 2008 11:02:02 -0500
Subject: [Bioperl-l] branch length score - total length of the spanning
	subtree
In-Reply-To: <47CD5EC1.2020103@sendu.me.uk>
References: <47CD52B9.5060906@medecine.unige.ch> <47CD5EC1.2020103@sendu.me.uk>
Message-ID: <BC04FC01-B12E-43A0-9A8A-B461667BCAA8@gmx.net>


On Mar 4, 2008, at 9:37 AM, Sendu Bala wrote:

> Daniel Gerlach wrote:
>> Hello,
>> I would like to use bioperl to calculate a branch length score for  
>> a given set of nodes and a tree. I know how to get the total  
>> branch length by using $tree->total_branch_length, but how could I  
>> get the length of the subtree spanning some given nodes which are  
>> dispersed over the whole tree (a subset of nodes from the tree  
>> which are not monophyletic)?
>
> One 'cheat' way of doing it might be to use splice(-keep_ids =>  
> \@node_ids) or similar, then run total_branch_length() on that. No  
> idea if it will actually give you the right answer though. Let us  
> know! :)

Related to that, will contract_linear_paths() actually do the right  
thing and adjust branch lengths if it removes internal nodes with  
outdegree 1?

Rutger - does Bio::Phylo handle this correctly?

	-hilmar

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From Daniel.Gerlach at medecine.unige.ch  Tue Mar  4 11:12:53 2008
From: Daniel.Gerlach at medecine.unige.ch (Daniel Gerlach)
Date: Tue, 04 Mar 2008 17:12:53 +0100
Subject: [Bioperl-l] branch length score - total length of the spanning
 subtree
In-Reply-To: <47CD5EC1.2020103@sendu.me.uk>
References: <47CD52B9.5060906@medecine.unige.ch> <47CD5EC1.2020103@sendu.me.uk>
Message-ID: <47CD7505.5080105@medecine.unige.ch>

Hello,

Thanks for the quick answer. I tried:

use Bio::TreeIO;
my $treeio = Bio::TreeIO->new(-format => 'newick',
                   -fh => \*DATA);
my $tree = $treeio->next_tree;
print $tree->total_branch_length,"\n";
$tree->splice(-keep_id => [A,B,E]);
print $tree->total_branch_length,"\n";

__DATA__
(((A:5,B:5)x:2,(C:4,D:4)y:1)z:3,E:10);

Which gives me the message "MSG: After splicing, the original root was 
removed but there are multiple candidates for the new root!" however the 
root E was not removed.

If I do it the complementary way by splicing out all unwanted nodes - 
splice(-remove_id => [C,D]) - I get what I want:

34
25
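
The two numbers can be checked by hand with a small pure-Perl sketch that does not use Bio::TreeIO. The tree is encoded as a child-to-parent table; the label 'root' and the function name spanning_length are made up for illustration. A branch is counted when it separates at least one kept leaf from another kept leaf.

```perl
use strict;
use warnings;

# The tree from the message: (((A:5,B:5)x:2,(C:4,D:4)y:1)z:3,E:10);
# child => [parent, branch length]; 'root' labels the unnamed root node.
my %edge = (
    A => ['x', 5], B => ['x', 5], x => ['z', 2],
    C => ['y', 4], D => ['y', 4], y => ['z', 1],
    z => ['root', 3], E => ['root', 10],
);

# Sum the branches that lie between the kept leaves.
sub spanning_length {
    my @keep = @_;
    my %below;   # node => number of kept leaves at or below it
    for my $leaf (@keep) {
        for (my $n = $leaf; exists $edge{$n}; $n = $edge{$n}[0]) {
            $below{$n}++;
        }
    }
    my $total = 0;
    for my $node (keys %below) {
        # Skip branches that have every kept leaf below them.
        $total += $edge{$node}[1] if $below{$node} < @keep;
    }
    return $total;
}

print spanning_length(qw(A B C D E)), "\n";   # whole tree: 34
print spanning_length(qw(A B E)), "\n";       # C and D left out: 25
```

Leaving out C and D drops their two branches (4 + 4) and the branch leading to y (1), which accounts for the difference between 34 and 25.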

Greetings, Daniel

Sendu Bala wrote:
> Daniel Gerlach wrote:
>> Hello,
>>
>> I would like to use bioperl to calculate a branch length score for a 
>> given set of nodes and a tree. I know how to get the total branch 
>> length by using $tree->total_branch_length, but how could I get the 
>> length of the subtree spanning some given nodes which are dispersed 
>> over the whole tree (a subset of nodes from the tree which are not 
>> monophyletic)?
> 
> One 'cheat' way of doing it might be to use splice(-keep_ids => 
> \@node_ids) or similar, then run total_branch_length() on that. No idea 
> if it will actually give you the right answer though. Let us know! :)


From bix at sendu.me.uk  Tue Mar  4 11:37:47 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 04 Mar 2008 16:37:47 +0000
Subject: [Bioperl-l] branch length score - total length of the spanning
 subtree
In-Reply-To: <BC04FC01-B12E-43A0-9A8A-B461667BCAA8@gmx.net>
References: <47CD52B9.5060906@medecine.unige.ch> <47CD5EC1.2020103@sendu.me.uk>
	<BC04FC01-B12E-43A0-9A8A-B461667BCAA8@gmx.net>
Message-ID: <47CD7ADB.6050808@sendu.me.uk>

Hilmar Lapp wrote:
> 
> On Mar 4, 2008, at 9:37 AM, Sendu Bala wrote:
> 
>> Daniel Gerlach wrote:
>>> Hello,
>>> I would like to use bioperl to calculate a branch length score for a 
>>> given set of nodes and a tree. I know how to get the total branch 
>>> length by using $tree->total_branch_length, but how could I get the 
>>> length of the subtree spanning some given nodes which are dispersed 
>>> over the whole tree (a subset of nodes from the tree which are not 
>>> monophyletic)?
>>
>> One 'cheat' way of doing it might be to use splice(-keep_ids => 
>> \@node_ids) or similar, then run total_branch_length() on that. No 
>> idea if it will actually give you the right answer though. Let us 
>> know! :)
> 
> Related to that, will contract_linear_paths() actually do the right 
> thing and adjust branch lengths if it removes internal nodes with 
> outdegree 1?

I think ultimately it boils down to remove_Descendent() being called as 
appropriate which does the branch length alteration. From a glance I 
can't answer your question with certainty, but it 'should' do the right 
thing. It needs to be tested; when I implemented these things I was only 
concerned with tree topology, not branch lengths or anything else.


From David.Messina at sbc.su.se  Tue Mar  4 15:47:06 2008
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 4 Mar 2008 21:47:06 +0100
Subject: [Bioperl-l] problem with Bio::Tools::EMBOSS
In-Reply-To: <f03f40920803031157oda9e0ffubc3bfb9070022238@mail.gmail.com>
References: <f03f40920803030931r7c97159j2d6a4b74e2917184@mail.gmail.com>
	<47CC509A.10306@campus.iztacala.unam.mx>
	<f03f40920803031157oda9e0ffubc3bfb9070022238@mail.gmail.com>
Message-ID: <628aabb70803041247l6c5a52d6n5cab24e7059f15fb@mail.gmail.com>

> where do i find the
> correct names of the keys to create the input hash?


I've never used this module, but from a quick look at the code it appears to
pass on any parameters to palindrome.

I'm guessing you've already done this, but have you tried using the
parameter names and values that palindrome itself asks for?


Dave


From cjfields at uiuc.edu  Tue Mar  4 16:34:21 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 4 Mar 2008 15:34:21 -0600
Subject: [Bioperl-l] OBDA, Bio::DB::Flat, and bioperl
Message-ID: <FDC728F3-F96C-4072-B448-2F7BF0D11922@uiuc.edu>

I don't know what the current status is for OBDA, but we have several  
bugs listed for Bio::DB::Flat which need someone versed in OBDA to  
look at them (they are all interrelated):

http://bugzilla.open-bio.org/show_bug.cgi?id=2336
http://bugzilla.open-bio.org/show_bug.cgi?id=2337
http://bugzilla.open-bio.org/show_bug.cgi?id=2338
http://bugzilla.open-bio.org/show_bug.cgi?id=2339

If anyone has any input I would greatly appreciate it.  I have been  
trying to stomp as many bugs as possible so we can work on a new  
release.

chris


From bosborne11 at verizon.net  Tue Mar  4 16:42:05 2008
From: bosborne11 at verizon.net (Brian Osborne)
Date: Tue, 04 Mar 2008 16:42:05 -0500
Subject: [Bioperl-l] OBDA, Bio::DB::Flat, and bioperl
In-Reply-To: <FDC728F3-F96C-4072-B448-2F7BF0D11922@uiuc.edu>
References: <FDC728F3-F96C-4072-B448-2F7BF0D11922@uiuc.edu>
Message-ID: <BAD3A0C8-69C4-45FF-A666-527289428A58@verizon.net>

Chris,

I'll take a look at them this weekend.

Brian O.


On Mar 4, 2008, at 4:34 PM, Chris Fields wrote:

> I don't know what the current status is for OBDA, but we have  
> several bugs listed for Bio::DB::Flat which need someone versed in  
> OBDA to look at them (they are all interrelated):
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=2336
> http://bugzilla.open-bio.org/show_bug.cgi?id=2337
> http://bugzilla.open-bio.org/show_bug.cgi?id=2338
> http://bugzilla.open-bio.org/show_bug.cgi?id=2339
>
> If anyone has any input I would greatly appreciate it.  I have been  
> trying to stomp as many bugs as possible so we can work on a new  
> release.
>
> chris


From anjan.purkayastha at gmail.com  Tue Mar  4 18:52:09 2008
From: anjan.purkayastha at gmail.com (ANJAN PURKAYASTHA)
Date: Tue, 4 Mar 2008 18:52:09 -0500
Subject: [Bioperl-l] problem with Bio::Tools::EMBOSS
In-Reply-To: <628aabb70803041247l6c5a52d6n5cab24e7059f15fb@mail.gmail.com>
References: <f03f40920803030931r7c97159j2d6a4b74e2917184@mail.gmail.com>
	<47CC509A.10306@campus.iztacala.unam.mx>
	<f03f40920803031157oda9e0ffubc3bfb9070022238@mail.gmail.com>
	<628aabb70803041247l6c5a52d6n5cab24e7059f15fb@mail.gmail.com>
Message-ID: <f03f40920803041552u49a68a64gf31b90cd0159ce19@mail.gmail.com>

guys,
thanks for all your input.
i went to the following site:
http://www.koders.com/perl/fid5F28A3DDD453F0DB4995B7DDF304B02DBBACE0A0.aspx?s=calculate

they have the key names for most of the emboss programs.

thanks,
anjan

On Tue, Mar 4, 2008 at 3:47 PM, Dave Messina <David.Messina at sbc.su.se>
wrote:

>
> where do i find the
> > correct names of the keys to create the input hash?
>
>
>
> I've never used this module, but from a quick look at the code it appears
> to pass on any parameters to palindrome.
>
> I'm guessing you've already done this, but have you tried using the
> parameter names and values that palindrome itself asks for?
>
>
> Dave
>
>


-- 
ANJAN PURKAYASTHA, PhD.
Senior Computational Biologist
==========================

1101 King Street, Suite 310,
Alexandria, VA 22314.
703.518.8040 (office)
703.740.6939 (mobile)

email:
anjan at vbi.vt.edu;
anjan.purkayastha at gmail.com

http://www.vbi.vt.edu

==========================


From staffa at niehs.nih.gov  Wed Mar  5 18:43:30 2008
From: staffa at niehs.nih.gov (Staffa, Nick (NIH/NIEHS))
Date: Wed, 05 Mar 2008 18:43:30 -0500
Subject: [Bioperl-l] SeqIO
Message-ID: <C3F49A52.88AD%staffa@niehs.nih.gov>

So the Howto says that Bio::SeqIO will read almost any known format
including GCG. 
So I create a GCG file with Seqlab and try to print out its sequence as a
string (I did guess at the way to get the sequence string):

#!/usr/bin/perl -w
use strict;
$| = 1;    # unbuffered output
use Bio::SeqIO;

my $number_of_files = @ARGV;
if (!$number_of_files) { print "no files entered\n"; exit; }

foreach my $file (@ARGV) {
    # No -format given: SeqIO has to work out the format by itself
    my $seqio_object = Bio::SeqIO->new(-file => $file);
    my $seq_object   = $seqio_object->next_seq;
    my $sequence     = $seq_object->seq;
    print "$sequence\n";
    my $status = &windowscore($sequence);    # windowscore() not shown here
}

But what it returned was the entire contents of the file with no format
decoding. Have I been deluded?

NewDNALength:810March5,200818:26Type:NCheck:3368..1TGTTCGAATTCCGTGCGGTCCACCT
CCCCTAGGAGCTCAGTGGGCTGGTT51GGATTCCGTGCCATCCCGGCAGGGCAGAGCCTCGGGAGGGGGCGAAGGT
T101GCCCGGGGCCGTGCGCTGGGTGCTGCTGCTGCGGTGGCGGCGGCGGTGCC151TGCGGTTGCAGCGGCTGCT
GGGGTTGCGCGTGGAAACCGCGCCCCGCACT201TGCGGCGGGCGAGCCCATCGCGCCGTAGTACAGGTGCAGAGC
GCTGGGGG251GCGCCAGGATCCCCGGCATCGCAGGGCCCGAGGGGTCCGGCCCCACTCGC301ATGGGGCCAGCG
GGCGGCTCTACGGACACTGCATAGTCCGAGACTGGAGC351GTAAGTGTAGGTGCCGGCCGCCGGGCAGTCCCCTG
GCAGCGGGGCTGCAA401AGAAAGCCGGGTCCTGCTCCACGCCATCCAGCGGGGATGTGTCCGGAGTG451GGCAG
AGGGTAGCCGTCGAGCGCGGGAGCGCCCAGTCCCTGGCAGTCCCG501ATAGTGGGGGCCCATGTGCGGAGACATC
AGCGGAGGACCGGCCGGATAGC551CCGGCTCCGGGAAAGGCAGACCCAGGCCATCCATGGCCACGCGGCCGCCC6
01TCGGGACCAAGCGCGCCGGCCTGGGGCTCGACGAGAGCGTGCAGGAAGCC651TCCCTCCACCCGCTTCATGCG
CTTCACCTGCTTGCGCCGCCGCGGCCGGT701ACTTGTAGTTGGGGTGGTCCTGCATATGCTGCACGCGCAGCCGC
TCGGCC751TCTTCCACGAAGGGCCGCTTCTCTGCCAAGGTCAACGCCTTCCAAGACTT801GCCTGCAGGG


Nick Staffa 
Telephone: 919-316-4569  (NIEHS: 6-4569)
Scientific Computing Support Group
NIEHS Information Technology Support Services Contract
(Science Task Monitor: Roy W. Reter (reter at niehs.nih.gov)
National Institute of Environmental Health Sciences
National Institutes of Health
Research Triangle Park, North Carolina


From cjfields at uiuc.edu  Wed Mar  5 21:22:53 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 5 Mar 2008 20:22:53 -0600
Subject: [Bioperl-l] SeqIO
In-Reply-To: <C3F49A52.88AD%staffa@niehs.nih.gov>
References: <C3F49A52.88AD%staffa@niehs.nih.gov>
Message-ID: <C057CBE1-87E5-4195-94D6-60E706511F4E@uiuc.edu>

I thought the GCG format changed somewhere along the way, but maybe I'm  
wrong?  Regardless, you'll have to post this as a bug (along with an  
example file).

Also, kind of odd that the sequence data wasn't checked...

chris


Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From jason at bioperl.org  Wed Mar  5 21:33:48 2008
From: jason at bioperl.org (Jason Stajich)
Date: Wed, 5 Mar 2008 18:33:48 -0800
Subject: [Bioperl-l] SeqIO
In-Reply-To: <C057CBE1-87E5-4195-94D6-60E706511F4E@uiuc.edu>
References: <C3F49A52.88AD%staffa@niehs.nih.gov>
	<C057CBE1-87E5-4195-94D6-60E706511F4E@uiuc.edu>
Message-ID: <797E42FC-C59F-4431-BAF1-11D3FAE9F9D0@bioperl.org>

Probably you should try specifying the format explicitly first, as in  
(-format => 'gcg')
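
A minimal sketch of that suggestion, assuming the file really is in GCG
format. The helper name first_gcg_sequence is made up for illustration and
is not part of BioPerl; only Bio::SeqIO->new, next_seq and seq are real
calls:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;

# Return the first sequence in $file as a plain string.
# The format is passed explicitly, so SeqIO does not have to infer it.
# (first_gcg_sequence is a hypothetical helper name.)
sub first_gcg_sequence {
    my ($file) = @_;
    my $in  = Bio::SeqIO->new(-file => $file, -format => 'gcg');
    my $seq = $in->next_seq
        or die "no sequence found in $file\n";
    return $seq->seq;
}

# Print the first sequence of every file named on the command line.
print first_gcg_sequence($_), "\n" for @ARGV;
```

If the output still contains the header text and position numbers with the
format given explicitly, that would point at the file or the parser rather
than at format detection.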

-j


From bosborne11 at verizon.net  Wed Mar  5 21:01:07 2008
From: bosborne11 at verizon.net (Brian Osborne)
Date: Wed, 05 Mar 2008 21:01:07 -0500
Subject: [Bioperl-l] SeqIO
In-Reply-To: <C3F49A52.88AD%staffa@niehs.nih.gov>
References: <C3F49A52.88AD%staffa@niehs.nih.gov>
Message-ID: <19DC527F-3D34-4F3E-9B4C-D2C6011A2C8F@verizon.net>

Nick,

Take a look at the GCG files that are used in the SeqIO tests:

bioperl-live/t/data/test.gcg
bioperl-live/t/data/test_badlf.gcg

Does the file that you created have a format like the format in those  
files? I'm guessing you're going to say 'yes', from the looks of your  
output.

Brian O.




From staffa at niehs.nih.gov  Wed Mar  5 22:09:11 2008
From: staffa at niehs.nih.gov (Staffa, Nick (NIH/NIEHS))
Date: Wed, 05 Mar 2008 22:09:11 -0500
Subject: [Bioperl-l] SeqIO
In-Reply-To: <797E42FC-C59F-4431-BAF1-11D3FAE9F9D0@bioperl.org>
Message-ID: <C3F4CA87.8074%staffa@niehs.nih.gov>

Verily,
One interpretation of the docs might be: will read any format if the format
is specified.
I was hoping that I could write a program for which one needn't specify the
format. It'd be more user-friendly and useful.




From cjfields at uiuc.edu  Wed Mar  5 22:44:14 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 5 Mar 2008 21:44:14 -0600
Subject: [Bioperl-l] SeqIO
In-Reply-To: <1B139406-897F-496F-8709-FCAAD4EFEDE3@verizon.net>
References: <C3F49A52.88AD%staffa@niehs.nih.gov>
	<C057CBE1-87E5-4195-94D6-60E706511F4E@uiuc.edu>
	<1B139406-897F-496F-8709-FCAAD4EFEDE3@verizon.net>
Message-ID: <9146DF9D-C0D6-4F18-9B7E-7BB42FCE0737@uiuc.edu>

Heh, good one!  Though Jason may have worked out the issue (not  
indicating the format explicitly).  Would be worth looking at the  
tested files.

As for dinosaurs, well I can't talk ...

chris

On Mar 5, 2008, at 8:49 PM, Brian Osborne wrote:

> Chris,
>
> Many many years ago, when dinosaurs roamed the earth, only about  
> half of the formats had their own tests. A primitive being saw this  
> and created simple tests for all the 'missing' formats. His thought  
> probably was 'this is better than nothing'. In fact this being  
> assumed that GCG was an outdated and unused format, even as long ago  
> as that time was.
>
> The origins of so much of what we now know as 'Bioperl' are  
> frequently mysterious, or incomprehensible to modern day humans...
>
> Brian O.
>
> On Mar 5, 2008, at 9:22 PM, Chris Fields wrote:
>
>> Also, kind of odd that the sequence data wasn't checked...


From bosborne11 at verizon.net  Wed Mar  5 21:49:26 2008
From: bosborne11 at verizon.net (Brian Osborne)
Date: Wed, 05 Mar 2008 21:49:26 -0500
Subject: [Bioperl-l] SeqIO
In-Reply-To: <C057CBE1-87E5-4195-94D6-60E706511F4E@uiuc.edu>
References: <C3F49A52.88AD%staffa@niehs.nih.gov>
	<C057CBE1-87E5-4195-94D6-60E706511F4E@uiuc.edu>
Message-ID: <1B139406-897F-496F-8709-FCAAD4EFEDE3@verizon.net>

Chris,

Many many years ago, when dinosaurs roamed the earth, only about half  
of the formats had their own tests. A primitive being saw this and  
created simple tests for all the 'missing' formats. His thought  
probably was 'this is better than nothing'. In fact this being assumed  
that GCG was an outdated and unused format, even as long ago as that  
time was.

The origins of so much of what we now know as 'Bioperl' are frequently  
mysterious, or incomprehensible to modern day humans...

Brian O.

On Mar 5, 2008, at 9:22 PM, Chris Fields wrote:

> Also, kind of odd that the sequence data wasn't checked...


From cjfields at uiuc.edu  Wed Mar  5 22:54:15 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 5 Mar 2008 21:54:15 -0600
Subject: [Bioperl-l] SeqIO
In-Reply-To: <C3F4CA87.8074%staffa@niehs.nih.gov>
References: <C3F4CA87.8074%staffa@niehs.nih.gov>
Message-ID: <67C6AE9D-3934-4717-A97A-4C31DB4F7E33@uiuc.edu>

You can leave off the format, but you must append the correct file  
extension for the parser to determine the correct format ('.gcg' for  
GCG, for example).  There is also Bio::Tools::GuessSeqFormat though it  
doesn't cover all formats.
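
As a rough sketch of the second option, assuming Bio::Tools::GuessSeqFormat
recognises the file (its guess method returns undef when it cannot).
open_guessed is a hypothetical helper name, not something BioPerl provides:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;
use Bio::Tools::GuessSeqFormat;

# Open $file with Bio::SeqIO, guessing the format from the file contents.
# Dies if the guesser cannot identify the format.
# (open_guessed is a hypothetical helper name.)
sub open_guessed {
    my ($file) = @_;
    my $format = Bio::Tools::GuessSeqFormat->new(-file => $file)->guess
        or die "could not guess the format of $file; pass -format explicitly\n";
    return Bio::SeqIO->new(-file => $file, -format => $format);
}

# Report id and length of every sequence in every file on the command line.
for my $file (@ARGV) {
    my $in = open_guessed($file);
    while (my $seq = $in->next_seq) {
        print $seq->display_id, "\t", $seq->length, "\n";
    }
}
```

Dying on a failed guess is a deliberate choice in this sketch: silently
falling back to some default format is how one ends up with the whole file
printed as "sequence".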

chris


Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From ewijaya at gmail.com  Thu Mar  6 03:16:25 2008
From: ewijaya at gmail.com (Edward Wijaya)
Date: Thu, 6 Mar 2008 16:16:25 +0800
Subject: [Bioperl-l] BioPerl Module to Parse Transfac Flat File Database
Message-ID: <3521d3670803060016r35ac720ar9f2190631ddaf629@mail.gmail.com>

Dear experts,

Is there one? The TRANSFAC text file contains entries like the one below.
In particular, we wish to capture the PWM for each transcription
factor.

Regards,
Edward

__BEGIN__
VV  TRANSFAC MATRIX TABLE, Release 11.1 - licensed - 2007-03-31, (C) Biobase GmbH
XX
//
AC  M00001
XX
ID  V$MYOD_01
XX
DT  19.10.1992 (created); ewi.
DT  22.10.1997 (updated); dbo.
CO  Copyright (C), Biobase GmbH.
XX
NA  MyoD
XX
DE  myoblast determination gene product
XX
BF  T00526; MyoD; Species: mouse, Mus musculus.
BF  T09177; MyoD; Species: mouse, Mus musculus.
XX
P0      A      C      G      T
01      1      2      2      0      S
02      2      1      2      0      R
03      3      0      1      1      A
04      0      5      0      0      C
05      5      0      0      0      A
06      0      0      4      1      G
07      0      1      4      0      G
08      0      0      0      5      T
09      0      0      5      0      G
10      0      1      2      2      K
11      0      2      0      3      Y
12      1      0      3      1      G

....etc....


From watashi at post.com  Thu Mar  6 07:06:42 2008
From: watashi at post.com (Masa Masa)
Date: Thu, 6 Mar 2008 07:06:42 -0500
Subject: [Bioperl-l] failure of add_seqfeature
Message-ID: <20080306120642.4800F16427A@ws1-4.us4.outblaze.com>

Dear experts,

Would anybody know why the following code generates this error:


------------- EXCEPTION  -------------
MSG: Bio::SeqFeature::Generic=HASH(0x94583c0) is not contained within parent feature, and expansion is not valid
STACK Bio::SeqFeature::Generic::add_SeqFeature /usr/lib/perl5/site_perl/5.8.0/Bio/SeqFeature/Generic.pm:767
STACK toplevel test.pl:118

--------------------------------------
15616 15693
79568 83016

=================


use Bio::Graphics;
use Bio::SeqFeature::Generic;
use Bio::SeqIO;


my $bsg = 'Bio::SeqFeature::Generic';

my $unseqfea = $bsg->new( -start=>$from[$i], -end=>$to[$i], -display_name=>'U');

for (my $i=0; $i < @from; $i++) {
    print "$from[$i] $to[$i]\n";
    $unseqfea->add_SeqFeature($bsg->new(-start=>$from[$i],-end=>$to[$i]));
    if ($i > 10) {
        exit;
    }
}



From heikki at sanbi.ac.za  Thu Mar  6 07:20:03 2008
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Thu, 6 Mar 2008 14:20:03 +0200
Subject: [Bioperl-l] SeqIO
In-Reply-To: <C3F4CA87.8074%staffa@niehs.nih.gov>
References: <C3F4CA87.8074%staffa@niehs.nih.gov>
Message-ID: <200803061420.04123.heikki@sanbi.ac.za>


Nick,

This is the regex that Bio::Tools::GuessSeqFormat uses to identify a gcg file:

/Length: .*Type: .*Check: .*\.\.$/

It is meant to match the second line in a GCG file. If the first line
matches some other format's regex, this one will not be evaluated.
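
A small plain-Perl sketch to try that pattern by hand. The "spaced" header
below is an assumption: it is reconstructed from the whitespace-stripped
output Nick posted, not copied from his actual file, and is_gcg_header is
a made-up helper name:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The pattern quoted above for recognising a GCG header line.
my $gcg_header = qr/Length: .*Type: .*Check: .*\.\.$/;

# Return 1 if $line looks like a GCG header line, 0 otherwise.
# (is_gcg_header is a hypothetical helper name.)
sub is_gcg_header {
    my ($line) = @_;
    return $line =~ $gcg_header ? 1 : 0;
}

# Assumed header, with spacing guessed from the posted output.
my $spaced   = "NewDNA  Length: 810  March 5, 2008 18:26  Type: N  Check: 3368  ..";
# The same text with all whitespace removed, as it appeared in the output.
my $stripped = "NewDNALength:810March5,200818:26Type:NCheck:3368..";

for my $case ([ spaced => $spaced ], [ stripped => $stripped ]) {
    my ($name, $line) = @$case;
    printf "%-8s %s\n", $name, is_gcg_header($line) ? 'match' : 'no match';
}
```

Note that the pattern needs a space after each of "Length:", "Type:" and
"Check:", and that it is anchored at the end of the line, so a trailing
space or carriage return after the two dots would stop it from matching.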

Let us know,

	-Heikki



-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Senior Scientist    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From bix at sendu.me.uk  Thu Mar  6 08:07:21 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 06 Mar 2008 13:07:21 +0000
Subject: [Bioperl-l] BioPerl Module to Parse Transfac Flat File Database
In-Reply-To: <3521d3670803060016r35ac720ar9f2190631ddaf629@mail.gmail.com>
References: <3521d3670803060016r35ac720ar9f2190631ddaf629@mail.gmail.com>
Message-ID: <47CFEC89.1000705@sendu.me.uk>

Edward Wijaya wrote:
> Dear experts,
> 
> Is there any? The TRANSFAC text file which contain entry like this.
> Especially we wich to capture the PWM for each of the Transcription
> factor.

Yes; I've written a module to do this, I just haven't committed it yet 
because certain things aren't quite right in terms of the API. But to 
just grab the PWM it should work fine. If you want I can email you the 
modules.


From sdavis2 at mail.nih.gov  Thu Mar  6 08:40:25 2008
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Thu, 6 Mar 2008 08:40:25 -0500
Subject: [Bioperl-l] BioPerl Module to Parse Transfac Flat File Database
In-Reply-To: <47CFEC89.1000705@sendu.me.uk>
References: <3521d3670803060016r35ac720ar9f2190631ddaf629@mail.gmail.com>
	<47CFEC89.1000705@sendu.me.uk>
Message-ID: <264855a00803060540u1f3d0f92pbab13349595a0eb3@mail.gmail.com>

On Thu, Mar 6, 2008 at 8:07 AM, Sendu Bala <bix at sendu.me.uk> wrote:
> Edward Wijaya wrote:
>  > Dear experts,
>  >
>  > Is there any? The TRANSFAC text file contains entries like this.
>  > In particular, we wish to capture the PWM for each transcription
>  > factor.
>
>  Yes; I've written a module to do this, I just haven't committed it yet
>  because certain things aren't quite right in terms of the API. But to
>  just grab the PWM it should work fine. If you want I can email you the
>  modules.

I believe there are a set of non-bioperl modules called TFBS.  See
here (although I'm not sure this is the most up-to-date site):

http://tfbs.genereg.net/

Sean


From David.Messina at sbc.su.se  Thu Mar  6 09:55:24 2008
From: David.Messina at sbc.su.se (Dave Messina)
Date: Thu, 6 Mar 2008 15:55:24 +0100
Subject: [Bioperl-l] failure of add_seqfeature
In-Reply-To: <20080306120642.4800F16427A@ws1-4.us4.outblaze.com>
References: <20080306120642.4800F16427A@ws1-4.us4.outblaze.com>
Message-ID: <628aabb70803060655k5245296etf5ee2f31755230d3@mail.gmail.com>

Hi Masa,

Could you give us a little more information?

A complete test case (the code you included doesn't run because for example
the @from array doesn't exist) and input file would be helpful, as well as
the version of BioPerl you are using.

Dave


From staffa at niehs.nih.gov  Thu Mar  6 10:23:34 2008
From: staffa at niehs.nih.gov (Staffa, Nick (NIH/NIEHS))
Date: Thu, 06 Mar 2008 10:23:34 -0500
Subject: [Bioperl-l] SeqIO
In-Reply-To: <200803061420.04123.heikki@sanbi.ac.za>
Message-ID: <C3F576A6.88C9%staffa@niehs.nih.gov>

Here's the scoop:
When I use Jason's suggestion, (-format => 'gcg'),
My program works without complaint on the original file that looks like:
!!NA_SEQUENCE 1.0
   NewDNA  Length: 810  March 5, 2008 18:26  Type: N  Check: 3368  ..

       1  TGTTCGAATT CCGTGCGGTC CACCTCCCCT AGGAGCTCAG TGGGCTGGTT
etc.

BUT if I remove the first line to test Bio::Tools::GuessSeqFormat,
(which should be retro-gcg format (before version 11?)),
my program runs, but there IS a complaint:
Use of uninitialized value in scalar chomp at
/usr/lib/perl5/site_perl/5.8.5/Bio/SeqIO/gcg.pm line 118, <GEN0> line 1.
BUT 
If I remove (-format => 'gcg'),  I get no complaint, but the sequence
returned still has its numbers embedded. This affects my calculations.

Thanks, at least I know what my options are.
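
One possible workaround for the embedded numbers, sketched in plain Perl
rather than through Bio::SeqIO: skip everything up to the header line that
matches the GuessSeqFormat regex quoted below, then drop digits and
whitespace from the remaining lines. The helper name gcg_text_to_sequence
is made up for illustration and is not a BioPerl function, and the sketch
assumes a single-sequence file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not part of BioPerl: pull the residues out of
# GCG-style text by skipping everything up to the
# "Length: ... Type: ... Check: ... .." header line and then dropping
# position numbers and whitespace from the lines that follow.
sub gcg_text_to_sequence {
    my ($text) = @_;
    my $seen_header = 0;
    my $sequence    = '';
    for my $line (split /\n/, $text) {
        if (!$seen_header) {
            # Same pattern as the GuessSeqFormat regex, with optional
            # trailing whitespace allowed before the end of the line.
            $seen_header = 1
                if $line =~ /Length: .*Type: .*Check: .*\.\.\s*$/;
            next;
        }
        $line =~ s/[\d\s]//g;    # remove position numbers and spacing
        $sequence .= $line;
    }
    # undef signals that no GCG header line was found at all.
    return $seen_header ? $sequence : undef;
}

my $example = <<'END';
!!NA_SEQUENCE 1.0
   NewDNA  Length: 810  March 5, 2008 18:26  Type: N  Check: 3368  ..

       1  TGTTCGAATT CCGTGCGGTC CACCTCCCCT AGGAGCTCAG TGGGCTGGTT
END

my $sequence = gcg_text_to_sequence($example);
print "$sequence\n";
```

Because the header is located by pattern rather than by line number, the
same sketch should behave identically with or without the leading
"!!NA_SEQUENCE" line.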


Nick Staffa 
Telephone: 919-316-4569  (NIEHS: 6-4569)
Scientific Computing Support Group
NIEHS Information Technology Support Services Contract
(Science Task Monitor: Roy W. Reter (reter at niehs.nih.gov)
National Institute of Environmental Health Sciences
National Institutes of Health
Research Triangle Park, North Carolina


On 3/6/08 7:20 AM, "Heikki Lehvaslaiho" <heikki at sanbi.ac.za> wrote:

> 
> Nick,
> 
> This is the regex that Bio::Tools::GuessSeqFormat uses to identify a gcg file:
> 
> /Length: .*Type: .*Check: .*\.\.$/
> 
> It is the second line in a GCG file. If the first line matches some other format's
> regex, this will not be evaluated.
> 
> Let us know,
> 
> -Heikki
> 
> On Thursday 06 March 2008 05:09:11 Staffa, Nick (NIH/NIEHS) wrote:
>> Verily,
>> One interpretation of the docs might be: will read any format if the format
>> is specified.
>> I was hoping that I could write a program that one needn't specify format.
>> It'd be more user-friendly and useful.
>> 
>> On 3/5/08 9:33 PM, "Jason Stajich" <jason at bioperl.org> wrote:
>>> probably you should try specifying the format explicitly first- as in
>>> (-format => 'gcg')
>>> 
>>> -j
>>> 
>>> On Mar 5, 2008, at 6:22 PM, Chris Fields wrote:
>>>> I thought GCG format changed somewhere along the way but I maybe
>>>> I'm wrong?  Regardless, you'll have to post this as a bug (along
>>>> with an example file).
>>>> 
>>>> Also, kind of odd that the sequence data wasn't checked...
>>>> 
>>>> chris
>>>> 
>>>> On Mar 5, 2008, at 5:43 PM, Staffa, Nick (NIH/NIEHS) wrote:
>>>>> So the Howto says that Bio::SeqIO will read almost any known format
>>>>> including GCG.
>>>>> So I create a GCG file with Seqlab and try to printout its
>>>>> sequence as a
>>>>> string. ( I did guess at the way to get the sequence string:
>>>>> 
>>>>> #!/usr/bin/perl -w
>>>>> use strict;
>>>>> $| = 1;
>>>>> use Bio::SeqIO;
>>>>> my $number_of_files = @ARGV;
>>>>> if(!$number_of_files){print "no files entered\n";exit;}
>>>>> foreach my $file (@ARGV){
>>>>> my $seqio_object = Bio::SeqIO->new(-file => $file);
>>>>> my $seq_object = $seqio_object->next_seq;
>>>>> my $sequence = $seq_object->seq;
>>>>> print "$sequence\n";
>>>>> my $status = &windowscore($sequence);
>>>>> }
>>>>> 
>>>>> But what it returned was the entire contents of the file with no
>>>>> format
>>>>> decoding. Have I been deluded?
>>>>> 
>>>>> NewDNALength:810March5,200818:26Type:NCheck:
>>>>> 3368..1TGTTCGAATTCCGTGCGGTCCACCT
>>>>> CCCCTAGGAGCTCAGTGGGCTGGTT51GGATTCCGTGCCATCCCGGCAGGGCAGAGCCTCGGGAGGGGG
>>>>> CGAAGGT
>>>>> T101GCCCGGGGCCGTGCGCTGGGTGCTGCTGCTGCGGTGGCGGCGGCGGTGCC151TGCGGTTGCAGC
>>>>> GGCTGCT
>>>>> GGGGTTGCGCGTGGAAACCGCGCCCCGCACT201TGCGGCGGGCGAGCCCATCGCGCCGTAGTACAGGT
>>>>> GCAGAGC
>>>>> GCTGGGGG251GCGCCAGGATCCCCGGCATCGCAGGGCCCGAGGGGTCCGGCCCCACTCGC301ATGGG
>>>>> GCCAGCG
>>>>> GGCGGCTCTACGGACACTGCATAGTCCGAGACTGGAGC351GTAAGTGTAGGTGCCGGCCGCCGGGCAG
>>>>> TCCCCTG
>>>>> GCAGCGGGGCTGCAA401AGAAAGCCGGGTCCTGCTCCACGCCATCCAGCGGGGATGTGTCCGGAGTG4
>>>>> 51GGCAG
>>>>> AGGGTAGCCGTCGAGCGCGGGAGCGCCCAGTCCCTGGCAGTCCCG501ATAGTGGGGGCCCATGTGCGG
>>>>> AGACATC
>>>>> AGCGGAGGACCGGCCGGATAGC551CCGGCTCCGGGAAAGGCAGACCCAGGCCATCCATGGCCACGCGG
>>>>> CCGCCC6
>>>>> 01TCGGGACCAAGCGCGCCGGCCTGGGGCTCGACGAGAGCGTGCAGGAAGCC651TCCCTCCACCCGCT
>>>>> TCATGCG
>>>>> CTTCACCTGCTTGCGCCGCCGCGGCCGGT701ACTTGTAGTTGGGGTGGTCCTGCATATGCTGCACGCG
>>>>> CAGCCGC
>>>>> TCGGCC751TCTTCCACGAAGGGCCGCTTCTCTGCCAAGGTCAACGCCTTCCAAGACTT801GCCTGCA
>>>>> GGG
>>>>> 
>>>>> 
>>>>> 
>>>>> Nick Staffa
>>>>> Telephone: 919-316-4569  (NIEHS: 6-4569)
>>>>> Scientific Computing Support Group
>>>>> NIEHS Information Technology Support Services Contract
>>>>> (Science Task Monitor: Roy W. Reter (reter at niehs.nih.gov)
>>>>> National Institute of Environmental Health Sciences
>>>>> National Institutes of Health
>>>>> Research Triangle Park, North Carolina
>>>>> 
>>>>> 
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>> 
>>>> Christopher Fields
>>>> Postdoctoral Researcher
>>>> Lab of Dr. Robert Switzer
>>>> Dept of Biochemistry
>>>> University of Illinois Urbana-Champaign
>>>> 
>>>> 
>>>> 
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> 
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> 


From hlapp at gmx.net  Thu Mar  6 10:26:52 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 6 Mar 2008 10:26:52 -0500
Subject: [Bioperl-l] failure of add_seqfeature
In-Reply-To: <20080306120642.4800F16427A@ws1-4.us4.outblaze.com>
References: <20080306120642.4800F16427A@ws1-4.us4.outblaze.com>
Message-ID: <6BD917FC-803E-471B-A0C4-219286E53C47@gmx.net>

It seems you are adding subfeatures with a location that is not  
within their parent feature location. If that's indeed what you want  
to do, add the 'EXPAND' argument.

Excerpted from the POD of Bio::SeqFeature::Generic:

         Usage   : $feat->add_SeqFeature($subfeat);
                   $feat->add_SeqFeature($subfeat,'EXPAND')
         Function: adds a SeqFeature into the subSeqFeature array.
                   with no 'EXPAND' qualifier, subfeat will be tested
                   as to whether it lies inside the parent, and throw
                   an exception if not.

                   If EXPAND is used, the parent's start/end/strand will
                   be adjusted so that it grows to accommodate the new
                   subFeature
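
A minimal sketch of the difference, assuming BioPerl is installed. The
coordinates are borrowed from the output in the quoted message below;
treating the first pair as the parent feature is an assumption made only
for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqFeature::Generic;

# Parent and subfeature with non-overlapping locations (illustrative values).
my $parent = Bio::SeqFeature::Generic->new(
    -start        => 15616,
    -end          => 15693,
    -display_name => 'U',
);
my $child = Bio::SeqFeature::Generic->new(-start => 79568, -end => 83016);

# Without 'EXPAND' the containment test fails and an exception is thrown,
# so the call is wrapped in eval to keep the script running.
my $ok = eval { $parent->add_SeqFeature($child); 1 };
print "plain add_SeqFeature threw an exception\n" unless $ok;

# With 'EXPAND' the parent location is adjusted to cover the subfeature.
$parent->add_SeqFeature($child, 'EXPAND');
printf "parent now spans %d..%d\n", $parent->start, $parent->end;
```

Whether expansion is the right choice depends on whether the parent's
original coordinates carry meaning of their own; if they do, creating the
parent with a location that already covers all subfeatures may be the
safer route.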


On Mar 6, 2008, at 7:06 AM, Masa Masa wrote:

> Dear experts,
>
> Would anybody know why the following code generates this error:
>
>
> ------------- EXCEPTION  -------------
> MSG: Bio::SeqFeature::Generic=HASH(0x94583c0) is not contained  
> within parent feature, and expansion is not valid
> STACK Bio::SeqFeature::Generic::add_SeqFeature /usr/lib/perl5/site_perl/5.8.0/Bio/SeqFeature/Generic.pm:767
> STACK toplevel test.pl:118
>
> --------------------------------------
> 15616 15693
> 79568 83016
>
> =================
>
>
> use Bio::Graphics;
> use Bio::SeqFeature::Generic;
> use Bio::SeqIO;
>
>
> my $bsg = 'Bio::SeqFeature::Generic';
>
> my $unseqfea = $bsg->new( -start=>$from[$i], -end=>$to[$i], -display_name=>'U');
>
> for (my $i=0; $i < @from; $i++) {
> print "$from[$i] $to[$i]\n";
>     $unseqfea->add_SeqFeature($bsg->new(-start=>$from[$i],-end=>$to[$i]));
>     if ($i > 10) {
>        exit;
>     }
> }
>
> -- 
> Want an e-mail address like mine?
> Get a free e-mail account today at www.mail.com!
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From bix at sendu.me.uk  Thu Mar  6 10:41:49 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 06 Mar 2008 15:41:49 +0000
Subject: [Bioperl-l] BioPerl Module to Parse Transfac Flat File Database
In-Reply-To: <264855a00803060540u1f3d0f92pbab13349595a0eb3@mail.gmail.com>
References: <3521d3670803060016r35ac720ar9f2190631ddaf629@mail.gmail.com>	
	<47CFEC89.1000705@sendu.me.uk>
	<264855a00803060540u1f3d0f92pbab13349595a0eb3@mail.gmail.com>
Message-ID: <47D010BD.4000801@sendu.me.uk>

Sean Davis wrote:
> On Thu, Mar 6, 2008 at 8:07 AM, Sendu Bala <bix at sendu.me.uk> wrote:
>> Edward Wijaya wrote:
>>  > Dear experts,
>>  >
>>  > Is there any? The TRANSFAC text file contains entries like this.
>>  > In particular, we wish to capture the PWM for each transcription
>>  > factor.
>>
>>  Yes; I've written a module to do this, I just haven't committed it yet
>>  because certain things aren't quite right in terms of the API. But to
>>  just grab the PWM it should work fine. If you want I can email you the
>>  modules.
> 
> I believe there are a set of non-bioperl modules called TFBS.  See
> here (although I'm not sure this is the most up-to-date site):
> 
> http://tfbs.genereg.net/

I believe it's out of date enough to not work on the latest Transfac 
data, though I haven't tried to confirm.

At any rate, the Transfac (Pro) database is pretty strange and 
complicated, and the TFBS modules certainly don't let you access 
everything in the way you might want or expect.


From cain.cshl at gmail.com  Thu Mar  6 11:43:35 2008
From: cain.cshl at gmail.com (Scott Cain)
Date: Thu, 06 Mar 2008 11:43:35 -0500
Subject: [Bioperl-l] anonymous cvs?
Message-ID: <1204821815.6689.7.camel@frissell>

Hi All,

So now that the transition to svn is complete (and I like it), should
anonymous cvs still be working?  I believe there was discussion about
keeping it going via mirroring, and I hope that is the case.  It will
make life a little easier for people who want to do automated installs
of GBrowse and would like to use the installer script to get bioperl via
anon cvs.  If anon cvs is no longer available, does anyone have
suggestions for the best route to take for getting command line svn on
Windows?

Thanks,
Scott

-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   cain.cshl at gmail.com
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory


From cain.cshl at gmail.com  Thu Mar  6 11:48:08 2008
From: cain.cshl at gmail.com (Scott Cain)
Date: Thu, 06 Mar 2008 11:48:08 -0500
Subject: [Bioperl-l] anonymous cvs?
In-Reply-To: <1204821815.6689.7.camel@frissell>
References: <1204821815.6689.7.camel@frissell>
Message-ID: <1204822088.6689.8.camel@frissell>

I should have mentioned that I tried it and it is not currently working:

$ cvs -d :pserver:cvs at code.open-bio.org:/home/repository/bioperl checkout bioperl-live 
can't create temporary directory /tmp/cvs-serv32067
No space left on device


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory


From Marc.Logghe at ablynx.com  Thu Mar  6 11:22:10 2008
From: Marc.Logghe at ablynx.com (Marc Logghe)
Date: Thu, 6 Mar 2008 17:22:10 +0100
Subject: [Bioperl-l] SeqIO
In-Reply-To: <C3F576A6.88C9%staffa@niehs.nih.gov>
Message-ID: <03C512635899144083CADB0EE22201890172A836@alpaca.lan.ablynx.com>

Hi Nick,
I don't think you should leave out the -format option. You have to leave
it in, but the format should be provided by the Bio::Tools::GuessSeqFormat
object.
Something like:

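#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;
use Bio::Tools::GuessSeqFormat;

$| = 1;    # unbuffered output
my $number_of_files = @ARGV;
if(!$number_of_files){print "no files entered\n";exit;}
foreach my $file (@ARGV){
  # Guess the format from the file contents, then pass it to SeqIO explicitly.
  my $guesser = Bio::Tools::GuessSeqFormat->new(-file => $file);
  my $format = $guesser->guess;
  # guess returns undef when no known format matches; skip such files.
  if(!defined $format){warn "could not guess format of $file\n";next;}
  my $seqio_object = Bio::SeqIO->new(-file => $guesser->file, -format => $format);
  my $seq_object = $seqio_object->next_seq;
  my $sequence = $seq_object->seq;
  print "$sequence\n";
}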

HTH,
Marc




From stefan.kirov at bms.com  Thu Mar  6 10:51:25 2008
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Thu, 06 Mar 2008 10:51:25 -0500
Subject: [Bioperl-l] BioPerl Module to Parse Transfac Flat File Database
In-Reply-To: <47D010BD.4000801@sendu.me.uk>
References: <3521d3670803060016r35ac720ar9f2190631ddaf629@mail.gmail.com>
	<47CFEC89.1000705@sendu.me.uk>
	<264855a00803060540u1f3d0f92pbab13349595a0eb3@mail.gmail.com>
	<47D010BD.4000801@sendu.me.uk>
Message-ID: <47D012FD.7090600@bms.com>

Sendu Bala wrote:
> Sean Davis wrote:
>> On Thu, Mar 6, 2008 at 8:07 AM, Sendu Bala <bix at sendu.me.uk> wrote:
>>> Edward Wijaya wrote:
>>> > Dear experts,
>>> >
>>> > Is there any? The TRANSFAC text file contains entries like this.
>>> > In particular, we wish to capture the PWM for each transcription
>>> > factor.
>>>
>>> Yes; I've written a module to do this, I just haven't committed it yet
>>> because certain things aren't quite right in terms of the API. But to
>>> just grab the PWM it should work fine. If you want I can email you the
>>> modules.
>>
>> I believe there are a set of non-bioperl modules called TFBS. See
>> here (although I'm not sure this is the most up-to-date site):
>>
>> http://tfbs.genereg.net/
>
> I believe it's out of date enough to not work on the latest Transfac
> data, though I haven't tried to confirm.
>
> At any rate, the Transfac (Pro) database is pretty strange and
> complicated, and the TFBS modules certainly don't let you access
> everything in the way you might want or expect.
>
Also be careful: there is a difference between a PFM (position frequency
matrix) and a PWM (position weight matrix). Most programs I have
encountered compute the PWM assuming a uniform background distribution
(0.25 per base) unless you specify your own. You may be comfortable with
that, but you should definitely be aware of it.
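
To make the distinction concrete, here is a small plain-Perl sketch that
turns raw counts (a PFM) into log2-odds weights (a PWM) under two different
backgrounds. The function name pfm_to_pwm, the pseudocount scheme, and the
example counts are all assumptions chosen for illustration; individual
programs may use different formulas.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: convert a position frequency matrix (raw counts)
# into a log2-odds position weight matrix.
#   $pfm        : hashref, base => arrayref of counts per position
#   $background : hashref, base => background probability
#   $pseudo     : total pseudocount, distributed according to the background
sub pfm_to_pwm {
    my ($pfm, $background, $pseudo) = @_;
    my @bases = sort keys %$pfm;
    my $width = scalar @{ $pfm->{ $bases[0] } };
    my %pwm;
    for my $pos (0 .. $width - 1) {
        my $total = 0;
        $total += $pfm->{$_}[$pos] for @bases;
        for my $base (@bases) {
            # Pseudocount-corrected frequency of this base at this position.
            my $freq = ($pfm->{$base}[$pos] + $pseudo * $background->{$base})
                     / ($total + $pseudo);
            # Log2 ratio of observed frequency to background probability.
            $pwm{$base}[$pos] = log($freq / $background->{$base}) / log(2);
        }
    }
    return \%pwm;
}

# Made-up two-column matrix: position 1 is all A, position 2 is uniform.
my %pfm     = (A => [4, 1], C => [0, 1], G => [0, 1], T => [0, 1]);
my %uniform = (A => 0.25, C => 0.25, G => 0.25, T => 0.25);
my %gc_rich = (A => 0.15, C => 0.35, G => 0.35, T => 0.15);

my $pwm_uniform = pfm_to_pwm(\%pfm, \%uniform, 1);
my $pwm_gc      = pfm_to_pwm(\%pfm, \%gc_rich, 1);
printf "A at position 1: uniform %.3f, GC-rich %.3f\n",
    $pwm_uniform->{A}[0], $pwm_gc->{A}[0];
```

The same counts give different weights once the background changes, which
is why it matters to know what background a given program assumes.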


From jay at jays.net  Thu Mar  6 12:03:51 2008
From: jay at jays.net (Jay Hannah)
Date: Thu, 06 Mar 2008 11:03:51 -0600
Subject: [Bioperl-l] anonymous cvs?
In-Reply-To: <1204821815.6689.7.camel@frissell>
References: <1204821815.6689.7.camel@frissell>
Message-ID: <47D023F7.4000803@jays.net>

Scott Cain wrote:
> It will make life a little easier for people who want to do automated installs
> of GBrowse and would like to use the installer script to get bioperl via
> anon cvs.

Those installer scripts can't use anon SVN instead?

> If anon cvs is no longer available, does anyone have
> suggestions for the best route to take for getting command line svn on
> Windows?
>   

At $work our Windows guys use GUIs for both CVS (repo dead this summer) 
and SVN. Are there command-line (MS-DOS?) CVS clients for Windows? And 
there isn't an SVN equivalent?

j
http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah


From whs at ebi.ac.uk  Thu Mar  6 12:08:51 2008
From: whs at ebi.ac.uk (William Spooner)
Date: Thu, 6 Mar 2008 17:08:51 +0000
Subject: [Bioperl-l] anonymous cvs?
In-Reply-To: <1204821815.6689.7.camel@frissell>
References: <1204821815.6689.7.camel@frissell>
Message-ID: <07E3119E-0354-4E93-9980-3CB2B26DF2BE@ebi.ac.uk>

This will be important for Ensembl as well. As far as I know all of  
their install docs refer to BioPerl's anonymous CVS.


---
William Spooner
Visiting Scientist
whs at ebi.ac.uk


From MEC at stowers-institute.org  Thu Mar  6 11:58:57 2008
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Thu, 6 Mar 2008 10:58:57 -0600
Subject: [Bioperl-l] BioPerl Module to Parse Transfac Flat File Database
In-Reply-To: <47D010BD.4000801@sendu.me.uk>
References: <3521d3670803060016r35ac720ar9f2190631ddaf629@mail.gmail.com>
	<47CFEC89.1000705@sendu.me.uk>
	<264855a00803060540u1f3d0f92pbab13349595a0eb3@mail.gmail.com>
	<47D010BD.4000801@sendu.me.uk>
Message-ID: <BD62CBAC4395B94096109020651BE2EC11CE393C@exchmb-02.stowers-institute.org>

we use TFBS all the time against data coming from a recent local install of TRANSFAC(r) Professional 11.1 (2007-03-31)

the most recent is 11.4 (2007-12-14)

TFBS::* has the nice advantage that you can interoperate Transfac PWMs with other (say, Jaspar) matrices and/or simple consensus sequence patterns; and it COULD be fairly easily extended to allow interoperation with other sources, say cisRED.  "One interface to rule them all" - bwa ha ha.

However, if you DO have locally installed Transfac (Pro) ($$), and want to use just it, then you should know that you can also call their `match` routines from the unix command line (though this is not documented to my knowledge).  I can supply my cheat sheet or otherwise advise if desired.  Also, if you go this way, I've written the requisite TFMatchOut2GFF to convert TRANSFAC match's output to GFF, if it suits your purpose, which I could release if asked.

I have also written a command-line wrapper for the TFBS perl modules that might give you a leg up if you decide to use TFBS::*.  I could release it too, if useful.

But I agree: if I recall correctly, TFBS::* was dropped from active development due to issues with data access policies.  I also think the modules no longer work with remotely hosted Transfac.  They did a few years ago; I believe I tested a while ago and found that they no longer do.

Malcolm Cook
Stowers Institute for Medical Research - Kansas City, Missouri




From cjfields at uiuc.edu  Thu Mar  6 12:10:35 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 6 Mar 2008 11:10:35 -0600
Subject: [Bioperl-l] anonymous cvs?
In-Reply-To: <1204821815.6689.7.camel@frissell>
References: <1204821815.6689.7.camel@frissell>
Message-ID: <84E60454-2B09-4F77-9BD6-4B9150304B2D@uiuc.edu>

BioPerl CVS is no longer being updated; you have to use Subversion to  
grab the latest (we have anon. svn set up for this).  We discussed  
syncing svn commits over to cvs but found it way too problematic and  
decided to make a clean break.

The best option I can think of as a replacement (so everyone isn't
dependent on installing svn to get GBrowse and bioperl-live) is to get
a cron job set up which drops a bioperl-live archive into
bioperl.org/DIST or bioperl.org/SRC.  We have already talked about doing this for
nightly builds from svn main trunk; we can probably set that up on our  
end.  Would that be feasible as a fallback in case svn isn't present?
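
On the installer side, the fallback could look roughly like the sketch
below: look for an svn client on the PATH and fall back to the archive if
none is found. The function names are made up for illustration, and the
actual download step is left out because no archive location exists yet.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Return the full path of $program if it is an executable file in one of
# @dirs, or undef otherwise.
sub find_in_path {
    my ($program, @dirs) = @_;
    for my $dir (@dirs) {
        my $candidate = File::Spec->catfile($dir, $program);
        return $candidate if -f $candidate && -x _;
    }
    return undef;
}

# Hypothetical decision step for an installer: prefer svn when a client is
# available, otherwise fall back to a nightly archive.
sub choose_fetch_method {
    my (@dirs) = @_;
    # 'svn.exe' covers Windows, where executables carry an extension.
    my $svn = find_in_path('svn', @dirs) || find_in_path('svn.exe', @dirs);
    return $svn ? 'svn' : 'tarball';
}

print "Would fetch bioperl-live via: ",
    choose_fetch_method(File::Spec->path), "\n";
```

File::Spec->path returns the directories from the PATH environment
variable in a platform-appropriate way, so the same check should work on
both Unix and Windows.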

The subversion project page has information on Windows versions:

http://subversion.tigris.org/project_packages.html

chris


Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cain.cshl at gmail.com  Thu Mar  6 12:22:29 2008
From: cain.cshl at gmail.com (Scott Cain)
Date: Thu, 06 Mar 2008 12:22:29 -0500
Subject: [Bioperl-l] anonymous cvs?
In-Reply-To: <84E60454-2B09-4F77-9BD6-4B9150304B2D@uiuc.edu>
References: <1204821815.6689.7.camel@frissell>
	<84E60454-2B09-4F77-9BD6-4B9150304B2D@uiuc.edu>
Message-ID: <1204824149.6689.14.camel@frissell>

Hi Chris,

I think a nightly generated tarball would be sufficient for my use.  We
used anon cvs to get the latest bioperl and then threw it away once it
was installed, so a tarball is just as good, if not better, since users
wouldn't need to install svn.  Not needing to install svn is a good thing
for all my users, since I think many distributions do not supply it by
default.

Thanks,
Scott


On Thu, 2008-03-06 at 11:10 -0600, Chris Fields wrote:
> BioPerl CVS is no longer being updated; you have to use Subversion to  
> grab the latest (we have anon. svn set up for this).  We discussed  
> syncing svn commits over to cvs but found it way too problematic and  
> decided to make a clean break.
> 
> The best option I can think of as a replacement (so everyone isn't  
> dependent on installing svn to get Gbrowse and bioperl-live) is to get  
> a cron job set up which drops a bioperl-live archive into bioperl.org/ 
> DIST or bioperl.org/SRC.  We have already talked about doing this for  
> nightly builds from svn main trunk; we can probably set that up on our  
> end.  Would that be feasible as a fallback in case svn isn't present?
> 
> The subversion project page has information on Windows versions:
> 
> http://subversion.tigris.org/project_packages.html
> 
> chris
> 
> On Mar 6, 2008, at 10:43 AM, Scott Cain wrote:
> 
> > Hi All,
> >
> > So now that the transition to svn is complete (and I like it), should
> > anonymous cvs still be working?  I believe there was discussion about
> > keeping it going via mirroring, and I hope that is the case.  It will
> > make life a little easier for people who want to do automated installs
> > of GBrowse and would like to use the installer script to get bioperl  
> > via
> > anon cvs.  If anon cvs is no longer available, does anyone have
> > suggestions for the best route to take for getting command line svn on
> > Windows?
> >
> > Thanks,
> > Scott
> >
> > -- 
> > ------------------------------------------------------------------------
> > Scott Cain, Ph. D.                                   cain.cshl at gmail.com
> > GMOD Coordinator (http://www.gmod.org/)                      
> > 216-392-3087
> > Cold Spring Harbor Laboratory
> >
> >
> 
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
> 
> 
> 
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   cain.cshl at gmail.com
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory


From cain.cshl at gmail.com  Thu Mar  6 12:28:13 2008
From: cain.cshl at gmail.com (Scott Cain)
Date: Thu, 06 Mar 2008 12:28:13 -0500
Subject: [Bioperl-l] anonymous cvs?
In-Reply-To: <47D023F7.4000803@jays.net>
References: <1204821815.6689.7.camel@frissell>  <47D023F7.4000803@jays.net>
Message-ID: <1204824493.6689.19.camel@frissell>

Hi Jay,

It could use anon svn, though svn is considerably less ubiquitous, so it
effectively adds another prerequisite.

For cvs, the GUI WinCVS provides command line cvs as well.  I was
wondering if there was an easy to install equivalent for svn, though it
may be moot for me if the powers that be will provide a nightly
tarball :-)

Scott

On Thu, 2008-03-06 at 11:03 -0600, Jay Hannah wrote:
> Scott Cain wrote:
> > It will make life a little easier for people who want to do automated installs
> > of GBrowse and would like to use the installer script to get bioperl via
> > anon cvs.
> 
> Those installer scripts can't use anon SVN instead?
> 
> > If anon cvs is no longer available, does anyone have
> > suggestions for the best route to take for getting command line svn on
> > Windows?
> >   
> 
> At $work our Windows guys use GUIs for both CVS (repo dead this summer) 
> and SVN. Are there command-line (MS-DOS?) CVS clients for Windows? And 
> there isn't an SVN equivalent?
> 
> j
> http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah
> 
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory


From cjfields at uiuc.edu  Thu Mar  6 12:28:36 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 6 Mar 2008 11:28:36 -0600
Subject: [Bioperl-l] anonymous cvs?
In-Reply-To: <1204824149.6689.14.camel@frissell>
References: <1204821815.6689.7.camel@frissell>
	<84E60454-2B09-4F77-9BD6-4B9150304B2D@uiuc.edu>
	<1204824149.6689.14.camel@frissell>
Message-ID: <ADC4C24A-76D7-4390-9794-F1D218CAD8B9@uiuc.edu>

I'm working on the nightly build script now and will post back when  
everything is set up.

chris

On Mar 6, 2008, at 11:22 AM, Scott Cain wrote:

> Hi Chris,
>
> I think a nightly generated tarball would be sufficient for my use.   
> We
> used anon cvs to get the lastest bioperl and then threw it away once  
> it
> was installed, so a tarball is just as good,if not better, since users
> wouldn't need to install svn.  Not needing to install svn is good  
> thing
> for all my users, since I think many distributions do not supply it by
> default.
>
> Thanks,
> Scott
>
>
>
> On Thu, 2008-03-06 at 11:10 -0600, Chris Fields wrote:
>> BioPerl CVS is no longer being updated; you have to use Subversion to
>> grab the latest (we have anon. svn set up for this).  We discussed
>> syncing svn commits over to cvs but found it way too problematic and
>> decided to make a clean break.
>>
>> The best option I can think of as a replacement (so everyone isn't
>> dependent on installing svn to get Gbrowse and bioperl-live) is to  
>> get
>> a cron job set up which drops a bioperl-live archive into  
>> bioperl.org/
>> DIST or bioperl.org/SRC.  We have already talked about doing this for
>> nightly builds from svn main trunk; we can probably set that up on  
>> our
>> end.  Would that be feasible as a fallback in case svn isn't present?
>>
>> The subversion project page has information on Windows versions:
>>
>> http://subversion.tigris.org/project_packages.html
>>
>> chris
>>
>> On Mar 6, 2008, at 10:43 AM, Scott Cain wrote:
>>
>>> Hi All,
>>>
>>> So now that the transition to svn is complete (and I like it),  
>>> should
>>> anonymous cvs still be working?  I believe there was discussion  
>>> about
>>> keeping it going via mirroring, and I hope that is the case.  It  
>>> will
>>> make life a little easier for people who want to do automated  
>>> installs
>>> of GBrowse and would like to use the installer script to get bioperl
>>> via
>>> anon cvs.  If anon cvs is no longer available, does anyone have
>>> suggestions for the best route to take for getting command line  
>>> svn on
>>> Windows?
>>>
>>> Thanks,
>>> Scott
>>>
>>> -- 
>>> ------------------------------------------------------------------------
>>> Scott Cain, Ph. D.                                   cain.cshl at gmail.com
>>> GMOD Coordinator (http://www.gmod.org/)
>>> 216-392-3087
>>> Cold Spring Harbor Laboratory
>>>
>>>
>>
>> Christopher Fields
>> Postdoctoral Researcher
>> Lab of Dr. Robert Switzer
>> Dept of Biochemistry
>> University of Illinois Urbana-Champaign
>>
>>
>>
> -- 
> ------------------------------------------------------------------------
> Scott Cain, Ph. D.                                   cain.cshl at gmail.com
> GMOD Coordinator (http://www.gmod.org/)                      
> 216-392-3087
> Cold Spring Harbor Laboratory
>
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Thu Mar  6 15:38:22 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 6 Mar 2008 14:38:22 -0600
Subject: [Bioperl-l] anonymous cvs?
In-Reply-To: <ADC4C24A-76D7-4390-9794-F1D218CAD8B9@uiuc.edu>
References: <1204821815.6689.7.camel@frissell>
	<84E60454-2B09-4F77-9BD6-4B9150304B2D@uiuc.edu>
	<1204824149.6689.14.camel@frissell>
	<ADC4C24A-76D7-4390-9794-F1D218CAD8B9@uiuc.edu>
Message-ID: <2F746C5B-902C-4510-AEA3-2C46D4F51E7A@uiuc.edu>

Okay, I have set up nightly builds for bioperl-live, db, network, and  
run here:

http://www.bioperl.org/DIST/nightly_builds/
ftp://ftp.open-bio.org/pub/bioperl/DIST/nightly_builds

At the moment this is running via a crontab off a script in my portal  
account, retrieving everything via anon. svn and bundling it up into  
zip and tarball archives.  I would like to set it up to grab  
everything off dev but I don't want to mess with my ssh setup, so if  
anyone has ideas there...

The script also adds a CHANGELOG file (last 10 commits) and removes  
the .svn directories prior to bundling.  The archive name has the  
subversion revision number and date included; md5 checksums are in the  
SIGNATURES file.  I'll check on it again tomorrow to make sure cron  
ran it.
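
For anyone wanting to reproduce something similar locally, the bundling steps described above could be sketched roughly as follows. This is not the actual cron script: the function names, the "-r<revision>-<date>" naming layout, and the one-line-per-archive SIGNATURES format are assumptions made for illustration, and only the tarball half is shown.

```shell
#!/bin/sh
# Illustrative helpers only; the real cron script is not shown in this thread.

# Compose an archive base name such as bioperl-live-r14567-2008-03-06.
# The "-r<revision>-<date>" layout is an assumption.
archive_name() {
    printf '%s-r%s-%s\n' "$1" "$2" "$3"
}

# Delete every .svn directory below the checkout in $1.
strip_svn_dirs() {
    find "$1" -type d -name .svn -prune -exec rm -rf {} +
}

# Bundle checkout $1 as $3/$2.tar.gz and record its md5 in $3/SIGNATURES.
bundle() {
    strip_svn_dirs "$1"
    tar -czf "$3/$2.tar.gz" -C "$(dirname "$1")" "$(basename "$1")"
    ( cd "$3" && md5sum "$2.tar.gz" >> SIGNATURES )
}
```

A call such as `bundle /tmp/co/bioperl-live "$(archive_name bioperl-live 14567 2008-03-06)" /tmp/out` would leave one tarball and one SIGNATURES line in /tmp/out (the paths and the revision number here are made up). The CHANGELOG step would presumably run something like `svn log --limit 10` in the checkout before the .svn directories are stripped.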

We can probably set up automated PPM builds as well; might be worth  
testing down the road (we need a way to set defaults for Build args  
prior to getting that running).

chris

On Mar 6, 2008, at 11:28 AM, Chris Fields wrote:

> I'm working on the nightly build script now and will post back when  
> everything is set up.
>
> chris
>
> On Mar 6, 2008, at 11:22 AM, Scott Cain wrote:
>
>> Hi Chris,
>>
>> I think a nightly generated tarball would be sufficient for my  
>> use.  We
>> used anon cvs to get the lastest bioperl and then threw it away  
>> once it
>> was installed, so a tarball is just as good,if not better, since  
>> users
>> wouldn't need to install svn.  Not needing to install svn is good  
>> thing
>> for all my users, since I think many distributions do not supply it  
>> by
>> default.
>>
>> Thanks,
>> Scott
>>
>>
>>
>> On Thu, 2008-03-06 at 11:10 -0600, Chris Fields wrote:
>>> BioPerl CVS is no longer being updated; you have to use Subversion  
>>> to
>>> grab the latest (we have anon. svn set up for this).  We discussed
>>> syncing svn commits over to cvs but found it way too problematic and
>>> decided to make a clean break.
>>>
>>> The best option I can think of as a replacement (so everyone isn't
>>> dependent on installing svn to get Gbrowse and bioperl-live) is to  
>>> get
>>> a cron job set up which drops a bioperl-live archive into  
>>> bioperl.org/
>>> DIST or bioperl.org/SRC.  We have already talked about doing this  
>>> for
>>> nightly builds from svn main trunk; we can probably set that up on  
>>> our
>>> end.  Would that be feasible as a fallback in case svn isn't  
>>> present?
>>>
>>> The subversion project page has information on Windows versions:
>>>
>>> http://subversion.tigris.org/project_packages.html
>>>
>>> chris
>>>
>>> On Mar 6, 2008, at 10:43 AM, Scott Cain wrote:
>>>
>>>> Hi All,
>>>>
>>>> So now that the transition to svn is complete (and I like it),  
>>>> should
>>>> anonymous cvs still be working?  I believe there was discussion  
>>>> about
>>>> keeping it going via mirroring, and I hope that is the case.  It  
>>>> will
>>>> make life a little easier for people who want to do automated  
>>>> installs
>>>> of GBrowse and would like to use the installer script to get  
>>>> bioperl
>>>> via
>>>> anon cvs.  If anon cvs is no longer available, does anyone have
>>>> suggestions for the best route to take for getting command line  
>>>> svn on
>>>> Windows?
>>>>
>>>> Thanks,
>>>> Scott
>>>>
>>>> -- 
>>>> ------------------------------------------------------------------------
>>>> Scott Cain, Ph. D.                                   cain.cshl at gmail.com
>>>> GMOD Coordinator (http://www.gmod.org/)
>>>> 216-392-3087
>>>> Cold Spring Harbor Laboratory
>>>>
>>>>
>>>
>>> Christopher Fields
>>> Postdoctoral Researcher
>>> Lab of Dr. Robert Switzer
>>> Dept of Biochemistry
>>> University of Illinois Urbana-Champaign
>>>
>>>
>>>
>> -- 
>> ------------------------------------------------------------------------
>> Scott Cain, Ph. D.                                   cain.cshl at gmail.com
>> GMOD Coordinator (http://www.gmod.org/)                      
>> 216-392-3087
>> Cold Spring Harbor Laboratory
>>
>>
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Thu Mar  6 16:48:37 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 6 Mar 2008 15:48:37 -0600
Subject: [Bioperl-l] Nightly build archives now available
Message-ID: <BAB6978A-D50E-4C4A-9FC9-EAAD557B935F@uiuc.edu>

We now have nightly bundled archives for bioperl-live, bioperl-db,  
bioperl-run, and bioperl-network running; these will be updated ~ 1:00  
am every night.

http://www.bioperl.org/DIST/nightly_builds/
ftp://ftp.open-bio.org/pub/bioperl/DIST/nightly_builds

The archives are date-stamped and also have the Subversion revision,  
just in case one wanted to ensure they get the correct version for the  
bug fix.  They also contain a CHANGELOG file for the last 10 revisions  
(if there are any).  These are currently derived off the anon. svn  
repository.

chris


From David.Messina at sbc.su.se  Thu Mar  6 18:50:04 2008
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 7 Mar 2008 00:50:04 +0100
Subject: [Bioperl-l] Nightly build archives now available
In-Reply-To: <BAB6978A-D50E-4C4A-9FC9-EAAD557B935F@uiuc.edu>
References: <BAB6978A-D50E-4C4A-9FC9-EAAD557B935F@uiuc.edu>
Message-ID: <628aabb70803061550s24d7d8cfhf80495ea970a6c19@mail.gmail.com>

Very slick and well-thought-out, Chris -- nice job!


Dave


From hlapp at gmx.net  Thu Mar  6 19:06:41 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 6 Mar 2008 19:06:41 -0500
Subject: [Bioperl-l] Nightly build archives now available
In-Reply-To: <BAB6978A-D50E-4C4A-9FC9-EAAD557B935F@uiuc.edu>
References: <BAB6978A-D50E-4C4A-9FC9-EAAD557B935F@uiuc.edu>
Message-ID: <E39ADEB2-1FE7-4024-8188-EFF98F17BF52@gmx.net>

Awesome - thanks for doing this, Chris! -hilmar

On Mar 6, 2008, at 4:48 PM, Chris Fields wrote:

> We now have nightly bundled archives for bioperl-live, bioperl-db,  
> bioperl-run, and bioperl-network running; these will be updated ~  
> 1:00 am every night.
>
> http://www.bioperl.org/DIST/nightly_builds/
> ftp://ftp.open-bio.org/pub/bioperl/DIST/nightly_builds
>
> The archives are date-stamped and also have the Subversion  
> revision, just in case one wanted to ensure they get the correct  
> version for the bug fix.  They also contain a CHANGELOG file for  
> the last 10 revisions (if there are any).  These are currently  
> derived off the anon. svn repository.
>
> chris

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From staffa at niehs.nih.gov  Thu Mar  6 18:27:31 2008
From: staffa at niehs.nih.gov (Staffa, Nick (NIH/NIEHS))
Date: Thu, 06 Mar 2008 18:27:31 -0500
Subject: [Bioperl-l] SeqIO
In-Reply-To: <03C512635899144083CADB0EE22201890172A836@alpaca.lan.ablynx.com>
Message-ID: <C3F5E813.8903%staffa@niehs.nih.gov>

Thanks.
I really appreciate all the interest and help.
That sure sounds like a great idea, but I think
Bio::Tools::GuessSeqFormat needs more RIGOR before it declares itself.
Is there a substitute?
It works great with
>> !!NA_SEQUENCE 1.0
>>    NewDNA  Length: 810  March 5, 2008 18:26  Type: N  Check: 3368  ..
>> 
>>        1  TGTTCGAATT CCGTGCGGTC CACCTCCCCT AGGAGCTCAG TGGGCTGGTT
>> et c.

as seen in:
gir.niehs.nih.gov> CGwindows.pl TestDNA.seq.org | more
guesser guesses gcg
TGTTCGAATTCCGTGCGGTCCACCTCCCCTAGGAGCTCAGTGGGCTGGTTGGATTCCGTGCCATCCCGGCAGGGCA
GAGCCTCGGGA  et c.
(yes, I added
my $file_type = $guesser->guess;
print "guesser guesses $file_type\n";
)

BUT
when applied to a genbank sequence passed thru the Seqlab editor and turned
into GCG, to wit:
!!NA_SEQUENCE 1.0
LOCUS       HSPGK2G      1911 bp    DNA             PRI       12-SEP-1993
DEFINITION  Human testis-specific PGK-2 gene for phosphoglycerate kinase
            (ATP:3-phospho-D-glycerate 1-phosphotransferase, EC 2.7.2.3).
ACCESSION   X05246 Y00261
...
...
BASE COUNT      583 a    367 c    442 g    519 t
ORIGIN

 HSPGK2G  Length: 1911  August 24, 1998 10:56  Type: N  Check: 4156  ..

       1  GCCCCTCAAC AGCAAGTTGG TTCTTCAGCA TTAAGATCCA GGTGTCAGCC
et c.

It thinks it is a flawed PIR:

gir.niehs.nih.gov> CGwindows.pl hspgk2g.seq | more
guesser guesses pir

------------- EXCEPTION  -------------
MSG: PIR stream read attempted without leading '>P1;' [ !!NA_SEQUENCE 1.0
LOCUS       HSPGK2G      1911 bp    DNA             PRI       12-SEP-1993


Must look at why guesser is thinking PIR.
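
Until the guesser is sorted out, one possible stopgap is to pre-check the file before handing it to the script. The sketch below is only a heuristic based on the two sample files shown above: it treats a file as GCG if it contains a divider line with "Length:", "Check:" and a trailing "..". The function names are made up, and whether the gcg parser copes with the GenBank annotation in front of the divider line has not been established here.

```shell
#!/bin/sh
# Heuristic only: treats a file as GCG if it contains a divider line with
# "Length: <n>", "Check: <n>" and a trailing "..", as in the samples above.
looks_like_gcg() {
    grep -Eq 'Length: *[0-9]+ .*Check: *[0-9]+ +\.\.[[:space:]]*$' "$1"
}

# Print the format to hand to the parser: "gcg" when the divider line is
# present, otherwise "unknown" so the caller can fall back to guessing.
pick_format() {
    if looks_like_gcg "$1"; then
        echo gcg
    else
        echo unknown
    fi
}
```

The result of `pick_format hspgk2g.seq` could then decide whether the Perl script is told to use -format => 'gcg' explicitly or left to guess.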


On 3/6/08 11:22 AM, "Marc Logghe" <Marc.Logghe at ablynx.com> wrote:

> Hi Nick,
> I don't think you should leave out the -format option. You have to leave
> it in but the format should be provided by the B::T::GuessSeqFormat
> object.
> Something like:
> 
> #!/usr/bin/perl
> use strict;
> use Bio::SeqIO;
> use Bio::Tools::GuessSeqFormat;
> 
> $| = 1;
> my $number_of_files = @ARGV;
> if(!$number_of_files){print "no files entered\n";exit;}
> foreach my $file (@ARGV){
>   my $guesser = Bio::Tools::GuessSeqFormat->new(-file => $file);
>   my $seqio_object = Bio::SeqIO->new(-file => $guesser->file, -format =>
> $guesser->guess);
>   my $seq_object = $seqio_object->next_seq;
>   my $sequence = $seq_object->seq;
>   print "$sequence\n";
> }
> 
> HTH,
> Marc
> 
> 
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>> bounces at lists.open-bio.org] On Behalf Of Staffa, Nick (NIH/NIEHS)
>> Sent: donderdag 6 maart 2008 16:24
>> To: Heikki Lehvaslaiho; bioperl-l at lists.open-bio.org
>> Cc: Chris Fields
>> Subject: Re: [Bioperl-l] SeqIO
>> 
>> Here's the scoop:
>> When I use Jason's suggestion, (-format => 'gcg'),
>> My program works without complaint on the original file that looks like:
>> !!NA_SEQUENCE 1.0
>>    NewDNA  Length: 810  March 5, 2008 18:26  Type: N  Check: 3368  ..
>> 
>>        1  TGTTCGAATT CCGTGCGGTC CACCTCCCCT AGGAGCTCAG TGGGCTGGTT
>> et c.
>> 
>> BUT if I remove the first line to test Bio::Tools::GuessSeqFormat,
>> (which should be retro-gcg format (before version 11?)),
>> my program runs, but there IS a complaint:
>> Use of uninitialized value in scalar chomp at
>> /usr/lib/perl5/site_perl/5.8.5/Bio/SeqIO/gcg.pm line 118, <GEN0> line 1.
>> BUT
>> If I remove (-format => 'gcg'),  I get no complaint, but the sequence
>> returned still has its numbers embedded. This affects my calculations.
>> 
>> Thanks, at least i know what my options are.
>> 
>> 
>> 
>> Nick Staffa
>> Telephone: 919-316-4569  (NIEHS: 6-4569)
>> Scientific Computing Support Group
>> NIEHS Information Technology Support Services Contract
>> (Science Task Monitor: Roy W. Reter (reter at niehs.nih.gov)
>> National Institute of Environmental Health Sciences
>> National Institutes of Health
>> Research Triangle Park, North Carolina
>


From cjfields at uiuc.edu  Thu Mar  6 23:32:39 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 6 Mar 2008 22:32:39 -0600
Subject: [Bioperl-l] Nightly build archives now available
In-Reply-To: <628aabb70803061550s24d7d8cfhf80495ea970a6c19@mail.gmail.com>
References: <BAB6978A-D50E-4C4A-9FC9-EAAD557B935F@uiuc.edu>
	<628aabb70803061550s24d7d8cfhf80495ea970a6c19@mail.gmail.com>
Message-ID: <5A67E3A9-9997-4A6B-AB07-8403D5FF388E@uiuc.edu>

I would like to get automated PPM builds set up as well but I think we  
have to rework some Build.PL stuff to get that going.

The next thing is to set up a regular script to check test/POD coverage.

chris

On Mar 6, 2008, at 5:50 PM, Dave Messina wrote:

> Very slick and well-thought-out, Chris -- nice job!
>
>
> Dave


From Marc.Logghe at ablynx.com  Fri Mar  7 04:04:35 2008
From: Marc.Logghe at ablynx.com (Marc Logghe)
Date: Fri, 7 Mar 2008 10:04:35 +0100
Subject: [Bioperl-l] SeqIO
In-Reply-To: <C3F5E813.8903%staffa@niehs.nih.gov>
Message-ID: <03C512635899144083CADB0EE22201890172A938@alpaca.lan.ablynx.com>

Ahh, my reply did not make much sense when I took a new look. I was the
one who learnt something here :-)
Did not know that Bio::SeqIO was already using B::T::GuessSeqFormat
under the hood. Learnt as well that you have to be careful with the
filename extension because this seems to have precedence.
Regards,
Marc


> -----Original Message-----
> From: Staffa, Nick (NIH/NIEHS) [mailto:staffa at niehs.nih.gov]
> Sent: vrijdag 7 maart 2008 0:28
> To: Marc Logghe; Heikki Lehvaslaiho; bioperl-l at lists.open-bio.org
> Cc: Chris Fields
> Subject: Re: [Bioperl-l] SeqIO
> 
> Thanks
> I really appreciate all the interest given and help generated.
> that sure sounds like a great idea, but i think
> Bio::Tools::GuessSeqFormat needs more RIGOR before it declares itself.
> Is there a substitute?
> It works great with
> >> !!NA_SEQUENCE 1.0
> >>    NewDNA  Length: 810  March 5, 2008 18:26  Type: N  Check: 3368  ..
> >>
> >>        1  TGTTCGAATT CCGTGCGGTC CACCTCCCCT AGGAGCTCAG TGGGCTGGTT
> >> et c.
> 
> as seen in:
> gir.niehs.nih.gov> CGwindows.pl TestDNA.seq.org | more
> guesser guesses gcg
> TGTTCGAATTCCGTGCGGTCCACCTCCCCTAGGAGCTCAGTGGGCTGGTTGGATTCCGTGCCATCCCGGCAGGGCA
> GAGCCTCGGGA  et c.
> (yes, I added
> my $file_type = $guesser->guess;
> print "guesser guesses $file_type\n";
> )
> 
> BUT
> when applied to a genbank sequence passed thru the Seqlab editor and
> turned
> into GCG, to wit:
> !!NA_SEQUENCE 1.0
> LOCUS       HSPGK2G      1911 bp    DNA             PRI       12-SEP-1993
> DEFINITION  Human testis-specific PGK-2 gene for phosphoglycerate kinase
>             (ATP:3-phospho-D-glycerate 1-phosphotransferase, EC 2.7.2.3).
> ACCESSION   X05246 Y00261
> ...
> ...
> BASE COUNT      583 a    367 c    442 g    519 t
> ORIGIN
> 
>  HSPGK2G  Length: 1911  August 24, 1998 10:56  Type: N  Check: 4156  ..
> 
>        1  GCCCCTCAAC AGCAAGTTGG TTCTTCAGCA TTAAGATCCA GGTGTCAGCC
> et c.
> 
> It thinks it is a flawed PIR:
> 
> gir.niehs.nih.gov> CGwindows.pl hspgk2g.seq | more
> guesser guesses pir
> 
> ------------- EXCEPTION  -------------
> MSG: PIR stream read attempted without leading '>P1;' [ !!NA_SEQUENCE 1.0
> LOCUS       HSPGK2G      1911 bp    DNA             PRI       12-SEP-1993
> 
> 
> Must look at why guesser is thinking PIR.
> 
> 
> 
> 
> On 3/6/08 11:22 AM, "Marc Logghe" <Marc.Logghe at ablynx.com> wrote:
> 
> > Hi Nick,
> > I don't think you should leave out the -format option. You have to leave
> > it in but the format should be provided by the B::T::GuessSeqFormat
> > object.
> > Something like:
> >
> > #!/usr/bin/perl
> > use strict;
> > use Bio::SeqIO;
> > use Bio::Tools::GuessSeqFormat;
> >
> > $| = 1;
> > my $number_of_files = @ARGV;
> > if(!$number_of_files){print "no files entered\n";exit;}
> > foreach my $file (@ARGV){
> >   my $guesser = Bio::Tools::GuessSeqFormat->new(-file => $file);
> >   my $seqio_object = Bio::SeqIO->new(-file => $guesser->file, -format =>
> > $guesser->guess);
> >   my $seq_object = $seqio_object->next_seq;
> >   my $sequence = $seq_object->seq;
> >   print "$sequence\n";
> > }
> >
> > HTH,
> > Marc
> >
> >
> >> -----Original Message-----
> >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> >> bounces at lists.open-bio.org] On Behalf Of Staffa, Nick (NIH/NIEHS)
> >> Sent: donderdag 6 maart 2008 16:24
> >> To: Heikki Lehvaslaiho; bioperl-l at lists.open-bio.org
> >> Cc: Chris Fields
> >> Subject: Re: [Bioperl-l] SeqIO
> >>
> >> Here's the scoop:
> >> When I use Jason's suggestion, (-format => 'gcg'),
> >> My program works without complaint on the original file that looks like:
> >> !!NA_SEQUENCE 1.0
> >>    NewDNA  Length: 810  March 5, 2008 18:26  Type: N  Check: 3368  ..
> >>
> >>        1  TGTTCGAATT CCGTGCGGTC CACCTCCCCT AGGAGCTCAG TGGGCTGGTT
> >> et c.
> >>
> >> BUT if I remove the first line to test Bio::Tools::GuessSeqFormat,
> >> (which should be retro-gcg format (before version 11?)),
> >> my program runs, but there IS a complaint:
> >> Use of uninitialized value in scalar chomp at
> >> /usr/lib/perl5/site_perl/5.8.5/Bio/SeqIO/gcg.pm line 118, <GEN0> line 1.
> >> BUT
> >> If I remove (-format => 'gcg'),  I get no complaint, but the sequence
> >> returned still has its numbers embedded. This affects my calculations.
> >>
> >> Thanks, at least i know what my options are.
> >>
> >>
> >>
> >> Nick Staffa
> >> Telephone: 919-316-4569  (NIEHS: 6-4569)
> >> Scientific Computing Support Group
> >> NIEHS Information Technology Support Services Contract
> >> (Science Task Monitor: Roy W. Reter (reter at niehs.nih.gov)
> >> National Institute of Environmental Health Sciences
> >> National Institutes of Health
> >> Research Triangle Park, North Carolina
> >


From bix at sendu.me.uk  Fri Mar  7 05:32:01 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 07 Mar 2008 10:32:01 +0000
Subject: [Bioperl-l] Nightly build archives now available
In-Reply-To: <5A67E3A9-9997-4A6B-AB07-8403D5FF388E@uiuc.edu>
References: <BAB6978A-D50E-4C4A-9FC9-EAAD557B935F@uiuc.edu>	<628aabb70803061550s24d7d8cfhf80495ea970a6c19@mail.gmail.com>
	<5A67E3A9-9997-4A6B-AB07-8403D5FF388E@uiuc.edu>
Message-ID: <47D119A1.10408@sendu.me.uk>

Chris Fields wrote:
> I would like to get automated PPM builds set up as well but I think we 
> have to rework some Build.PL stuff to get that going.

What's the hold-up on that front?


From heikki at sanbi.ac.za  Fri Mar  7 06:09:25 2008
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Fri, 7 Mar 2008 13:09:25 +0200
Subject: [Bioperl-l] BioSQL V1.0.0 released
Message-ID: <200803071309.25294.heikki@sanbi.ac.za>

BIOSQL V1.0.0 RELEASED
http://news.open-bio.org/archives/2008_03.html#000094


Congratulations, Hilmar!

		-Heikki

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Senior Scientist    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From cjfields at uiuc.edu  Fri Mar  7 08:53:50 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 7 Mar 2008 07:53:50 -0600
Subject: [Bioperl-l] Nightly build archives now available
In-Reply-To: <47D119A1.10408@sendu.me.uk>
References: <BAB6978A-D50E-4C4A-9FC9-EAAD557B935F@uiuc.edu>	<628aabb70803061550s24d7d8cfhf80495ea970a6c19@mail.gmail.com>
	<5A67E3A9-9997-4A6B-AB07-8403D5FF388E@uiuc.edu>
	<47D119A1.10408@sendu.me.uk>
Message-ID: <C71F44BE-CB80-4DED-9F18-8CE80836CFD6@uiuc.edu>

I haven't tried it out yet, to tell the truth.  The worry I have is  
prompting during the build process for database tests, networking, etc.

I have looked for it, but couldn't determine whether we have a way to  
run 'perl Build.PL' and bypass prompts with passed arguments.  The  
only one I could find was 'network', for network tests.

Scott Cain and I have corresponded about this before, i.e. it would be  
nice to have boolean flags for each prompt (prereqs, database tests,  
scripts, network, etc).  For nightly PPMs I would forego tests and  
include scripts.

chris

On Mar 7, 2008, at 4:32 AM, Sendu Bala wrote:

> Chris Fields wrote:
>> I would like to get automated PPM builds set up as well but I think  
>> we have to rework some Build.PL stuff to get that going.
>
> What's the hold-up on that front?


From cjfields at uiuc.edu  Fri Mar  7 08:22:27 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 7 Mar 2008 07:22:27 -0600
Subject: [Bioperl-l] BioSQL V1.0.0 released
In-Reply-To: <200803071309.25294.heikki@sanbi.ac.za>
References: <200803071309.25294.heikki@sanbi.ac.za>
Message-ID: <7558F8C6-FE40-4BAE-BA6A-D5039B10F350@uiuc.edu>

Same here.  Great news!

chris

On Mar 7, 2008, at 5:09 AM, Heikki Lehvaslaiho wrote:

> BIOSQL V1.0.0 RELEASED
> http://news.open-bio.org/archives/2008_03.html#000094
>
>
> Congratulations, Hilmar!
>
> 		-Heikki
>
> -- 
> ______ _/      _/_____________________________________________________
>      _/      _/
>     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
>    _/_/_/_/_/  Senior Scientist    skype: heikki_lehvaslaiho
>   _/  _/  _/  SANBI, South African National Bioinformatics Institute
>  _/  _/  _/  University of Western Cape, South Africa
>     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
> ___ _/_/_/_/_/________________________________________________________

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From bix at sendu.me.uk  Fri Mar  7 09:10:08 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 07 Mar 2008 14:10:08 +0000
Subject: [Bioperl-l] Nightly build archives now available
In-Reply-To: <C71F44BE-CB80-4DED-9F18-8CE80836CFD6@uiuc.edu>
References: <BAB6978A-D50E-4C4A-9FC9-EAAD557B935F@uiuc.edu>	<628aabb70803061550s24d7d8cfhf80495ea970a6c19@mail.gmail.com>
	<5A67E3A9-9997-4A6B-AB07-8403D5FF388E@uiuc.edu>
	<47D119A1.10408@sendu.me.uk>
	<C71F44BE-CB80-4DED-9F18-8CE80836CFD6@uiuc.edu>
Message-ID: <47D14CC0.8000104@sendu.me.uk>

Chris Fields wrote:
> I haven't tried it out yet, to tell the truth.  The worry I have is 
> prompting during the build process for database tests, networking, etc.
> 
> I have looked for it, but couldn't determine whether we have a way to 
> run 'perl Build.PL' and bypass prompts with passed arguments.  The only 
> one I could find was 'network', for network tests.
> 
> Scott Cain and I have corresponded about this before, i.e. it would be 
> nice to have boolean flags for each prompt (prereqs, database tests, 
> scripts, network, etc).  For nightly PPMs I would forego tests and 
> include scripts.

I don't quite understand how you're making the nightlies right now, but 
you should be using the dist actions:

http://www.bioperl.org/wiki/Making_a_BioPerl_release

I.e.:

One time (and one time only):
perl Build.PL (it doesn't matter how you answer the questions)

Then every night:
./Build dist
./Build ppmdist

You then upload the resulting .tar.gz and .zip files.


Only if Build.PL or ModuleBuildBioperl are updated might you need to:
./Build realclean
perl Build.PL
again. But this should be a rare event and even more rarely would it be 
/required/ (probably never).


From bix at sendu.me.uk  Fri Mar  7 09:19:36 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 07 Mar 2008 14:19:36 +0000
Subject: [Bioperl-l] Nightly build archives now available
In-Reply-To: <47D14CC0.8000104@sendu.me.uk>
References: <BAB6978A-D50E-4C4A-9FC9-EAAD557B935F@uiuc.edu>	<628aabb70803061550s24d7d8cfhf80495ea970a6c19@mail.gmail.com>
	<5A67E3A9-9997-4A6B-AB07-8403D5FF388E@uiuc.edu>
	<47D119A1.10408@sendu.me.uk>
	<C71F44BE-CB80-4DED-9F18-8CE80836CFD6@uiuc.edu>
	<47D14CC0.8000104@sendu.me.uk>
Message-ID: <47D14EF8.5090107@sendu.me.uk>

Sendu Bala wrote:
> Chris Fields wrote:
>> I haven't tried it out yet, to tell the truth.  The worry I have is 
>> prompting during the build process for database tests, networking, etc.
>>
>> I have looked for it, but couldn't determine whether we have a way to 
>> run 'perl Build.PL' and bypass prompts with passed arguments.  The 
>> only one I could find was 'network', for network tests.
>>
>> Scott Cain and I have corresponded about this before, i.e. it would be 
>> nice to have boolean flags for each prompt (prereqs, database tests, 
>> scripts, network, etc).  For nightly PPMs I would forego tests and 
>> include scripts.
> 
> I don't quite understand how you're making the nightlys right now, but 
> you should be using the dist actions:
> 
> http://www.bioperl.org/wiki/Making_a_BioPerl_release
> 
> Ie.
> 
> One time (and one time only):
> perl Build.PL (it doesn't matter how you answer the questions)
> 
> Then every night:
> ./Build dist
> ./Build ppmdist
> 
> You then upload the resulting .tar.gz and .zip files.

Ah, having uploaded the various archives you'll have to manually delete 
them before running the dist action the next night, otherwise dist will 
ask you if you want to overwrite them. Apart from that, dist asks no questions.
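
The nightly cycle described above could be scripted roughly as follows.
This is a sketch under assumptions, not an existing BioPerl script: the
function name nightly_dist, the SVN override variable and the example
path are hypothetical, and it presumes 'perl Build.PL' has already been
run once by hand in the checkout. Only 'svn update', './Build dist' and
'./Build ppmdist' come from this thread.

```shell
#!/bin/sh
# Sketch of a nightly archive job. Assumes 'perl Build.PL' was already run
# once, interactively, inside the checkout so that ./Build exists.
# nightly_dist and the SVN variable are hypothetical names for illustration.

nightly_dist() {
    checkout=$1
    (
        cd "$checkout" || exit 1
        # Bring the existing checkout up to date instead of a fresh 'svn co'.
        "${SVN:-svn}" update || exit 1
        # Delete the previous archives so 'dist' has nothing to ask about.
        rm -f ./*.tar.gz ./*.zip
        ./Build dist && ./Build ppmdist
    )
}

# Example cron usage (the path is an assumption):
#   nightly_dist /path/to/bioperl-live
```

The subshell keeps the 'cd' from leaking into the calling script, and
each step stops the job if the previous one failed.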


From cjfields at uiuc.edu  Fri Mar  7 09:28:36 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 7 Mar 2008 08:28:36 -0600
Subject: [Bioperl-l] Nightly build archives now available
In-Reply-To: <47D14CC0.8000104@sendu.me.uk>
References: <BAB6978A-D50E-4C4A-9FC9-EAAD557B935F@uiuc.edu>	<628aabb70803061550s24d7d8cfhf80495ea970a6c19@mail.gmail.com>
	<5A67E3A9-9997-4A6B-AB07-8403D5FF388E@uiuc.edu>
	<47D119A1.10408@sendu.me.uk>
	<C71F44BE-CB80-4DED-9F18-8CE80836CFD6@uiuc.edu>
	<47D14CC0.8000104@sendu.me.uk>
Message-ID: <FFE7B6A1-FF8C-4C84-994B-6D9AAD46748E@uiuc.edu>


On Mar 7, 2008, at 8:10 AM, Sendu Bala wrote:

> Chris Fields wrote:
>> I haven't tried it out yet, to tell the truth.  The worry I have is  
>> prompting during the build process for database tests, networking,  
>> etc.
>> I have looked for it, but couldn't determine whether we have a way  
>> to run 'perl Build.PL' and bypass prompts with passed arguments.   
>> The only one I could find was 'network', for network tests.
>> Scott Cain and I have corresponded about this before, i.e. it would  
>> be nice to have boolean flags for each prompt (prereqs, database  
>> tests, scripts, network, etc).  For nightly PPMs I would forego  
>> tests and include scripts.
>
> I don't quite understand how you're making the nightlys right now,  
> but you should be using the dist actions:
>
> http://www.bioperl.org/wiki/Making_a_BioPerl_release
>
> Ie.
>
> One time (and one time only):
> perl Build.PL (it doesn't matter how you answer the questions)
>
> Then every night:
> ./Build dist
> ./Build ppmdist
>
> You then upload the resulting .tar.gz and .zip files.
>
>
> Only if Build.PL or ModuleBuildBioperl are updated might you need to:
> ./Build realclean
> perl Build.PL
> again. But this should be a rare event and even more rarely would it  
> be /required/ (probably never).

I'm not making a distribution; the archives are merely cleaned up svn  
checkouts (no .svn directories).  This is essentially what the  
net_install script would get when installing GBrowse using the 'dev'  
option, except you don't need to install Subversion to get updates.   
Also, at this point we don't have an analogous 'Download tarball'  
setting for browsable svn either, so this is a suitable alternative.

Again, I don't want to deal with prompts while running a cron job  
(this is a bash script), particularly if I can't guarantee the number  
of prompts or the prompting order won't change down the line.  If we  
can set up a way around that using passed args to Build.PL then it  
would make life much easier and we could automate 'Build dist', 'Build  
ppmdist', 'Build testcover', etc.

chris


From bix at sendu.me.uk  Fri Mar  7 09:54:41 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 07 Mar 2008 14:54:41 +0000
Subject: [Bioperl-l] Nightly build archives now available
In-Reply-To: <FFE7B6A1-FF8C-4C84-994B-6D9AAD46748E@uiuc.edu>
References: <BAB6978A-D50E-4C4A-9FC9-EAAD557B935F@uiuc.edu>	<628aabb70803061550s24d7d8cfhf80495ea970a6c19@mail.gmail.com>
	<5A67E3A9-9997-4A6B-AB07-8403D5FF388E@uiuc.edu>
	<47D119A1.10408@sendu.me.uk>
	<C71F44BE-CB80-4DED-9F18-8CE80836CFD6@uiuc.edu>
	<47D14CC0.8000104@sendu.me.uk>
	<FFE7B6A1-FF8C-4C84-994B-6D9AAD46748E@uiuc.edu>
Message-ID: <47D15731.2050000@sendu.me.uk>

Chris Fields wrote:
> On Mar 7, 2008, at 8:10 AM, Sendu Bala wrote:
>> One time (and one time only):
>> perl Build.PL (it doesn't matter how you answer the questions)
>>
>> Then every night:
>> ./Build dist
>> ./Build ppmdist
>>
>> You then upload the resulting .tar.gz and .zip files.
>>
>>
>> Only if Build.PL or ModuleBuildBioperl are updated might you need to:
>> ./Build realclean
>> perl Build.PL
>> again. But this should be a rare event and even more rarely would it 
>> be /required/ (probably never).
> 
> I'm not making a distribution; the archives are merely cleaned up svn 
> checkouts (no .svn directories).  This is essentially what the 
> net_install script would get when installing GBrowse using the 'dev' 
> option, except you don't need to install Subversion to get updates.  
> Also, at this point we don't have an analogous 'Download tarball' 
> setting for browsable svn either, so this is a suitable alternative.

The dist action does what you want. I did a diff on the most recent 
nightly build and the .tar.gz produced by the dist action of a checkout 
of revision 14603:

$ diff -r bioperl-1.5.2_100 bioperl-live
diff -r bioperl-1.5.2_100/Bio/SeqIO/chaos.pm bioperl-live/Bio/SeqIO/chaos.pm
2c2
< # $Date: 2007-06-14 15:16:21 +0100 (Thu, 14 Jun 2007) $
---
 > # $Date: 2007-06-14 10:16:21 -0400 (Thu, 14 Jun 2007) $
Only in bioperl-live/Bio/Tools: WebBlat.pm
Only in bioperl-live: CHANGELOG
Only in bioperl-1.5.2_100: MANIFEST
Only in bioperl-1.5.2_100: META.yml
diff -r bioperl-1.5.2_100/Makefile.PL bioperl-live/Makefile.PL
1,31c1,30
< # Note: this file was auto-generated by Module::Build::Compat version 0.03
[snip]
---
 > #!/usr/bin/perl -w
 >
 > # This is a stub that simply tells you to use Build.PL instead
[snip]
Only in bioperl-live: bioperl.lisp
diff -r bioperl-1.5.2_100/maintenance/cvs2cl_by_file.pl bioperl-live/maintenance/cvs2cl_by_file.pl
29c29
< ## $Date: 2006-11-30 15:57:16 +0000 (Thu, 30 Nov 2006) $
---
 > ## $Date: 2006-11-30 10:57:16 -0500 (Thu, 30 Nov 2006) $

I don't know what's going on with the date differences, but for a file 
found in a folder called '/DIST/nightly_builds/', you want the MANIFEST 
and META.yml files. You also want the Compat version of Makefile.PL since 
we haven't yet moved to forcing people to use Build.PL.

'./Build dist' does the right thing.


> Again, I don't want to deal with prompts while running a cron job (this 
> is a bash script), particularly if I can't guarantee the number of 
> prompts or the prompting order won't change down the line.  If we can 
> set up a way around that using passed args to Build.PL then it would 
> make life much easier and we could automate 'Build dist', 'Build 
> ppmdist', 'Build testcover', etc.

Again, you only need to run 'perl Build.PL' once and answer the 
questions only once. Then you can svn update and run the actions with no 
more questions to answer. This isn't a problem that needs to be solved. 
It is /supposed/ to be this way. It's ready to use! Please make use of 
it; it's one of the (many) reasons I moved Bioperl over to Build.PL in 
the first place.


From cjfields at uiuc.edu  Fri Mar  7 10:29:11 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 7 Mar 2008 09:29:11 -0600
Subject: [Bioperl-l] Nightly build archives now available
In-Reply-To: <47D15731.2050000@sendu.me.uk>
References: <BAB6978A-D50E-4C4A-9FC9-EAAD557B935F@uiuc.edu>	<628aabb70803061550s24d7d8cfhf80495ea970a6c19@mail.gmail.com>
	<5A67E3A9-9997-4A6B-AB07-8403D5FF388E@uiuc.edu>
	<47D119A1.10408@sendu.me.uk>
	<C71F44BE-CB80-4DED-9F18-8CE80836CFD6@uiuc.edu>
	<47D14CC0.8000104@sendu.me.uk>
	<FFE7B6A1-FF8C-4C84-994B-6D9AAD46748E@uiuc.edu>
	<47D15731.2050000@sendu.me.uk>
Message-ID: <7AFB30F0-4810-4B62-B64A-C712EC4A3872@uiuc.edu>


On Mar 7, 2008, at 8:54 AM, Sendu Bala wrote:

> Chris Fields wrote:
>> On Mar 7, 2008, at 8:10 AM, Sendu Bala wrote:
>>> One time (and one time only):
>>> perl Build.PL (it doesn't matter how you answer the questions)
>>>
>>> Then every night:
>>> ./Build dist
>>> ./Build ppmdist
>>>
>>> You then upload the resulting .tar.gz and .zip files.
>>>
>>>
>>> Only if Build.PL or ModuleBuildBioperl are updated might you need  
>>> to:
>>> ./Build realclean
>>> perl Build.PL
>>> again. But this should be a rare event and even more rarely would  
>>> it be /required/ (probably never).
>> I'm not making a distribution; the archives are merely cleaned up  
>> svn checkouts (no .svn directories).  This is essentially what the  
>> net_install script would get when installing GBrowse using the  
>> 'dev' option, except you don't need to install Subversion to get  
>> updates.  Also, at this point we don't have an analogous 'Download  
>> tarball' setting for browsable svn either, so this is a suitable  
>> alternative.
>
> The dist action does what you want. I did a diff on the most recent  
> nightly build and the .tar.gz produced by the dist action of a  
> checkout of revision 14603:
>
> $ diff -r bioperl-1.5.2_100 bioperl-live
> diff -r bioperl-1.5.2_100/Bio/SeqIO/chaos.pm bioperl-live/Bio/SeqIO/ 
> chaos.pm
> 2c2
> < # $Date: 2007-06-14 15:16:21 +0100 (Thu, 14 Jun 2007) $
> ---
> > # $Date: 2007-06-14 10:16:21 -0400 (Thu, 14 Jun 2007) $
> Only in bioperl-live/Bio/Tools: WebBlat.pm
> Only in bioperl-live: CHANGELOG
> Only in bioperl-1.5.2_100: MANIFEST
> Only in bioperl-1.5.2_100: META.yml
> diff -r bioperl-1.5.2_100/Makefile.PL bioperl-live/Makefile.PL
> 1,31c1,30
> < # Note: this file was auto-generated by Module::Build::Compat  
> version 0.03
> [snip]
> ---
> > #!/usr/bin/perl -w
> >
> > # This is a stub that simply tells you to use Build.PL instead
> [snip]
> Only in bioperl-live: bioperl.lisp
> diff -r bioperl-1.5.2_100/maintenance/cvs2cl_by_file.pl bioperl-live/ 
> maintenance/cvs2cl_by_file.pl
> 29c29
> < ## $Date: 2006-11-30 15:57:16 +0000 (Thu, 30 Nov 2006) $
> ---
> > ## $Date: 2006-11-30 10:57:16 -0500 (Thu, 30 Nov 2006) $
>
> I don't know what's going on with the date differences, but for a  
> file found in a folder called '/DIST/nightly_builds/', you want the  
> MANIFEST and META.yml files. You also want the Compat version of  
> Build.PL since we haven't yet moved to forcing people to use Build.PL.
>
> './Build dist' does the right thing.
>
>
>> Again, I don't want to deal with prompts while running a cron job  
>> (this is a bash script), particularly if I can't guarantee the  
>> number of prompts or the prompting order won't change down the  
>> line.  If we can set up a way around that using passed args to  
>> Build.PL then it would make life much easier and we could automate  
>> 'Build dist', 'Build ppmdist', 'Build testcover', etc.
>
> Again, you only need to run 'perl Build.PL' once and answer the  
> questions only once. Then you can svn update and run the actions  
> with no more questions to answer. This isn't a problem that needs to  
> be solved. It is /supposed/ to be this way. It's ready to use!  
> Please make use of it; it's one of the (many) reasons I moved  
> Bioperl over to Build.PL in the first place.

Then set it up the way you want.  I give up.

chris


From bix at sendu.me.uk  Fri Mar  7 10:43:44 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 07 Mar 2008 15:43:44 +0000
Subject: [Bioperl-l] Nightly build archives now available
In-Reply-To: <7AFB30F0-4810-4B62-B64A-C712EC4A3872@uiuc.edu>
References: <BAB6978A-D50E-4C4A-9FC9-EAAD557B935F@uiuc.edu>	<628aabb70803061550s24d7d8cfhf80495ea970a6c19@mail.gmail.com>
	<5A67E3A9-9997-4A6B-AB07-8403D5FF388E@uiuc.edu>
	<47D119A1.10408@sendu.me.uk>
	<C71F44BE-CB80-4DED-9F18-8CE80836CFD6@uiuc.edu>
	<47D14CC0.8000104@sendu.me.uk>
	<FFE7B6A1-FF8C-4C84-994B-6D9AAD46748E@uiuc.edu>
	<47D15731.2050000@sendu.me.uk>
	<7AFB30F0-4810-4B62-B64A-C712EC4A3872@uiuc.edu>
Message-ID: <47D162B0.5070402@sendu.me.uk>

Chris Fields wrote:
> 
> On Mar 7, 2008, at 8:54 AM, Sendu Bala wrote:
>
>> Again, you only need to run 'perl Build.PL' once and answer the 
>> questions only once. Then you can svn update and run the actions with 
>> no more questions to answer. This isn't a problem that needs to be 
>> solved. It is /supposed/ to be this way. It's ready to use! Please 
>> make use of it; it's one of the (many) reasons I moved Bioperl over to 
>> Build.PL in the first place.
> 
> Then set it up the way you want.  I give up.

I really don't understand that response. I have merely informed you how 
Build.PL and the actions work, since you didn't know. I have informed 
you it already does what you want in terms of automation; there's 
nothing to wait for, no more work to do. I have requested you use it, 
since there is little value in duplicating code and effort.

Now that you have the information, you can make an informed choice as to 
how to proceed, based on your needs. If you have good reasons for 
sticking with your current nightly build process, by all means stick 
with them.

Mainly I just wanted to make clear (as a general point for anyone 
interested) that the questions asked by Build.PL aren't an issue or 
obstacle in terms of automating builds or tests.


From cjfields at uiuc.edu  Fri Mar  7 11:15:31 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 7 Mar 2008 10:15:31 -0600
Subject: [Bioperl-l] Nightly build archives now available
In-Reply-To: <47D162B0.5070402@sendu.me.uk>
References: <BAB6978A-D50E-4C4A-9FC9-EAAD557B935F@uiuc.edu>	<628aabb70803061550s24d7d8cfhf80495ea970a6c19@mail.gmail.com>
	<5A67E3A9-9997-4A6B-AB07-8403D5FF388E@uiuc.edu>
	<47D119A1.10408@sendu.me.uk>
	<C71F44BE-CB80-4DED-9F18-8CE80836CFD6@uiuc.edu>
	<47D14CC0.8000104@sendu.me.uk>
	<FFE7B6A1-FF8C-4C84-994B-6D9AAD46748E@uiuc.edu>
	<47D15731.2050000@sendu.me.uk>
	<7AFB30F0-4810-4B62-B64A-C712EC4A3872@uiuc.edu>
	<47D162B0.5070402@sendu.me.uk>
Message-ID: <AAF0B249-AD91-475A-8A30-4BB3A2CCF951@uiuc.edu>

On Mar 7, 2008, at 9:43 AM, Sendu Bala wrote:

> Chris Fields wrote:
>> On Mar 7, 2008, at 8:54 AM, Sendu Bala wrote:
>>
>>> Again, you only need to run 'perl Build.PL' once and answer the  
>>> questions only once. Then you can svn update and run the actions  
>>> with no more questions to answer. This isn't a problem that needs  
>>> to be solved. It is /supposed/ to be this way. It's ready to use!  
>>> Please make use of it; it's one of the (many) reasons I moved  
>>> Bioperl over to Build.PL in the first place.
>> Then set it up the way you want.  I give up.
>
> I really don't understand that response. I have merely informed you  
> how Build.PL and the actions work, since you didn't know. I have  
> informed you it already does what you want in terms of automation;  
> there's nothing to wait for, no more work to do. I have requested  
> you use it, since there is little value in duplicating code and  
> effort.
>
> Now that you have the information, you can make an informed choice  
> as to how to proceed, based on your needs. If you have good reasons  
> for sticking with your current nightly build process, by all means  
> stick with them.
>
> Mainly I just wanted to make clear (as a general point for anyone  
> interested) that the questions asked by Build.PL aren't an issue or  
> obstacle in terms of automating builds or tests.

It doesn't come across that way; it comes off as pretty  
condescending.  And please don't assume I lack experience with how  
Module::Build works (I have used 'Build ppmdist' and 'Build testcover'  
quite a few times recently, and the next item on my agenda is to fix  
the various issues with Build.PL and database checking, which you  
already know).

So my response is pretty simple; if you feel the need to use  
'Build.PL' to make nightlies, then by all means set it up.  I find it  
much harder to work with the current Build process in an automated way  
using a bash script, so I work around it.  If it makes you happier we  
can switch the directory over to 'nightly_checkouts', but I think  
that's just mincing semantics.

Okay, it's pretty obvious we're not on the same page here.  I'll go  
through it carefully so you understand the problem:

1)  I am running a 'svn co' on anon. svn for the various distros to a  
temp directory.  This is done using a bash script.  If I attempt to  
change into the distribution directory and run 'perl Build.PL' from  
the bash script, I immediately run into permissions issues and several  
odd things:

Checking prerequisites...
  - ERROR: Bio::Root::Version is not installed
(I think you ran Build.PL directly, so will use CPAN to install  
prerequisites on demand)
CPAN: Storable loaded ok
Going to read /root/.cpan/Metadata
   Database was generated on Tue, 05 Feb 2008 11:30:54 GMT
Warning: You are not allowed to write into directory "/root/.cpan/ 
sources/authors".
     I'll continue, but if you encounter problems, they may be due
     to insufficient permissions.
CPAN: LWP::UserAgent loaded ok
Fetching with LWP:
   ftp://mirror.hiwaay.net/CPAN/authors/01mailrc.txt.gz
LWP failed with code[500] message[Cannot write to '/root/.cpan/sources/ 
authors/01mailrc.txt.gz-8678': Permission denied]
Fetching with Net::FTP:
   ftp://mirror.hiwaay.net/CPAN/authors/01mailrc.txt.gz
Cannot open Local file /root/.cpan/sources/authors/01mailrc.txt.gz:  
Permission denied
....

2) I suspect, even if I worked around permissions and set up the job  
as root or admin and worked out why it can't find  
'Bio::Root::Version' (?!?), this would still be a terrific pain in the  
*** to deal with as the Build.PL process is expecting answers for each  
and every prompt, and the process differs for each distribution.  Yes,  
I could set something up to deal with that in the script.  No, I  
will not do that, as any additions or changes to prompts could break or  
hang the script or (worse) silently change what the archive contains.   
Hence my indication that passing flags to 'perl Build.PL' would be a  
nice way to work around that.  For that I haven't heard a response, so  
I assume that functionality isn't there (or am I assuming incorrectly?).

So, from where I stand, even if using Build.PL is the /proper/ way to  
do it, it doesn't work as expected using an automated process (i.e.  
cron).  Make sense?

chris


From matthewehodges at gmail.com  Fri Mar  7 11:16:47 2008
From: matthewehodges at gmail.com (Matt)
Date: Fri, 7 Mar 2008 16:16:47 +0000 (UTC)
Subject: [Bioperl-l] Reciprocal blast
Message-ID: <loom.20080307T160014-878@post.gmane.org>

Dear experts,

I want to do a best reciprocal blastp of a fasta protein dataset against the 
protein models of various species, also in fasta format. The aim is to have an 
output showing presence/absence. I think this is possible to do using 
Perl, but I'm very much a beginner, so any help with this would be greatly 
appreciated.
Thanks
Matt


From bix at sendu.me.uk  Fri Mar  7 12:34:17 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 07 Mar 2008 17:34:17 +0000
Subject: [Bioperl-l] Nightly build archives now available
In-Reply-To: <AAF0B249-AD91-475A-8A30-4BB3A2CCF951@uiuc.edu>
References: <BAB6978A-D50E-4C4A-9FC9-EAAD557B935F@uiuc.edu>	<628aabb70803061550s24d7d8cfhf80495ea970a6c19@mail.gmail.com>
	<5A67E3A9-9997-4A6B-AB07-8403D5FF388E@uiuc.edu>
	<47D119A1.10408@sendu.me.uk>
	<C71F44BE-CB80-4DED-9F18-8CE80836CFD6@uiuc.edu>
	<47D14CC0.8000104@sendu.me.uk>
	<FFE7B6A1-FF8C-4C84-994B-6D9AAD46748E@uiuc.edu>
	<47D15731.2050000@sendu.me.uk>
	<7AFB30F0-4810-4B62-B64A-C712EC4A3872@uiuc.edu>
	<47D162B0.5070402@sendu.me.uk>
	<AAF0B249-AD91-475A-8A30-4BB3A2CCF951@uiuc.edu>
Message-ID: <47D17C99.9050009@sendu.me.uk>

Chris Fields wrote:
> 1)  I am running a 'svn co' on anon. svn for the various distros to a 
> temp directory.

Is it important that you do a fresh co every night? Why not do a co once 
and then do a 'svn update' every night? This is the crux of the 
problem: if you choose to simply update, then you only have to get 
'perl Build.PL' to work once.


> If I attempt to change into the distribution directory and run 'perl Build.PL' from the 
> bash script, I immediately run into permissions issues and several odd 
> things:
> 
> Checking prerequisites...
>  - ERROR: Bio::Root::Version is not installed
> (I think you ran Build.PL directly, so will use CPAN to install 
> prerequisites on demand)
> CPAN: Storable loaded ok
> Going to read /root/.cpan/Metadata
>   Database was generated on Tue, 05 Feb 2008 11:30:54 GMT
> Warning: You are not allowed to write into directory 
> "/root/.cpan/sources/authors".
[snip]

I'm assuming this is on portal? The CPAN setup for users is a little 
broken. You need to create /home/yourusername/.cpan/CPAN/MyConfig.pm
containing:

$CPAN::Config->{cpan_home} = "/home/yourusername/.cpan/";

Then you can run and configure cpan correctly and install Bundle::CPAN. 
Some of the zlib stuff failed to install for me, but that doesn't seem 
to matter.

Of course, I guess it makes sense for root to just install all of 
Bioperl's prereqs anyway, so that testing can be automated in the future.

Anyway, once you have cpan happy 'perl Build.PL' will run fine. Answer 
'n' to everything and then your cron job just has to call './Build dist'.


> 2) I suspect, even if I worked around permissions and set up the job as 
> root or admin and worked out why it can't find 'Bio::Root::Version' 
> (?!?), this would still be a terrific pain in the *** to deal with as 
> the Build.PL process is expecting answers for each and every prompt, and 
> the process differs for each distribution.

You won't be running Build.PL in the cron job.


> passing flags to 'perl Build.PL' would be a nice way to work around 
> that.  For that I haven't heard a response, so I assume that 
> functionality isn't there (or am I assuming incorrectly?).

It isn't AFAIK, but my point is that it doesn't need to be (for this 
particular use-case at least).


> So, from where I stand, even if using Build.PL is the /proper/ way to do 
> it, it doesn't work as expected using an automated process (i.e. cron).  
> Make sense?

Only if you can't run 'svn update' instead of 'svn co' each night.


From cjfields at uiuc.edu  Fri Mar  7 13:00:52 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 7 Mar 2008 12:00:52 -0600
Subject: [Bioperl-l] Nightly build archives now available
In-Reply-To: <47D17C99.9050009@sendu.me.uk>
References: <BAB6978A-D50E-4C4A-9FC9-EAAD557B935F@uiuc.edu>	<628aabb70803061550s24d7d8cfhf80495ea970a6c19@mail.gmail.com>
	<5A67E3A9-9997-4A6B-AB07-8403D5FF388E@uiuc.edu>
	<47D119A1.10408@sendu.me.uk>
	<C71F44BE-CB80-4DED-9F18-8CE80836CFD6@uiuc.edu>
	<47D14CC0.8000104@sendu.me.uk>
	<FFE7B6A1-FF8C-4C84-994B-6D9AAD46748E@uiuc.edu>
	<47D15731.2050000@sendu.me.uk>
	<7AFB30F0-4810-4B62-B64A-C712EC4A3872@uiuc.edu>
	<47D162B0.5070402@sendu.me.uk>
	<AAF0B249-AD91-475A-8A30-4BB3A2CCF951@uiuc.edu>
	<47D17C99.9050009@sendu.me.uk>
Message-ID: <BC3ACFAC-F203-4C12-BA1F-6AA762F31BF0@uiuc.edu>


On Mar 7, 2008, at 11:34 AM, Sendu Bala wrote:

> Chris Fields wrote:
>> 1)  I am running a 'svn co' on anon. svn for the various distros to  
>> a temp directory.
>
> Is it important that you do a fresh co every night? Why not do a co  
> once   and then do a 'svn update' every night? This is the crux of  
> the problems: if you choose to simply update, then you only have to  
> get 'perl Build.PL' to work once.

Unless you update Build.PL (which will happen as the distributions  
grow).  Then you need to rerun 'perl Build.PL'.  It seems safer to run  
that each time with a 'pass-through' flag for automated builds.

>> If I attempt to change into the distribution directory and run  
>> 'perl Build.PL' from the bash script, I immediately run into  
>> permissions issues and several odd things:
>> Checking prerequisites...
>> - ERROR: Bio::Root::Version is not installed
>> (I think you ran Build.PL directly, so will use CPAN to install  
>> prerequisites on demand)
>> CPAN: Storable loaded ok
>> Going to read /root/.cpan/Metadata
>>  Database was generated on Tue, 05 Feb 2008 11:30:54 GMT
>> Warning: You are not allowed to write into directory "/root/.cpan/ 
>> sources/authors".
> [snip]
>
> I'm assuming this is on portal? The CPAN setup for users is a little  
> broken. You need to create /home/yourusername/.cpan/CPAN/MyConfig.pm
>
> $CPAN::Config->{cpan_home} = "/home/yourusername/.cpan/"
>
> Then you can run and configure cpan correctly and install  
> Bundle::CPAN. Some of the zlib stuff failed to install for me, but  
> that doesn't seem to matter.
>
> Of course, I guess it makes sense for root to just install all of  
> Bioperl's prereqs anyway, so that testing can be automated in the  
> future.
>
> Anyway, once you have cpan happy 'perl Build.PL' will run fine.  
> Answer 'n' to everything and then your cron job just has to call './ 
> Build dist'.

I agree about setting up the prereqs.  I could also (as mentioned  
before) set this up as root.  However, if we go this route we need to  
have 'perl Build.PL' included in the process in order to ensure a  
clean build process each time and to prevent the script from breaking  
whenever someone decides to change Build.PL.

>> 2) I suspect, even if I worked around permissions and set up the  
>> job as root or admin and worked out why it can't find  
>> 'Bio::Root::Version' (?!?), this would still be a terrific pain in  
>> the *** to deal with as the Build.PL process is expecting answers  
>> for each and every prompt, and the process differs for each  
>> distribution.
>
> You won't be running Build.PL in the cron job.

See above.  I don't want to set up something automated which can't be  
maintained in the long term.

>> passing flags to 'perl Build.PL' would be a nice way to work around  
>> that.  For that I haven't heard a response, so I assume that  
>> functionality isn't there (or am I assuming incorrectly?).
>
> It isn't AFAIK, but my point is that it doesn't need to be (for this  
> particular use-case at least).

See above.  There are very good reasons to allow this (and the  
functionality has been requested before, particularly from the GMOD  
crowd).  If I can pass in a single flag (for instance, --defaults,  
which just uses the default arg for each prompt) then it would make  
it /much/ easier.

>> So, from where I stand, even if using Build.PL is the /proper/ way  
>> to do it, it doesn't work as expected using an automated process  
>> (i.e. cron).  Make sense?
>
> Only if you can't run 'svn update' instead of 'svn co' each night.

I think a single co with updates is feasible (I can do that with the  
current setup; just run the initial co, copy the directory over to a  
temp copy, then go about my business).

I'll leave the nightly build setup as is for now and work on getting  
Build.PL working (something we need anyway for Devel::Cover and  
Pod::Coverage work).

chris


From David.Messina at sbc.su.se  Fri Mar  7 13:14:38 2008
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 7 Mar 2008 19:14:38 +0100
Subject: [Bioperl-l] Reciprocal blast
In-Reply-To: <loom.20080307T160014-878@post.gmane.org>
References: <loom.20080307T160014-878@post.gmane.org>
Message-ID: <628aabb70803071014y6e08a4f9va083ebc8a33439cb@mail.gmail.com>

Hey Matt,

Your question is a little beyond the scope of this mailing list. I don't
know what your bioinformatics background is, but in my experience it's best
to get started hands-on, either in a class or with someone you can sit down
and work through it with. You'll have a million questions, and a mailing
list isn't really suitable for that.

That being said, I would run the blasts on the command-line, parse out the
best hits with BioPerl, and then use hashes to identify mutual best hits.

Briefly, you have two datasets A & B. Format each dataset into a blast
database using xdformat or formatdb. Run two blasts, one with A as query and
B as database and then one with B as query and A as database. The two output
files, each containing multiple Blast reports, can then be processed with
Bio::SearchIO to extract the best hit for each protein.

Read this tutorial for help with that:
http://www.bioperl.org/wiki/HOWTO:SearchIO

Once you get the best hit for each protein, then you can use Perl to find
every instance where two proteins, one from each set, are each other's best
hit. One way would be to create two hashes, one for each set, with query
proteins as keys and best hits as values, and then step through to find the
reciprocal bests.
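
The hash step can be sketched from the command line too. This is a
minimal illustration under assumptions, not a BioPerl method: it
presumes the best hit for each query has already been written out (for
example by a Bio::SearchIO parsing step) as two tab-separated files of
"query_id, best_hit_id", and the function name reciprocal_best_hits is
made up for the example. awk's associative array plays the role of the
hash.

```shell
#!/bin/sh
# reciprocal_best_hits A_VS_B B_VS_A
# Each input is assumed to be a two-column, tab-separated file:
#   query_id <TAB> best_hit_id
# Prints "A_id <TAB> B_id" for every pair that are each other's best hit.

reciprocal_best_hits() {
    awk -F '\t' '
        # First file: remember the best B hit for every A query.
        FILENAME == ARGV[1] { best[$1] = $2; next }
        # Second file: $1 is a B query, $2 is its best A hit.
        ($2 in best) && best[$2] == $1 { print $2 "\t" $1 }
    ' "$1" "$2"
}

# Example (file names are assumptions):
#   reciprocal_best_hits a_vs_b.tsv b_vs_a.tsv > reciprocal.tsv
```

Proteins from set A that never appear in the output have no reciprocal
best hit in set B, which gives the presence/absence view.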


Dave


From jay at jays.net  Fri Mar  7 13:51:35 2008
From: jay at jays.net (Jay Hannah)
Date: Fri, 07 Mar 2008 12:51:35 -0600
Subject: [Bioperl-l] Reciprocal blast
In-Reply-To: <628aabb70803071014y6e08a4f9va083ebc8a33439cb@mail.gmail.com>
References: <loom.20080307T160014-878@post.gmane.org>
	<628aabb70803071014y6e08a4f9va083ebc8a33439cb@mail.gmail.com>
Message-ID: <47D18EB7.3060906@jays.net>

Dave Messina wrote:
> Your question is a little beyond the scope of this mailing list. I don't
> know what your bioinformatics background is, but in my experience it's best
> to get started hands-on, either in a class or with someone you can sit down
> and work through it with. You'll have a million questions, and a mailing
> list isn't really suitable for that.
>
> That being said, I would run the blasts on the command-line, parse out the
> best hits with BioPerl, and then use hashes to identify mutual best hits.
>   

Hi Matt,

If you're a glutton for punishment and want to see a ball of Perl that 
automates and tracks stats across my version of "reciprocal blasts" 
(mine is called cross_blast()), help yourself:

   svn checkout svn://vc.jays.net/seqlab seqlab

I abandoned my maiden voyage into bioinformatics, called "SeqLab," as a 
stand-alone entity when the subsequent thousand tasks I worked turned 
out to be unrelated to all the software I had built so far. My naive 
grand unification vision for all of bioinformatics didn't quite work out 
as I had planned. -laugh-  Nowadays I just cherry-pick solutions out of 
its guts on demand.  :)

I'm happy to field any questions you have about that code, if it helps 
you any.

Cheers,

j
http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah


From jay at jays.net  Fri Mar  7 14:43:03 2008
From: jay at jays.net (Jay Hannah)
Date: Fri, 07 Mar 2008 13:43:03 -0600
Subject: [Bioperl-l] Reciprocal blast
In-Reply-To: <47D18EB7.3060906@jays.net>
References: <loom.20080307T160014-878@post.gmane.org>	<628aabb70803071014y6e08a4f9va083ebc8a33439cb@mail.gmail.com>
	<47D18EB7.3060906@jays.net>
Message-ID: <47D19AC7.1060907@jays.net>

Jay Hannah wrote:
> I'm happy to field any questions you have about that code, if it helps 
> you any.

I created a wiki page since I stopped paying the bill on the 
"seqlab.net" domain:   :)

http://clab.ist.unomaha.edu/CLAB/index.php/SeqLab_%28Perl%29

Cheers,

j


From cain.cshl at gmail.com  Fri Mar  7 15:17:29 2008
From: cain.cshl at gmail.com (Scott Cain)
Date: Fri, 07 Mar 2008 15:17:29 -0500
Subject: [Bioperl-l] Nightly build archives now available
In-Reply-To: <BAB6978A-D50E-4C4A-9FC9-EAAD557B935F@uiuc.edu>
References: <BAB6978A-D50E-4C4A-9FC9-EAAD557B935F@uiuc.edu>
Message-ID: <1204921049.6467.9.camel@frissell>

Hi Chris,

Thanks much for this.  I have one observation though: both the http and
ftp directories are empty except for a log file :-/

Also, I saw that you mentioned the 'accept the defaults' option I asked
about in January.  I did implement that in Build.PL at the exact time
that the transition from cvs to svn was happening, so it never got
committed back.  Hopefully I still have it :-)  I'll look around and
commit it when I find it.

Scott

On Thu, 2008-03-06 at 15:48 -0600, Chris Fields wrote:
> We now have nightly bundled archives for bioperl-live, bioperl-db,  
> bioperl-run, and bioperl-network running; these will be updated ~ 1:00  
> am every night.
> 
> http://www.bioperl.org/DIST/nightly_builds/
> ftp://ftp.open-bio.org/pub/bioperl/DIST/nightly_builds
> 
> The archives are date-stamped and also have the Subversion revision,  
> just in case one wanted to ensure they get the correct version for the  
> bug fix.  They also contain a CHANGELOG file for the last 10 revisions  
> (if there are any).  These are currently derived off the anon. svn  
> repository.
> 
> chris
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory


From cjfields at uiuc.edu  Fri Mar  7 15:25:01 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 7 Mar 2008 14:25:01 -0600
Subject: [Bioperl-l] Nightly build archives now available
In-Reply-To: <1204921049.6467.9.camel@frissell>
References: <BAB6978A-D50E-4C4A-9FC9-EAAD557B935F@uiuc.edu>
	<1204921049.6467.9.camel@frissell>
Message-ID: <BCE20903-01C1-4AC3-B8F6-89AC68F6803E@uiuc.edu>

I was testing a few things earlier using 'Build dist' which tanked the  
old archives.  I reran the script manually so everything should be up  
now.

If you have the default setting implemented for Build.PL that would be  
great.  There is a lingering minor issue with Data::Dumper error  
output via perl 5.10, but beyond that it should be fine.

chris

On Mar 7, 2008, at 2:17 PM, Scott Cain wrote:

> Hi Chris,
>
> Thanks much for this.  I have one observation though: both the http  
> and
> ftp directories are empty except for a log file :-/
>
> Also, I saw that you mentioned the 'accept the defaults' option I  
> asked
> about in January.  I did implement that on Build.PL at the exact time
> that the transition from cvs to svn was happening, so I never got
> committed back.  Hopefully I still have it :-)  I'll look around and
> commit it when I find it.
>
> Scott
>
> On Thu, 2008-03-06 at 15:48 -0600, Chris Fields wrote:
>> We now have nightly bundled archives for bioperl-live, bioperl-db,
>> bioperl-run, and bioperl-network running; these will be updated ~  
>> 1:00
>> am every night.
>>
>> http://www.bioperl.org/DIST/nightly_builds/
>> ftp://ftp.open-bio.org/pub/bioperl/DIST/nightly_builds
>>
>> The archives are date-stamped and also have the Subversion revision,
>> just in case one wanted to ensure they get the correct version for  
>> the
>> bug fix.  They also contain a CHANGELOG file for the last 10  
>> revisions
>> (if there are any).  These are currently derived off the anon. svn
>> repository.
>>
>> chris
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> -- 
> ------------------------------------------------------------------------
> Scott Cain, Ph. D.                                         cain at cshl.edu
> GMOD Coordinator (http://www.gmod.org/)                      
> 216-392-3087
> Cold Spring Harbor Laboratory
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From N.Haigh at sheffield.ac.uk  Fri Mar  7 16:01:43 2008
From: N.Haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Fri,  7 Mar 2008 21:01:43 +0000
Subject: [Bioperl-l] Nightly build archives now available
In-Reply-To: <BC3ACFAC-F203-4C12-BA1F-6AA762F31BF0@uiuc.edu>
References: <BAB6978A-D50E-4C4A-9FC9-EAAD557B935F@uiuc.edu>
	<628aabb70803061550s24d7d8cfhf80495ea970a6c19@mail.gmail.com>
	<5A67E3A9-9997-4A6B-AB07-8403D5FF388E@uiuc.edu>
	<47D119A1.10408@sendu.me.uk>
	<C71F44BE-CB80-4DED-9F18-8CE80836CFD6@uiuc.edu>
	<47D14CC0.8000104@sendu.me.uk>
	<FFE7B6A1-FF8C-4C84-994B-6D9AAD46748E@uiuc.edu>
	<47D15731.2050000@sendu.me.uk>
	<7AFB30F0-4810-4B62-B64A-C712EC4A3872@uiuc.edu>
	<47D162B0.5070402@sendu.me.uk>
	<AAF0B249-AD91-475A-8A30-4BB3A2CCF951@uiuc.edu>
	<47D17C99.9050009@sendu.me.uk>
	<BC3ACFAC-F203-4C12-BA1F-6AA762F31BF0@uiuc.edu>
Message-ID: <1204923703.47d1ad37a614a@webmail.shef.ac.uk>

Quoting Chris Fields <cjfields at uiuc.edu>:

-- snip --

> 
> I'll leave the nightly build setup as is for now and work on getting  
> Build.PL working (something we need anyway for Devel::Cover and  
> Pod::Coverage work).
> 

One of the coverage metrics Devel::Cover reports is pod coverage (via
Pod::Coverage), so there is no need for a separate Pod::Coverage test :o)

Nath


From cain.cshl at gmail.com  Fri Mar  7 17:25:53 2008
From: cain.cshl at gmail.com (Scott Cain)
Date: Fri, 07 Mar 2008 17:25:53 -0500
Subject: [Bioperl-l] Nightly build archives now available
In-Reply-To: <BCE20903-01C1-4AC3-B8F6-89AC68F6803E@uiuc.edu>
References: <BAB6978A-D50E-4C4A-9FC9-EAAD557B935F@uiuc.edu>
	<1204921049.6467.9.camel@frissell>
	<BCE20903-01C1-4AC3-B8F6-89AC68F6803E@uiuc.edu>
Message-ID: <1204928753.6467.19.camel@frissell>

OK, I added my 'accept the defaults' option.  Use it like this:

  perl Build.PL --accept 1

Scott

On Fri, 2008-03-07 at 14:25 -0600, Chris Fields wrote:
> I was testing a few things earlier using 'Build dist' which tanked the  
> old archives.  I reran the script manually so everything should be up  
> now.
> 
> If you have the default setting implemented for Build.PL that would be  
> great.  There is a lingering minor issue with Data::Dumper error  
> output via perl 5.10, but beyond that it should be fine.
> 
> chris
> 
> On Mar 7, 2008, at 2:17 PM, Scott Cain wrote:
> 
> > Hi Chris,
> >
> > Thanks much for this.  I have one observation though: both the http  
> > and
> > ftp directories are empty except for a log file :-/
> >
> > Also, I saw that you mentioned the 'accept the defaults' option I  
> > asked
> > about in January.  I did implement that on Build.PL at the exact time
> > that the transition from cvs to svn was happening, so I never got
> > committed back.  Hopefully I still have it :-)  I'll look around and
> > commit it when I find it.
> >
> > Scott
> >
> > On Thu, 2008-03-06 at 15:48 -0600, Chris Fields wrote:
> >> We now have nightly bundled archives for bioperl-live, bioperl-db,
> >> bioperl-run, and bioperl-network running; these will be updated ~  
> >> 1:00
> >> am every night.
> >>
> >> http://www.bioperl.org/DIST/nightly_builds/
> >> ftp://ftp.open-bio.org/pub/bioperl/DIST/nightly_builds
> >>
> >> The archives are date-stamped and also have the Subversion revision,
> >> just in case one wanted to ensure they get the correct version for  
> >> the
> >> bug fix.  They also contain a CHANGELOG file for the last 10  
> >> revisions
> >> (if there are any).  These are currently derived off the anon. svn
> >> repository.
> >>
> >> chris
> >> _______________________________________________
> >> Bioperl-l mailing list
> >> Bioperl-l at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > -- 
> > ------------------------------------------------------------------------
> > Scott Cain, Ph. D.                                         cain at cshl.edu
> > GMOD Coordinator (http://www.gmod.org/)                      
> > 216-392-3087
> > Cold Spring Harbor Laboratory
> >
> 
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
> 
> 
> 
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   cain.cshl at gmail.com
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory


From n.haigh at sheffield.ac.uk  Sat Mar  8 07:55:39 2008
From: n.haigh at sheffield.ac.uk (Nathan S Haigh)
Date: Sat, 08 Mar 2008 12:55:39 +0000
Subject: [Bioperl-l] Nightly build archives now available
In-Reply-To: <AAF0B249-AD91-475A-8A30-4BB3A2CCF951@uiuc.edu>
References: <BAB6978A-D50E-4C4A-9FC9-EAAD557B935F@uiuc.edu>	<628aabb70803061550s24d7d8cfhf80495ea970a6c19@mail.gmail.com>	<5A67E3A9-9997-4A6B-AB07-8403D5FF388E@uiuc.edu>	<47D119A1.10408@sendu.me.uk>	<C71F44BE-CB80-4DED-9F18-8CE80836CFD6@uiuc.edu>	<47D14CC0.8000104@sendu.me.uk>	<FFE7B6A1-FF8C-4C84-994B-6D9AAD46748E@uiuc.edu>	<47D15731.2050000@sendu.me.uk>	<7AFB30F0-4810-4B62-B64A-C712EC4A3872@uiuc.edu>	<47D162B0.5070402@sendu.me.uk>
	<AAF0B249-AD91-475A-8A30-4BB3A2CCF951@uiuc.edu>
Message-ID: <47D28CCB.50507@sheffield.ac.uk>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Chris Fields wrote:

- -- snip --

> 1)  I am running a 'svn co' on anon. svn for the various distros to a
> temp directory.  This is done using a bash script.  If I attempt to
> change into the distribution directory and run 'perl Build.PL' from the
> bash script, I immediately run into permissions issues and several odd
> things:
> 

- -- snip --

Hi Chris,

Do you need to run any svn commands after the checkout? If not, you can
use "svn export" instead:
http://svnbook.red-bean.com/en/1.0/re10.html

This recursively fetches the specified URL without the .svn dirs. The
result is not a working copy, so you won't be able to run any svn commands
on it, but it saves bandwidth and any post-processing to delete all the
.svn dirs.

Nath
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFH0ozL9gTv6QYzVL4RAkvcAJ9eSosx3+YWfbBg/KT6+HZrbweGSgCguLCe
ZYtTxSi5q6iiR+sVGDQEZ68=
=uFNP
-----END PGP SIGNATURE-----


From nm249 at cornell.edu  Sat Mar  8 11:48:44 2008
From: nm249 at cornell.edu (Naama Menda)
Date: Sat, 08 Mar 2008 11:48:44 -0500
Subject: [Bioperl-l] Bio::Ontology::OntologyI
Message-ID: <A042F0A0-F007-45C2-A838-7666A85F5836@cornell.edu>

Hi Hilmar,

I have a loading script that uses Bio::Ontology::OntologyI for parsing
obo files and loading terms into a chado schema.
I'm trying to find all relationship types, and it seems that the parser
looks at the distinct relationship types used by the terms in the file,
but not at the '[Typedef]' stanzas (I used 'get_predicate_terms()').
This is important for storing the relationships in the right context. For
example, all relationship types defined by Sequence Ontology should be
stored in the chado schema using the SO cv_id, while other relationship
types, not defined as Typedef in the obo file, should be stored using the
'relationship' cv_id.
Without a way to parse Typedefs, I also cannot use Bio::Ontology for
parsing the OBO_REL file (http://www.obofoundry.org/ro/ro.obo).

Is there another function in Bio::Ontology that handles Typedefs? If
not, can one be added?

Thanks!
-Naama Menda


From bix at sendu.me.uk  Sat Mar  8 18:30:40 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Sat, 08 Mar 2008 23:30:40 +0000
Subject: [Bioperl-l] Nightly build archives now available
In-Reply-To: <1204928753.6467.19.camel@frissell>
References: <BAB6978A-D50E-4C4A-9FC9-EAAD557B935F@uiuc.edu>	<1204921049.6467.9.camel@frissell>	<BCE20903-01C1-4AC3-B8F6-89AC68F6803E@uiuc.edu>
	<1204928753.6467.19.camel@frissell>
Message-ID: <47D321A0.9010209@sendu.me.uk>

Scott Cain wrote:
> OK, I added my 'accept the defaults' option.  Use it like this:
> 
>   perl Build.PL --accept 1

Thanks for that Scott, but can you revert and have another go at that 
commit, because you ended up wiping out the recent commits by Chris and 
myself.

Also, rather than individually alter the Bioperl-specific methods like 
choose_scripts(), is there perhaps a cleaner way to catch every prompt, 
perhaps by overriding prompt() itself? Other questions may get added in 
the future, and some existing questions aren't immediately obvious, so 
it would be nice to be sure an automated process like a cron job /never/ 
gets asked a question.
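
A minimal sketch of the "override prompt() itself" idea, for illustration
only. It does not load Module::Build; Sketch::BaseBuilder is a hypothetical
stand-in for whatever builder class Build.PL uses, and the "accept" key is
an assumed name for the option, not necessarily how the committed code
stores it.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the builder class used by Build.PL; the real
# one would be a Module::Build subclass, which is not loaded here.
package Sketch::BaseBuilder;

sub new {
    my ($class, %args) = @_;
    return bless {%args}, $class;
}

# Asks the question on the terminal and falls back to the default on an
# empty answer.
sub prompt {
    my ($self, $message, $default) = @_;
    print "$message [$default] ";
    my $answer = <STDIN>;
    $answer = '' unless defined $answer;
    chomp $answer;
    return length $answer ? $answer : $default;
}

package Sketch::QuietBuilder;
our @ISA = ('Sketch::BaseBuilder');

# Single choke point: with accept => 1 every question is answered with its
# default and nothing is ever read from STDIN.
sub prompt {
    my ($self, $message, $default) = @_;
    if ($self->{accept}) {
        die "No default for question: $message\n" unless defined $default;
        return $default;
    }
    return $self->SUPER::prompt($message, $default);
}

package main;

my $builder = Sketch::QuietBuilder->new(accept => 1);
my $answer  = $builder->prompt('Install scripts? y/n/a', 'a');
print "answer: $answer\n";
```

Because every question funnels through one method, a question added later
is covered without touching choose_scripts() or its siblings, and a
question without a default fails loudly instead of hanging a cron job.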


From hlapp at gmx.net  Sun Mar  9 17:37:01 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sun, 9 Mar 2008 17:37:01 -0400
Subject: [Bioperl-l] Bio::Ontology::OntologyI
In-Reply-To: <47D2C36C.2020802@cornell.edu>
References: <47D2C36C.2020802@cornell.edu>
Message-ID: <DA1ACCA1-02CA-4662-A50F-DF714A8E05CB@gmx.net>

Naama -

it is the OntologyIO::obo parser that omits the typedefs. Parsing
rather than skipping those could be added to the code; I also once
started and almost completed a project to integrate the go-perl .obo
parser into the Bio::OntologyIO framework, but the final touches fell
victim to moving jobs and the ensuing upheaval.
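
To make "parsing rather than skipping" concrete, here is a rough sketch in
plain Perl that pulls the [Typedef] stanzas out of OBO text. It is not the
OntologyIO::obo code and uses no BioPerl API; parse_typedefs() is a
hypothetical helper, the sample stanzas are illustrative, and the comment
stripping is deliberately naive.

```perl
use strict;
use warnings;

# Returns one hashref per [Typedef] stanza; each tag maps to an arrayref of
# its values because OBO tags may repeat.
sub parse_typedefs {
    my ($obo_text) = @_;
    my (@typedefs, $current);
    for my $line (split /\n/, $obo_text) {
        $line =~ s/\s+$//;
        if ($line =~ /^\[(\w+)\]$/) {
            # A new stanza starts; only keep it if it is a Typedef.
            $current = $1 eq 'Typedef' ? {} : undef;
            push @typedefs, $current if $current;
        }
        elsif ($current && $line =~ /^([^:\s]+):\s*(.*)$/) {
            my ($tag, $value) = ($1, $2);
            # Naive: drops a trailing "! comment", ignores quoting rules.
            $value =~ s/\s+!.*$//;
            push @{ $current->{$tag} }, $value;
        }
    }
    return @typedefs;
}

my $obo = <<'END_OBO';
format-version: 1.2

[Term]
id: SO:0000001
name: region

[Typedef]
id: part_of
name: part_of
is_transitive: true

[Typedef]
id: derives_from
name: derives_from
END_OBO

my @typedefs = parse_typedefs($obo);
print join(', ', map { $_->{id}[0] } @typedefs), "\n";
```

A loader could use the resulting ids to decide which relationship types
belong to the ontology's own cv and which fall back to 'relationship'.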

If all you need to do is parse a .obo-formatted ontology and
traverse it in some way, go-perl might have all you need. If you
need more than that, could you elaborate?

	-hilmar

On Mar 8, 2008, at 11:48 AM, Naama Menda wrote:

> Hi Hilmar,
>
> I have a loading script that uses Bio::Ontology::OntologyI  for  
> parsing  obo files and loading terms  into chado schema.
> I'm trying to find all relationship types, and it seems that the  
> parser looks at the distinct relationship types used by the terms  
> in the file,
> but not at the ' [Typedef] ' fields (I used 'get_predicate_terms()' ).
> This is important for storing the relationships in the right  
> context , for example all relationships types defined by Sequence  
> Ontology should be stored in
> the chado schema using the SO cv_id, while other relationship  
> types, not defined as Typedef in the obo file, should be stored  
> using the 'relationship' cv_id.
> Without a way to parse Typedefs, I also cannot use Bio::Ontology  
> for parsing OBO_REL file (http://www.obofoundry.org/ro/ro.obo).
>
> Is there another function in Bio::Ontology that handles Typedefs?  
> If not can one be added?
>
> Thanks!
> -Naama Menda

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From naama.menda at gmail.com  Sun Mar  9 21:34:05 2008
From: naama.menda at gmail.com (Naama Menda)
Date: Sun, 9 Mar 2008 21:34:05 -0400
Subject: [Bioperl-l] Bio::Ontology::OntologyI
Message-ID: <48F99F4E-F17B-4000-8460-9F2CB9E0D75A@gmail.com>

My main problem is that go-perl does not handle updates, so if I want
to update GO I need an empty schema. We find it more complicated to
re-load our annotations than to update cvterms and their related data.
Our loading script compares an existing load of an ontology to the
obo file and updates/inserts/deletes accordingly.
We are now in the process of committing this code to GMOD, and
thought this would be a good opportunity for adding the Typedef
parsing option.

Thanks,
-Naama


On Sun, Mar 9, 2008 at 5:37 PM, Hilmar Lapp <hlapp at gmx.net> wrote:
Naama -

it is the OntologyIO::obo parser that omits the typedefs. Parsing
rather than skipping those could be added to the code; I also once
started and almost completed a project to integrated the go-perl .obo
parser into the Bio::OntologyIO framework, but the final touches fell
victim to moving jobs and the ensuing upheaval.

If all you need to do is parsing a .obo-formatted ontology and
traversing it in some way, go-perl might have all you need. If you
need more than that, could you elaborate?

        -hilmar

On Mar 8, 2008, at 11:48 AM, Naama Menda wrote:

 > Hi Hilmar,
 >
 > I have a loading script that uses Bio::Ontology::OntologyI  for
 > parsing  obo files and loading terms  into chado schema.
 > I'm trying to find all relationship types, and it seems that the
 > parser looks at the distinct relationship types used by the terms
 > in the file,
 > but not at the ' [Typedef] ' fields (I used 'get_predicate_terms()' ).
 > This is important for storing the relationships in the right
 > context , for example all relationships types defined by Sequence
 > Ontology should be stored in
 > the chado schema using the SO cv_id, while other relationship
 > types, not defined as Typedef in the obo file, should be stored
 > using the 'relationship' cv_id.
 > Without a way to parse Typedefs, I also cannot use Bio::Ontology
 > for parsing OBO_REL file (http://www.obofoundry.org/ro/ro.obo).
 >
 > Is there another function in Bio::Ontology that handles Typedefs?
 > If not can one be added?
 >
 > Thanks!
 > -Naama Menda

--
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From hlapp at gmx.net  Sun Mar  9 22:13:15 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sun, 9 Mar 2008 22:13:15 -0400
Subject: [Bioperl-l] Bio::Ontology::OntologyI
In-Reply-To: <e6c79c380803091834u534bfb9p9fbf41cb06485e83@mail.gmail.com>
References: <47D2C36C.2020802@cornell.edu>
	<DA1ACCA1-02CA-4662-A50F-DF714A8E05CB@gmx.net>
	<e6c79c380803091834u534bfb9p9fbf41cb06485e83@mail.gmail.com>
Message-ID: <7412B724-57B8-4851-9E71-806722DE8A76@gmx.net>


On Mar 9, 2008, at 9:34 PM, Naama Menda wrote:

> My main problem is that go-perl does not handle updates, so if I  
> want to update GO I need an empty schema. We find it more  
> complicated to re-load our annotations than to update cvterms and  
> their related data.
> Our loading script compares an existing load of an ontology to the  
> obo file and updates/insets/deletes accordingly.

load_ontology.pl in bioperl-db should have all this functionality,  
though of course that doesn't give you the typedef support (yet).

>
> We are now in the process of committing this code to GMOD

Cool - obviously load_ontology.pl doesn't work off of Chado but  
instead uses BioSQL as the schema (though the ontology model is  
*very* similar between the two).

BTW please keep the Bioperl list in the loop, others may have insight  
too or be interested in the information.

	-hilmar

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From hlapp at gmx.net  Sun Mar  9 22:43:13 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sun, 9 Mar 2008 22:43:13 -0400
Subject: [Bioperl-l] Bio::Ontology::OntologyI
In-Reply-To: <e6c79c380803091926q293b2c7aq8a6b07c56c25ac42@mail.gmail.com>
References: <47D2C36C.2020802@cornell.edu>
	<DA1ACCA1-02CA-4662-A50F-DF714A8E05CB@gmx.net>
	<e6c79c380803091834u534bfb9p9fbf41cb06485e83@mail.gmail.com>
	<7412B724-57B8-4851-9E71-806722DE8A76@gmx.net>
	<e6c79c380803091926q293b2c7aq8a6b07c56c25ac42@mail.gmail.com>
Message-ID: <741AD3F0-4EE6-4CF9-A95F-53749AEB0FDA@gmx.net>


On Mar 9, 2008, at 10:26 PM, Naama Menda wrote:
>
> On Sun, Mar 9, 2008 at 10:13 PM, Hilmar Lapp <hlapp at gmx.net> wrote:
>
> On Mar 9, 2008, at 9:34 PM, Naama Menda wrote:
>
>> My main problem is that go-perl does not handle updates, so if I  
>> want to update GO I need an empty schema. We find it more  
>> complicated to re-load our annotations than to update cvterms and  
>> their related data.
>> Our loading script compares an existing load of an ontology to the  
>> obo file and updates/insets/deletes accordingly.
>
> load_ontology.pl in bioperl-db should have all this functionality,  
> though of course that doesn't give you the typedef support (yet).
>
> Will you add this support to obo.pm? I had a look at it and it  
> seems easy to implement.
> Will there be a patch? Or in the next Bioperl release?

If you have ideas for how to implement this we'd be thrilled if you  
can provide a patch.

Most changes in BioPerl happen because of, and by, people who have an
itch to scratch. Seems like this one is right up your alley?

I'd in principle be interested in doing this too but can't give any  
promises as to when I might have time (unless I need it myself :)

>
>
>>
>> We are now in the process of committing this code to GMOD
>
> Cool - obviously load_ontology.pl doesn't work off of Chado but  
> instead uses BioSQL as the schema (though the ontology model is  
> *very* similar between the two).
>
> We  store ontologies in Chado, and that was the reason for writing  
> a new loader.

Looking at it, it seems you wrote a whole new language binding? Did
you find it too difficult to build on one of the existing ones (which
use Class::DBI if I recall correctly, though Scott will have the
details here)?

	-hilmar

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From naama.menda at gmail.com  Sun Mar  9 22:26:04 2008
From: naama.menda at gmail.com (Naama Menda)
Date: Sun, 9 Mar 2008 22:26:04 -0400
Subject: [Bioperl-l] Bio::Ontology::OntologyI
In-Reply-To: <7412B724-57B8-4851-9E71-806722DE8A76@gmx.net>
References: <47D2C36C.2020802@cornell.edu>
	<DA1ACCA1-02CA-4662-A50F-DF714A8E05CB@gmx.net>
	<e6c79c380803091834u534bfb9p9fbf41cb06485e83@mail.gmail.com>
	<7412B724-57B8-4851-9E71-806722DE8A76@gmx.net>
Message-ID: <e6c79c380803091926q293b2c7aq8a6b07c56c25ac42@mail.gmail.com>

On Sun, Mar 9, 2008 at 10:13 PM, Hilmar Lapp <hlapp at gmx.net> wrote:

>
> On Mar 9, 2008, at 9:34 PM, Naama Menda wrote:
>
> My main problem is that go-perl does not handle updates, so if I want to
> update GO I need an empty schema. We find it more complicated to re-load our
> annotations than to update cvterms and their related data.
> Our loading script compares an existing load of an ontology to the obo
> file and updates/insets/deletes accordingly.
>
>
> load_ontology.pl in bioperl-db should have all this functionality, though
> of course that doesn't give you the typedef support (yet).
>

Will you add this support to obo.pm? I had a look at it and it seems easy to
implement.
Will there be a patch? Or in the next Bioperl release?

>
>
> We are now in the process of committing this code to GMOD
>
>
> Cool - obviously load_ontology.pl doesn't work off of Chado but instead
> uses BioSQL as the schema (though the ontology model is *very* similar
> between the two).
>

We  store ontologies in Chado, and that was the reason for writing a new
loader.

>
> BTW please keep the Bioperl list in the loop, others may have insight too
> or be interested in the information.
>
> -hilmar
>
> --
>
Thanks!
-Naama

> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================
>
>
>
>


From akarger at CGR.Harvard.edu  Mon Mar 10 09:33:49 2008
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Mon, 10 Mar 2008 09:33:49 -0400
Subject: [Bioperl-l] Reciprocal blast
References: <loom.20080307T160014-878@post.gmane.org>
Message-ID: <72AF30DC2881964CB911FD08E57157E7367BD5@lsdiv-msxbe-001.nucleus.harvard.edu>

<plug>
There's a cut & paste protocol for Reciprocal best hit blast at

http://sysbio.harvard.edu/csb/resources/computational/scriptome/UNIX/Protocols/Sequences.html

Let me know if you need to tweak things.
</plug>
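
For readers who want to see the core logic, a minimal sketch of reciprocal
best hits in plain Perl follows. It is not the Scriptome protocol; it
assumes the BLAST results are already reduced to rows of query, subject,
and bit score, and the sequence IDs and the helper names best_hits() and
reciprocal_best_hits() are made up for illustration.

```perl
use strict;
use warnings;

# Best hit per query from rows of [query, subject, bit_score].
# On a tie the first row seen wins.
sub best_hits {
    my (@rows) = @_;
    my (%best, %score);
    for my $row (@rows) {
        my ($query, $subject, $bits) = @$row;
        if (!exists $score{$query} || $bits > $score{$query}) {
            $score{$query} = $bits;
            $best{$query}  = $subject;
        }
    }
    return \%best;
}

# A pair is reciprocal when the query's best hit points at a subject whose
# own best hit points back at the query.
sub reciprocal_best_hits {
    my ($forward, $reverse) = @_;
    my %pairs;
    for my $query (keys %$forward) {
        my $subject = $forward->{$query};
        next unless exists $reverse->{$subject};
        $pairs{$query} = $subject if $reverse->{$subject} eq $query;
    }
    return \%pairs;
}

my $forward = best_hits(
    ['protA', 'sp1_001', 250], ['protA', 'sp1_002', 90],
    ['protB', 'sp1_003', 120], ['protC', 'sp1_002', 80],
);
my $reverse = best_hits(
    ['sp1_001', 'protA', 245], ['sp1_003', 'protX', 300],
    ['sp1_003', 'protB', 110], ['sp1_002', 'protA', 85],
);

my $rbh = reciprocal_best_hits($forward, $reverse);
for my $query (qw(protA protB protC)) {
    printf "%s\t%s\n", $query, exists $rbh->{$query} ? 'present' : 'absent';
}
```

With this toy data only protA has a reciprocal partner, so it is reported
as present and the other two as absent.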

-Amir Karger 

> -----Original Message-----
> From: Matt [mailto:matthewehodges at gmail.com] 
> Sent: Friday, March 07, 2008 11:17 AM
> To: bioperl-l at bioperl.org
> Subject: [Bioperl-l] Reciprocal blast
> 
> Dear experts,
> 
> I want to do a best reciprocal blastp of a fasta protein dataset
> against the protein models of various species also in fasta format.
> The aim is to have an output showing presence/not presence. I think
> this is possible to do using perl, but i'm very much a beginner so
> any help in this would be greatly appreciated.
> Thanks
> Matt
> 
> 
> 


From Daniel.Gerlach at medecine.unige.ch  Mon Mar 10 12:13:39 2008
From: Daniel.Gerlach at medecine.unige.ch (Daniel Gerlach)
Date: Mon, 10 Mar 2008 17:13:39 +0100
Subject: [Bioperl-l] Bio::TreeIO - tree object to string
Message-ID: <47D55E33.8060205@medecine.unige.ch>

Dear all,

This is a very basic question. I have a tree object in $tree and want to 
save its newick representation in a variable as a string:

my $out = new Bio::TreeIO(-fh => $tree_string, -format => 'newick');
$out->write_tree($tree);
print $tree_string;

Unfortunately this does not work: it prints the newick tree on
stdout plus the message "Use of uninitialized value in print at ...". It
also prints the tree on stdout if I remove the line "print
$tree_string". The variable $tree_string seems to be empty.

D.


From naama.menda at gmail.com  Mon Mar 10 11:09:12 2008
From: naama.menda at gmail.com (Naama Menda)
Date: Mon, 10 Mar 2008 11:09:12 -0400
Subject: [Bioperl-l] Bio::Ontology::OntologyI
In-Reply-To: <741AD3F0-4EE6-4CF9-A95F-53749AEB0FDA@gmx.net>
References: <47D2C36C.2020802@cornell.edu>
	<DA1ACCA1-02CA-4662-A50F-DF714A8E05CB@gmx.net>
	<e6c79c380803091834u534bfb9p9fbf41cb06485e83@mail.gmail.com>
	<7412B724-57B8-4851-9E71-806722DE8A76@gmx.net>
	<e6c79c380803091926q293b2c7aq8a6b07c56c25ac42@mail.gmail.com>
	<741AD3F0-4EE6-4CF9-A95F-53749AEB0FDA@gmx.net>
Message-ID: <e6c79c380803100809s5ad9648eq976610910e9432f1@mail.gmail.com>

On Sun, Mar 9, 2008 at 10:43 PM, Hilmar Lapp <hlapp at gmx.net> wrote:

>
> On Mar 9, 2008, at 10:26 PM, Naama Menda wrote:
>
>
> On Sun, Mar 9, 2008 at 10:13 PM, Hilmar Lapp <hlapp at gmx.net> wrote:
>
> >
> > On Mar 9, 2008, at 9:34 PM, Naama Menda wrote:
> >
> > My main problem is that go-perl does not handle updates, so if I want to
> > update GO I need an empty schema. We find it more complicated to re-load our
> > annotations than to update cvterms and their related data.
> > Our loading script compares an existing load of an ontology to the obo
> > file and updates/insets/deletes accordingly.
> >
> >
> > load_ontology.pl in bioperl-db should have all this functionality,
> > though of course that doesn't give you the typedef support (yet).
> >
>
> Will you add this support to obo.pm? I had a look at it and it seems easy
> to implement.
> Will there be a patch? Or in the next Bioperl release?
>
>
> If you have ideas for how to implement this we'd be thrilled if you can
> provide a patch.
>
> Most changes in BioPerl happen because and by people who have an itch to
> scratch. Seems like this one is right down your alley?
>
> I'd in principle be interested in doing this too but can't give any
> promises as to when I might have time (unless I need it myself :)
>

I'll try to provide a patch for this. I'll let you know how it goes..

>
>
>
> >
> > We are now in the process of committing this code to GMOD
> >
> >
> > Cool - obviously load_ontology.pl doesn't work off of Chado but instead
> > uses BioSQL as the schema (though the ontology model is *very* similar
> > between the two).
> >
>
> We  store ontologies in Chado, and that was the reason for writing a new
> loader.
>
>
> Looking at it it seems you wrote a whole new language binding? Did you
> find it too difficult to build on one of the existing ones (which use
> Class::DBI if I recall correctly, though Scott will have the details here)
>

We already had most of the classes. We use these for other code at SGN,
since it's all OO perl. All we needed to do was add some methods and
accessors to our Chado classes, and write a loading script that stores the
ontology from the db and the ontology from the file in hashrefs, compares
the two, and inserts/updates accordingly. Our main concern was updating
pre-loaded ontologies (for new ontology files GMOD's make-ontologies works
great!)
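
The compare step described above might look roughly like the sketch below.
It is not the SGN loader: diff_ontology() is a hypothetical helper, the two
hashrefs simply map an accession to a term name, and the EX: accessions are
invented sample data.

```perl
use strict;
use warnings;

# Compares what is already loaded with what the file contains and returns
# the accessions to insert, update, or delete.
sub diff_ontology {
    my ($in_db, $in_file) = @_;
    my (@insert, @update, @delete);
    for my $acc (sort keys %$in_file) {
        if (!exists $in_db->{$acc}) {
            push @insert, $acc;              # new in the file
        }
        elsif ($in_db->{$acc} ne $in_file->{$acc}) {
            push @update, $acc;              # present in both, but changed
        }
    }
    for my $acc (sort keys %$in_db) {
        push @delete, $acc unless exists $in_file->{$acc};
    }
    return { insert => \@insert, update => \@update, delete => \@delete };
}

my %in_db = (
    'EX:0001' => 'gene',
    'EX:0002' => 'transcript',
    'EX:0003' => 'old term',
);
my %in_file = (
    'EX:0001' => 'gene',
    'EX:0002' => 'mRNA transcript',
    'EX:0004' => 'new term',
);

my $plan = diff_ontology(\%in_db, \%in_file);
printf "%s: %s\n", $_, join(',', @{ $plan->{$_} }) for qw(insert update delete);
```

Unchanged terms appear in none of the three lists, so existing annotations
that point at them are left alone.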


> -hilmar
>
> --
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================
>
>
>
> -Naama


From clarsen at vecna.com  Mon Mar 10 11:56:55 2008
From: clarsen at vecna.com (Christopher Larsen)
Date: Mon, 10 Mar 2008 11:56:55 -0400 (EDT)
Subject: [Bioperl-l] Reciprocal best blast hits / Orthology
Message-ID: <49819.64.47.82.110.1205164615.squirrel@mail.vecna.com>

Matt, Dave,

Regarding reciprocal best blast hits, yes -- it's beyond the list, and
here's how/where to go. It seems what you are looking for is actually an
ortholog search. If so, there is more to it than reciprocity and
ranking; other groups are using phylo trees, bootstrap values, etc.

Perhaps check out the Perl from David Roos and Chris Stoeckert's work:
OrthoMCL. Their group is quite helpful as well.

http://orthomcl.cbil.upenn.edu/cgi-bin/OrthoMclWeb.cgi

The Perl code installs locally and will help you to identify a homolog
that should have the same enzyme function or cellular role. Importantly,
it also attends to paralogy and pseudogenes, so you don't step into a
pit. The site explains more.

The point is, you're on the right track, but there's a group that's been
through what you are doing and can supply you with a working
implementation that's very robust and uses BioPerl modules already, so you
don't have to scratch up some code. Also you can check out INPARANOID for
the same reasons. Having just been through this, I'm just trying to lead
you to where we went. Right now we point OrthoMCL at a whole folder of
proteomes (*.faa) and it groups them accordingly.

Brian O: Don't know if this folds well into your MCL wiki page or not,
apologies.

If the group wants some post-processing code that shows the
presence/absence of proteins in any one group perhaps we can help too as
there are a few things written that take the raw output directly.

Cheers,

Chris L

=========================
Message: 5

Dear experts,

I want to do a best reciprocal blastp of a fasta protein dataset against the
protein models of various species also in fasta format. The aim is to have an
output showing presence/not presence. I think this is possible to do using
perl, but i'm very much a beginner so any help in this would be greatly
appreciated.

Thanks

Matt

-- 
Christopher Larsen, Ph.D.
Senior Scientist
Research Grants Manager
Vecna Technologies
5004 Lehigh Ave
College Park, MD 20740
240-737-1625


From Kevin.M.Brown at asu.edu  Mon Mar 10 12:17:11 2008
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Mon, 10 Mar 2008 09:17:11 -0700
Subject: [Bioperl-l] Bio::TreeIO - tree object to string
In-Reply-To: <47D55E33.8060205@medecine.unige.ch>
References: <47D55E33.8060205@medecine.unige.ch>
Message-ID: <1A4207F8295607498283FE9E93B775B4048A0825@EX02.asurite.ad.asu.edu>

You need to pass in either a filehandle or a path to an output file;
otherwise you are going to see the behavior you are getting.

open my $tree_fh, '>', 'TreeFile.txt' or die "Cannot open TreeFile.txt: $!";
my $out = Bio::TreeIO->new(-fh => $tree_fh, -format => 'newick');

OR

my $out = Bio::TreeIO->new(-file => '>TreeFile.txt', -format => 'newick');

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> Daniel Gerlach
> Sent: Monday, March 10, 2008 9:14 AM
> To: bioperl-l at portal.open-bio.org
> Subject: [Bioperl-l] Bio::TreeIO - tree object to string
> 
> Dear all,
> 
> This is a very basic question. I have a tree object in $tree 
> and want to 
> save its newick representation in a variable as a string:
> 
> my $out = new Bio::TreeIO(-fh => $tree_string, -format => 'newick');
> $out->write_tree($tree);
> print $tree_string;
> 
> Unfortunately this does not work and he prints out the newick tree on 
> stdout plus the message "Use of uninitialized value in print 
> at ...". He 
> also prints out the tree on the stdout if I remove the line "print 
> $tree_string". The variable $tree_string seems to be empty.
> 
> D.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 


From bix at sendu.me.uk  Mon Mar 10 12:51:59 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 10 Mar 2008 16:51:59 +0000
Subject: [Bioperl-l] Bio::TreeIO - tree object to string
In-Reply-To: <47D55E33.8060205@medecine.unige.ch>
References: <47D55E33.8060205@medecine.unige.ch>
Message-ID: <47D5672F.6000709@sendu.me.uk>

Daniel Gerlach wrote:
> Dear all,
> 
> This is a very basic question. I have a tree object in $tree and want to 
> save its newick representation in a variable as a string:
> 
> my $out = new Bio::TreeIO(-fh => $tree_string, -format => 'newick');
> $out->write_tree($tree);
> print $tree_string;
> 
> Unfortunately this does not work and he prints out the newick tree on 
> stdout plus the message "Use of uninitialized value in print at ...". He 
> also prints out the tree on the stdout if I remove the line "print 
> $tree_string". The variable $tree_string seems to be empty.

The -fh argument is supposed to be a file handle, not a string. You can 
use whatever standard Perl method you like for attaching a filehandle to 
a scalar.

Eg.
my $tree_string = '';
open(my $fake_fh, "+<", \$tree_string);
my $out = new Bio::TreeIO(-fh => $fake_fh, -format => 'newick');
$out->write_tree($tree);
print $tree_string;

Alternatively, my $tree_string = $tree->simplify_to_leaves_string() 
might give you what you want.


From stephan.rosecker at ish.de  Mon Mar 10 12:27:57 2008
From: stephan.rosecker at ish.de (stephan.rosecker)
Date: Mon, 10 Mar 2008 17:27:57 +0100
Subject: [Bioperl-l] how to get unigene-cluster with bio-db
Message-ID: <web-15326627@mail-be-01.mail01.ish.de>

Dear list,

I am trying to understand how to fetch a UniGene cluster with the help of
bio-db and a local BioSQL database, but without success.
I have loaded "Hs.data" with the help of "bp_load_seqdatabase.pl".

#!/usr/local/bin/perl -w

use strict;
use Bio::DB::BioDB;
use Bio::DB::Query::BioQuery;

my $db = Bio::DB::BioDB->new(
         -database => 'biosql',
         -user   => 'postgres',
         -pass   => 'foo',
         -dbname   => 'bioseqdb',
         -host     => 'foo.bar',
         -port     => 5435,    # optional
         -driver   => 'Pg'
);

my $query = Bio::DB::Query::BioQuery->new();

$query->datacollections(
          ["Bio::PrimarySeqI c::subject",
          "Bio::PrimarySeqI p::object",
         "Bio::PrimarySeqI<=>Bio::ClusterI<=>Bio::Ontology::TermI"]);
$query->where(["p.accession_number = 'Hs.2'"]);

my $adp = $db->get_object_adaptor('Bio::PrimarySeqI');
my $adp2 = $db->get_object_adaptor('Bio::ClusterI');
my $qres = $adp->find_by_query($query);
my $qres2 = $adp2->find_by_query($query);

while(my $pseq = $qres->next_object()) {
         print $pseq->accession_number,"\n";
}
while(my $pseq = $qres2->next_object()) {
         print $pseq->accession_number,"\n";
}

Maybe this way is wrong.
Hope you can help me.

stephan
  

From hlapp at gmx.net  Mon Mar 10 22:53:47 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 10 Mar 2008 22:53:47 -0400
Subject: [Bioperl-l] how to get unigene-cluster with bio-db
In-Reply-To: <web-15326627@mail-be-01.mail01.ish.de>
References: <web-15326627@mail-be-01.mail01.ish.de>
Message-ID: <A95440B6-AFFA-4F45-B90F-B74D3C0770C7@gmx.net>

Stephan -

what is the result that you are getting? Do you receive an error? Or  
simply nothing?

(BTW note that the object adaptor that you let the query execute will
determine what kind of object you get in return. Hence, I'd
expect your $qres2 to return Bio::ClusterI compliant objects, not
Bio::PrimarySeqI ones. That is probably not at the root of the
problem here, though.)

	-hilmar

On Mar 10, 2008, at 12:27 PM, stephan.rosecker wrote:

> Dear list,
>
> I try to understand how to fetch unigene-cluster with help of bio- 
> db and a local biosql-db, but without success.
> I have transfered "Hs.data" with help of "bp_load_seqdatabase.pl".
>
> #!/usr/local/bin/perl -w
>
> use strict;
> use Bio::DB::BioDB;
> use Bio::DB::Query::BioQuery;
>
> my $db = Bio::DB::BioDB->new(
>         -database => 'biosql',
>         -user   => 'postgres',
>         -pass   => 'foo',
>         -dbname   => 'bioseqdb',
>         -host     => 'foo.bar',
>         -port     => 5435,    # optional
>         -driver   => 'Pg'
> );
>
> my $query = Bio::DB::Query::BioQuery->new();
>
> $query->datacollections(
>          ["Bio::PrimarySeqI c::subject",
>          "Bio::PrimarySeqI p::object",
>         "Bio::PrimarySeqI<=>Bio::ClusterI<=>Bio::Ontology::TermI"]);
> $query->where(["p.accession_number = 'Hs.2'"]);
>
> my $adp = $db->get_object_adaptor('Bio::PrimarySeqI');
> my $adp2 = $db->get_object_adaptor('Bio::ClusterI');
> my $qres = $adp->find_by_query($query);
> my $qres2 = $adp2->find_by_query($query);
>
> while(my $pseq = $qres->next_object()) {
>         print $pseq->accession_number,"\n";
> }
> while(my $pseq = $qres2->next_object()) {
>         print $pseq->accession_number,"\n";
> }
>
> Maybe this way is wrong.
> Hope you can help me.
>
> stephan
>  _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From hlapp at gmx.net  Mon Mar 10 23:17:01 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 10 Mar 2008 23:17:01 -0400
Subject: [Bioperl-l] Bio::Ontology::OntologyI
In-Reply-To: <e6c79c380803100809s5ad9648eq976610910e9432f1@mail.gmail.com>
References: <47D2C36C.2020802@cornell.edu>
	<DA1ACCA1-02CA-4662-A50F-DF714A8E05CB@gmx.net>
	<e6c79c380803091834u534bfb9p9fbf41cb06485e83@mail.gmail.com>
	<7412B724-57B8-4851-9E71-806722DE8A76@gmx.net>
	<e6c79c380803091926q293b2c7aq8a6b07c56c25ac42@mail.gmail.com>
	<741AD3F0-4EE6-4CF9-A95F-53749AEB0FDA@gmx.net>
	<e6c79c380803100809s5ad9648eq976610910e9432f1@mail.gmail.com>
Message-ID: <D355F5D0-756C-4CED-A341-496B23DD5A8B@gmx.net>


On Mar 10, 2008, at 11:09 AM, Naama Menda wrote:

>
>
> On Sun, Mar 9, 2008 at 10:43 PM, Hilmar Lapp <hlapp at gmx.net> wrote:
>
> On Mar 9, 2008, at 10:26 PM, Naama Menda wrote:
>>
>> On Sun, Mar 9, 2008 at 10:13 PM, Hilmar Lapp <hlapp at gmx.net> wrote:
>>
>> On Mar 9, 2008, at 9:34 PM, Naama Menda wrote:
>>
>>> My main problem is that go-perl does not handle updates, so if I  
>>> want to update GO I need an empty schema. We find it more  
>>> complicated to re-load our annotations than to update cvterms and  
>>> their related data.
>>> Our loading script compares an existing load of an ontology to  
>>> the obo file and updates/inserts/deletes accordingly.
>>
>> load_ontology.pl in bioperl-db should have all this functionality,  
>> though of course that doesn't give you the typedef support (yet).
>>
>> Will you add this support to obo.pm? I had a look at it and it  
>> seems easy to implement.
>> Will there be a patch? Or in the next Bioperl release?
>
> If you have ideas for how to implement this we'd be thrilled if you  
> can provide a patch.
>
> Most changes in BioPerl happen because and by people who have an  
> itch to scratch. Seems like this one is right down your alley?
>
> I'd in principle be interested in doing this too but can't give any  
> promises as to when I might have time (unless I need it myself :)
>
> I'll try to provide a patch for this. I'll let you know how it goes..

That'd be awesome! Don't hesitate to let us know if you hit bumps.

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From stephan.rosecker at ish.de  Tue Mar 11 07:08:08 2008
From: stephan.rosecker at ish.de (stephan.rosecker)
Date: Tue, 11 Mar 2008 12:08:08 +0100
Subject: [Bioperl-l] time-consuming bp_load_seqdatabase.pl
Message-ID: <web-15341068@mail-be-01.mail01.ish.de>

Dear list,

I have started the "bp_load_seqdatabase.pl" script from the
"bioperl-db-1.5.2_100" package with the UniGene file "Hs.data". It runs on a
7-processor machine with 15GB RAM. The DBMS is PostgreSQL on a similar machine.
The BioSQL core schema is v1.0.0.

The job has been running since Friday.

./bp_load_seqdatabase.pl --host foo --port 5435 --dbname bioseqdb \
    --dbuser foo --dbpass bar --driver Pg --format ClusterIO::unigene \
    ../ncbi/Hs.data

Is it normal that it takes so long?
What are your experiences?

best regards
stephan


From diriano at uni-potsdam.de  Tue Mar 11 07:55:54 2008
From: diriano at uni-potsdam.de (Diego Mauricio Riaño Pachón)
Date: Tue, 11 Mar 2008 12:55:54 +0100
Subject: [Bioperl-l] problem with SearchIO and writer
Message-ID: <47D6734A.5060103@uni-potsdam.de>

Dear all,

I have a small problem parsing a BLAST report with SearchIO and using 
TextResultWriter.

I have a large file with several BLAST results, I instantiate SearchIO as:

my $searchio = new Bio::SearchIO ('-format' => 'blast', '-file' => 
"$blastFile");

##then I extract each individual report as
while (my $result = $searchio->next_result){
   my $writertxt = Bio::SearchIO::Writer::TextResultWriter->new();
   my $blastresult=$writertxt->to_string($result);
   while (my $hit = $result->next_hit){
     print $hit->name."\n";
   }
}

From diriano at uni-potsdam.de  Tue Mar 11 09:31:35 2008
From: diriano at uni-potsdam.de (diriano at uni-potsdam.de)
Date: Tue, 11 Mar 2008 14:31:35 +0100
Subject: [Bioperl-l] problem with SearchIO and writer
Message-ID: <1205242295.47d689b7186ad@webmail.uni-potsdam.de>

Dear all,

I have a small problem parsing a BLAST report with SearchIO and using
TextResultWriter.

I have a large file with several BLAST results, I instantiate SearchIO as:

my $searchio = new Bio::SearchIO ('-format' => 'blast', '-file' =>
"$blastFile");

##then I extract each individual report as
while (my $result = $searchio->next_result){
  my $writertxt = Bio::SearchIO::Writer::TextResultWriter->new();
  my $blastresult=$writertxt->to_string($result);
  while (my $hit = $result->next_hit){
    print $hit->name."\n";
  }
}
-- 

Diego Mauricio Riaño-Pachón
Biologist
Institute of Biology and Biochemistry
University of Potsdam
Karl-Liebknecht-Str. 24-25
Haus 20
14476 Golm
Germany
Tel:0331/977-2809
http://www.geocities.com/dmrp.geo/


From diriano at uni-potsdam.de  Tue Mar 11 10:25:19 2008
From: diriano at uni-potsdam.de (diriano at uni-potsdam.de)
Date: Tue, 11 Mar 2008 15:25:19 +0100
Subject: [Bioperl-l] problem with SearchIO and writer
Message-ID: <1205245519.47d6964fcaa48@webmail.uni-potsdam.de>

Dear all,

Please excuse my previous e-mail, it was incomplete, here it is again:

I have a small problem parsing a BLAST report with SearchIO and using
TextResultWriter.

I have a large file with several BLAST results, I instantiate SearchIO as:

my $searchio = new Bio::SearchIO ('-format' => 'blast', '-file' =>
"$blastFile");

##then I extract each individual report as
while (my $result = $searchio->next_result){
  my $writertxt = Bio::SearchIO::Writer::TextResultWriter->new();
  my $blastresult=$writertxt->to_string($result);
  while (my $hit = $result->next_hit){
    print $hit->name."\n"; #I will do further processing of the HSPs
  }
}

But I do not get any output.  It works if I comment out the lines referencing
$writertxt.  The problem is that I need to extract the whole report to later
insert it ($blastresult) into a database. But I also need to process each hit
and HSP.

Any idea how I can accomplish this?  Any help will be greatly appreciated.

Have a nice day,

Diego
-- 

Diego Mauricio Riano Pachon
Biologist
Institute of Biology and Biochemistry
University of Potsdam
Karl-Liebknecht-Str. 24-25
Haus 20
14476 Golm
Germany
Tel:0331/977-2809
http://www.geocities.com/dmrp.geo/


From sac at bioperl.org  Tue Mar 11 15:04:48 2008
From: sac at bioperl.org (Steve Chervitz)
Date: Tue, 11 Mar 2008 12:04:48 -0700
Subject: [Bioperl-l] BioSQL V1.0.0 released
In-Reply-To: <7558F8C6-FE40-4BAE-BA6A-D5039B10F350@uiuc.edu>
References: <200803071309.25294.heikki@sanbi.ac.za>
	<7558F8C6-FE40-4BAE-BA6A-D5039B10F350@uiuc.edu>
Message-ID: <8f200b4c0803111204m2ca45782w34baa0499d690cb5@mail.gmail.com>

Ditto. Thanks for biting the bullet, Hilmar.

BTW, I put together a little compendium/review of various bioinformatics
data models a few months ago, where I mention BioSQL among others, but I
never really announced it:

http://biodatamodel.org/

I thought about wikifying it to get the community involved in maintaining
it, but haven't gotten around to it yet.

Feedback is welcome.

Cheers,
Steve

On Fri, Mar 7, 2008 at 6:22 AM, Chris Fields <cjfields at uiuc.edu> wrote:

> Same here.  Great news!
>
> chris
>
> On Mar 7, 2008, at 5:09 AM, Heikki Lehvaslaiho wrote:
>
> > BIOSQL V1.0.0 RELEASED
> > http://news.open-bio.org/archives/2008_03.html#000094
> >
> >
> > Congratulations, Hilmar!
> >
> >               -Heikki
> >
> > --
> > ______ _/      _/_____________________________________________________
> >      _/      _/
> >     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
> >    _/_/_/_/_/  Senior Scientist    skype: heikki_lehvaslaiho
> >   _/  _/  _/  SANBI, South African National Bioinformatics Institute
> >  _/  _/  _/  University of Western Cape, South Africa
> >     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
> > ___ _/_/_/_/_/________________________________________________________
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


From cain.cshl at gmail.com  Tue Mar 11 15:29:41 2008
From: cain.cshl at gmail.com (Scott Cain)
Date: Tue, 11 Mar 2008 15:29:41 -0400
Subject: [Bioperl-l] Nightly build archives now available
In-Reply-To: <47D321A0.9010209@sendu.me.uk>
References: <BAB6978A-D50E-4C4A-9FC9-EAAD557B935F@uiuc.edu>
	<1204921049.6467.9.camel@frissell>
	<BCE20903-01C1-4AC3-B8F6-89AC68F6803E@uiuc.edu>
	<1204928753.6467.19.camel@frissell>  <47D321A0.9010209@sendu.me.uk>
Message-ID: <1205263781.6220.37.camel@frissell>

Hi Sendu,

Sorry about that; I diffed the version I had with what was in svn, but
apparently didn't look closely at the results.  Do you happen to know
the best way of reverting with svn?

After it gets reverted one way or the other, I agree that overriding
prompt to include another argument is a good way to go.  That way the
value of $accept can be passed to it and it just does the right thing,
regardless of when/where it is getting called.  I'll do that.

Scott

On Sat, 2008-03-08 at 23:30 +0000, Sendu Bala wrote:
> Scott Cain wrote:
> > OK, I added my 'accept the defaults' option.  Use it like this:
> > 
> >   perl Build.PL --accept 1
> 
> Thanks for that Scott, but can you revert and have another go at that 
> commit, because you ended up wiping out the recent commits by Chris and 
> myself.
> 
> Also, rather than individually alter the Bioperl-specific methods like 
> choose_scripts(), is there perhaps a cleaner way to catch every prompt, 
> perhaps by overriding prompt() itself? Other questions may get added in 
> the future, and some existing questions aren't immediately obvious, so 
> it would be nice to be sure an automated process like a cron job /never/ 
> gets asked a question.
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   cain.cshl at gmail.com
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory


From hlapp at gmx.net  Tue Mar 11 17:34:30 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 11 Mar 2008 17:34:30 -0400
Subject: [Bioperl-l] time-consuming bp_load_seqdatabase.pl
In-Reply-To: <web-15341068@mail-be-01.mail01.ish.de>
References: <web-15341068@mail-be-01.mail01.ish.de>
Message-ID: <B4761ACF-F3D2-42A4-88D9-6C9790C4BAFB@gmx.net>

It won't be fast, as it will create about 6 million bioentries in your
database. However, running since Friday does sound on the high end.

The first step I recommend doing when running into this kind of  
situation is checking the CPU load that the script generates,  
compared to the load generated by the database server. If the  
script's CPU load is significantly less than ~10% then it is likely  
that your database is too slow.

There are various possible reasons why it may be too slow, ranging  
from limited resources, to grossly suboptimal configuration. If your  
database is running on the same 15GB server then resources should not  
be an issue (assuming that you don't have a totally antiquated CPU  
there). You might still want to check the PostgreSQL config file,  
though. What I would suspect though is that you didn't VACUUM the  
database before and/or during the load. That will make the indexes  
used for lookup increasingly slow as a large amount of data accumulates.
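
As a rough sketch of the VACUUM step, assuming the DBI and DBD::Pg modules
are installed and reusing the placeholder connection details from the quoted
command line below (host foo, port 5435, database bioseqdb, user foo), something
like this could be run periodically while the load is in progress. It is
not part of bioperl-db, just one way to issue the command from Perl:

```perl
use strict;
use warnings;
use DBI;

# Connection details are placeholders taken from the loader command line.
my $dbh = DBI->connect(
    'dbi:Pg:dbname=bioseqdb;host=foo;port=5435',
    'foo', 'bar',
    # VACUUM cannot run inside a transaction block, so keep AutoCommit on.
    { AutoCommit => 1, RaiseError => 1 },
);

# Reclaim dead rows and refresh the planner statistics for all tables.
$dbh->do('VACUUM ANALYZE');

$dbh->disconnect;
```

Running the same statement from psql would work equally well; the point is
only that it happens before and/or during the load.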

Does this ring a bell?

	-hilmar

On Mar 11, 2008, at 7:08 AM, stephan.rosecker wrote:

> Dear list,
>
> I have started the "bp_load_seqdatabase.pl" script from the  
> "bioperl-db-1.5.2_100" package with the unigene
> "Hs.data". It runs on a 7 processor machine with 15GB ram. The DBMS  
> is postgres on a similar machine.
> BioSQL core schema is v1.0.0..
>
> The job runs since friday.
>
> ./bp_load_seqdatabase.pl --host foo --port 5435 --dbname bioseqdb -- 
> dbuser foo --dbpass bar --driver Pg --format ClusterIO::unigene ../ 
> ncbi/Hs.data
>
> Is it normal that it takes so long?
> What are your experiences?
>
> best regards
> stephan
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From vuhlhorn at ramapo.edu  Wed Mar 12 14:51:50 2008
From: vuhlhorn at ramapo.edu (Victoria Lyn Uhlhorn)
Date: Wed, 12 Mar 2008 14:51:50 -0400 (EDT)
Subject: [Bioperl-l] HOWTO:Trees module
Message-ID: <20080312145150.ASS93757@msg-1.mail.ramapo.edu>

The following script:

#!/usr/bin/perl -w

use CGI ':standard';
use Bio::Perl;
use Bio::Align::ProteinStatistics;
use Bio::Tree::DistanceFactory;
use Bio::TreeIO;

print header;

print start_html(-bgcolor=>"pink", -title=>('Phylogenetic Tree'),
                 -style=>{-src=>$style}, -class=>Ltitle), p(), 'Tree';

print start_form, hr;

my $alnio= Bio::AlignIO->new(-file => '/Users/glitterchix4u/Sites/CGI-bin/HepatitisSerineProt.clustalw',
                             -format => 'clustalw');
my $profactory = Bio::Tree::DistanceFactory->new(-method => 'NJ');
my $stats = Bio::Align::ProteinStatistics->new;
my $treeout = Bio::TreeIO->new(-format => 'newick');

my $tree;
while(my $aln = $alnio->next_aln)
{
	my $mat = $stats->distance(-method => 'Kimura',
	                           -align => $aln);
	$tree = $profactory->make_tree($mat);
	$treeout->write_tree($tree);
}

#$treeout->print_tree($tree);
print "Tree is: ", $tree->size;

print end_form;
print end_html;

How do I print the tree?  I'm having a hard time printing the tree out.


From bix at sendu.me.uk  Wed Mar 12 19:20:01 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 12 Mar 2008 23:20:01 +0000
Subject: [Bioperl-l] HOWTO:Trees module
In-Reply-To: <20080312145150.ASS93757@msg-1.mail.ramapo.edu>
References: <20080312145150.ASS93757@msg-1.mail.ramapo.edu>
Message-ID: <47D86521.1010705@sendu.me.uk>

Victoria Lyn Uhlhorn wrote:
> my $treeout = Bio::TreeIO->new(-format => 'newick');

> 	$treeout->write_tree($tree);

> How do I print the tree?  I'm having a hard time printing the tree out.

Your TreeIO will write its trees to the file or filehandle you give it. 
But you haven't given it one. Give it one and write_tree() will then 
cause the tree to be 'printed' there.

If you want the trees stored in a string so you can print() them, there 
are ways to open a filehandle onto a scalar variable.
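
A minimal sketch of the filehandle-on-a-scalar approach, using only core
Perl. The write_newick sub and the sample tree string are hypothetical
stand-ins so the snippet runs without BioPerl; in the real script the
filehandle would instead be passed as Bio::TreeIO->new(-fh => $fh,
-format => 'newick') before calling write_tree():

```perl
use strict;
use warnings;

# Hypothetical stand-in for anything that writes to a filehandle;
# with BioPerl this role is played by $treeout->write_tree($tree).
sub write_newick {
    my ($fh, $newick) = @_;
    print {$fh} $newick, "\n";
}

# Open a write filehandle whose "file" is the scalar $buffer.
my $buffer = '';
open(my $fh, '>', \$buffer) or die "Cannot open in-memory filehandle: $!";

write_newick($fh, '((A:0.1,B:0.2):0.05,C:0.3);');
close($fh);

# Everything written to $fh is now available as an ordinary string.
print "Tree is: $buffer";
```

In a CGI script the string can then be printed inside the page like any
other value.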


From hlapp at gmx.net  Thu Mar 13 18:51:13 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 13 Mar 2008 18:51:13 -0400
Subject: [Bioperl-l] [BioSQL-l] Loading sequences with novel NCBI taxon
	id
In-Reply-To: <320fb6e00803130806w46148bacm54c3ead9a50b038f@mail.gmail.com>
References: <320fb6e00803130806w46148bacm54c3ead9a50b038f@mail.gmail.com>
Message-ID: <32EB5B0C-4CC8-4C33-9F41-5D4465B6AC48@gmx.net>

(this is more of a bioperl question than a biosql one)

The load_ncbi_taxonomy.pl script is designed to update the taxon  
tables in a non-disruptive way, and if there weren't many changes  
shouldn't actually take that long (except that recalculating the  
nested set values may take a couple of minutes).

Bioperl-db will store the taxon information it finds in the  
Bio::Species object if it can't locate the taxon by lookup, and will  
not raise an error. The problem with this is that it relies on the  
Bio::SeqIO parser to have gotten the species and lineage information  
correct, which is sometimes a wrong assumption for exotic species.  
Most often the error will not manifest itself at the time of storing  
the erroneously parsed information, but when it is re-retrieved and  
used to populate a Bio::Species object.

For the SymAtlas project we had this situation (new species in  
sequence updates that the last NCBI taxonomy update hadn't yet  
brought in) quite regularly. I wrote a SQL script would fix those  
'haphazard' additions such that load_ncbi_taxonomy would update them  
to their correct values come the next NCBI taxonomy update. I can  
send you the script (it would be for the Oracle version), but I'm not  
sure this is a widely viable strategy.

	-hilmar

On Mar 13, 2008, at 11:06 AM, Peter wrote:

> Dear list,
>
> One of the unresolved issues with Biopython's BioSQL interface is
> dealing with the NCBI taxon ID when loading sequences into the
> database.
>
> As I understand it, ideally before loading any sequences, the user
> will have loaded in the entire NCBI taxonomy using the
> load_ncbi_taxonomy.pl script, as I described here:
> http://biopython.org/wiki/BioSQL#NCBI_Taxonomy
>
> When a new sequence is added to the database with a known taxon id,
> there is no problem.  But what happens if it's a recently sequenced organism
> which isn't defined yet in the BioSQL taxonomy tables?  Could/should
> the user re-run load_ncbi_taxonomy.pl, and then load in their new
> sequence?
>
> Right now in Biopython, due to what appears to have been intended as a
> short term hack, we simply don't record the taxon id at all (!), and I
> would like to fix this (bug 2422).
> http://bugzilla.open-bio.org/show_bug.cgi?id=2422
>
> How do BioPerl et al deal with this issue?  Do they try and update the
> taxonomy tables using the available information in the new record's
> annotation (i.e. the new taxon id and the species name)?  Do they
> lookup the NCBI taxonomy definition via the internet?  Do they throw
> an error and halt?
>
> Thanks,
>
> Peter
> (Biopython)
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biosql-l

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From hlapp at gmx.net  Thu Mar 13 19:41:43 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 13 Mar 2008 19:41:43 -0400
Subject: [Bioperl-l] [BioSQL-l] Loading sequences with novel NCBI taxon
	id
In-Reply-To: <320fb6e00803131613o20eae2b7y325814ef26d2738f@mail.gmail.com>
References: <320fb6e00803130806w46148bacm54c3ead9a50b038f@mail.gmail.com>
	<32EB5B0C-4CC8-4C33-9F41-5D4465B6AC48@gmx.net>
	<320fb6e00803131613o20eae2b7y325814ef26d2738f@mail.gmail.com>
Message-ID: <CEA4F4E7-A66B-4C62-AE32-511E177BC485@gmx.net>


On Mar 13, 2008, at 7:13 PM, Peter wrote:

> On Thu, Mar 13, 2008 at 10:51 PM, Hilmar Lapp <hlapp at gmx.net> wrote:
>> [...]
>>  The load_ncbi_taxonomy.pl script is designed to update the taxon
>>  tables in a non-disruptive way, and if there weren't many changes
>>  shouldn't actually take that long (except that recalculating the
>>  nested set values may take a couple of minutes).
>
> Do you think when faced with a novel taxon id, Biopython/BioPerl/...
> could write some minimal taxonomy entry (without any guess work based
> on the species name), in order to record the sequence's taxon

This is what Bioperl-db does. There isn't any guesswork. If  
Bio::Species has lineage information it will also insert the lineage  
information, though.

> - and then running an improved load_ncbi_taxonomy.pl at a later  
> date would
> sort out the proper taxonomy?

If I remember correctly, the script makes (and hence expects) the  
primary key and the NCBI taxonomy ID to be identical. If your loading  
procedure can achieve that already then load_ncbi_taxonomy.pl should  
pick them up and fix them. You can try that by loading the taxonomy  
through the script, then arbitrarily choose a taxon, create a stub  
bioentry for it and set its taxon_id foreign key to the chosen  
taxon,  change its taxon_name.name to some bogus value (for the  
'scientific name' class, for example) (and feel free to change the  
left_id and right_id values in taxon too), and rerun the script. It  
should fix the change you made, and your bioentry should still point  
to the same taxon (because its primary key did not change, and did  
not get deleted either; otherwise the bioentry would now have a null  
value in the foreign key).

The Bioperl-db way of storing things does not give control over  
primary key assignment to Bioperl-db, so the database will assign it.

> [...]
>>  For the SymAtlas project we had this situation (new species in
>>  sequence updates that the last NCBI taxonomy update hadn't yet
>>  brought in) quite regularly. I wrote a SQL script that would fix those
>>  'haphazard' additions such that load_ncbi_taxonomy would update them
>>  to their correct values come the next NCBI taxonomy update. I can
>>  send you the script (it would be for the Oracle version), but I'm  
>> not
>>  sure this is a widely viable strategy.
>
> So this wasn't integrated with load_ncbi_taxonomy.pl at all?

No, but now that you say it I don't see any reason why I couldn't.  
Maybe that's just what I should do.

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From biopython at maubp.freeserve.co.uk  Thu Mar 13 19:13:32 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 13 Mar 2008 23:13:32 +0000
Subject: [Bioperl-l] [BioSQL-l] Loading sequences with novel NCBI taxon
	id
In-Reply-To: <32EB5B0C-4CC8-4C33-9F41-5D4465B6AC48@gmx.net>
References: <320fb6e00803130806w46148bacm54c3ead9a50b038f@mail.gmail.com>
	<32EB5B0C-4CC8-4C33-9F41-5D4465B6AC48@gmx.net>
Message-ID: <320fb6e00803131613o20eae2b7y325814ef26d2738f@mail.gmail.com>

On Thu, Mar 13, 2008 at 10:51 PM, Hilmar Lapp <hlapp at gmx.net> wrote:
> (this is more of a bioperl question than a biosql one)

Well, yes and no.  And I'm not subscribed to the Bioperl list, nor the
BioJava one, nor the BioRuby one.

>  The load_ncbi_taxonomy.pl script is designed to update the taxon
>  tables in a non-disruptive way, and if there weren't many changes
>  shouldn't actually take that long (except that recalculating the
>  nested set values may take a couple of minutes).

Do you think when faced with a novel taxon id, Biopython/BioPerl/...
could write some minimal taxonomy entry (without any guess work based
on the species name), in order to record the sequence's taxon - and
then running an improved load_ncbi_taxonomy.pl at a later date would
sort out the proper taxonomy?

>  Bioperl-db will store the taxon information it finds in the
>  Bio::Species object if it can't locate the taxon by lookup, and will
>  not raise an error. The problem with this is that it relies on the
>  Bio::SeqIO parser to have gotten the species and lineage information
>  correct, which is sometimes a wrong assumption for exotic species.
>  Most often the error will not manifest itself at the time of storing
>  the erroneously parsed information, but when it is re-retrieved and
>  used to populate a Bio::Species object.

This is what I would like to avoid with Biopython.

>  For the SymAtlas project we had this situation (new species in
>  sequence updates that the last NCBI taxonomy update hadn't yet
>  brought in) quite regularly. I wrote a SQL script that would fix those
>  'haphazard' additions such that load_ncbi_taxonomy would update them
>  to their correct values come the next NCBI taxonomy update. I can
>  send you the script (it would be for the Oracle version), but I'm not
>  sure this is a widely viable strategy.

So this wasn't integrated with load_ncbi_taxonomy.pl at all?

Peter


From hlapp at gmx.net  Fri Mar 14 00:00:40 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Fri, 14 Mar 2008 00:00:40 -0400
Subject: [Bioperl-l] bioperl basics
In-Reply-To: <20080313.195825.6855.0@webmail20.vgs.untd.com>
References: <20080313.195825.6855.0@webmail20.vgs.untd.com>
Message-ID: <0A104B1F-315F-418C-A6DA-84FF04CC438C@gmx.net>

John - you want to send this to the BioPerl mailing list at
bioperl-l at lists.open-bio.org.

Your problem really is a Perl problem though, not a BioPerl one. The most
likely cause is that you don't have the Cache::FileCache module
installed, so installing it is what I would do.

The answer to your question about how to change @INC is to use -I on the
command line, put 'use lib' in your script, or set the PERL5LIB
environment variable.
	-hilmar

On Mar 14, 2008, at 1:58 AM, mrphysh at juno.com wrote:

> I am a molecular biologist studying bioinformatics from a Perl  
> background and making progress.  I am realizing that without  
> tapping into the existing infrastructure, I will be writing code  
> for ever.  Bioperl is the path for me.  I am moving forward.
>
> the error I encounter is
>
> can't locate Cache/FileCache in @INC (@INC contains /etc/perl/ /usr/ 
> locaql/lib/perl/5.8.8 .....)    and so forth.
>
> I found the files in a home directory.  I must have told the  
> install to put them there...?
>
>
> anyway:  How do I edit this environmental variable..... @INC.  I  
> cannot find anything in my book.
>
> thanks
> john brigham
>
>
> I will be writing code for years and need to tap into the
>
>
>
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biosql-l

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From markjschreiber at gmail.com  Fri Mar 14 09:48:38 2008
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Fri, 14 Mar 2008 21:48:38 +0800
Subject: [Bioperl-l] [BioSQL-l] Loading sequences with novel NCBI taxon
	id
In-Reply-To: <CEA4F4E7-A66B-4C62-AE32-511E177BC485@gmx.net>
References: <320fb6e00803130806w46148bacm54c3ead9a50b038f@mail.gmail.com>
	<32EB5B0C-4CC8-4C33-9F41-5D4465B6AC48@gmx.net>
	<320fb6e00803131613o20eae2b7y325814ef26d2738f@mail.gmail.com>
	<CEA4F4E7-A66B-4C62-AE32-511E177BC485@gmx.net>
Message-ID: <93b45ca50803140648s5098a7d0sec621f448ef03040@mail.gmail.com>

From memory BioJava will add it if it is not already in there. If the
taxid can be found then the system connects you with whatever is in
that taxid, it doesn't overwrite it.

This has two curious side effects. Because the details associated with
a taxid sometimes change (eg common name changes a lot) you can get
connected to an outdated version (if your record is newer than your
NCBI taxonomy) or you can get connected with a version that is newer
than your record which means when you round-trip you don't get
complete identity.

For compatibility across the projects some kind of consensus would be good.

- Mark

On Fri, Mar 14, 2008 at 7:41 AM, Hilmar Lapp <hlapp at gmx.net> wrote:
>
>
> On Mar 13, 2008, at 7:13 PM, Peter wrote:
>
> > On Thu, Mar 13, 2008 at 10:51 PM, Hilmar Lapp <hlapp at gmx.net> wrote:
> >> [...]
>
> >>  The load_ncbi_taxonomy.pl script is designed to update the taxon
> >>  tables in a non-disruptive way, and if there weren't many changes
> >>  shouldn't actually take that long (except that recalculating the
> >>  nested set values may take a couple of minutes).
> >
> > Do you think when faced with a novel taxon id, Biopython/BioPerl/...
> > could write some minimal taxonomy entry (without any guess work based
> > on the species name), in order to record the sequence's taxon
>
> This is what Bioperl-db does. There isn't any guesswork. If
> Bio::Species has lineage information it will also insert the lineage
> information, though.
>
>
> > - and then running an improved load_ncbi_taxonomy.pl at a later
> > date would
> > sort out the proper taxonomy?
>
> If I remember correctly, the script makes (and hence expects) the
> primary key and the NCBI taxonomy ID to be identical. If your loading
> procedure can achieve that already then load_ncbi_taxonomy.pl should
> pick them up and fix them. You can try that by loading the taxonomy
> through the script, then arbitrarily choose a taxon, create a stub
> bioentry for it and set its taxon_id foreign key to the chosen
> taxon,  change its taxon_name.name to some bogus value (for the
> 'scientific name' class, for example) (and feel free to change the
> left_id and right_id values in taxon too), and rerun the script. It
> should fix the change you made, and your bioentry should still point
> to the same taxon (because its primary key did not change, and did
> not get deleted either; otherwise the bioentry would now have a null
> value in the foreign key).
>
> The Bioperl-db way of storing things does not give control over
> primary key assignment to Bioperl-db, so the database will assign it.
>
> > [...]
>
> >>  For the SymAtlas project we had this situation (new species in
> >>  sequence updates that the last NCBI taxonomy update hadn't yet
> >>  brought in) quite regularly. I wrote a SQL script that would fix those
> >>  'haphazard' additions such that load_ncbi_taxonomy would update them
> >>  to their correct values come the next NCBI taxonomy update. I can
> >>  send you the script (it would be for the Oracle version), but I'm
> >> not
> >>  sure this is a widely viable strategy.
> >
> > So this wasn't integrated with load_ncbi_taxonomy.pl at all?
>
> No, but now that you say it I don't see any reason why I couldn't.
> Maybe that's just what I should do.
>
>        -hilmar
>
> --
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================
>
>
>
> _______________________________________________
>
>
>
> BioSQL-l mailing list
> BioSQL-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biosql-l
>


From cjfields at uiuc.edu  Fri Mar 14 10:31:09 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 14 Mar 2008 09:31:09 -0500
Subject: [Bioperl-l] [BioSQL-l] Loading sequences with novel NCBI taxon
	id
In-Reply-To: <93b45ca50803140648s5098a7d0sec621f448ef03040@mail.gmail.com>
References: <320fb6e00803130806w46148bacm54c3ead9a50b038f@mail.gmail.com>
	<32EB5B0C-4CC8-4C33-9F41-5D4465B6AC48@gmx.net>
	<320fb6e00803131613o20eae2b7y325814ef26d2738f@mail.gmail.com>
	<CEA4F4E7-A66B-4C62-AE32-511E177BC485@gmx.net>
	<93b45ca50803140648s5098a7d0sec621f448ef03040@mail.gmail.com>
Message-ID: <CE3675B2-2AFD-46AA-A348-16C9FEA51E0E@uiuc.edu>

The counter to that problem (new sequences paired with old tax info)
would be to update the NCBI taxonomy regularly, particularly just
before adding new sequences.  Hilmar mentioned that once the taxonomy
is loaded, updates don't take as long, so you could set up a cron job
to update regularly.

I remember someone mentioning weekly or monthly updates on the list
quite a while ago, but I'm unsure how often NCBI updates tax
information (i.e. with every release, monthly, weekly, etc).  I can
see instances popping up where you used an up-to-date taxonomy but
a new sequence contains a tax ID that is not present in it.  I think
bioperl-db handles these, but I'm not sure what the other Bio*
projects do.

chris


Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From markjschreiber at gmail.com  Fri Mar 14 20:56:37 2008
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Sat, 15 Mar 2008 08:56:37 +0800
Subject: [Bioperl-l] [BioSQL-l] Loading sequences with novel NCBI taxon
	id
In-Reply-To: <CE3675B2-2AFD-46AA-A348-16C9FEA51E0E@uiuc.edu>
References: <320fb6e00803130806w46148bacm54c3ead9a50b038f@mail.gmail.com>
	<32EB5B0C-4CC8-4C33-9F41-5D4465B6AC48@gmx.net>
	<320fb6e00803131613o20eae2b7y325814ef26d2738f@mail.gmail.com>
	<CEA4F4E7-A66B-4C62-AE32-511E177BC485@gmx.net>
	<93b45ca50803140648s5098a7d0sec621f448ef03040@mail.gmail.com>
	<CE3675B2-2AFD-46AA-A348-16C9FEA51E0E@uiuc.edu>
Message-ID: <93b45ca50803141756m3d7f022cnb57bd39f37270682@mail.gmail.com>

I agree. A regular update would be best.

Of course if your BioSQL db is limited to one or a few organisms you can
just keep a fragment of the db.

- Mark



From charles-listes+bioperl at plessy.org  Mon Mar 17 00:13:11 2008
From: charles-listes+bioperl at plessy.org (Charles Plessy)
Date: Mon, 17 Mar 2008 13:13:11 +0900
Subject: [Bioperl-l] Are all recommended modules equally important ?
Message-ID: <20080317041311.GA3784@kunpuu.plessy.org>

Dear Bioperl developers,

In the Debian Project, we distribute packages for Bioperl and need to
express their dependencies on other Perl modules with "Depends",
"Recommends" and "Suggests" levels. For the moment, everything that is
listed in the "recommends" hash of Build.PL is "Recommended" by our
Debian package. This means that they will be installed by default when
installing Bioperl, but that users can force their removal if needed.

Being "Recommended" also means in Debian that if the recommended module
is not available, then the Debian bioperl package will not reach our
internal quality criteria for being part of our stable release.
Therefore I would like to know if you think that some of the modules
recommended by Bioperl through the "recommends" hash of Build.PL are less
important than others, i.e. that we can just "Suggest" them in our
dependency system. "Suggested" packages are not installed by default.

The complete definition of the meaning of "Depends", "Recommends" and
"Suggests" for Debian packages can be found in the section 7.2 of the
Debian policy:

http://www.debian.org/doc/debian-policy/ch-relationships.html

Debian distributes versions 1.4 and 1.5.2 of Bioperl, but considers
using 1.5.2 in its next stable release. We welcome your comments on this
as well.

The Debian package for Bioperl 1.4: http://packages.debian.org/lenny/bioperl
and for Bioperl 1.5.2: http://packages.debian.org/sid/bioperl

(A copy of this email has been sent to the mailing list of the
Debian-Med project).

Have a nice day,

-- 
Charles Plessy
http://charles.plessy.org
Wakō, Saitama, Japan


From David.Messina at sbc.su.se  Mon Mar 17 11:38:28 2008
From: David.Messina at sbc.su.se (Dave Messina)
Date: Mon, 17 Mar 2008 16:38:28 +0100
Subject: [Bioperl-l] Are all recommended modules equally important ?
In-Reply-To: <20080317041311.GA3784@kunpuu.plessy.org>
References: <20080317041311.GA3784@kunpuu.plessy.org>
Message-ID: <628aabb70803170838i53a49aacuf507880d518326d@mail.gmail.com>

Hi Charles,

Thanks for your note.

All of the BioPerl 'recommended' modules involve optional functionality, so
I would think all of them would map to 'suggested' under Debian so they
won't be installed by default.

For everyone else, this is the list of recommended modules he's talking
about:
Ace
Class::AutoClass
Clone
Convert::Binary::C
Data::Stag::XMLWriter
GD
GD::SVG
Graph
HTML::Entities
HTML::Parser
HTTP::Request::Common
LWP::UserAgent
PostScript::TextBlock
Set::Scalar
SOAP::Lite
Spreadsheet::ParseExcel
Storable
SVG
SVG::Graph
Text::Shellwords
URI::Escape
XML::DOM::XPath
XML::Parser
XML::Parser::PerlSAX
XML::SAX
XML::SAX::Writer
XML::Twig
XML::Writer


> Debian distributes versions 1.4 and 1.5.2 of Bioperl, but considers
> using 1.5.2 in its next stable release. We welcome your comments on this
> as well.
>

I think the consensus here would be that 1.5.2 is the appropriate version of
Bioperl to use in the next stable release of Debian. Although we've started
to work toward Bioperl 1.6, that release will be at least a few months off,
and 1.4, while technically our most recent 'stable' release, is waaay out of
date.


Dave


From mrphysh at juno.com  Mon Mar 17 18:27:21 2008
From: mrphysh at juno.com (mrphysh at juno.com)
Date: Mon, 17 Mar 2008 22:27:21 GMT
Subject: [Bioperl-l] bioperl email list
Message-ID: <20080317.162721.27257.1@webmail19.vgs.untd.com>

Hello bioperl people.

I am a Perl programmer/molecular biologist/nice guy.  I am wandering around within the bioinformatics arena and making progress.

I am realizing that I will be writing code forever unless I can tap into the existing infrastructure.  For me that appears to be bioperl.

I would like to be part of the bioperl community.  I subscribed to the bioperl list and got a return email, but have never received an email.  Did I do something wrong?  Could you look into this please?

The truth is:  these objects are blowing me away and I need help.

John S. Brigham
13810 Braun Drive
Golden, Colorado 80401
303-216-0994
mrphysh at juno.com


From hlapp at gmx.net  Mon Mar 17 23:44:11 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 17 Mar 2008 23:44:11 -0400
Subject: [Bioperl-l] Are all recommended modules equally important ?
In-Reply-To: <628aabb70803170838i53a49aacuf507880d518326d@mail.gmail.com>
References: <20080317041311.GA3784@kunpuu.plessy.org>
	<628aabb70803170838i53a49aacuf507880d518326d@mail.gmail.com>
Message-ID: <F7A93439-501D-420A-BD68-442E3587CB71@gmx.net>

On Mar 17, 2008, at 11:38 AM, Dave Messina wrote:

> Hi Charles,
>
> Thanks for your note.
>
> All of the BioPerl 'recommended' modules involve optional  
> functionality, so
> I would think all of them would map to 'suggested' under Debian so  
> they
> won't be installed by default.

I would probably elevate LWP to 'recommended.' Other than that I agree.

	-hilmar


-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From bix at sendu.me.uk  Tue Mar 18 05:29:10 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 18 Mar 2008 09:29:10 +0000
Subject: [Bioperl-l] Are all recommended modules equally important ?
In-Reply-To: <F7A93439-501D-420A-BD68-442E3587CB71@gmx.net>
References: <20080317041311.GA3784@kunpuu.plessy.org>	<628aabb70803170838i53a49aacuf507880d518326d@mail.gmail.com>
	<F7A93439-501D-420A-BD68-442E3587CB71@gmx.net>
Message-ID: <47DF8B66.8020509@sendu.me.uk>

Hilmar Lapp wrote:
> On Mar 17, 2008, at 11:38 AM, Dave Messina wrote:
> 
>> Hi Charles,
>>
>> Thanks for your note.
>>
>> All of the BioPerl 'recommended' modules involve optional 
>> functionality, so
>> I would think all of them would map to 'suggested' under Debian so they
>> won't be installed by default.
> 
> I would probably elevate LWP to 'recommended.' Other than that I agree.

I looked at the most used external modules. Used 6 times or more:

   Data::Dumper => used 55 times
   Carp => used 51 times
   IO::String => used 25 times
   Symbol => used 19 times
   File::Spec => used 17 times
   HTTP::Request::Common => used 17 times
   POSIX => used 12 times
   DB_File => used 11 times
   Fcntl => used 11 times
   IO::File => used 11 times
   Exporter => used 10 times
   File::Temp => used 9 times
   Dumpvalue => used 8 times
   LWP::UserAgent => used 8 times
   Scalar::Util => used 8 times
   URI::Escape => used 8 times
   File::Basename => used 6 times
   File::Path => used 6 times
   XML::Writer => used 6 times

I can never remember how to figure out which of those is included with 
perl 5.6.1.
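
One way to check, offered as a sketch rather than a definitive
recipe: the Module::CoreList module records which modules shipped with
which perl release. Assuming it is installed, the lookup below tests
a few names from the list above against 5.6.1 (spelled 5.006001 in
its version table). The shipped_with helper is made up for the
example.

```perl
use strict;
use warnings;
use Module::CoreList;

# Perl 5.6.1 as it is keyed in %Module::CoreList::version.
my $perl = '5.006001';

# A few names taken from the usage list above.
my @modules = qw(Data::Dumper Carp IO::String LWP::UserAgent XML::Writer);

# Hypothetical helper: true if the module shipped with that perl release.
sub shipped_with {
    my ($module, $version) = @_;
    return exists $Module::CoreList::version{$version}{$module} ? 1 : 0;
}

for my $module (@modules) {
    printf "%-20s %s\n", $module,
        shipped_with($module, $perl) ? 'in core' : 'not in core';
}
```

Anything reported as 'not in core' would be a candidate for an
explicit dependency in the packaging.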

Except maybe XML::Writer, if we do want to promote anything to 
recommended, I suppose it would be those above.

I also agree with everything Dave said; if it's easier everything can be 
'suggested'.


(I reckon most if not all of the Data::Dumper and Carp usages should be 
removed)


From David.Messina at sbc.su.se  Tue Mar 18 10:30:02 2008
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 18 Mar 2008 15:30:02 +0100
Subject: [Bioperl-l] bioperl email list
In-Reply-To: <20080317.162721.27257.1@webmail19.vgs.untd.com>
References: <20080317.162721.27257.1@webmail19.vgs.untd.com>
Message-ID: <628aabb70803180730g2aae7ae0u18a9e6a360c725fa@mail.gmail.com>

Hi John,

Welcome to BioPerl!


>  I subscribed to the bioperl list and got a return email, but have never
> received an email.


As I'm sure you know, most of the time when email doesn't show up, it's
because it's been filtered as spam. I'm assuming you've looked for this
already, though.

If that's not it, then I suggest trying to log in to the mailing list server
here:
http://bioperl.org/mailman/listinfo/bioperl-l

Log in by entering your email address (the one you subscribed with) in the
last field on that page and clicking the "Unsubscribe or edit options"
button.

On the next page that comes up, type your password in the first field on the
page.

If for some reason you haven't been subscribed to the list properly, then
you will get an error here. Otherwise, you will be taken to your membership
configuration page. There you can verify, among other options, that mail
delivery is enabled.


> The truth is:  these object are blowing me away and I need help.


BioPerl does have a bit of a learning curve, but fortunately there are some
good tutorials that should help you to get started. If you haven't already,
visit the HOWTO section of bioperl.org. Check out the one on BioPerl for
beginners, and then you might follow up with the SeqIO and SearchIO HOWTOs
which cover how to read and write sequences and sequence alignment program
output.

Also, there's lots of great example code in the examples folder of the
BioPerl distribution. I find looking at how other people use BioPerl is very
helpful in understanding what objects are used for what.

Finally, I'll plug the BioPerl Deobfuscator, which is a class browser for
BioPerl and available at: http://bioperl.org/cgi-bin/deob_interface.cgi

BioPerl classes tend to have multiple levels of inheritance, and the
Deobfuscator lets you see all of the methods available to objects of a given
class.


Dave


From bix at sendu.me.uk  Tue Mar 18 11:32:25 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 18 Mar 2008 15:32:25 +0000
Subject: [Bioperl-l] Priorities for a bioperl-1.6 release
In-Reply-To: <OF9C92ED19.F51E3401-ON852573F0.006310A3-852573F0.006383D1@gsk.com>
References: <OF9C92ED19.F51E3401-ON852573F0.006310A3-852573F0.006383D1@gsk.com>
Message-ID: <47DFE089.1070304@sendu.me.uk>

aaron.j.mackey at gsk.com wrote:
>> Or is the split intended to be 'core' == "anything and everything
>> that was in 1.4", '????' == "everything else"? In which case,
>> what's a good name for "modules created after 1.4"? 'crust'? ;)
> 
> Nah, "icing".
> 
> a module "use" map might be very useful to help identify "core" vs.
> other layers of mantle/crust/icing.
> 
> http://www.perlmonks.org/?node_id=87329 
> http://search.cpan.org/src/NEILB/pmusage-1.2/

Thanks for those. Neither could quite cope with BioPerl, but I've munged
them together and hacked up 'module_usage.pl' which I've just committed
to the maintenance directory of bioperl-live.

module_usage.pl ../Bio

Produces:
  *warning, may crash your browser; download it and view in a dedicated
image viewer*
http://bix.sendu.me.uk/files/module_usage.jpeg
http://bix.sendu.me.uk/files/module_usage.txt

First I considered what modules each BioPerl package (aka class, module)
'uses' (what modules does it load via 'use', 'require' or inherit from
via 'use base', excluding external (non-BioPerl) modules), then grouped
together packages that have identical usage. The graph shows all the
groups with more than one member as nodes and edges from them pointing
to the individual packages that they use. The set of those individual
packages pointed to by groups also have edges showing their
use-relationship to other members of the set (only). Members of the set
are also shaded in red. The saturation of the shade indicates how many
packages use that package (so dark red packages are used a lot).

(I had to simplify in this way because otherwise GraphViz bailed on me.
If anyone can come up with nicer simplification/visualisation systems,
please do! It's important to note that there is lots of information loss
in my scheme, so you can't rely on the graph alone.)
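
The grouping step described above can be sketched roughly as follows.
This is not the code in module_usage.pl, only a minimal illustration:
the %uses data (including the package Bio::Tools::Example and the
dependency lists) and the group_by_usage helper are made up for the
example.

```perl
use strict;
use warnings;

# Hypothetical input: package name => BioPerl modules it loads.
my %uses = (
    'Bio::AlignIO::clustalw' => [ 'Bio::AlignIO', 'Bio::SimpleAlign' ],
    'Bio::AlignIO::fasta'    => [ 'Bio::SimpleAlign', 'Bio::AlignIO' ],
    'Bio::Tools::Example'    => [ 'Bio::Root::Root' ],
);

# Group packages whose sets of used modules are identical.
sub group_by_usage {
    my ($uses) = @_;
    my %groups;
    for my $package (sort keys %$uses) {
        # Sort and de-duplicate so the order of 'use' lines is ignored.
        my %seen;
        my @used = grep { !$seen{$_}++ } sort @{ $uses->{$package} };
        my $signature = join ',', @used;
        push @{ $groups{$signature} }, $package;
    }
    return \%groups;
}

my $groups = group_by_usage(\%uses);
for my $signature (sort keys %$groups) {
    my @members = @{ $groups->{$signature} };
    # As in the description above, only groups with more than one
    # member are reported.
    next unless @members > 1;
    print "group [@members] uses: $signature\n";
}
```

Using the sorted list of used modules as a hash key is a simple way to
make "identical usage" comparable; the edges of the graph would then
run from each group to the modules named in its key.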

Getting to the question on how to decide what is 'core' and on what
basis to split things up, first consider the darker red packages. Next
consider how many groups point to it. Finally consider the membership of
those groups: are they all highly related, or are they from different
'parts' of BioPerl?

For example, Bio::Graphics::Glyph::generic is dark red and has 3 groups
pointing to it, but all the members of those groups are
Bio::Graphics::Glyph*. You could imagine that Bio::Graphics::Glyph (or
Bio::Graphics?) could be split off cleanly if desired and not kept in
core. Bio::SimpleAlign, on the other hand, whilst not being quite as
dark a red, has 7 attached groups with members from Bio::AlignIO,
Bio::Search and Bio::Tools. You could easily argue it is more
fundamental to BioPerl and should be in core. In turn, the things that
Bio::SimpleAlign points to would also have to be in core.

I haven't done any full analysis along these lines and leave it as an
exercise for the interested reader for now ;)


Chris Fields wrote:
> http://www.bioperl.org/wiki/Talk:Proposed_1.6_core_modules
> 
> I'm pretty flexible on any of that; it's a proposal only and I think
> some of it may be wrongheaded, but hey, I'm willing to take a few
> rotten tomatoes.  The key issue is we should try to work out what we
> mean by 'core' or the core library.  I have a rather extreme view of
> it as being the bare essentials without external, non-perl core
> dependencies (only SeqI/PrimarySeqI, AlignI, AnnotationI, SeqFeatureI
> and required modules for those classes) but I'm sure others would
> lump in parsers, DB functionality, etc.  I basically suggest placing
> those (and any stable but potentially non-core code) in a
> 'bioperl-main', with any unstable or untested code going into a
> 'bioperl-unstable'.

My thoughts are along these lines:
# I agree that core should have no external dependencies
# I agree that it might mostly be interfaces
# It should represent a framework with all the interfaces (that have
   stable APIs), directory structure and base classes that everything
   else relies on
# It might not do much useful bioinformatics, but provides just about
   everything needed for a dev to create a new module that does


> In essence, bioperl-main would require core and resemble a stable
> release; bioperl-unstable would require bioperl-main (and core) and
> resemble a dev release.  Not sure how versioning would go or if this
> is a viable option at all, but it's worth discussing.

# I agree that this 3-way split seems reasonable
# bioperl-main would consist primarily of the 'leaves' of the module
   tree, mostly parsers and the like which, whilst 'stable' and tested
   should still be split away from core because the data sources they
   parse could change format slightly
# bioperl-unstable, better bioperl-bleed, would feature brand-new
   stuff, be it new parsers for totally new formats, new APIs that do
   something not thought of before etc. When they are complete, bug-free
   and have stood the test of time they get moved into bioperl-main.
   (It is not a place for all new commits; bug fixes to something in
   bioperl-main would be committed to bioperl-main)
# The current splits (bioperl-run, bioperl-network etc.) do not get
   their own core and bleed variant. Anything they need for core
   functionality would enter the single bioperl-core, anything new
   would enter the single bioperl-bleed, and anything stable would
   be in their own bioperl-[package]

Discuss :)


From snoze.pa at gmail.com  Tue Mar 18 14:27:47 2008
From: snoze.pa at gmail.com (snoze pa)
Date: Tue, 18 Mar 2008 13:27:47 -0500
Subject: [Bioperl-l] BioSQL V1.0.0 released
In-Reply-To: <8f200b4c0803111204m2ca45782w34baa0499d690cb5@mail.gmail.com>
References: <200803071309.25294.heikki@sanbi.ac.za>
	<7558F8C6-FE40-4BAE-BA6A-D5039B10F350@uiuc.edu>
	<8f200b4c0803111204m2ca45782w34baa0499d690cb5@mail.gmail.com>
Message-ID: <10f848910803181127u4acc307akbf9ec8513349e311@mail.gmail.com>

Thanks, Hilmar. I am still wondering whether my old problem was fixed. It
relates to cases where the NR database mixes files from different databases.

On Tue, Mar 11, 2008 at 2:04 PM, Steve Chervitz <sac at bioperl.org> wrote:

> Ditto. Thanks for biting the bullet, Hilmar.
>
> BTW, I put together a little compendium/review of various bioinformatics
> data models a few months ago, where I mention BioSQL among others, but I
> never really announced it:
>
> http://biodatamodel.org/
>
> I thought about wikifying it to get the community involved in maintaining
> it, but haven't gotten around to it yet.
>
> Feedback is welcome.
>
> Cheers,
> Steve
>
> On Fri, Mar 7, 2008 at 6:22 AM, Chris Fields <cjfields at uiuc.edu> wrote:
>
> > Same here.  Great news!
> >
> > chris
> >
> > On Mar 7, 2008, at 5:09 AM, Heikki Lehvaslaiho wrote:
> >
> > > BIOSQL V1.0.0 RELEASED
> > > http://news.open-bio.org/archives/2008_03.html#000094
> > >
> > >
> > > Congratulations, Hilmar!
> > >
> > >               -Heikki
> > >
> > > --
> > > ______ _/      _/_____________________________________________________
> > >      _/      _/
> > >     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
> > >    _/_/_/_/_/  Senior Scientist    skype: heikki_lehvaslaiho
> > >   _/  _/  _/  SANBI, South African National Bioinformatics Institute
> > >  _/  _/  _/  University of Western Cape, South Africa
> > >     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
> > > ___ _/_/_/_/_/________________________________________________________
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l at lists.open-bio.org
> > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> > Christopher Fields
> > Postdoctoral Researcher
> > Lab of Dr. Robert Switzer
> > Dept of Biochemistry
> > University of Illinois Urbana-Champaign
> >


From darin.london at duke.edu  Tue Mar 18 14:16:58 2008
From: darin.london at duke.edu (darin.london at duke.edu)
Date: Tue, 18 Mar 2008 13:16:58 -0500
Subject: [Bioperl-l] BOSC 2008 Announcement and Call For Submissions
Message-ID: <200803181816.m2IIGwOL007248@tenero.duhs.duke.edu>


BOSC 2008 Call for Abstracts

The 9th annual Bioinformatics Open Source Conference (BOSC 2008) will take place in Toronto, Ontario, Canada, as one of several Special Interest Group (SIG) meetings occurring in conjunction with the 16th annual Intelligent Systems for Molecular Biology Conference (ISMB 2008).

The Bioinformatics Open Source Conference (BOSC) is sponsored by the Open Bioinformatics Foundation (O|B|F), a non-profit group dedicated to promoting the practice and philosophy of Open Source software development within the biological research community. Many Open Source bioinformatics packages are widely used by the research community across many application areas and form a cornerstone in enabling research in the genomic and post-genomic era. Open source bioinformatics software has facilitated rapid innovation and dissemination of new computational methods as well as informatics infrastructure. Since the work of the Open Source Bioinformatics Community represents some of the most cutting edge of Bioinformatics in general, the overall theme for the conference this year is "Tackling Hard Problems with Emerging Technologies". Topics under this umbrella include cyberinfrastructure, grid computing and workflow management and discovery, and visualization. We will also have a series of update talks about the main Open Source Bioinformatics Software suites.

One of the hallmarks of BOSC is the coming together of the open source developer community in one location. A face-to-face meeting of this community creates synergy where participants can work together to create use cases, prototype working code, or run bootcamps for developers from other projects as short, informal, and hands-on tutorials in new software packages and emerging technologies. In short, BOSC is not just a conference for presentations of completed work, but is a dynamic meeting where collaborative work gets done.

This year, BOSC is accepting abstract submissions on the conference theme "Tackling Hard Problems with Emerging Technologies". The conference theme reflects that there are new technologies emerging on both the scientific front (new sequencing technologies, etc.) and the IT front (workflows, mashup/web 2.0, improvements in all of the major programming languages, etc.), which may allow the open source community to solve problems that were previously intractable. Abstracts may be submitted for the following topics.

1. Cyberinfrastructure - We are interested in presentations on topics dealing with the development of infrastructure on the web to facilitate software and data re-use (mashups, or traditional), interoperability and inter-process communication, system/service discovery, and data movement and modeling in distributed systems. This may include peer-to-peer systems of data transfer, Web Services, various flavors of data representation (SOAP, JSON, XML, others), and technologies commonly referred to under the Web 2.0 paradigm (e.g. folksonomies/tagging, user-based content generation, content feeds, and Social Networking).

2. Grid Computing and Workflow Management and Discovery - We particularly invite talks that report progress in making workflow systems easier to use and on how to do distributed-collaborative research, e.g. workflows that encompass the coordination of systems running in different parts of the world.

3. Visualization - Visualization is a maturing area of open source software development. We particularly invite talks that demonstrate innovative visualization systems in the context of workflows.

4. Open Source Software - Speakers will present talks on the use, development, or philosophy of open source software in bioinformatics.

5. Bio* Open Source Project Updates - We invite abstracts from the representatives of the open source projects sponsored by or affiliated to the O|B|F (see Projects).


Please consult the official BOSC 2008 website at http://www.open-bio.org/wiki/Upcoming_BOSC_conference  for all updates and extra information.

Submission Process:
All abstracts must be submitted through our Open Conference Systems site (http://events.open-bio.org/BOSC2008/openconf.php).
The form will ask for a small Abstract Text to be pasted into it, and a full paper.  The small Abstract text should be a summary, while the longer abstract should provide more details, including the open-source license requirement details.
Full-length abstracts are limited to one page with one inch (2.5 cm) margins on the top, sides, and bottom.  The full-length abstract should include the title, authors, and affiliations.  We prefer your abstract to be in PDF format, although plain t

Important Dates:
May 11: Abstract submission deadline.
June 2: Notification of accepted talks.
June 4: Early registration discount cut-off.
July 18-19: BOSC 2008!

We hope to see you at BOSC 2008!

Kam Dahlquist and Darin London
BOSC 2008 Co-organizers

			 
From hlapp at gmx.net  Tue Mar 18 15:07:54 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 18 Mar 2008 15:07:54 -0400
Subject: [Bioperl-l] BioSQL V1.0.0 released
In-Reply-To: <10f848910803181127u4acc307akbf9ec8513349e311@mail.gmail.com>
References: <200803071309.25294.heikki@sanbi.ac.za>
	<7558F8C6-FE40-4BAE-BA6A-D5039B10F350@uiuc.edu>
	<8f200b4c0803111204m2ca45782w34baa0499d690cb5@mail.gmail.com>
	<10f848910803181127u4acc307akbf9ec8513349e311@mail.gmail.com>
Message-ID: <0B1635C3-0BD4-449C-9C52-FD8D07E9D669@gmx.net>

Can you point me to the bug report or a mailing list thread?

	-hilmar

On Mar 18, 2008, at 2:27 PM, snoze pa wrote:

> Thanks, Hilmar. I am still wondering whether my old problem was fixed. It
> relates to cases where the NR database mixes files from different databases.
>
> On Tue, Mar 11, 2008 at 2:04 PM, Steve Chervitz <sac at bioperl.org>  
> wrote:
>
>> Ditto. Thanks for biting the bullet, Hilmar.
>>
>> BTW, I put together a little compendium/review of various  
>> bioinformatics
>> data models a few months ago, where I mention BioSQL among others,  
>> but I
>> never really announced it:
>>
>> http://biodatamodel.org/
>>
>> I thought about wikifying it to get the community involved in  
>> maintaining
>> it, but haven't gotten around to it yet.
>>
>> Feedback is welcome.
>>
>> Cheers,
>> Steve
>>
>> On Fri, Mar 7, 2008 at 6:22 AM, Chris Fields <cjfields at uiuc.edu>  
>> wrote:
>>
>>> Same here.  Great news!
>>>
>>> chris
>>>
>>> On Mar 7, 2008, at 5:09 AM, Heikki Lehvaslaiho wrote:
>>>
>>>> BIOSQL V1.0.0 RELEASED
>>>> http://news.open-bio.org/archives/2008_03.html#000094
>>>>
>>>>
>>>> Congratulations, Hilmar!
>>>>
>>>>               -Heikki
>>>>
>>>> --
>>>> ______ _/      _/ 
>>>> _____________________________________________________
>>>>      _/      _/
>>>>     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
>>>>    _/_/_/_/_/  Senior Scientist    skype: heikki_lehvaslaiho
>>>>   _/  _/  _/  SANBI, South African National Bioinformatics  
>>>> Institute
>>>>  _/  _/  _/  University of Western Cape, South Africa
>>>>     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
>>>> ___ _/_/_/_/_/ 
>>>> ________________________________________________________
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>> Christopher Fields
>>> Postdoctoral Researcher
>>> Lab of Dr. Robert Switzer
>>> Dept of Biochemistry
>>> University of Illinois Urbana-Champaign

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From snoze.pa at gmail.com  Tue Mar 18 16:33:08 2008
From: snoze.pa at gmail.com (snoze pa)
Date: Tue, 18 Mar 2008 15:33:08 -0500
Subject: [Bioperl-l] NCBI taxonomy database
Message-ID: <10f848910803181333g49d1567dl99d76daf8ef88cc1@mail.gmail.com>

Dear Users,
 How can I use the NCBI taxonomy database in BioPerl? Any suggestions?
Thanks in advance,
s


From aaron.j.mackey at gsk.com  Tue Mar 18 12:23:41 2008
From: aaron.j.mackey at gsk.com (aaron.j.mackey at gsk.com)
Date: Tue, 18 Mar 2008 12:23:41 -0400
Subject: [Bioperl-l] Priorities for a bioperl-1.6 release
In-Reply-To: <47DFE089.1070304@sendu.me.uk>
Message-ID: <OF5F812950.DB7A9F1E-ON85257410.00593111-85257410.005A0FAB@gsk.com>

Very cool.  I can envision this being printed as a laminated poster to put 
up next to the periodic table of Perl Elements (
http://www.ozonehouse.com/mark/blog/code/PeriodicTable.html)

One GraphViz trick you could try would be to group Bio::X::* (nodes and 
your collection groups sharing common Bio::X:: prefixes) together as 
subgraphs; that should quickly show you which edges go outside of the 
various "domains", and which are entirely self contained.

you could also try to distinguish "use base" relationships (i.e. 
inheritance) vs. "use Bio::X" (delegation, composition, etc.) vs. "require 
Bio::X" wrapped in an eval (optional use if available) by various edge 
colorings -- this might help to further break things up if we can guess at 
the intended "use" of any Bio::X by Bio::Y.
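
As a rough sketch of both ideas (this is not taken from module_usage.pl;
the edge list, the domain_of/build_dot helper names and the gray colour
for eval-wrapped requires are all made up for illustration), DOT output
with one cluster per Bio::X prefix and edges coloured by relationship
could be generated along these lines:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical edges: [ using package, used package, kind of relationship ].
my @edges = (
    [ 'Bio::AlignIO::clustalw', 'Bio::AlignIO',     'base' ],
    [ 'Bio::AlignIO::clustalw', 'Bio::SimpleAlign', 'use'  ],
    [ 'Bio::Tools::Example',    'Bio::SimpleAlign', 'eval' ],
);

# Map a package to its "domain": the first two components of its name.
sub domain_of {
    my ($pkg) = @_;
    my @parts = split /::/, $pkg;
    splice @parts, 2 if @parts > 2;
    return join '::', @parts;
}

# Build DOT text with one cluster per domain and edges coloured by kind.
sub build_dot {
    my (@edges) = @_;
    my %color = ( base => 'green', use => 'blue', eval => 'gray' );

    # Collect every package mentioned in an edge under its domain.
    my %members;
    for my $edge (@edges) {
        $members{ domain_of($_) }{$_} = 1 for @{$edge}[ 0, 1 ];
    }

    my $dot = "digraph module_usage {\n";
    my $n   = 0;
    for my $domain ( sort keys %members ) {
        # GraphViz only draws a box around subgraphs named "cluster...".
        $dot .= sprintf qq(  subgraph cluster_%d {\n    label="%s";\n), $n++, $domain;
        $dot .= sprintf qq(    "%s";\n), $_ for sort keys %{ $members{$domain} };
        $dot .= "  }\n";
    }
    for my $edge (@edges) {
        my ( $from, $to, $kind ) = @$edge;
        my $c = exists $color{$kind} ? $color{$kind} : 'black';
        $dot .= sprintf qq(  "%s" -> "%s" [color=%s];\n), $from, $to, $c;
    }
    return $dot . "}\n";
}

print build_dot(@edges);
```

Piping the output through "dot -Tpng" should then make any edge that
crosses a cluster boundary easy to spot.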

-Aaron

"Sendu Bala" <bix at sendu.me.uk> wrote on 03/18/2008 11:32:25 AM:

> aaron.j.mackey at gsk.com wrote:
> >> Or is the split intended to be 'core' == "anything and everything
> >> that was in 1.4", '????' == "everything else"? In which case,
> >> what's a good name for "modules created after 1.4"? 'crust'? ;)
> > 
> > Nah, "icing".
> > 
> > a module "use" map might be very useful to help identify "core" vs.
> > other layers of mantle/crust/icing.
> > 
> > http://www.perlmonks.org/?node_id=87329 
> > http://search.cpan.org/src/NEILB/pmusage-1.2/
> 
> Thanks for those. Neither could quite cope with BioPerl, but I've munged
> them together and hacked up 'module_usage.pl' which I've just committed
> to the maintenance directory of bioperl-live.
> 
> module_usage.pl ../Bio
> 
> Produces:
>   *warning, may crash your browser; download it and view in a dedicated
> image viewer*
> http://bix.sendu.me.uk/files/module_usage.jpeg
> http://bix.sendu.me.uk/files/module_usage.txt
> 
> First I considered what modules each BioPerl package (aka class, module)
> 'uses' (what modules does it load via 'use', 'require' or inherit from
> via 'use base', excluding external (non-BioPerl) modules), then grouped
> together packages that have identical usage. The graph shows all the
> groups with more than one member as nodes and edges from them pointing
> to the individual packages that they use. The set of those individual
> packages pointed to by groups also have edges showing their
> use-relationship to other members of the set (only). Members of the set
> are also shaded in red. The saturation of the shade indicates how many
> packages use that package (so dark red packages are used a lot).
> 
> (I had to simplify in this way because otherwise GraphViz bailed on me.
> If anyone can come with nicer simplification/visualisation systems,
> please do! It's important to note that there is lots of information loss
> in my scheme, so you can't rely on the graph alone.)
> 
> Getting to the question on how to decide what is 'core' and on what
> basis to split things up, first consider the darker red packages. Next
> consider how many groups point to it. Finally consider the membership of
> those groups: are they all highly related, or are they from different
> 'parts' of BioPerl?
> 
> For example, Bio::Graphics::Glyph::generic is dark red and has 3 groups
> pointing to it, but all the members of those groups are
> Bio::Graphics::Glyph*. You could imagine that Bio::Graphics::Glyph (or
> Bio::Graphics?) could be split off cleanly if desired and not kept in
> core. Bio::SimpleAlign, on the other hand, whilst not being quite as
> dark a red, has 7 attached groups with members from Bio::AlignIO,
> Bio::Search and Bio::Tools. You could easily argue it is more
> fundamental to BioPerl and should be in core. In turn, the things that
> Bio::SimpleAlign points to would also have to be in core.
> 
> I haven't done any full analysis along these lines and leave as an
> exercise for the interested reader for now ;)
> 
> 
> Chris Fields wrote:
> > http://www.bioperl.org/wiki/Talk:Proposed_1.6_core_modules
> > 
> > I'm pretty flexible on any of that; it's a proposal only and I think
> > some of it may be wrongheaded, but hey, I'm willing to take a few
> > rotten tomatoes.  The key issue is we should try to work out what we
> > mean by 'core' or the core library.  I have a rather extreme view of
> > it as being the bare essentials without external, non-perl core
> > dependencies (only SeqI/PrimarySeqI, AlignI, AnnotationI, SeqFeatureI
> > and required modules for those classes) but I'm sure others would
> > lump in parsers, DB functionality, etc.  I basically suggest placing
> > those (and any stable but potentially non-core code) in a
> > 'bioperl-main', with any unstable or untested code going into a
> > 'bioperl-unstable'.
> 
> My thoughts are along these lines:
> # I agree that core should have no external dependencies
> # I agree that it might mostly be interfaces
> # It should represent a framework with all the interfaces (that have
>    stable APIs), directory structure and base classes that everything
>    else relies on
> # It might not do much useful bioinformatics, but provides just about
>    everything needed for a dev to create a new module that does
> 
> 
> > In essence, bioperl-main would require core and resemble a stable
> > release; bioperl-unstable would require bioperl-main (and core) and
> > resemble a dev release.  Not sure how versioning would go or if this
> > is a viable option at all, but it's worth discussing.
> 
> # I agree that this 3-way split seems reasonable
> # bioperl-main would consist primarily of the 'leaves' of the module
>    tree, mostly parsers and the like which, whilst 'stable' and tested
>    should still be split away from core because the data sources they
>    parse could change format slightly
> # bioperl-unstable, better bioperl-bleed, would feature brand-new
>    stuff, be it new parsers for totally new formats, new APIs that do
>    something not thought of before etc. When they are complete, bug-free
>    and have stood the test of time they get moved into bioperl-main.
>    (It is not a place for all new commits; bug fixes to something in
>    bioperl-main would be committed to bioperl-main)
> # The current splits (bioperl-run, bioperl-network etc.) do not get
>    their own core and bleed variant. Anything they need for core
>    functionality would enter the single bioperl-core, anything new
>    would enter the single bioperl-bleed, and anything stable would
>    be in their own bioperl-[package]
> 
> Discuss :)
> 


From David.Messina at sbc.su.se  Tue Mar 18 17:23:18 2008
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 18 Mar 2008 22:23:18 +0100
Subject: [Bioperl-l] NCBI taxonomy database
In-Reply-To: <10f848910803181333g49d1567dl99d76daf8ef88cc1@mail.gmail.com>
References: <10f848910803181333g49d1567dl99d76daf8ef88cc1@mail.gmail.com>
Message-ID: <628aabb70803181423g305db155r9d66c114f38c64b6@mail.gmail.com>

Hi snoze,

I think you will want to take a look at the docs for the Bio::Taxon module
http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Taxon.html


and these scripts:

scripts/taxa/local_taxonomydb_query.PLS
  Script that accesses a local taxonomy database and retrieves species or
  TaxonIDs <http://www.bioperl.org/wiki/TaxonID>.
  <http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/scripts/taxa/local_taxonomydb_query.PLS?rev=HEAD&cvsroot=bioperl&content-type=text/vnd.viewcvs-markup>

scripts/taxa/query_entrez_taxa.PLS
  Demonstrates how to retrieve the NCBI TaxonID for a given species. Also
  retrieves the TaxonID for a given accession number.
  <http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/scripts/taxa/query_entrez_taxa.PLS?rev=HEAD&cvsroot=bioperl&content-type=text/vnd.viewcvs-markup>

scripts/taxa/taxid4species.PLS
  Retrieves the NCBI TaxonID for a given species.
  <http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/scripts/taxa/taxid4species.PLS?rev=HEAD&cvsroot=bioperl&content-type=text/vnd.viewcvs-markup>

Dave


From alexl at users.sourceforge.net  Wed Mar 19 04:32:38 2008
From: alexl at users.sourceforge.net (Alex Lancaster)
Date: Wed, 19 Mar 2008 01:32:38 -0700
Subject: [Bioperl-l] Priorities for a bioperl-1.6 release
In-Reply-To: <47DFE089.1070304@sendu.me.uk> (Sendu Bala's message of "Tue\,
	18 Mar 2008 15\:32\:25 +0000")
References: <OF9C92ED19.F51E3401-ON852573F0.006310A3-852573F0.006383D1@gsk.com>
	<47DFE089.1070304@sendu.me.uk>
Message-ID: <42eja7azbt.fsf@allele2.eebweb.arizona.edu>

>>>>> "SB" == Sendu Bala  writes:

[...]

SB> # I agree that this 3-way split seems reasonable # bioperl-main
SB> would consist primarily of the 'leaves' of the module tree, mostly
SB> parsers and the like which, whilst 'stable' and tested should
SB> still be split away from core because the data sources they parse
SB> could change format slightly # bioperl-unstable, better
SB> bioperl-bleed, would feature brand-new stuff, be it new parsers
SB> for totally new formats, new APIs that do something not thought of
SB> before etc. When they are complete, bug-free and have stood the
SB> test of time they get moved into bioperl-main.  (It is not a place
SB> for all new commits; bug fixes to something in bioperl-main would
SB> be committed to bioperl-main) # The current splits (bioperl-run,
SB> bioperl-network etc.) do not get their own core and bleed
SB> variant. Anything they need for core functionality would enter the
SB> single bioperl-core, anything new would enter the single
SB> bioperl-bleed, and anything stable would be in their own
SB> bioperl-[package]

SB> Discuss :)

While on the subject of how to split up the bioperl package, spare a
thought for downstream package maintainers.  The Fedora package for the
bioperl "core" that I now maintain is currently a single package which
makes it easy to get reviewed, included in the distribution and
updated/maintained.  (bioperl-run is a separate package).

While I agree that bioperl is now perhaps a little too monolithic, I
think splitting it up in a too fine-grained manner like CPAN might
go too far the other way.  For Fedora, each package would then need to
be reviewed and updated separately.  Similar issues might apply for
other distros (such as Debian/Ubuntu).

I think something similar to the three-way split proposed sounds like
a good compromise, so long as a "basic" user of Bioperl can install
most of the functionality in the current "bioperl" package in (at
most) 2-3 packages.

One model to look at might be the gstreamer model which has a "core"
(gstreamer) and "gstreamer-plugins-base", "gstreamer-plugins-good",
"gstreamer-plugins-bad" and "gstreamer-plugins-ugly" modules for
plugins, see:

http://gstreamer.net/

Alex


From charles-listes+bioperl at plessy.org  Wed Mar 19 06:01:59 2008
From: charles-listes+bioperl at plessy.org (Charles Plessy)
Date: Wed, 19 Mar 2008 19:01:59 +0900
Subject: [Bioperl-l] Are all recommended modules equally important ?
In-Reply-To: <47DF8B66.8020509@sendu.me.uk>
References: <20080317041311.GA3784@kunpuu.plessy.org>
	<628aabb70803170838i53a49aacuf507880d518326d@mail.gmail.com>
	<F7A93439-501D-420A-BD68-442E3587CB71@gmx.net>
	<47DF8B66.8020509@sendu.me.uk>
Message-ID: <20080319100159.GD29304@kunpuu.plessy.org>

On Tue, Mar 18, 2008 at 09:29:10AM +0000, Sendu Bala wrote:
> 
> I looked at the most used external modules. Used 6 times or more:
> 
>   Data::Dumper => used 55 times
>   Carp => used 51 times
>   IO::String => used 25 times
>   Symbol => used 19 times
>   File::Spec => used 17 times
>   HTTP::Request::Common => used 17 times
>   POSIX => used 12 times
>   DB_File => used 11 times
>   Fcntl => used 11 times
>   IO::File => used 11 times
>   Exporter => used 10 times
>   File::Temp => used 9 times
>   Dumpvalue => used 8 times
>   LWP::UserAgent => used 8 times
>   Scalar::Util => used 8 times
>   URI::Escape => used 8 times
>   File::Basename => used 6 times
>   File::Path => used 6 times
>   XML::Writer => used 6 times

Dear Sendu,

Thanks a lot for this analysis! We will downgrade all modules except
those you listed to the priority 'Suggested'. In terms of the Debian
package, it means keeping only libio-string-perl, libwww-perl,
liburi-perl and libxml-writer-perl in our 'Recommends' field, as the
others are provided by our perl package itself.

Thanks a lot for the advice,

-- 
Charles Plessy
Debian-Med packaging team
Wakō, Saitama, Japan


From bix at sendu.me.uk  Wed Mar 19 09:27:11 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 19 Mar 2008 13:27:11 +0000
Subject: [Bioperl-l] Priorities for a bioperl-1.6 release
In-Reply-To: <OF5F812950.DB7A9F1E-ON85257410.00593111-85257410.005A0FAB@gsk.com>
References: <OF5F812950.DB7A9F1E-ON85257410.00593111-85257410.005A0FAB@gsk.com>
Message-ID: <47E114AF.8030801@sendu.me.uk>

aaron.j.mackey at gsk.com wrote:
> One GraphViz trick you could try would be to group Bio::X::* (nodes and 
> your collection groups sharing common Bio::X:: prefixes) together as 
> subgraphs; that should quickly show you which edges go outside of the 
> various "domains", and which are entirely self contained.

Not quite sure if I used the 'trick' you were thinking of, but I now 
'cluster' them as you describe. It's no longer quite as attractively 
proportioned, but I suppose it's more useful :)


> you could also try to distinguish "use base" relationships (i.e. 
> inheritance) vs. "use Bio::X" (delegation, composition, etc.) vs. "require 
> Bio::X" wrapped in an eval (optional use if available) by various edge 
> colorings -- this might help to further break things up if we can guess at 
> the intended "use" of any Bio::X by Bio::Y.

I haven't distinguished the eval require cases, but now edges are green 
for inheritance and blue for use/require.
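
For anyone curious how the two edge classes can be told apart, here is
a minimal sketch. It is not the actual code in module_usage.pl:
classify_deps is a hypothetical helper, Bio::Example is a made-up
package, and the regexes only cope with single-line "use base", "use"
and "require" statements (no @ISA assignments, no eval detection).

```perl
use strict;
use warnings;

# Edge colours as described above: green for inheritance, blue for use/require.
my %edge_color = ( inherit => 'green', use => 'blue' );

# Classify the BioPerl dependencies declared in a chunk of module source.
# Returns a list of [ module, kind ] pairs, kind being 'inherit' or 'use'.
# External (non-Bio::) modules are ignored.
sub classify_deps {
    my ($source) = @_;
    my @deps;
    for my $line ( split /\n/, $source ) {
        next if $line =~ /^\s*#/;    # skip comment lines
        if ( $line =~ /^\s*use\s+base\s+(.+?);/ ) {
            # e.g. "use base qw(Bio::Root::Root Bio::Align::AlignI);"
            my $list = $1;
            push @deps, [ $_, 'inherit' ] for $list =~ /(Bio::[\w:]+)/g;
        }
        elsif ( $line =~ /^\s*(?:use|require)\s+(Bio::[\w:]+)/ ) {
            push @deps, [ $1, 'use' ];
        }
    }
    return @deps;
}

my $example = <<'END';
package Bio::Example;
use strict;
use base qw(Bio::Root::Root Bio::Align::AlignI);
use Bio::SimpleAlign;
require Bio::Seq;
END

for my $dep ( classify_deps($example) ) {
    my ( $module, $kind ) = @$dep;
    print "Bio::Example -> $module [$edge_color{$kind}]\n";
}
```

Telling an eval-wrapped require apart from a plain one would need some
look-behind at the enclosing block, which this sketch does not attempt.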


I updated the jpeg:
*warning, may crash your browser; download it and view in a dedicated
  image viewer*
http://bix.sendu.me.uk/files/module_usage.jpeg


If someone wants to mess with the script so it will output a sane ps 
file for conversion to pdf, please do so. I can't figure out how to get 
it to work correctly.


From Jorge.DUARTE at biogemma.com  Wed Mar 19 11:32:44 2008
From: Jorge.DUARTE at biogemma.com (Jorge.DUARTE at biogemma.com)
Date: Wed, 19 Mar 2008 16:32:44 +0100
Subject: [Bioperl-l] how to go from AlignIO to Variation ?
Message-ID: <OF678A6861.0EF7D6FC-ONC1257411.0054F38B-C1257411.00558060@LGLimagrain.com>

Dear Bioperl-users,

Could someone give me a hint on how to find SNPs in alignments using 
BioPerl objects?

I found several modules capable of representing sequence variations,
but could not work out how to go from an "Align" object to a "Variation" 
object.

Any help would be much appreciated,

Thanks,

Jorge.

--- 
Jorge Duarte
Bioinformatics Software Engineer
BIOGEMMA
Z.I. Du Brézet
8, Rue des Frères Lumière
63028 CLERMONT FERRAND Cedex 2
FRANCE
Tel : +33 (0)4 73 39 60 73
Fax : +33 (0)4 73 39 60 71
E-mail : jorge.duarte at biogemma.com


From avilella at gmail.com  Wed Mar 19 12:59:47 2008
From: avilella at gmail.com (Albert Vilella)
Date: Wed, 19 Mar 2008 16:59:47 +0000
Subject: [Bioperl-l] how to go from AlignIO to Variation ?
In-Reply-To: <OF678A6861.0EF7D6FC-ONC1257411.0054F38B-C1257411.00558060@LGLimagrain.com>
References: <OF678A6861.0EF7D6FC-ONC1257411.0054F38B-C1257411.00558060@LGLimagrain.com>
Message-ID: <358f4d650803190959h744f33f5ha345897565a071b0@mail.gmail.com>

Hi Jorge,

(good to see an ex-EBI in the bioperl-ml :-) )

You can use the method aln_to_population in Bio::PopGen::Utilities:

my $pop = Bio::PopGen::Utilities->aln_to_population($aln);

http://www.bioperl.org/wiki/HOWTO:PopGen#Allele_data_from_Alignments_using_Bio::AlignIO_and_Bio::PopGen::Utilities
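
If it helps to see what counts as a candidate SNP before going through
the PopGen objects, here is a plain-Perl sketch that does not use
BioPerl at all. The variable_sites helper and the toy sequences are
made up for illustration, and skipping columns with gaps or N is my
own assumption rather than anything aln_to_population is documented to
do. It simply reports the alignment columns where the aligned sequences
disagree:

```perl
use strict;
use warnings;

# Return the 1-based alignment columns at which the sequences disagree.
# Columns containing a gap ('-') or an ambiguous base ('N') are skipped.
# All sequences are assumed to be aligned, i.e. of equal length.
sub variable_sites {
    my (@seqs) = @_;
    my $len = length $seqs[0];
    my @sites;
    for my $i ( 0 .. $len - 1 ) {
        my %seen;
        $seen{ uc substr( $_, $i, 1 ) }++ for @seqs;
        next if $seen{'-'} || $seen{'N'};
        push @sites, $i + 1 if scalar( keys %seen ) > 1;
    }
    return @sites;
}

# Toy alignment: column 4 and column 8 vary, column 7 contains a gap.
my @aln = qw(ACGTACGT ACGAACGT ACGTAC-T ACGTACGA);
print join( ',', variable_sites(@aln) ), "\n";    # prints 4,8
```

With a real Bio::SimpleAlign object the strings could be pulled out
with $_->seq for each $aln->each_seq before calling such a helper.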

Cheers,

    Albert.

On Wed, Mar 19, 2008 at 3:32 PM,  <Jorge.DUARTE at biogemma.com> wrote:
> Dear Bioperl-users,
>
>  could someone give me a hint on how to find SNPs in alignments using
>  bioperl objects ?
>
>  I found several modules capable of representing Sequence Variations,
>  but could not understand how to go from an "Align" object to a "Variation"
>  object.
>
>  Any help would be much appreciated,
>
>  Thanks,
>
>  Jorge.
>
>  ---
>  Jorge Duarte
>  Bioinformatics Software Engineer
>  BIOGEMMA
>  Z.I. Du Brézet
>  8, Rue des Frères Lumière
>  63028 CLERMONT FERRAND Cedex 2
>  FRANCE
>  Tel : +33 (0)4 73 39 60 73
>  Fax : +33 (0)4 73 39 60 71
>  E-mail : jorge.duarte at biogemma.com


From Jorge.DUARTE at biogemma.com  Wed Mar 19 11:22:57 2008
From: Jorge.DUARTE at biogemma.com (Jorge.DUARTE at biogemma.com)
Date: Wed, 19 Mar 2008 16:22:57 +0100
Subject: [Bioperl-l] Using Bioperl book
Message-ID: <OF134946F0.4C537CC5-ONC1257411.00542F3C-C1257411.00549AFA@LGLimagrain.com>

Hello,

I just found a listing on Amazon for a book, "Using Bioperl", published 
on the 1st of March 2008, but it is no longer available.

Does anyone know how to get it?

Many thanks,

Jorge.

--- 
Jorge Duarte
Bioinformatics Software Engineer
BIOGEMMA
Z.I. Du Brézet
8, Rue des Frères Lumière
63028 CLERMONT FERRAND Cedex 2
FRANCE
Tel : +33 (0)4 73 39 60 73
Fax : +33 (0)4 73 39 60 71
E-mail : jorge.duarte at biogemma.com

*****************************************************************
       For any support request, please include
BIOGEMMA_BioInfo_Service or bioinfo at biogemma.com
         among the recipients on first contact
*****************************************************************


From jason at bioperl.org  Wed Mar 19 13:54:16 2008
From: jason at bioperl.org (Jason Stajich)
Date: Wed, 19 Mar 2008 10:54:16 -0700
Subject: [Bioperl-l] Using Bioperl book
In-Reply-To: <OF134946F0.4C537CC5-ONC1257411.00542F3C-C1257411.00549AFA@LGLimagrain.com>
References: <OF134946F0.4C537CC5-ONC1257411.00542F3C-C1257411.00549AFA@LGLimagrain.com>
Message-ID: <0AFC1124-EE57-4168-BBF3-98845F0138C9@bioperl.org>

It's probably more than 6 months out. We still haven't finished  
writing it, as life and work continue to intrude on book writing.

-jason
On Mar 19, 2008, at 8:22 AM, Jorge.DUARTE at biogemma.com wrote:

> Hello,
>
> i just found on amazon something about a book "Using Bioperl",  
> published
> on the 1st of March 2008 but which is no more available.
>
> Does anyone know how to get it ?
>
> Many thanks,
>
> Jorge.
>
> ---
> Jorge Duarte
> Bioinformatics Software Engineer
> BIOGEMMA
> Z.I. Du Brézet
> 8, Rue des Frères Lumière
> 63028 CLERMONT FERRAND Cedex 2
> FRANCE
> Tel : +33 (0)4 73 39 60 73
> Fax : +33 (0)4 73 39 60 71
> E-mail : jorge.duarte at biogemma.com
>
> *****************************************************************
>        For any support request, please include
> BIOGEMMA_BioInfo_Service or bioinfo at biogemma.com
>          among the recipients on first contact
> *****************************************************************


From smarkel at accelrys.com  Wed Mar 19 13:53:53 2008
From: smarkel at accelrys.com (Scott Markel)
Date: Wed, 19 Mar 2008 10:53:53 -0700
Subject: [Bioperl-l] Using Bioperl book
In-Reply-To: <OF134946F0.4C537CC5-ONC1257411.00542F3C-C1257411.00549AFA@LGLimagrain.com>
References: <OF134946F0.4C537CC5-ONC1257411.00542F3C-C1257411.00549AFA@LGLimagrain.com>
Message-ID: <OF4972A434.60C53CF4-ON88257411.006214B5-88257411.00625152@accelrys.com>

Jorge,

This is a book that Jason Stajich, Ewan Birney, and I are writing.
We're behind.  So it's not that the book is no longer available,
but that it's not yet available.  Hopefully later this year or
early in 2009.

Scott

Scott Markel, Ph.D.
Principal Bioinformatics Architect  email:  smarkel at accelrys.com
Accelrys (SciTegic R&D)             mobile: +1 858 205 3653
10188 Telesis Court, Suite 100      voice:  +1 858 799 5603
San Diego, CA 92121                 fax:    +1 858 799 5222
USA                                 web:    http://www.accelrys.com

http://www.linkedin.com/in/smarkel
Board of Directors: International Society for Computational Biology
Co-chair: ISCB Publications Committee
Associate Editor: PLoS Computational Biology
Editorial Board: Briefings in Bioinformatics


bioperl-l-bounces at lists.open-bio.org wrote on 19.03.2008 08:22:57:

> Hello,
> 
> i just found on amazon something about a book "Using Bioperl", published
> on the 1st of March 2008 but which is no more available.
> 
> Does anyone know how to get it ?
> 
> Many thanks,
> 
> Jorge.
> 
> --- 
> Jorge Duarte
> Bioinformatics Software Engineer
> BIOGEMMA
> Z.I. Du Brézet
> 8, Rue des Frères Lumière
> 63028 CLERMONT FERRAND Cedex 2
> FRANCE
> Tel : +33 (0)4 73 39 60 73
> Fax : +33 (0)4 73 39 60 71
> E-mail : jorge.duarte at biogemma.com
> 
> *****************************************************************
>        For any support request, please include
> BIOGEMMA_BioInfo_Service or bioinfo at biogemma.com
>          among the recipients on first contact
> *****************************************************************


From me at hongyu.org  Thu Mar 20 14:54:53 2008
From: me at hongyu.org (Hongyu Zhang)
Date: Thu, 20 Mar 2008 11:54:53 -0700 (PDT)
Subject: [Bioperl-l] Bio::DB::GenBank module
Message-ID: <501654.68882.qm@web51412.mail.re2.yahoo.com>

Dear all,

It seems that some important methods in the Bio::DB::GenBank module, such as get_Seq_by_acc(), have been discontinued. The corresponding methods have no content underneath their names. How come?
 
Best,

Hongyu Zhang, Ph.D.
 Ceres Inc., Thousand Oaks, CA
Cell: 805-405-5394
Fax: 866-447-8750


From joseph.fass at gmail.com  Thu Mar 20 18:10:33 2008
From: joseph.fass at gmail.com (Joseph Fass)
Date: Thu, 20 Mar 2008 15:10:33 -0700
Subject: [Bioperl-l] bug in Bio::SeqIO::fastq or Bio::Seq::SeqWithQuality?
Message-ID: <195e25cb0803201510i4411fd1ctd1fdff40aecd12c7@mail.gmail.com>

I've written code to trim a certain number of bases (and, possibly,
associated qualities) from fasta (or fastq) format sequences, using:

$seq->seq($seq->subseq($a+1,$len-$b));
and, if it's fastq:
$seq->qual($seq->subqual($a+1,$len-$b));
where:
$len = $seq->length; # defined before changing $seq->seq
$a is the number of bases to trim off the beginning of the sequence
$b is the number of bases to trim off the end of the sequence

The code works for sequences, but for qualities I get a trimmed series of
quality characters that is the correct length and is at the correct
position, but has a number of characters (equal to $a) at the *end* of the
series changed to '!' ... i.e.:

@fake header 1
tcggacaatatatat
+
fjasfiojeq%!@%@

becomes:

@fake header 1 trimmed by 4 at beginning and 3 at end
acaatata
+fake header 1 trimmed by 4 at beginning and 3 at end
fioj!!!!

Since the relevant section of code is short, I'll post it:

my $in = Bio::SeqIO->new(-file => "<$opt_i", -format => $format);
my $out = Bio::SeqIO->new(-file=> ">$opt_o", -format => $format);
my $seq_length;
while (my $seq = $in->next_seq()) {
  $seq->desc($seq->desc()." trimmed by $opt_b at beginning and $opt_e at end");
  $seq_length = $seq->length;
  $seq->seq($seq->subseq($opt_b+1,$seq_length-$opt_e));
  if ($format eq 'fastq') { # if fastq, trim qualities then write out in fastq format
    $seq->qual($seq->subqual($opt_b+1,$seq_length-$opt_e));
    $out->write_fastq($seq); }
  else {$out->write_seq($seq);} # just write out sequence in fasta format
}

Why should the same process work for ->seq and ->subseq but not ->qual and
->subqual?  Please enlighten me ...
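
One reading consistent with the output above, offered as an assumption rather than a confirmed diagnosis: if assigning the shortened sequence also shortens the stored qualities to the new length, then the later subqual() call with the original coordinates would run $a positions past the end, and those positions could come back as quality 0 ('!'). A minimal plain-Perl sketch of a trim that cannot be affected by call order, because both slices are taken from the untouched originals (trim_pair is a hypothetical helper, not a BioPerl method):

```perl
use strict;
use warnings;

# Trim $front bases from the start and $back bases from the end of a
# sequence and its per-base quality string. Both slices are taken from
# the original strings, so neither depends on the other's new length.
# (trim_pair is a hypothetical helper, not part of BioPerl.)
sub trim_pair {
    my ($seq, $qual, $front, $back) = @_;
    my $keep = length($seq) - $front - $back;
    die "nothing left after trimming\n" if $keep <= 0;
    return (substr($seq, $front, $keep), substr($qual, $front, $keep));
}

# The record from the report above: 15 bases, trim 4 in front, 3 behind.
my ($s, $q) = trim_pair('tcggacaatatatat', 'fjasfiojeq%!@%@', 4, 3);
print "$s\n$q\n";   # prints "acaatata" then "fiojeq%!"
```

In the BioPerl loop, the analogous change would be to store the results of subseq() and subqual() in temporaries before calling seq() and qual(). Whether that avoids the problem is untested here.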


-- 
Joseph Fass
jnfass -at- gmail.com (personal) || joseph.fass -at- gmail.com(professional)
970.227.5928 (c) || 530.752.2698 (w)


From hlapp at gmx.net  Thu Mar 20 18:49:41 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 20 Mar 2008 18:49:41 -0400
Subject: [Bioperl-l] [BioSQL-l] postgres 8.3 will not cast text to
	integer any longer
In-Reply-To: <5095.156.83.1.251.1206041052.squirrel@webmail.xs4all.nl>
References: <5095.156.83.1.251.1206041052.squirrel@webmail.xs4all.nl>
Message-ID: <0F80B40B-0232-4367-8433-992588B6E71B@gmx.net>

Hi Erik, thanks for the report. Given the error message, it looks  
more like the integer (which in reality is a string) can't be  
automatically converted to a string.

That would be equally interesting, though. DBI I thought used to bind  
all parameters as string by default, but maybe that has changed?

The parameter values are indeed all bound generically (and the query
is created dynamically too), and I'm leaving it up to the DBD drivers
to do the "Right Thing". I could obviously force everything into type
string, but that is likely to have its own repercussions on various
RDBMSs.

So could you file this as a bug report on bugzilla.open-bio.org  
(category bioperl-db, this is actually not a BioSQL problem), and run  
the following test on your 8.3 instance (which minor version actually?):

CREATE TABLE t1 (a varchar(10), b text, c integer);

SELECT * from t1 WHERE a = 1;
SELECT * from t1 WHERE b = 1;
SELECT * from t1 WHERE c = '1';

INSERT INTO t1 (a,b,c) VALUES ('a','b',1);

SELECT * from t1 WHERE a = 1;
SELECT * from t1 WHERE b = 1;
SELECT * from t1 WHERE c = '1';

SELECT * from t1 WHERE a = 1::text;
SELECT * from t1 WHERE b = 1::text;
SELECT * from t1 WHERE c = integer '1';

DROP TABLE t1;

These all work fine on my 8.1.4 instance.

	-hilmar

On Mar 20, 2008, at 3:24 PM, Erik wrote:
> Hi,
>
> (latest BioSQL, bioperl-db, and bioperl-live installed.)
>
> Postgres 8.3 will not auto-cast text (='character
> varying') to integer any longer, which causes test
> t/16odba.t to fail:
>
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: error while executing query in
> Bio::DB::BioSQL::SeqAdaptor::find_by_query: ERROR:
> operator does not exist: character varying = integer
> LINE 1: ...eq.taxon_id FROM bioentry seq WHERE
> seq.identifier = 5456929
>
> It seems likely to cause many similar statements to fail;
> how should this be solved?
>
> I tried to fix it but I couldn't find the place where the
> statement/clauses are put together.
>
>
> Thanks,
>
> Erik Rijkers
>
>
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biosql-l

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From er at xs4all.nl  Thu Mar 20 19:30:03 2008
From: er at xs4all.nl (Erik)
Date: Fri, 21 Mar 2008 00:30:03 +0100 (CET)
Subject: [Bioperl-l] [BioSQL-l] postgres 8.3 will not cast text to
 integer any longer
Message-ID: <15786.156.83.1.157.1206055803.squirrel@webmail.xs4all.nl>

On Thu, March 20, 2008 23:49, Hilmar Lapp wrote:
> Hi Erik, thanks for the report. Given the error message,
> it looks
> more like the integer (which in reality is a string) can't
> be automatically converted to a string.

you are right, of course :)


Here is the postgres 8.3.1 result of your sql statements:

CREATE TABLE t1 (a varchar(10), b text, c integer);

SELECT * from t1 WHERE a = 1;   -- fails in 8.3.1
SELECT * from t1 WHERE b = 1;   -- fails in 8.3.1
SELECT * from t1 WHERE c = '1'; -- ok

INSERT INTO t1 (a,b,c) VALUES ('a','b',1);

SELECT * from t1 WHERE a = 1;   -- fails in 8.3.1
SELECT * from t1 WHERE b = 1;   -- fails in 8.3.1
SELECT * from t1 WHERE c = '1'; -- ok

SELECT * from t1 WHERE a = 1::text;     -- ok
SELECT * from t1 WHERE b = 1::text;     -- ok
SELECT * from t1 WHERE c = integer '1'; -- ok

The failure is always (virtually) the same:
ERROR:  operator does not exist: character varying = integer
LINE 1: SELECT * from t1 WHERE a = 1;
                                 ^
HINT:  No operator matches the given name and argument
type(s). You might need to add explicit type casts.


Then there is the cast function: for instance, I can let
the test in t/16odba.t proceed faultlessly with

 $seq = $biodb->get_Seq_by_id( "cast(5456929 as text)" );


I am also doubtful/curious as to how this would affect the
various loading scripts which I was going to use - I want
to set up a GBrowse with human/mouse/flybase sequence
annotation to show ChipSeq data against.

But one thing at a time, I guess...


> So could you file this as a bug report on
> bugzilla.open-bio.org
> (category bioperl-db, this is actually not a BioSQL
> problem),

I'll make an entry in bugzilla/bioperl-db.


Thanks for your quick reply!


Erik Rijkers


From David.Messina at sbc.su.se  Thu Mar 20 19:39:49 2008
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 21 Mar 2008 00:39:49 +0100
Subject: [Bioperl-l] Bio::DB::GenBank module
In-Reply-To: <501654.68882.qm@web51412.mail.re2.yahoo.com>
References: <501654.68882.qm@web51412.mail.re2.yahoo.com>
Message-ID: <628aabb70803201639y33df19a6ib83967c33dd90b7f@mail.gmail.com>

Hi Hongyu,

Those methods are inherited. get_Seq_by_acc(), for example, comes from
Bio::DB::WebDBSeqI.

The BioPerl Deobfuscator is one way to see where a given class's methods
are actually implemented. Here's the Deobfuscator view of Bio::DB::GenBank:
<http://bioperl.org/cgi-bin/deob_interface.cgi?Search=Search&module=Bio%3A%3ADB%3A%3AGenBank&sort_order=by+method&search_string=Bio%3A%3ADB%3A%3AGenBank>
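
To illustrate why a subclass can look empty and still respond to a method, here is a small plain-Perl sketch. My::WebDB and My::GenBank are hypothetical stand-ins, not the real BioPerl modules, and defining_package is a helper invented for this example. It uses the core B module to report the package in which an inherited method was compiled.

```perl
use strict;
use warnings;
use B ();

# Hypothetical base class that actually implements the method.
{
    package My::WebDB;
    sub new            { my $class = shift; return bless {}, $class }
    sub get_Seq_by_acc { my ($self, $acc) = @_; return "record for $acc" }
}

# Hypothetical subclass: inherits everything and defines nothing itself.
{
    package My::GenBank;
    our @ISA = ('My::WebDB');
}

package main;

# Return the name of the package where the method that $class responds
# to was compiled, or nothing if the class cannot do the method at all.
sub defining_package {
    my ($class, $method) = @_;
    my $code = $class->can($method) or return;
    return B::svref_2object($code)->GV->STASH->NAME;
}

my $db = My::GenBank->new;
print $db->get_Seq_by_acc('ACC0001'), "\n";                      # placeholder accession
print defining_package('My::GenBank', 'get_Seq_by_acc'), "\n";   # prints My::WebDB
```

The same helper could be pointed at a real class name once the module is loaded, which gives a quick command-line alternative to browsing the documentation.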


Dave


From hlapp at gmx.net  Thu Mar 20 20:34:42 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 20 Mar 2008 20:34:42 -0400
Subject: [Bioperl-l] postgres 8.3 will not cast text to integer any
	longer
In-Reply-To: <15786.156.83.1.157.1206055803.squirrel@webmail.xs4all.nl>
References: <15786.156.83.1.157.1206055803.squirrel@webmail.xs4all.nl>
Message-ID: <987C9C0E-840B-44AD-B3E9-0FC2809FF4F4@gmx.net>


On Mar 20, 2008, at 7:30 PM, Erik wrote:
> Here is the postgres 8.3.1 result of your sql statements:
>
> CREATE TABLE t1 (a varchar(10), b text, c integer);
>
> SELECT * from t1 WHERE a = 1;   -- fails in 8.3.1
> SELECT * from t1 WHERE b = 1;   -- fails in 8.3.1
> SELECT * from t1 WHERE c = '1'; -- ok
>
> [...]
> The failure is always (virtually) the same:
> ERROR:  operator does not exist: character varying = integer
> LINE 1: SELECT * from t1 WHERE a = 1;
>                                  ^
> HINT:  No operator matches the given name and argument
> type(s). You might need to add explicit type casts.


So it's indeed the backend that changed behavior. It's actually  
documented as I see now:

http://www.postgresql.org/docs/8.3/static/release-8-3.html

scroll to section E.2.2. Migration to Version 8.3, E.2.2.1. General,  
and the first item there:

<quote>
Non-character data types are no longer automatically cast to TEXT  
(Peter, Tom)

Previously, if a non-character value was supplied to an operator or  
function that requires text input, it was automatically cast to text,  
for most (though not all) built-in data types. This no longer  
happens: an explicit cast to text is now required for all
non-character-string types.
</quote>

I can see the arguments there but this will prevent upgrading to 8.3
for many many applications, and the comments from the Pg developers
('fix your SQL to use casts') that I've seen there on the mailing
lists are just not helpful. Fixing SQL is just not an option for many
legacy applications.

In the case of Bioperl-db it's very non-trivial, because all of a
sudden we would be changing from a hands-off and
let-the-driver-figure-it-out approach to forcing types everywhere.

So I think at this point with this change I have to declare
Bioperl-db officially incompatible with PostgreSQL 8.3+ until we've
found a solution to this, which is too bad because it seems 8.3 has
some really nice performance features added.

One possible solution might be to create a CAST in the database  
(namely the one that was taken away, restoring behavior to pre-8.3).  
Another possibility is to move the parameter binding method into the  
driver adaptor which would then delegate to the DBI method but would  
be overridden for the PostgreSQL adapter to force all bindings to  
type string.

Which leads me back to the surprise observation that the parameter  
was bound as an integer in the first place, when DBD::Pg used to bind  
everything as string unless you told it otherwise. Which DBD::Pg  
version is it that you are using? I would suspect (or hope) that  
maybe there is soon an update release of DBD::Pg that fixes this  
problem by going back to binding everything as string by default (and  
as the tests show PostgreSQL will still convert strings to integer if  
necessary).

Depending on what I (or can someone else update us on this?) find out  
for the DBD::Pg plans, I'll probably start looking into moving the  
parameter binding into the driver adapters. Though it does feel  
pathetic that this is now also not transparent between drivers.

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From er at xs4all.nl  Thu Mar 20 20:51:43 2008
From: er at xs4all.nl (Erik)
Date: Fri, 21 Mar 2008 01:51:43 +0100 (CET)
Subject: [Bioperl-l] postgres 8.3 will not cast text to integer any
	longer
Message-ID: <4483.156.83.1.157.1206060703.squirrel@webmail.xs4all.nl>

On Fri, March 21, 2008 01:34, Hilmar Lapp wrote:
>
> So I think at this point with this change I have to
> declare Bioperl-db officially incompatible with
> PostgreSQL 8.3+ until we've found a solution to this,
> which is too bad because it seems 8.3 has some
> really nice performance features added.

Pg 8.3 is indeed very noticeably faster, and it has other
excellent new features like full text indexing. (This also
means that downgrading is not really an option.)


> Which DBD::Pg version is it that you are using?

DBD::Pg 2.3.0


Thanks,

Erik Rijkers


From hlapp at gmx.net  Thu Mar 20 21:36:50 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 20 Mar 2008 21:36:50 -0400
Subject: [Bioperl-l] postgres 8.3 will not cast text to integer any
	longer
In-Reply-To: <4483.156.83.1.157.1206060703.squirrel@webmail.xs4all.nl>
References: <4483.156.83.1.157.1206060703.squirrel@webmail.xs4all.nl>
Message-ID: <071CB899-AB3E-40B8-9477-82AE98DB88B1@gmx.net>


On Mar 20, 2008, at 8:51 PM, Erik wrote:
> On Fri, March 21, 2008 01:34, Hilmar Lapp wrote:
>>
>> So I think at this point with this change I have to declare  
>> Bioperl-db officially incompatible with PostgreSQL 8.3+ until  
>> we've found a solution to this, which is too bad because it seems  
>> 8.3 has some really nice performance features added.
>
> Pg 8.3 is indeed very noticeably faster, and it has other
> excellent new features like full text indexing. (This also
> means that downgrading is not really an option.)

Right, I saw that too. It is, however, just migrated from what was a  
contrib module before, so downgrading and using the contrib module is  
an option.

Furthermore, folding these new features together with a behavior  
change that is backwards incompatible was a choice the PostgreSQL  
people made, not we.

We also aren't doing poor typing that deserves fixing; we're just not  
doing any typing by treating everything as a string. This is the Perl  
paradigm.

At this point it's actually unclear to me how this new behavior is  
compatible with untyped scripting languages unless you know the type  
of each column that you're binding a value for, because if you  
actually force typecasts to string for everything you get an error if  
an integer is indeed what's needed.

I'm wondering what I'm missing.

	-hilmar

BTW what does the following query yield on your 8.3.1 database:

select s.typname as source, t.typname as target, f.proname as function,
       c.castcontext
from pg_cast c, pg_type s, pg_type t, pg_proc f
where c.castsource = s.oid and c.casttarget = t.oid and c.castfunc = f.oid
and t.typname = 'text';

On my 8.1.4 database I get:

   source    | target | function | castcontext
-------------+--------+----------+-------------
  bpchar      | text   | text     | i
  char        | text   | text     | i
  name        | text   | text     | i
  int8        | text   | text     | i
  int2        | text   | text     | i
  int4        | text   | text     | i
  oid         | text   | text     | i
  float4      | text   | text     | i
  float8      | text   | text     | i
  macaddr     | text   | text     | e
  cidr        | text   | text     | e
  inet        | text   | text     | e
  date        | text   | text     | i
  time        | text   | text     | i
  timestamp   | text   | text     | i
  timestamptz | text   | text     | i
  interval    | text   | text     | i
  timetz      | text   | text     | i
  numeric     | text   | text     | i
(19 rows)

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From greg at turnstep.com  Thu Mar 20 22:41:10 2008
From: greg at turnstep.com (Greg Sabino Mullane)
Date: Fri, 21 Mar 2008 02:41:10 -0000
Subject: [Bioperl-l] [BioSQL-l] postgres 8.3 will not cast text to
	integer any longer
In-Reply-To: <987C9C0E-840B-44AD-B3E9-0FC2809FF4F4@gmx.net>
Message-ID: <19ecb7a297f64722c4f63f10ed2ebdce@biglumber.com>


-----BEGIN PGP SIGNED MESSAGE-----
Hash: RIPEMD160


> Which leads me back to the surprise observation that the parameter
> was bound as an integer in the first place, when DBD::Pg used to bind
> everything as string unless you told it otherwise. Which DBD::Pg
> version is it that you are using? I would suspect (or hope) that
> maybe there is soon an update release of DBD::Pg that fixes this
> problem by going back to binding everything as string by default (and
> as the tests show PostgreSQL will still convert strings to integer if
> necessary).
>
> Depending on what I (or can someone else update us on this?) find out
> for the DBD::Pg plans, I'll probably start looking into moving the
> parameter binding into the driver adapters. Though it does feel
> pathetic that this is now also not transparent between drivers.

What you are probably looking for is already there, namely:

$dbh->{pg_server_prepare} = 0;

There are good reasons for the casting enforcement in 8.3, although I've
been a sharp critic of the change, and certainly of the suddenness
of it. Another solution to consider is adding the casts back in:

http://people.planetpostgresql.org/peter/index.php?/archives/2008/03.html
(the March 4th entry)

- --
Greg Sabino Mullane greg at turnstep.com
PGP Key: 0x14964AC8 200803202237
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
-----BEGIN PGP SIGNATURE-----

iEYEAREDAAYFAkfjIBYACgkQvJuQZxSWSsiamwCdEbNrC4F4oU7AGHrbHAm1YNXG
HbUAoIRJtGW4brvMKklxZYG6pusbcTqf
=Zawx
-----END PGP SIGNATURE-----


From David.Messina at sbc.su.se  Fri Mar 21 04:36:16 2008
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 21 Mar 2008 09:36:16 +0100
Subject: [Bioperl-l] bug in Bio::SeqIO::fastq or
	Bio::Seq::SeqWithQuality?
In-Reply-To: <195e25cb0803201510i4411fd1ctd1fdff40aecd12c7@mail.gmail.com>
References: <195e25cb0803201510i4411fd1ctd1fdff40aecd12c7@mail.gmail.com>
Message-ID: <628aabb70803210136p11de495p26d0ffaebbc3370e@mail.gmail.com>

Hi Joseph,

This looks like a bug; I saw the same thing here. Could you please submit
this to the bug tracker <http://www.bioperl.org/wiki/Bugs> along with your
test code?


Thanks,
Dave


From hlapp at gmx.net  Fri Mar 21 08:52:39 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Fri, 21 Mar 2008 08:52:39 -0400
Subject: [Bioperl-l] [BioSQL-l] postgres 8.3 will not cast text to
	integer any longer
In-Reply-To: <19ecb7a297f64722c4f63f10ed2ebdce@biglumber.com>
References: <19ecb7a297f64722c4f63f10ed2ebdce@biglumber.com>
Message-ID: <C24DE5CA-F433-48A1-BF08-A6D056A2EBCE@gmx.net>

Hi Greg - thanks for your email, it's very helpful.

On Mar 20, 2008, at 10:41 PM, Greg Sabino Mullane wrote:
>>
>> Depending on what I (or can someone else update us on this?) find out
>> for the DBD::Pg plans, I'll probably start looking into moving the
>> parameter binding into the driver adapters. Though it does feel
>> pathetic that this is now also not transparent between drivers.
>
> What you are probably looking for is already there, namely:
>
> $dbh->{pg_server_prepare} = 0;

So disabling server-side prepares will leave values quoted? Having  
server-side prepares would be very useful though, especially for  
Bioperl-db with its many lookup queries that all use similar  
parameter values.

>
> There's good reasons for the casting enforcement in 8.3

I do understand that, but it's also a sharp contrast to other RDBMSs
that doesn't make it easier for people to choose Pg when they
should, and doesn't help writing cross-platform database applications
either.

> although I've been a sharp critic of the change, and certainly of
> the suddenness
> of it. Another solution to consider is adding the casts back in:
>
> http://people.planetpostgresql.org/peter/index.php?/archives/2008/03.html
> (the March 4th entry)


Thanks for this, that helps a lot.

Do you have links to some of the key threads showing what rationale  
went into the decision? (Or should I just search for your name?) I'd  
like to read up on that first before pouring more oil into the fire.  
I suspect that many of those who made the decision are never faced  
with needing to write cross-RDBMS code.

Also, I wonder why this wasn't made a configurable option so it can  
be disabled by a simple config file change (such as the move away  
from automatic OID columns). But obviously this is the wrong list for  
discussing this (though Bioperl-db *is* one of those pieces of  
software that must be cross-RDBMS).

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From baucom at msg.ucsf.edu  Fri Mar 21 16:13:00 2008
From: baucom at msg.ucsf.edu (Albion Baucom)
Date: Fri, 21 Mar 2008 13:13:00 -0700
Subject: [Bioperl-l] SearchIO Performance
Message-ID: <AE9E136C-6938-41DF-A973-3F3A5F4CBA77@msg.ucsf.edu>

Hi. I am pretty new to BioPerl, and have a question about performance  
with regard to Blast (nucleotide) file parsing. My Blast result files  
usually have close to 100 or more sequence hits. Each sequence is  
about 1400 nucleotides long.

After profiling code I wrote, I find that calling the next_result()  
function after creating a search object takes substantially longer  
than non-OO, quick and dirty code I am using to parse the same Blast  
files.

What is substantially longer? Well, the existing code takes about 0.25  
seconds, and the BioPerl call takes about 4.5 seconds. I find that to  
be a dramatic difference, and that kind of time difference becomes  
significant when I have to parse 30 Blast files in a row. I understand  
that SearchIO is parsing the entire file and storing it all for easy  
retrieval later, and maybe this time penalty is what I have to pay for  
that convenience and organization.

I am just wondering if there is anything other than writing custom  
code based on BioPerl to speed this up. Something I might not be aware  
of that I can do ahead of time, or during parsing, to limit what is  
parsed, or facilitate the parsing process. For instance, is there a  
way to "look ahead" and simply parse alignments that meet a specific  
expectancy cutoff?

I confess I have not read the documentation thoroughly (although  
obviously enough to make it do what I want), but am certainly willing  
to do so if someone can point me in the right direction.

Thanks

Albion


From jason at bioperl.org  Fri Mar 21 17:40:00 2008
From: jason at bioperl.org (Jason Stajich)
Date: Fri, 21 Mar 2008 14:40:00 -0700
Subject: [Bioperl-l] SearchIO Performance
In-Reply-To: <AE9E136C-6938-41DF-A973-3F3A5F4CBA77@msg.ucsf.edu>
References: <AE9E136C-6938-41DF-A973-3F3A5F4CBA77@msg.ucsf.edu>
Message-ID: <8448D3AD-82BF-4471-9346-C27DDE95DB4D@bioperl.org>


On Mar 21, 2008, at 1:13 PM, Albion Baucom wrote:

> Hi. I am pretty new to BioPerl, and have a question about  
> performance with regard to Blast (nucleotide) file parsing. My  
> Blast result files usually have close to 100 or more sequence hits.  
> Each sequence is about 1400 nucleotides long.
>
> After profiling code I wrote, I find that calling the next_result()  
> function after creating a search object takes substantially longer  
> than non-OO, quick and dirty code I am using to parse the same  
> Blast files.
>
> What is substantially longer? Well, the existing code takes about  
> 0.25 seconds, and the BioPerl call takes about 4.5 seconds. I find  
> that to be a dramatic difference, and that kind of time difference  
> becomes significant when I have to parse 30 Blast files in a row. I  
> understand that SearchIO is parsing the entire file and storing it  
> all for easy retrieval later, and maybe this time penalty is what I  
> have to pay for that convenience and organization.
>
> I am just wondering if there is anything other than writing custom  
> code based on BioPerl to speed this up. Something I might not be  
> aware of that I can do ahead of time, or during parsing, to limit  
> what is parsed, or facilitate the parsing process. For instance, is  
> there a way to "look ahead" and simply parse alignments that meet a  
> specific expectancy cutoff?
>
> I confess I have not read the documentation thoroughly (although  
> obviously enough to make it do what I want), but am certainly  
> willing to do so if someone can point me in the right direction.
>
We are quite aware of the speed issues.  This is discussed on the  
wiki in brief detail.
http://bioperl.org/wiki/Why_BioPerl_is_slow

It boils down to the object creation, not the parsing (relatively
speaking).  It takes a while because we're creating a lot of objects
under the hood for each alignment.  Sendu has written a pull parser
that doesn't require creation of all the objects until the user
requests them.

As I've said in the past, if someone wrote a SearchIO event listener
that created lightweight objects (or just hashes) instead, this would
also provide a substantial speedup.

In the fall I did some experimentation with array-based instead of
hash-based feature objects and got a pretty decent speedup as well, but
I just haven't had any time to roll out a more substantial
prototype.  For the inner loops of things it may make sense to
substitute a less-flexible but super-fast object.

I always advocate thinking about what your needs are: if you just
want the start/stop of alignments, you can grab this out of a BLAST
tabular report, produced with -m9 (NCBI) or --mformat=3 (WUBLAST), and
you can write a fast parser that uses 'split'.
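
As a rough sketch of the 'split' approach, assuming the standard 12-column NCBI tabular layout (query id, subject id, % identity, alignment length, mismatches, gap openings, q. start, q. end, s. start, s. end, e-value, bit score) with '#' comment lines as written by -m9. The sub name parse_tab_line, the hash keys, the sample rows and the cutoff value are all made up for illustration:

```perl
use strict;
use warnings;

# Parse one line of NCBI BLAST tabular output (12 tab-separated columns).
# Returns a hash reference, or undef for blank lines, '#' comment lines
# and lines with too few columns. (Hypothetical helper, not BioPerl.)
sub parse_tab_line {
    my ($line) = @_;
    chomp $line;
    return if $line =~ /^\s*(?:#|$)/;
    my @f = split /\t/, $line;
    return unless @f >= 12;
    return {
        query   => $f[0],  hit    => $f[1],
        q_start => $f[6],  q_end  => $f[7],
        h_start => $f[8],  h_end  => $f[9],
        evalue  => $f[10], bits   => $f[11],
    };
}

# Made-up rows standing in for a real report.
my @lines = (
    '# Fields: Query id, Subject id, ...',
    join("\t", qw(q1 s1 98.50 200 3 0 1 200 301 500 1e-100 380)),
    join("\t", qw(q1 s2 80.00 50 10 0 5 54 10 59 0.5 30.1)),
);

my $cutoff = 1e-10;   # arbitrary e-value cutoff for the example
for my $line (@lines) {
    my $hsp = parse_tab_line($line) or next;
    next if $hsp->{evalue} > $cutoff;    # keep only significant alignments
    print join("\t", @$hsp{qw(query hit q_start q_end h_start h_end)}), "\n";
}
```

Because nothing but a small hash is built per line, filtering on the e-value column this way also gives a cheap answer to the cutoff question, at the cost of losing everything the tabular format does not carry (the alignments themselves, for instance).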


> Thanks
>
> Albion


From er at xs4all.nl  Fri Mar 21 17:43:47 2008
From: er at xs4all.nl (Erik)
Date: Fri, 21 Mar 2008 22:43:47 +0100 (CET)
Subject: [Bioperl-l] [BioSQL-l] postgres 8.3 - load_seqdatabase.pl /
 swissprot
Message-ID: <16589.156.83.1.157.1206135827.squirrel@webmail.xs4all.nl>

Hi,

PostgreSQL 8.3.1
DBD::Pg 2.3.0
perl 5.8.8

(The following error may have to do with the 8.3 problems
that I reported yesterday (bug 2472) - I don't know)

 I ran biosql-schema/scripts/load_ncbi_taxonomy.pl without
problem.

Then I ran scripts/biosql/load_seqdatabase.pl as:

perl scripts/biosql/load_seqdatabase.pl \
  -driver Pg \
  -dbuser xxxxxxx \
  -dbname bioseqdb \
  -namespace swissprot \
  -format swiss \
   /DATA/ms/ftp.ebi.ac.uk/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.dat

It took two hours to load 26504 records (7%) of
uniprot_sprot.dat (is it expected to be so slow?), then
failed with:

Could not store Q2UXW0:
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: create: object (Bio::Species) failed to insert or to
be found by unique key
STACK: Error::throw
STACK: Bio::Root::Root::throw
/home/aardvark/bin/perl/lib/site_perl/5.8.8/Bio/Root/Root.pm:357
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create
/home/aardvark/bin/perl/lib/site_perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:206
STACK: Bio::DB::Persistent::PersistentObject::create
/home/aardvark/bin/perl/lib/site_perl/5.8.8/Bio/DB/Persistent/PersistentObject.pm:244
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create
/home/aardvark/bin/perl/lib/site_perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:169
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store
/home/aardvark/bin/perl/lib/site_perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251
STACK: Bio::DB::Persistent::PersistentObject::store
/home/aardvark/bin/perl/lib/site_perl/5.8.8/Bio/DB/Persistent/PersistentObject.pm:271
STACK: scripts/biosql/load_seqdatabase.pl:630
-----------------------------------------------------------


I don't know if this is directly related to the 8.3
casting problems I reported yesterday (bug 2472), or a
separate Bio::Species issue


regards,

Erik Rijkers


From bix at sendu.me.uk  Fri Mar 21 19:17:59 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 21 Mar 2008 23:17:59 +0000
Subject: [Bioperl-l] SearchIO Performance
In-Reply-To: <8448D3AD-82BF-4471-9346-C27DDE95DB4D@bioperl.org>
References: <AE9E136C-6938-41DF-A973-3F3A5F4CBA77@msg.ucsf.edu>
	<8448D3AD-82BF-4471-9346-C27DDE95DB4D@bioperl.org>
Message-ID: <47E44227.3050002@sendu.me.uk>

Jason Stajich wrote:
> 
> On Mar 21, 2008, at 1:13 PM, Albion Baucom wrote:
> 
>> Hi. I am pretty new to BioPerl, and have a question about performance 
>> with regard to Blast (nucleotide) file parsing.
[...]
>> What is substantially longer? Well, the existing code takes about 0.25 
>> seconds, and the BioPerl call takes about 4.5 seconds. I find that to 
>> be a dramatic difference, and that kind of time difference becomes 
>> significant when I have to parse 30 Blast files in a row. I understand 
>> that SearchIO is parsing the entire file and storing it all for easy 
>> retrieval later, and maybe this time penalty is what I have to pay for 
>> that convenience and organization.
[...]
> Sendu has written a pull parser that 
> doesn't require creation of all the objects until the user requests them.
> As I've said in the past, if someone wrote SearchIO event-listener that 
> created lightweight objects (or just hashes) instead this would also 
> provide a substantial speedup.

Yeah, you'll need BioPerl 1.5.2 (or the latest from svn) and to set the
format to 'blast_pull'. Depending on the circumstances, and with thoughtful
usage, you can see orders-of-magnitude speedups.

http://doc.bioperl.org/bioperl-live/Bio/SearchIO/blast_pull.html

The only disadvantage compared with the normal parser is that the pull
parser currently only supports NCBI BLASTN and BLASTP.


From hlapp at gmx.net  Sat Mar 22 14:18:45 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 22 Mar 2008 14:18:45 -0400
Subject: [Bioperl-l] Call for Student Applications - NESCent participates in
	the Google Summer of Code
In-Reply-To: <0025B440-EF1E-4632-9DB4-B98489BF3550@duke.edu>
Message-ID: <5AC4F213-8D88-41C6-B380-59B2EF7831F0@gmx.net>

Hi all - just wanted to draw your attention to our Google Summer of  
Code participation this year. One of the projects deals directly with  
BioPerl, another one builds on BioSQL (and could be implemented  
taking advantage of BioPerl or Bio::Phylo, or Biojava).

Cheers,

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================

Phyloinformatics Summer of Code 2008
http://phyloinformatics.net/Phyloinformatics_Summer_of_Code_2008

*** Please disseminate this announcement widely to appropriate students
at your institution ***

The National Evolutionary Synthesis Center (NESCent:
http://www.nescent.org/) is participating in 2008 for the second year as a
mentoring organization in the Google Summer of Code
(http://code.google.com/soc). Through this program, Google provides  
undergraduate, masters, and PhD students with a unique opportunity to  
obtain hands-on experience writing and extending open-source software  
under the mentorship of experienced developers from around the world.

Our goal in participating is to train future researchers and  
developers to not only have awareness and understanding of the value  
of open-source and collaboratively developed software, but also to  
gain the programming and remote collaboration skills needed to  
successfully contribute to such projects. Students will receive a  
stipend from Google, and may work from their home, or home  
institution, for the duration of the 3 month program. Students will  
each have one or more dedicated mentors with expertise in  
phylogenetic methods and open-source software development.

NESCent is particularly targeting students interested in both  
evolutionary biology and software development. Project ideas (see URL  
below) range from visualizing phylogenetic data in R, to development  
of a Mesquite module, web-services for phylogenetic data providers or  
geophylogeny mashups, implementing phyloXML support, navigating  
databases of networks, topology queries for PhyloCode registries, to  
phylogenetic tree mining in a MapReduce framework, and more.

The project ideas are flexible and many can be adjusted in scope to  
match the skills of the student. If the program sounds interesting to  
you but you are unsure whether you have the necessary skills, please  
email the mentors at the address below.  We will work with you to  
find a project that fits your interests and skills.

INQUIRIES:
Email any questions, including self-proposed project ideas, to
phylosoc {at} nescent {dot} org.

TO APPLY:
Apply on-line at the Google Summer of Code website
(http://code.google.com/soc/2008), where you will also find GSoC program
rules and eligibility requirements.  The 1-week application period for
students opens on Monday March 24th and runs through Monday, March  
31st, 2008.

Hilmar Lapp and Todd Vision
US National Evolutionary Synthesis Center

=====
URLs:
=====

2008 NESCent Phyloinformatics Summer of Code:
http://phyloinformatics.net/Phyloinformatics_Summer_of_Code_2008

Eligibility requirements:
http://code.google.com/opensource/gsoc/2008/faqs.html#0.1_eligibility

Stipends:
http://code.google.com/opensource/gsoc/2008/faqs.html#0.1_administrivia

To sign up for quarterly NESCent newsletters with announcements about
upcoming programs at the Center:
http://www.nescent.org/about/contact.php


From hlapp at gmx.net  Sat Mar 22 15:30:07 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 22 Mar 2008 15:30:07 -0400
Subject: [Bioperl-l] postgres 8.3 - load_seqdatabase.pl / swissprot
In-Reply-To: <16589.156.83.1.157.1206135827.squirrel@webmail.xs4all.nl>
References: <16589.156.83.1.157.1206135827.squirrel@webmail.xs4all.nl>
Message-ID: <14191FB6-A8DF-4F60-9FF7-FDAA8F3974B8@gmx.net>

Hi Erik,

I suspect that's a separate Bio::Species issue. If you query your  
BioSQL database for the existence of the taxon:

SELECT * FROM taxon WHERE ncbi_taxon_id = 326939;

do you get a result? If not, then for some reason the taxon wasn't  
yet included in the NCBI taxonomy that you loaded.

If yes, then somehow BioPerl didn't properly parse out the taxonID  
from the record. There should have been another message preceding the  
error below; could you post that too? Otherwise, can you rerun with
--printerror as a command line argument?

Note also that you can always specify --safe to go past any loading  
error. In fact that's what I recommend doing unless you want to debug  
why a particular record doesn't load.

BTW I would recommend that you restore the CASTs that were removed in  
Pg 8.3; otherwise you may hit random issues in Bioperl-db whenever a  
parameter value for a string-type column happens to be a number.  
(taxon.ncbi_taxon_id is of type integer)

See http://people.planetpostgresql.org/peter/index.php?/archives/18-Readding-implicit-casts-in-PostgreSQL-8.3.html

as per Greg's email.

	-hilmar

On Mar 21, 2008, at 5:43 PM, Erik wrote:
> Hi,
>
> PostgreSQL 8.3.1
> DBD::Pg 2.3.0
> perl 5.8.8
>
> (The following error may have to do with the 8.3 problems
> that I reported yesterday (bug 2472) - I don't know)
>
>  I ran biosql-schema/scripts/load_ncbi_taxonomy.pl without
> problem.
>
> Then I ran scripts/biosql/load_seqdatabase.pl as:
>
> perl scripts/biosql/load_seqdatabase.pl \
>   -driver Pg \
>   -dbuser xxxxxxx \
>   -dbname bioseqdb \
>   -namespace swissprot \
>   -format swiss \
>    /DATA/ms/ftp.ebi.ac.uk/pub/databases/uniprot/current_release/ 
> knowledgebase/complete/uniprot_sprot.dat
>
> It took two hours to load 26504 records (7%) of
> uniprot_sprot.dat (is it expected to be so slow?), then
> failed with:
>
> Could not store Q2UXW0:
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: create: object (Bio::Species) failed to insert or to
> be found by unique key
> STACK: Error::throw
> STACK: Bio::Root::Root::throw
> /home/aardvark/bin/perl/lib/site_perl/5.8.8/Bio/Root/Root.pm:357
> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create
> /home/aardvark/bin/perl/lib/site_perl/5.8.8/Bio/DB/BioSQL/ 
> BasePersistenceAdaptor.pm:206
> STACK: Bio::DB::Persistent::PersistentObject::create
> /home/aardvark/bin/perl/lib/site_perl/5.8.8/Bio/DB/Persistent/ 
> PersistentObject.pm:244
> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create
> /home/aardvark/bin/perl/lib/site_perl/5.8.8/Bio/DB/BioSQL/ 
> BasePersistenceAdaptor.pm:169
> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store
> /home/aardvark/bin/perl/lib/site_perl/5.8.8/Bio/DB/BioSQL/ 
> BasePersistenceAdaptor.pm:251
> STACK: Bio::DB::Persistent::PersistentObject::store
> /home/aardvark/bin/perl/lib/site_perl/5.8.8/Bio/DB/Persistent/ 
> PersistentObject.pm:271
> STACK: scripts/biosql/load_seqdatabase.pl:630
> -----------------------------------------------------------
>
>
> I don't know if this is directly related to the 8.3
> casting problems I reported yesterday (bug 2472), or a
> separate Bio::Species issue
>
>
> regards,
>
> Erik Rijkers
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From hlapp at gmx.net  Sat Mar 22 16:01:51 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 22 Mar 2008 16:01:51 -0400
Subject: [Bioperl-l] [BioSQL-l] postgres 8.3 - load_seqdatabase.pl /
	swissprot
In-Reply-To: <16589.156.83.1.157.1206135827.squirrel@webmail.xs4all.nl>
References: <16589.156.83.1.157.1206135827.squirrel@webmail.xs4all.nl>
Message-ID: <69D3EA33-810B-40EA-8687-752FA1A34FBF@gmx.net>

Forgot to respond to this:

On Mar 21, 2008, at 5:43 PM, Erik wrote:
> It took two hours to load 26504 records (7%) of uniprot_sprot.dat  
> (is it expected to be so slow?)


When I last loaded those regularly it was a bit faster (~5 seqs/s),  
but your rate is in a ballpark that wouldn't raise a red flag for me.

BTW you can make it print statistics using the --logchunk N option,  
where N is the number of seqs after which you want the current count  
and the #recs/s printed.
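
To illustrate what such a per-chunk report boils down to, here is a rough sketch in plain Perl. It is not the code in load_seqdatabase.pl; the name make_rate_logger and the chunk size of 1000 are made up for the example.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Build a counter that reports throughput every $chunk records.
# Illustration only; this is not how load_seqdatabase.pl is written.
sub make_rate_logger {
    my ($chunk, $out) = @_;
    my ($count, $last) = (0, time);
    return sub {
        $count++;
        return $count if $count % $chunk;   # not at a chunk boundary yet
        my $now     = time;
        my $elapsed = $now - $last;
        my $rate    = $elapsed > 0 ? $chunk / $elapsed : 0;
        printf {$out} "%d records, %.1f recs/s\n", $count, $rate;
        $last = $now;
        return $count;
    };
}

# Usage: call the logger once per stored sequence.
my $log = make_rate_logger(1000, \*STDERR);
$log->() for 1 .. 3000;
```

The rate is computed per chunk rather than since the start, so a slowdown partway through a long load shows up right away.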

You may get it to be faster if you tune the database (e.g., make sure  
there is enough memory for index reorganization, transaction log and  
tablespace datafile are on separate disks, etc; fiddling with the  
query optimizer has probably little effect as almost all queries are  
simple lookups or inserts).

That all said, the strength of load_seqdatabase.pl isn't speed. It  
doesn't make use of any bulk upload optimizations, and therefore the  
initial load of a very large database will take its time. The power  
is more in subsequent updates where you can configure what you want  
to happen, and during which the database is never in an inconsistent  
state, so it can run in the background.

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From er at xs4all.nl  Sat Mar 22 16:34:14 2008
From: er at xs4all.nl (Erik)
Date: Sat, 22 Mar 2008 21:34:14 +0100 (CET)
Subject: [Bioperl-l] postgres 8.3 - load_seqdatabase.pl / swissprot
Message-ID: <5975.156.83.1.157.1206218054.squirrel@webmail.xs4all.nl>

On Sat, March 22, 2008 20:30, Hilmar Lapp wrote:

> SELECT * FROM taxon WHERE ncbi_taxon_id = 326939;

No, I don't seem to have that particular id, although I
ran the ncbi load script yesterday just before the sprot. 
Btw, in the meantime I figured out that it was a parsing
error choking on an unexpected period.

You asked for preceding errors, but there were none.

I have now restarted the same uniprot_sprot.dat load with
--safe, which if I understand you correctly will just skip
any non-parsable records.


And wrt the postgres 8.3 casting:

I only added the first cast of the list of Peter Eisentraut:

CREATE FUNCTION pg_catalog.text(integer) RETURNS text
  STRICT IMMUTABLE LANGUAGE SQL
  AS 'SELECT textin(int4out($1));'; --added 20080322
CREATE CAST (integer AS text)
  WITH FUNCTION pg_catalog.text(integer) AS IMPLICIT; --added 20080322

I hope eventually a more durable solution will be found -
I fear this reinstalling of old casting functionality will
generate unexpected problems of its own.

But it seems a good intermediary solution; with it, the
previously failing t/16odba.t succeeds...


Thank you,

Erik Rijkers


From hlapp at gmx.net  Sat Mar 22 17:16:18 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 22 Mar 2008 17:16:18 -0400
Subject: [Bioperl-l] postgres 8.3 - load_seqdatabase.pl / swissprot
In-Reply-To: <5975.156.83.1.157.1206218054.squirrel@webmail.xs4all.nl>
References: <5975.156.83.1.157.1206218054.squirrel@webmail.xs4all.nl>
Message-ID: <3C253027-5A2B-4C0C-9DF6-A0DA84CC96A8@gmx.net>


On Mar 22, 2008, at 4:34 PM, Erik wrote:
> On Sat, March 22, 2008 20:30, Hilmar Lapp wrote:
>
>> SELECT * FROM taxon WHERE ncbi_taxon_id = 326939;
>
> No, I don't seem to have that particular id, although I
> ran the ncbi load script yesterday just before the sprot.

Odd. It's on the NCBI taxonomy browser. Maybe it was just added the  
other day?

> Btw, in the meantime I figured out that it was a parsing
> error choking on an unexpected period.

Do you want to report that to the BioPerl category on
bugzilla.open-bio.org?

>
> You asked for preceding errors, but there were none.
>
> I have now restarted the same uniprot_sprot.dat load with
> --safe, which if I understand you correctly will just skip
> any non-parsable records.

And all records that cause some other database error when inserting.  
Note that if you didn't erase the previously loaded records, you will  
either need to choose a new namespace, or, better, use the --lookup  
and --noupdate flags.

	-hilmar

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From mrphysh at juno.com  Sat Mar 22 18:39:22 2008
From: mrphysh at juno.com (mrphysh at juno.com)
Date: Sat, 22 Mar 2008 22:39:22 GMT
Subject: [Bioperl-l] these objects are pretty cool
Message-ID: <20080322.163922.21808.1@webmail01.vgs.untd.com>


I am starting to understand how to use the objects.  

I am the sort who wants to understand how things work, at least on some level.  I think my understanding would be increased with knowledge of the actual contents of the object.  My book says they are hashes and that makes sense:  field-value.......field-value.......field-value 

But as far as I can tell they cannot be taken apart like a regular hash.

How can I print out the contents?  this makes sense to me:
xxxxxxxxxxxxxxxx
use Bio::SeqIO;

#these objects were made for file input...conversion...file output

while ( my $seq = $out->next_seq() ) {print "$seq\n"; }                        

#or..how about this?

foreach ( my $seqq = $in->next_seq() ) {print "$seqq\n"; }   
xxxxxxxxxxxxxx
this gives back:

Bio::Seq::RichSeq=HASH(0x860dcdc)
Bio::Seq=HASH(0x85f5a20)

Is there some way to slice up the object and look at the parts?

John 


From jason at bioperl.org  Sat Mar 22 18:46:46 2008
From: jason at bioperl.org (Jason Stajich)
Date: Sat, 22 Mar 2008 15:46:46 -0700
Subject: [Bioperl-l] these objects are pretty cool
In-Reply-To: <20080322.163922.21808.1@webmail01.vgs.untd.com>
References: <20080322.163922.21808.1@webmail01.vgs.untd.com>
Message-ID: <DA99BA50-B604-4DE1-BE43-194E91236F5F@bioperl.org>

it's got methods that you need to call to get the data.  Did you try  
looking at any of the howtos - they discuss this sort of thing.

http://bioperl.org/wiki/HOWTOs

-jason
On Mar 22, 2008, at 10:39 PM, mrphysh at juno.com wrote:

>
> I am starting to understand how to use the objects.
>
> I am the sort who wants to understand how things work, at least on  
> some level.  I think my understanding wold be increased with  
> knowledge of the actual contents of the object.  My book says they  
> are hashes and that makes sense:  field-value.......field- 
> value.......field-value
>
> But as far as I can tell they cannot be taken apart like a regular  
> hash.
>
> How can I print out the contents?  this makes sense to me:
> xxxxxxxxxxxxxxxx
> use Bio::SeqIO;
>
> #these objects were made for file input...conversion...file output
>
> while ( my $seq = $out->next_seq() ) {print "$seq\n"; }
>
> #or..how about this?
>
> foreach ( my $seqq = $in->next_seq() ) {print "$seqq\n"; }
> xxxxxxxxxxxxxx
> this gives back:
>
> Bio::Seq::RichSeq=HASH(0x860dcdc)
> Bio::Seq=HASH(0x85f5a20)
>
> Is there some way to slice up the object and look at the parts?
>
> John


From robfsouza at gmail.com  Sat Mar 22 19:11:49 2008
From: robfsouza at gmail.com (Robson Francisco de Souza)
Date: Sat, 22 Mar 2008 20:11:49 -0300
Subject: [Bioperl-l] these objects are pretty cool
In-Reply-To: <DA99BA50-B604-4DE1-BE43-194E91236F5F@bioperl.org>
References: <20080322.163922.21808.1@webmail01.vgs.untd.com>
	<DA99BA50-B604-4DE1-BE43-194E91236F5F@bioperl.org>
Message-ID: <af6a4f100803221611w393e2dd1r965c11afd45978aa@mail.gmail.com>

Hi John,

The methods usually are designed to provide simple means to access the
object's internal data structure. They are intended to hide the
internal data structure, replacing the need to understand it by a
binding to the documented class methods.

If the documentation does not satisfy you and you still want to take a
look at its data structure, try dumping the object with code like

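use Bio::SeqIO;
use Data::Dumper;
# 'sequences.gb' and 'genbank' are placeholders for your own file and format
my $in = Bio::SeqIO->new(-file => 'sequences.gb', -format => 'genbank');
while ( my $seq = $in->next_seq() ) { print Dumper($seq),"\n"; }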

and check out chapters four and five of Programming Perl.
Best,
Robson

PS: watch out for lots of printed output...
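
To see the general idea without any BioPerl at all, here is a small sketch using a toy class. Toy::Seq is made up for this example and is not a BioPerl module; it only shows that an object is a blessed hash reference, that printing it gives the Class=HASH(0x...) string, and that Data::Dumper shows what is inside.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical toy class, not a BioPerl module: a blessed hash with accessors.
package Toy::Seq;
sub new {
    my ($class, %args) = @_;
    return bless { _id => $args{-id}, _seq => $args{-seq} }, $class;
}
sub id  { $_[0]->{_id} }
sub seq { $_[0]->{_seq} }

package main;

my $obj = Toy::Seq->new(-id => 'seq1', -seq => 'ATGC');
print "$obj\n";                 # stringifies to Toy::Seq=HASH(0x...)
print $obj->id, "\n";           # the supported route: call a method
$Data::Dumper::Sortkeys = 1;    # stable key order in the dump
print Dumper($obj);             # the raw hash behind the object
```

The keys you see in a dump are internal and can change between releases, so use the dump to satisfy curiosity and the documented methods in your scripts.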

2008/3/22, Jason Stajich <jason at bioperl.org>:
> it's got methods that you need to call to get the data.  Did you try
>  looking at any of the howtos - they discuss this sort of thing.
>
>  http://bioperl.org/wiki/HOWTOs
>
>
>  -jason
>
> On Mar 22, 2008, at 10:39 PM, mrphysh at juno.com wrote:
>
>  >
>  > I am starting to understand how to use the objects.
>  >
>  > I am the sort who wants to understand how things work, at least on
>  > some level.  I think my understanding wold be increased with
>  > knowledge of the actual contents of the object.  My book says they
>  > are hashes and that makes sense:  field-value.......field-
>  > value.......field-value
>  >
>  > But as far as I can tell they cannot be taken apart like a regular
>  > hash.
>  >
>  > How can I print out the contents?  this makes sense to me:
>  > xxxxxxxxxxxxxxxx
>  > use Bio::SeqIO;
>  >
>  > #these objects were made for file input...conversion...file output
>  >
>  > while ( my $seq = $out->next_seq() ) {print "$seq\n"; }
>  >
>  > #or..how about this?
>  >
>  > foreach ( my $seqq = $in->next_seq() ) {print "$seqq\n"; }
>  > xxxxxxxxxxxxxx
>  > this gives back:
>  >
>  > Bio::Seq::RichSeq=HASH(0x860dcdc)
>  > Bio::Seq=HASH(0x85f5a20)
>  >
>  > Is there some way to slice up the object and look at the parts?
>  >
>  > John


From er at xs4all.nl  Sat Mar 22 19:36:13 2008
From: er at xs4all.nl (Erik)
Date: Sun, 23 Mar 2008 00:36:13 +0100 (CET)
Subject: [Bioperl-l] postgres 8.3 - load_seqdatabase.pl / swissprot
Message-ID: <12669.156.83.1.157.1206228973.squirrel@webmail.xs4all.nl>

Hi Hilmar,

> either need to choose a new namespace, or, better, use the
> --lookup and --noupdate flags.

scripts/biosql/load_seqdatabase.pl is now churning along
to load uniprot_sprot.dat.  I'll try to gather up the
rejected records for further inspection / parser
improvement.

The next thing is performance, it's really intolerably
slow, and I don't think the database is the bottleneck -
isn't it more likely bioperl object heaviness?  I get
continuous near 100% load for 1 cpu (this machine has 2
cpus).

I could give it 10 or more processors; I am thinking I
could cut up the input into 10 (or more) chunks.

Is there anything specific in bioperl/biosql that knows
how to use multiple cores?
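
The chunking I have in mind would be something like the sketch below. It assumes every record in the flat file ends with a line containing only "//" (as in uniprot_sprot.dat); split_records and the ".part" file naming are made up for illustration. Whether several load_seqdatabase.pl processes can safely load into the same database at once is something I have not established.

```perl
use strict;
use warnings;

# Split a Swiss-Prot style flat file into $n chunk files, round-robin,
# keeping each record (terminated by a "//" line) intact.
sub split_records {
    my ($infile, $n, $prefix) = @_;
    open(my $in, '<', $infile) or die "Cannot read $infile: $!";
    my @out;
    for my $i (0 .. $n - 1) {
        open($out[$i], '>', "$prefix.$i") or die "Cannot write $prefix.$i: $!";
    }
    local $/ = "//\n";              # one read returns one whole record
    my $count = 0;
    while (my $record = <$in>) {
        print { $out[$count % $n] } $record;
        $count++;
    }
    close($_) or die "Close failed: $!" for @out;
    close($in);
    return $count;
}

# Usage: perl split_sprot.pl uniprot_sprot.dat 10
if (@ARGV == 2) {
    my $total = split_records($ARGV[0], $ARGV[1], "$ARGV[0].part");
    print "Wrote $total records\n";
}
```

Round-robin keeps the chunks close to equal in record count without a first pass to count the records.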

thank you very much for your help

Erik Rijkers


From hlapp at gmx.net  Sat Mar 22 21:40:55 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 22 Mar 2008 21:40:55 -0400
Subject: [Bioperl-l] postgres 8.3 - load_seqdatabase.pl / swissprot
In-Reply-To: <12669.156.83.1.157.1206228973.squirrel@webmail.xs4all.nl>
References: <12669.156.83.1.157.1206228973.squirrel@webmail.xs4all.nl>
Message-ID: <1B55060F-534D-4732-B428-4863DD098654@gmx.net>


On Mar 22, 2008, at 7:36 PM, Erik wrote:
> The next thing is performance, it's really intolerably
> slow, and I don't think the database is the bottleneck -
> isn't it more likely bioperl object heaviness?  I get
> continuous near 100% load for 1 cpu (this machine has 2
> cpus).


Is the database on the same machine? If yes, and a significant  
fraction (~30-50% or even more) of the load is generated by the perl  
script, rather than almost everything coming from the postmaster,  
then indeed the database is not the bottleneck.

Of course, the bioperl object creation overhead takes a toll too. I  
would be surprised though if BioPerl can't parse more than 3.6  
records/s on a modern CPU; you can convince yourself of that though  
by writing a simple script along the lines of the following and see  
how fast that goes:

use Bio::SeqIO;

my $seqio = Bio::SeqIO->new(-file => '<uniprot_sprot.dat', -format => 'swiss');
my $n = 0;
while (my $seq = $seqio->next_seq) {
	$n++;
	# print something every 5,000 sequences or so
	print STDERR "$n sequences parsed\n" if $n % 5000 == 0;
}

But maybe load_seqdatabase.pl or even BioSQL or BioPerl aren't  
suitable for your use-case?

	-hilmar

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cjfields at uiuc.edu  Sun Mar 23 10:09:56 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 23 Mar 2008 09:09:56 -0500
Subject: [Bioperl-l] Using Bioperl book
In-Reply-To: <0AFC1124-EE57-4168-BBF3-98845F0138C9@bioperl.org>
References: <OF134946F0.4C537CC5-ONC1257411.00542F3C-C1257411.00549AFA@LGLimagrain.com>
	<0AFC1124-EE57-4168-BBF3-98845F0138C9@bioperl.org>
Message-ID: <4C401D4F-064C-43F9-A37C-14FA65A96657@uiuc.edu>

Maybe something to discuss at BOSC?

chris

On Mar 19, 2008, at 12:54 PM, Jason Stajich wrote:

> it's probably more than 6 months out. We still haven't finished  
> writing it as life and work continues to intrude on book writing.
>
> -jason
> On Mar 19, 2008, at 8:22 AM, Jorge.DUARTE at biogemma.com wrote:
>
>> Hello,
>>
>> i just found on amazon something about a book "Using Bioperl",  
>> published
>> on the 1st of March 2008 but which is no more available.
>>
>> Does anyone know how to get it ?
>>
>> Many thanks,
>>
>> Jorge.
>>
>> ---
>> Jorge Duarte
>> Bioinformatics Software Engineer
>> BIOGEMMA
>> Z.I. Du Brézet
>> 8, Rue des Frères Lumière
>> 63028 CLERMONT FERRAND Cedex 2
>> FRANCE
>> Tel : +33 (0)4 73 39 60 73
>> Fax : +33 (0)4 73 39 60 71
>> E-mail : jorge.duarte at biogemma.com
>>
>> *****************************************************************
>>       For any support request, please include
>> BIOGEMMA_BioInfo_Service or bioinfo at biogemma.com
>>         among the recipients on first contact
>> *****************************************************************

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Sun Mar 23 10:17:56 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 23 Mar 2008 09:17:56 -0500
Subject: [Bioperl-l] Priorities for a bioperl-1.6 release
In-Reply-To: <47DFE089.1070304@sendu.me.uk>
References: <OF9C92ED19.F51E3401-ON852573F0.006310A3-852573F0.006383D1@gsk.com>
	<47DFE089.1070304@sendu.me.uk>
Message-ID: <FC757880-6FD8-44A5-97BE-F5F37A738C2B@uiuc.edu>


On Mar 18, 2008, at 10:32 AM, Sendu Bala wrote:

> aaron.j.mackey at gsk.com wrote:
>>> Or is the split intended to be 'core' == "anything and everything
>>> that was in 1.4", '????' == "everything else"? In which case,
>>> what's a good name for "modules created after 1.4"? 'crust'? ;)
>> Nah, "icing".
>> a module "use" map might be very useful to help identify "core" vs.
>> other layers of mantle/crust/icing.
>> http://www.perlmonks.org/?node_id=87329 http://search.cpan.org/src/NEILB/pmusage-1.2/
>
> Thanks for those. Neither could quite cope with BioPerl, but I've  
> munged
> them together and hacked up 'module_usage.pl' which I've just  
> committed
> to the maintenance directory of bioperl-live.
>
> module_usage.pl ../Bio
>
> Produces:
> *warning, may crash your browser; download it and view in a dedicated
> image viewer*
> http://bix.sendu.me.uk/files/module_usage.jpeg
> http://bix.sendu.me.uk/files/module_usage.txt
>
> ...
>
> I haven't done any full analysis along these lines and leave as an
> exercise for the interested reader for now ;)

I'm coming into this late (just got back) but I agree, this would be  
very useful.  Your updates based on Aaron's comments help quite a bit.

> Chris Fields wrote:
>> http://www.bioperl.org/wiki/Talk:Proposed_1.6_core_modules
>> I'm pretty flexible on any of that; it's a proposal only and I think
>> some of it may be wrongheaded, but hey, I'm willing to take a few
>> rotten tomatoes.  The key issue is we should try to work out what we
>> mean by 'core' or the core library.  I have a rather extreme view of
>> it as being the bare essentials without external, non-perl core
>> dependencies (only SeqI/PrimarySeqI, AlignI, AnnotationI, SeqFeatureI
>> and required modules for those classes) but I'm sure others would
>> lump in parsers, DB functionality, etc.  I basically suggest placing
>> those (and any stable but potentially non-core code) in a
>> 'bioperl-main', with any unstable or untested code going into a
>> 'bioperl-unstable'.
>
> My thoughts are along these lines:
> # I agree that core should have no external dependencies
> # I agree that it might mostly be interfaces
> # It should represent a framework with all the interfaces (that have
>  stable APIs), directory structure and base classes that everything
>  else relies on
> # It might not do much useful bioinformatics, but provides just about
>  everything needed for a dev to create a new module that does

Yes, that's essentially the idea.

>> In essence, bioperl-main would require core and resemble a stable
>> release; bioperl-unstable would require bioperl-main (and core) and
>> resemble a dev release.  Not sure how versioning would go or if this
>> is a viable option at all, but it's worth discussing.
>
> # I agree that this 3-way split seems reasonable
> # bioperl-main would consist primarily of the 'leaves' of the module
>  tree, mostly parsers and the like which, whilst 'stable' and tested
>  should still be split away from core because the data sources they
>  parse could change format slightly
> # bioperl-unstable, better bioperl-bleed, would feature brand-new
>  stuff, be it new parsers for totally new formats, new APIs that do
>  something not thought of before etc. When they are complete, bug-free
>  and have stood the test of time they get moved into bioperl-main.
>  (It is not a place for all new commits; bug fixes to something in
>  bioperl-main would be committed to bioperl-main)
> # The current splits (bioperl-run, bioperl-network etc.) do not get
>  their own core and bleed variant. Anything they need for core
>  functionality would enter the single bioperl-core, anything new
>  would enter the single bioperl-bleed, and anything stable would
>  be in their own bioperl-[package]
>
> Discuss :)

We can work on updating the plan via the wiki as well as the mail  
list.  I find it easier to track; we can always link back to the mail  
list when needed.

http://www.bioperl.org/wiki/Proposed_1.6_core_modules
http://www.bioperl.org/wiki/Talk:Proposed_1.6_core_modules

chris


From er at xs4all.nl  Sun Mar 23 14:16:05 2008
From: er at xs4all.nl (Erik)
Date: Sun, 23 Mar 2008 19:16:05 +0100 (CET)
Subject: [Bioperl-l] postgres 8.3 - load_seqdatabase.pl / swissprot
Message-ID: <8382.156.83.1.157.1206296165.squirrel@webmail.xs4all.nl>

On Sun, March 23, 2008 02:40, Hilmar Lapp wrote:
> But maybe load_seqdatabase.pl or even BioSQL or BioPerl
> aren't suitable for your use-case?

well, that may turn out to be the case, but I'm not quite
deterred yet.

I am in a situation like many others, I think: microarray,
mass spec, and chipseq (Solexa) data all need
annotation, and while it is easy to retrieve some useful
records from public data sources (entrez, ensembl &
biomart, etc.), it is not so easy to have such high
atomicity in the locally stored annotation data that
fine-grained filtering and sorting on a sql level becomes
possible.  I hope the bioperl parsers, together with the
biosql schema, will give SQL access to all or most data
bits.

And I understand GBrowse can run on top of BioSQL/Pg too,
albeit somewhat preliminary; this is another usage I will
need.

btw, should not all those references to postgres 7.3 be
upgraded to something newer, like 8.2.7 (maybe not yet 8.3
heh) ?  7.3 is not supported anymore by the pg project.

Sprot loaded in 20 hours. Only 170 were rejected - not too
bad.

Thanks,

Erik Rijkers


From hlapp at gmx.net  Sun Mar 23 15:22:46 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sun, 23 Mar 2008 15:22:46 -0400
Subject: [Bioperl-l] postgres 8.3 - load_seqdatabase.pl / swissprot
In-Reply-To: <8382.156.83.1.157.1206296165.squirrel@webmail.xs4all.nl>
References: <8382.156.83.1.157.1206296165.squirrel@webmail.xs4all.nl>
Message-ID: <EF4031EF-FBB5-493E-8D3C-8530E2ADFC20@gmx.net>


On Mar 23, 2008, at 2:16 PM, Erik wrote:
> On Sun, March 23, 2008 02:40, Hilmar Lapp wrote:
>> But maybe load_seqdatabase.pl or even BioSQL or BioPerl
>> aren't suitable for your use-case?
>
> well, that may turn out to be the case, but I'm not quite
> deterred yet.
>
> I am in a situation like many others, I think: microarray,
> mass spec, and chipseq (Solexa) data all need
> annotation,and while it is easy to retrieve some useful
> records from public data sources (entrez, ensembl &
> biomart, etc.), it is not so easy to have such high
> atomicity in the locally stored annotation data that
> fine-grained filtering and sorting on a sql level becomes
> possible.  I hope the bioperl parsers, together with the
> biosql schema, will give SQL access to all or most data
> bits.

If you mean annotation by data bits then yes, it should be fairly  
normalized (possibly more normalized than you want, in fact).

Also, using BioSQL as the sequence and sequence annotation model add-on
to some other database holding your lab data is what many others  
have used it for too.

>
> And I understand GBrowse can run on top of BioSQL/Pg too,
> albeit somewhat preliminary; this is another usage I will
> need.

It can, though keep in mind that that's not the use-case it (BioSQL)  
was built for. If you need to have rapid access to genome intervals  
with 10s of thousands of features and their annotation, you'll have to  
start thinking about a more de-normalized data store to run this off  
of, such as populating a native GBrowse GFF store.

>
> btw, should not all those references to postgres 7.3 be
> upgraded to something newer, like 8.2.7 (maybe not yet 8.3
> heh) ?  7.3 is not supported anymore by the pg project.

Oops, indeed. Where are they?

>
> Sprot loaded in 20 hours. Only 170 were rejected - not too
> bad.

That's great. Would be nice if you can provide some rough summary as  
to why they were rejected (if that's obvious), such as taxon errors,  
or other errors.

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From sac at bioperl.org  Sun Mar 23 18:20:43 2008
From: sac at bioperl.org (Steve Chervitz)
Date: Sun, 23 Mar 2008 15:20:43 -0700
Subject: [Bioperl-l] HitTableWriter error
In-Reply-To: <088B032A-E2EE-4E88-8F4C-A206D2230910@leicester.ac.uk>
References: <088B032A-E2EE-4E88-8F4C-A206D2230910@leicester.ac.uk>
Message-ID: <8f200b4c0803231520o5082f9f5mf7be8cd061faa98f@mail.gmail.com>

Hi Owen

Sorry, I don't have time to look into this right now, but two thoughts:

1) The ResultTableWriter is intended to convert standard blast reports into
a tabular format. So if you already have tabular results, you are sort of
using it "off label", but in principle, it should work.

2) The iteration method is only available to BlastHit objects, since it is
only relevant to PSI-blast results. The fact that you got this error when
working with blast results indicates that the parser did not generate the
correct object type for your hits, using the GenericHit when it should have
used BlastHit. This is just a hunch and would be worth following up on. Can
you submit this as a bug report?
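
Until the parser issue is tracked down, a calling script can guard against the missing method rather than die. The following is only a sketch in plain Perl: the two mock packages are hypothetical stand-ins for hit objects (they are not BioPerl classes), and iteration_or_default() is an illustrative helper name. It relies only on the standard can() method that every Perl object inherits.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for hit objects; these are NOT BioPerl classes.
{
    package My::MockGenericHit;
    sub new { return bless {}, shift }
}
{
    package My::MockBlastHit;
    sub new       { return bless { iteration => 2 }, shift }
    sub iteration { return $_[0]{iteration} }
}

# Return the iteration number if the object supports it, else a default.
sub iteration_or_default {
    my ($hit, $default) = @_;
    return $hit->can('iteration') ? $hit->iteration : $default;
}

print iteration_or_default(My::MockGenericHit->new, 'n/a'), "\n";
print iteration_or_default(My::MockBlastHit->new,   'n/a'), "\n";
```

This only sidesteps the exception; it does not address why the wrong hit class is created.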

Thanks,
Steve


On Thu, Mar 20, 2008 at 4:56 AM, Owen Lancaster <ol8 at leicester.ac.uk> wrote:

> Hello
>
> I hope you don't mind me emailing you but I have come across a problem
> when trying to use HitTableWriter. The error can be seen below - the
> input for the script is the BLAST tabular output (specified with the -
> m 8 option) from a blastn search.
>
> If you have any idea what the problem might be I would much appreciate
> it! Hope you can help...
>
> Thanks
>
> Owen
>
>
> Using default column map.
>
> ------------- EXCEPTION  -------------
> MSG: Trouble in ResultTableWriter::_set_row_data_func() eval: Can't
> locate object method "iteration" via package
> "Bio::Search::Hit::GenericHit" at (eval 97) line 1, <GEN1> line 2.
>
>
>
> STACK Bio::SearchIO::Writer::ResultTableWriter::__ANON__ /Library/Perl/
> 5.8.8/Bio/SearchIO/Writer/ResultTableWriter.pm:328
> STACK Bio::SearchIO::Writer::HitTableWriter::to_string /Library/Perl/
> 5.8.8/Bio/SearchIO/Writer/HitTableWriter.pm:268
> STACK Bio::SearchIO::write_result /Library/Perl/5.8.8/Bio/SearchIO.pm:
> 331
> STACK Bio::SearchIO::blast::write_result /Library/Perl/5.8.8/Bio/
> SearchIO/blast.pm:2208
> STACK toplevel ./generate_discordant_tails.pl:62
>
> --------------------------------------
>
>


From greg at turnstep.com  Sun Mar 23 20:42:36 2008
From: greg at turnstep.com (Greg Sabino Mullane)
Date: Mon, 24 Mar 2008 00:42:36 -0000
Subject: [Bioperl-l] [BioSQL-l] postgres 8.3 will not cast text to
	integer any longer
In-Reply-To: <C24DE5CA-F433-48A1-BF08-A6D056A2EBCE@gmx.net>
Message-ID: <4ab14dcc59d7566b55ba87027055e9fd@biglumber.com>


-----BEGIN PGP SIGNED MESSAGE-----
Hash: RIPEMD160


>> Depending on what I (or can someone else update us on this?) find out
>> for the DBD::Pg plans, I'll probably start looking into moving the
>> parameter binding into the driver adapters. Though it does feel
>> pathetic that this is now also not transparent between drivers.
>
> What you are probably looking for is already there, namely:
>
> $dbh->{pg_server_prepare} = 0;

> So disabling server-side prepares will leave values quoted? Having
> server-side prepares would be very useful though, especially for
> Bioperl-db with its many lookup queries that all use similar
> parameter values.

Yes, it forces DBD::Pg to do the quoting itself, which basically means
that everything is shipped to the server as a single SQL string, and
no placeholders are used. In the grand scheme of things, the speed
difference is not large for most queries. Certainly one way would be
to turn this on for 8.3 and above, and slowly migrate the queries/schema
over time.
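
As a rough sketch of "turn this on for 8.3 and above": the helper below switches off server-side prepares only when the server reports version 8.3 or newer. It assumes a DBD::Pg handle, whose pg_server_version attribute gives the version as an integer (80300 for 8.3.0). The helper name is made up for illustration, and plain hash references stand in for real handles so the example runs without a database.

```perl
use strict;
use warnings;

# Turn off server-side prepares when the server is 8.3 or newer.
# $dbh is expected to be a DBD::Pg database handle (assumption); here any
# hash reference with the same two keys behaves the same way.
sub relax_prepares_for_strict_servers {
    my ($dbh) = @_;
    my $version = $dbh->{pg_server_version} || 0;
    $dbh->{pg_server_prepare} = 0 if $version >= 80300;
    return $dbh->{pg_server_prepare};
}

# Plain hashes stand in for real handles so the sketch needs no database.
my %pg82 = (pg_server_version => 80207, pg_server_prepare => 1);
my %pg83 = (pg_server_version => 80300, pg_server_prepare => 1);

print "8.2.7: pg_server_prepare = ", relax_prepares_for_strict_servers(\%pg82), "\n";
print "8.3.0: pg_server_prepare = ", relax_prepares_for_strict_servers(\%pg83), "\n";
```

In a real script the call would go right after DBI->connect, before any statement is prepared.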

>> There's good reasons for the casting enforcement in 8.3

> I do understand that, but it's also a sharp contrast to other RDBMSs
> that doesn't make it easier for people to choose Pg when they
> should, and doesn't help writing cross-platform database applications
> either.

I'm not overly familiar with how other databases treat this, but I've
heard DB2 can be a stickler about this too. I've not dug into the bioperl
code in a while, to be honest, so I'm not sure what sort of queries we're
talking about. Certainly long-term the code and schema should move away
from implicit casting. Maybe a better short-term solution is adding
the more obvious casts (e.g. text<->int) back in.

> Do you have links to some of the key threads showing what rationale
> went into the decision? (Or should I just search for your name?) I'd
> like to read up on that first before pouring more oil into the fire.
> I suspect that many of those who made the decision are never faced
> with needing to write cross-RDBMS code.
>
> Also, I wonder why this wasn't made a configurable option so it can
> be disabled by a simple config file change (such as the move away
> from automatic OID columns). But obviously this is the wrong list for
> discussing this (though Bioperl-db *is* one of those pieces of
> software that must be cross-RDBMS).

I did ask about that, and was told it would not have been easy to do so.
But I agree, a phasing in period (heck, even a warning) would have been
nice. Feel free to pour some oil on the fire, I think this is one of
many apps that has been affected. (I've run across two other major
cross-DB apps (Interchange and MediaWiki) that are struggling with the
same pain. I managed to painfully fix the latter, but the former is way
too complex to tackle at the moment).

I could not find the thread(s?) I weighed in on, but you can find some
relevant discussions by googling "strict-typing benefits grokbase"

- --
Greg Sabino Mullane greg at turnstep.com
PGP Key: 0x14964AC8 200803232039
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
-----BEGIN PGP SIGNATURE-----

iEYEAREDAAYFAkfm+NAACgkQvJuQZxSWSsi4ogCdGNWvCJIzXxb+YKzdm6wwxQMv
p3AAnizkWXoo/rvxv4KVdC8tD0vF87k3
=dNYi
-----END PGP SIGNATURE-----


From er at xs4all.nl  Sun Mar 23 20:45:49 2008
From: er at xs4all.nl (Erik)
Date: Mon, 24 Mar 2008 01:45:49 +0100 (CET)
Subject: [Bioperl-l] postgres 8.3 - load_seqdatabase.pl / swissprot
Message-ID: <19067.156.83.1.157.1206319549.squirrel@webmail.xs4all.nl>

On Sun, March 23, 2008 20:22, Hilmar Lapp wrote:
>
> On Mar 23, 2008, at 2:16 PM, Erik wrote:
>> Sprot loaded in 20 hours. Only 170 were rejected - not
>> too bad.
>
> That's great. Would be nice if you can provide some rough
> summary as to why they were rejected (if that's obvious),
> such as taxon errors, or other errors.

see

   http://bugzilla.open-bio.org/show_bug.cgi?id=2474

So I think one easy improvement will be to enlarge that 
varchar(40) column, dbxref.accession. See the following:

select
	dbname
,	accession
,	length(accession)
from 	dbxref
where accession ~ 'Cyc'
order by length(accession) desc
limit 100

patch attached. (which will probably get bug 2389 resolved)

It seems to me bioentry.accession (maybe identifier too?)
needs a similar enlargement.


thanks,

Erikjan
-------------- next part --------------
A non-text attachment was scrubbed...
Name: biosqldb-pl.sql.diff
Type: application/octet-stream
Size: 535 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20080324/a564d7ee/attachment-0002.obj>

From Russell.Smithies at agresearch.co.nz  Wed Mar 26 22:13:16 2008
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Thu, 27 Mar 2008 15:13:16 +1300
Subject: [Bioperl-l] Bioinformatician wanted
Message-ID: <D5DBA313349A4B458528BE63B387F36C06A09345@imail.agresearch.co.nz>

Dear colleagues,

 
It would be appreciated if you could bring the following position to the
attention of potential candidates.

 
Bioinformatician wanted


Many of the world's most amazing scientific discoveries are the result
of someone 'taking a closer look'. It's this inquisitive nature and
relentless search for answers that fuels scientific advancement. And
it's also what we'd like you to apply to us, right now. Mind you, you
won't need to look too hard to discover that AgResearch is the best
place to break all new ground in your career.  


At first glance you'll see we are New Zealand's largest research
institute - world leaders in pastoral research working at the
leading-edge of innovation. Scratch the surface a little more and you'll
find all the diversity and intellectual challenge a Bioinformatician
could ask for. This is a highly collaborative role where you'll be
involved in everything from the analysis of genomic data to the design,
development, implementation and testing of bioinformatics tools.
Knowledge sharing is a pivotal component of our success, so you can also
look forward to acting in a consultant capacity (both internally and
externally) and the autonomy to contribute to scientific publications. 

 
Our people are at the pinnacle in their professions, so with your
biological background, higher qualification in bioinformatics or
computing and your experience in contributing bioinformatics expertise
to research groups, you'll not only fit right in, you'll hit the ground
running. An outstanding communicator, time manager and relationship
builder, you'll also come to us with a thorough knowledge of Unix,
pipeline-development, web based technologies and scripting and
programming languages. 

 
AgResearch is a unique organisation at the forefront of our field, and
as far as your future's concerned, that makes us well worth a closer
look. There are many benefits waiting to be discovered here, so isn't it
time you experienced them? The job description is available online and
applications are invited at www.agresearch.co.nz/recruitment/ Reference
AGR661, or contact Nauman Maqbool for further information.

 
Applications close 11 April 2008.

 
Regards,
Russell


Russell Smithies 
Bioinformatics Applications Developer 
T +64 3 489 9085 
E  russell.smithies at agresearch.co.nz 
AgResearch Farming Food and Health. First 
Te Ahuwhenua, Te Kai me te Whai Ora. Tuatahi 
Invermay  Research Centre 
Puddle Alley, 
Mosgiel, 
New Zealand 
T  +64 3 489 3809   
F  +64 3 489 9174  
www.agresearch.co.nz 


=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================


From Marc.Logghe at ablynx.com  Thu Mar 27 09:26:24 2008
From: Marc.Logghe at ablynx.com (Marc Logghe)
Date: Thu, 27 Mar 2008 14:26:24 +0100
Subject: [Bioperl-l] Bio::Search::HSP::GenericHSP::seq_inds
Message-ID: <03C512635899144083CADB0EE222018901800955@alpaca.lan.ablynx.com>

Hi all,

I am a little bit confused about the above mentioned seq_inds() method.
At first, I had the impression that the method returns an array of
positions in the hsp (hit or query) sequence.

At least that is what one would expect looking at the example usage in
the HOWTOs (http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_the_methods
second code block).

Am I correct in believing you can only do this if your hsp query stretch
starts at position 1 of the query sequence?

I think seq_inds() returns a list of positions relative to the query/hit
sequence. So, the code shown in the HOWTO is a kind of special case.

However, I do not understand how seq_inds() is dealing with gaps.

An example. If you blast the worm protein ZK822.4 against swissprot
using blastp at ncbi you get this hsp as top:

 
>sp|Q5VT52|K0460_HUMAN Uncharacterized protein KIAA0460
Length=1461
 
 Score = 35.8 bits (81),  Expect = 0.48, Method: Composition-based stats.
 Identities = 22/55 (40%), Positives = 32/55 (58%), Gaps = 3/55 (5%)
 
Query  402  IAVEEETKTTKKNKKQ-QQQANKNKNKNKKK--TTIAPEAAIDANIAAEVHTQVL  453
            +A+ E   TT K +KQ ++  NK  NK  KK  T+  P+AA+ + I AE  +Q L
Sbjct  139  VALREALSTTFKTQKQLKENLNKQPNKQWKKSQTSTNPKAALKSKIVAEFRSQAL  193

 
Now, if you call seq_inds(query => 'gap') on that particular hsp object,
you get these positions: 417, 431, 432. Obviously, there is no gap in
the original query sequence at these positions. 
How should these numbers be read? Note also that 417, for instance, is
the residue just in front of the gap.

Regards,

Marc

 
From bix at sendu.me.uk  Thu Mar 27 10:46:35 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 27 Mar 2008 14:46:35 +0000
Subject: [Bioperl-l] Bio::Search::HSP::GenericHSP::seq_inds
In-Reply-To: <03C512635899144083CADB0EE222018901800955@alpaca.lan.ablynx.com>
References: <03C512635899144083CADB0EE222018901800955@alpaca.lan.ablynx.com>
Message-ID: <47EBB34B.8010606@sendu.me.uk>

Marc Logghe wrote:
> Hi all,
> 
> I am a little bit confused about the above mentioned seq_inds() method.
> At first, I had the impression that the method returns an array of
> positions in the hsp (hit or query) sequence.

Yes...


> At least that is what one would expect looking at the example usage in
> the HOWTOs (http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_the_methods
> second code block).
> 
> Am I correct in believing you can only do this if your hsp query stretch
> starts at position 1 of the query sequence?

No...


> Query  402  IAVEEETKTTKKNKKQ-QQQANKNKNKNKKK--TTIAPEAAIDANIAAEVHTQVL  453
>             +A+ E   TT K +KQ ++  NK  NK  KK  T+  P+AA+ + I AE  +Q L
> Sbjct  139  VALREALSTTFKTQKQLKENLNKQPNKQWKKSQTSTNPKAALKSKIVAEFRSQAL  193
>
> Now, if you call seq_inds(query => 'gap') on that particular hsp object,
> you get these positions: 417, 431, 432. Obviously, there is no gap in
> the original query sequence at these positions. 
> How do you have to read these numbers ? Remark also that for instance
> 417 is the res just in front of the gap.

Its purpose is to let you know the position in query or subject 
coordinates where something interesting happened in the alignment. So 
seq_inds(query => 'gap') is telling you all the places that a gap starts 
in the alignment in terms of the query coordinates. Hence 417 etc.


(Actually, does 432 make sense? Shouldn't it be 431 twice?)


From Marc.Logghe at ablynx.com  Thu Mar 27 11:09:56 2008
From: Marc.Logghe at ablynx.com (Marc Logghe)
Date: Thu, 27 Mar 2008 16:09:56 +0100
Subject: [Bioperl-l] Bio::Search::HSP::GenericHSP::seq_inds
In-Reply-To: <47EBB34B.8010606@sendu.me.uk>
References: <03C512635899144083CADB0EE222018901800955@alpaca.lan.ablynx.com>
	<47EBB34B.8010606@sendu.me.uk>
Message-ID: <03C512635899144083CADB0EE2220189018009E1@alpaca.lan.ablynx.com>

Hi Sendu, Chris

> > At least that is what one would expect looking at the example usage in
> > the HOWTOs (http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_the_methods
> > second code block).
> >
> > Am I correct in believing you can only do this if your hsp query stretch
> > starts at position 1 of the query sequence?
> 
> No...

<HOWTO>
# put all the conserved matches in query strand into an array
my @str_array = split "",$hsp->query_string;
foreach ( $hsp->seq_inds('query','conserved') ){
  push @conserved,$str_array[$_ - 1];
}
</HOWTO>

$hsp->query_string will return
'IAVEEETKTTKKNKKQ-QQQANKNKNKNKKK--TTIAPEAAIDANIAAEVHTQVL'

In my example using the 'gap' class (instead of 'conserved'), seq_inds()
returns 417, 431 and 432. The corresponding off-by-one indices (416, 430
and 431) do not exist in @str_array, which has only 55 elements.
Therefore, I still think the HOWTO shows a special case where the hsp
query sequence starts at 1 (compared to 402 in my particular example).
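
One way to make such a lookup independent of where the HSP starts is to walk the aligned string and count residues from the HSP start coordinate. A minimal sketch in plain Perl follows; residue_at() is a hypothetical helper, not a BioPerl method, and the start value 402 is hard-coded from the alignment above (in real code it would presumably come from the HSP object).

```perl
use strict;
use warnings;

# Return the residue found at sequence coordinate $pos, given the aligned
# (gapped) string of an HSP and the coordinate of its first residue.
sub residue_at {
    my ($aligned, $start, $pos) = @_;
    my $coord = $start - 1;
    for my $char (split //, $aligned) {
        next if $char eq '-';          # gaps do not consume a coordinate
        $coord++;
        return $char if $coord == $pos;
    }
    return;                            # $pos lies outside this HSP
}

my $query = 'IAVEEETKTTKKNKKQ-QQQANKNKNKNKKK--TTIAPEAAIDANIAAEVHTQVL';

# prints: I Q Q T L
print join(' ', map { residue_at($query, 402, $_) } 402, 417, 418, 432, 453), "\n";
```

Scanning the string on every call is slow for long alignments; building the coordinate-to-column map once would be the obvious refinement.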


> 
> 
> > Query  402  IAVEEETKTTKKNKKQ-QQQANKNKNKNKKK--TTIAPEAAIDANIAAEVHTQVL  453
> >             +A+ E   TT K +KQ ++  NK  NK  KK  T+  P+AA+ + I AE  +Q L
> > Sbjct  139  VALREALSTTFKTQKQLKENLNKQPNKQWKKSQTSTNPKAALKSKIVAEFRSQAL  193
> >
> > Now, if you call seq_inds(query => 'gap') on that particular hsp object,
> > you get these positions: 417, 431, 432. Obviously, there is no gap in
> > the original query sequence at these positions.
> > How do you have to read these numbers ? Remark also that for instance
> > 417 is the res just in front of the gap.
>
> Its purpose is to let you know the position in query or subject
> coordinates where something interesting happened in the alignment. So
> seq_inds(query => 'gap') is telling you all the places that a gap starts
> in the alignment in terms of the query coordinates. Hence 417 etc.

So this means you have to read that as "a gap comes after 417"?

> 
> 
> (Actually, does 432 make sense? Shouldn't it be 431 twice?)
Don't know, depends on how you have to 'read' this.
Thanks for looking into this.
Regards,
Marc


From cjfields at uiuc.edu  Thu Mar 27 11:05:59 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 27 Mar 2008 10:05:59 -0500
Subject: [Bioperl-l] Bio::Search::HSP::GenericHSP::seq_inds
In-Reply-To: <03C512635899144083CADB0EE222018901800955@alpaca.lan.ablynx.com>
References: <03C512635899144083CADB0EE222018901800955@alpaca.lan.ablynx.com>
Message-ID: <489252B3-2255-45C3-9219-A8F8A0978B89@uiuc.edu>

According to the GenericHSP::seq_inds() POD, seq_inds() reports
residue positions (indices) for the query/subject based on
identity/conservation, i.e. these are the original sequence positions
as determined by the HSP data, not alignment column positions. 'gaps'
should be reported at the position prior to where a gap is inserted.
However, I think something is getting borked when the gap length is
longer than one, so I would partially qualify this as a bug.

Example: When I ran this using bioperl-live it gives a different set  
of gaps indices which appear to be correct.  I reran the BLASTP using  
the web form using your query against swissprot and parsed it.  I got  
slightly different results for the BLAST report (probably differences  
in the query sequence):

 >gi|74746888|sp|Q5VT52|K0460_HUMAN Uncharacterized protein KIAA0460
Length=1461

  Score = 35.8 bits (81),  Expect = 0.47, Method: Composition-based stats.
  Identities = 22/55 (40%), Positives = 32/55 (58%), Gaps = 3/55 (5%)

Query  394  IAVEEETKTTKKNKKQ-QQQANKNKNKNKKK--TTIAPEAAIDANIAAEVHTQVL  445
            +A+ E   TT K +KQ ++  NK  NK  KK  T+  P+AA+ + I AE  +Q L
Sbjct  139  VALREALSTTFKTQKQLKENLNKQPNKQWKKSQTSTNPKAALKSKIVAEFRSQAL  193

.....

seq_inds('query' => 'gaps') reports 409,423, and 424, which is  
partially correct, e.g. there is a gap inserted after position 409 and  
423 in the query.  However, no gap is present after 424; I think this  
occurs b/c the gap length is 2.  The other HSPs report similar problems.

chris

P.S. Just saw that Sendu posted; I agree, seq. positions with gap
lengths > 1 should be repeated.  Should be easy to fix that.

On Mar 27, 2008, at 8:26 AM, Marc Logghe wrote:

> Hi all,
>
> I am a little bit confused about the above mentioned seq_inds() method.
> At first, I had the impression that the method returns an array of
> positions in the hsp (hit or query) sequence.
>
> At least that is what one would expect looking at the example usage in
> the HOWTOs (http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_the_methods
> second code block).
>
> Am I correct in believing you can only do this if your hsp query stretch
> starts at position 1 of the query sequence?
>
> I think seq_inds() returns a list of positions relative to the query/hit
> sequence. So, the code shown in the HOWTO is a kind of special case.
>
> However, I do not understand how seq_inds() is dealing with gaps.
>
> An example. If you blast the worm protein ZK822.4 against swissprot
> using blastp at ncbi you get this hsp as top:
>
>
>
> >sp|Q5VT52|K0460_HUMAN Uncharacterized protein KIAA0460
> Length=1461
>
>  Score = 35.8 bits (81),  Expect = 0.48, Method: Composition-based stats.
>  Identities = 22/55 (40%), Positives = 32/55 (58%), Gaps = 3/55 (5%)
>
> Query  402  IAVEEETKTTKKNKKQ-QQQANKNKNKNKKK--TTIAPEAAIDANIAAEVHTQVL  453
>             +A+ E   TT K +KQ ++  NK  NK  KK  T+  P+AA+ + I AE  +Q L
> Sbjct  139  VALREALSTTFKTQKQLKENLNKQPNKQWKKSQTSTNPKAALKSKIVAEFRSQAL  193
>
>
>
> Now, if you call seq_inds(query => 'gap') on that particular hsp object,
> you get these positions: 417, 431, 432. Obviously, there is no gap in
> the original query sequence at these positions.
> How do you have to read these numbers ? Remark also that for instance
> 417 is the res just in front of the gap.
>
> Regards,
>
> Marc
>
>
>
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Thu Mar 27 12:04:20 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 27 Mar 2008 11:04:20 -0500
Subject: [Bioperl-l] Bio::Search::HSP::GenericHSP::seq_inds
In-Reply-To: <03C512635899144083CADB0EE2220189018009E1@alpaca.lan.ablynx.com>
References: <03C512635899144083CADB0EE222018901800955@alpaca.lan.ablynx.com>
	<47EBB34B.8010606@sendu.me.uk>
	<03C512635899144083CADB0EE2220189018009E1@alpaca.lan.ablynx.com>
Message-ID: <FF47D72B-7CBA-491B-9C9B-9B32FD2AA8BE@uiuc.edu>


On Mar 27, 2008, at 10:09 AM, Marc Logghe wrote:

> Hi Sendu, Chris
>
>>> At least that is what one would expect looking at the example usage in
>>> the HOWTOs (http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_the_methods
>>> second code block).
>>>
>>> Am I correct in believing you can only do this if your hsp query stretch
>>> starts at position 1 of the query sequence?
>>
>> No...
>
> <HOWTO>
> # put all the conserved matches in query strand into an array
> my @str_array = split "",$hsp->query_string;
> foreach ( $hsp->seq_inds('query','conserved') ){
>  push @conserved,$str_array[$_ - 1];
> }
> </HOWTO>
>
> $hsp->query_string will return
> 'IAVEEETKTTKKNKKQ-QQQANKNKNKNKKK--TTIAPEAAIDANIAAEVHTQVL'
>
> In my example using the 'gap' class (instead of 'conserved'),  
> @str_array
> will contain 417, 431 and 432. The off-by-one indices do not exist in
> that array.
> Therefore, I still think the howto shows a special case where the hsp
> query sequence starts at 1 (compared to 402 in my particular example).

We'll have to look at it; it should probably be clarified particularly  
in reference to 'gaps' and use of seq positions vs. HSP (or alignment)  
positions.

Think of it this way; seq_inds() takes 'identical', 'conserved', etc.,  
all of which refer to the original positions (indices) of the sequence  
which fall into the particular category asked for.  In these cases we  
are using the coordinates for query/hit directly from the HSP info in  
the report.  This is done with the express purpose of mapping  
attributes back to the original sequence, be it the query or subject.

Gaps, however, are tricky, since sequence coordinates refer to  
residues (not gaps) when using BLAST.  In this case we use the  
sequence position prior to the gap to note where a gap is inserted.   
The previous results, then, would be wrong as there is no gap inserted  
after 432.  I just committed a fix which just repeats the position  
based on the number of gaps.
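
To make the convention concrete, here is a small stand-alone sketch in plain Perl of the behaviour described above: each gap character is reported at the coordinate of the residue before it, so a gap of length two yields the same position twice. gap_positions() is a hypothetical helper written for this illustration; it is not the code that was committed to bioperl-live.

```perl
use strict;
use warnings;

# For every gap character in the aligned string, report the coordinate of
# the last residue seen before it. $start is the coordinate of the first
# residue in the HSP.
sub gap_positions {
    my ($aligned, $start) = @_;
    my $pos = $start - 1;              # coordinate of the last residue seen
    my @gaps;
    for my $char (split //, $aligned) {
        if ($char eq '-') {
            push @gaps, $pos;          # repeated once per gap character
        }
        else {
            $pos++;
        }
    }
    return @gaps;
}

my $query = 'IAVEEETKTTKKNKKQ-QQQANKNKNKNKKK--TTIAPEAAIDANIAAEVHTQVL';
my @gaps  = gap_positions($query, 402);

# prints: 417, 431, 431
print join(', ', @gaps), "\n";
```

With the start coordinate 394 from the rerun report, the same string gives 409, 423, 423.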

>>> Query  402  IAVEEETKTTKKNKKQ-QQQANKNKNKNKKK--TTIAPEAAIDANIAAEVHTQVL  453
>>>             +A+ E   TT K +KQ ++  NK  NK  KK  T+  P+AA+ + I AE  +Q L
>>> Sbjct  139  VALREALSTTFKTQKQLKENLNKQPNKQWKKSQTSTNPKAALKSKIVAEFRSQAL  193
>>>
>>> Now, if you call seq_inds(query => 'gap') on that particular hsp object,
>>> you get these positions: 417, 431, 432. Obviously, there is no gap in
>>> the original query sequence at these positions.
>>> How do you have to read these numbers ? Remark also that for instance
>>> 417 is the res just in front of the gap.
>>
>> Its purpose is to let you know the position in query or subject
>> coordinates where something interesting happened in the alignment. So
>> seq_inds(query => 'gap') is telling you all the places that a gap starts
>> in the alignment in terms of the query coordinates. Hence 417 etc.
>
> So, this means you have to interpret that as a gap is coming after 417 ?

Yes.

>> (Actually, does 432 make sense? Shouldn't it be 431 twice?)
> Don't know, depends on how you have to 'read' this.
> Thanks for looking into this.
> Regards,
> Marc

Repeating the position based on the number of gaps is now the default  
in bioperl-live.  Just working on fixing problems with collapsing  
numbers and tests and everything should be fine.

chris


From hiekeen at gmail.com  Sat Mar 29 12:09:18 2008
From: hiekeen at gmail.com (Jinyan Huang)
Date: Sun, 30 Mar 2008 00:09:18 +0800
Subject: [Bioperl-l] Gene Id converts.
Message-ID: <fb5dae380803290909o6523f71hc479570846594f3@mail.gmail.com>

Hi,

I have a list of GenBank accession IDs. I want to convert these IDs to NCBI IDs.

For example:

From NM_011917 to 2919914.

How can I do it?

Thanks


-- 
Best regards,
Jinyan Huang (ekeen)
School of Life Sciences and Technology, 1302 Room
Tongji University
Siping Road 1239, Shanghai 200092
P.R. China
Tel :0086-21-65981041
Msn: hiekeen at hotmail.com
eMail: hiekeen at gmail.com


From cjfields at uiuc.edu  Sat Mar 29 13:42:50 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 29 Mar 2008 12:42:50 -0500
Subject: [Bioperl-l] Gene Id converts.
In-Reply-To: <fb5dae380803290909o6523f71hc479570846594f3@mail.gmail.com>
References: <fb5dae380803290909o6523f71hc479570846594f3@mail.gmail.com>
Message-ID: <FC7E8E14-83E7-4611-868B-0D1390488C82@uiuc.edu>

There are the GenBank LiveLists (updated every Sunday), which have
accession/version/UID mappings for nucleotide and protein GenBank
records. I haven't used them personally, but they're worth a look:

ftp://ftp.ncbi.nih.gov/genbank/livelists/

There is also gene2accession, which contains mappings between
accession and UID (though this is more EntrezGene-related, I believe):

ftp://ftp.ncbi.nih.gov/gene/DATA/

Both have documentation detailing formats.  I would recommend using  
one of the above two on a local database setup if you plan on  
converting a large number of accessions.
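
For a modest number of accessions, the local approach can be as simple as loading the mapping into a hash. The sketch below assumes a tab-delimited file and takes the accession and UID column indices as arguments, because the actual column layout should be taken from the documentation that ships with those files. load_mapping() is an illustrative name, and the in-memory sample holds only the accession/UID pair from the question.

```perl
use strict;
use warnings;

# Build an accession => UID lookup from tab-delimited lines.
# $acc_col and $uid_col are zero-based column indices chosen by the caller.
sub load_mapping {
    my ($fh, $acc_col, $uid_col) = @_;
    my %uid_for;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^#/ || $line !~ /\S/;    # skip comments and blanks
        my @field = split /\t/, $line;
        my $acc = $field[$acc_col];
        next unless defined $acc && defined $field[$uid_col];
        $acc =~ s/\.\d+$//;                        # drop any version suffix
        $uid_for{$acc} = $field[$uid_col];
    }
    return \%uid_for;
}

# In-memory stand-in for a real mapping file (assumed two-column layout).
my $sample = "#accession\tuid\nNM_011917\t2919914\n";
open my $fh, '<', \$sample or die "cannot open in-memory file: $!";
my $uid_for = load_mapping($fh, 0, 1);
close $fh;

print "NM_011917 => $uid_for->{NM_011917}\n";
```

For millions of rows a hash may not fit in memory, which is where a proper local database becomes worthwhile.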

Bio::DB::EUtilities can also do this but is web-based via eutils.
There are a couple of stub examples in the Cookbook HOWTO under
'efetch' on converting accessions to UIDs (and vice versa), though note
there is no one-to-one correspondence. You can also convert UIDs to
accessions using 'esummary', but the converse (accession to GI)
requires, strangely, using efetch to grab the UIDs first, then
re-retrieving the accessions via esummary for one-to-one correspondence.

http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook

chris

On Mar 29, 2008, at 11:09 AM, Jinyan Huang wrote:

> Hi,
>
> I have a list of gene bank accession id. I want to convert these ids  
> to NCBI id.
>
> For example:
>
>> From NM_011917 to 2919914.
>
> How can I do it?
>
> Thanks
>
>
> -- 
> Best regards,
> Jinyan Huang (ekeen)
> School of Life Sciences and Technology, 1302 Room
> Tongji University
> Siping Road 1239, Shanghai 200092
> P.R. China
> Tel :0086-21-65981041
> Msn: hiekeen at hotmail.com
> eMail: hiekeen at gmail.com

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From pradel.d at gmail.com  Mon Mar 31 10:35:26 2008
From: pradel.d at gmail.com (Damien Pradel)
Date: Mon, 31 Mar 2008 16:35:26 +0200
Subject: [Bioperl-l] Bio::SeqIO - Error in ID value collection
Message-ID: <7db1df730803310735n5777aa08s50d763a62c74d050@mail.gmail.com>

Hello,

I use the SeqIO module to parse EMBL files.
Unfortunately I ran into a problem: the ID is not recognised, and instead
of the ID value I get "unknown_id".

To solve this problem I modified the file embl.pm, located in the
SeqIO directory, at line 189, from:

if( $line =~ /^ID\s+(\S+)\s+\S+\;\s+([^;]+)\;\s+(\S+)\;/ ) {
    ($name,$mol,$div) = ($1,$2,$3);
}
unless( defined $name && length($name) ) {
    $name = "unknown_id";
}

to:

if( $line =~ /^ID\s+(.+?)\;\s+([^;]+)\;\s+(\S+)\;/ ) {
    ($name,$mol,$div) = ($1,$2,$3);
}
unless( defined $name && length($name) ) {
    $name = "unknown_id";
}

With this modification, the ID value is correctly collected.
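
The two patterns can be compared side by side with a short stand-alone script. The two ID lines below are illustrative only: they are assumed examples of the older and newer EMBL ID line layouts and were not taken from this thread. In this sketch the original pattern fails on the newer-style line, while the modified pattern recovers the ID there but puts "SV 1" and "linear" into $mol and $div, and on the older-style line it pulls the data class into the name.

```perl
use strict;
use warnings;

my $old_re = qr/^ID\s+(\S+)\s+\S+\;\s+([^;]+)\;\s+(\S+)\;/;   # original pattern
my $new_re = qr/^ID\s+(.+?)\;\s+([^;]+)\;\s+(\S+)\;/;         # modified pattern

# Illustrative ID lines (assumed layouts, not taken from this thread).
my $old_line = 'ID   TRBG361    standard; RNA; PLN; 1859 BP.';
my $new_line = 'ID   X56734; SV 1; linear; mRNA; STD; PLN; 1859 BP.';

# Apply a pattern the same way the snippet above does, falling back to
# "unknown_id" when nothing was captured.
sub parse_id {
    my ($re, $line) = @_;
    my ($name, $mol, $div);
    if ($line =~ $re) {
        ($name, $mol, $div) = ($1, $2, $3);
    }
    $name = 'unknown_id' unless defined $name && length $name;
    return ($name, $mol, $div);
}

for my $case (['original', $old_re], ['modified', $new_re]) {
    for my $line ($old_line, $new_line) {
        my ($name, $mol, $div) = parse_id($case->[1], $line);
        printf "%-8s name=[%s] mol=[%s] div=[%s]\n",
            $case->[0], $name, $mol // '', $div // '';
    }
}
```

So a single pattern is unlikely to serve both layouts; each probably needs its own handling.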

Hope it will help.

Damien


From golharam at umdnj.edu  Mon Mar 31 15:31:56 2008
From: golharam at umdnj.edu (Ryan Golhar)
Date: Mon, 31 Mar 2008 15:31:56 -0400
Subject: [Bioperl-l] Very basic implementation of GenBank XML SeqIO module
Message-ID: <47F13C2C.4070909@umdnj.edu>

I have a (very) basic SAX implementation of a SeqIO module to parse 
GenBank XML records.  Right now, it only reads in basic information 
regarding the sequence and the sequence itself.

It does not yet parse the features table.  Should I submit it to be 
included in bioperl or wait until I implement more for the features 
table?  I'm not sure when I'll get around to it, though.

Ryan


From cjfields at uiuc.edu  Mon Mar 31 16:05:51 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 31 Mar 2008 15:05:51 -0500
Subject: [Bioperl-l] Very basic implementation of GenBank XML SeqIO
	module
In-Reply-To: <47F13C2C.4070909@umdnj.edu>
References: <47F13C2C.4070909@umdnj.edu>
Message-ID: <4A3D5CD8-13D7-4CBF-B89A-CE81B8804C61@uiuc.edu>

You can submit it either to me directly or to bugzilla (start a new  
bug report as an enhancement request, then attach the relevant files).

Does it have a test suite available?  If not, you should try setting  
one up:

http://www.bioperl.org/wiki/HOWTO:Writing_BioPerl_Tests

chris

On Mar 31, 2008, at 2:31 PM, Ryan Golhar wrote:

> I have a (very) basic SAX implementation of a SeqIO module to parse  
> GenBank XML records.  Right now, it only reads in basic information  
> regarding the sequence and the sequence itself.
>
> It does not yet parse the features table.  Should I submit it to be  
> included in bioperl or wait until I implement more for the features  
> table?  I'm not sure when I'll get around to it though
>
> Ryan


From cjfields at uiuc.edu  Mon Mar 31 19:58:44 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 31 Mar 2008 18:58:44 -0500
Subject: [Bioperl-l] Bio::SeqIO - Error in ID value collection
In-Reply-To: <7db1df730803310735n5777aa08s50d763a62c74d050@mail.gmail.com>
References: <7db1df730803310735n5777aa08s50d763a62c74d050@mail.gmail.com>
Message-ID: <629B8FAF-3A1F-41E2-BFF1-A709DDE56A09@uiuc.edu>

The parser no longer has this line; it has been updated to work with  
both old and new format EMBL.  You might want to try updating from  
Subversion or install the nightly build.

http://bioperl.org/DIST/nightly_builds/

chris

On Mar 31, 2008, at 9:35 AM, Damien Pradel wrote:
> Hello,
>
> I use the SeqIO module in order to  parse EMBL files.
> Unfortunately I got a problem: the ID was not recognised because  
> instead of
> the ID value  I get the answer "unknown_id" ...
>
> So to solve this problem I have modified the file embl.pm located in
> directory SeqIO at the line 189 as follow :
>
> if( $line =~ /^ID\s+(\S+)\s+\S+\;\s+([^;]+)\;\s+(\S+)\;/ ) {
>       ($name,$mol,$div) = ($1,$2,$3);
>   }
>   unless( defined $name && length($name) ) {
>       $name = "unknown_id";
>   }
>
> in :
> if( $line =~ /^ID\s+(.+?)\;\s+([^;]+)\;\s+(\S+)\;/ ) {
>       ($name,$mol,$div) = ($1,$2,$3);
>   }
>   unless( defined $name && length($name) ) {
>       $name = "unknown_id";
>   }
>
> With this modification, the ID value is correctly collected.
>
> Hope it will help.
>
> Damien

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From dfog22 at hotmail.com  Wed Mar 26 10:13:25 2008
From: dfog22 at hotmail.com (MathGon)
Date: Wed, 26 Mar 2008 07:13:25 -0700 (PDT)
Subject: [Bioperl-l]  File concatenation
Message-ID: <16301515.post@talk.nabble.com>


For my first post, I will introduce myself. I'm a PhD student in microbiology
focusing on horizontal gene transfer in hyperthermophilic Archaea.

I retrieve a GenBank file for each contig of an unfinished genome. I want to
produce a single GenBank file by concatenation. I didn't manage to find such a
script and I'm not experienced enough in Perl to write it...
Do you have another solution or a script for me?
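
One possible starting point, assuming that "a single file" means a multi-record GenBank file (each contig kept as its own record, one after another): since every GenBank record ends with a "//" line, plain concatenation is enough. The sub name concat_files and the file names in the usage comment are made up for this sketch. If the records need to be validated or reformatted on the way, reading each file with Bio::SeqIO and writing the sequences back out would be the BioPerl route instead.

```perl
use strict;
use warnings;

# Append every input file to $out_path, in the order given.
# Returns the number of input files written.
sub concat_files {
    my ($out_path, @in_paths) = @_;
    open my $out, '>', $out_path or die "Cannot write $out_path: $!";
    for my $path (@in_paths) {
        open my $in, '<', $path or die "Cannot read $path: $!";
        while (my $line = <$in>) {
            # Guard against a missing final newline so that the next
            # record's LOCUS line starts on a line of its own.
            $line .= "\n" unless $line =~ /\n\z/;
            print {$out} $line;
        }
        close $in;
    }
    close $out or die "Cannot close $out_path: $!";
    return scalar @in_paths;
}

# Usage: perl concat_gb.pl all_contigs.gbk contig1.gbk contig2.gbk ...
if (@ARGV >= 2) {
    my ($out_path, @in_paths) = @ARGV;
    my $count = concat_files($out_path, @in_paths);
    print "Wrote $count files to $out_path\n";
}
```

This does not merge the contigs into one sequence record; it only places the existing records in one file.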

Best regards...


From anjan.purkayastha at gmail.com  Mon Mar  3 12:31:11 2008
From: anjan.purkayastha at gmail.com (ANJAN PURKAYASTHA)
Date: Mon, 3 Mar 2008 12:31:11 -0500
Subject: [Bioperl-l] problem with Bio::Tools::EMBOSS
Message-ID: <f03f40920803030931r7c97159j2d6a4b74e2917184@mail.gmail.com>

hi
i am trying to use the perl wrappers for EMBOSS with:

use lib "/Users/anjan/perl_directory/bioperl-1.5.2_102/";
use Bio::Factory::EMBOSS;

however it seems that Bio::Factory::EMBOSS cannot be found in the bioperl
directory mentioned above.

so i tried to install Bio::Factory::EMBOSS from the cpan website. i got the
attached error message.

any ideas on what i need to do to  make this work?
all advice will be appreciated.

tia,

anjan


-- 
ANJAN PURKAYASTHA, PhD.
Senior Computational Biologist
==========================

1101 King Street, Suite 310,
Alexandria, VA 22314.
703.518.8040 (office)
703.740.6939 (mobile)

email:
anjan at vbi.vt.edu;
anjan.purkayastha at gmail.com

http://www.vbi.vt.edu

==========================
-------------- next part --------------
A non-text attachment was scrubbed...
Name: emboss_install_error_message.rtf
Type: application/rtf
Size: 123212 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20080303/08ee41b2/attachment-0003.rtf>

From cjfields at uiuc.edu  Mon Mar  3 13:54:06 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 3 Mar 2008 12:54:06 -0600
Subject: [Bioperl-l] problem with Bio::Tools::EMBOSS
In-Reply-To: <f03f40920803030931r7c97159j2d6a4b74e2917184@mail.gmail.com>
References: <f03f40920803030931r7c97159j2d6a4b74e2917184@mail.gmail.com>
Message-ID: <AC7CFEBA-5699-45A0-9426-8885AEA78216@uiuc.edu>

You'll need to install bioperl-run.  Bio::Factory::EMBOSS is in  
bioperl-run, not the main bioperl distribution (aka bioperl-core).

chris

On Mar 3, 2008, at 11:31 AM, ANJAN PURKAYASTHA wrote:

> hi
> i am tried to use the perl wrappers for EMBOSS with:
>
> use lib "/Users/anjan/perl_directory/bioperl-1.5.2_102/";
> use Bio::Factory::EMBOSS;
>
> however it seems that Bio::Factory::EMBOSS cannot be found in the  
> bioperl
> directory mentioned above.
>
> so i tried to install Bio::Factory::EMBOSS from the cpan website. i  
> got the
> attached error message.
>
> any ideas on what i need to do to  make this work?
> all advice will be appreciated.
>
> tia,
>
> anjan
>
>
> -- 
> ANJAN PURKAYASTHA, PhD.
> Senior Computational Biologist
> ==========================
>
> 1101 King Street, Suite 310,
> Alexandria, VA 22314.
> 703.518.8040 (office)
> 703.740.6939 (mobile)
>
> email:
> anjan at vbi.vt.edu;
> anjan.purkayastha at gmail.com
>
> http://www.vbi.vt.edu
>
> ==========================
> <emboss_install_error_message.rtf>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From David.Messina at sbc.su.se  Mon Mar  3 14:34:20 2008
From: David.Messina at sbc.su.se (Dave Messina)
Date: Mon, 3 Mar 2008 20:34:20 +0100
Subject: [Bioperl-l] problem with Bio::Tools::EMBOSS
In-Reply-To: <f03f40920803030931r7c97159j2d6a4b74e2917184@mail.gmail.com>
References: <f03f40920803030931r7c97159j2d6a4b74e2917184@mail.gmail.com>
Message-ID: <628aabb70803031134g3263e94fge7131f8862434b23@mail.gmail.com>

Hi Anjan,

Bio::Factory::EMBOSS is not part of the BioPerl core distribution, but
rather part of bioperl-run. For some reason CPAN went for the old (1.4)
version of bioperl-run rather than the current 1.5.2.

And indeed, I seem to run into the same problem:
cpan> d /bioperl/

Distribution    BIRNEY/bioperl-1.2.1.tar.gz
Distribution    BIRNEY/bioperl-1.2.2.tar.gz
Distribution    BIRNEY/bioperl-1.2.3.tar.gz
Distribution    BIRNEY/bioperl-1.2.tar.gz
Distribution    BIRNEY/bioperl-1.4.tar.gz
Distribution    BIRNEY/bioperl-db-0.1.tar.gz
Distribution    BIRNEY/bioperl-ext-1.4.tar.gz
Distribution    BIRNEY/bioperl-gui-0.7.tar.gz
Distribution    BIRNEY/bioperl-run-1.2.2.tar.gz
Distribution    BIRNEY/bioperl-run-1.4.tar.gz
Distribution    BOZO/Fry-Lib-BioPerl-0.15.tar.gz
Distribution    CRAFFI/Bundle-BioPerl-2.1.8.tar.gz
12 items found

but when I ask in a different way the right distributions show up. [Sendu,
any idea what's going on here?]

cpan> ls
SENDU
 5919092 2007-02-14 SENDU/bioperl-1.5.2_102.tar.gz
  320154 2006-12-06 SENDU/bioperl-db-1.5.2_100.tar.gz
   99082 2006-12-06 SENDU/bioperl-network-1.5.2_100.tar.gz
  942093 2006-12-06 SENDU/bioperl-run-1.5.2_100.tar.gz

So try doing

cpan> install SENDU/bioperl-run-1.5.2_100.tar.gz

Or if CPAN refuses to cooperate, you can grab it from here:
http://www.bioperl.org/wiki/Getting_BioPerl#Bioperl_1.5.2.2C_Developer_Release


Dave


From arareko at campus.iztacala.unam.mx  Mon Mar  3 14:25:14 2008
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Mon, 03 Mar 2008 13:25:14 -0600
Subject: [Bioperl-l] problem with Bio::Tools::EMBOSS
In-Reply-To: <f03f40920803030931r7c97159j2d6a4b74e2917184@mail.gmail.com>
References: <f03f40920803030931r7c97159j2d6a4b74e2917184@mail.gmail.com>
Message-ID: <47CC509A.10306@campus.iztacala.unam.mx>

Hi Anjan,

It looks like you are using the latest BioPerl developer release
(bioperl-1.5.2_102) from CPAN. To have Bio::Factory::EMBOSS available,
you should install the latest BioPerl-run as well
(bioperl-run-1.5.2_100). After you install it, you'll have to modify
your 'use lib' pragma for your script to work as you expect:

use lib "/Users/anjan/perl_directory/bioperl-run-1.5.2_100/";
use Bio::Factory::EMBOSS;

Hope this helps.

Regards,
Mauricio.


ANJAN PURKAYASTHA wrote:
> hi
> i am tried to use the perl wrappers for EMBOSS with:
> 
> use lib "/Users/anjan/perl_directory/bioperl-1.5.2_102/";
> use Bio::Factory::EMBOSS;
> 
> however it seems that Bio::Factory::EMBOSS cannot be found in the bioperl
> directory mentioned above.
> 
> so i tried to install Bio::Factory::EMBOSS from the cpan website. i got the
> attached error message.
> 
> any ideas on what i need to do to  make this work?
> all advice will be appreciated.
> 
> tia,
> 
> anjan
> 
> 
> 
> 
> ------------------------------------------------------------------------
> 

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM


From cjfields at uiuc.edu  Mon Mar  3 15:05:16 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 3 Mar 2008 14:05:16 -0600
Subject: [Bioperl-l] problem with Bio::Tools::EMBOSS
In-Reply-To: <628aabb70803031134g3263e94fge7131f8862434b23@mail.gmail.com>
References: <f03f40920803030931r7c97159j2d6a4b74e2917184@mail.gmail.com>
	<628aabb70803031134g3263e94fge7131f8862434b23@mail.gmail.com>
Message-ID: <43EC247B-EC01-483D-82B1-D861590A141A@uiuc.edu>


On Mar 3, 2008, at 1:34 PM, Dave Messina wrote:

> Hi Anjan,
>
> Bio::Factory::EMBOSS is not part of the BioPerl core distribution, but
> rather part of bioperl-run. For some reason CPAN went for the old  
> (1.4)
> version of bioperl-run rather than the current 1.5.2.
>
> And indeed, I seem to run into the same problem:
> cpan> d /bioperl/
>
> Distribution    BIRNEY/bioperl-1.2.1.tar.gz
> Distribution    BIRNEY/bioperl-1.2.2.tar.gz
> Distribution    BIRNEY/bioperl-1.2.3.tar.gz
> Distribution    BIRNEY/bioperl-1.2.tar.gz
> Distribution    BIRNEY/bioperl-1.4.tar.gz
> Distribution    BIRNEY/bioperl-db-0.1.tar.gz
> Distribution    BIRNEY/bioperl-ext-1.4.tar.gz
> Distribution    BIRNEY/bioperl-gui-0.7.tar.gz
> Distribution    BIRNEY/bioperl-run-1.2.2.tar.gz
> Distribution    BIRNEY/bioperl-run-1.4.tar.gz
> Distribution    BOZO/Fry-Lib-BioPerl-0.15.tar.gz
> Distribution    CRAFFI/Bundle-BioPerl-2.1.8.tar.gz
> 12 items found
>
> but when I ask in a different way the right distributions show up.  
> [Sendu,
> any idea what's going on here?]

It's marked as a developer release, which I think requires a full path  
(as you have below) and not just the package name.

chris

> cpan> ls
> SENDU
> 5919092 2007-02-14 SENDU/bioperl-1.5.2_102.tar.gz
>  320154 2006-12-06 SENDU/bioperl-db-1.5.2_100.tar.gz
>   99082 2006-12-06 SENDU/bioperl-network-1.5.2_100.tar.gz
>  942093 2006-12-06 SENDU/bioperl-run-1.5.2_100.tar.gz
>
> So try doing
>
> cpan> install SENDU/bioperl-run-1.5.2_100.tar.gz
>
> Or if CPAN refuses to cooperate, you can grab it from here:
> http://www.bioperl.org/wiki/Getting_BioPerl#Bioperl_1.5.2.2C_Developer_Release
>
>
> Dave


From anjan.purkayastha at gmail.com  Mon Mar  3 14:57:33 2008
From: anjan.purkayastha at gmail.com (ANJAN PURKAYASTHA)
Date: Mon, 3 Mar 2008 14:57:33 -0500
Subject: [Bioperl-l] problem with Bio::Tools::EMBOSS
In-Reply-To: <47CC509A.10306@campus.iztacala.unam.mx>
References: <f03f40920803030931r7c97159j2d6a4b74e2917184@mail.gmail.com>
	<47CC509A.10306@campus.iztacala.unam.mx>
Message-ID: <f03f40920803031157oda9e0ffubc3bfb9070022238@mail.gmail.com>

guys,
thanks! i got bioperl-run to work.
next question, let's say i want to run the palindrome program in emboss
using the bioperl wrapper.   now, palindrome takes in a list of parameter
values- these are fed into emboss as a key-value hash. where do i find the
correct names of the keys to create the input hash?
tia.

anjan

On Mon, Mar 3, 2008 at 2:25 PM, Mauricio Herrera Cuadra <
arareko at campus.iztacala.unam.mx> wrote:

> Hi Anjan,
>
> It looks like you are using the latest BioPerl developer release
> (bioperl-1.5.2_102) from CPAN, to have Bio::Factory::EMBOSS available
> then you should try installing the latest BioPerl-run as well
> (bioperl-run-1.5.2_100). After you install it, you'll have to modify
> your 'use lib' pragma for your script to work as you expect:
>
> use lib "/Users/anjan/perl_directory/bioperl-run-1.5.2_100/";
> use Bio::Factory::EMBOSS;
>
> Hope this helps.
>
> Regards,
> Mauricio.
>
>
> ANJAN PURKAYASTHA wrote:
> > hi
> > i am tried to use the perl wrappers for EMBOSS with:
> >
> > use lib "/Users/anjan/perl_directory/bioperl-1.5.2_102/";
> > use Bio::Factory::EMBOSS;
> >
> > however it seems that Bio::Factory::EMBOSS cannot be found in the
> bioperl
> > directory mentioned above.
> >
> > so i tried to install Bio::Factory::EMBOSS from the cpan website. i got
> the
> > attached error message.
> >
> > any ideas on what i need to do to  make this work?
> > all advice will be appreciated.
> >
> > tia,
> >
> > anjan
> >
> >
> >
> >
> > ------------------------------------------------------------------------
> >
>
> --
> MAURICIO HERRERA CUADRA
> arareko at campus.iztacala.unam.mx
> Laboratorio de Genética
> Unidad de Morfofisiología y Función
> Facultad de Estudios Superiores Iztacala, UNAM
>
>
>


-- 
ANJAN PURKAYASTHA, PhD.
Senior Computational Biologist
==========================

1101 King Street, Suite 310,
Alexandria, VA 22314.
703.518.8040 (office)
703.740.6939 (mobile)

email:
anjan at vbi.vt.edu;
anjan.purkayastha at gmail.com

http://www.vbi.vt.edu

==========================


From Daniel.Gerlach at medecine.unige.ch  Tue Mar  4 03:48:15 2008
From: Daniel.Gerlach at medecine.unige.ch (Daniel Gerlach)
Date: Tue, 04 Mar 2008 09:48:15 +0100
Subject: [Bioperl-l] Bio::TreeIO rises error "Weak references are not
 implemented in the version of perl"
Message-ID: <47CD0CCF.4060306@medecine.unige.ch>

Hello,

Trying to run Bio::TreeIO by this command:

perl -e 'use Bio::TreeIO'

I get the following error:

Weak references are not implemented in the version of perl at 
/usr/lib/perl5/site_perl/5.8.8/Bio/Tree/Node.pm line 76
BEGIN failed--compilation aborted at 
/usr/lib/perl5/site_perl/5.8.8/Bio/Tree/Node.pm line 76.
Compilation failed in require at 
/usr/lib/perl5/site_perl/5.8.8/Bio/TreeIO/TreeEventBuilder.pm line 65.
BEGIN failed--compilation aborted at 
/usr/lib/perl5/site_perl/5.8.8/Bio/TreeIO/TreeEventBuilder.pm line 65.
Compilation failed in require at 
/usr/lib/perl5/site_perl/5.8.8/Bio/TreeIO.pm line 77.
BEGIN failed--compilation aborted at 
/usr/lib/perl5/site_perl/5.8.8/Bio/TreeIO.pm line 77.
Compilation failed in require at -e line 1.
BEGIN failed--compilation aborted at -e line 1.

I am running perl v5.8.8 on Fedora 8 on a 64bit machine. I installed a 
recent version of bioperl around 5 months ago. Any suggestions as to why 
this module can't be loaded correctly?

Greetings, Daniel


From bix at sendu.me.uk  Tue Mar  4 06:55:32 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 04 Mar 2008 11:55:32 +0000
Subject: [Bioperl-l] Bio::TreeIO rises error "Weak references are not
 implemented in the version of perl"
In-Reply-To: <47CD0CCF.4060306@medecine.unige.ch>
References: <47CD0CCF.4060306@medecine.unige.ch>
Message-ID: <47CD38B4.1070200@sendu.me.uk>

Daniel Gerlach wrote:
> Hello,
> 
> Trying to run Bio::TreeIO by this command:
> 
> perl -e 'use Bio::TreeIO'
> 
> I get the following error:
> 
> Weak references are not implemented in the version of perl 
> [...]
> I am running perl v5.8.8 on Fedora 8 on a 64bit machine. I installed a 
> recent version of bioperl around 5 month ago. Any suggestions of why 
> this module can't be loaded correctly?

Red Hat/Fedora's packaged Perl apparently has issues; this error usually
means its Scalar::Util was built without weak-reference support.

First try installing the latest version of Scalar::Util yourself:

perl -MCPAN -e shell
force install Scalar::Util

If that doesn't work, you'll have to download and compile Perl yourself 
from source (don't use Fedora's installation system).
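
Before and after reinstalling, a quick check along these lines may help confirm whether the Scalar::Util in use supports weak references. This is only a sketch; the sub name weak_refs_work is made up here, and it relies on the weaken and isweak functions that Scalar::Util provides.

```perl
use strict;
use warnings;

# Returns 1 if Scalar::Util can create weak references, 0 otherwise.
sub weak_refs_work {
    my $ok = eval {
        require Scalar::Util;
        my $target = {};
        my $copy   = $target;
        Scalar::Util::weaken($copy);    # dies on builds without weak refs
        Scalar::Util::isweak($copy);
    };
    return $ok ? 1 : 0;
}

print weak_refs_work()
    ? "weak references are available\n"
    : "weak references are NOT available; try reinstalling Scalar::Util\n";
```

If this still reports that weak references are unavailable after the forced install, that points towards the second option of building Perl from source.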


From apapanicolaou at ice.mpg.de  Tue Mar  4 07:03:27 2008
From: apapanicolaou at ice.mpg.de (Alexie Papanicolaou)
Date: Tue, 04 Mar 2008 13:03:27 +0100
Subject: [Bioperl-l] Bio/SearchIO/Writer/GbrowseGFF.pm
Message-ID: <47CD3A8F.9050902@ice.mpg.de>

hello all,

1) I was wondering if you would know what this error means and have 
time to help...

Use of uninitialized value in concatenation (.) or string at 
/usr/local/share/perl/5.8.8/Bio/SearchIO/Writer/GbrowseGFF.pm line 287

line 287 is
            else {
                $tags{'Target'} = "$prefix:$seqname $qpmax $qpmin";
            }

this is the header
# $Id: GbrowseGFF.pm,v 1.15.4.1 2006/10/02 23:10:27 sendu Exp $
#
# BioPerl module Bio::SearchIO::Writer::GbrowseGFF.pm


this is how I call it... ( 2.6.18-6-amd64, x86_64, perl, v5.8.8, 
bioperl: tried with both 1.5.2_102 from cvs and checked out svn version 
today)

use Bio::SearchIO::Writer::GbrowseGFF;
use Bio::SearchIO;
if ($program eq "blastn"){
#my $out_gff  = new Bio::SearchIO(-writer => $writer_gff,
 my $out_gff = new Bio::SearchIO(-output_format  => 'GbrowseGFF',
                        -output_cigar   => 1,
                        -output_signif  => 1,
                        -file   => ">$infile.$query.blast.gff");
#my $out_gff_whole  = new Bio::SearchIO(-writer => $writer_gff,
 my $out_gff_whole = new Bio::SearchIO(-output_format  => 'GbrowseGFF',
                        -output_cigar   => 1,
                        -output_signif  => 1,
                           -file   => ">>$infile.blast.gff");
$out_gff->write_result($result);
$out_gff_whole->write_result($result);
}


Where $result is a blast result...

The aim is to parse a multi-query blast report and split it into 
different queries and make another file with all the queries. I'm sure 
i'm forgetting something but I can't figure what...

The GFF file is produced, but I do get the error above...

2) Finally, there is a small bug but I don't think it comes from this 
module? The id attribute is printed out e.g iD=match_sequence31 with iD 
wrongly capitalised...

many thanks for your time
alexie

-- 
--
Alexie Papanicolaou
Entomology
Max Planck Institute for Chemical Ecology
Hans Knoell Str 8
Jena 07745
Germany
Email apapanicolaou at ice.mpg.de
Tel +493641571561


From apapanicolaou at ice.mpg.de  Tue Mar  4 07:04:16 2008
From: apapanicolaou at ice.mpg.de (Alexie Papanicolaou)
Date: Tue, 04 Mar 2008 13:04:16 +0100
Subject: [Bioperl-l] Gbrowse.pm followup
Message-ID: <47CD3AC0.4080801@ice.mpg.de>

Oh the iD bug is fixed in the svn developer branch.

ta
a

-- 
--
Alexie Papanicolaou
Entomology
Max Planck Institute for Chemical Ecology
Hans Knoell Str 8
Jena 07745
Germany
Email apapanicolaou at ice.mpg.de
Tel +493641571561


From cjfields at uiuc.edu  Tue Mar  4 08:16:04 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 4 Mar 2008 07:16:04 -0600
Subject: [Bioperl-l] Bio/SearchIO/Writer/GbrowseGFF.pm
In-Reply-To: <47CD3A8F.9050902@ice.mpg.de>
References: <47CD3A8F.9050902@ice.mpg.de>
Message-ID: <4A68AA28-E508-4257-86E1-393CA9B74082@uiuc.edu>

I have run into a number of problems with the GbrowseGFF module myself  
(I think I committed the ID fix, actually).  It works but needs  
revision and needs better conformity with GFF3.

You can post (1) as a bug and we'll look into it when we can.  It's  
possible (depending on how extensive the fix is) this may have to wait  
until 1.7.

chris

On Mar 4, 2008, at 6:03 AM, Alexie Papanicolaou wrote:

> hello all,
>
> 1) I was wondering if you would you know what this error means and  
> had time to help...
>
> Use of uninitialized value in concatenation (.) or string at /usr/ 
> local/share/perl/5.8.8/Bio/SearchIO/Writer/GbrowseGFF.pm line 287
>
> line 287 is
>          else {
>              $tags{'Target'} = "$prefix:$seqname $qpmax $qpmin";
>          }
>
> this is the header
> # $Id: GbrowseGFF.pm,v 1.15.4.1 2006/10/02 23:10:27 sendu Exp $
> #
> # BioPerl module Bio::SearchIO::Writer::GbrowseGFF.pm
>
>
> this is how I call it... ( 2.6.18-6-amd64, x86_64, perl, v5.8.8,  
> bioperl: tried with both 1.5.2_102 from cvs and checked out svn  
> version today)
>
> use Bio::SearchIO::Writer::GbrowseGFF;
> use Bio::SearchIO;
> if ($program eq "blastn"){
> #my $out_gff  = new Bio::SearchIO(-writer => $writer_gff,
> my $out_gff = new Bio::SearchIO(-output_format  => 'GbrowseGFF',
>                      -output_cigar   => 1,
>                      -output_signif  => 1,
>                      -file   => ">$infile.$query.blast.gff");
> #my $out_gff_whole  = new Bio::SearchIO(-writer => $writer_gff,
> my $out_gff_whole = new Bio::SearchIO(-output_format  => 'GbrowseGFF',
>                      -output_cigar   => 1,
>                      -output_signif  => 1,
>                         -file   => ">>$infile.blast.gff");
> $out_gff->write_result($result);
> $out_gff_whole->write_result($result);
> }
>
>
>
> Where $result is a blast result...
>
> The aim is to parse a multi-query blast report and split it into  
> different queries and make another file with all the queries. I'm  
> sure i'm forgetting something but I can't figure what...
>
> The GFF file is produced, but I do get the error above...
>
> 2) Finally, there is a small bug but I don't think it comes from  
> this module? The id attribute is printed out e.g iD=match_sequence31  
> with iD wrongly capitalised...
>
> many thanks for your time
> alexie
>
> -- 
> --
> Alexie Papanicolaou
> Entomology
> Max Planck Institute for Chemical Ecology
> Hans Knoell Str 8
> Jena 07745
> Germany
> Email apapanicolaou at ice.mpg.de
> Tel +493641571561
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From Daniel.Gerlach at medecine.unige.ch  Tue Mar  4 07:35:03 2008
From: Daniel.Gerlach at medecine.unige.ch (Daniel Gerlach)
Date: Tue, 04 Mar 2008 13:35:03 +0100
Subject: [Bioperl-l] Remove columns containing more than 75% gaps in an
	alignment
References: <200502151616.j1FGGnKr023827@portal.open-bio.org>
Message-ID: <47CD41F7.2000401@medecine.unige.ch>

Hello,

Is it possible to remove only columns containing e.g. more than 75% gaps 
from an alignment? I was thinking of

$aln2 = $aln->remove_gaps('-'[,$all_gaps_columns])

This would allow me to remove all gaps or gap-only columns, but not to apply 
a threshold.

Greetings, Daniel


From Daniel.Gerlach at medecine.unige.ch  Tue Mar  4 08:46:33 2008
From: Daniel.Gerlach at medecine.unige.ch (Daniel Gerlach)
Date: Tue, 04 Mar 2008 14:46:33 +0100
Subject: [Bioperl-l] branch length score - total length of the spanning
	subtree
Message-ID: <47CD52B9.5060906@medecine.unige.ch>

Hello,

I would like to use bioperl to calculate a branch length score for a 
given set of nodes and a tree. I know how to get the total branch length 
by using $tree->total_branch_length, but how could I get the length of 
the subtree spanning some given nodes which are dispersed over the whole 
tree (a subset of nodes from the tree which are not monophyletic)?

Greetings, Daniel


From bix at sendu.me.uk  Tue Mar  4 09:37:53 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 04 Mar 2008 14:37:53 +0000
Subject: [Bioperl-l] branch length score - total length of the spanning
 subtree
In-Reply-To: <47CD52B9.5060906@medecine.unige.ch>
References: <47CD52B9.5060906@medecine.unige.ch>
Message-ID: <47CD5EC1.2020103@sendu.me.uk>

Daniel Gerlach wrote:
> Hello,
> 
> I would like to use bioperl to calculate a branch length score for a 
> given set of nodes and a tree. I know how to get the total branch length 
> by using $tree->total_branch_length, but how could I get the length of 
> the subtree spanning some given nodes which are dispersed over the whole 
> tree (a subset of nodes from the tree which are not monophyletic)?

One 'cheat' way of doing it might be to use splice(-keep_ids => 
\@node_ids) or similar, then run total_branch_length() on that. No idea 
if it will actually give you the right answer though. Let us know! :)


From bix at sendu.me.uk  Tue Mar  4 09:26:10 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 04 Mar 2008 14:26:10 +0000
Subject: [Bioperl-l] Remove columns containing more than 75% gaps in an
 alignment
In-Reply-To: <47CD41F7.2000401@medecine.unige.ch>
References: <200502151616.j1FGGnKr023827@portal.open-bio.org>
	<47CD41F7.2000401@medecine.unige.ch>
Message-ID: <47CD5C02.8060306@sendu.me.uk>

Daniel Gerlach wrote:
> Hello,
> 
> Is it possible to remove only columns containing e.g. more than 75% gaps 
> from an alignment? I was thinking at
> 
> $aln2 = $aln->remove_gaps('-'[,$all_gaps_columns])
> 
> This would allow me to remove all gaps or gap-only columns but not using 
> a threshold.

Well, you can use gap_col_matrix() to decide which columns you don't 
want, and then use remove_columns().
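
The column-picking step can be sketched in plain Perl, here on aligned strings rather than on the structure gap_col_matrix() returns. The sub name gappy_column_ranges and the sample rows are made up for illustration. Passing the resulting zero-based [start, end] pairs on to remove_columns() is an assumption that should be checked against the Bio::SimpleAlign documentation for your version.

```perl
use strict;
use warnings;

# Return zero-based [start, end] ranges of columns whose gap fraction
# exceeds $threshold. $rows is an array ref of equal-length aligned strings.
sub gappy_column_ranges {
    my ($rows, $threshold, $gap_char) = @_;
    $gap_char = '-' unless defined $gap_char;
    my $length = length $rows->[0];
    my @ranges;
    for my $col (0 .. $length - 1) {
        # Count the rows that have a gap character in this column.
        my $gaps = grep { substr($_, $col, 1) eq $gap_char } @$rows;
        next unless $gaps / scalar(@$rows) > $threshold;
        if (@ranges && $ranges[-1][1] == $col - 1) {
            $ranges[-1][1] = $col;         # extend the current run
        } else {
            push @ranges, [$col, $col];    # start a new run
        }
    }
    return @ranges;
}

# Made-up alignment rows: columns 1 and 2 are gaps in every row.
my @rows   = ('A--TG', 'A---G', '---TG', 'A--T-');
my @ranges = gappy_column_ranges(\@rows, 0.75);
print join(', ', map { "$_->[0]-$_->[1]" } @ranges), "\n";
```

Adjacent gappy columns are merged into one range so that each run can be removed in a single call.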


From hlapp at gmx.net  Tue Mar  4 10:24:13 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 4 Mar 2008 10:24:13 -0500
Subject: [Bioperl-l] Bio/SearchIO/Writer/GbrowseGFF.pm
In-Reply-To: <47CD3A8F.9050902@ice.mpg.de>
References: <47CD3A8F.9050902@ice.mpg.de>
Message-ID: <87808BE4-B6A3-4C7F-A6DC-42ED2686375B@gmx.net>


On Mar 4, 2008, at 7:03 AM, Alexie Papanicolaou wrote:

> Use of uninitialized value in concatenation (.) or string at /usr/ 
> local/share/perl/5.8.8/Bio/SearchIO/Writer/GbrowseGFF.pm line 287
>
> line 287 is
>            else {
>                $tags{'Target'} = "$prefix:$seqname $qpmax $qpmin";
>            }


Note that this is a warning, not an error. However, if none of  
$prefix, $seqname, $qpmax, $qpmin should ever be undefined (or equal to an  
empty string, which is what an undefined value interpolates as) at this  
point, then there is a real problem, and it lies before the above line.
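
A small standalone illustration of the warning-versus-error point, not taken from GbrowseGFF.pm: the sub build_target and the sample values are made up, and only the interpolated string mirrors the line quoted above.

```perl
use strict;
use warnings;

# Same interpolation as the quoted line 287, wrapped in a hypothetical sub.
sub build_target {
    my ($prefix, $seqname, $qpmax, $qpmin) = @_;
    return "$prefix:$seqname $qpmax $qpmin";
}

my @warnings;
my $target;
{
    # Collect warnings instead of printing them to STDERR.
    local $SIG{__WARN__} = sub { push @warnings, $_[0] };
    $target = build_target('match', 'seq1', undef, 10);   # $qpmax undefined
}

print "String was still built: '$target'\n";
print "Warnings collected: ", scalar(@warnings), "\n";
```

The string is still produced, with an empty spot where the undefined value was, which is why the GFF file gets written; the question is then which of the four variables was never set upstream.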

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From hlapp at gmx.net  Tue Mar  4 11:02:02 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 4 Mar 2008 11:02:02 -0500
Subject: [Bioperl-l] branch length score - total length of the spanning
	subtree
In-Reply-To: <47CD5EC1.2020103@sendu.me.uk>
References: <47CD52B9.5060906@medecine.unige.ch> <47CD5EC1.2020103@sendu.me.uk>
Message-ID: <BC04FC01-B12E-43A0-9A8A-B461667BCAA8@gmx.net>


On Mar 4, 2008, at 9:37 AM, Sendu Bala wrote:

> Daniel Gerlach wrote:
>> Hello,
>> I would like to use bioperl to calculate a branch length score for  
>> a given set of nodes and a tree. I know how to get the total  
>> branch length by using $tree->total_branch_length, but how could I  
>> get the length of the subtree spanning some given nodes which are  
>> dispersed over the whole tree (a subset of nodes from the tree  
>> which are not monophyletic)?
>
> One 'cheat' way of doing it might be to use splice(-keep_ids =>  
> \@node_ids) or similar, then run total_branch_length() on that. No  
> idea if it will actually give you the right answer though. Let us  
> know! :)

Related to that, will contract_linear_paths() actually do the right  
thing and adjust branch lengths if it removes internal nodes with  
outdegree 1?

Rutger - does Bio::Phylo handle this correctly?

	-hilmar

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From Daniel.Gerlach at medecine.unige.ch  Tue Mar  4 11:12:53 2008
From: Daniel.Gerlach at medecine.unige.ch (Daniel Gerlach)
Date: Tue, 04 Mar 2008 17:12:53 +0100
Subject: [Bioperl-l] branch length score - total length of the spanning
 subtree
In-Reply-To: <47CD5EC1.2020103@sendu.me.uk>
References: <47CD52B9.5060906@medecine.unige.ch> <47CD5EC1.2020103@sendu.me.uk>
Message-ID: <47CD7505.5080105@medecine.unige.ch>

Hello,

Thanks for the quick answer. I tried:

use Bio::TreeIO;
my $treeio = Bio::TreeIO->new(-format => 'newick',
                   -fh => \*DATA);
my $tree = $treeio->next_tree;
print $tree->total_branch_length,"\n";
$tree->splice(-keep_id => [A,B,E]);
print $tree->total_branch_length,"\n";

__DATA__
(((A:5,B:5)x:2,(C:4,D:4)y:1)z:3,E:10);

Which gives me the message "MSG: After splicing, the original root was 
removed but there are multiple candidates for the new root!" however the 
root E was not removed.

If I do it the complementary way by splicing out all unwanted nodes - 
splice(-remove_id => [C,D]) - I get what I want:

34
25
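
As a cross-check on those numbers that does not depend on splice(), the spanning length can also be computed directly: a branch belongs to the subtree spanning the chosen nodes exactly when some, but not all, of them sit below it. The sketch below hard-codes the same tree as a child => [parent, length] table; the table layout, the label 'root' and the sub name spanning_length are assumptions made for this example, not BioPerl API.

```perl
use strict;
use warnings;

# The tree (((A:5,B:5)x:2,(C:4,D:4)y:1)z:3,E:10) as child => [parent, length].
my %edge = (
    A => ['x', 5],    B => ['x', 5],
    C => ['y', 4],    D => ['y', 4],
    x => ['z', 2],    y => ['z', 1],
    z => ['root', 3], E => ['root', 10],
);

# Sum the lengths of all branches that have at least one, but not all,
# of the (distinct) target nodes below them.
sub spanning_length {
    my ($edge, @targets) = @_;
    my %below;
    for my $target (@targets) {
        my $node = $target;
        while (exists $edge->{$node}) {    # walk up until the root
            $below{$node}++;
            $node = $edge->{$node}[0];
        }
    }
    my $total = 0;
    for my $node (keys %below) {
        $total += $edge->{$node}[1] if $below{$node} < scalar(@targets);
    }
    return $total;
}

print spanning_length(\%edge, qw(A B E)), "\n";
```

For A, B and E this agrees with the 25 obtained above by removing C and D.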

Greetings, Daniel

Sendu Bala wrote:
> Daniel Gerlach wrote:
>> Hello,
>>
>> I would like to use bioperl to calculate a branch length score for a 
>> given set of nodes and a tree. I know how to get the total branch 
>> length by using $tree->total_branch_length, but how could I get the 
>> length of the subtree spanning some given nodes which are dispersed 
>> over the whole tree (a subset of nodes from the tree which are not 
>> monophyletic)?
> 
> One 'cheat' way of doing it might be to use splice(-keep_ids => 
> \@node_ids) or similar, then run total_branch_length() on that. No idea 
> if it will actually give you the right answer though. Let us know! :)


From bix at sendu.me.uk  Tue Mar  4 11:37:47 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 04 Mar 2008 16:37:47 +0000
Subject: [Bioperl-l] branch length score - total length of the spanning
 subtree
In-Reply-To: <BC04FC01-B12E-43A0-9A8A-B461667BCAA8@gmx.net>
References: <47CD52B9.5060906@medecine.unige.ch> <47CD5EC1.2020103@sendu.me.uk>
	<BC04FC01-B12E-43A0-9A8A-B461667BCAA8@gmx.net>
Message-ID: <47CD7ADB.6050808@sendu.me.uk>

Hilmar Lapp wrote:
> 
> On Mar 4, 2008, at 9:37 AM, Sendu Bala wrote:
> 
>> Daniel Gerlach wrote:
>>> Hello,
>>> I would like to use bioperl to calculate a branch length score for a 
>>> given set of nodes and a tree. I know how to get the total branch 
>>> length by using $tree->total_branch_length, but how could I get the 
>>> length of the subtree spanning some given nodes which are dispersed 
>>> over the whole tree (a subset of nodes from the tree which are not 
>>> monophyletic)?
>>
>> One 'cheat' way of doing it might be to use splice(-keep_ids => 
>> \@node_ids) or similar, then run total_branch_length() on that. No 
>> idea if it will actually give you the right answer though. Let us 
>> know! :)
> 
> Related to that, will contract_linear_paths() actually do the right 
> thing and adjust branch lengths if it removes internal nodes with 
> outdegree 1?

I think ultimately it boils down to remove_Descendent() being called as 
appropriate which does the branch length alteration. From a glance I 
can't answer your question with certainty, but it 'should' do the right 
thing. It needs to be tested; when I implemented these things I was only 
concerned with tree topology, not branch lengths or anything else.


From David.Messina at sbc.su.se  Tue Mar  4 15:47:06 2008
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 4 Mar 2008 21:47:06 +0100
Subject: [Bioperl-l] problem with Bio::Tools::EMBOSS
In-Reply-To: <f03f40920803031157oda9e0ffubc3bfb9070022238@mail.gmail.com>
References: <f03f40920803030931r7c97159j2d6a4b74e2917184@mail.gmail.com>
	<47CC509A.10306@campus.iztacala.unam.mx>
	<f03f40920803031157oda9e0ffubc3bfb9070022238@mail.gmail.com>
Message-ID: <628aabb70803041247l6c5a52d6n5cab24e7059f15fb@mail.gmail.com>

> where do i find the
> correct names of the keys to create the input hash?


I've never used this module, but from a quick look at the code it appears to
pass on any parameters to palindrome.

I'm guessing you've already done this, but have you tried using the
parameter names and values that palindrome itself asks for?


Dave


From cjfields at uiuc.edu  Tue Mar  4 16:34:21 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 4 Mar 2008 15:34:21 -0600
Subject: [Bioperl-l] OBDA, Bio::DB::Flat, and bioperl
Message-ID: <FDC728F3-F96C-4072-B448-2F7BF0D11922@uiuc.edu>

I don't know what the current status is for OBDA, but we have several  
bugs listed for Bio::DB::Flat which need someone versed in OBDA to  
look at them (they are all interrelated):

http://bugzilla.open-bio.org/show_bug.cgi?id=2336
http://bugzilla.open-bio.org/show_bug.cgi?id=2337
http://bugzilla.open-bio.org/show_bug.cgi?id=2338
http://bugzilla.open-bio.org/show_bug.cgi?id=2339

If anyone has any input I would greatly appreciate it.  I have been  
trying to stomp as many bugs as possible so we can work on a new  
release.

chris


From bosborne11 at verizon.net  Tue Mar  4 16:42:05 2008
From: bosborne11 at verizon.net (Brian Osborne)
Date: Tue, 04 Mar 2008 16:42:05 -0500
Subject: [Bioperl-l] OBDA, Bio::DB::Flat, and bioperl
In-Reply-To: <FDC728F3-F96C-4072-B448-2F7BF0D11922@uiuc.edu>
References: <FDC728F3-F96C-4072-B448-2F7BF0D11922@uiuc.edu>
Message-ID: <BAD3A0C8-69C4-45FF-A666-527289428A58@verizon.net>

Chris,

I'll take a look at them this weekend.

Brian O.


On Mar 4, 2008, at 4:34 PM, Chris Fields wrote:

> I don't know what the current status is for OBDA, but we have  
> several bugs listed for Bio::DB::Flat which need someone versed in  
> OBDA to look at them (they are all interrelated):
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=2336
> http://bugzilla.open-bio.org/show_bug.cgi?id=2337
> http://bugzilla.open-bio.org/show_bug.cgi?id=2338
> http://bugzilla.open-bio.org/show_bug.cgi?id=2339
>
> If anyone has any input I would greatly appreciate it.  I have been  
> trying to stomp as many bugs as possible so we can work on a new  
> release.
>
> chris


From anjan.purkayastha at gmail.com  Tue Mar  4 18:52:09 2008
From: anjan.purkayastha at gmail.com (ANJAN PURKAYASTHA)
Date: Tue, 4 Mar 2008 18:52:09 -0500
Subject: [Bioperl-l] problem with Bio::Tools::EMBOSS
In-Reply-To: <628aabb70803041247l6c5a52d6n5cab24e7059f15fb@mail.gmail.com>
References: <f03f40920803030931r7c97159j2d6a4b74e2917184@mail.gmail.com>
	<47CC509A.10306@campus.iztacala.unam.mx>
	<f03f40920803031157oda9e0ffubc3bfb9070022238@mail.gmail.com>
	<628aabb70803041247l6c5a52d6n5cab24e7059f15fb@mail.gmail.com>
Message-ID: <f03f40920803041552u49a68a64gf31b90cd0159ce19@mail.gmail.com>

guys,
thanks for all your input.
i went to the following site:
http://www.koders.com/perl/fid5F28A3DDD453F0DB4995B7DDF304B02DBBACE0A0.aspx?s=calculate

it lists the key names for most of the emboss programs.

thanks,
anjan

On Tue, Mar 4, 2008 at 3:47 PM, Dave Messina <David.Messina at sbc.su.se>
wrote:

> > where do i find the
> > correct names of the keys to create the input hash?
>
>
>
> I've never used this module, but from a quick look at the code it appears
> to pass on any parameters to palindrome.
>
> I'm guessing you've already done this, but have you tried using the
> parameter names and values that palindrome itself asks for?
>
>
> Dave
>
>


-- 
ANJAN PURKAYASTHA, PhD.
Senior Computational Biologist
==========================

1101 King Street, Suite 310,
Alexandria, VA 22314.
703.518.8040 (office)
703.740.6939 (mobile)

email:
anjan at vbi.vt.edu;
anjan.purkayastha at gmail.com

http://www.vbi.vt.edu

==========================


From staffa at niehs.nih.gov  Wed Mar  5 18:43:30 2008
From: staffa at niehs.nih.gov (Staffa, Nick (NIH/NIEHS))
Date: Wed, 05 Mar 2008 18:43:30 -0500
Subject: [Bioperl-l] SeqIO
Message-ID: <C3F49A52.88AD%staffa@niehs.nih.gov>

So the Howto says that Bio::SeqIO will read almost any known format,
including GCG.
So I created a GCG file with Seqlab and tried to print out its sequence as a
string. (I guessed at the way to get the sequence string.)

#!/usr/bin/perl -w
use strict;
$| = 1;    # unbuffered output
use Bio::SeqIO;

my $number_of_files = @ARGV;
if (!$number_of_files) { print "no files entered\n"; exit; }

foreach my $file (@ARGV) {
    # no -format is given here, so SeqIO has to work out the format itself
    my $seqio_object = Bio::SeqIO->new(-file => $file);
    my $seq_object   = $seqio_object->next_seq;
    my $sequence     = $seq_object->seq;
    print "$sequence\n";
    my $status = &windowscore($sequence);    # windowscore() is not shown in this post
}

But what it returned was the entire contents of the file with no format
decoding. Have I been deluded?

NewDNALength:810March5,200818:26Type:NCheck:3368..1TGTTCGAATTCCGTGCGGTCCACCT
CCCCTAGGAGCTCAGTGGGCTGGTT51GGATTCCGTGCCATCCCGGCAGGGCAGAGCCTCGGGAGGGGGCGAAGGT
T101GCCCGGGGCCGTGCGCTGGGTGCTGCTGCTGCGGTGGCGGCGGCGGTGCC151TGCGGTTGCAGCGGCTGCT
GGGGTTGCGCGTGGAAACCGCGCCCCGCACT201TGCGGCGGGCGAGCCCATCGCGCCGTAGTACAGGTGCAGAGC
GCTGGGGG251GCGCCAGGATCCCCGGCATCGCAGGGCCCGAGGGGTCCGGCCCCACTCGC301ATGGGGCCAGCG
GGCGGCTCTACGGACACTGCATAGTCCGAGACTGGAGC351GTAAGTGTAGGTGCCGGCCGCCGGGCAGTCCCCTG
GCAGCGGGGCTGCAA401AGAAAGCCGGGTCCTGCTCCACGCCATCCAGCGGGGATGTGTCCGGAGTG451GGCAG
AGGGTAGCCGTCGAGCGCGGGAGCGCCCAGTCCCTGGCAGTCCCG501ATAGTGGGGGCCCATGTGCGGAGACATC
AGCGGAGGACCGGCCGGATAGC551CCGGCTCCGGGAAAGGCAGACCCAGGCCATCCATGGCCACGCGGCCGCCC6
01TCGGGACCAAGCGCGCCGGCCTGGGGCTCGACGAGAGCGTGCAGGAAGCC651TCCCTCCACCCGCTTCATGCG
CTTCACCTGCTTGCGCCGCCGCGGCCGGT701ACTTGTAGTTGGGGTGGTCCTGCATATGCTGCACGCGCAGCCGC
TCGGCC751TCTTCCACGAAGGGCCGCTTCTCTGCCAAGGTCAACGCCTTCCAAGACTT801GCCTGCAGGG


Nick Staffa 
Telephone: 919-316-4569  (NIEHS: 6-4569)
Scientific Computing Support Group
NIEHS Information Technology Support Services Contract
(Science Task Monitor: Roy W. Reter (reter at niehs.nih.gov)
National Institute of Environmental Health Sciences
National Institutes of Health
Research Triangle Park, North Carolina


From cjfields at uiuc.edu  Wed Mar  5 21:22:53 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 5 Mar 2008 20:22:53 -0600
Subject: [Bioperl-l] SeqIO
In-Reply-To: <C3F49A52.88AD%staffa@niehs.nih.gov>
References: <C3F49A52.88AD%staffa@niehs.nih.gov>
Message-ID: <C057CBE1-87E5-4195-94D6-60E706511F4E@uiuc.edu>

I thought the GCG format changed somewhere along the way, but maybe I'm
wrong?  Regardless, you'll have to post this as a bug (along with an
example file).

Also, kind of odd that the sequence data wasn't checked...

chris

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From jason at bioperl.org  Wed Mar  5 21:33:48 2008
From: jason at bioperl.org (Jason Stajich)
Date: Wed, 5 Mar 2008 18:33:48 -0800
Subject: [Bioperl-l] SeqIO
In-Reply-To: <C057CBE1-87E5-4195-94D6-60E706511F4E@uiuc.edu>
References: <C3F49A52.88AD%staffa@niehs.nih.gov>
	<C057CBE1-87E5-4195-94D6-60E706511F4E@uiuc.edu>
Message-ID: <797E42FC-C59F-4431-BAF1-11D3FAE9F9D0@bioperl.org>

You should probably try specifying the format explicitly first, as in
(-format => 'gcg')
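
A minimal sketch of what passing the format explicitly could look like. The
sub name first_sequence_string and the command-line layout (format first,
then the files) are made up for this illustration and are not part of
BioPerl; only Bio::SeqIO->new, next_seq and seq are taken from the script
in the original post.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;

# Return the first sequence in $file as a plain string, parsing the file
# as $format (for example 'gcg' or 'fasta').
# The sub name is hypothetical, chosen for this example only.
sub first_sequence_string {
    my ($file, $format) = @_;
    my $in  = Bio::SeqIO->new(-file => $file, -format => $format);
    my $seq = $in->next_seq
        or die "no sequence found in $file\n";
    return $seq->seq;
}

# Assumed usage: perl first_seq.pl gcg file1.gcg [file2.gcg ...]
if (@ARGV >= 2) {
    my ($format, @files) = @ARGV;
    print first_sequence_string($_, $format), "\n" for @files;
}
```

With the format given, SeqIO does not have to guess, so a wrong guess can
be ruled out as the cause before filing a bug.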

-j
On Mar 5, 2008, at 6:22 PM, Chris Fields wrote:

> I thought GCG format changed somewhere along the way but I maybe  
> I'm wrong?  Regardless, you'll have to post this as a bug (along  
> with an example file).
>
> Also, kind of odd that the sequence data wasn't checked...
>
> chris


From bosborne11 at verizon.net  Wed Mar  5 21:01:07 2008
From: bosborne11 at verizon.net (Brian Osborne)
Date: Wed, 05 Mar 2008 21:01:07 -0500
Subject: [Bioperl-l] SeqIO
In-Reply-To: <C3F49A52.88AD%staffa@niehs.nih.gov>
References: <C3F49A52.88AD%staffa@niehs.nih.gov>
Message-ID: <19DC527F-3D34-4F3E-9B4C-D2C6011A2C8F@verizon.net>

Nick,

Take a look at the GCG files that are used in the SeqIO tests:

bioperl-live/t/data/test.gcg
bioperl-live/t/data/test_badlf.gcg

Does the file that you created have a format like the format in those  
files? I'm guessing you're going to say 'yes', from the looks of your  
output.

Brian O.




From staffa at niehs.nih.gov  Wed Mar  5 22:09:11 2008
From: staffa at niehs.nih.gov (Staffa, Nick (NIH/NIEHS))
Date: Wed, 05 Mar 2008 22:09:11 -0500
Subject: [Bioperl-l] SeqIO
In-Reply-To: <797E42FC-C59F-4431-BAF1-11D3FAE9F9D0@bioperl.org>
Message-ID: <C3F4CA87.8074%staffa@niehs.nih.gov>

Verily,
One interpretation of the docs might be: it will read any format if the
format is specified.
I was hoping that I could write a program for which one needn't specify the
format.
It'd be more user-friendly and useful.


On 3/5/08 9:33 PM, "Jason Stajich" <jason at bioperl.org> wrote:

> probably you should try specifying the format explicitly first- as in
> (-format => 'gcg')
> 
> -j
> On Mar 5, 2008, at 6:22 PM, Chris Fields wrote:
> 
>> I thought GCG format changed somewhere along the way but I maybe
>> I'm wrong?  Regardless, you'll have to post this as a bug (along
>> with an example file).
>> 
>> Also, kind of odd that the sequence data wasn't checked...
>> 
>> chris


From cjfields at uiuc.edu  Wed Mar  5 22:44:14 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 5 Mar 2008 21:44:14 -0600
Subject: [Bioperl-l] SeqIO
In-Reply-To: <1B139406-897F-496F-8709-FCAAD4EFEDE3@verizon.net>
References: <C3F49A52.88AD%staffa@niehs.nih.gov>
	<C057CBE1-87E5-4195-94D6-60E706511F4E@uiuc.edu>
	<1B139406-897F-496F-8709-FCAAD4EFEDE3@verizon.net>
Message-ID: <9146DF9D-C0D6-4F18-9B7E-7BB42FCE0737@uiuc.edu>

Heh, good one!  Though Jason may have worked out the issue (not  
indicating the format explicitly).  Would be worth looking at the  
tested files.

As for dinosaurs, well I can't talk ...

chris

On Mar 5, 2008, at 8:49 PM, Brian Osborne wrote:

> Chris,
>
> Many many years ago, when dinosaurs roamed the earth, only about  
> half of the formats had their own tests. A primitive being saw this  
> and created simple tests for all the 'missing' formats. His thought  
> probably was 'this is better than nothing'. In fact this being  
> assumed that GCG was an outdated and unused format, even as long ago  
> as that time was.
>
> The origins of so much of what we now know as 'Bioperl' are  
> frequently mysterious, or incomprehensible to modern day humans...
>
> Brian O.
>
> On Mar 5, 2008, at 9:22 PM, Chris Fields wrote:
>
>> Also, kind of odd that the sequence data wasn't checked...


From bosborne11 at verizon.net  Wed Mar  5 21:49:26 2008
From: bosborne11 at verizon.net (Brian Osborne)
Date: Wed, 05 Mar 2008 21:49:26 -0500
Subject: [Bioperl-l] SeqIO
In-Reply-To: <C057CBE1-87E5-4195-94D6-60E706511F4E@uiuc.edu>
References: <C3F49A52.88AD%staffa@niehs.nih.gov>
	<C057CBE1-87E5-4195-94D6-60E706511F4E@uiuc.edu>
Message-ID: <1B139406-897F-496F-8709-FCAAD4EFEDE3@verizon.net>

Chris,

Many many years ago, when dinosaurs roamed the earth, only about half  
of the formats had their own tests. A primitive being saw this and  
created simple tests for all the 'missing' formats. His thought  
probably was 'this is better than nothing'. In fact this being assumed  
that GCG was an outdated and unused format, even as long ago as that  
time was.

The origins of so much of what we now know as 'Bioperl' are frequently  
mysterious, or incomprehensible to modern day humans...

Brian O.

On Mar 5, 2008, at 9:22 PM, Chris Fields wrote:

> Also, kind of odd that the sequence data wasn't checked...


From cjfields at uiuc.edu  Wed Mar  5 22:54:15 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 5 Mar 2008 21:54:15 -0600
Subject: [Bioperl-l] SeqIO
In-Reply-To: <C3F4CA87.8074%staffa@niehs.nih.gov>
References: <C3F4CA87.8074%staffa@niehs.nih.gov>
Message-ID: <67C6AE9D-3934-4717-A97A-4C31DB4F7E33@uiuc.edu>

You can leave off the format, but you must append the correct file  
extension for the parser to determine the correct format ('.gcg' for  
GCG, for example).  There is also Bio::Tools::GuessSeqFormat though it  
doesn't cover all formats.
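
A rough sketch of the second option, guessing the format from the file
content and then handing it to SeqIO. The sub name open_guessed is invented
for this example, and treating a false return value from guess() as "could
not decide" is an assumption on my part; check the
Bio::Tools::GuessSeqFormat documentation for your version.

```perl
use strict;
use warnings;
use Bio::SeqIO;
use Bio::Tools::GuessSeqFormat;

# Open $file with Bio::SeqIO, using a format guessed from the file content.
# Dies if nothing could be guessed instead of falling back to a default.
# (open_guessed is a hypothetical helper, not a BioPerl function.)
sub open_guessed {
    my ($file) = @_;
    my $format = Bio::Tools::GuessSeqFormat->new(-file => $file)->guess;
    die "could not guess the format of $file\n" unless $format;
    return Bio::SeqIO->new(-file => $file, -format => $format);
}

# Print the id and length of every sequence in each file named on the
# command line.
foreach my $file (@ARGV) {
    my $in = open_guessed($file);
    while (my $seq = $in->next_seq) {
        print $seq->display_id, "\t", $seq->length, "\n";
    }
}
```

Dying on an unknown format is a design choice for the sketch: it keeps a
bad guess from being mistaken for a parsed sequence.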

chris

On Mar 5, 2008, at 9:09 PM, Staffa, Nick (NIH/NIEHS) wrote:

> Verily,
> One interpretation of the docs might be: will read any format if the  
> format
> is specified.
> I was hoping that I could write a program that one needn't specify  
> format.
> It'd be more user-friendly and useful.

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From ewijaya at gmail.com  Thu Mar  6 03:16:25 2008
From: ewijaya at gmail.com (Edward Wijaya)
Date: Thu, 6 Mar 2008 16:16:25 +0800
Subject: [Bioperl-l] BioPerl Module to Parse Transfac Flat File Database
Message-ID: <3521d3670803060016r35ac720ar9f2190631ddaf629@mail.gmail.com>

Dear experts,

Is there one? The TRANSFAC text file contains entries like the one below.
In particular, we wish to capture the PWM for each transcription
factor.

Regards,
Edward

__BEGIN__
VV  TRANSFAC MATRIX TABLE, Release 11.1 - licensed - 2007-03-31, (C) Biobase GmbH
XX
//
AC  M00001
XX
ID  V$MYOD_01
XX
DT  19.10.1992 (created); ewi.
DT  22.10.1997 (updated); dbo.
CO  Copyright (C), Biobase GmbH.
XX
NA  MyoD
XX
DE  myoblast determination gene product
XX
BF  T00526; MyoD; Species: mouse, Mus musculus.
BF  T09177; MyoD; Species: mouse, Mus musculus.
XX
P0      A      C      G      T
01      1      2      2      0      S
02      2      1      2      0      R
03      3      0      1      1      A
04      0      5      0      0      C
05      5      0      0      0      A
06      0      0      4      1      G
07      0      1      4      0      G
08      0      0      0      5      T
09      0      0      5      0      G
10      0      1      2      2      K
11      0      2      0      3      Y
12      1      0      3      1      G

....etc....


From watashi at post.com  Thu Mar  6 07:06:42 2008
From: watashi at post.com (Masa Masa)
Date: Thu, 6 Mar 2008 07:06:42 -0500
Subject: [Bioperl-l] failure of add_seqfeature
Message-ID: <20080306120642.4800F16427A@ws1-4.us4.outblaze.com>

Dear experts,

Would anybody know why the following code generates this error:


------------- EXCEPTION  -------------
MSG: Bio::SeqFeature::Generic=HASH(0x94583c0) is not contained within parent feature, and expansion is not valid
STACK Bio::SeqFeature::Generic::add_SeqFeature /usr/lib/perl5/site_perl/5.8.0/Bio/SeqFeature/Generic.pm:767
STACK toplevel test.pl:118

--------------------------------------
15616 15693
79568 83016

=================


use Bio::Graphics;
use Bio::SeqFeature::Generic;
use Bio::SeqIO;


my $bsg = 'Bio::SeqFeature::Generic';

my $unseqfea = $bsg->new( -start=>$from[$i], -end=>$to[$i], -display_name=>'U');

for (my $i=0; $i < @from; $i++) {
    print "$from[$i] $to[$i]\n";
    $unseqfea->add_SeqFeature($bsg->new(-start=>$from[$i],-end=>$to[$i]));
    if ($i > 10) {
        exit;
    }
}



From heikki at sanbi.ac.za  Thu Mar  6 07:20:03 2008
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Thu, 6 Mar 2008 14:20:03 +0200
Subject: [Bioperl-l] SeqIO
In-Reply-To: <C3F4CA87.8074%staffa@niehs.nih.gov>
References: <C3F4CA87.8074%staffa@niehs.nih.gov>
Message-ID: <200803061420.04123.heikki@sanbi.ac.za>


Nick,

This is the regex that Bio::Tools::GuessSeqFormat uses to identify a gcg file:

/Length: .*Type: .*Check: .*\.\.$/

It matches the second line of a GCG file. If the first line matches some
other format's regex, this one will not be evaluated.
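
The pattern can be tried on its own, without BioPerl. In the sketch below
the "spaced" header is a hypothetical reconstruction with conventional
blanks (I have not seen the actual file), and the "squeezed" one is the
header exactly as it appeared in the posted output. The sub name is made up
for the example.

```perl
use strict;
use warnings;

# True if $line matches the GCG header pattern quoted above.
sub looks_like_gcg_header {
    my ($line) = @_;
    return $line =~ /Length: .*Type: .*Check: .*\.\.$/ ? 1 : 0;
}

# Hypothetical header with blanks after each colon (assumed layout).
my $spaced   = "NewDNA  Length: 810  March 5, 2008 18:26  Type: N  Check: 3368  ..";
# The header as it appeared in the posted output, with all blanks gone.
my $squeezed = "NewDNALength:810March5,200818:26Type:NCheck:3368..";

for my $case ([ spaced => $spaced ], [ squeezed => $squeezed ]) {
    my ($name, $line) = @$case;
    printf "%-8s %s\n", $name,
        looks_like_gcg_header($line) ? 'match' : 'no match';
}
```

The pattern needs a blank after "Length:", "Type:" and "Check:", so the
spaced line matches and the squeezed one does not.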

Let us know,

	-Heikki

On Thursday 06 March 2008 05:09:11 Staffa, Nick (NIH/NIEHS) wrote:
> Verily,
> One interpretation of the docs might be: will read any format if the format
> is specified.
> I was hoping that I could write a program that one needn't specify format.
> It'd be more user-friendly and useful.


-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Senior Scientist    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From bix at sendu.me.uk  Thu Mar  6 08:07:21 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 06 Mar 2008 13:07:21 +0000
Subject: [Bioperl-l] BioPerl Module to Parse Transfac Flat File Database
In-Reply-To: <3521d3670803060016r35ac720ar9f2190631ddaf629@mail.gmail.com>
References: <3521d3670803060016r35ac720ar9f2190631ddaf629@mail.gmail.com>
Message-ID: <47CFEC89.1000705@sendu.me.uk>

Edward Wijaya wrote:
> Dear experts,
> 
> Is there any? The TRANSFAC text file which contain entry like this.
> Especially we wich to capture the PWM for each of the Transcription
> factor.

Yes; I've written a module to do this, I just haven't committed it yet 
because certain things aren't quite right in terms of the API. But to 
just grab the PWM it should work fine. If you want I can email you the 
modules.


From sdavis2 at mail.nih.gov  Thu Mar  6 08:40:25 2008
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Thu, 6 Mar 2008 08:40:25 -0500
Subject: [Bioperl-l] BioPerl Module to Parse Transfac Flat File Database
In-Reply-To: <47CFEC89.1000705@sendu.me.uk>
References: <3521d3670803060016r35ac720ar9f2190631ddaf629@mail.gmail.com>
	<47CFEC89.1000705@sendu.me.uk>
Message-ID: <264855a00803060540u1f3d0f92pbab13349595a0eb3@mail.gmail.com>

On Thu, Mar 6, 2008 at 8:07 AM, Sendu Bala <bix at sendu.me.uk> wrote:
> Edward Wijaya wrote:
>  > Dear experts,
>  >
>  > Is there any? The TRANSFAC text file which contain entry like this.
>  > Especially we wich to capture the PWM for each of the Transcription
>  > factor.
>
>  Yes; I've written a module to do this, I just haven't committed it yet
>  because certain things aren't quite right in terms of the API. But to
>  just grab the PWM it should work fine. If you want I can email you the
>  modules.

I believe there is a set of non-BioPerl modules called TFBS.  See
here (although I'm not sure this is the most up-to-date site):

http://tfbs.genereg.net/

Sean


From David.Messina at sbc.su.se  Thu Mar  6 09:55:24 2008
From: David.Messina at sbc.su.se (Dave Messina)
Date: Thu, 6 Mar 2008 15:55:24 +0100
Subject: [Bioperl-l] failure of add_seqfeature
In-Reply-To: <20080306120642.4800F16427A@ws1-4.us4.outblaze.com>
References: <20080306120642.4800F16427A@ws1-4.us4.outblaze.com>
Message-ID: <628aabb70803060655k5245296etf5ee2f31755230d3@mail.gmail.com>

Hi Masa,

Could you give us a little more information?

A complete test case (the code you included doesn't run because for example
the @from array doesn't exist) and input file would be helpful, as well as
the version of BioPerl you are using.

Dave


From staffa at niehs.nih.gov  Thu Mar  6 10:23:34 2008
From: staffa at niehs.nih.gov (Staffa, Nick (NIH/NIEHS))
Date: Thu, 06 Mar 2008 10:23:34 -0500
Subject: [Bioperl-l] SeqIO
In-Reply-To: <200803061420.04123.heikki@sanbi.ac.za>
Message-ID: <C3F576A6.88C9%staffa@niehs.nih.gov>

Here's the scoop:
When I use Jason's suggestion, (-format => 'gcg'),
my program works without complaint on the original file, which looks like:
!!NA_SEQUENCE 1.0
   NewDNA  Length: 810  March 5, 2008 18:26  Type: N  Check: 3368  ..

       1  TGTTCGAATT CCGTGCGGTC CACCTCCCCT AGGAGCTCAG TGGGCTGGTT
etc.

BUT if I remove the first line to test Bio::Tools::GuessSeqFormat,
(which should be retro-gcg format (before version 11?)),
my program runs, but there IS a complaint:
Use of uninitialized value in scalar chomp at
/usr/lib/perl5/site_perl/5.8.5/Bio/SeqIO/gcg.pm line 118, <GEN0> line 1.
BUT 
If I remove (-format => 'gcg'), I get no complaint, but the sequence
returned still has its numbers embedded. This affects my calculations.

Thanks, at least I know what my options are.
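
For anyone hitting the same embedded-numbers symptom, here is a minimal
plain-Perl sketch of what a GCG-aware reader has to do: skip everything up to
the header line ending in "..", then drop the position numbers and spaces from
the sequence lines. It reuses the header regex Heikki quoted in this thread;
the @lines array and variable names are made up for illustration, and this is
not how Bio::SeqIO::gcg is implemented.

```perl
use strict;
use warnings;

# Sample lines copied from the file shown in this thread (first sequence row only).
my @lines = (
    '!!NA_SEQUENCE 1.0',
    '   NewDNA  Length: 810  March 5, 2008 18:26  Type: N  Check: 3368  ..',
    '',
    '       1  TGTTCGAATT CCGTGCGGTC CACCTCCCCT AGGAGCTCAG TGGGCTGGTT',
);

# Header pattern quoted from Bio::Tools::GuessSeqFormat earlier in the thread.
my $gcg_header = qr/Length: .*Type: .*Check: .*\.\.$/;

my $seen_header = 0;
my $sequence    = '';
for my $line (@lines) {
    if (!$seen_header) {
        # Everything up to and including the ".." line is header, not sequence.
        $seen_header = 1 if $line =~ $gcg_header;
        next;
    }
    # Sequence rows carry a leading position number and spaces between blocks.
    (my $residues = $line) =~ s/[\s\d]+//g;
    $sequence .= $residues;
}

print "$sequence\n";
```

The same loop works whether or not the "!!NA_SEQUENCE" line is present, because
it keys only on the ".." header line.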


Nick Staffa 
Telephone: 919-316-4569  (NIEHS: 6-4569)
Scientific Computing Support Group
NIEHS Information Technology Support Services Contract
(Science Task Monitor: Roy W. Reter (reter at niehs.nih.gov)
National Institute of Environmental Health Sciences
National Institutes of Health
Research Triangle Park, North Carolina


On 3/6/08 7:20 AM, "Heikki Lehvaslaiho" <heikki at sanbi.ac.za> wrote:

> 
> Nick,
> 
> This is the regex that Bio::Tools::GuessSeqFormat uses to identify a gcg file:
> 
> /Length: .*Type: .*Check: .*\.\.$/
> 
> It is the second  line in GCG file. If first line matches to some other format
> regex, this will not not be evaluated.
> 
> Let us know,
> 
> -Heikki
> 
> On Thursday 06 March 2008 05:09:11 Staffa, Nick (NIH/NIEHS) wrote:
>> Verily,
>> One interpretation of the docs might be: will read any format if the format
>> is specified.
>> I was hoping that I could write a program that one needn't specify format.
>> It'd be more user-friendly and useful.
>> 
>> On 3/5/08 9:33 PM, "Jason Stajich" <jason at bioperl.org> wrote:
>>> probably you should try specifying the format explicitly first- as in
>>> (-format => 'gcg')
>>> 
>>> -j
>>> 
>>> On Mar 5, 2008, at 6:22 PM, Chris Fields wrote:
>>>> I thought GCG format changed somewhere along the way but I maybe
>>>> I'm wrong?  Regardless, you'll have to post this as a bug (along
>>>> with an example file).
>>>> 
>>>> Also, kind of odd that the sequence data wasn't checked...
>>>> 
>>>> chris
>>>> 
>>>> On Mar 5, 2008, at 5:43 PM, Staffa, Nick (NIH/NIEHS) wrote:
>>>>> So the Howto says that Bio::SeqIO will read almost any known format
>>>>> including GCG.
>>>>> So I create a GCG file with Seqlab and try to printout its
>>>>> sequence as a
>>>>> string. ( I did guess at the way to get the sequence string:
>>>>> 
>>>>> #!/usr/bin/perl -w
>>>>> use strict;
>>>>> $| = 1;
>>>>> use Bio::SeqIO;
>>>>> my $number_of_files = @ARGV;
>>>>> if(!$number_of_files){print "no files entered\n";exit:}
>>>>> foreach my $file (@ARGV){
>>>>> my $seqio_object = Bio::SeqIO->new(-file => $file);
>>>>> my $seq_object = $seqio_object->next_seq;
>>>>> my $sequence = $seq_object->seq;
>>>>> print "$sequence\n";
>>>>> my $status = &windowscore($sequence);
>>>>> }
>>>>> 
>>>>> But what it returned was the entire contents of the file with no
>>>>> format
>>>>> decoding. Have I been deluded?
>>>>> 
>>>>> NewDNALength:810March5,200818:26Type:NCheck:
>>>>> 3368..1TGTTCGAATTCCGTGCGGTCCACCT
>>>>> CCCCTAGGAGCTCAGTGGGCTGGTT51GGATTCCGTGCCATCCCGGCAGGGCAGAGCCTCGGGAGGGGG
>>>>> CGAAGGT
>>>>> T101GCCCGGGGCCGTGCGCTGGGTGCTGCTGCTGCGGTGGCGGCGGCGGTGCC151TGCGGTTGCAGC
>>>>> GGCTGCT
>>>>> GGGGTTGCGCGTGGAAACCGCGCCCCGCACT201TGCGGCGGGCGAGCCCATCGCGCCGTAGTACAGGT
>>>>> GCAGAGC
>>>>> GCTGGGGG251GCGCCAGGATCCCCGGCATCGCAGGGCCCGAGGGGTCCGGCCCCACTCGC301ATGGG
>>>>> GCCAGCG
>>>>> GGCGGCTCTACGGACACTGCATAGTCCGAGACTGGAGC351GTAAGTGTAGGTGCCGGCCGCCGGGCAG
>>>>> TCCCCTG
>>>>> GCAGCGGGGCTGCAA401AGAAAGCCGGGTCCTGCTCCACGCCATCCAGCGGGGATGTGTCCGGAGTG4
>>>>> 51GGCAG
>>>>> AGGGTAGCCGTCGAGCGCGGGAGCGCCCAGTCCCTGGCAGTCCCG501ATAGTGGGGGCCCATGTGCGG
>>>>> AGACATC
>>>>> AGCGGAGGACCGGCCGGATAGC551CCGGCTCCGGGAAAGGCAGACCCAGGCCATCCATGGCCACGCGG
>>>>> CCGCCC6
>>>>> 01TCGGGACCAAGCGCGCCGGCCTGGGGCTCGACGAGAGCGTGCAGGAAGCC651TCCCTCCACCCGCT
>>>>> TCATGCG
>>>>> CTTCACCTGCTTGCGCCGCCGCGGCCGGT701ACTTGTAGTTGGGGTGGTCCTGCATATGCTGCACGCG
>>>>> CAGCCGC
>>>>> TCGGCC751TCTTCCACGAAGGGCCGCTTCTCTGCCAAGGTCAACGCCTTCCAAGACTT801GCCTGCA
>>>>> GGG
>>>>> 
>>>>> 
>>>>> 
>>>>> Nick Staffa
>>>>> Telephone: 919-316-4569  (NIEHS: 6-4569)
>>>>> Scientific Computing Support Group
>>>>> NIEHS Information Technology Support Services Contract
>>>>> (Science Task Monitor: Roy W. Reter (reter at niehs.nih.gov)
>>>>> National Institute of Environmental Health Sciences
>>>>> National Institutes of Health
>>>>> Research Triangle Park, North Carolina
>>>>> 
>>>>> 
>>>> 
>>>> Christopher Fields
>>>> Postdoctoral Researcher
>>>> Lab of Dr. Robert Switzer
>>>> Dept of Biochemistry
>>>> University of Illinois Urbana-Champaign
>>>> 
>>>> 
>>>> 
> 
> 


From hlapp at gmx.net  Thu Mar  6 10:26:52 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 6 Mar 2008 10:26:52 -0500
Subject: [Bioperl-l] failure of add_seqfeature
In-Reply-To: <20080306120642.4800F16427A@ws1-4.us4.outblaze.com>
References: <20080306120642.4800F16427A@ws1-4.us4.outblaze.com>
Message-ID: <6BD917FC-803E-471B-A0C4-219286E53C47@gmx.net>

It seems you are adding subfeatures with a location that is not  
within their parent feature location. If that's indeed what you want  
to do, add the 'EXPAND' argument.

Excerpted from the POD of Bio::SeqFeature::Generic:

         Usage   : $feat->add_SeqFeature($subfeat);
                   $feat->add_SeqFeature($subfeat,'EXPAND')
         Function: adds a SeqFeature into the subSeqFeature array.
                   with no 'EXPAND' qualifer, subfeat will be tested
                   as to whether it lies inside the parent, and throw
                   an exception if not.

                   If EXPAND is used, the parent's start/end/strand will
                   be adjusted so that it grows to accommodate the new
                   subFeature
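
To make the containment rule concrete, here is a small plain-Perl sketch of the
behaviour the POD describes, using the two coordinate pairs printed in the
error report. The add_sub helper and the hash layout are hypothetical and only
model the logic; this is not the code in Bio::SeqFeature::Generic, and strand
handling is left out.

```perl
use strict;
use warnings;

# Hypothetical stand-in for add_SeqFeature: features are plain hashes here.
sub add_sub {
    my ($parent, $sub, $expand) = @_;
    my $inside = $sub->{start} >= $parent->{start}
              && $sub->{end}   <= $parent->{end};
    if (!$inside) {
        # Without 'EXPAND' an out-of-range subfeature is rejected.
        die "subfeature $sub->{start}..$sub->{end} is not contained within parent\n"
            unless $expand;
        # With 'EXPAND' the parent grows to cover the subfeature.
        $parent->{start} = $sub->{start} if $sub->{start} < $parent->{start};
        $parent->{end}   = $sub->{end}   if $sub->{end}   > $parent->{end};
    }
    push @{ $parent->{subs} }, $sub;
    return $parent;
}

# Parent built from the first coordinate pair, as in the reported script.
my $parent = { start => 15616, end => 15693, subs => [] };

add_sub($parent, { start => 15616, end => 15693 });    # fits inside the parent

my $ok = eval { add_sub($parent, { start => 79568, end => 83016 }); 1 };
print "without EXPAND: ", ($ok ? "added" : "rejected"), "\n";

add_sub($parent, { start => 79568, end => 83016 }, 'EXPAND');
print "with EXPAND: parent is now $parent->{start}..$parent->{end}\n";
```

The second pair lies far outside the first, which is consistent with the
exception appearing right after the first pair is printed.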


On Mar 6, 2008, at 7:06 AM, Masa Masa wrote:

> Dear experts,
>
> Would anybody know why the following codes generate an error of:
>
>
> ------------- EXCEPTION  -------------
> MSG: Bio::SeqFeature::Generic=HASH(0x94583c0) is not contained  
> within parent feature, and expansion is not valid
> STACK Bio::SeqFeature::Generic::add_SeqFeature /usr/lib/perl5/ 
> site_perl/5.8.0/Bio/SeqFeature/Generic.pm:767
> STACK toplevel test.pl:118
>
> --------------------------------------
> 15616 15693
> 79568 83016
>
> =================
>
>
> use Bio::Graphics;
> use Bio::SeqFeature::Generic;
> use Bio::SeqIO;
>
>
> my $bsg = 'Bio::SeqFeature::Generic';
>
> my $unseqfea = $bsg->new( -start=>$from[$i], -end=>$to[$i], - 
> display_name=>'U');
>
> for (my $i=0; $i < @from; $i++) {
> print "$from[$i] $to[$i]\n";
>     $unseqfea->add_SeqFeature($bsg->new(-start=>$from[$i],-end=>$to 
> [$i]));
>     if ($i > 10) {
>        exit;
>     }
> }
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From bix at sendu.me.uk  Thu Mar  6 10:41:49 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 06 Mar 2008 15:41:49 +0000
Subject: [Bioperl-l] BioPerl Module to Parse Transfac Flat File Database
In-Reply-To: <264855a00803060540u1f3d0f92pbab13349595a0eb3@mail.gmail.com>
References: <3521d3670803060016r35ac720ar9f2190631ddaf629@mail.gmail.com>	
	<47CFEC89.1000705@sendu.me.uk>
	<264855a00803060540u1f3d0f92pbab13349595a0eb3@mail.gmail.com>
Message-ID: <47D010BD.4000801@sendu.me.uk>

Sean Davis wrote:
> On Thu, Mar 6, 2008 at 8:07 AM, Sendu Bala <bix at sendu.me.uk> wrote:
>> Edward Wijaya wrote:
>>  > Dear experts,
>>  >
>>  > Is there any? The TRANSFAC text file which contain entry like this.
>>  > Especially we wich to capture the PWM for each of the Transcription
>>  > factor.
>>
>>  Yes; I've written a module to do this, I just haven't committed it yet
>>  because certain things aren't quite right in terms of the API. But to
>>  just grab the PWM it should work fine. If you want I can email you the
>>  modules.
> 
> I believe there are a set of non-bioperl modules called TFBS.  See
> here (although I'm not sure this is the most up-to-date site):
> 
> http://tfbs.genereg.net/

I believe it's out of date enough to not work on the latest Transfac 
data, though I haven't tried to confirm.

At any rate, the Transfac (Pro) database is pretty strange and 
complicated, and the TFBS modules certainly don't let you access 
everything in the way you might want or expect.


From cain.cshl at gmail.com  Thu Mar  6 11:43:35 2008
From: cain.cshl at gmail.com (Scott Cain)
Date: Thu, 06 Mar 2008 11:43:35 -0500
Subject: [Bioperl-l] anonymous cvs?
Message-ID: <1204821815.6689.7.camel@frissell>

Hi All,

So now that the transition to svn is complete (and I like it), should
anonymous cvs still be working?  I believe there was discussion about
keeping it going via mirroring, and I hope that is the case.  It will
make life a little easier for people who want to do automated installs
of GBrowse and would like to use the installer script to get bioperl via
anon cvs.  If anon cvs is no longer available, does anyone have
suggestions for the best route to take for getting command line svn on
Windows?

Thanks,
Scott

-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   cain.cshl at gmail.com
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory


From cain.cshl at gmail.com  Thu Mar  6 11:48:08 2008
From: cain.cshl at gmail.com (Scott Cain)
Date: Thu, 06 Mar 2008 11:48:08 -0500
Subject: [Bioperl-l] anonymous cvs?
In-Reply-To: <1204821815.6689.7.camel@frissell>
References: <1204821815.6689.7.camel@frissell>
Message-ID: <1204822088.6689.8.camel@frissell>

I should have mentioned that I tried it and it is not currently working:

$ cvs -d :pserver:cvs at code.open-bio.org:/home/repository/bioperl checkout bioperl-live 
can't create temporary directory /tmp/cvs-serv32067
No space left on device


On Thu, 2008-03-06 at 11:43 -0500, Scott Cain wrote:
> Hi All,
> 
> So now that the transition to svn is complete (and I like it), should
> anonymous cvs still be working?  I believe there was discussion about
> keeping it going via mirroring, and I hope that is the case.  It will
> make life a little easier for people who want to do automated installs
> of GBrowse and would like to use the installer script to get bioperl via
> anon cvs.  If anon cvs is no longer available, does anyone have
> suggestions for the best route to take for getting command line svn on
> Windows?
> 
> Thanks,
> Scott
> 
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory


From Marc.Logghe at ablynx.com  Thu Mar  6 11:22:10 2008
From: Marc.Logghe at ablynx.com (Marc Logghe)
Date: Thu, 6 Mar 2008 17:22:10 +0100
Subject: [Bioperl-l] SeqIO
In-Reply-To: <C3F576A6.88C9%staffa@niehs.nih.gov>
Message-ID: <03C512635899144083CADB0EE22201890172A836@alpaca.lan.ablynx.com>

Hi Nick,
I don't think you should leave out the -format option. You have to leave
it in but the format should be provided by the B::T::GuessSeqFormat
object.
Something like:

#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;
use Bio::Tools::GuessSeqFormat;

$| = 1;    # unbuffered output
my $number_of_files = @ARGV;
# Bail out early if no input files were given.
if (!$number_of_files) { print "no files entered\n"; exit; }
foreach my $file (@ARGV) {
  # Let GuessSeqFormat sniff the format, then pass its answer to SeqIO.
  my $guesser = Bio::Tools::GuessSeqFormat->new(-file => $file);
  my $seqio_object = Bio::SeqIO->new(-file   => $guesser->file,
                                     -format => $guesser->guess);
  my $seq_object = $seqio_object->next_seq;
  my $sequence = $seq_object->seq;
  print "$sequence\n";
}

HTH,
Marc




From stefan.kirov at bms.com  Thu Mar  6 10:51:25 2008
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Thu, 06 Mar 2008 10:51:25 -0500
Subject: [Bioperl-l] BioPerl Module to Parse Transfac Flat File Database
In-Reply-To: <47D010BD.4000801@sendu.me.uk>
References: <3521d3670803060016r35ac720ar9f2190631ddaf629@mail.gmail.com>
	<47CFEC89.1000705@sendu.me.uk>
	<264855a00803060540u1f3d0f92pbab13349595a0eb3@mail.gmail.com>
	<47D010BD.4000801@sendu.me.uk>
Message-ID: <47D012FD.7090600@bms.com>

Sendu Bala wrote:
> Sean Davis wrote:
>> On Thu, Mar 6, 2008 at 8:07 AM, Sendu Bala <bix at sendu.me.uk> wrote:
>>> Edward Wijaya wrote:
>>> > Dear experts,
>>> >
>>> > Is there any? The TRANSFAC text file which contain entry like this.
>>> > Especially we wich to capture the PWM for each of the Transcription
>>> > factor.
>>>
>>> Yes; I've written a module to do this, I just haven't committed it yet
>>> because certain things aren't quite right in terms of the API. But to
>>> just grab the PWM it should work fine. If you want I can email you the
>>> modules.
>>
>> I believe there are a set of non-bioperl modules called TFBS. See
>> here (although I'm not sure this is the most up-to-date site):
>>
>> http://tfbs.genereg.net/
>
> I believe it's out of date enough to not work on the latest Transfac
> data, though I haven't used tried to confirm.
>
> At any rate, the Transfac (Pro) database is pretty strange and
> complicated, and the TFBS modules certainly don't let you access
> everything in the way you might want or expect.
>
Also be careful: there is a difference between a PFM (position frequency
matrix) and a PWM (position weight matrix). Most programs I have encountered
that produce a PWM assume a uniform background distribution (0.25 for each
base) unless you specify your own. You may be comfortable with that, but you
should definitely be aware of it.
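
As an illustration of why the background matters, here is a plain-Perl sketch
that turns one PFM column of counts into log2-odds PWM scores under two
different backgrounds. The formula is one common definition and not
necessarily what any particular program uses; the counts, the pseudocount of
1, and the AT-rich background are made-up values.

```perl
use strict;
use warnings;

# Convert one PFM column (raw counts per base) into log2-odds scores.
# The pseudocount is spread over the bases in proportion to the background.
sub pfm_column_to_pwm {
    my ($counts, $background, $pseudocount) = @_;
    my $total = 0;
    $total += $_ for values %$counts;
    my %score;
    for my $base (sort keys %$counts) {
        my $p = ($counts->{$base} + $pseudocount * $background->{$base})
              / ($total + $pseudocount);
        $score{$base} = log($p / $background->{$base}) / log(2);
    }
    return \%score;
}

my %column  = (A => 14,   C => 2,    G => 2,    T => 2);      # made-up counts
my %uniform = (A => 0.25, C => 0.25, G => 0.25, T => 0.25);   # the usual default
my %at_rich = (A => 0.35, C => 0.15, G => 0.15, T => 0.35);   # made-up background

my $pwm_uniform = pfm_column_to_pwm(\%column, \%uniform, 1);
my $pwm_at_rich = pfm_column_to_pwm(\%column, \%at_rich, 1);

printf "%s  uniform: %6.3f  AT-rich: %6.3f\n",
    $_, $pwm_uniform->{$_}, $pwm_at_rich->{$_} for qw(A C G T);
```

The same counts give a lower score for A when the background already expects
a lot of A, so matches scored with the default 0.25 background can look
stronger than they are in an AT-rich genome.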


From jay at jays.net  Thu Mar  6 12:03:51 2008
From: jay at jays.net (Jay Hannah)
Date: Thu, 06 Mar 2008 11:03:51 -0600
Subject: [Bioperl-l] anonymous cvs?
In-Reply-To: <1204821815.6689.7.camel@frissell>
References: <1204821815.6689.7.camel@frissell>
Message-ID: <47D023F7.4000803@jays.net>

Scott Cain wrote:
> It will make life a little easier for people who want to do automated installs
> of GBrowse and would like to use the installer script to get bioperl via
> anon cvs.

Those installer scripts can't use anon SVN instead?

> If anon cvs is no longer available, does anyone have
> suggestions for the best route to take for getting command line svn on
> Windows?
>   

At $work our Windows guys use GUIs for both CVS (repo dead this summer) 
and SVN. Are there command-line (MS-DOS?) CVS clients for Windows? And 
there isn't an SVN equivalent?

j
http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah


From whs at ebi.ac.uk  Thu Mar  6 12:08:51 2008
From: whs at ebi.ac.uk (William Spooner)
Date: Thu, 6 Mar 2008 17:08:51 +0000
Subject: [Bioperl-l] anonymous cvs?
In-Reply-To: <1204821815.6689.7.camel@frissell>
References: <1204821815.6689.7.camel@frissell>
Message-ID: <07E3119E-0354-4E93-9980-3CB2B26DF2BE@ebi.ac.uk>

This will be important for Ensembl as well. As far as I know all of  
their install docs refer to BioPerl's anonymous CVS.

On 6 Mar 2008, at 16:43, Scott Cain wrote:

> Hi All,
>
> So now that the transition to svn is complete (and I like it), should
> anonymous cvs still be working?  I believe there was discussion about
> keeping it going via mirroring, and I hope that is the case.  It will
> make life a little easier for people who want to do automated installs
> of GBrowse and would like to use the installer script to get bioperl  
> via
> anon cvs.  If anon cvs is no longer available, does anyone have
> suggestions for the best route to take for getting command line svn on
> Windows?
>
> Thanks,
> Scott
>
> -- 
> ------------------------------------------------------------------------
> Scott Cain, Ph. D.                                   cain.cshl at gmail.com
> GMOD Coordinator (http://www.gmod.org/)                      
> 216-392-3087
> Cold Spring Harbor Laboratory
>
>

---
William Spooner
Visiting Scientist
whs at ebi.ac.uk


From MEC at stowers-institute.org  Thu Mar  6 11:58:57 2008
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Thu, 6 Mar 2008 10:58:57 -0600
Subject: [Bioperl-l] BioPerl Module to Parse Transfac Flat File Database
In-Reply-To: <47D010BD.4000801@sendu.me.uk>
References: <3521d3670803060016r35ac720ar9f2190631ddaf629@mail.gmail.com>
	<47CFEC89.1000705@sendu.me.uk>
	<264855a00803060540u1f3d0f92pbab13349595a0eb3@mail.gmail.com>
	<47D010BD.4000801@sendu.me.uk>
Message-ID: <BD62CBAC4395B94096109020651BE2EC11CE393C@exchmb-02.stowers-institute.org>

we use TFBS all the time against data coming from a recent local install of TRANSFAC(r) Professional 11.1 (2007-03-31)

the most recent is 11.4 (2007-12-14)

TFBS::* has the nice advantage that you can interoperate Transfac PWMs with other (say, Jaspar) matrices and/or simple consensus sequence patterns; and it COULD be fairly easily extended to allow interoperation with other sources, say cisRED.  "One interface to rule them all" - bwa ha ha.

However, if you DO have locally installed Transfac (Pro) ($$), and want to use just it, then you should know that you can also call their `match` routines from the unix command line (though this is not documented to my knowledge).  I can supply my cheat sheet or otherwise advise if desired.  Also, if you go this way, I've written the requisite TFMatchOut2GFF to convert TRANSFAC match's output to GFF, if it suits your purpose, which I could release if asked.

If you decide to use TFBS::*, I have written a command-line wrapper for the TFBS Perl modules that might give you a leg up.  I could release that too, if useful.

But I agree; if I recall correctly, TFBS::* was dropped from ongoing active development due to issues with data access policies.  And I think the modules no longer work with remotely hosted Transfac.  They did a few years ago; I believe I tested a while ago and found that they no longer do.

Malcolm Cook
Stowers Institute for Medical Research - Kansas City, Missouri


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Sendu Bala
> Sent: Thursday, March 06, 2008 9:42 AM
> To: Sean Davis
> Cc: bioperl-l at lists.open-bio.org; Edward Wijaya
> Subject: Re: [Bioperl-l] BioPerl Module to Parse Transfac
> Flat File Database
>
> Sean Davis wrote:
> > On Thu, Mar 6, 2008 at 8:07 AM, Sendu Bala <bix at sendu.me.uk> wrote:
> >> Edward Wijaya wrote:
> >>  > Dear experts,
> >>  >
> >>  > Is there any? The TRANSFAC text file which contain
> entry like this.
> >>  > Especially we wich to capture the PWM for each of the
> >> Transcription  > factor.
> >>
> >>  Yes; I've written a module to do this, I just haven't
> committed it
> >> yet  because certain things aren't quite right in terms of
> the API.
> >> But to  just grab the PWM it should work fine. If you want I can
> >> email you the  modules.
> >
> > I believe there are a set of non-bioperl modules called TFBS.  See
> > here (although I'm not sure this is the most up-to-date site):
> >
> > http://tfbs.genereg.net/
>
> I believe it's out of date enough to not work on the latest
> Transfac data, though I haven't used tried to confirm.
>
> At any rate, the Transfac (Pro) database is pretty strange
> and complicated, and the TFBS modules certainly don't let you
> access everything in the way you might want or expect.


From cjfields at uiuc.edu  Thu Mar  6 12:10:35 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 6 Mar 2008 11:10:35 -0600
Subject: [Bioperl-l] anonymous cvs?
In-Reply-To: <1204821815.6689.7.camel@frissell>
References: <1204821815.6689.7.camel@frissell>
Message-ID: <84E60454-2B09-4F77-9BD6-4B9150304B2D@uiuc.edu>

BioPerl CVS is no longer being updated; you have to use Subversion to  
grab the latest (we have anon. svn set up for this).  We discussed  
syncing svn commits over to cvs but found it way too problematic and  
decided to make a clean break.

The best option I can think of as a replacement (so everyone isn't  
dependent on installing svn to get Gbrowse and bioperl-live) is to get  
a cron job set up which drops a bioperl-live archive into bioperl.org/ 
DIST or bioperl.org/SRC.  We have already talked about doing this for  
nightly builds from svn main trunk; we can probably set that up on our  
end.  Would that be feasible as a fallback in case svn isn't present?
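
On the installer side, the fallback could be decided with something like the
plain-Perl sketch below: look for an svn executable on PATH and use the
tarball route only if none is found. The helper names find_in_path and
choose_route are hypothetical, no download locations are assumed, and the
script only reports which route it would take.

```perl
use strict;
use warnings;
use File::Spec;

# Return the full path of $cmd if an executable of that name is on PATH,
# otherwise return undef. Also tries "$cmd.exe" for Windows.
sub find_in_path {
    my ($cmd) = @_;
    for my $dir (File::Spec->path) {
        for my $name ($cmd, "$cmd.exe") {
            my $candidate = File::Spec->catfile($dir, $name);
            return $candidate if -f $candidate && -x _;
        }
    }
    return;
}

# Prefer svn when it is available; otherwise fall back to the nightly tarball.
sub choose_route {
    my ($svn_path) = @_;
    return defined $svn_path ? 'svn' : 'tarball';
}

my $svn = find_in_path('svn');
print "would fetch bioperl-live via: ", choose_route($svn), "\n";
```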

The subversion project page has information on Windows versions:

http://subversion.tigris.org/project_packages.html

chris

On Mar 6, 2008, at 10:43 AM, Scott Cain wrote:

> Hi All,
>
> So now that the transition to svn is complete (and I like it), should
> anonymous cvs still be working?  I believe there was discussion about
> keeping it going via mirroring, and I hope that is the case.  It will
> make life a little easier for people who want to do automated installs
> of GBrowse and would like to use the installer script to get bioperl  
> via
> anon cvs.  If anon cvs is no longer available, does anyone have
> suggestions for the best route to take for getting command line svn on
> Windows?
>
> Thanks,
> Scott
>
> -- 
> ------------------------------------------------------------------------
> Scott Cain, Ph. D.                                   cain.cshl at gmail.com
> GMOD Coordinator (http://www.gmod.org/)                      
> 216-392-3087
> Cold Spring Harbor Laboratory
>
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cain.cshl at gmail.com  Thu Mar  6 12:22:29 2008
From: cain.cshl at gmail.com (Scott Cain)
Date: Thu, 06 Mar 2008 12:22:29 -0500
Subject: [Bioperl-l] anonymous cvs?
In-Reply-To: <84E60454-2B09-4F77-9BD6-4B9150304B2D@uiuc.edu>
References: <1204821815.6689.7.camel@frissell>
	<84E60454-2B09-4F77-9BD6-4B9150304B2D@uiuc.edu>
Message-ID: <1204824149.6689.14.camel@frissell>

Hi Chris,

I think a nightly generated tarball would be sufficient for my use.  We
used anon cvs to get the latest bioperl and then threw it away once it
was installed, so a tarball is just as good, if not better, since users
wouldn't need to install svn.  Not needing to install svn is a good thing
for all my users, since I think many distributions do not supply it by
default.

Thanks,
Scott


On Thu, 2008-03-06 at 11:10 -0600, Chris Fields wrote:
> BioPerl CVS is no longer being updated; you have to use Subversion to  
> grab the latest (we have anon. svn set up for this).  We discussed  
> syncing svn commits over to cvs but found it way too problematic and  
> decided to make a clean break.
> 
> The best option I can think of as a replacement (so everyone isn't  
> dependent on installing svn to get Gbrowse and bioperl-live) is to get  
> a cron job set up which drops a bioperl-live archive into bioperl.org/ 
> DIST or bioperl.org/SRC.  We have already talked about doing this for  
> nightly builds from svn main trunk; we can probably set that up on our  
> end.  Would that be feasible as a fallback in case svn isn't present?
> 
> The subversion project page has information on Windows versions:
> 
> http://subversion.tigris.org/project_packages.html
> 
> chris
> 
> On Mar 6, 2008, at 10:43 AM, Scott Cain wrote:
> 
> > Hi All,
> >
> > So now that the transition to svn is complete (and I like it), should
> > anonymous cvs still be working?  I believe there was discussion about
> > keeping it going via mirroring, and I hope that is the case.  It will
> > make life a little easier for people who want to do automated installs
> > of GBrowse and would like to use the installer script to get bioperl  
> > via
> > anon cvs.  If anon cvs is no longer available, does anyone have
> > suggestions for the best route to take for getting command line svn on
> > Windows?
> >
> > Thanks,
> > Scott
> >
> > -- 
> > ------------------------------------------------------------------------
> > Scott Cain, Ph. D.                                   cain.cshl at gmail.com
> > GMOD Coordinator (http://www.gmod.org/)                      
> > 216-392-3087
> > Cold Spring Harbor Laboratory
> >
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
> 
> 
> 
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   cain.cshl at gmail.com
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory


From cain.cshl at gmail.com  Thu Mar  6 12:28:13 2008
From: cain.cshl at gmail.com (Scott Cain)
Date: Thu, 06 Mar 2008 12:28:13 -0500
Subject: [Bioperl-l] anonymous cvs?
In-Reply-To: <47D023F7.4000803@jays.net>
References: <1204821815.6689.7.camel@frissell>  <47D023F7.4000803@jays.net>
Message-ID: <1204824493.6689.19.camel@frissell>

Hi Jay,

It could use anon svn, though svn is considerably less ubiquitous, so it
effectively adds another prerequisite.

For cvs, the GUI WinCVS provides command line cvs as well.  I was
wondering if there was an easy to install equivalent for svn, though it
may be moot for me if the powers that be will provide a nightly
tarball :-)

Scott

On Thu, 2008-03-06 at 11:03 -0600, Jay Hannah wrote:
> Scott Cain wrote:
> > It will make life a little easier for people who want to do automated installs
> > of GBrowse and would like to use the installer script to get bioperl via
> > anon cvs.
> 
> Those installer scripts can't use anon SVN instead?
> 
> > If anon cvs is no longer available, does anyone have
> > suggestions for the best route to take for getting command line svn on
> > Windows?
> >   
> 
> At $work our Windows guys use GUIs for both CVS (repo dead this summer) 
> and SVN. Are there command-line (MS-DOS?) CVS clients for Windows? And 
> there isn't an SVN equivalent?
> 
> j
> http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory


From cjfields at uiuc.edu  Thu Mar  6 12:28:36 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 6 Mar 2008 11:28:36 -0600
Subject: [Bioperl-l] anonymous cvs?
In-Reply-To: <1204824149.6689.14.camel@frissell>
References: <1204821815.6689.7.camel@frissell>
	<84E60454-2B09-4F77-9BD6-4B9150304B2D@uiuc.edu>
	<1204824149.6689.14.camel@frissell>
Message-ID: <ADC4C24A-76D7-4390-9794-F1D218CAD8B9@uiuc.edu>

I'm working on the nightly build script now and will post back when  
everything is set up.

chris

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Thu Mar  6 15:38:22 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 6 Mar 2008 14:38:22 -0600
Subject: [Bioperl-l] anonymous cvs?
In-Reply-To: <ADC4C24A-76D7-4390-9794-F1D218CAD8B9@uiuc.edu>
References: <1204821815.6689.7.camel@frissell>
	<84E60454-2B09-4F77-9BD6-4B9150304B2D@uiuc.edu>
	<1204824149.6689.14.camel@frissell>
	<ADC4C24A-76D7-4390-9794-F1D218CAD8B9@uiuc.edu>
Message-ID: <2F746C5B-902C-4510-AEA3-2C46D4F51E7A@uiuc.edu>

Okay, I have set up nightly builds for bioperl-live, bioperl-db,  
bioperl-network, and bioperl-run here:

http://www.bioperl.org/DIST/nightly_builds/
ftp://ftp.open-bio.org/pub/bioperl/DIST/nightly_builds

At the moment this is running via a crontab off a script in my portal  
account, retrieving everything via anon. svn and bundling it up into  
zip and tarball archives.  I would like to set it up to grab  
everything off dev but I don't want to mess with my ssh setup, so if  
anyone has ideas there...

The script also adds a CHANGELOG file (last 10 commits) and removes  
the .svn directories prior to bundling.  The archive name has the  
subversion revision number and date included; md5 checksums are in the  
SIGNATURES file.  I'll check on it again tomorrow to make sure cron  
ran it.
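
For readers who want to reproduce the packaging steps described above, here is a minimal Perl sketch using only core modules. It is not the actual cron script. The archive naming scheme, the revision number used in the usage note, and the helper names (archive_basename, strip_svn_dirs, md5_of_file) are assumptions made for illustration, and the svn checkout and the tar/zip bundling are left out.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find ();
use File::Path qw(rmtree);
use Digest::MD5;
use POSIX qw(strftime);

# Hypothetical naming scheme: <dist>-r<revision>-<YYYY-MM-DD> (UTC date).
# The names actually used on bioperl.org may differ.
sub archive_basename {
    my ($dist, $revision, $epoch) = @_;
    my $date = strftime('%Y-%m-%d', gmtime($epoch));
    return "$dist-r$revision-$date";
}

# Remove every .svn directory below $root before bundling.
# Returns the number of .svn directories that were found.
sub strip_svn_dirs {
    my ($root) = @_;
    my @svn_dirs;
    File::Find::find(
        sub {
            return unless -d $_ && $_ eq '.svn';
            push @svn_dirs, $File::Find::name;
            $File::Find::prune = 1;    # do not descend into .svn itself
        },
        $root
    );
    rmtree($_) for @svn_dirs;
    return scalar @svn_dirs;
}

# Hex MD5 digest of a file, the kind of value a SIGNATURES file would list.
sub md5_of_file {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    binmode $fh;
    my $digest = Digest::MD5->new->addfile($fh)->hexdigest;
    close $fh;
    return $digest;
}
```

A caller would check out the trunk, run strip_svn_dirs on the working copy, bundle the result under archive_basename('bioperl-live', $revision, time), and record md5_of_file for each archive. The layout of the real SIGNATURES file is not described in this thread, so it is not reproduced here.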

We can probably set up automated PPM builds as well; might be worth  
testing down the road (we need a way to set defaults for Build args  
prior to getting that running).

chris

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Thu Mar  6 16:48:37 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 6 Mar 2008 15:48:37 -0600
Subject: [Bioperl-l] Nightly build archives now available
Message-ID: <BAB6978A-D50E-4C4A-9FC9-EAAD557B935F@uiuc.edu>

We now have nightly bundled archives for bioperl-live, bioperl-db,  
bioperl-run, and bioperl-network running; these will be updated ~ 1:00  
am every night.

http://www.bioperl.org/DIST/nightly_builds/
ftp://ftp.open-bio.org/pub/bioperl/DIST/nightly_builds

The archives are date-stamped and also carry the Subversion revision  
number, in case you want to make sure you are getting the version that  
contains a particular bug fix.  They also contain a CHANGELOG file  
covering the last 10 revisions (if there are any).  These are currently  
derived from the anon. svn repository.

chris


From David.Messina at sbc.su.se  Thu Mar  6 18:50:04 2008
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 7 Mar 2008 00:50:04 +0100
Subject: [Bioperl-l] Nightly build archives now available
In-Reply-To: <BAB6978A-D50E-4C4A-9FC9-EAAD557B935F@uiuc.edu>
References: <BAB6978A-D50E-4C4A-9FC9-EAAD557B935F@uiuc.edu>
Message-ID: <628aabb70803061550s24d7d8cfhf80495ea970a6c19@mail.gmail.com>

Very slick and well-thought-out, Chris -- nice job!


Dave


From hlapp at gmx.net  Thu Mar  6 19:06:41 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 6 Mar 2008 19:06:41 -0500
Subject: [Bioperl-l] Nightly build archives now available
In-Reply-To: <BAB6978A-D50E-4C4A-9FC9-EAAD557B935F@uiuc.edu>
References: <BAB6978A-D50E-4C4A-9FC9-EAAD557B935F@uiuc.edu>
Message-ID: <E39ADEB2-1FE7-4024-8188-EFF98F17BF52@gmx.net>

Awesome - thanks for doing this, Chris! -hilmar

On Mar 6, 2008, at 4:48 PM, Chris Fields wrote:

> We now have nightly bundled archives for bioperl-live, bioperl-db,  
> bioperl-run, and bioperl-network running; these will be updated ~  
> 1:00 am every night.
>
> http://www.bioperl.org/DIST/nightly_builds/
> ftp://ftp.open-bio.org/pub/bioperl/DIST/nightly_builds
>
> The archives are date-stamped and also have the Subversion  
> revision, just in case one wanted to ensure they get the correct  
> version for the bug fix.  They also contain a CHANGELOG file for  
> the last 10 revisions (if there are any).  These are currently  
> derived off the anon. svn repository.
>
> chris
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From staffa at niehs.nih.gov  Thu Mar  6 18:27:31 2008
From: staffa at niehs.nih.gov (Staffa, Nick (NIH/NIEHS))
Date: Thu, 06 Mar 2008 18:27:31 -0500
Subject: [Bioperl-l] SeqIO
In-Reply-To: <03C512635899144083CADB0EE22201890172A836@alpaca.lan.ablynx.com>
Message-ID: <C3F5E813.8903%staffa@niehs.nih.gov>

Thanks.
I really appreciate all the interest and help.
That sure sounds like a great idea, but I think
Bio::Tools::GuessSeqFormat needs more rigor before it declares a format.
Is there a substitute?
It works great with
>> !!NA_SEQUENCE 1.0
>>    NewDNA  Length: 810  March 5, 2008 18:26  Type: N  Check: 3368  ..
>> 
>>        1  TGTTCGAATT CCGTGCGGTC CACCTCCCCT AGGAGCTCAG TGGGCTGGTT
>> et c.

as seen in:
gir.niehs.nih.gov> CGwindows.pl TestDNA.seq.org | more
guesser guesses gcg
TGTTCGAATTCCGTGCGGTCCACCTCCCCTAGGAGCTCAGTGGGCTGGTTGGATTCCGTGCCATCCCGGCAGGGCA
GAGCCTCGGGA  et c.
(yes, I added
my $file_type = $guesser->guess;
print "guesser guesses $file_type\n";
)

BUT
when applied to a GenBank sequence passed through the Seqlab editor and turned
into GCG, to wit:
!!NA_SEQUENCE 1.0
LOCUS       HSPGK2G      1911 bp    DNA             PRI       12-SEP-1993
DEFINITION  Human testis-specific PGK-2 gene for phosphoglycerate kinase
            (ATP:3-phospho-D-glycerate 1-phosphotransferase, EC 2.7.2.3).
ACCESSION   X05246 Y00261
...
...
BASE COUNT      583 a    367 c    442 g    519 t
ORIGIN

 HSPGK2G  Length: 1911  August 24, 1998 10:56  Type: N  Check: 4156  ..

       1  GCCCCTCAAC AGCAAGTTGG TTCTTCAGCA TTAAGATCCA GGTGTCAGCC
et c.

It thinks it is a flawed PIR:

gir.niehs.nih.gov> CGwindows.pl hspgk2g.seq | more
guesser guesses pir

------------- EXCEPTION  -------------
MSG: PIR stream read attempted without leading '>P1;' [ !!NA_SEQUENCE 1.0
LOCUS       HSPGK2G      1911 bp    DNA             PRI       12-SEP-1993


I must look at why the guesser thinks this is PIR.
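
Until the guesser is sorted out, one possible workaround is to trust the "!!NA_SEQUENCE" marker on the first line and only fall back to Bio::Tools::GuessSeqFormat when that marker is absent. The sketch below assumes that every file starting with that marker should be read as 'gcg'. The helper names (has_gcg_marker, choose_format, first_sequence) are made up for illustration, and whether the gcg parser copes with the embedded GenBank header has not been checked here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# True if the first line carries the GCG marker seen in the files above.
sub has_gcg_marker {
    my ($first_line) = @_;
    return defined $first_line && $first_line =~ /^!!NA_SEQUENCE\b/ ? 1 : 0;
}

# Pick a format: trust the GCG marker when present, otherwise ask the guesser.
sub choose_format {
    my ($file) = @_;
    open my $fh, '<', $file or die "Cannot open $file: $!";
    my $first_line = <$fh>;
    close $fh;
    return 'gcg' if has_gcg_marker($first_line);

    # BioPerl is only loaded when a guess is actually needed.
    require Bio::Tools::GuessSeqFormat;
    return Bio::Tools::GuessSeqFormat->new(-file => $file)->guess;
}

# Read the first sequence of a file using the chosen format.
sub first_sequence {
    my ($file) = @_;
    my $format = choose_format($file);
    die "Could not determine the format of $file\n" unless defined $format;

    require Bio::SeqIO;
    my $in  = Bio::SeqIO->new(-file => $file, -format => $format);
    my $seq = $in->next_seq or die "No sequence found in $file\n";
    return $seq->seq;
}

# Run as a script: print the first sequence of each file named on the command line.
unless (caller) {
    print first_sequence($_), "\n" for @ARGV;
}
```

Passing the format explicitly to Bio::SeqIO->new, as in first_sequence, also sidesteps any guessing based on the filename extension.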


On 3/6/08 11:22 AM, "Marc Logghe" <Marc.Logghe at ablynx.com> wrote:

> Hi Nick,
> I don't think you should leave out the -format option. You have to leave
> it in but the format should be provided by the B::T::GuessSeqFormat
> object.
> Something like:
> 
> #!/usr/bin/perl
> use strict;
> use warnings;
> use Bio::SeqIO;
> use Bio::Tools::GuessSeqFormat;
> 
> $| = 1;    # unbuffered output
> my $number_of_files = @ARGV;
> # Stop early when no input files were given.
> if(!$number_of_files){print "no files entered\n";exit;}
> foreach my $file (@ARGV){
>   # Guess the format from the file and hand the guess to SeqIO.
>   my $guesser = Bio::Tools::GuessSeqFormat->new(-file => $file);
>   my $seqio_object = Bio::SeqIO->new(-file   => $guesser->file,
>                                      -format => $guesser->guess);
>   my $seq_object = $seqio_object->next_seq;
>   my $sequence = $seq_object->seq;
>   print "$sequence\n";
> }
> 
> HTH,
> Marc
> 
> 
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>> bounces at lists.open-bio.org] On Behalf Of Staffa, Nick (NIH/NIEHS)
>> Sent: donderdag 6 maart 2008 16:24
>> To: Heikki Lehvaslaiho; bioperl-l at lists.open-bio.org
>> Cc: Chris Fields
>> Subject: Re: [Bioperl-l] SeqIO
>> 
>> Here's the scoop:
>> When I use Jason's suggestion, (-format => 'gcg'),
>> My program works without complaint on the original file that looks
> like:
>> !!NA_SEQUENCE 1.0
>>    NewDNA  Length: 810  March 5, 2008 18:26  Type: N  Check: 3368  ..
>> 
>>        1  TGTTCGAATT CCGTGCGGTC CACCTCCCCT AGGAGCTCAG TGGGCTGGTT
>> et c.
>> 
>> BUT if I remove the first line to test Bio::Tools::GuessSeqFormat,
>> (which should be retro-gcg format (before version 11?)),
>> my program runs, but there IS a complaint:
>> Use of uninitialized value in scalar chomp at
>> /usr/lib/perl5/site_perl/5.8.5/Bio/SeqIO/gcg.pm line 118, <GEN0> line
> 1.
>> BUT
>> If I remove (-format => 'gcg'),  I get no complaint, but the sequence
>> returned still has its numbers embedded. This affects my calculations.
>> 
>> Thanks, at least i know what my options are.
>> 
>> 
>> 
>> Nick Staffa
>> Telephone: 919-316-4569  (NIEHS: 6-4569)
>> Scientific Computing Support Group
>> NIEHS Information Technology Support Services Contract
>> (Science Task Monitor: Roy W. Reter (reter at niehs.nih.gov)
>> National Institute of Environmental Health Sciences
>> National Institutes of Health
>> Research Triangle Park, North Carolina
>


From cjfields at uiuc.edu  Thu Mar  6 23:32:39 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 6 Mar 2008 22:32:39 -0600
Subject: [Bioperl-l] Nightly build archives now available
In-Reply-To: <628aabb70803061550s24d7d8cfhf80495ea970a6c19@mail.gmail.com>
References: <BAB6978A-D50E-4C4A-9FC9-EAAD557B935F@uiuc.edu>
	<628aabb70803061550s24d7d8cfhf80495ea970a6c19@mail.gmail.com>
Message-ID: <5A67E3A9-9997-4A6B-AB07-8403D5FF388E@uiuc.edu>

I would like to get automated PPM builds set up as well but I think we  
have to rework some Build.PL stuff to get that going.

The next thing is to set up a regular script to check test/POD coverage.

chris

On Mar 6, 2008, at 5:50 PM, Dave Messina wrote:

> Very slick and well-thought-out, Chris -- nice job!
>
>
> Dave


From Marc.Logghe at ablynx.com  Fri Mar  7 04:04:35 2008
From: Marc.Logghe at ablynx.com (Marc Logghe)
Date: Fri, 7 Mar 2008 10:04:35 +0100
Subject: [Bioperl-l] SeqIO
In-Reply-To: <C3F5E813.8903%staffa@niehs.nih.gov>
Message-ID: <03C512635899144083CADB0EE22201890172A938@alpaca.lan.ablynx.com>

Ahh, my reply did not make much sense when I took another look. I was the
one who learnt something here :-)
I did not know that Bio::SeqIO was already using B::T::GuessSeqFormat
under the hood. I also learnt that you have to be careful with the
filename extension, because it seems to take precedence.
Regards,
Marc



From bix at sendu.me.uk  Fri Mar  7 05:32:01 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 07 Mar 2008 10:32:01 +0000
Subject: [Bioperl-l] Nightly build archives now available
In-Reply-To: <5A67E3A9-9997-4A6B-AB07-8403D5FF388E@uiuc.edu>
References: <BAB6978A-D50E-4C4A-9FC9-EAAD557B935F@uiuc.edu>	<628aabb70803061550s24d7d8cfhf80495ea970a6c19@mail.gmail.com>
	<5A67E3A9-9997-4A6B-AB07-8403D5FF388E@uiuc.edu>
Message-ID: <47D119A1.10408@sendu.me.uk>

Chris Fields wrote:
> I would like to get automated PPM builds set up as well but I think we 
> have to rework some Build.PL stuff to get that going.

What's the hold-up on that front?


From heikki at sanbi.ac.za  Fri Mar  7 06:09:25 2008
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Fri, 7 Mar 2008 13:09:25 +0200
Subject: [Bioperl-l] BioSQL V1.0.0 released
Message-ID: <200803071309.25294.heikki@sanbi.ac.za>

BIOSQL V1.0.0 RELEASED
http://news.open-bio.org/archives/2008_03.html#000094


Congratulations, Hilmar!

		-Heikki

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Senior Scientist    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From cjfields at uiuc.edu  Fri Mar  7 08:53:50 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 7 Mar 2008 07:53:50 -0600
Subject: [Bioperl-l] Nightly build archives now available
In-Reply-To: <47D119A1.10408@sendu.me.uk>
References: <BAB6978A-D50E-4C4A-9FC9-EAAD557B935F@uiuc.edu>	<628aabb70803061550s24d7d8cfhf80495ea970a6c19@mail.gmail.com>
	<5A67E3A9-9997-4A6B-AB07-8403D5FF388E@uiuc.edu>
	<47D119A1.10408@sendu.me.uk>
Message-ID: <C71F44BE-CB80-4DED-9F18-8CE80836CFD6@uiuc.edu>

I haven't tried it out yet, to tell the truth.  The worry I have is  
prompting during the build process for database tests, networking, etc.

I have looked for it, but couldn't determine whether we have a way to  
run 'perl Build.PL' and bypass prompts with passed arguments.  The  
only one I could find was 'network', for network tests.

Scott Cain and I have corresponded about this before, i.e. it would be  
nice to have boolean flags for each prompt (prereqs, database tests,  
scripts, network, etc).  For nightly PPMs I would forgo tests and  
include scripts.

chris

On Mar 7, 2008, at 4:32 AM, Sendu Bala wrote:

> Chris Fields wrote:
>> I would like to get automated PPM builds set up as well but I think  
>> we have to rework some Build.PL stuff to get that going.
>
> What's the hold-up on that front?
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From cjfields at uiuc.edu  Fri Mar  7 08:22:27 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 7 Mar 2008 07:22:27 -0600
Subject: [Bioperl-l] BioSQL V1.0.0 released
In-Reply-To: <200803071309.25294.heikki@sanbi.ac.za>
References: <200803071309.25294.heikki@sanbi.ac.za>
Message-ID: <7558F8C6-FE40-4BAE-BA6A-D5039B10F350@uiuc.edu>

Same here.  Great news!

chris

On Mar 7, 2008, at 5:09 AM, Heikki Lehvaslaiho wrote:

> BIOSQL V1.0.0 RELEASED
> http://news.open-bio.org/archives/2008_03.html#000094
>
>
> Congratulations, Hilmar!
>
> 		-Heikki
>
> -- 
> ______ _/      _/_____________________________________________________
>      _/      _/
>     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
>    _/_/_/_/_/  Senior Scientist    skype: heikki_lehvaslaiho
>   _/  _/  _/  SANBI, South African National Bioinformatics Institute
>  _/  _/  _/  University of Western Cape, South Africa
>     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
> ___ _/_/_/_/_/________________________________________________________
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From bix at sendu.me.uk  Fri Mar  7 09:10:08 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 07 Mar 2008 14:10:08 +0000
Subject: [Bioperl-l] Nightly build archives now available
In-Reply-To: <C71F44BE-CB80-4DED-9F18-8CE80836CFD6@uiuc.edu>
References: <BAB6978A-D50E-4C4A-9FC9-EAAD557B935F@uiuc.edu>	<628aabb70803061550s24d7d8cfhf80495ea970a6c19@mail.gmail.com>
	<5A67E3A9-9997-4A6B-AB07-8403D5FF388E@uiuc.edu>
	<47D119A1.10408@sendu.me.uk>
	<C71F44BE-CB80-4DED-9F18-8CE80836CFD6@uiuc.edu>
Message-ID: <47D14CC0.8000104@sendu.me.uk>

Chris Fields wrote:
> I haven't tried it out yet, to tell the truth.  The worry I have is 
> prompting during the build process for database tests, networking, etc.
> 
> I have looked for it, but couldn't determine whether we have a way to 
> run 'perl Build.PL' and bypass prompts with passed arguments.  The only 
> one I could find was 'network', for network tests.
> 
> Scott Cain and I have corresponded about this before, i.e. it would be 
> nice to have boolean flags for each prompt (prereqs, database tests, 
> scripts, network, etc).  For nightly PPMs I would forego tests and 
> include scripts.

I don't quite understand how you're making the nightlies right now, but 
you should be using the dist actions:

http://www.bioperl.org/wiki/Making_a_BioPerl_release

I.e.

One time (and one time only):
perl Build.PL (it doesn't matter how you answer the questions)

Then every night:
./Build dist
./Build ppmdist

You then upload the resulting .tar.gz and .zip files.


Only if Build.PL or ModuleBuildBioperl are updated might you need to:
./Build realclean
perl Build.PL
again. But this should be a rare event and even more rarely would it be 
/required/ (probably never).


From bix at sendu.me.uk  Fri Mar  7 09:19:36 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 07 Mar 2008 14:19:36 +0000
Subject: [Bioperl-l] Nightly build archives now available
In-Reply-To: <47D14CC0.8000104@sendu.me.uk>
References: <BAB6978A-D50E-4C4A-9FC9-EAAD557B935F@uiuc.edu>	<628aabb70803061550s24d7d8cfhf80495ea970a6c19@mail.gmail.com>
	<5A67E3A9-9997-4A6B-AB07-8403D5FF388E@uiuc.edu>
	<47D119A1.10408@sendu.me.uk>
	<C71F44BE-CB80-4DED-9F18-8CE80836CFD6@uiuc.edu>
	<47D14CC0.8000104@sendu.me.uk>
Message-ID: <47D14EF8.5090107@sendu.me.uk>

Sendu Bala wrote:
> Chris Fields wrote:
>> I haven't tried it out yet, to tell the truth.  The worry I have is 
>> prompting during the build process for database tests, networking, etc.
>>
>> I have looked for it, but couldn't determine whether we have a way to 
>> run 'perl Build.PL' and bypass prompts with passed arguments.  The 
>> only one I could find was 'network', for network tests.
>>
>> Scott Cain and I have corresponded about this before, i.e. it would be 
>> nice to have boolean flags for each prompt (prereqs, database tests, 
>> scripts, network, etc).  For nightly PPMs I would forego tests and 
>> include scripts.
> 
> I don't quite understand how you're making the nightlies right now, but 
> you should be using the dist actions:
> 
> http://www.bioperl.org/wiki/Making_a_BioPerl_release
> 
> Ie.
> 
> One time (and one time only):
> perl Build.PL (it doesn't matter how you answer the questions)
> 
> Then every night:
> ./Build dist
> ./Build ppmdist
> 
> You then upload the resulting .tar.gz and .zip files.

Ah, having uploaded the various archives you'll have to manually delete 
them before running the dist action the next night, otherwise dist will 
ask you if you want to overwrite them. Apart from that, dist asks no 
questions.
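
The nightly routine described above could be scripted roughly as
follows. This is only a sketch: the function names and the checkout path
are made up for illustration, and it assumes 'perl Build.PL' has already
been run once by hand in that checkout.

```shell
#!/bin/sh
# Sketch of a nightly job; names and paths are hypothetical placeholders.

# Remove archives left over from the previous night so that
# './Build dist' has nothing to ask about overwriting.
remove_old_archives() {
    dir=$1
    rm -f "$dir"/*.tar.gz "$dir"/*.zip
}

# One night's run: clean up, update the working copy, build the archives.
# Assumes 'perl Build.PL' was already run once in "$1".
nightly_dist() {
    dir=$1
    remove_old_archives "$dir" &&
    ( cd "$dir" && svn update && ./Build dist && ./Build ppmdist )
}

# Example call from cron (hypothetical path):
# nightly_dist /home/build/bioperl-live
```

Uploading the resulting archives is left out, since how that is done
depends on the server setup.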


From cjfields at uiuc.edu  Fri Mar  7 09:28:36 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 7 Mar 2008 08:28:36 -0600
Subject: [Bioperl-l] Nightly build archives now available
In-Reply-To: <47D14CC0.8000104@sendu.me.uk>
References: <BAB6978A-D50E-4C4A-9FC9-EAAD557B935F@uiuc.edu>	<628aabb70803061550s24d7d8cfhf80495ea970a6c19@mail.gmail.com>
	<5A67E3A9-9997-4A6B-AB07-8403D5FF388E@uiuc.edu>
	<47D119A1.10408@sendu.me.uk>
	<C71F44BE-CB80-4DED-9F18-8CE80836CFD6@uiuc.edu>
	<47D14CC0.8000104@sendu.me.uk>
Message-ID: <FFE7B6A1-FF8C-4C84-994B-6D9AAD46748E@uiuc.edu>


On Mar 7, 2008, at 8:10 AM, Sendu Bala wrote:

> Chris Fields wrote:
>> I haven't tried it out yet, to tell the truth.  The worry I have is  
>> prompting during the build process for database tests, networking,  
>> etc.
>> I have looked for it, but couldn't determine whether we have a way  
>> to run 'perl Build.PL' and bypass prompts with passed arguments.   
>> The only one I could find was 'network', for network tests.
>> Scott Cain and I have corresponded about this before, i.e. it would  
>> be nice to have boolean flags for each prompt (prereqs, database  
>> tests, scripts, network, etc).  For nightly PPMs I would forego  
>> tests and include scripts.
>
> I don't quite understand how you're making the nightlies right now,  
> but you should be using the dist actions:
>
> http://www.bioperl.org/wiki/Making_a_BioPerl_release
>
> Ie.
>
> One time (and one time only):
> perl Build.PL (it doesn't matter how you answer the questions)
>
> Then every night:
> ./Build dist
> ./Build ppmdist
>
> You then upload the resulting .tar.gz and .zip files.
>
>
> Only if Build.PL or ModuleBuildBioperl are updated might you need to:
> ./Build realclean
> perl Build.PL
> again. But this should be a rare event and even more rarely would it  
> be /required/ (probably never).

I'm not making a distribution; the archives are merely cleaned up svn  
checkouts (no .svn directories).  This is essentially what the  
net_install script would get when installing GBrowse using the 'dev'  
option, except you don't need to install Subversion to get updates.   
Also, at this point we don't have an analogous 'Download tarball'  
setting for browsable svn either, so this is a suitable alternative.
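
For illustration, a cleaned-up archive of this kind could be produced
along the following lines. This is a sketch and not the actual cron
script; the function name and paths are hypothetical, and it assumes a
tar that supports --exclude (GNU tar and bsdtar both do).

```shell
#!/bin/sh
# Pack a working copy into a .tar.gz, leaving out Subversion metadata.
# Arguments: parent directory, checkout directory name, output archive.
archive_checkout() {
    tar --exclude=.svn -czf "$3" -C "$1" "$2"
}

# Example (hypothetical paths):
# archive_checkout /tmp/nightly bioperl-live /tmp/nightly/bioperl-live.tar.gz
```

An alternative with the same effect is 'svn export', which writes a tree
without .svn directories in the first place.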

Again, I don't want to deal with prompts while running a cron job  
(this is a bash script), particularly if I can't guarantee the number  
of prompts or the prompting order won't change down the line.  If we  
can set up a way around that using passed args to Build.PL then it  
would make life much easier and we could automate 'Build dist', 'Build  
ppmdist', 'Build testcover', etc.

chris


From bix at sendu.me.uk  Fri Mar  7 09:54:41 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 07 Mar 2008 14:54:41 +0000
Subject: [Bioperl-l] Nightly build archives now available
In-Reply-To: <FFE7B6A1-FF8C-4C84-994B-6D9AAD46748E@uiuc.edu>
References: <BAB6978A-D50E-4C4A-9FC9-EAAD557B935F@uiuc.edu>	<628aabb70803061550s24d7d8cfhf80495ea970a6c19@mail.gmail.com>
	<5A67E3A9-9997-4A6B-AB07-8403D5FF388E@uiuc.edu>
	<47D119A1.10408@sendu.me.uk>
	<C71F44BE-CB80-4DED-9F18-8CE80836CFD6@uiuc.edu>
	<47D14CC0.8000104@sendu.me.uk>
	<FFE7B6A1-FF8C-4C84-994B-6D9AAD46748E@uiuc.edu>
Message-ID: <47D15731.2050000@sendu.me.uk>

Chris Fields wrote:
> On Mar 7, 2008, at 8:10 AM, Sendu Bala wrote:
>> One time (and one time only):
>> perl Build.PL (it doesn't matter how you answer the questions)
>>
>> Then every night:
>> ./Build dist
>> ./Build ppmdist
>>
>> You then upload the resulting .tar.gz and .zip files.
>>
>>
>> Only if Build.PL or ModuleBuildBioperl are updated might you need to:
>> ./Build realclean
>> perl Build.PL
>> again. But this should be a rare event and even more rarely would it 
>> be /required/ (probably never).
> 
> I'm not making a distribution; the archives are merely cleaned up svn 
> checkouts (no .svn directories).  This is essentially what the 
> net_install script would get when installing GBrowse using the 'dev' 
> option, except you don't need to install Subversion to get updates.  
> Also, at this point we don't have an analogous 'Download tarball' 
> setting for browsable svn either, so this is a suitable alternative.

The dist action does what you want. I did a diff on the most recent 
nightly build and the .tar.gz produced by the dist action of a checkout 
of revision 14603:

$ diff -r bioperl-1.5.2_100 bioperl-live
diff -r bioperl-1.5.2_100/Bio/SeqIO/chaos.pm bioperl-live/Bio/SeqIO/chaos.pm
2c2
< # $Date: 2007-06-14 15:16:21 +0100 (Thu, 14 Jun 2007) $
---
 > # $Date: 2007-06-14 10:16:21 -0400 (Thu, 14 Jun 2007) $
Only in bioperl-live/Bio/Tools: WebBlat.pm
Only in bioperl-live: CHANGELOG
Only in bioperl-1.5.2_100: MANIFEST
Only in bioperl-1.5.2_100: META.yml
diff -r bioperl-1.5.2_100/Makefile.PL bioperl-live/Makefile.PL
1,31c1,30
< # Note: this file was auto-generated by Module::Build::Compat version 0.03
[snip]
---
 > #!/usr/bin/perl -w
 >
 > # This is a stub that simply tells you to use Build.PL instead
[snip]
Only in bioperl-live: bioperl.lisp
diff -r bioperl-1.5.2_100/maintenance/cvs2cl_by_file.pl bioperl-live/maintenance/cvs2cl_by_file.pl
29c29
< ## $Date: 2006-11-30 15:57:16 +0000 (Thu, 30 Nov 2006) $
---
 > ## $Date: 2006-11-30 10:57:16 -0500 (Thu, 30 Nov 2006) $

I don't know what's going on with the date differences, but for a file 
found in a folder called '/DIST/nightly_builds/', you want the MANIFEST 
and META.yml files. You also want the Compat version of Makefile.PL since 
we haven't yet moved to forcing people to use Build.PL.

'./Build dist' does the right thing.


> Again, I don't want to deal with prompts while running a cron job (this 
> is a bash script), particularly if I can't guarantee the number of 
> prompts or the prompting order won't change down the line.  If we can 
> set up a way around that using passed args to Build.PL then it would 
> make life much easier and we could automate 'Build dist', 'Build 
> ppmdist', 'Build testcover', etc.

Again, you only need to run 'perl Build.PL' once and answer the 
questions only once. Then you can svn update and run the actions with no 
more questions to answer. This isn't a problem that needs to be solved. 
It is /supposed/ to be this way. It's ready to use! Please make use of 
it; it's one of the (many) reasons I moved Bioperl over to Build.PL in 
the first place.


From cjfields at uiuc.edu  Fri Mar  7 10:29:11 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 7 Mar 2008 09:29:11 -0600
Subject: [Bioperl-l] Nightly build archives now available
In-Reply-To: <47D15731.2050000@sendu.me.uk>
References: <BAB6978A-D50E-4C4A-9FC9-EAAD557B935F@uiuc.edu>	<628aabb70803061550s24d7d8cfhf80495ea970a6c19@mail.gmail.com>
	<5A67E3A9-9997-4A6B-AB07-8403D5FF388E@uiuc.edu>
	<47D119A1.10408@sendu.me.uk>
	<C71F44BE-CB80-4DED-9F18-8CE80836CFD6@uiuc.edu>
	<47D14CC0.8000104@sendu.me.uk>
	<FFE7B6A1-FF8C-4C84-994B-6D9AAD46748E@uiuc.edu>
	<47D15731.2050000@sendu.me.uk>
Message-ID: <7AFB30F0-4810-4B62-B64A-C712EC4A3872@uiuc.edu>


On Mar 7, 2008, at 8:54 AM, Sendu Bala wrote:

> Chris Fields wrote:
>> On Mar 7, 2008, at 8:10 AM, Sendu Bala wrote:
>>> One time (and one time only):
>>> perl Build.PL (it doesn't matter how you answer the questions)
>>>
>>> Then every night:
>>> ./Build dist
>>> ./Build ppmdist
>>>
>>> You then upload the resulting .tar.gz and .zip files.
>>>
>>>
>>> Only if Build.PL or ModuleBuildBioperl are updated might you need  
>>> to:
>>> ./Build realclean
>>> perl Build.PL
>>> again. But this should be a rare event and even more rarely would  
>>> it be /required/ (probably never).
>> I'm not making a distribution; the archives are merely cleaned up  
>> svn checkouts (no .svn directories).  This is essentially what the  
>> net_install script would get when installing GBrowse using the  
>> 'dev' option, except you don't need to install Subversion to get  
>> updates.  Also, at this point we don't have an analogous 'Download  
>> tarball' setting for browsable svn either, so this is a suitable  
>> alternative.
>
> The dist action does what you want. I did a diff on the most recent  
> nightly build and the .tar.gz produced by the dist action of a  
> checkout of revision 14603:
>
> $ diff -r bioperl-1.5.2_100 bioperl-live
> diff -r bioperl-1.5.2_100/Bio/SeqIO/chaos.pm bioperl-live/Bio/SeqIO/ 
> chaos.pm
> 2c2
> < # $Date: 2007-06-14 15:16:21 +0100 (Thu, 14 Jun 2007) $
> ---
> > # $Date: 2007-06-14 10:16:21 -0400 (Thu, 14 Jun 2007) $
> Only in bioperl-live/Bio/Tools: WebBlat.pm
> Only in bioperl-live: CHANGELOG
> Only in bioperl-1.5.2_100: MANIFEST
> Only in bioperl-1.5.2_100: META.yml
> diff -r bioperl-1.5.2_100/Makefile.PL bioperl-live/Makefile.PL
> 1,31c1,30
> < # Note: this file was auto-generated by Module::Build::Compat  
> version 0.03
> [snip]
> ---
> > #!/usr/bin/perl -w
> >
> > # This is a stub that simply tells you to use Build.PL instead
> [snip]
> Only in bioperl-live: bioperl.lisp
> diff -r bioperl-1.5.2_100/maintenance/cvs2cl_by_file.pl bioperl-live/ 
> maintenance/cvs2cl_by_file.pl
> 29c29
> < ## $Date: 2006-11-30 15:57:16 +0000 (Thu, 30 Nov 2006) $
> ---
> > ## $Date: 2006-11-30 10:57:16 -0500 (Thu, 30 Nov 2006) $
>
> I don't know what's going on with the date differences, but for a  
> file found in a folder called '/DIST/nightly_builds/', you want the  
> MANIFEST and META.yml files. You also want the Compat version of  
> Makefile.PL since we haven't yet moved to forcing people to use Build.PL.
>
> './Build dist' does the right thing.
>
>
>> Again, I don't want to deal with prompts while running a cron job  
>> (this is a bash script), particularly if I can't guarantee the  
>> number of prompts or the prompting order won't change down the  
>> line.  If we can set up a way around that using passed args to  
>> Build.PL then it would make life much easier and we could automate  
>> 'Build dist', 'Build ppmdist', 'Build testcover', etc.
>
> Again, you only need to run 'perl Build.PL' once and answer the  
> questions only once. Then you can svn update and run the actions  
> with no more questions to answer. This isn't a problem that needs to  
> be solved. It is /supposed/ to be this way. It's ready to use!  
> Please make use of it; it's one of the (many) reasons I moved  
> Bioperl over to Build.PL in the first place.

Then set it up the way you want.  I give up.

chris


From bix at sendu.me.uk  Fri Mar  7 10:43:44 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 07 Mar 2008 15:43:44 +0000
Subject: [Bioperl-l] Nightly build archives now available
In-Reply-To: <7AFB30F0-4810-4B62-B64A-C712EC4A3872@uiuc.edu>
References: <BAB6978A-D50E-4C4A-9FC9-EAAD557B935F@uiuc.edu>	<628aabb70803061550s24d7d8cfhf80495ea970a6c19@mail.gmail.com>
	<5A67E3A9-9997-4A6B-AB07-8403D5FF388E@uiuc.edu>
	<47D119A1.10408@sendu.me.uk>
	<C71F44BE-CB80-4DED-9F18-8CE80836CFD6@uiuc.edu>
	<47D14CC0.8000104@sendu.me.uk>
	<FFE7B6A1-FF8C-4C84-994B-6D9AAD46748E@uiuc.edu>
	<47D15731.2050000@sendu.me.uk>
	<7AFB30F0-4810-4B62-B64A-C712EC4A3872@uiuc.edu>
Message-ID: <47D162B0.5070402@sendu.me.uk>

Chris Fields wrote:
> 
> On Mar 7, 2008, at 8:54 AM, Sendu Bala wrote:
>
>> Again, you only need to run 'perl Build.PL' once and answer the 
>> questions only once. Then you can svn update and run the actions with 
>> no more questions to answer. This isn't a problem that needs to be 
>> solved. It is /supposed/ to be this way. It's ready to use! Please 
>> make use of it; it's one of the (many) reasons I moved Bioperl over to 
>> Build.PL in the first place.
> 
> Then set it up the way you want.  I give up.

I really don't understand that response. I have merely informed you how 
Build.PL and the actions work, since you didn't know. I have informed 
you it already does what you want in terms of automation; there's 
nothing to wait for, no more work to do. I have requested you use it, 
since there is little value in duplicating code and effort.

Now that you have the information, you can make an informed choice as to 
how to proceed, based on your needs. If you have good reasons for 
sticking with your current nightly build process, by all means stick 
with them.

Mainly I just wanted to make clear (as a general point for anyone 
interested) that the questions asked by Build.PL aren't an issue or 
obstacle in terms of automating builds or tests.


From cjfields at uiuc.edu  Fri Mar  7 11:15:31 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 7 Mar 2008 10:15:31 -0600
Subject: [Bioperl-l] Nightly build archives now available
In-Reply-To: <47D162B0.5070402@sendu.me.uk>
References: <BAB6978A-D50E-4C4A-9FC9-EAAD557B935F@uiuc.edu>	<628aabb70803061550s24d7d8cfhf80495ea970a6c19@mail.gmail.com>
	<5A67E3A9-9997-4A6B-AB07-8403D5FF388E@uiuc.edu>
	<47D119A1.10408@sendu.me.uk>
	<C71F44BE-CB80-4DED-9F18-8CE80836CFD6@uiuc.edu>
	<47D14CC0.8000104@sendu.me.uk>
	<FFE7B6A1-FF8C-4C84-994B-6D9AAD46748E@uiuc.edu>
	<47D15731.2050000@sendu.me.uk>
	<7AFB30F0-4810-4B62-B64A-C712EC4A3872@uiuc.edu>
	<47D162B0.5070402@sendu.me.uk>
Message-ID: <AAF0B249-AD91-475A-8A30-4BB3A2CCF951@uiuc.edu>

On Mar 7, 2008, at 9:43 AM, Sendu Bala wrote:

> Chris Fields wrote:
>> On Mar 7, 2008, at 8:54 AM, Sendu Bala wrote:
>>
>>> Again, you only need to run 'perl Build.PL' once and answer the  
>>> questions only once. Then you can svn update and run the actions  
>>> with no more questions to answer. This isn't a problem that needs  
>>> to be solved. It is /supposed/ to be this way. It's ready to use!  
>>> Please make use of it; it's one of the (many) reasons I moved  
>>> Bioperl over to Build.PL in the first place.
>> Then set it up the way you want.  I give up.
>
> I really don't understand that response. I have merely informed you  
> how Build.PL and the actions work, since you didn't know. I have  
> informed you it already does what you want in terms of automation;  
> there's nothing to wait for, no more work to do. I have requested  
> you use it, since there is little value in duplicating code and  
> effort.
>
> Now that you have the information, you can make an informed choice  
> as to how to proceed, based on your needs. If you have good reasons  
> for sticking with your current nightly build process, by all means  
> stick with them.
>
> Mainly I just wanted to make clear (as a general point for anyone  
> interested) that the questions asked by Build.PL aren't an issue or  
> obstacle in terms of automating builds or tests.

It doesn't come across that way; it comes off as pretty  
condescending.  And please don't assume I lack experience with how  
Module::Build works (I have used 'Build ppmdist' and 'Build testcover'  
quite a few times recently, and the next item on my agenda is to fix  
the various issues with Build.PL and database checking, which you  
already know).

So my response is pretty simple; if you feel the need to use  
'Build.PL' to make nightlies, then by all means set it up.  I find it  
much harder to work with the current Build process in an automated way  
using a bash script, so I work around it.  If it makes you happier we  
can switch the directory over to 'nightly_checkouts', but I think  
that's just mincing semantics.

Okay, it's pretty obvious we're not on the same page here.  I'll go  
through it carefully so you understand the problem:

1)  I am running a 'svn co' on anon. svn for the various distros to a  
temp directory.  This is done using a bash script.  If I attempt to  
change into the distribution directory and run 'perl Build.PL' from  
the bash script, I immediately run into permissions issues and several  
odd things:

Checking prerequisites...
  - ERROR: Bio::Root::Version is not installed
(I think you ran Build.PL directly, so will use CPAN to install  
prerequisites on demand)
CPAN: Storable loaded ok
Going to read /root/.cpan/Metadata
   Database was generated on Tue, 05 Feb 2008 11:30:54 GMT
Warning: You are not allowed to write into directory "/root/.cpan/ 
sources/authors".
     I'll continue, but if you encounter problems, they may be due
     to insufficient permissions.
CPAN: LWP::UserAgent loaded ok
Fetching with LWP:
   ftp://mirror.hiwaay.net/CPAN/authors/01mailrc.txt.gz
LWP failed with code[500] message[Cannot write to '/root/.cpan/sources/ 
authors/01mailrc.txt.gz-8678': Permission denied]
Fetching with Net::FTP:
   ftp://mirror.hiwaay.net/CPAN/authors/01mailrc.txt.gz
Cannot open Local file /root/.cpan/sources/authors/01mailrc.txt.gz:  
Permission denied
....

2) I suspect, even if I worked around permissions and set up the job  
as root or admin and worked out why it can't find  
'Bio::Root::Version' (?!?), this would still be a terrific pain in the  
*** to deal with as the Build.PL process is expecting answers for each  
and every prompt, and the process differs for each distribution.  Yes,  
I could set something up to deal with that in the script.  No, I  
will not do that as any additions or changes to prompts could break/ 
hang the script or (worse) silently change what the archive contains.   
Hence my indication that passing flags to 'perl Build.PL' would be a  
nice way to work around that.  For that I haven't heard a response, so  
I assume that functionality isn't there (or am I assuming incorrectly?).

So, from where I stand, even if using Build.PL is the /proper/ way to  
do it, it doesn't work as expected using an automated process (i.e.  
cron).  Make sense?

chris


From matthewehodges at gmail.com  Fri Mar  7 11:16:47 2008
From: matthewehodges at gmail.com (Matt)
Date: Fri, 7 Mar 2008 16:16:47 +0000 (UTC)
Subject: [Bioperl-l] Reciprocal blast
Message-ID: <loom.20080307T160014-878@post.gmane.org>

Dear experts,

I want to do a best reciprocal blastp of a fasta protein dataset against the 
protein models of various species, also in fasta format. The aim is to have an 
output showing presence/absence. I think this is possible to do using 
Perl, but I'm very much a beginner, so any help with this would be greatly 
appreciated.
Thanks
Matt


From bix at sendu.me.uk  Fri Mar  7 12:34:17 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 07 Mar 2008 17:34:17 +0000
Subject: [Bioperl-l] Nightly build archives now available
In-Reply-To: <AAF0B249-AD91-475A-8A30-4BB3A2CCF951@uiuc.edu>
References: <BAB6978A-D50E-4C4A-9FC9-EAAD557B935F@uiuc.edu>	<628aabb70803061550s24d7d8cfhf80495ea970a6c19@mail.gmail.com>
	<5A67E3A9-9997-4A6B-AB07-8403D5FF388E@uiuc.edu>
	<47D119A1.10408@sendu.me.uk>
	<C71F44BE-CB80-4DED-9F18-8CE80836CFD6@uiuc.edu>
	<47D14CC0.8000104@sendu.me.uk>
	<FFE7B6A1-FF8C-4C84-994B-6D9AAD46748E@uiuc.edu>
	<47D15731.2050000@sendu.me.uk>
	<7AFB30F0-4810-4B62-B64A-C712EC4A3872@uiuc.edu>
	<47D162B0.5070402@sendu.me.uk>
	<AAF0B249-AD91-475A-8A30-4BB3A2CCF951@uiuc.edu>
Message-ID: <47D17C99.9050009@sendu.me.uk>

Chris Fields wrote:
> 1)  I am running a 'svn co' on anon. svn for the various distros to a 
> temp directory.

Is it important that you do a fresh co every night? Why not do a co once 
and then do a 'svn update' every night? This is the crux of the 
problem: if you choose to simply update, then you only have to get 
'perl Build.PL' to work once.


> If I attempt to change into the distribution directory and run 'perl Build.PL' from the 
> bash script, I immediately run into permissions issues and several odd 
> things:
> 
> Checking prerequisites...
>  - ERROR: Bio::Root::Version is not installed
> (I think you ran Build.PL directly, so will use CPAN to install 
> prerequisites on demand)
> CPAN: Storable loaded ok
> Going to read /root/.cpan/Metadata
>   Database was generated on Tue, 05 Feb 2008 11:30:54 GMT
> Warning: You are not allowed to write into directory 
> "/root/.cpan/sources/authors".
[snip]

I'm assuming this is on portal? The CPAN setup for users is a little 
broken. You need to create /home/yourusername/.cpan/CPAN/MyConfig.pm

$CPAN::Config->{cpan_home} = "/home/yourusername/.cpan/"

Then you can run and configure cpan correctly and install Bundle::CPAN. 
Some of the zlib stuff failed to install for me, but that doesn't seem 
to matter.

Of course, I guess it makes sense for root to just install all of 
Bioperl's prereqs anyway, so that testing can be automated in the future.

Anyway, once you have cpan happy 'perl Build.PL' will run fine. Answer 
'n' to everything and then your cron job just has to call './Build dist'.


> 2) I suspect, even if I worked around permissions and set up the job as 
> root or admin and worked out why it can't find 'Bio::Root::Version' 
> (?!?), this would still be a terrific pain in the *** to deal with as 
> the Build.PL process is expecting answers for each and every prompt, and 
> the process differs for each distribution.

You won't be running Build.PL in the cron job.


> passing flags to 'perl Build.PL' would be a nice way to work around 
> that.  For that I haven't heard a response, so I assume that 
> functionality isn't there (or am I assuming incorrectly?).

It isn't AFAIK, but my point is that it doesn't need to be (for this 
particular use-case at least).


> So, from where I stand, even if using Build.PL is the /proper/ way to do 
> it, it doesn't work as expected using an automated process (i.e. cron).  
> Make sense?

Only if you can't run 'svn update' instead of 'svn co' each night.


From cjfields at uiuc.edu  Fri Mar  7 13:00:52 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 7 Mar 2008 12:00:52 -0600
Subject: [Bioperl-l] Nightly build archives now available
In-Reply-To: <47D17C99.9050009@sendu.me.uk>
References: <BAB6978A-D50E-4C4A-9FC9-EAAD557B935F@uiuc.edu>	<628aabb70803061550s24d7d8cfhf80495ea970a6c19@mail.gmail.com>
	<5A67E3A9-9997-4A6B-AB07-8403D5FF388E@uiuc.edu>
	<47D119A1.10408@sendu.me.uk>
	<C71F44BE-CB80-4DED-9F18-8CE80836CFD6@uiuc.edu>
	<47D14CC0.8000104@sendu.me.uk>
	<FFE7B6A1-FF8C-4C84-994B-6D9AAD46748E@uiuc.edu>
	<47D15731.2050000@sendu.me.uk>
	<7AFB30F0-4810-4B62-B64A-C712EC4A3872@uiuc.edu>
	<47D162B0.5070402@sendu.me.uk>
	<AAF0B249-AD91-475A-8A30-4BB3A2CCF951@uiuc.edu>
	<47D17C99.9050009@sendu.me.uk>
Message-ID: <BC3ACFAC-F203-4C12-BA1F-6AA762F31BF0@uiuc.edu>


On Mar 7, 2008, at 11:34 AM, Sendu Bala wrote:

> Chris Fields wrote:
>> 1)  I am running a 'svn co' on anon. svn for the various distros to  
>> a temp directory.
>
> Is it important that you do a fresh co every night? Why not do a co  
> once   and then do a 'svn update' every night? This is the crux of  
> the problems: if you choose to simply update, then you only have to  
> get 'perl Build.PL' to work once.

Unless you update Build.PL (which will happen as the distributions  
grow).  Then you need to rerun 'perl Build.PL'.  It seems safer to run  
that each time with a 'pass-through' flag for automated builds.
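
One way to cover that case while keeping a single checkout, sketched
under assumptions: record checksums of the build configuration files
after each 'svn update' and redo the configure step only when they
change. The function name, the stamp file and the location of
ModuleBuildBioperl.pm are illustrative guesses, not part of any existing
script.

```shell
#!/bin/sh
# Succeeds (exit 0) when the watched files differ from the checksums
# recorded in the stamp file, and records the new checksums either way.
# Arguments: stamp file, then the files to watch.
config_changed() {
    stamp=$1; shift
    new=$(cksum "$@" 2>/dev/null)
    old=$(cat "$stamp" 2>/dev/null)
    printf '%s\n' "$new" > "$stamp"
    [ "$new" != "$old" ]
}

# Example, run inside the checkout after 'svn update' (hypothetical names):
# if config_changed .build_stamp Build.PL ModuleBuildBioperl.pm; then
#     ./Build realclean
#     # re-run the configure step here; it may prompt
# fi
```

The first run always reports a change, because no stamp file exists yet.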

>> If I attempt to change into the distribution directory and run  
>> 'perl Build.PL' from the bash script, I immediately run into  
>> permissions issues and several odd things:
>> Checking prerequisites...
>> - ERROR: Bio::Root::Version is not installed
>> (I think you ran Build.PL directly, so will use CPAN to install  
>> prerequisites on demand)
>> CPAN: Storable loaded ok
>> Going to read /root/.cpan/Metadata
>>  Database was generated on Tue, 05 Feb 2008 11:30:54 GMT
>> Warning: You are not allowed to write into directory "/root/.cpan/ 
>> sources/authors".
> [snip]
>
> I'm assuming this is on portal? The CPAN setup for users is a little  
> broken. You need to create /home/yourusername/.cpan/CPAN/MyConfig.pm
>
> $CPAN::Config->{cpan_home} = "/home/yourusername/.cpan/"
>
> Then you can run and configure cpan correctly and install  
> Bundle::CPAN. Some of the zlib stuff failed to install for me, but  
> that doesn't seem to matter.
>
> Of course, I guess it makes sense for root to just install all of  
> Bioperl's prereqs anyway, so that testing can be automated in the  
> future.
>
> Anyway, once you have cpan happy 'perl Build.PL' will run fine.  
> Answer 'n' to everything and then your cron job just has to call './ 
> Build dist'.

I agree about setting up the prereqs.  I could also (as mentioned  
before) set this up as root.  However, if we go this route we need to  
have 'perl Build.PL' included in the process in order to ensure a  
clean build process each time and to prevent the script from breaking  
whenever someone decides to change Build.PL.

>> 2) I suspect, even if I worked around permissions and set up the  
>> job as root or admin and worked out why it can't find  
>> 'Bio::Root::Version' (?!?), this would still be a terrific pain in  
>> the *** to deal with as the Build.PL process is expecting answers  
>> for each and every prompt, and the process differs for each  
>> distribution.
>
> You won't be running Build.PL in the cron job.

See above.  I don't want to set up something automated which can't be  
maintained in the long term.

>> passing flags to 'perl Build.PL' would be a nice way to work around  
>> that.  For that I haven't heard a response, so I assume that  
>> functionality isn't there (or am I assuming incorrectly?).
>
> It isn't AFAIK, but my point is that it doesn't need to be (for this  
> particular use-case at least).

See above.  There are very good reasons to allow this (and the  
functionality has been requested before, particularly from the GMOD  
crowd).  If I can pass in a single flag (for instance, --defaults,  
which just uses the default arg for each prompt) then it would make  
it /much/ easier.
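
Until such a flag exists, a possible stopgap (an assumption to verify,
not a tested recipe for BioPerl): Module::Build documents that its stock
prompt() and y_n() methods fall back to the default answer when
PERL_MM_USE_DEFAULT is set, or when the session is not interactive and
STDIN is empty. Whether every BioPerl prompt goes through those methods,
and whether each one supplies a default, has not been checked here. The
wrapper name below is made up.

```shell
#!/bin/sh
# Run a command unattended: defaults requested via the environment and
# no terminal input available, so a stray prompt cannot wait for a reply.
run_unattended() {
    PERL_MM_USE_DEFAULT=1 "$@" < /dev/null
}

# Example, inside a checkout:
# run_unattended perl Build.PL
```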

>> So, from where I stand, even if using Build.PL is the /proper/ way  
>> to do it, it doesn't work as expected using an automated process  
>> (i.e. cron).  Make sense?
>
> Only if you can't run 'svn update' instead of 'svn co' each night.

I think a single co with updates is feasible (I can do that with the  
current setup; just run the initial co, copy the directory over to a  
temp copy, then go about my business).

I'll leave the nightly build setup as is for now and work on getting  
Build.PL working (something we need anyway for Devel::Cover and  
Pod::Coverage work).

chris


From David.Messina at sbc.su.se  Fri Mar  7 13:14:38 2008
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 7 Mar 2008 19:14:38 +0100
Subject: [Bioperl-l] Reciprocal blast
In-Reply-To: <loom.20080307T160014-878@post.gmane.org>
References: <loom.20080307T160014-878@post.gmane.org>
Message-ID: <628aabb70803071014y6e08a4f9va083ebc8a33439cb@mail.gmail.com>

Hey Matt,

Your question is a little beyond the scope of this mailing list. I don't
know what your bioinformatics background is, but in my experience it's best
to get started hands-on, either in a class or with someone you can sit down
and work through it with. You'll have a million questions, and a mailing
list isn't really suitable for that.

That being said, I would run the blasts on the command-line, parse out the
best hits with BioPerl, and then use hashes to identify mutual best hits.

Briefly, you have two datasets A & B. Format each dataset into a blast
database using xdformat or formatdb. Run two blasts, one with A as query and
B as database and then one with B as query and A as database. The two output
files, each containing multiple Blast reports, can then be processed with
Bio::SearchIO to extract the best hit for each protein.

Read this tutorial for help with that:
http://www.bioperl.org/wiki/HOWTO:SearchIO

Once you get the best hit for each protein, then you can use Perl to find
every instance where two proteins, one from each set, are each other's best
hit. One way would be to create two hashes, one for each set, with query
proteins as keys and best hits as values, and then step through to find the
reciprocal bests.
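
The last step can be sketched with awk arrays in place of Perl hashes.
This assumes the best hits have already been extracted into two
tab-separated files with one "query, best hit" pair per line; the
function and file names are hypothetical.

```shell
#!/bin/sh
# reciprocal_best A_vs_B.tsv B_vs_A.tsv
# Each input line: query<TAB>best_hit.
# Prints "A_protein<TAB>B_protein" for pairs that are each other's best hit.
reciprocal_best() {
    awk -F '\t' '
        # First file: remember the best B hit for every A query.
        NR == FNR { best_ab[$1] = $2; next }
        # Second file: keep a B query only if its best A hit points back to it.
        ($2 in best_ab) && best_ab[$2] == $1 { print $2 "\t" $1 }
    ' "$1" "$2"
}

# Example (hypothetical file names):
# reciprocal_best A_vs_B.tsv B_vs_A.tsv > reciprocal_best_hits.tsv
```

Proteins from set A that never show up in the output have no reciprocal
best hit in B, which gives the presence/absence call.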


Dave


From jay at jays.net  Fri Mar  7 13:51:35 2008
From: jay at jays.net (Jay Hannah)
Date: Fri, 07 Mar 2008 12:51:35 -0600
Subject: [Bioperl-l] Reciprocal blast
In-Reply-To: <628aabb70803071014y6e08a4f9va083ebc8a33439cb@mail.gmail.com>
References: <loom.20080307T160014-878@post.gmane.org>
	<628aabb70803071014y6e08a4f9va083ebc8a33439cb@mail.gmail.com>
Message-ID: <47D18EB7.3060906@jays.net>

Dave Messina wrote:
> Your question is a little beyond the scope of this mailing list. I don't
> know what your bioinformatics background is, but in my experience it's best
> to get started hands-on, either in a class or with someone you can sit down
> and work through it with. You'll have a million questions, and a mailing
> list isn't really suitable for that.
>
> That being said, I would run the blasts on the command-line, parse out the
> best hits with BioPerl, and then use hashes to identify mutual best hits.
>   

Hi Matt,

If you're a glutton for punishment and want to see a ball of Perl that 
automates and tracks stats across my version of "reciprocal blasts" 
(mine is called cross_blast()), help yourself:

   svn checkout svn://vc.jays.net/seqlab seqlab

I abandoned my maiden voyage into bioinformatics, called "SeqLab," as a 
stand-alone entity when the subsequent thousand tasks I worked turned 
out to be unrelated to all the software I had built so far. My naive 
grand unification vision for all of bioinformatics didn't quite work out 
as I had planned. -laugh-  Nowadays I just cherry-pick solutions out of 
its guts on demand.  :)

I'm happy to field any questions you have about that code, if it helps 
you any.

Cheers,

j
http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah


From jay at jays.net  Fri Mar  7 14:43:03 2008
From: jay at jays.net (Jay Hannah)
Date: Fri, 07 Mar 2008 13:43:03 -0600
Subject: [Bioperl-l] Reciprocal blast
In-Reply-To: <47D18EB7.3060906@jays.net>
References: <loom.20080307T160014-878@post.gmane.org>	<628aabb70803071014y6e08a4f9va083ebc8a33439cb@mail.gmail.com>
	<47D18EB7.3060906@jays.net>
Message-ID: <47D19AC7.1060907@jays.net>

Jay Hannah wrote:
> I'm happy to field any questions you have about that code, if it helps 
> you any.

I created a wiki page since I stopped paying the bill on the 
"seqlab.net" domain:   :)

http://clab.ist.unomaha.edu/CLAB/index.php/SeqLab_%28Perl%29

Cheers,

j


From cain.cshl at gmail.com  Fri Mar  7 15:17:29 2008
From: cain.cshl at gmail.com (Scott Cain)
Date: Fri, 07 Mar 2008 15:17:29 -0500
Subject: [Bioperl-l] Nightly build archives now available
In-Reply-To: <BAB6978A-D50E-4C4A-9FC9-EAAD557B935F@uiuc.edu>
References: <BAB6978A-D50E-4C4A-9FC9-EAAD557B935F@uiuc.edu>
Message-ID: <1204921049.6467.9.camel@frissell>

Hi Chris,

Thanks much for this.  I have one observation though: both the http and
ftp directories are empty except for a log file :-/

Also, I saw that you mentioned the 'accept the defaults' option I asked
about in January.  I did implement that on Build.PL at the exact time
that the transition from cvs to svn was happening, so it never got
committed back.  Hopefully I still have it :-)  I'll look around and
commit it when I find it.

Scott

On Thu, 2008-03-06 at 15:48 -0600, Chris Fields wrote:
> We now have nightly bundled archives for bioperl-live, bioperl-db,  
> bioperl-run, and bioperl-network running; these will be updated ~ 1:00  
> am every night.
> 
> http://www.bioperl.org/DIST/nightly_builds/
> ftp://ftp.open-bio.org/pub/bioperl/DIST/nightly_builds
> 
> The archives are date-stamped and also have the Subversion revision,  
> just in case one wanted to ensure they get the correct version for the  
> bug fix.  They also contain a CHANGELOG file for the last 10 revisions  
> (if there are any).  These are currently derived off the anon. svn  
> repository.
> 
> chris
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory


From cjfields at uiuc.edu  Fri Mar  7 15:25:01 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 7 Mar 2008 14:25:01 -0600
Subject: [Bioperl-l] Nightly build archives now available
In-Reply-To: <1204921049.6467.9.camel@frissell>
References: <BAB6978A-D50E-4C4A-9FC9-EAAD557B935F@uiuc.edu>
	<1204921049.6467.9.camel@frissell>
Message-ID: <BCE20903-01C1-4AC3-B8F6-89AC68F6803E@uiuc.edu>

I was testing a few things earlier using 'Build dist' which tanked the  
old archives.  I reran the script manually so everything should be up  
now.

If you have the default setting implemented for Build.PL that would be  
great.  There is a lingering minor issue with Data::Dumper error  
output via perl 5.10, but beyond that it should be fine.

chris

On Mar 7, 2008, at 2:17 PM, Scott Cain wrote:

> Hi Chris,
>
> Thanks much for this.  I have one observation though: both the http  
> and
> ftp directories are empty except for a log file :-/
>
> Also, I saw that you mentioned the 'accept the defaults' option I  
> asked
> about in January.  I did implement that on Build.PL at the exact time
> that the transition from cvs to svn was happening, so I never got
> committed back.  Hopefully I still have it :-)  I'll look around and
> commit it when I find it.
>
> Scott
>
> On Thu, 2008-03-06 at 15:48 -0600, Chris Fields wrote:
>> We now have nightly bundled archives for bioperl-live, bioperl-db,
>> bioperl-run, and bioperl-network running; these will be updated ~  
>> 1:00
>> am every night.
>>
>> http://www.bioperl.org/DIST/nightly_builds/
>> ftp://ftp.open-bio.org/pub/bioperl/DIST/nightly_builds
>>
>> The archives are date-stamped and also have the Subversion revision,
>> just in case one wanted to ensure they get the correct version for  
>> the
>> bug fix.  They also contain a CHANGELOG file for the last 10  
>> revisions
>> (if there are any).  These are currently derived off the anon. svn
>> repository.
>>
>> chris
> -- 
> ------------------------------------------------------------------------
> Scott Cain, Ph. D.                                         cain at cshl.edu
> GMOD Coordinator (http://www.gmod.org/)                      
> 216-392-3087
> Cold Spring Harbor Laboratory
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From N.Haigh at sheffield.ac.uk  Fri Mar  7 16:01:43 2008
From: N.Haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Fri,  7 Mar 2008 21:01:43 +0000
Subject: [Bioperl-l] Nightly build archives now available
In-Reply-To: <BC3ACFAC-F203-4C12-BA1F-6AA762F31BF0@uiuc.edu>
References: <BAB6978A-D50E-4C4A-9FC9-EAAD557B935F@uiuc.edu>
	<628aabb70803061550s24d7d8cfhf80495ea970a6c19@mail.gmail.com>
	<5A67E3A9-9997-4A6B-AB07-8403D5FF388E@uiuc.edu>
	<47D119A1.10408@sendu.me.uk>
	<C71F44BE-CB80-4DED-9F18-8CE80836CFD6@uiuc.edu>
	<47D14CC0.8000104@sendu.me.uk>
	<FFE7B6A1-FF8C-4C84-994B-6D9AAD46748E@uiuc.edu>
	<47D15731.2050000@sendu.me.uk>
	<7AFB30F0-4810-4B62-B64A-C712EC4A3872@uiuc.edu>
	<47D162B0.5070402@sendu.me.uk>
	<AAF0B249-AD91-475A-8A30-4BB3A2CCF951@uiuc.edu>
	<47D17C99.9050009@sendu.me.uk>
	<BC3ACFAC-F203-4C12-BA1F-6AA762F31BF0@uiuc.edu>
Message-ID: <1204923703.47d1ad37a614a@webmail.shef.ac.uk>

Quoting Chris Fields <cjfields at uiuc.edu>:

-- snip --

> 
> I'll leave the nightly build setup as is for now and work on getting  
> Build.PL working (something we need anyway for Devel::Cover and  
> Pod::Coverage work).
> 

One of the test metrics of Devel::Cover is Pod::Coverage, so there is no
need to have a separate Pod::Coverage test :o)

Nath


From cain.cshl at gmail.com  Fri Mar  7 17:25:53 2008
From: cain.cshl at gmail.com (Scott Cain)
Date: Fri, 07 Mar 2008 17:25:53 -0500
Subject: [Bioperl-l] Nightly build archives now available
In-Reply-To: <BCE20903-01C1-4AC3-B8F6-89AC68F6803E@uiuc.edu>
References: <BAB6978A-D50E-4C4A-9FC9-EAAD557B935F@uiuc.edu>
	<1204921049.6467.9.camel@frissell>
	<BCE20903-01C1-4AC3-B8F6-89AC68F6803E@uiuc.edu>
Message-ID: <1204928753.6467.19.camel@frissell>

OK, I added my 'accept the defaults' option.  Use it like this:

  perl Build.PL --accept 1

Scott

On Fri, 2008-03-07 at 14:25 -0600, Chris Fields wrote:
> I was testing a few things earlier using 'Build dist' which tanked the  
> old archives.  I reran the script manually so everything should be up  
> now.
> 
> If you have the default setting implemented for Build.PL that would be  
> great.  There is a lingering minor issue with Data::Dumper error  
> output via perl 5.10, but beyond that it should be fine.
> 
> chris
> 
> On Mar 7, 2008, at 2:17 PM, Scott Cain wrote:
> 
> > Hi Chris,
> >
> > Thanks much for this.  I have one observation though: both the http  
> > and
> > ftp directories are empty except for a log file :-/
> >
> > Also, I saw that you mentioned the 'accept the defaults' option I  
> > asked
> > about in January.  I did implement that on Build.PL at the exact time
> > that the transition from cvs to svn was happening, so I never got
> > committed back.  Hopefully I still have it :-)  I'll look around and
> > commit it when I find it.
> >
> > Scott
> >
> > On Thu, 2008-03-06 at 15:48 -0600, Chris Fields wrote:
> >> We now have nightly bundled archives for bioperl-live, bioperl-db,
> >> bioperl-run, and bioperl-network running; these will be updated ~  
> >> 1:00
> >> am every night.
> >>
> >> http://www.bioperl.org/DIST/nightly_builds/
> >> ftp://ftp.open-bio.org/pub/bioperl/DIST/nightly_builds
> >>
> >> The archives are date-stamped and also have the Subversion revision,
> >> just in case one wanted to ensure they get the correct version for  
> >> the
> >> bug fix.  They also contain a CHANGELOG file for the last 10  
> >> revisions
> >> (if there are any).  These are currently derived off the anon. svn
> >> repository.
> >>
> >> chris
> > -- 
> > ------------------------------------------------------------------------
> > Scott Cain, Ph. D.                                         cain at cshl.edu
> > GMOD Coordinator (http://www.gmod.org/)                      
> > 216-392-3087
> > Cold Spring Harbor Laboratory
> >
> 
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
> 
> 
> 
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   cain.cshl at gmail.com
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory


From n.haigh at sheffield.ac.uk  Sat Mar  8 07:55:39 2008
From: n.haigh at sheffield.ac.uk (Nathan S Haigh)
Date: Sat, 08 Mar 2008 12:55:39 +0000
Subject: [Bioperl-l] Nightly build archives now available
In-Reply-To: <AAF0B249-AD91-475A-8A30-4BB3A2CCF951@uiuc.edu>
References: <BAB6978A-D50E-4C4A-9FC9-EAAD557B935F@uiuc.edu>	<628aabb70803061550s24d7d8cfhf80495ea970a6c19@mail.gmail.com>	<5A67E3A9-9997-4A6B-AB07-8403D5FF388E@uiuc.edu>	<47D119A1.10408@sendu.me.uk>	<C71F44BE-CB80-4DED-9F18-8CE80836CFD6@uiuc.edu>	<47D14CC0.8000104@sendu.me.uk>	<FFE7B6A1-FF8C-4C84-994B-6D9AAD46748E@uiuc.edu>	<47D15731.2050000@sendu.me.uk>	<7AFB30F0-4810-4B62-B64A-C712EC4A3872@uiuc.edu>	<47D162B0.5070402@sendu.me.uk>
	<AAF0B249-AD91-475A-8A30-4BB3A2CCF951@uiuc.edu>
Message-ID: <47D28CCB.50507@sheffield.ac.uk>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Chris Fields wrote:

- -- snip --

> 1)  I am running a 'svn co' on anon. svn for the various distros to a
> temp directory.  This is done using a bash script.  If I attempt to
> change into the distribution directory and run 'perl Build.PL' from the
> bash script, I immediately run into permissions issues and several odd
> things:
> 

- -- snip --

Hi Chris,

Do you need to do any svn commands after the checkout? If not, you can do "svn export" instead:
http://svnbook.red-bean.com/en/1.0/re10.html

This recursively fetches the specified URL without the .svn dirs. You won't
be able to run any svn commands on the result, as it won't be a working copy,
but it saves bandwidth and any post-processing to delete all the .svn dirs.

Nath
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFH0ozL9gTv6QYzVL4RAkvcAJ9eSosx3+YWfbBg/KT6+HZrbweGSgCguLCe
ZYtTxSi5q6iiR+sVGDQEZ68=
=uFNP
-----END PGP SIGNATURE-----


From nm249 at cornell.edu  Sat Mar  8 11:48:44 2008
From: nm249 at cornell.edu (Naama Menda)
Date: Sat, 08 Mar 2008 11:48:44 -0500
Subject: [Bioperl-l] Bio::Ontology::OntologyI
Message-ID: <A042F0A0-F007-45C2-A838-7666A85F5836@cornell.edu>

Hi Hilmar,

I have a loading script that uses Bio::Ontology::OntologyI for parsing
obo files and loading terms into a chado schema.
I'm trying to find all relationship types, and it seems that the
parser looks at the distinct relationship types used by the terms in
the file, but not at the '[Typedef]' stanzas (I used
'get_predicate_terms()').
This is important for storing the relationships in the right context.
For example, all relationship types defined by Sequence Ontology should
be stored in the chado schema using the SO cv_id, while other
relationship types, not defined as Typedef in the obo file, should be
stored using the 'relationship' cv_id.
Without a way to parse Typedefs, I also cannot use Bio::Ontology for
parsing the OBO_REL file (http://www.obofoundry.org/ro/ro.obo).

Is there another function in Bio::Ontology that handles Typedefs? If
not, can one be added?

Thanks!
-Naama Menda
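
To make the request concrete, here is a minimal core-Perl sketch of what
collecting [Typedef] stanzas from OBO text could look like. It is not the
Bio::OntologyIO::obo parser and not a patch against it; the sub name
parse_typedefs and the sample stanzas (EX: ids) are made up for the example,
and it only handles simple "tag: value" lines.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): collect the tags of every
# [Typedef] stanza in a chunk of OBO-formatted text, keyed by typedef id.
sub parse_typedefs {
    my ($obo_text) = @_;
    my (%typedefs, $stanza, $id);
    for my $line (split /\n/, $obo_text) {
        $line =~ s/\s+$//;                       # trim trailing whitespace
        if ($line =~ /^\[(\w+)\]$/) {            # stanza header, e.g. [Term]
            ($stanza, $id) = ($1, undef);
        }
        elsif (defined $stanza && $stanza eq 'Typedef'
               && $line =~ /^(\w+):\s*(.*)$/) {
            my ($tag, $value) = ($1, $2);
            $value =~ s/\s*!.*$//;               # drop trailing "! comment"
            if ($tag eq 'id') {
                $id = $value;
                $typedefs{$id} = {};
            }
            elsif (defined $id) {
                $typedefs{$id}{$tag} = $value;
            }
        }
    }
    return \%typedefs;
}

# Made-up sample data, for illustration only.
my $obo = <<'END_OBO';
[Term]
id: EX:0000001
name: example_term
relationship: part_of EX:0000000

[Typedef]
id: part_of
name: part_of
is_transitive: true

[Typedef]
id: derives_from
name: derives_from
END_OBO

my $typedefs = parse_typedefs($obo);
print join(',', sort keys %$typedefs), "\n";
```

With something like this, relationship types that appear as a Typedef id
could be stored under the ontology's own cv_id, and the rest under the
'relationship' cv_id.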


From bix at sendu.me.uk  Sat Mar  8 18:30:40 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Sat, 08 Mar 2008 23:30:40 +0000
Subject: [Bioperl-l] Nightly build archives now available
In-Reply-To: <1204928753.6467.19.camel@frissell>
References: <BAB6978A-D50E-4C4A-9FC9-EAAD557B935F@uiuc.edu>	<1204921049.6467.9.camel@frissell>	<BCE20903-01C1-4AC3-B8F6-89AC68F6803E@uiuc.edu>
	<1204928753.6467.19.camel@frissell>
Message-ID: <47D321A0.9010209@sendu.me.uk>

Scott Cain wrote:
> OK, I added my 'accept the defaults' option.  Use it like this:
> 
>   perl Build.PL --accept 1

Thanks for that Scott, but can you revert and have another go at that 
commit, because you ended up wiping out the recent commits by Chris and 
myself.

Also, rather than individually alter the Bioperl-specific methods like 
choose_scripts(), is there perhaps a cleaner way to catch every prompt, 
perhaps by overriding prompt() itself? Other questions may get added in 
the future, and some existing questions aren't immediately obvious, so 
it would be nice to be sure an automated process like a cron job /never/ 
gets asked a question.
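
A rough sketch of the "override prompt() itself" idea follows. To keep it
runnable on its own it uses a stand-in base class (My::Builder, hypothetical)
instead of the real Bioperl build class; Module::Build's prompt() takes a
message and an optional default in the same way. The 'accept' attribute and
the question text are assumptions for illustration, not what Build.PL does.

```perl
use strict;
use warnings;

# Stand-in for the real builder class, so the sketch runs without
# Module::Build installed. Names here are hypothetical.
package My::Builder;

sub new { my ($class, %args) = @_; return bless {%args}, $class }

# Interactive prompt: ask on STDOUT, read the answer from STDIN.
sub prompt {
    my ($self, $message, $default) = @_;
    print "$message [", (defined $default ? $default : ''), "] ";
    my $answer = <STDIN>;
    $answer = '' unless defined $answer;
    chomp $answer;
    return length $answer ? $answer : $default;
}

# Example of a project-specific question that goes through prompt().
sub choose_scripts {
    my ($self) = @_;
    return $self->prompt('Install scripts? y/n', 'n');
}

# Subclass that never asks when 'accept' is set: every question, existing
# or added later, gets its default answer.
package My::Builder::NonInteractive;
our @ISA = ('My::Builder');

sub prompt {
    my ($self, $message, $default) = @_;
    return $default if $self->{accept};
    return $self->SUPER::prompt($message, $default);
}

package main;

my $build = My::Builder::NonInteractive->new(accept => 1);
print $build->choose_scripts, "\n";    # returns the default without asking
```

The point of putting the check in prompt() is that methods like
choose_scripts() need no changes of their own.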


From hlapp at gmx.net  Sun Mar  9 17:37:01 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sun, 9 Mar 2008 17:37:01 -0400
Subject: [Bioperl-l] Bio::Ontology::OntologyI
In-Reply-To: <47D2C36C.2020802@cornell.edu>
References: <47D2C36C.2020802@cornell.edu>
Message-ID: <DA1ACCA1-02CA-4662-A50F-DF714A8E05CB@gmx.net>

Naama -

it is the OntologyIO::obo parser that omits the typedefs. Parsing  
rather than skipping those could be added to the code; I also once  
started and almost completed a project to integrate the go-perl .obo
parser into the Bio::OntologyIO framework, but the final touches fell  
victim to moving jobs and the ensuing upheaval.

If all you need to do is parse a .obo-formatted ontology and
traverse it in some way, go-perl might have all you need. If you
need more than that, could you elaborate?

	-hilmar

On Mar 8, 2008, at 11:48 AM, Naama Menda wrote:

> Hi Hilmar,
>
> I have a loading script that uses Bio::Ontology::OntologyI  for  
> parsing  obo files and loading terms  into chado schema.
> I'm trying to find all relationship types, and it seems that the  
> parser looks at the distinct relationship types used by the terms  
> in the file,
> but not at the ' [Typedef] ' fields (I used 'get_predicate_terms()' ).
> This is important for storing the relationships in the right  
> context , for example all relationships types defined by Sequence  
> Ontology should be stored in
> the chado schema using the SO cv_id, while other relationship  
> types, not defined as Typedef in the obo file, should be stored  
> using the 'relationship' cv_id.
> Without a way to parse Typedefs, I also cannot use Bio::Ontology  
> for parsing OBO_REL file (http://www.obofoundry.org/ro/ro.obo).
>
> Is there another function in Bio::Ontology that handles Typedefs?  
> If not can one be added?
>
> Thanks!
> -Naama Menda

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From naama.menda at gmail.com  Sun Mar  9 21:34:05 2008
From: naama.menda at gmail.com (Naama Menda)
Date: Sun, 9 Mar 2008 21:34:05 -0400
Subject: [Bioperl-l] Bio::Ontology::OntologyI
Message-ID: <48F99F4E-F17B-4000-8460-9F2CB9E0D75A@gmail.com>

My main problem is that go-perl does not handle updates, so if I want  
to update GO I need an empty schema. We find it more complicated to  
re-load our annotations than to update cvterms and their related data.
Our loading script compares an existing load of an ontology to the
obo file and updates/inserts/deletes accordingly.
We are now in the process of committing this code to GMOD, and
thought this would be a good opportunity for adding the Typedef
parsing option.

Thanks,
-Naama


On Sun, Mar 9, 2008 at 5:37 PM, Hilmar Lapp <hlapp at gmx.net> wrote:
Naama -

it is the OntologyIO::obo parser that omits the typedefs. Parsing
rather than skipping those could be added to the code; I also once
started and almost completed a project to integrated the go-perl .obo
parser into the Bio::OntologyIO framework, but the final touches fell
victim to moving jobs and the ensuing upheaval.

If all you need to do is parsing a .obo-formatted ontology and
traversing it in some way, go-perl might have all you need. If you
need more than that, could you elaborate?

        -hilmar

On Mar 8, 2008, at 11:48 AM, Naama Menda wrote:

 > Hi Hilmar,
 >
 > I have a loading script that uses Bio::Ontology::OntologyI  for
 > parsing  obo files and loading terms  into chado schema.
 > I'm trying to find all relationship types, and it seems that the
 > parser looks at the distinct relationship types used by the terms
 > in the file,
 > but not at the ' [Typedef] ' fields (I used 'get_predicate_terms()' ).
 > This is important for storing the relationships in the right
 > context , for example all relationships types defined by Sequence
 > Ontology should be stored in
 > the chado schema using the SO cv_id, while other relationship
 > types, not defined as Typedef in the obo file, should be stored
 > using the 'relationship' cv_id.
 > Without a way to parse Typedefs, I also cannot use Bio::Ontology
 > for parsing OBO_REL file (http://www.obofoundry.org/ro/ro.obo).
 >
 > Is there another function in Bio::Ontology that handles Typedefs?
 > If not can one be added?
 >
 > Thanks!
 > -Naama Menda

--
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From hlapp at gmx.net  Sun Mar  9 22:13:15 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sun, 9 Mar 2008 22:13:15 -0400
Subject: [Bioperl-l] Bio::Ontology::OntologyI
In-Reply-To: <e6c79c380803091834u534bfb9p9fbf41cb06485e83@mail.gmail.com>
References: <47D2C36C.2020802@cornell.edu>
	<DA1ACCA1-02CA-4662-A50F-DF714A8E05CB@gmx.net>
	<e6c79c380803091834u534bfb9p9fbf41cb06485e83@mail.gmail.com>
Message-ID: <7412B724-57B8-4851-9E71-806722DE8A76@gmx.net>


On Mar 9, 2008, at 9:34 PM, Naama Menda wrote:

> My main problem is that go-perl does not handle updates, so if I  
> want to update GO I need an empty schema. We find it more  
> complicated to re-load our annotations than to update cvterms and  
> their related data.
> Our loading script compares an existing load of an ontology to the  
> obo file and updates/insets/deletes accordingly.

load_ontology.pl in bioperl-db should have all this functionality,  
though of course that doesn't give you the typedef support (yet).

>
> We are now in the process of committing this code to GMOD

Cool - obviously load_ontology.pl doesn't work off of Chado but  
instead uses BioSQL as the schema (though the ontology model is  
*very* similar between the two).

BTW please keep the Bioperl list in the loop, others may have insight  
too or be interested in the information.

	-hilmar

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From hlapp at gmx.net  Sun Mar  9 22:43:13 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sun, 9 Mar 2008 22:43:13 -0400
Subject: [Bioperl-l] Bio::Ontology::OntologyI
In-Reply-To: <e6c79c380803091926q293b2c7aq8a6b07c56c25ac42@mail.gmail.com>
References: <47D2C36C.2020802@cornell.edu>
	<DA1ACCA1-02CA-4662-A50F-DF714A8E05CB@gmx.net>
	<e6c79c380803091834u534bfb9p9fbf41cb06485e83@mail.gmail.com>
	<7412B724-57B8-4851-9E71-806722DE8A76@gmx.net>
	<e6c79c380803091926q293b2c7aq8a6b07c56c25ac42@mail.gmail.com>
Message-ID: <741AD3F0-4EE6-4CF9-A95F-53749AEB0FDA@gmx.net>


On Mar 9, 2008, at 10:26 PM, Naama Menda wrote:
>
> On Sun, Mar 9, 2008 at 10:13 PM, Hilmar Lapp <hlapp at gmx.net> wrote:
>
> On Mar 9, 2008, at 9:34 PM, Naama Menda wrote:
>
>> My main problem is that go-perl does not handle updates, so if I  
>> want to update GO I need an empty schema. We find it more  
>> complicated to re-load our annotations than to update cvterms and  
>> their related data.
>> Our loading script compares an existing load of an ontology to the  
>> obo file and updates/insets/deletes accordingly.
>
> load_ontology.pl in bioperl-db should have all this functionality,  
> though of course that doesn't give you the typedef support (yet).
>
> Will you add this support to obo.pm? I had a look at it and it  
> seems easy to implement.
> Will there be a patch? Or in the next Bioperl release?

If you have ideas for how to implement this we'd be thrilled if you  
can provide a patch.

Most changes in BioPerl happen because of, and by, people who have an itch
to scratch. Seems like this one is right up your alley?

I'd in principle be interested in doing this too but can't give any  
promises as to when I might have time (unless I need it myself :)

>
>
>>
>> We are now in the process of committing this code to GMOD
>
> Cool - obviously load_ontology.pl doesn't work off of Chado but  
> instead uses BioSQL as the schema (though the ontology model is  
> *very* similar between the two).
>
> We  store ontologies in Chado, and that was the reason for writing  
> a new loader.

Looking at it, it seems you wrote a whole new language binding? Did
you find it too difficult to build on one of the existing ones (which
use Class::DBI if I recall correctly, though Scott will have the
details here)?

	-hilmar

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From naama.menda at gmail.com  Sun Mar  9 22:26:04 2008
From: naama.menda at gmail.com (Naama Menda)
Date: Sun, 9 Mar 2008 22:26:04 -0400
Subject: [Bioperl-l] Bio::Ontology::OntologyI
In-Reply-To: <7412B724-57B8-4851-9E71-806722DE8A76@gmx.net>
References: <47D2C36C.2020802@cornell.edu>
	<DA1ACCA1-02CA-4662-A50F-DF714A8E05CB@gmx.net>
	<e6c79c380803091834u534bfb9p9fbf41cb06485e83@mail.gmail.com>
	<7412B724-57B8-4851-9E71-806722DE8A76@gmx.net>
Message-ID: <e6c79c380803091926q293b2c7aq8a6b07c56c25ac42@mail.gmail.com>

On Sun, Mar 9, 2008 at 10:13 PM, Hilmar Lapp <hlapp at gmx.net> wrote:

>
> On Mar 9, 2008, at 9:34 PM, Naama Menda wrote:
>
> My main problem is that go-perl does not handle updates, so if I want to
> update GO I need an empty schema. We find it more complicated to re-load our
> annotations than to update cvterms and their related data.
> Our loading script compares an existing load of an ontology to the obo
> file and updates/insets/deletes accordingly.
>
>
> load_ontology.pl in bioperl-db should have all this functionality, though
> of course that doesn't give you the typedef support (yet).
>

Will you add this support to obo.pm? I had a look at it and it seems easy to
implement.
Will there be a patch? Or in the next Bioperl release?

>
>
> We are now in the process of committing this code to GMOD
>
>
> Cool - obviously load_ontology.pl doesn't work off of Chado but instead
> uses BioSQL as the schema (though the ontology model is *very* similar
> between the two).
>

We  store ontologies in Chado, and that was the reason for writing a new
loader.

>
> BTW please keep the Bioperl list in the loop, others may have insight too
> or be interested in the information.
>
> -hilmar
>
> --
>
Thanks!
-Naama

> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================
>
>
>
>


From akarger at CGR.Harvard.edu  Mon Mar 10 09:33:49 2008
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Mon, 10 Mar 2008 09:33:49 -0400
Subject: [Bioperl-l] Reciprocal blast
References: <loom.20080307T160014-878@post.gmane.org>
Message-ID: <72AF30DC2881964CB911FD08E57157E7367BD5@lsdiv-msxbe-001.nucleus.harvard.edu>

<plug>
There's a cut & paste protocol for Reciprocal best hit blast at

http://sysbio.harvard.edu/csb/resources/computational/scriptome/UNIX/Protocols/Sequences.html

Let me know if you need to tweak things.
</plug>

-Amir Karger 
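
For readers who want to see the idea rather than a ready-made protocol, here
is a minimal core-Perl sketch of reciprocal best hits. It assumes the BLAST
output has already been reduced to (query, subject, bit score) triples; the
sub names and the hit data are made up for illustration, and ties are broken
by whichever hit is seen first.

```perl
use strict;
use warnings;

# Keep the best-scoring subject for each query.
# Each hit is [query_id, subject_id, bit_score]; on ties the first hit wins.
sub best_hits {
    my (@hits) = @_;
    my (%best, %score);
    for my $hit (@hits) {
        my ($query, $subject, $bits) = @$hit;
        if (!exists $score{$query} || $bits > $score{$query}) {
            $best{$query}  = $subject;
            $score{$query} = $bits;
        }
    }
    return \%best;
}

# A pair (a, b) is a reciprocal best hit when b is a's best hit in the
# forward search and a is b's best hit in the reverse search.
sub reciprocal_best_hits {
    my ($forward, $reverse) = @_;
    my $fwd = best_hits(@$forward);
    my $rev = best_hits(@$reverse);
    my %rbh;
    for my $query (keys %$fwd) {
        my $subject = $fwd->{$query};
        $rbh{$query} = $subject
            if defined $rev->{$subject} && $rev->{$subject} eq $query;
    }
    return \%rbh;
}

# Made-up hits: my proteins vs. species X, then species X vs. my proteins.
my @forward = (['p1', 'x1', 250], ['p1', 'x2', 90],
               ['p2', 'x2', 120], ['p3', 'x3', 80]);
my @reverse = (['x1', 'p1', 245], ['x2', 'p1', 95], ['x3', 'p3', 70]);

my $rbh = reciprocal_best_hits(\@forward, \@reverse);
for my $protein (qw(p1 p2 p3)) {
    printf "%s\t%s\n", $protein,
        exists $rbh->{$protein} ? "present ($rbh->{$protein})" : 'absent';
}
```

Here p2 comes out as absent because its best hit x2 points back to p1, which
is the presence/absence call the original question asks for.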

> -----Original Message-----
> From: Matt [mailto:matthewehodges at gmail.com] 
> Sent: Friday, March 07, 2008 11:17 AM
> To: bioperl-l at bioperl.org
> Subject: [Bioperl-l] Reciprocal blast
> 
> Dear experts,
> 
> I want to do a best reciprocal blastp of a fasta protein 
> dataset against the 
> protein models of various species also in fasta format. The 
> aim is o have an 
> output showing presence/not presence. I think this is 
> possible to do using 
> perl, but i'm very much a beginner so any help in this would 
> be greatly 
> appreciated.
> Thanks
> Matt
> 
> 
> 


From Daniel.Gerlach at medecine.unige.ch  Mon Mar 10 12:13:39 2008
From: Daniel.Gerlach at medecine.unige.ch (Daniel Gerlach)
Date: Mon, 10 Mar 2008 17:13:39 +0100
Subject: [Bioperl-l] Bio::TreeIO - tree object to string
Message-ID: <47D55E33.8060205@medecine.unige.ch>

Dear all,

This is a very basic question. I have a tree object in $tree and want to 
save its newick representation in a variable as a string:

my $out = new Bio::TreeIO(-fh => $tree_string, -format => 'newick');
$out->write_tree($tree);
print $tree_string;

Unfortunately this does not work: it prints the newick tree on
stdout plus the message "Use of uninitialized value in print at ...". It
also prints the tree on stdout if I remove the line "print
$tree_string". The variable $tree_string seems to be empty.

D.


From naama.menda at gmail.com  Mon Mar 10 11:09:12 2008
From: naama.menda at gmail.com (Naama Menda)
Date: Mon, 10 Mar 2008 11:09:12 -0400
Subject: [Bioperl-l] Bio::Ontology::OntologyI
In-Reply-To: <741AD3F0-4EE6-4CF9-A95F-53749AEB0FDA@gmx.net>
References: <47D2C36C.2020802@cornell.edu>
	<DA1ACCA1-02CA-4662-A50F-DF714A8E05CB@gmx.net>
	<e6c79c380803091834u534bfb9p9fbf41cb06485e83@mail.gmail.com>
	<7412B724-57B8-4851-9E71-806722DE8A76@gmx.net>
	<e6c79c380803091926q293b2c7aq8a6b07c56c25ac42@mail.gmail.com>
	<741AD3F0-4EE6-4CF9-A95F-53749AEB0FDA@gmx.net>
Message-ID: <e6c79c380803100809s5ad9648eq976610910e9432f1@mail.gmail.com>

On Sun, Mar 9, 2008 at 10:43 PM, Hilmar Lapp <hlapp at gmx.net> wrote:

>
> On Mar 9, 2008, at 10:26 PM, Naama Menda wrote:
>
>
> On Sun, Mar 9, 2008 at 10:13 PM, Hilmar Lapp <hlapp at gmx.net> wrote:
>
> >
> > On Mar 9, 2008, at 9:34 PM, Naama Menda wrote:
> >
> > My main problem is that go-perl does not handle updates, so if I want to
> > update GO I need an empty schema. We find it more complicated to re-load our
> > annotations than to update cvterms and their related data.
> > Our loading script compares an existing load of an ontology to the obo
> > file and updates/insets/deletes accordingly.
> >
> >
> > load_ontology.pl in bioperl-db should have all this functionality,
> > though of course that doesn't give you the typedef support (yet).
> >
>
> Will you add this support to obo.pm? I had a look at it and it seems easy
> to implement.
> Will there be a patch? Or in the next Bioperl release?
>
>
> If you have ideas for how to implement this we'd be thrilled if you can
> provide a patch.
>
> Most changes in BioPerl happen because and by people who have an itch to
> scratch. Seems like this one is right down your alley?
>
> I'd in principle be interested in doing this too but can't give any
> promises as to when I might have time (unless I need it myself :)
>

I'll try to provide a patch for this. I'll let you know how it goes..

>
>
>
> >
> > We are now in the process of committing this code to GMOD
> >
> >
> > Cool - obviously load_ontology.pl doesn't work off of Chado but instead
> > uses BioSQL as the schema (though the ontology model is *very* similar
> > between the two).
> >
>
> We  store ontologies in Chado, and that was the reason for writing a new
> loader.
>
>
> Looking at it it seems you wrote a whole new language binding? Did you
> find it too difficult to build on one of the existing ones (which use
> Class::DBI if I recall correctly, though Scott will have the details here)
>

We already had most of the classes. We use these for other code at SGN,
since it's all OO perl. All we needed to do was add some methods and
accessors to our Chado classes, and write a loading script that stores the
ontology from the db and the ontology from the file in hashrefs, compares the
two and inserts/updates accordingly. Our main concern was updating
pre-loaded ontologies (for new ontology files GMOD's make-ontologies works
great!)
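
The compare step described above could be sketched roughly like this in
core Perl. This is not the SGN loader; the sub name diff_terms, the hash
layout (accession => term name) and the EX: accessions are assumptions made
for the example.

```perl
use strict;
use warnings;

# Compare terms already in the database with terms parsed from the file.
# Both arguments are hashrefs keyed by accession; values are term names.
sub diff_terms {
    my ($db_terms, $file_terms) = @_;
    my (@insert, @update, @delete);
    for my $acc (sort keys %$file_terms) {
        if (!exists $db_terms->{$acc}) {
            push @insert, $acc;                  # new in the file
        }
        elsif ($db_terms->{$acc} ne $file_terms->{$acc}) {
            push @update, $acc;                  # name changed in the file
        }
    }
    for my $acc (sort keys %$db_terms) {
        # gone from the file: delete, or mark obsolete, as policy dictates
        push @delete, $acc unless exists $file_terms->{$acc};
    }
    return { insert => \@insert, update => \@update, delete => \@delete };
}

# Made-up data, for illustration only.
my %in_db   = ('EX:1' => 'alpha', 'EX:2' => 'beta',    'EX:3' => 'gamma');
my %in_file = ('EX:1' => 'alpha', 'EX:2' => 'beta v2', 'EX:4' => 'delta');

my $plan = diff_terms(\%in_db, \%in_file);
for my $action (qw(insert update delete)) {
    print "$action: @{ $plan->{$action} }\n";
}
```

Working from a plan like this leaves existing annotations attached to
unchanged cvterms, which is the reason given for not reloading into an
empty schema.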


> -hilmar
>
> --
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================
>
>
>
> -Naama


From clarsen at vecna.com  Mon Mar 10 11:56:55 2008
From: clarsen at vecna.com (Christopher Larsen)
Date: Mon, 10 Mar 2008 11:56:55 -0400 (EDT)
Subject: [Bioperl-l] Reciprocal best blast hits / Orthology
Message-ID: <49819.64.47.82.110.1205164615.squirrel@mail.vecna.com>

Matt, Dave,

Regarding reciprocal best blast hits: yes, it's beyond the list, and here's
how/where to go. It seems what you are looking for is actually an
ortholog search. If so, there is more to it than reciprocity and
ranking; other groups are using phylo trees, bootstrap values, etc.

Perhaps check out the Perl code from David Roos and Chris Stoeckert's
work: OrthoMCL. Their group is quite helpful as well.

http://orthomcl.cbil.upenn.edu/cgi-bin/OrthoMclWeb.cgi

The Perl code installs locally and will help you to identify a homolog that
should have the same enzyme function or cellular role. Importantly, it
attends to paralogy and pseudogenes as well, so you don't step
into a pit. The site explains more.

The point is, you're on the right track, but there's a group that's been
through what you are doing and can supply you with a working
implementation that's very robust and uses BioPerl modules already, so you
don't have to scratch up some code. Also, you can check out INPARANOID for
the same reasons. Having just been through this, I'm just trying to lead
you to where we went. Right now we point OrthoMCL at a whole folder of
proteomes (*.faa) and it groups them accordingly.

Brian O: Don't know if this folds well into your MCL wiki page or not,
apologies.

If the group wants some post-processing code that shows the
presence/absence of proteins in any one group perhaps we can help too as
there are a few things written that take the raw output directly.

Cheers,

Chris L

=========================
Message: 5

Dear experts,

I want to do a best reciprocal blastp of a fasta protein dataset against the
protein models of various species also in fasta format. The aim is to have an
output showing presence/not presence. I think this is possible to do using
perl, but i'm very much a beginner so any help in this would be greatly
appreciated.

Thanks

Matt

-- 
Christopher Larsen, Ph.D.
Senior Scientist
Research Grants Manager
Vecna Technologies
5004 Lehigh Ave
College Park, MD 20740
240-737-1625


From Kevin.M.Brown at asu.edu  Mon Mar 10 12:17:11 2008
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Mon, 10 Mar 2008 09:17:11 -0700
Subject: [Bioperl-l] Bio::TreeIO - tree object to string
In-Reply-To: <47D55E33.8060205@medecine.unige.ch>
References: <47D55E33.8060205@medecine.unige.ch>
Message-ID: <1A4207F8295607498283FE9E93B775B4048A0825@EX02.asurite.ad.asu.edu>

You need to pass in either a filehandle or a path to an output file;
otherwise you will see the behavior you are getting.

open my $tree_fh, '>', 'TreeFile.txt' or die "Cannot open TreeFile.txt: $!";
my $out = Bio::TreeIO->new(-fh => $tree_fh, -format => 'newick');

OR

my $out = Bio::TreeIO->new(-file => ">TreeFile.txt", -format => 'newick');

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> Daniel Gerlach
> Sent: Monday, March 10, 2008 9:14 AM
> To: bioperl-l at portal.open-bio.org
> Subject: [Bioperl-l] Bio::TreeIO - tree object to string
> 
> Dear all,
> 
> This is a very basic question. I have a tree object in $tree 
> and want to 
> save its newick representation in a variable as a string:
> 
> my $out = new Bio::TreeIO(-fh => $tree_string, -format => 'newick');
> $out->write_tree($tree);
> print $tree_string;
> 
> Unfortunately this does not work and he prints out the newick tree on 
> stdout plus the message "Use of uninitialized value in print 
> at ...". He 
> also prints out the tree on the stdout if I remove the line "print 
> $tree_string". The variable $tree_string seems to be empty.
> 
> D.
> 


From bix at sendu.me.uk  Mon Mar 10 12:51:59 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 10 Mar 2008 16:51:59 +0000
Subject: [Bioperl-l] Bio::TreeIO - tree object to string
In-Reply-To: <47D55E33.8060205@medecine.unige.ch>
References: <47D55E33.8060205@medecine.unige.ch>
Message-ID: <47D5672F.6000709@sendu.me.uk>

Daniel Gerlach wrote:
> Dear all,
> 
> This is a very basic question. I have a tree object in $tree and want to 
> save its newick representation in a variable as a string:
> 
> my $out = new Bio::TreeIO(-fh => $tree_string, -format => 'newick');
> $out->write_tree($tree);
> print $tree_string;
> 
> Unfortunately this does not work and he prints out the newick tree on 
> stdout plus the message "Use of uninitialized value in print at ...". He 
> also prints out the tree on the stdout if I remove the line "print 
> $tree_string". The variable $tree_string seems to be empty.

The -fh argument is supposed to be a file handle, not a string. You can 
use whatever standard Perl method you like for attaching a filehandle to 
a scalar.

Eg.
my $tree_string = '';
open(my $fake_fh, "+<", \$tree_string);
my $out = new Bio::TreeIO(-fh => $fake_fh, -format => 'newick');
$out->write_tree($tree);
print $tree_string;

Alternatively, my $tree_string = $tree->simplify_to_leaves_string() 
might give you what you want.


From stephan.rosecker at ish.de  Mon Mar 10 12:27:57 2008
From: stephan.rosecker at ish.de (stephan.rosecker)
Date: Mon, 10 Mar 2008 17:27:57 +0100
Subject: [Bioperl-l] how to get unigene-cluster with bio-db
Message-ID: <web-15326627@mail-be-01.mail01.ish.de>

Dear list,

I am trying to understand how to fetch UniGene clusters with the help of bio-db and a local 
BioSQL database, but without success.
I have loaded "Hs.data" with the help of "bp_load_seqdatabase.pl".

#!/usr/local/bin/perl -w

use strict;
use Bio::DB::BioDB;
use Bio::DB::Query::BioQuery;

my $db = Bio::DB::BioDB->new(
         -database => 'biosql',
         -user   => 'postgres',
         -pass   => 'foo',
         -dbname   => 'bioseqdb',
         -host     => 'foo.bar',
         -port     => 5435,    # optional
         -driver   => 'Pg'
);

my $query = Bio::DB::Query::BioQuery->new();

$query->datacollections(
          ["Bio::PrimarySeqI c::subject",
          "Bio::PrimarySeqI p::object",
         "Bio::PrimarySeqI<=>Bio::ClusterI<=>Bio::Ontology::TermI"]);
$query->where(["p.accession_number = 'Hs.2'"]);

my $adp = $db->get_object_adaptor('Bio::PrimarySeqI');
my $adp2 = $db->get_object_adaptor('Bio::ClusterI');
my $qres = $adp->find_by_query($query);
my $qres2 = $adp2->find_by_query($query);

while(my $pseq = $qres->next_object()) {
         print $pseq->accession_number,"\n";
}
while(my $pseq = $qres2->next_object()) {
         print $pseq->accession_number,"\n";
}

Maybe this way is wrong.
Hope you can help me.

stephan
  

From hlapp at gmx.net  Mon Mar 10 22:53:47 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 10 Mar 2008 22:53:47 -0400
Subject: [Bioperl-l] how to get unigene-cluster with bio-db
In-Reply-To: <web-15326627@mail-be-01.mail01.ish.de>
References: <web-15326627@mail-be-01.mail01.ish.de>
Message-ID: <A95440B6-AFFA-4F45-B90F-B74D3C0770C7@gmx.net>

Stephan -

what is the result that you are getting? Do you receive an error? Or  
simply nothing?

(BTW, note that the object adaptor that you let execute the query will  
determine what kind of object you get in return. Hence, I'd  
expect your $qres2 to return Bio::ClusterI compliant objects, not  
Bio::PrimarySeqI ones. That is probably not at the root of the  
problem here, though.)

	-hilmar

On Mar 10, 2008, at 12:27 PM, stephan.rosecker wrote:

> Dear list,
>
> I try to understand how to fetch unigene-cluster with help of bio- 
> db and a local biosql-db, but without success.
> I have transfered "Hs.data" with help of "bp_load_seqdatabase.pl".
>
> #!/usr/local/bin/perl -w
>
> use strict;
> use Bio::DB::BioDB;
> use Bio::DB::Query::BioQuery;
>
> my $db = Bio::DB::BioDB->new(
>         -database => 'biosql',
>         -user   => 'postgres',
>         -pass   => 'foo',
>         -dbname   => 'bioseqdb',
>         -host     => 'foo.bar',
>         -port     => 5435,    # optional
>         -driver   => 'Pg'
> );
>
> my $query = Bio::DB::Query::BioQuery->new();
>
> $query->datacollections(
>          ["Bio::PrimarySeqI c::subject",
>          "Bio::PrimarySeqI p::object",
>         "Bio::PrimarySeqI<=>Bio::ClusterI<=>Bio::Ontology::TermI"]);
> $query->where(["p.accession_number = 'Hs.2'"]);
>
> my $adp = $db->get_object_adaptor('Bio::PrimarySeqI');
> my $adp2 = $db->get_object_adaptor('Bio::ClusterI');
> my $qres = $adp->find_by_query($query);
> my $qres2 = $adp2->find_by_query($query);
>
> while(my $pseq = $qres->next_object()) {
>         print $pseq->accession_number,"\n";
> }
> while(my $pseq = $qres2->next_object()) {
>         print $pseq->accession_number,"\n";
> }
>
> Maybe this way is wrong.
> Hope you can help me.
>
> stephan

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From hlapp at gmx.net  Mon Mar 10 23:17:01 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 10 Mar 2008 23:17:01 -0400
Subject: [Bioperl-l] Bio::Ontology::OntologyI
In-Reply-To: <e6c79c380803100809s5ad9648eq976610910e9432f1@mail.gmail.com>
References: <47D2C36C.2020802@cornell.edu>
	<DA1ACCA1-02CA-4662-A50F-DF714A8E05CB@gmx.net>
	<e6c79c380803091834u534bfb9p9fbf41cb06485e83@mail.gmail.com>
	<7412B724-57B8-4851-9E71-806722DE8A76@gmx.net>
	<e6c79c380803091926q293b2c7aq8a6b07c56c25ac42@mail.gmail.com>
	<741AD3F0-4EE6-4CF9-A95F-53749AEB0FDA@gmx.net>
	<e6c79c380803100809s5ad9648eq976610910e9432f1@mail.gmail.com>
Message-ID: <D355F5D0-756C-4CED-A341-496B23DD5A8B@gmx.net>


On Mar 10, 2008, at 11:09 AM, Naama Menda wrote:

>
>
> On Sun, Mar 9, 2008 at 10:43 PM, Hilmar Lapp <hlapp at gmx.net> wrote:
>
> On Mar 9, 2008, at 10:26 PM, Naama Menda wrote:
>>
>> On Sun, Mar 9, 2008 at 10:13 PM, Hilmar Lapp <hlapp at gmx.net> wrote:
>>
>> On Mar 9, 2008, at 9:34 PM, Naama Menda wrote:
>>
>>> My main problem is that go-perl does not handle updates, so if I  
>>> want to update GO I need an empty schema. We find it more  
>>> complicated to re-load our annotations than to update cvterms and  
>>> their related data.
>>> Our loading script compares an existing load of an ontology to  
>>> the obo file and updates/inserts/deletes accordingly.
>>
>> load_ontology.pl in bioperl-db should have all this functionality,  
>> though of course that doesn't give you the typedef support (yet).
>>
>> Will you add this support to obo.pm? I had a look at it and it  
>> seems easy to implement.
>> Will there be a patch? Or in the next Bioperl release?
>
> If you have ideas for how to implement this we'd be thrilled if you  
> can provide a patch.
>
> Most changes in BioPerl happen because and by people who have an  
> itch to scratch. Seems like this one is right down your alley?
>
> I'd in principle be interested in doing this too but can't give any  
> promises as to when I might have time (unless I need it myself :)
>
> I'll try to provide a patch for this. I'll let you know how it goes..

That'd be awesome! Don't hesitate to let us know if you hit bumps.

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From stephan.rosecker at ish.de  Tue Mar 11 07:08:08 2008
From: stephan.rosecker at ish.de (stephan.rosecker)
Date: Tue, 11 Mar 2008 12:08:08 +0100
Subject: [Bioperl-l] time-consuming bp_load_seqdatabase.pl
Message-ID: <web-15341068@mail-be-01.mail01.ish.de>

Dear list,

I have started the "bp_load_seqdatabase.pl" script from the "bioperl-db-1.5.2_100"
package with the UniGene "Hs.data" file. It runs on a 7-processor machine with
15 GB RAM. The DBMS is PostgreSQL on a similar machine.
The BioSQL core schema is v1.0.0.

The job has been running since Friday.

./bp_load_seqdatabase.pl --host foo --port 5435 --dbname bioseqdb --dbuser foo \
    --dbpass bar --driver Pg --format ClusterIO::unigene ../ncbi/Hs.data

Is it normal that it takes so long?
What are your experiences?

best regards
stephan


From diriano at uni-potsdam.de  Tue Mar 11 07:55:54 2008
From: diriano at uni-potsdam.de (Diego Mauricio Riaño Pachón)
Date: Tue, 11 Mar 2008 12:55:54 +0100
Subject: [Bioperl-l] problem with SearchIO and writer
Message-ID: <47D6734A.5060103@uni-potsdam.de>

Dear all,

I have a small problem parsing a BLAST report with SearchIO and using 
TextResultWriter.

I have a large file with several BLAST results, I instantiate SearchIO as:

my $searchio = new Bio::SearchIO ('-format' => 'blast', '-file' => 
"$blastFile");

##then I extract each individual report as
while (my $result = $searchio->next_result){
   my $writertxt = Bio::SearchIO::Writer::TextResultWriter->new();
   my $blastresult=$writertxt->to_string($result);
   while (my $hit = $result->next_hit){
     print $hit->name."\n";
   }
}

From diriano at uni-potsdam.de  Tue Mar 11 09:31:35 2008
From: diriano at uni-potsdam.de (diriano at uni-potsdam.de)
Date: Tue, 11 Mar 2008 14:31:35 +0100
Subject: [Bioperl-l] problem with SearchIO and writer
Message-ID: <1205242295.47d689b7186ad@webmail.uni-potsdam.de>

Dear all,

I have a small problem parsing a BLAST report with SearchIO and using
TextResultWriter.

I have a large file with several BLAST results, I instantiate SearchIO as:

my $searchio = new Bio::SearchIO ('-format' => 'blast', '-file' =>
"$blastFile");

##then I extract each individual report as
while (my $result = $searchio->next_result){
  my $writertxt = Bio::SearchIO::Writer::TextResultWriter->new();
  my $blastresult=$writertxt->to_string($result);
  while (my $hit = $result->next_hit){
    print $hit->name."\n";
  }
}
-- 

Diego Mauricio Riaño-Pachón
Biologist
Institute of Biology and Biochemistry
University of Potsdam
Karl-Liebknecht-Str. 24-25
Haus 20
14476 Golm
Germany
Tel:0331/977-2809
http://www.geocities.com/dmrp.geo/


From diriano at uni-potsdam.de  Tue Mar 11 10:25:19 2008
From: diriano at uni-potsdam.de (diriano at uni-potsdam.de)
Date: Tue, 11 Mar 2008 15:25:19 +0100
Subject: [Bioperl-l] problem with SearchIO and writer
Message-ID: <1205245519.47d6964fcaa48@webmail.uni-potsdam.de>

Dear all,

Please excuse my previous e-mail, it was incomplete, here it is again:

I have a small problem parsing a BLAST report with SearchIO and using
TextResultWriter.

I have a large file with several BLAST results, I instantiate SearchIO as:

my $searchio = new Bio::SearchIO ('-format' => 'blast', '-file' =>
"$blastFile");

##then I extract each individual report as
while (my $result = $searchio->next_result){
  my $writertxt = Bio::SearchIO::Writer::TextResultWriter->new();
  my $blastresult=$writertxt->to_string($result);
  while (my $hit = $result->next_hit){
    print $hit->name."\n"; #I will do further processing of the HSPs
  }
}

But I do not get any output.  It works if I comment out the lines referencing
$writertxt.  The problem is that I need to extract the whole report so that I
can later insert it ($blastresult) into a database. But I also need to process
each hit and HSP.

Any idea how I can accomplish this?  Any help will be greatly appreciated.
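
One plausible explanation, not confirmed anywhere in this thread, is that
to_string() walks the result's hits itself and leaves the hit iterator at the
end, so the next_hit loop that follows finds nothing. The sketch below uses a
hypothetical ToyResult class (core Perl only, not BioPerl code) to show that
pattern and how resetting the iterator restores the hits. I believe
Bio::Search::Result::GenericResult offers a rewind method for this purpose,
but treat that as an assumption to check against your installed version.

```perl
use strict;
use warnings;

# ToyResult is a hypothetical stand-in for a search result object:
# it hands out hits one at a time and remembers its position.
package ToyResult;

sub new {
    my ($class, @hits) = @_;
    return bless { hits => [@hits], index => 0 }, $class;
}

sub next_hit {
    my ($self) = @_;
    return if $self->{index} >= @{ $self->{hits} };
    return $self->{hits}[ $self->{index}++ ];
}

sub rewind {
    my ($self) = @_;
    $self->{index} = 0;
    return;
}

package main;

# Stand-in for a writer's to_string(): it walks every hit to build the text.
sub report_as_string {
    my ($result) = @_;
    my @lines;
    while (defined(my $hit = $result->next_hit)) {
        push @lines, "hit: $hit";
    }
    return join("\n", @lines) . "\n";
}

my $result = ToyResult->new(qw(seqA seqB seqC));
my $report = report_as_string($result);    # consumes the iterator

# Looping again without a reset sees nothing.
my @without_rewind;
while (defined(my $hit = $result->next_hit)) { push @without_rewind, $hit }

# After a reset the hits are available again.
$result->rewind;
my @after_rewind;
while (defined(my $hit = $result->next_hit)) { push @after_rewind, $hit }

printf "without rewind: %d hits, after rewind: %d hits\n",
    scalar @without_rewind, scalar @after_rewind;
```

If this is indeed the cause, the analogous change in the script above would be
a single reset call on $result between to_string() and the hit loop.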

Have a nice day,

Diego
-- 

Diego Mauricio Riano Pachon
Biologist
Institute of Biology and Biochemistry
University of Potsdam
Karl-Liebknecht-Str. 24-25
Haus 20
14476 Golm
Germany
Tel:0331/977-2809
http://www.geocities.com/dmrp.geo/


From sac at bioperl.org  Tue Mar 11 15:04:48 2008
From: sac at bioperl.org (Steve Chervitz)
Date: Tue, 11 Mar 2008 12:04:48 -0700
Subject: [Bioperl-l] BioSQL V1.0.0 released
In-Reply-To: <7558F8C6-FE40-4BAE-BA6A-D5039B10F350@uiuc.edu>
References: <200803071309.25294.heikki@sanbi.ac.za>
	<7558F8C6-FE40-4BAE-BA6A-D5039B10F350@uiuc.edu>
Message-ID: <8f200b4c0803111204m2ca45782w34baa0499d690cb5@mail.gmail.com>

Ditto. Thanks for biting the bullet, Hilmar.

BTW, I put together a little compendium/review of various bioinformatics
data models a few months ago, where I mention BioSQL among others, but I
never really announced it:

http://biodatamodel.org/

I thought about wikifying it to get the community involved in maintaining
it, but haven't gotten around to it yet.

Feedback is welcome.

Cheers,
Steve

On Fri, Mar 7, 2008 at 6:22 AM, Chris Fields <cjfields at uiuc.edu> wrote:

> Same here.  Great news!
>
> chris
>
> On Mar 7, 2008, at 5:09 AM, Heikki Lehvaslaiho wrote:
>
> > BIOSQL V1.0.0 RELEASED
> > http://news.open-bio.org/archives/2008_03.html#000094
> >
> >
> > Congratulations, Hilmar!
> >
> >               -Heikki
> >
> > --
> > ______ _/      _/_____________________________________________________
> >      _/      _/
> >     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
> >    _/_/_/_/_/  Senior Scientist    skype: heikki_lehvaslaiho
> >   _/  _/  _/  SANBI, South African National Bioinformatics Institute
> >  _/  _/  _/  University of Western Cape, South Africa
> >     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
> > ___ _/_/_/_/_/________________________________________________________
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>


From cain.cshl at gmail.com  Tue Mar 11 15:29:41 2008
From: cain.cshl at gmail.com (Scott Cain)
Date: Tue, 11 Mar 2008 15:29:41 -0400
Subject: [Bioperl-l] Nightly build archives now available
In-Reply-To: <47D321A0.9010209@sendu.me.uk>
References: <BAB6978A-D50E-4C4A-9FC9-EAAD557B935F@uiuc.edu>
	<1204921049.6467.9.camel@frissell>
	<BCE20903-01C1-4AC3-B8F6-89AC68F6803E@uiuc.edu>
	<1204928753.6467.19.camel@frissell>  <47D321A0.9010209@sendu.me.uk>
Message-ID: <1205263781.6220.37.camel@frissell>

Hi Sendu,

Sorry about that; I diffed the version I had with what was in svn, but
apparently didn't look closely at the results.  Do you happen to know
the best way of reverting with svn?

After it gets reverted one way or the other, I agree that overriding
prompt to include another argument is a good way to go.  That way the
value of $accept can be passed to it and it just does the right thing,
regardless of when/where it is getting called.  I'll do that.

Scott

On Sat, 2008-03-08 at 23:30 +0000, Sendu Bala wrote:
> Scott Cain wrote:
> > OK, I added my 'accept the defaults' option.  Use it like this:
> > 
> >   perl Build.PL --accept 1
> 
> Thanks for that Scott, but can you revert and have another go at that 
> commit, because you ended up wiping out the recent commits by Chris and 
> myself.
> 
> Also, rather than individually alter the Bioperl-specific methods like 
> choose_scripts(), is there perhaps a cleaner way to catch every prompt, 
> perhaps by overriding prompt() itself? Other questions may get added in 
> the future, and some existing questions aren't immediately obvious, so 
> it would be nice to be sure an automated process like a cron job /never/ 
> gets asked a question.
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   cain.cshl at gmail.com
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory


From hlapp at gmx.net  Tue Mar 11 17:34:30 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 11 Mar 2008 17:34:30 -0400
Subject: [Bioperl-l] time-consuming bp_load_seqdatabase.pl
In-Reply-To: <web-15341068@mail-be-01.mail01.ish.de>
References: <web-15341068@mail-be-01.mail01.ish.de>
Message-ID: <B4761ACF-F3D2-42A4-88D9-6C9790C4BAFB@gmx.net>

It won't be fast, as it will create roughly 6 million bioentries in your  
database. However, running since Friday sounds like it is on the high end.

The first step I recommend doing when running into this kind of  
situation is checking the CPU load that the script generates,  
compared to the load generated by the database server. If the  
script's CPU load is significantly less than ~10% then it is likely  
that your database is too slow.

There are various possible reasons why it may be too slow, ranging  
from limited resources, to grossly suboptimal configuration. If your  
database is running on the same 15GB server then resources should not  
be an issue (assuming that you don't have a totally antiquated CPU  
there). You might still want to check the PostgreSQL config file,  
though. What I would suspect though is that you didn't VACUUM the  
database before and/or during the load. That will make the indexes  
used for lookup increasingly slow as a large amount of data accumulates.
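
As a rough sketch of the VACUUM step, the helper below issues VACUUM ANALYZE
through a DBI handle from a separate session. The helper name vacuum_analyze
is made up for illustration, and the connection values in the comment are the
placeholders from the original post; DBI and DBD::Pg are assumed to be
installed for the commented usage.

```perl
use strict;
use warnings;

# Run VACUUM ANALYZE through an already connected DBI handle.
# PostgreSQL refuses to run VACUUM inside a transaction block, so the
# handle must have AutoCommit switched on.
sub vacuum_analyze {
    my ($dbh) = @_;
    die "VACUUM needs AutoCommit => 1\n" unless $dbh->{AutoCommit};
    $dbh->do('VACUUM ANALYZE');
    return 1;
}

# Intended use (needs DBI and DBD::Pg; host, port, user and password are
# the placeholders from the original post):
#
#   use DBI;
#   my $dbh = DBI->connect('dbi:Pg:dbname=bioseqdb;host=foo;port=5435',
#                          'foo', 'bar', { AutoCommit => 1, RaiseError => 1 });
#   vacuum_analyze($dbh);
#   $dbh->disconnect;
```

Running this periodically from a second connection while the load continues
is one way to follow the "before and/or during the load" advice; how often is
worthwhile depends on the installation.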

Does this ring a bell?

	-hilmar

On Mar 11, 2008, at 7:08 AM, stephan.rosecker wrote:

> Dear list,
>
> I have started the "bp_load_seqdatabase.pl" script from the  
> "bioperl-db-1.5.2_100" package with the unigene
> "Hs.data". It runs on a 7 processot machine with 15GB ram. The DBMS  
> is postgres on a similar machine.
> BioSQL core schema is v1.0.0..
>
> The job runs since friday.
>
> ./bp_load_seqdatabase.pl --host foo --port 5435 --dbname bioseqdb -- 
> dbuser foo --dbpass bar --driver Pg --format ClusterIO::unigene ../ 
> ncbi/Hs.data
>
> Is it normal that it takes so long?
> What are your experiences?
>
> best regards
> stephan

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From vuhlhorn at ramapo.edu  Wed Mar 12 14:51:50 2008
From: vuhlhorn at ramapo.edu (Victoria Lyn Uhlhorn)
Date: Wed, 12 Mar 2008 14:51:50 -0400 (EDT)
Subject: [Bioperl-l] HOWTO:Trees module
Message-ID: <20080312145150.ASS93757@msg-1.mail.ramapo.edu>

The following script:

#!/usr/bin/perl -w

use CGI ':standard';
use Bio::Perl;
use Bio::Align::ProteinStatistics;
use Bio::Tree::DistanceFactory;
use Bio::TreeIO;

print header;

print start_html(-bgcolor=>"pink", -title=>('Phylogenetic Tree'), -style=>{-src=>$style}, -class=>Ltitle), p(), 'Tree';

print start_form, hr;

my $alnio= Bio::AlignIO->new(-file => '/Users/glitterchix4u/Sites/CGI-bin/HepatitisSerineProt.clustalw',
                             -format => 'clustalw');
my $profactory = Bio::Tree::DistanceFactory->new(-method => 'NJ');
my $stats = Bio::Align::ProteinStatistics->new;
my $treeout = Bio::TreeIO->new(-format => 'newick');

my $tree;
while(my $aln = $alnio->next_aln)
{
	my $mat = $stats->distance(-method => 'Kimura',
	                           -align => $aln);
	$tree = $profactory->make_tree($mat);
	$treeout->write_tree($tree);
}

#$treeout->print_tree($tree);
print "Tree is: ", $tree->size;

print end_form;
print end_html;

How do I print the tree?  I'm having a hard time printing the tree out.


From bix at sendu.me.uk  Wed Mar 12 19:20:01 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 12 Mar 2008 23:20:01 +0000
Subject: [Bioperl-l] HOWTO:Trees module
In-Reply-To: <20080312145150.ASS93757@msg-1.mail.ramapo.edu>
References: <20080312145150.ASS93757@msg-1.mail.ramapo.edu>
Message-ID: <47D86521.1010705@sendu.me.uk>

Victoria Lyn Uhlhorn wrote:
> my $treeout = Bio::TreeIO->new(-format => 'newick');

> 	$treeout->write_tree($tree);

> How do I print the tree?  I'm having a hard time printing the tree out.

Your TreeIO will write its trees to the file or filehandle you give it. 
But you haven't given it one. Give it one and write_tree() will then 
cause the tree to be 'printed' there.

If you want the trees stored in a string so you can print() them, there 
are ways to open a filehandle onto a scalar variable.
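
A minimal sketch of that second option, using only core Perl. The in-memory
filehandle is standard Perl; the Newick text and the plain print are stand-ins
(assumptions for illustration) for what write_tree() would emit once the
handle is passed to Bio::TreeIO->new as -fh.

```perl
use strict;
use warnings;

# Attach a write filehandle to a scalar; everything printed to $fh
# ends up in $tree_string instead of in a file.
my $tree_string = '';
open(my $fh, '>', \$tree_string) or die "Cannot open in-memory filehandle: $!";

# Stand-in for:
#   my $treeout = Bio::TreeIO->new(-fh => $fh, -format => 'newick');
#   $treeout->write_tree($tree);
# The Newick text below is invented purely for illustration.
print {$fh} "((A:0.1,B:0.2):0.05,C:0.3);\n";
close($fh) or die "Cannot close in-memory filehandle: $!";

print "Tree is: $tree_string";
```

In the CGI script above, the string could then be printed anywhere between
the header and end_html calls.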


From hlapp at gmx.net  Thu Mar 13 18:51:13 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 13 Mar 2008 18:51:13 -0400
Subject: [Bioperl-l] [BioSQL-l] Loading sequences with novel NCBI taxon
	id
In-Reply-To: <320fb6e00803130806w46148bacm54c3ead9a50b038f@mail.gmail.com>
References: <320fb6e00803130806w46148bacm54c3ead9a50b038f@mail.gmail.com>
Message-ID: <32EB5B0C-4CC8-4C33-9F41-5D4465B6AC48@gmx.net>

(this is more of a bioperl question than a biosql one)

The load_ncbi_taxonomy.pl script is designed to update the taxon  
tables in a non-disruptive way, and if there weren't many changes  
shouldn't actually take that long (except that recalculating the  
nested set values may take a couple of minutes).

Bioperl-db will store the taxon information it finds in the  
Bio::Species object if it can't locate the taxon by lookup, and will  
not raise an error. The problem with this is that it relies on the  
Bio::SeqIO parser to have gotten the species and lineage information  
correct, which is sometimes a wrong assumption for exotic species.  
Most often the error will not manifest itself at the time of storing  
the erroneously parsed information, but when it is re-retrieved and  
used to populate a Bio::Species object.

For the SymAtlas project we had this situation (new species in  
sequence updates that the last NCBI taxonomy update hadn't yet  
brought in) quite regularly. I wrote a SQL script that would fix those  
'haphazard' additions such that load_ncbi_taxonomy would update them  
to their correct values come the next NCBI taxonomy update. I can  
send you the script (it would be for the Oracle version), but I'm not  
sure this is a widely viable strategy.

	-hilmar

On Mar 13, 2008, at 11:06 AM, Peter wrote:

> Dear list,
>
> One of the unresolved issues with Biopython's BioSQL interface is
> dealing with the NCBI taxon ID when loading sequences into the
> database.
>
> As I understand it, ideally before loading any sequences, the user
> will have loaded in the entire NCBI taxonomy using the
> load_ncbi_taxonomy.pl script, as I described here:
> http://biopython.org/wiki/BioSQL#NCBI_Taxonomy
>
> When a new sequence is added to the database with a known taxon id,
> there is no problem.  But what happens if it's a recently sequenced
> organism which isn't defined yet in the BioSQL taxonomy tables?
> Could/should the user re-run load_ncbi_taxonomy.pl, and then load in
> their new sequence?
>
> Right now in Biopython, due to what appears to have been intended as a
> short-term hack, we simply don't record the taxon id at all (!), and I
> would like to fix this (bug 2422).
> http://bugzilla.open-bio.org/show_bug.cgi?id=2422
>
> How do BioPerl et al deal with this issue?  Do they try and update the
> taxonomy tables using the available information in the new record's
> annotation (i.e. the new taxon id and the species name)?  Do they
> lookup the NCBI taxonomy definition via the internet?  Do they throw
> an error and halt?
>
> Thanks,
>
> Peter
> (Biopython)
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biosql-l

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From hlapp at gmx.net  Thu Mar 13 19:41:43 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 13 Mar 2008 19:41:43 -0400
Subject: [Bioperl-l] [BioSQL-l] Loading sequences with novel NCBI taxon
	id
In-Reply-To: <320fb6e00803131613o20eae2b7y325814ef26d2738f@mail.gmail.com>
References: <320fb6e00803130806w46148bacm54c3ead9a50b038f@mail.gmail.com>
	<32EB5B0C-4CC8-4C33-9F41-5D4465B6AC48@gmx.net>
	<320fb6e00803131613o20eae2b7y325814ef26d2738f@mail.gmail.com>
Message-ID: <CEA4F4E7-A66B-4C62-AE32-511E177BC485@gmx.net>


On Mar 13, 2008, at 7:13 PM, Peter wrote:

> On Thu, Mar 13, 2008 at 10:51 PM, Hilmar Lapp <hlapp at gmx.net> wrote:
>> [...]
>>  The load_ncbi_taxonomy.pl script is designed to update the taxon
>>  tables in a non-disruptive way, and if there weren't many changes
>>  shouldn't actually take that long (except that recalculating the
>>  nested set values may take a couple of minutes).
>
> Do you think when faced with a novel taxon id, Biopython/BioPerl/...
> could write some minimal taxonomy entry (without any guess work based
> on the species name), in order to record the sequence's taxon

This is what Bioperl-db does. There isn't any guesswork. If  
Bio::Species has lineage information it will also insert the lineage  
information, though.

> - and then running an improved load_ncbi_taxonomy.pl at a later  
> date would
> sort out the proper taxonomy?

If I remember correctly, the script makes (and hence expects) the  
primary key and the NCBI taxonomy ID to be identical. If your loading  
procedure can achieve that already then load_ncbi_taxonomy.pl should  
pick them up and fix them. You can try that by loading the taxonomy  
through the script, then arbitrarily choose a taxon, create a stub  
bioentry for it and set its taxon_id foreign key to the chosen  
taxon,  change its taxon_name.name to some bogus value (for the  
'scientific name' class, for example) (and feel free to change the  
left_id and right_id values in taxon too), and rerun the script. It  
should fix the change you made, and your bioentry should still point  
to the same taxon (because its primary key did not change, and did  
not get deleted either; otherwise the bioentry would now have a null  
value in the foreign key).

The Bioperl-db way of storing things does not give control over  
primary key assignment to Bioperl-db, so the database will assign it.

> [...]
>>  For the SymAtlas project we had this situation (new species in
>>  sequence updates that the last NCBI taxonomy update hadn't yet
>>  brought in) quite regularly. I wrote a SQL script that would fix those
>>  'haphazard' additions such that load_ncbi_taxonomy would update them
>>  to their correct values come the next NCBI taxonomy update. I can
>>  send you the script (it would be for the Oracle version), but I'm  
>> not
>>  sure this is a widely viable strategy.
>
> So this wasn't integrated with load_ncbi_taxonomy.pl at all?

No, but now that you say it I don't see any reason why I couldn't.  
Maybe that's just what I should do.

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From biopython at maubp.freeserve.co.uk  Thu Mar 13 19:13:32 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 13 Mar 2008 23:13:32 +0000
Subject: [Bioperl-l] [BioSQL-l] Loading sequences with novel NCBI taxon
	id
In-Reply-To: <32EB5B0C-4CC8-4C33-9F41-5D4465B6AC48@gmx.net>
References: <320fb6e00803130806w46148bacm54c3ead9a50b038f@mail.gmail.com>
	<32EB5B0C-4CC8-4C33-9F41-5D4465B6AC48@gmx.net>
Message-ID: <320fb6e00803131613o20eae2b7y325814ef26d2738f@mail.gmail.com>

On Thu, Mar 13, 2008 at 10:51 PM, Hilmar Lapp <hlapp at gmx.net> wrote:
> (this is more of a bioperl question than a biosql one)

Well, yes and no.  And I'm not subscribed to the Bioperl list, nor the
BioJava one, nor the BioRuby one.

>  The load_ncbi_taxonomy.pl script is designed to update the taxon
>  tables in a non-disruptive way, and if there weren't many changes
>  shouldn't actually take that long (except that recalculating the
>  nested set values may take a couple of minutes).

Do you think when faced with a novel taxon id, Biopython/BioPerl/...
could write some minimal taxonomy entry (without any guess work based
on the species name), in order to record the sequence's taxon - and
then running an improved load_ncbi_taxonomy.pl at a later date would
sort out the proper taxonomy?

>  Bioperl-db will store the taxon information it finds in the
>  Bio::Species object if it can't locate the taxon by lookup, and will
>  not raise an error. The problem with this is that it relies on the
>  Bio::SeqIO parser to have gotten the species and lineage information
>  correct, which is sometimes a wrong assumption for exotic species.
>  Most often the error will not manifest itself at the time of storing
>  the erroneously parsed information, but when it is re-retrieved and
>  used to populate a Bio::Species object.

This is what I would like to avoid with Biopython.

>  For the SymAtlas project we had this situation (new species in
>  sequence updates that the last NCBI taxonomy update hadn't yet
>  brought in) quite regularly. I wrote a SQL script that would fix those
>  'haphazard' additions such that load_ncbi_taxonomy would update them
>  to their correct values come the next NCBI taxonomy update. I can
>  send you the script (it would be for the Oracle version), but I'm not
>  sure this is a widely viable strategy.

So this wasn't integrated with load_ncbi_taxonomy.pl at all?

Peter


From hlapp at gmx.net  Fri Mar 14 00:00:40 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Fri, 14 Mar 2008 00:00:40 -0400
Subject: [Bioperl-l] bioperl basics
In-Reply-To: <20080313.195825.6855.0@webmail20.vgs.untd.com>
References: <20080313.195825.6855.0@webmail20.vgs.untd.com>
Message-ID: <0A104B1F-315F-418C-A6DA-84FF04CC438C@gmx.net>

John - you want to send this to the BioPerl mailing list at
bioperl-l at lists.open-bio.org.

Your problem really is a Perl problem though, not a BioPerl one. The most  
likely cause is that you don't have the Cache::FileCache module  
installed, so installing it is the first thing I would do.

To change @INC, use -I on the command line, 'use lib' in your script,  
or set the PERL5LIB environment variable.
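
As a small illustration of those three options, here is a sketch built around
'use lib'. The directory name is hypothetical; substitute the directory that
contains the Cache/ subdirectory holding FileCache.pm.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical directory: replace it with the parent directory of
# Cache/FileCache.pm on your system.
use lib '/home/john/perl_modules/lib';

# The equivalent alternatives, run from the shell instead of the script:
#   perl -I/home/john/perl_modules/lib myscript.pl
#   export PERL5LIB=/home/john/perl_modules/lib

# 'use lib' prepends to @INC at compile time, so the new directory
# is searched before the system directories.
print "Perl will search:\n";
print "  $_\n" for @INC;
```

Printing @INC this way is also a quick check that the directory really ended
up in the search path.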

	-hilmar

On Mar 14, 2008, at 1:58 AM, mrphysh at juno.com wrote:

> I am a molecular biologist studying bioinformatics from a Perl  
> background and making progress.  I am realizing that without  
> tapping into the existing infrastructure, I will be writing code  
> for ever.  Bioperl is the path for me.  I am moving forward.
>
> the error I encounter is
>
> can't locate Cache/FileCache in @INC (@INC contains /etc/perl/
> /usr/local/lib/perl/5.8.8 .....)    and so forth.
>
> I found the files in a home directory.  I must have told the  
> install to put them there...?
>
>
> anyway:  How do I edit this environmental variable..... @INC.  I  
> cannot find anything in my book.
>
> thanks
> john brigham
>
>
> I will be writing code for years and need to tap into the
>
>
>
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biosql-l

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From markjschreiber at gmail.com  Fri Mar 14 09:48:38 2008
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Fri, 14 Mar 2008 21:48:38 +0800
Subject: [Bioperl-l] [BioSQL-l] Loading sequences with novel NCBI taxon
	id
In-Reply-To: <CEA4F4E7-A66B-4C62-AE32-511E177BC485@gmx.net>
References: <320fb6e00803130806w46148bacm54c3ead9a50b038f@mail.gmail.com>
	<32EB5B0C-4CC8-4C33-9F41-5D4465B6AC48@gmx.net>
	<320fb6e00803131613o20eae2b7y325814ef26d2738f@mail.gmail.com>
	<CEA4F4E7-A66B-4C62-AE32-511E177BC485@gmx.net>
Message-ID: <93b45ca50803140648s5098a7d0sec621f448ef03040@mail.gmail.com>

From memory, BioJava will add it if it is not already in there. If the
taxid can be found then the system connects you with whatever is in
that taxid; it doesn't overwrite it.

This has two curious side effects. Because the details associated with
a taxid sometimes change (eg common name changes a lot) you can get
connected to an outdated version (if your record is newer than your
NCBI taxonomy) or you can get connected with a version that is newer
than your record which means when you round-trip you don't get
complete identity.

For compatibility across the projects some kind of consensus would be good.

- Mark

On Fri, Mar 14, 2008 at 7:41 AM, Hilmar Lapp <hlapp at gmx.net> wrote:
>
>
> On Mar 13, 2008, at 7:13 PM, Peter wrote:
>
> > On Thu, Mar 13, 2008 at 10:51 PM, Hilmar Lapp <hlapp at gmx.net> wrote:
> >> [...]
>
> >>  The load_ncbi_taxonomy.pl script is designed to update the taxon
> >>  tables in a non-disruptive way, and if there weren't many changes
> >>  shouldn't actually take that long (except that recalculating the
> >>  nested set values may take a couple of minutes).
> >
> > Do you think when faced with a novel taxon id, Biopython/BioPerl/...
> > could write some minimal taxonomy entry (without any guess work based
> > on the species name), in order to record the sequence's taxon
>
> This is what Bioperl-db does. There isn't any guesswork. If
> Bio::Species has lineage information it will also insert the lineage
> information, though.
>
>
> > - and then running an improved load_ncbi_taxonomy.pl at a later
> > date would
> > sort out the proper taxonomy?
>
> If I remember correctly, the script makes (and hence expects) the
> primary key and the NCBI taxonomy ID to be identical. If your loading
> procedure can achieve that already then load_ncbi_taxonomy.pl should
> pick them up and fix them. You can try that by loading the taxonomy
> through the script, then arbitrarily choose a taxon, create a stub
> bioentry for it and set its taxon_id foreign key to the chosen
> taxon,  change its taxon_name.name to some bogus value (for the
> 'scientific name' class, for example) (and feel free to change the
> left_id and right_id values in taxon too), and rerun the script. It
> should fix the change you made, and your bioentry should still point
> to the same taxon (because its primary key did not change, and did
> not get deleted either; otherwise the bioentry would now have a null
> value in the foreign key).
>
> The Bioperl-db way of storing things does not give control over
> primary key assignment to Bioperl-db, so the database will assign it.
>
> > [...]
>
> >>  For the SymAtlas project we had this situation (new species in
> >>  sequence updates that the last NCBI taxonomy update hadn't yet
> >>  brought in) quite regularly. I wrote a SQL script that would fix
> >>  those 'haphazard' additions such that load_ncbi_taxonomy would
> >>  update them to their correct values come the next NCBI taxonomy
> >>  update. I can send you the script (it would be for the Oracle
> >>  version), but I'm not sure this is a widely viable strategy.
> >
> > So this wasn't integrated with load_ncbi_taxonomy.pl at all?
>
> No, but now that you say it I don't see any reason why I couldn't.
> Maybe that's just what I should do.
>
>        -hilmar
>
> --
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================
>


From cjfields at uiuc.edu  Fri Mar 14 10:31:09 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 14 Mar 2008 09:31:09 -0500
Subject: [Bioperl-l] [BioSQL-l] Loading sequences with novel NCBI taxon
	id
In-Reply-To: <93b45ca50803140648s5098a7d0sec621f448ef03040@mail.gmail.com>
References: <320fb6e00803130806w46148bacm54c3ead9a50b038f@mail.gmail.com>
	<32EB5B0C-4CC8-4C33-9F41-5D4465B6AC48@gmx.net>
	<320fb6e00803131613o20eae2b7y325814ef26d2738f@mail.gmail.com>
	<CEA4F4E7-A66B-4C62-AE32-511E177BC485@gmx.net>
	<93b45ca50803140648s5098a7d0sec621f448ef03040@mail.gmail.com>
Message-ID: <CE3675B2-2AFD-46AA-A348-16C9FEA51E0E@uiuc.edu>

The counter to that perspective (using new sequences with old tax  
info) would be to regularly update NCBI taxonomy, particularly in  
circumstances prior to adding new sequences.  Hilmar mentioned that  
once tax is loaded it doesn't take as long to update, so you could set  
up a cron job to update regularly.

I remember someone mentioning weekly or monthly updates on the list  
quite a while ago, but I'm unsure how often NCBI updates tax  
information (i.e. with every release, monthly, weekly, etc).  I can  
see instances popping up where you used an up-to-date taxonomy but
a new sequence contains a tax ID not present.  I think bioperl-db  
handles these but I'm not sure what other Bio* do.

chris

On Mar 14, 2008, at 8:48 AM, Mark Schreiber wrote:

> From memory BioJava will add it if it is not already in there. If the
> taxid can be found then the system connects you with whatever is in
> that taxid, it doesn't overwrite it.
>
> This has two curious side effects. Because the details associated with
> a taxid sometimes change (eg common name changes a lot) you can get
> connected to an outdated version (if your record is newer than your
> NCBI taxonomy) or you can get connected with a version that is newer
> than your record which means when you round-trip you don't get
> complete identity.
>
> For compatibility across the projects some kind of consensus would  
> be good.
>
> - Mark

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From markjschreiber at gmail.com  Fri Mar 14 20:56:37 2008
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Sat, 15 Mar 2008 08:56:37 +0800
Subject: [Bioperl-l] [BioSQL-l] Loading sequences with novel NCBI taxon
	id
In-Reply-To: <CE3675B2-2AFD-46AA-A348-16C9FEA51E0E@uiuc.edu>
References: <320fb6e00803130806w46148bacm54c3ead9a50b038f@mail.gmail.com>
	<32EB5B0C-4CC8-4C33-9F41-5D4465B6AC48@gmx.net>
	<320fb6e00803131613o20eae2b7y325814ef26d2738f@mail.gmail.com>
	<CEA4F4E7-A66B-4C62-AE32-511E177BC485@gmx.net>
	<93b45ca50803140648s5098a7d0sec621f448ef03040@mail.gmail.com>
	<CE3675B2-2AFD-46AA-A348-16C9FEA51E0E@uiuc.edu>
Message-ID: <93b45ca50803141756m3d7f022cnb57bd39f37270682@mail.gmail.com>

I agree. A regular update would be best.

Of course if your BioSQL db is limited to one or a few organisms you can
just keep a fragment of the db.

- Mark

On Fri, Mar 14, 2008 at 10:31 PM, Chris Fields <cjfields at uiuc.edu> wrote:

> The counter to that perspective (using new sequences with old tax
> info) would be to regularly update NCBI taxonomy, particularly in
> circumstances prior to adding new sequences.  Hilmar mentioned that
> once tax is loaded it doesn't take as long to update, so you could set
> up a cron job to update regularly.
>
> I remember someone mentioning weekly or monthly updates on the list
> quite a while ago, but I'm unsure how often NCBI updates tax
> information (i.e. with every release, monthly, weekly, etc).  I can
> see instances popping up where you used an up-to-date taxonomy but
> a new sequence contains a tax ID not present.  I think bioperl-db
> handles these but I'm not sure what other Bio* do.
>
> chris


From charles-listes+bioperl at plessy.org  Mon Mar 17 00:13:11 2008
From: charles-listes+bioperl at plessy.org (Charles Plessy)
Date: Mon, 17 Mar 2008 13:13:11 +0900
Subject: [Bioperl-l] Are all recommended modules equally important ?
Message-ID: <20080317041311.GA3784@kunpuu.plessy.org>

Dear Bioperl developers,

In the Debian Project, we distribute packages for Bioperl and need to
express their dependency on other Perl modules with "Depends",
"Recommends" and "Suggests" levels. For the moment, everything that is
listed in the "recommends" hash of Build.PL is "Recommended" by our
Debian package. This means that they will be installed by default when
installing Bioperl, but that users can force their removal if needed.

Being "Recommended" also means in Debian that if the recommended module
is not available, then the Debian bioperl package will not reach our
internal quality criteria for being part of our stable release.
Therefore I would like to know if you think that some of the modules
recommended by Bioperl through the "recommends" hash of Build.PL are less
important than others, i.e. that we can just "Suggest" them in our
dependency system. "Suggested" packages are not installed by default.

The complete definition of the meaning of "Depends", "Recommends" and
"Suggests" for Debian packages can be found in section 7.2 of the
Debian policy:

http://www.debian.org/doc/debian-policy/ch-relationships.html

Debian distributes versions 1.4 and 1.5.2 of Bioperl, but considers
using 1.5.2 in its next stable release. We welcome your comments on this
as well.

The Debian package for Bioperl 1.4: http://packages.debian.org/lenny/bioperl
and for Bioperl 1.5.2: http://packages.debian.org/sid/bioperl

(A copy of this email has been sent to the mailing list of the
Debian-Med project).

Have a nice day,

-- 
Charles Plessy
http://charles.plessy.org
Wakō, Saitama, Japan


From David.Messina at sbc.su.se  Mon Mar 17 11:38:28 2008
From: David.Messina at sbc.su.se (Dave Messina)
Date: Mon, 17 Mar 2008 16:38:28 +0100
Subject: [Bioperl-l] Are all recommended modules equally important ?
In-Reply-To: <20080317041311.GA3784@kunpuu.plessy.org>
References: <20080317041311.GA3784@kunpuu.plessy.org>
Message-ID: <628aabb70803170838i53a49aacuf507880d518326d@mail.gmail.com>

Hi Charles,

Thanks for your note.

All of the BioPerl 'recommended' modules involve optional functionality, so
I would think all of them would map to 'suggested' under Debian so they
won't be installed by default.

For everyone else, this is the list of recommended modules he's talking
about:
Ace
Class::AutoClass
Clone
Convert::Binary::C
Data::Stag::XMLWriter
GD
GD::SVG
Graph
HTML::Entities
HTML::Parser
HTTP::Request::Common
LWP::UserAgent
PostScript::TextBlock
Set::Scalar
SOAP::Lite
Spreadsheet::ParseExcel
Storable
SVG
SVG::Graph
Text::Shellwords
URI::Escape
XML::DOM::XPath
XML::Parser
XML::Parser::PerlSAX
XML::SAX
XML::SAX::Writer
XML::Twig
XML::Writer


> Debian distributes versions 1.4 and 1.5.2 of Bioperl, but considers
> using 1.5.2 in its next stable release. We welcome your comments on this
> as well.
>

I think the consensus here would be that 1.5.2 is the appropriate version of
Bioperl to use in the next stable release of Debian. Although we've started
to work toward Bioperl 1.6, that release will be at least a few months off,
and 1.4, while technically our most recent 'stable' release, is waaay out of
date.


Dave


From mrphysh at juno.com  Mon Mar 17 18:27:21 2008
From: mrphysh at juno.com (mrphysh at juno.com)
Date: Mon, 17 Mar 2008 22:27:21 GMT
Subject: [Bioperl-l] bioperl email list
Message-ID: <20080317.162721.27257.1@webmail19.vgs.untd.com>

Hello bioperl people.

I am a Perl programmer/molecular biologist/nice guy.  I am wandering around within the bioinformatics arena and making progress.

I am realizing that I will be writing code forever unless I can tap into the existing infrastructure.  For me that appears to be bioperl.

I would like to be part of the bioperl community.  I subscribed to the bioperl list and got a return email, but have never received an email.  Did I do something wrong?  Could you look into this please?

The truth is:  these objects are blowing me away and I need help.

John S. Brigham
13810 Braun Drive
Golden, Colorado 80401
303-216-0994
mrphysh at juno.com


From hlapp at gmx.net  Mon Mar 17 23:44:11 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 17 Mar 2008 23:44:11 -0400
Subject: [Bioperl-l] Are all recommended modules equally important ?
In-Reply-To: <628aabb70803170838i53a49aacuf507880d518326d@mail.gmail.com>
References: <20080317041311.GA3784@kunpuu.plessy.org>
	<628aabb70803170838i53a49aacuf507880d518326d@mail.gmail.com>
Message-ID: <F7A93439-501D-420A-BD68-442E3587CB71@gmx.net>

On Mar 17, 2008, at 11:38 AM, Dave Messina wrote:

> Hi Charles,
>
> Thanks for your note.
>
> All of the BioPerl 'recommended' modules involve optional
> functionality, so I would think all of them would map to 'suggested'
> under Debian so they won't be installed by default.

I would probably elevate LWP to 'recommended.' Other than that I agree.

	-hilmar


-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From bix at sendu.me.uk  Tue Mar 18 05:29:10 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 18 Mar 2008 09:29:10 +0000
Subject: [Bioperl-l] Are all recommended modules equally important ?
In-Reply-To: <F7A93439-501D-420A-BD68-442E3587CB71@gmx.net>
References: <20080317041311.GA3784@kunpuu.plessy.org>	<628aabb70803170838i53a49aacuf507880d518326d@mail.gmail.com>
	<F7A93439-501D-420A-BD68-442E3587CB71@gmx.net>
Message-ID: <47DF8B66.8020509@sendu.me.uk>

Hilmar Lapp wrote:
> On Mar 17, 2008, at 11:38 AM, Dave Messina wrote:
> 
>> Hi Charles,
>>
>> Thanks for your note.
>>
>> All of the BioPerl 'recommended' modules involve optional 
>> functionality, so
>> I would think all of them would map to 'suggested' under Debian so they
>> won't be installed by default.
> 
> I would probably elevate LWP to 'recommended.' Other than that I agree.

I looked at the most used external modules. Used 6 times or more:

   Data::Dumper => used 55 times
   Carp => used 51 times
   IO::String => used 25 times
   Symbol => used 19 times
   File::Spec => used 17 times
   HTTP::Request::Common => used 17 times
   POSIX => used 12 times
   DB_File => used 11 times
   Fcntl => used 11 times
   IO::File => used 11 times
   Exporter => used 10 times
   File::Temp => used 9 times
   Dumpvalue => used 8 times
   LWP::UserAgent => used 8 times
   Scalar::Util => used 8 times
   URI::Escape => used 8 times
   File::Basename => used 6 times
   File::Path => used 6 times
   XML::Writer => used 6 times

I can never remember how to figure out which of those is included with 
perl 5.6.1.
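
One way to check this, sketched here on the assumption that the
Module::CoreList module (and its first_release method) is available on
your system: it records which modules shipped with each perl release.
The helper name core_since is made up for this example, and comparing
the first release against 5.6.1 is a simplification that ignores
modules later removed from core.

```perl
use strict;
use warnings;
use Module::CoreList;

# Hypothetical helper: return the first perl release that bundled
# $module, or undef if the module has never shipped with perl.
sub core_since {
    my ($module) = @_;
    return Module::CoreList->first_release($module);
}

for my $module (qw(Data::Dumper Carp IO::String File::Spec
                   HTTP::Request::Common LWP::UserAgent XML::Writer)) {
    my $first = core_since($module);
    if (!defined $first) {
        print "$module: not bundled with perl\n";
    }
    elsif ($first <= 5.006001) {
        print "$module: bundled by perl 5.6.1 (core since $first)\n";
    }
    else {
        print "$module: core only since $first\n";
    }
}
```

Anything reported as not bundled would be a candidate for an explicit
dependency rather than something a plain perl install can be assumed
to provide.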

Except maybe XML::Writer, if we do want to promote anything to 
recommended, I suppose it would be those above.

I also agree with everything Dave said; if it's easier everything can be 
'suggested'.


(I reckon most if not all of the Data::Dumper and Carp usages should be 
removed)


From David.Messina at sbc.su.se  Tue Mar 18 10:30:02 2008
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 18 Mar 2008 15:30:02 +0100
Subject: [Bioperl-l] bioperl email list
In-Reply-To: <20080317.162721.27257.1@webmail19.vgs.untd.com>
References: <20080317.162721.27257.1@webmail19.vgs.untd.com>
Message-ID: <628aabb70803180730g2aae7ae0u18a9e6a360c725fa@mail.gmail.com>

Hi John,

Welcome to BioPerl!


>  I subscribed to the bioperl list and got a return email, but have never
> received an email.


As I'm sure you know, most of the time when email doesn't show up, it's
because it's been filtered as spam. I'm assuming you've looked for this
already, though.

If that's not it, then I suggest trying to log in to the mailing list server
here:
http://bioperl.org/mailman/listinfo/bioperl-l

Log in by entering your email address (the one you subscribed with) in the
last field on that page and clicking the "Unsubscribe or edit options"
button.

On the next page that comes up, type your password in the first field on the
page.

If for some reason you haven't been subscribed to the list properly, then
you will get an error here. Otherwise, you will be taken to your membership
configuration page. There you can verify, among other options, that mail
delivery is enabled.


> The truth is:  these objects are blowing me away and I need help.


BioPerl does have a bit of a learning curve, but fortunately there are some
good tutorials that should help you to get started. If you haven't already,
visit the HOWTO section of bioperl.org. Check out the one on BioPerl for
beginners, and then you might follow up with the SeqIO and SearchIO HOWTOs
which cover how to read and write sequences and sequence alignment program
output.

Also, there's lots of great example code in the examples folder of the
BioPerl distribution. I find looking at how other people use BioPerl is very
helpful in understanding what objects are used for what.

Finally, I'll plug the BioPerl Deobfuscator, which is a class browser for
BioPerl and available at: http://bioperl.org/cgi-bin/deob_interface.cgi

BioPerl classes tend to have multiple levels of inheritance, and the
Deobfuscator lets you see all of the methods available to objects of a given
class.


Dave


From bix at sendu.me.uk  Tue Mar 18 11:32:25 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 18 Mar 2008 15:32:25 +0000
Subject: [Bioperl-l] Priorities for a bioperl-1.6 release
In-Reply-To: <OF9C92ED19.F51E3401-ON852573F0.006310A3-852573F0.006383D1@gsk.com>
References: <OF9C92ED19.F51E3401-ON852573F0.006310A3-852573F0.006383D1@gsk.com>
Message-ID: <47DFE089.1070304@sendu.me.uk>

aaron.j.mackey at gsk.com wrote:
>> Or is the split intended to be 'core' == "anything and everything
>> that was in 1.4", '????' == "everything else"? In which case,
>> what's a good name for "modules created after 1.4"? 'crust'? ;)
> 
> Nah, "icing".
> 
> a module "use" map might be very useful to help identify "core" vs.
> other layers of mantle/crust/icing.
> 
> http://www.perlmonks.org/?node_id=87329 
> http://search.cpan.org/src/NEILB/pmusage-1.2/

Thanks for those. Neither could quite cope with BioPerl, but I've munged
them together and hacked up 'module_usage.pl' which I've just committed
to the maintenance directory of bioperl-live.

module_usage.pl ../Bio

Produces:
  *warning, may crash your browser; download it and view in a dedicated
image viewer*
http://bix.sendu.me.uk/files/module_usage.jpeg
http://bix.sendu.me.uk/files/module_usage.txt

First I considered what modules each BioPerl package (aka class, module)
'uses' (what modules does it load via 'use', 'require' or inherit from
via 'use base', excluding external (non-BioPerl) modules), then grouped
together packages that have identical usage. The graph shows all the
groups with more than one member as nodes and edges from them pointing
to the individual packages that they use. The set of those individual
packages pointed to by groups also have edges showing their
use-relationship to other members of the set (only). Members of the set
are also shaded in red. The saturation of the shade indicates how many
packages use that package (so dark red packages are used a lot).
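
The grouping step described above can be sketched roughly as follows.
This is not the code in module_usage.pl, only a minimal illustration:
the input hash is invented sample data (Bio::Tools::Example is a
made-up package name), and the function names are hypothetical.

```perl
use strict;
use warnings;

# Illustrative input only: package name => BioPerl modules it loads.
my %uses = (
    'Bio::AlignIO::fasta'    => [ 'Bio::AlignIO', 'Bio::SimpleAlign' ],
    'Bio::AlignIO::clustalw' => [ 'Bio::SimpleAlign', 'Bio::AlignIO' ],
    'Bio::Tools::Example'    => [ 'Bio::Root::Root', 'Bio::SimpleAlign' ],
);

# Group packages whose sets of used modules are identical. The sorted,
# comma-joined module list serves as the group key.
sub group_by_usage {
    my ($uses) = @_;
    my %groups;
    for my $pkg (sort keys %$uses) {
        my $key = join ',', sort @{ $uses->{$pkg} };
        push @{ $groups{$key} }, $pkg;
    }
    return \%groups;
}

# Count how many packages use each module (the basis for the shading).
sub usage_counts {
    my ($uses) = @_;
    my %count;
    for my $pkg (keys %$uses) {
        $count{$_}++ for @{ $uses->{$pkg} };
    }
    return \%count;
}

my $groups = group_by_usage(\%uses);
my $counts = usage_counts(\%uses);

# Only groups with more than one member become nodes in the graph.
for my $key (sort keys %$groups) {
    my @members = @{ $groups->{$key} };
    next unless @members > 1;
    print "group [@members] uses: $key\n";
}
print "$_ is used by $counts->{$_} package(s)\n" for sort keys %$counts;
```

Keying each group on the sorted module list means two packages land in
the same group regardless of the order in which they load their
dependencies.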

(I had to simplify in this way because otherwise GraphViz bailed on me.
If anyone can come up with nicer simplification/visualisation systems,
please do! It's important to note that there is lots of information loss
in my scheme, so you can't rely on the graph alone.)

Getting to the question on how to decide what is 'core' and on what
basis to split things up, first consider the darker red packages. Next
consider how many groups point to it. Finally consider the membership of
those groups: are they all highly related, or are they from different
'parts' of BioPerl?

For example, Bio::Graphics::Glyph::generic is dark red and has 3 groups
pointing to it, but all the members of those groups are
Bio::Graphics::Glyph*. You could imagine that Bio::Graphics::Glyph (or
Bio::Graphics?) could be split off cleanly if desired and not kept in
core. Bio::SimpleAlign, on the other hand, whilst not being quite as
dark a red, has 7 attached groups with members from Bio::AlignIO,
Bio::Search and Bio::Tools. You could easily argue it is more
fundamental to BioPerl and should be in core. In turn, the things that
Bio::SimpleAlign points to would also have to be in core.

I haven't done any full analysis along these lines and leave it as an
exercise for the interested reader for now ;)


Chris Fields wrote:
> http://www.bioperl.org/wiki/Talk:Proposed_1.6_core_modules
> 
> I'm pretty flexible on any of that; it's a proposal only and I think
> some of it may be wrongheaded, but hey, I'm willing to take a few
> rotten tomatoes.  The key issue is we should try to work out what we
> mean by 'core' or the core library.  I have a rather extreme view of
> it as being the bare essentials without external, non-perl core
> dependencies (only SeqI/PrimarySeqI, AlignI, AnnotationI, SeqFeatureI
> and required modules for those classes) but I'm sure others would
> lump in parsers, DB functionality, etc.  I basically suggest placing
> those (and any stable but potentially non-core code) in a
> 'bioperl-main', with any unstable or untested code going into a
> 'bioperl-unstable'.

My thoughts are along these lines:
# I agree that core should have no external dependencies
# I agree that it might mostly be interfaces
# It should represent a framework with all the interfaces (that have
   stable APIs), directory structure and base classes that everything
   else relies on
# It might not do much useful bioinformatics, but provides just about
   everything needed for a dev to create a new module that does


> In essence, bioperl-main would require core and resemble a stable
> release; bioperl-unstable would require bioperl-main (and core) and
> resemble a dev release.  Not sure how versioning would go or if this
> is a viable option at all, but it's worth discussing.

# I agree that this 3-way split seems reasonable
# bioperl-main would consist primarily of the 'leaves' of the module
   tree, mostly parsers and the like which, whilst 'stable' and tested
   should still be split away from core because the data sources they
   parse could change format slightly
# bioperl-unstable, better bioperl-bleed, would feature brand-new
   stuff, be it new parsers for totally new formats, new APIs that do
   something not thought of before etc. When they are complete, bug-free
   and have stood the test of time they get moved into bioperl-main.
   (It is not a place for all new commits; bug fixes to something in
   bioperl-main would be committed to bioperl-main)
# The current splits (bioperl-run, bioperl-network etc.) do not get
   their own core and bleed variant. Anything they need for core
   functionality would enter the single bioperl-core, anything new
   would enter the single bioperl-bleed, and anything stable would
   be in their own bioperl-[package]

Discuss :)


From snoze.pa at gmail.com  Tue Mar 18 14:27:47 2008
From: snoze.pa at gmail.com (snoze pa)
Date: Tue, 18 Mar 2008 13:27:47 -0500
Subject: [Bioperl-l] BioSQL V1.0.0 released
In-Reply-To: <8f200b4c0803111204m2ca45782w34baa0499d690cb5@mail.gmail.com>
References: <200803071309.25294.heikki@sanbi.ac.za>
	<7558F8C6-FE40-4BAE-BA6A-D5039B10F350@uiuc.edu>
	<8f200b4c0803111204m2ca45782w34baa0499d690cb5@mail.gmail.com>
Message-ID: <10f848910803181127u4acc307akbf9ec8513349e311@mail.gmail.com>

Thanks, Hilmar. I am still wondering whether my old problem was fixed. It
is related to NR databases mixing files from different databases.

On Tue, Mar 11, 2008 at 2:04 PM, Steve Chervitz <sac at bioperl.org> wrote:

> Ditto. Thanks for biting the bullet, Hilmar.
>
> BTW, I put together a little compendium/review of various bioinformatics
> data models a few months ago, where I mention BioSQL among others, but I
> never really announced it:
>
> http://biodatamodel.org/
>
> I thought about wikifying it to get the community involved in maintaining
> it, but haven't gotten around to it yet.
>
> Feedback is welcome.
>
> Cheers,
> Steve
>
> On Fri, Mar 7, 2008 at 6:22 AM, Chris Fields <cjfields at uiuc.edu> wrote:
>
> > Same here.  Great news!
> >
> > chris
> >
> > On Mar 7, 2008, at 5:09 AM, Heikki Lehvaslaiho wrote:
> >
> > > BIOSQL V1.0.0 RELEASED
> > > http://news.open-bio.org/archives/2008_03.html#000094
> > >
> > >
> > > Congratulations, Hilmar!
> > >
> > >               -Heikki
> > >
> > > --
> > > ______ _/      _/_____________________________________________________
> > >      _/      _/
> > >     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
> > >    _/_/_/_/_/  Senior Scientist    skype: heikki_lehvaslaiho
> > >   _/  _/  _/  SANBI, South African National Bioinformatics Institute
> > >  _/  _/  _/  University of Western Cape, South Africa
> > >     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
> > > ___ _/_/_/_/_/________________________________________________________
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l at lists.open-bio.org
> > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> > Christopher Fields
> > Postdoctoral Researcher
> > Lab of Dr. Robert Switzer
> > Dept of Biochemistry
> > University of Illinois Urbana-Champaign
> >
> >
> >


From darin.london at duke.edu  Tue Mar 18 14:16:58 2008
From: darin.london at duke.edu (darin.london at duke.edu)
Date: Tue, 18 Mar 2008 13:16:58 -0500
Subject: [Bioperl-l] BOSC 2008 Announcement and Call For Submissions
Message-ID: <200803181816.m2IIGwOL007248@tenero.duhs.duke.edu>


BOSC 2008 Call for Abstracts

The 9th annual Bioinformatics Open Source Conference (BOSC 2008) will take place in Toronto, Ontario, Canada, as one of several Special Interest Group (SIG) meetings occurring in conjunction with the 16th annual Intelligent Systems for Molecular Biology Conference (ISMB 2008).

The Bioinformatics Open Source Conference (BOSC) is sponsored by the Open Bioinformatics Foundation (O|B|F), a non-profit group dedicated to promoting the practice and philosophy of Open Source software development within the biological research community. Many Open Source bioinformatics packages are widely used by the research community across many application areas and form a cornerstone in enabling research in the genomic and post-genomic era. Open source bioinformatics software has facilitated rapid innovation and dissemination of new computational methods as well as informatics infrastructure. Since the work of the Open Source Bioinformatics Community represents some of the most cutting edge of Bioinformatics in general, the overall theme for the conference this year is "Tackling Hard Problems with Emerging Technologies". Topics under this umbrella include cyberinfrastructure, grid computing and workflow management and discovery, and visualization. We will also have a series of update talks about the main Open Source Bioinformatics Software suites.

One of the hallmarks of BOSC is the coming together of the open source developer community in one location. A face-to-face meeting of this community creates synergy where participants can work together to create use cases, prototype working code, or run bootcamps for developers from other projects as short, informal, and hands-on tutorials in new software packages and emerging technologies. In short, BOSC is not just a conference for presentations of completed work, but is a dynamic meeting where collaborative work gets done.

This year, BOSC is accepting abstract submissions on the conference theme "Tackling Hard Problems with Emerging Technologies". The conference theme reflects that there are new technologies emerging on both the scientific front (new sequencing technologies, etc.) and the IT front (workflows, mashup/web 2.0, improvements in all of the major programming languages, etc.), which may allow the open source community to solve problems that were previously intractable. Abstracts may be submitted for the following topics.

1. Cyberinfrastructure - We are interested in presentations on topics dealing with the development of infrastructure on the web to facilitate software and data re-use (mashups, or traditional), interoperability and inter-process communication, system/service discovery, and data movement and modeling in distributed systems. This may include peer-to-peer systems of data transfer, Web Services, various flavors of data representation (SOAP, JSON, XML, others), and technologies commonly referred to under the Web 2.0 paradigm (e.g. folksonomies/tagging, user-based content generation, content feeds, and Social Networking).

2. Grid Computing and Workflow Management and Discovery - We particularly invite talks that report progress in making workflow systems easier to use and on how to do distributed-collaborative research, e.g. workflows that encompass the coordination of systems running in different parts of the world.

3. Visualization - Visualization is a maturing area of open source software development. We particularly invite talks that demonstrate innovative visualization systems in the context of workflows.

4. Open Source Software - Speakers will present talks on the use, development, or philosophy of open source software in bioinformatics.

5. Bio* Open Source Project Updates - We invite abstracts from the representatives of the open source projects sponsored by or affiliated to the O|B|F (see Projects).


Please consult the official BOSC 2008 website at http://www.open-bio.org/wiki/Upcoming_BOSC_conference  for all updates and extra information.

Submission Process:
All abstracts must be submitted through our Open Conference Systems site (http://events.open-bio.org/BOSC2008/openconf.php).
The form will ask for a small Abstract Text to be pasted into it, and a full paper.  The small Abstract text should be a summary, while the longer abstract should provide more details, including the open-source license requirement details.
Full-length abstracts are limited to one page with one inch (2.5 cm) margins on the top, sides, and bottom.  The full-length abstract should include the title, authors, and affiliations.  We prefer your abstract to be in PDF format, although plain t

Important Dates:
May 11: Abstract submission deadline.
June 2: Notification of accepted talks.
June 4: Early registration discount cut-off.
July 18-19: BOSC 2008!

We hope to see you at BOSC 2008!

Kam Dahlquist and Darin London
BOSC 2008 Co-organizers

			 
From hlapp at gmx.net  Tue Mar 18 15:07:54 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 18 Mar 2008 15:07:54 -0400
Subject: [Bioperl-l] BioSQL V1.0.0 released
In-Reply-To: <10f848910803181127u4acc307akbf9ec8513349e311@mail.gmail.com>
References: <200803071309.25294.heikki@sanbi.ac.za>
	<7558F8C6-FE40-4BAE-BA6A-D5039B10F350@uiuc.edu>
	<8f200b4c0803111204m2ca45782w34baa0499d690cb5@mail.gmail.com>
	<10f848910803181127u4acc307akbf9ec8513349e311@mail.gmail.com>
Message-ID: <0B1635C3-0BD4-449C-9C52-FD8D07E9D669@gmx.net>

Can you point me to the bug report or a mailing list thread?

	-hilmar

On Mar 18, 2008, at 2:27 PM, snoze pa wrote:

> Thanks hilmar. I am still wondering if my old problem was fixed. It is
> related to when NR databases mixes files from different databases.
>
> On Tue, Mar 11, 2008 at 2:04 PM, Steve Chervitz <sac at bioperl.org>  
> wrote:
>
>> Ditto. Thanks for biting the bullet, Hilmar.
>>
>> BTW, I put together a little compendium/review of various  
>> bioinformatics
>> data models a few months ago, where I mention BioSQL among others,  
>> but I
>> never really announced it:
>>
>> http://biodatamodel.org/
>>
>> I thought about wikifying it to get the community involved in
>> maintaining
>> it, but haven't gotten around to it yet.
>>
>> Feedback is welcome.
>>
>> Cheers,
>> Steve
>>
>> On Fri, Mar 7, 2008 at 6:22 AM, Chris Fields <cjfields at uiuc.edu>  
>> wrote:
>>
>>> Same here.  Great news!
>>>
>>> chris
>>>
>>> On Mar 7, 2008, at 5:09 AM, Heikki Lehvaslaiho wrote:
>>>
>>>> BIOSQL V1.0.0 RELEASED
>>>> http://news.open-bio.org/archives/2008_03.html#000094
>>>>
>>>>
>>>> Congratulations, Hilmar!
>>>>
>>>>               -Heikki
>>>>
>>>> --
>>>> ______ _/      _/ 
>>>> _____________________________________________________
>>>>      _/      _/
>>>>     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
>>>>    _/_/_/_/_/  Senior Scientist    skype: heikki_lehvaslaiho
>>>>   _/  _/  _/  SANBI, South African National Bioinformatics  
>>>> Institute
>>>>  _/  _/  _/  University of Western Cape, South Africa
>>>>     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
>>>> ___ _/_/_/_/_/ 
>>>> ________________________________________________________
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>> Christopher Fields
>>> Postdoctoral Researcher
>>> Lab of Dr. Robert Switzer
>>> Dept of Biochemistry
>>> University of Illinois Urbana-Champaign
>>>
>>>
>>>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From snoze.pa at gmail.com  Tue Mar 18 16:33:08 2008
From: snoze.pa at gmail.com (snoze pa)
Date: Tue, 18 Mar 2008 15:33:08 -0500
Subject: [Bioperl-l] NCBI taxonomy database
Message-ID: <10f848910803181333g49d1567dl99d76daf8ef88cc1@mail.gmail.com>

Dear Users,
How can I use the NCBI taxonomy database in BioPerl? Any suggestions?
Thanks in advance,
s


From aaron.j.mackey at gsk.com  Tue Mar 18 12:23:41 2008
From: aaron.j.mackey at gsk.com (aaron.j.mackey at gsk.com)
Date: Tue, 18 Mar 2008 12:23:41 -0400
Subject: [Bioperl-l] Priorities for a bioperl-1.6 release
In-Reply-To: <47DFE089.1070304@sendu.me.uk>
Message-ID: <OF5F812950.DB7A9F1E-ON85257410.00593111-85257410.005A0FAB@gsk.com>

Very cool.  I can envision this being printed as a laminated poster to put 
up next to the periodic table of Perl Elements (
http://www.ozonehouse.com/mark/blog/code/PeriodicTable.html)

One GraphViz trick you could try would be to group Bio::X::* (nodes and 
your collection groups sharing common Bio::X:: prefixes) together as 
subgraphs; that should quickly show you which edges go outside of the 
various "domains", and which are entirely self contained.

you could also try to distinguish "use base" relationships (i.e. 
inheritance) vs. "use Bio::X" (delegation, composition, etc.) vs. "require 
Bio::X" wrapped in an eval (optional use if available) by various edge 
colorings -- this might help to further break things up if we can guess at 
the intended "use" of any Bio::X by Bio::Y.
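
A minimal sketch of how those three kinds of relationship could be told
apart before colouring the edges, assuming a plain regex scan of each
module's source text is good enough. The sub name classify_deps and the
sample package Bio::Demo are made up for illustration; this is not how
module_usage.pl itself works.

```perl
use strict;
use warnings;

# Classify how a module's source text refers to Bio::* modules.
# Returns a hash ref such as { 'Bio::Root::Root' => 'inherit', ... }.
#   'inherit'  - named in "use base" / "use parent"
#   'optional' - "require" inside an eval (block or string)
#   'use'      - any other plain "use" / "require"
sub classify_deps {
    my ($source) = @_;
    my %kind;

    # 1. Inheritance: use base qw(Bio::A Bio::B); or use base 'Bio::A';
    while ($source =~ /^\s*use\s+(?:base|parent)\s+(.*?);/msg) {
        my $list = $1;    # copy before running another match
        $kind{$_} = 'inherit' for $list =~ /(Bio::[\w:]+)/g;
    }

    # 2. Optional: eval { require Bio::X } or eval "require Bio::X"
    while ($source =~ /\beval\s*[\{"'][^;]*?\brequire\s+(Bio::[\w:]+)/g) {
        $kind{$1} ||= 'optional';
    }

    # 3. Everything else loaded with use or require at the start of a line
    while ($source =~ /^\s*(?:use|require)\s+(Bio::[\w:]+)/mg) {
        $kind{$1} ||= 'use';
    }
    return \%kind;
}

# Hypothetical module source, for demonstration only.
my $example = <<'END';
package Bio::Demo;
use strict;
use base qw(Bio::Root::Root Bio::SeqI);
use Bio::SimpleAlign;
eval { require Bio::DB::GenBank; 1 } or warn "no network modules";
END

my $deps = classify_deps($example);
printf "%-20s %s\n", $_, $deps->{$_} for sort keys %$deps;
```

A regex scan like this will be fooled by POD, comments and unusual
formatting, so the result is only a first guess at the intended "use".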

-Aaron

"Sendu Bala" <bix at sendu.me.uk> wrote on 03/18/2008 11:32:25 AM:

> aaron.j.mackey at gsk.com wrote:
> >> Or is the split intended to be 'core' == "anything and everything
> >> that was in 1.4", '????' == "everything else"? In which case,
> >> what's a good name for "modules created after 1.4"? 'crust'? ;)
> > 
> > Nah, "icing".
> > 
> > a module "use" map might be very useful to help identify "core" vs.
> > other layers of mantle/crust/icing.
> > 
> > http://www.perlmonks.org/?node_id=87329 
> > http://search.cpan.org/src/NEILB/pmusage-1.2/
> 
> Thanks for those. Neither could quite cope with BioPerl, but I've munged
> them together and hacked up 'module_usage.pl' which I've just committed
> to the maintenance directory of bioperl-live.
> 
> module_usage.pl ../Bio
> 
> Produces:
>   *warning, may crash your browser; download it and view in a dedicated
> image viewer*
> http://bix.sendu.me.uk/files/module_usage.jpeg
> http://bix.sendu.me.uk/files/module_usage.txt
> 
> First I considered what modules each BioPerl package (aka class, module)
> 'uses' (what modules does it load via 'use', 'require' or inherit from
> via 'use base', excluding external (non-BioPerl) modules), then grouped
> together packages that have identical usage. The graph shows all the
> groups with more than one member as nodes and edges from them pointing
> to the individual packages that they use. The set of those individual
> packages pointed to by groups also have edges showing their
> use-relationship to other members of the set (only). Members of the set
> are also shaded in red. The saturation of the shade indicates how many
> packages use that package (so dark red packages are used a lot).
> 
> (I had to simplify in this way because otherwise GraphViz bailed on me.
> If anyone can come with nicer simplification/visualisation systems,
> please do! It's important to note that there is lots of information loss
> in my scheme, so you can't rely on the graph alone.)
> 
> Getting to the question on how to decide what is 'core' and on what
> basis to split things up, first consider the darker red packages. Next
> consider how many groups point to it. Finally consider the membership of
> those groups: are they all highly related, or are they from different
> 'parts' of BioPerl?
> 
> For example, Bio::Graphics::Glyph::generic is dark red and has 3 groups
> pointing to it, but all the members of those groups are
> Bio::Graphics::Glyph*. You could imagine that Bio::Graphics::Glyph (or
> Bio::Graphics?) could be split off cleanly if desired and not kept in
> core. Bio::SimpleAlign, on the other hand, whilst not being quite as
> dark a red, has 7 attached groups with members from Bio::AlignIO,
> Bio::Search and Bio::Tools. You could easily argue it is more
> fundamental to BioPerl and should be in core. In turn, the things that
> Bio::SimpleAlign points to would also have to be in core.
> 
> I haven't done any full analysis along these lines and leave as an
> exercise for the interested reader for now ;)
> 
> 
> Chris Fields wrote:
> > http://www.bioperl.org/wiki/Talk:Proposed_1.6_core_modules
> > 
> > I'm pretty flexible on any of that; it's a proposal only and I think
> > some of it may be wrongheaded, but hey, I'm willing to take a few
> > rotten tomatoes.  The key issue is we should try to work out what we
> > mean by 'core' or the core library.  I have a rather extreme view of
> > it as being the bare essentials without external, non-perl core
> > dependencies (only SeqI/PrimarySeqI, AlignI, AnnotationI, SeqFeatureI
> > and required modules for those classes) but I'm sure others would
> > lump in parsers, DB functionality, etc.  I basically suggest placing
> > those (and any stable but potentially non-core code) in a
> > 'bioperl-main', with any unstable or untested code going into a
> > 'bioperl-unstable'.
> 
> My thoughts are along these lines:
> # I agree that core should have no external dependencies
> # I agree that it might mostly be interfaces
> # It should represent a framework with all the interfaces (that have
>    stable APIs), directory structure and base classes that everything
>    else relies on
> # It might not do much useful bioinformatics, but provides just about
>    everything needed for a dev to create a new module that does
> 
> 
> > In essence, bioperl-main would require core and resemble a stable
> > release; bioperl-unstable would require bioperl-main (and core) and
> > resemble a dev release.  Not sure how versioning would go or if this
> > is a viable option at all, but it's worth discussing.
> 
> # I agree that this 3-way split seems reasonable
> # bioperl-main would consist primarily of the 'leaves' of the module
>    tree, mostly parsers and the like which, whilst 'stable' and tested
>    should still be split away from core because the data sources they
>    parse could change format slightly
> # bioperl-unstable, better bioperl-bleed, would feature brand-new
>    stuff, be it new parsers for totally new formats, new APIs that do
>    something not thought of before etc. When they are complete, bug-free
>    and have stood the test of time they get moved into bioperl-main.
>    (It is not a place for all new commits; bug fixes to something in
>    bioperl-main would be committed to bioperl-main)
> # The current splits (bioperl-run, bioperl-network etc.) do not get
>    their own core and bleed variant. Anything they need for core
>    functionality would enter the single bioperl-core, anything new
>    would enter the single bioperl-bleed, and anything stable would
>    be in their own bioperl-[package]
> 
> Discuss :)
> 


From David.Messina at sbc.su.se  Tue Mar 18 17:23:18 2008
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 18 Mar 2008 22:23:18 +0100
Subject: [Bioperl-l] NCBI taxonomy database
In-Reply-To: <10f848910803181333g49d1567dl99d76daf8ef88cc1@mail.gmail.com>
References: <10f848910803181333g49d1567dl99d76daf8ef88cc1@mail.gmail.com>
Message-ID: <628aabb70803181423g305db155r9d66c114f38c64b6@mail.gmail.com>

Hi snoze,

I think you will want to take a look at the docs for the Bio::Taxon module
http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Taxon.html


and these scripts:
scripts/taxa/local_taxonomydb_query.PLS
<http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/scripts/taxa/local_taxonomydb_query.PLS?rev=HEAD&cvsroot=bioperl&content-type=text/vnd.viewcvs-markup>
  Script that accesses a local taxonomy database and retrieves species or
  TaxonIDs <http://www.bioperl.org/wiki/TaxonID>.

scripts/taxa/query_entrez_taxa.PLS
<http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/scripts/taxa/query_entrez_taxa.PLS?rev=HEAD&cvsroot=bioperl&content-type=text/vnd.viewcvs-markup>
  Demonstrates how to retrieve the NCBI <http://www.bioperl.org/wiki/NCBI>
  TaxonID for a given species. Also retrieves the TaxonID for a given
  accession number <http://www.bioperl.org/wiki/Accession_number>.

scripts/taxa/taxid4species.PLS
<http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/scripts/taxa/taxid4species.PLS?rev=HEAD&cvsroot=bioperl&content-type=text/vnd.viewcvs-markup>
  Retrieves the NCBI <http://en.wikipedia.org/wiki/NCBI> TaxonID for a
  given species.

Dave


From alexl at users.sourceforge.net  Wed Mar 19 04:32:38 2008
From: alexl at users.sourceforge.net (Alex Lancaster)
Date: Wed, 19 Mar 2008 01:32:38 -0700
Subject: [Bioperl-l] Priorities for a bioperl-1.6 release
In-Reply-To: <47DFE089.1070304@sendu.me.uk> (Sendu Bala's message of "Tue\,
	18 Mar 2008 15\:32\:25 +0000")
References: <OF9C92ED19.F51E3401-ON852573F0.006310A3-852573F0.006383D1@gsk.com>
	<47DFE089.1070304@sendu.me.uk>
Message-ID: <42eja7azbt.fsf@allele2.eebweb.arizona.edu>

>>>>> "SB" == Sendu Bala  writes:

[...]

SB> # I agree that this 3-way split seems reasonable # bioperl-main
SB> would consist primarily of the 'leaves' of the module tree, mostly
SB> parsers and the like which, whilst 'stable' and tested should
SB> still be split away from core because the data sources they parse
SB> could change format slightly # bioperl-unstable, better
SB> bioperl-bleed, would feature brand-new stuff, be it new parsers
SB> for totally new formats, new APIs that do something not thought of
SB> before etc. When they are complete, bug-free and have stood the
SB> test of time they get moved into bioperl-main.  (It is not a place
SB> for all new commits; bug fixes to something in bioperl-main would
SB> be committed to bioperl-main) # The current splits (bioperl-run,
SB> bioperl-network etc.) do not get their own core and bleed
SB> variant. Anything they need for core functionality would enter the
SB> single bioperl-core, anything new would enter the single
SB> bioperl-bleed, and anything stable would be in their own
SB> bioperl-[package]

SB> Discuss :)

While on the subject of how to split up the bioperl package, spare a
thought for upstream package maintainers.  The Fedora package for the
bioperl "core" that I now maintain is currently a single package which
makes it easy to get reviewed, included in the distribution and
updated/maintained.  (bioperl-run is a separate package).

While I agree that bioperl is now perhaps a little too monolithic, I
think splitting it up in too fine-grained a manner, like CPAN, might
go too far the other way.  For Fedora, each package would then need to
be reviewed and updated separately.  Similar issues might apply for
other distros (such as Debian/Ubuntu).

I think something similar to the three-way split proposed sounds like
a good compromise, so long as a "basic" user of Bioperl can install
most of the functionality in the current "bioperl" package in (at
most) 2-3 packages.

One model to look at might be the gstreamer model which has a "core"
(gstreamer) and "gstreamer-plugins-base", "gstreamer-plugins-good",
"gstreamer-plugins-bad" and "gstreamer-plugins-ugly" modules for
plugins, see:

http://gstreamer.net/

Alex


From charles-listes+bioperl at plessy.org  Wed Mar 19 06:01:59 2008
From: charles-listes+bioperl at plessy.org (Charles Plessy)
Date: Wed, 19 Mar 2008 19:01:59 +0900
Subject: [Bioperl-l] Are all recommended modules equally important ?
In-Reply-To: <47DF8B66.8020509@sendu.me.uk>
References: <20080317041311.GA3784@kunpuu.plessy.org>
	<628aabb70803170838i53a49aacuf507880d518326d@mail.gmail.com>
	<F7A93439-501D-420A-BD68-442E3587CB71@gmx.net>
	<47DF8B66.8020509@sendu.me.uk>
Message-ID: <20080319100159.GD29304@kunpuu.plessy.org>

On Tue, Mar 18, 2008 at 09:29:10AM +0000, Sendu Bala wrote:
> 
> I looked at the most used external modules. Used 6 times or more:
> 
>   Data::Dumper => used 55 times
>   Carp => used 51 times
>   IO::String => used 25 times
>   Symbol => used 19 times
>   File::Spec => used 17 times
>   HTTP::Request::Common => used 17 times
>   POSIX => used 12 times
>   DB_File => used 11 times
>   Fcntl => used 11 times
>   IO::File => used 11 times
>   Exporter => used 10 times
>   File::Temp => used 9 times
>   Dumpvalue => used 8 times
>   LWP::UserAgent => used 8 times
>   Scalar::Util => used 8 times
>   URI::Escape => used 8 times
>   File::Basename => used 6 times
>   File::Path => used 6 times
>   XML::Writer => used 6 times

Dear Sendu,

Thanks a lot for this analysis! We will downgrade all modules except
those you listed to the priority 'Suggested'. In terms of the Debian
package, it means keeping only libio-string-perl, libwww-perl,
liburi-perl and libxml-writer-perl in our 'Recommends' field, as the
others are provided by our perl package itself.

Thanks a lot for the advice,

-- 
Charles Plessy
Debian-Med packaging team
Wakō, Saitama, Japan


From bix at sendu.me.uk  Wed Mar 19 09:27:11 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 19 Mar 2008 13:27:11 +0000
Subject: [Bioperl-l] Priorities for a bioperl-1.6 release
In-Reply-To: <OF5F812950.DB7A9F1E-ON85257410.00593111-85257410.005A0FAB@gsk.com>
References: <OF5F812950.DB7A9F1E-ON85257410.00593111-85257410.005A0FAB@gsk.com>
Message-ID: <47E114AF.8030801@sendu.me.uk>

aaron.j.mackey at gsk.com wrote:
> One GraphViz trick you could try would be to group Bio::X::* (nodes and 
> your collection groups sharing common Bio::X:: prefixes) together as 
> subgraphs; that should quickly show you which edges go outside of the 
> various "domains", and which are entirely self contained.

Not quite sure if I used the 'trick' you were thinking of, but I now 
'cluster' them as you describe. It's no longer quite as attractively 
proportioned, but I suppose it's more useful :)


> you could also try to distinguish "use base" relationships (i.e. 
> inheritance) vs. "use Bio::X" (delegation, composition, etc.) vs. "require 
> Bio::X" wrapped in an eval (optional use if available) by various edge 
> colorings -- this might help to further break things up if we can guess at 
> the intended "use" of any Bio::X by Bio::Y.

I haven't distinguished the eval require cases, but now edges are green 
for inheritance and blue for use/require.
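
For anyone wanting to try the same clustering and colouring on a smaller
graph, here is a rough sketch that writes GraphViz DOT text by hand. The
edge list is invented sample data and the sub names domain_of and to_dot
are hypothetical; this is not the code in module_usage.pl. It relies only
on the standard GraphViz behaviour that a subgraph whose name starts with
"cluster" is drawn as a box.

```perl
use strict;
use warnings;

# Hypothetical sample edges: [from, to, kind]; kind is 'inherit' or 'use'.
my @edges = (
    ['Bio::AlignIO::fasta',          'Bio::AlignIO',     'inherit'],
    ['Bio::AlignIO::fasta',          'Bio::SimpleAlign', 'use'],
    ['Bio::Search::Hit::GenericHit', 'Bio::SimpleAlign', 'use'],
);

# Group a module under its Bio::X prefix (Bio::X itself joins that group).
sub domain_of {
    my ($module) = @_;
    my @parts = split /::/, $module;
    return @parts > 2 ? join('::', @parts[0, 1]) : $module;
}

# Build DOT text: one cluster per Bio::X domain, edges coloured by kind.
sub to_dot {
    my (@edges) = @_;

    # Green for inheritance, blue for use/require.
    my %color = ('inherit' => 'green', 'use' => 'blue');

    # Collect every node mentioned in an edge under its domain.
    my %members;
    for my $e (@edges) {
        $members{ domain_of($_) }{$_} = 1 for @{$e}[0, 1];
    }

    my $dot = "digraph module_usage {\n";
    my $n   = 0;
    for my $domain (sort keys %members) {
        $dot .= sprintf("  subgraph cluster_%d {\n    label=\"%s\";\n", $n++, $domain);
        $dot .= qq{    "$_";\n} for sort keys %{ $members{$domain} };
        $dot .= "  }\n";
    }
    for my $e (@edges) {
        my ($from, $to, $kind) = @$e;
        $dot .= sprintf(qq(  "%s" -> "%s" [color=%s];\n), $from, $to, $color{$kind});
    }
    return $dot . "}\n";
}

print to_dot(@edges);
```

Saving the output to a file and running it through "dot -Tpng" should give
a picture in which edges that cross a cluster boundary are easy to spot.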


I updated the jpeg:
*warning, may crash your browser; download it and view in a dedicated
  image viewer*
http://bix.sendu.me.uk/files/module_usage.jpeg


If someone wants to mess with the script so it will output a sane ps 
file for conversion to pdf, please do so. I can't figure out how to get 
it to work correctly.


From Jorge.DUARTE at biogemma.com  Wed Mar 19 11:32:44 2008
From: Jorge.DUARTE at biogemma.com (Jorge.DUARTE at biogemma.com)
Date: Wed, 19 Mar 2008 16:32:44 +0100
Subject: [Bioperl-l] how to go from AlignIO to Variation ?
Message-ID: <OF678A6861.0EF7D6FC-ONC1257411.0054F38B-C1257411.00558060@LGLimagrain.com>

Dear Bioperl-users,

could someone give me a hint on how to find SNPs in alignments using 
bioperl objects ?

I found several modules capable of representing Sequence Variations,
but could not understand how to go from an "Align" object to a "Variation" 
object.

Any help would be much appreciated,

Thanks,

Jorge.

--- 
Jorge Duarte
Bioinformatics Software Engineer
BIOGEMMA
Z.I. Du Brézet
8, Rue des Frères Lumière
63028 CLERMONT FERRAND Cedex 2
FRANCE
Tel : +33 (0)4 73 39 60 73
Fax : +33 (0)4 73 39 60 71
E-mail : jorge.duarte at biogemma.com


From avilella at gmail.com  Wed Mar 19 12:59:47 2008
From: avilella at gmail.com (Albert Vilella)
Date: Wed, 19 Mar 2008 16:59:47 +0000
Subject: [Bioperl-l] how to go from AlignIO to Variation ?
In-Reply-To: <OF678A6861.0EF7D6FC-ONC1257411.0054F38B-C1257411.00558060@LGLimagrain.com>
References: <OF678A6861.0EF7D6FC-ONC1257411.0054F38B-C1257411.00558060@LGLimagrain.com>
Message-ID: <358f4d650803190959h744f33f5ha345897565a071b0@mail.gmail.com>

Hi Jorge,

(good to see an ex-EBI in the bioperl-ml :-) )

You can use the method aln_to_population in Bio::PopGen::Utilities:

my $pop = Bio::PopGen::Utilities->aln_to_population($aln);

http://www.bioperl.org/wiki/HOWTO:PopGen#Allele_data_from_Alignments_using_Bio::AlignIO_and_Bio::PopGen::Utilities
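
As a library-free illustration of what "finding SNPs in an alignment"
boils down to, here is a small sketch that scans aligned strings column by
column and reports the columns where more than one base occurs. It does
not use or reproduce Bio::PopGen::Utilities; the sub name
polymorphic_columns, the sample sequences, and the choice to ignore gaps
and N are assumptions made for this example.

```perl
use strict;
use warnings;

# Return one hash ref per polymorphic column: { column => 1-based index,
# alleles => { base => count } }. Gaps ('-') and 'N' are skipped, which
# is an assumption of this sketch rather than a general rule.
sub polymorphic_columns {
    my (@seqs) = @_;
    return () unless @seqs;

    my $len = length $seqs[0];
    for my $s (@seqs) {
        die "sequences must be aligned to equal length\n"
            unless length($s) == $len;
    }

    my @sites;
    for my $col (0 .. $len - 1) {
        my %alleles;
        for my $s (@seqs) {
            my $base = uc substr($s, $col, 1);
            next if $base eq '-' || $base eq 'N';
            $alleles{$base}++;
        }
        # A column is polymorphic if at least two different bases occur.
        push @sites, { column => $col + 1, alleles => \%alleles }
            if scalar(keys %alleles) > 1;
    }
    return @sites;
}

# Made-up aligned sequences for demonstration.
my @aligned = ('ACGTACGT', 'ACGTACGA', 'ACCTAC-T');
for my $site (polymorphic_columns(@aligned)) {
    my $counts = $site->{alleles};
    printf "column %d: %s\n", $site->{column},
        join ', ', map { "$_=$counts->{$_}" } sort keys %$counts;
}
```

With real data the strings would come from the sequences of the alignment
object rather than from literals.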

Cheers,

    Albert.

On Wed, Mar 19, 2008 at 3:32 PM,  <Jorge.DUARTE at biogemma.com> wrote:
> Dear Bioperl-users,
>
>  could someone give me a hint on how to find SNPs in alignments using
>  bioperl objects ?
>
>  I found several modules capable of representing Sequence Variations,
>  but could not understand how to go from an "Align" object to a "Variation"
>  object.
>
>  Any help would be much appreciated,
>
>  Thanks,
>
>  Jorge.
>
>  ---
>  Jorge Duarte
>  Bioinformatics Software Engineer
>  BIOGEMMA
>  Z.I. Du Brézet
>  8, Rue des Frères Lumière
>  63028 CLERMONT FERRAND Cedex 2
>  FRANCE
>  Tel : +33 (0)4 73 39 60 73
>  Fax : +33 (0)4 73 39 60 71
>  E-mail : jorge.duarte at biogemma.com


From Jorge.DUARTE at biogemma.com  Wed Mar 19 11:22:57 2008
From: Jorge.DUARTE at biogemma.com (Jorge.DUARTE at biogemma.com)
Date: Wed, 19 Mar 2008 16:22:57 +0100
Subject: [Bioperl-l] Using Bioperl book
Message-ID: <OF134946F0.4C537CC5-ONC1257411.00542F3C-C1257411.00549AFA@LGLimagrain.com>

Hello,

I just found a listing on Amazon for a book, "Using Bioperl", published
on the 1st of March 2008, but it is no longer available.

Does anyone know how to get it ?

Many thanks,

Jorge.

--- 
Jorge Duarte
Bioinformatics Software Engineer
BIOGEMMA
Z.I. Du Brézet
8, Rue des Frères Lumière
63028 CLERMONT FERRAND Cedex 2
FRANCE
Tel : +33 (0)4 73 39 60 73
Fax : +33 (0)4 73 39 60 71
E-mail : jorge.duarte at biogemma.com

*****************************************************************
       For any support request, please include
BIOGEMMA_BioInfo_Service or bioinfo at biogemma.com
         among the recipients on first contact
*****************************************************************


From jason at bioperl.org  Wed Mar 19 13:54:16 2008
From: jason at bioperl.org (Jason Stajich)
Date: Wed, 19 Mar 2008 10:54:16 -0700
Subject: [Bioperl-l] Using Bioperl book
In-Reply-To: <OF134946F0.4C537CC5-ONC1257411.00542F3C-C1257411.00549AFA@LGLimagrain.com>
References: <OF134946F0.4C537CC5-ONC1257411.00542F3C-C1257411.00549AFA@LGLimagrain.com>
Message-ID: <0AFC1124-EE57-4168-BBF3-98845F0138C9@bioperl.org>

It's probably more than 6 months out. We still haven't finished
writing it, as life and work continue to intrude on book writing.

-jason
On Mar 19, 2008, at 8:22 AM, Jorge.DUARTE at biogemma.com wrote:

> Hello,
>
> i just found on amazon something about a book "Using Bioperl",  
> published
> on the 1st of March 2008 but which is no more available.
>
> Does anyone know how to get it ?
>
> Many thanks,
>
> Jorge.
>
> ---
> Jorge Duarte
> Bioinformatics Software Engineer
> BIOGEMMA
> Z.I. Du Brézet
> 8, Rue des Frères Lumière
> 63028 CLERMONT FERRAND Cedex 2
> FRANCE
> Tel : +33 (0)4 73 39 60 73
> Fax : +33 (0)4 73 39 60 71
> E-mail : jorge.duarte at biogemma.com
>
> *****************************************************************
>        For any support request, please include
> BIOGEMMA_BioInfo_Service or bioinfo at biogemma.com
>          among the recipients on first contact
> *****************************************************************


From smarkel at accelrys.com  Wed Mar 19 13:53:53 2008
From: smarkel at accelrys.com (Scott Markel)
Date: Wed, 19 Mar 2008 10:53:53 -0700
Subject: [Bioperl-l] Using Bioperl book
In-Reply-To: <OF134946F0.4C537CC5-ONC1257411.00542F3C-C1257411.00549AFA@LGLimagrain.com>
References: <OF134946F0.4C537CC5-ONC1257411.00542F3C-C1257411.00549AFA@LGLimagrain.com>
Message-ID: <OF4972A434.60C53CF4-ON88257411.006214B5-88257411.00625152@accelrys.com>

Jorge,

This is a book that Jason Stajich, Ewan Birney, and I are writing.
We're behind.  So it's not that the book is no longer available,
but that it's not yet available.  Hopefully later this year or
early in 2009.

Scott

Scott Markel, Ph.D.
Principal Bioinformatics Architect  email:  smarkel at accelrys.com
Accelrys (SciTegic R&D)             mobile: +1 858 205 3653
10188 Telesis Court, Suite 100      voice:  +1 858 799 5603
San Diego, CA 92121                 fax:    +1 858 799 5222
USA                                 web:    http://www.accelrys.com

http://www.linkedin.com/in/smarkel
Board of Directors: International Society for Computational Biology
Co-chair: ISCB Publications Committee
Associate Editor: PLoS Computational Biology
Editorial Board: Briefings in Bioinformatics


bioperl-l-bounces at lists.open-bio.org wrote on 19.03.2008 08:22:57:

> Hello,
> 
> i just found on amazon something about a book "Using Bioperl", published
> on the 1st of March 2008 but which is no more available.
> 
> Does anyone know how to get it ?
> 
> Many thanks,
> 
> Jorge.
> 
> --- 
> Jorge Duarte
> Bioinformatics Software Engineer
> BIOGEMMA
> Z.I. Du Brézet
> 8, Rue des Frères Lumière
> 63028 CLERMONT FERRAND Cedex 2
> FRANCE
> Tel : +33 (0)4 73 39 60 73
> Fax : +33 (0)4 73 39 60 71
> E-mail : jorge.duarte at biogemma.com
> 
> *****************************************************************
>        Pour toute demande de support merci d'inclure
> BIOGEMMA_BioInfo_Service ou bioinfo at biogemma.com
>          dans les destinataires lors du premier contact
> *****************************************************************


From me at hongyu.org  Thu Mar 20 14:54:53 2008
From: me at hongyu.org (Hongyu Zhang)
Date: Thu, 20 Mar 2008 11:54:53 -0700 (PDT)
Subject: [Bioperl-l] Bio::DB::GenBank module
Message-ID: <501654.68882.qm@web51412.mail.re2.yahoo.com>

Dear all,

It seems that some of the important methods in the Bio::DB::GenBank module, such as get_Seq_by_acc(), have been discontinued. The corresponding methods have no content underneath their names. How come?
 
Best,

Hongyu Zhang, Ph.D.
 Ceres Inc., Thousand Oaks, CA
Cell: 805-405-5394
Fax: 866-447-8750


From joseph.fass at gmail.com  Thu Mar 20 18:10:33 2008
From: joseph.fass at gmail.com (Joseph Fass)
Date: Thu, 20 Mar 2008 15:10:33 -0700
Subject: [Bioperl-l] bug in Bio::SeqIO::fastq or Bio::Seq::SeqWithQuality?
Message-ID: <195e25cb0803201510i4411fd1ctd1fdff40aecd12c7@mail.gmail.com>

I've written code to trim a certain number of bases (and, possibly,
associated qualities) from fasta (or fastq) format sequences, using:

$seq->seq($seq->subseq($a+1,$len-$b));
and, if it's fastq:
$seq->qual($seq->subqual($a+1,$len-$b));
where:
$len = $seq->length; # defined before changing $seq->seq
$a is the number of bases to trim off the beginning of the sequence
$b is the number of bases to trim off the end of the sequence

The code works for sequences, but for qualities I get a trimmed series of
quality characters that is the correct length and is at the correct
position, but has a number of characters (equal to $a) at the *end* of the
series changed to '!' ... i.e.:

@fake header 1
tcggacaatatatat
+
fjasfiojeq%!@%@

becomes:

@fake header 1 trimmed by 4 at beginning and 3 at end
acaatata
+fake header 1 trimmed by 4 at beginning and 3 at end
fioj!!!!

Since the relevant section of code is short, I'll post it:

my $in = Bio::SeqIO->new(-file => "<$opt_i", -format => $format);
my $out = Bio::SeqIO->new(-file => ">$opt_o", -format => $format);
my $seq_length;
while (my $seq = $in->next_seq()) {
  $seq->desc($seq->desc()." trimmed by $opt_b at beginning and $opt_e at end");
  $seq_length = $seq->length;
  $seq->seq($seq->subseq($opt_b+1,$seq_length-$opt_e));
  if ($format eq 'fastq') {
    # if fastq, trim qualities then write out in fastq format
    $seq->qual($seq->subqual($opt_b+1,$seq_length-$opt_e));
    $out->write_fastq($seq); }
  else {$out->write_seq($seq);} # just write out sequence in fasta format
}

Why should the same process work for ->seq and ->subseq but not ->qual and
->subqual?  Please enlighten me ...


-- 
Joseph Fass
jnfass -at- gmail.com (personal) || joseph.fass -at- gmail.com(professional)
970.227.5928 (c) || 530.752.2698 (w)


From hlapp at gmx.net  Thu Mar 20 18:49:41 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 20 Mar 2008 18:49:41 -0400
Subject: [Bioperl-l] [BioSQL-l] postgres 8.3 will not cast text to
	integer any longer
In-Reply-To: <5095.156.83.1.251.1206041052.squirrel@webmail.xs4all.nl>
References: <5095.156.83.1.251.1206041052.squirrel@webmail.xs4all.nl>
Message-ID: <0F80B40B-0232-4367-8433-992588B6E71B@gmx.net>

Hi Erik, thanks for the report. Given the error message, it looks  
more like the integer (which in reality is a string) can't be  
automatically converted to a string.

That would be equally interesting, though. DBI I thought used to bind  
all parameters as string by default, but maybe that has changed?

The parameter values are indeed all bound generically (and the query
is created dynamically too), and I'm leaving it up to the DBD drivers
to do the "Right Thing". I could obviously force everything into type
string, but that is likely to have its own repercussions on various
RDBMSs.

So could you file this as a bug report on bugzilla.open-bio.org  
(category bioperl-db, this is actually not a BioSQL problem), and run  
the following test on your 8.3 instance (which minor version actually?):

CREATE TABLE t1 (a varchar(10), b text, c integer);

SELECT * from t1 WHERE a = 1;
SELECT * from t1 WHERE b = 1;
SELECT * from t1 WHERE c = '1';

INSERT INTO t1 (a,b,c) VALUES ('a','b',1);

SELECT * from t1 WHERE a = 1;
SELECT * from t1 WHERE b = 1;
SELECT * from t1 WHERE c = '1';

SELECT * from t1 WHERE a = 1::text;
SELECT * from t1 WHERE b = 1::text;
SELECT * from t1 WHERE c = integer '1';

DROP TABLE t1;

These all work fine on my 8.1.4 instance.

	-hilmar

On Mar 20, 2008, at 3:24 PM, Erik wrote:
> Hi,
>
> (latest BioSQL, bioperl-db, and bioperl-live installed.)
>
> Postgres 8.3 will not auto-cast text (='character
> varying') to integer any longer, which causes test
> t/16odba.t to fail:
>
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: error while executing query in
> Bio::DB::BioSQL::SeqAdaptor::find_by_query: ERROR:
> operator does not exist: character varying = integer
> LINE 1: ...eq.taxon_id FROM bioentry seq WHERE
> seq.identifier = 5456929
>
> It seems likely to cause many similar statements to fail;
> how should this be solved?
>
> I tried to fix it but I couldn't find the place where the
> statement/clauses are put together.
>
>
> Thanks,
>
> Erik Rijkers
>
>
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biosql-l

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From er at xs4all.nl  Thu Mar 20 19:30:03 2008
From: er at xs4all.nl (Erik)
Date: Fri, 21 Mar 2008 00:30:03 +0100 (CET)
Subject: [Bioperl-l] [BioSQL-l] postgres 8.3 will not cast text to
 integer any longer
Message-ID: <15786.156.83.1.157.1206055803.squirrel@webmail.xs4all.nl>

On Thu, March 20, 2008 23:49, Hilmar Lapp wrote:
> Hi Erik, thanks for the report. Given the error message,
> it looks
> more like the integer (which in reality is a string) can't
> be automatically converted to a string.

you are right, of course :)


Here is the postgres 8.3.1 result of your sql statements:

CREATE TABLE t1 (a varchar(10), b text, c integer);

SELECT * from t1 WHERE a = 1;   -- fails in 8.3.1
SELECT * from t1 WHERE b = 1;	  -- fails in 8.3.1
SELECT * from t1 WHERE c = '1'; -- ok

INSERT INTO t1 (a,b,c) VALUES ('a','b',1);

SELECT * from t1 WHERE a = 1;	  -- fails in 8.3.1
SELECT * from t1 WHERE b = 1;	  -- fails in 8.3.1
SELECT * from t1 WHERE c = '1'; -- ok

SELECT * from t1 WHERE a = 1::text;     -- ok
SELECT * from t1 WHERE b = 1::text;     -- ok
SELECT * from t1 WHERE c = integer '1'; -- ok

The failure is always (virtually) the same:
ERROR:  operator does not exist: character varying = integer
LINE 1: SELECT * from t1 WHERE a = 1;
                                 ^
HINT:  No operator matches the given name and argument
type(s). You might need to add explicit type casts.


Then there is the cast function: for instance, I can let
the test in t/16odba.t proceed faultlessly with

 $seq = $biodb->get_Seq_by_id( "cast(5456929 as text)" );


I am also doubtful/curious as to how this would affect the
various loading scripts which I was going to use - I want
to set up a GBrowse with human/mouse/flybase sequence
annotation to show ChipSeq data against.

But one thing at a time, I guess...


> So could you file this as a bug report on
> bugzilla.open-bio.org
> (category bioperl-db, this is actually not a BioSQL
> problem),

I'll make an entry in bugzilla/bioperl-db.


Thanks for your quick reply!


Erik Rijkers


From David.Messina at sbc.su.se  Thu Mar 20 19:39:49 2008
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 21 Mar 2008 00:39:49 +0100
Subject: [Bioperl-l] Bio::DB::GenBank module
In-Reply-To: <501654.68882.qm@web51412.mail.re2.yahoo.com>
References: <501654.68882.qm@web51412.mail.re2.yahoo.com>
Message-ID: <628aabb70803201639y33df19a6ib83967c33dd90b7f@mail.gmail.com>

Hi Hongyu,

Those methods are inherited. get_Seq_by_acc(), for example, comes from
Bio::DB::WebDBSeqI.

The BioPerl Deobfuscator is one way to see where the methods a given class
has are actually coded. Here's the Deobfuscator view of Bio::DB::GenBank:
<http://bioperl.org/cgi-bin/deob_interface.cgi?Search=Search&module=Bio%3A%3ADB%3A%3AGenBank&sort_order=by+method&search_string=Bio%3A%3ADB%3A%3AGenBank>


Dave


From hlapp at gmx.net  Thu Mar 20 20:34:42 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 20 Mar 2008 20:34:42 -0400
Subject: [Bioperl-l] postgres 8.3 will not cast text to integer any
	longer
In-Reply-To: <15786.156.83.1.157.1206055803.squirrel@webmail.xs4all.nl>
References: <15786.156.83.1.157.1206055803.squirrel@webmail.xs4all.nl>
Message-ID: <987C9C0E-840B-44AD-B3E9-0FC2809FF4F4@gmx.net>


On Mar 20, 2008, at 7:30 PM, Erik wrote:
> Here is the postgres 8.3.1 result of your sql statements:
>
> CREATE TABLE t1 (a varchar(10), b text, c integer);
>
> SELECT * from t1 WHERE a = 1;   -- fails in 8.3.1
> SELECT * from t1 WHERE b = 1;	  -- fails in 8.3.1
> SELECT * from t1 WHERE c = '1'; -- ok
>
> [...]
> The failure is always (virtually) the same:
> ERROR:  operator does not exist: character varying = integer
> LINE 1: SELECT * from t1 WHERE a = 1;
>                                  ^
> HINT:  No operator matches the given name and argument
> type(s). You might need to add explicit type casts.


So it's indeed the backend that changed behavior. It's actually  
documented as I see now:

http://www.postgresql.org/docs/8.3/static/release-8-3.html

scroll to section E.2.2. Migration to Version 8.3, E.2.2.1. General,  
and the first item there:

<quote>
Non-character data types are no longer automatically cast to TEXT  
(Peter, Tom)

Previously, if a non-character value was supplied to an operator or  
function that requires text input, it was automatically cast to text,  
for most (though not all) built-in data types. This no longer  
happens: an explicit cast to text is now required for all
non-character-string types.
</quote>

I can see the arguments there, but this will prevent upgrading to 8.3
for many, many applications, and the comments from the Pg developers
('fix your SQL to use casts') that I've seen on the mailing lists are
just not helpful. Fixing the SQL is, for many legacy applications,
just not an option.

In the case of Bioperl-db it's very non-trivial, because all of a
sudden we would be changing from a hands-off,
let-the-driver-figure-it-out approach to forcing types everywhere.

So I think at this point with this change I have to declare
Bioperl-db officially incompatible with PostgreSQL 8.3+ until we've
found a solution to this, which is too bad because it seems 8.3 has
some really nice performance features added.

One possible solution might be to create a CAST in the database  
(namely the one that was taken away, restoring behavior to pre-8.3).  
Another possibility is to move the parameter binding method into the  
driver adaptor which would then delegate to the DBI method but would  
be overridden for the PostgreSQL adapter to force all bindings to  
type string.

Which leads me back to the surprise observation that the parameter  
was bound as an integer in the first place, when DBD::Pg used to bind  
everything as string unless you told it otherwise. Which DBD::Pg  
version is it that you are using? I would suspect (or hope) that  
maybe there is soon an update release of DBD::Pg that fixes this  
problem by going back to binding everything as string by default (and  
as the tests show PostgreSQL will still convert strings to integer if  
necessary).

Depending on what I (or can someone else update us on this?) find out  
for the DBD::Pg plans, I'll probably start looking into moving the  
parameter binding into the driver adapters. Though it does feel  
pathetic that this is now also not transparent between drivers.

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From er at xs4all.nl  Thu Mar 20 20:51:43 2008
From: er at xs4all.nl (Erik)
Date: Fri, 21 Mar 2008 01:51:43 +0100 (CET)
Subject: [Bioperl-l] postgres 8.3 will not cast text to integer any
	longer
Message-ID: <4483.156.83.1.157.1206060703.squirrel@webmail.xs4all.nl>

On Fri, March 21, 2008 01:34, Hilmar Lapp wrote:
>
> So I think at this point with this change I have to
> declare Bioperl-
> db officially incompatible with PostgreSQL 8.3+ until
> we've found a
> solution to this, which is too bad because it seems 8.3
> has some
> really nice performance features added.

Pg 8.3 is indeed very noticeably faster, and it has other
excellent new features like full text indexing. (This also
means that downgrading is not really an option.)


> Which DBD::Pg version is it that you are using?

DBD::Pg 2.3.0


Thanks,

Erik Rijkers


From hlapp at gmx.net  Thu Mar 20 21:36:50 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 20 Mar 2008 21:36:50 -0400
Subject: [Bioperl-l] postgres 8.3 will not cast text to integer any
	longer
In-Reply-To: <4483.156.83.1.157.1206060703.squirrel@webmail.xs4all.nl>
References: <4483.156.83.1.157.1206060703.squirrel@webmail.xs4all.nl>
Message-ID: <071CB899-AB3E-40B8-9477-82AE98DB88B1@gmx.net>


On Mar 20, 2008, at 8:51 PM, Erik wrote:
> On Fri, March 21, 2008 01:34, Hilmar Lapp wrote:
>>
>> So I think at this point with this change I have to declare  
>> Bioperl-db officially incompatible with PostgreSQL 8.3+ until  
>> we've found a solution to this, which is too bad because it seems  
>> 8.3 has some really nice performance features added.
>
> Pg 8.3 is indeed very noticably faster, and it has other
> excellent new features like full text indexing. (This also
> makes that downgrading is not really an option)

Right, I saw that too. It is, however, just migrated from what was a  
contrib module before, so downgrading and using the contrib module is  
an option.

Furthermore, folding these new features together with a behavior  
change that is backwards incompatible was a choice the PostgreSQL  
people made, not we.

We also aren't doing poor typing that deserves fixing; we're just not  
doing any typing by treating everything as a string. This is the Perl  
paradigm.

At this point it's actually unclear to me how this new behavior is  
compatible with untyped scripting languages unless you know the type  
of each column that you're binding a value for, because if you  
actually force typecasts to string for everything you get an error if  
an integer is indeed what's needed.

I'm wondering what I'm missing.

	-hilmar

BTW what does the following query yield on your 8.3.1 database:

select s.typname as source, t.typname as target, f.proname as function,
       c.castcontext
from pg_cast c, pg_type s, pg_type t, pg_proc f
where c.castsource = s.oid and c.casttarget = t.oid
  and c.castfunc = f.oid and t.typname = 'text';

On my 8.1.4 database I get:

   source    | target | function | castcontext
-------------+--------+----------+-------------
  bpchar      | text   | text     | i
  char        | text   | text     | i
  name        | text   | text     | i
  int8        | text   | text     | i
  int2        | text   | text     | i
  int4        | text   | text     | i
  oid         | text   | text     | i
  float4      | text   | text     | i
  float8      | text   | text     | i
  macaddr     | text   | text     | e
  cidr        | text   | text     | e
  inet        | text   | text     | e
  date        | text   | text     | i
  time        | text   | text     | i
  timestamp   | text   | text     | i
  timestamptz | text   | text     | i
  interval    | text   | text     | i
  timetz      | text   | text     | i
  numeric     | text   | text     | i
(19 rows)

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From greg at turnstep.com  Thu Mar 20 22:41:10 2008
From: greg at turnstep.com (Greg Sabino Mullane)
Date: Fri, 21 Mar 2008 02:41:10 -0000
Subject: [Bioperl-l] [BioSQL-l] postgres 8.3 will not cast text to
	integer any longer
In-Reply-To: <987C9C0E-840B-44AD-B3E9-0FC2809FF4F4@gmx.net>
Message-ID: <19ecb7a297f64722c4f63f10ed2ebdce@biglumber.com>


-----BEGIN PGP SIGNED MESSAGE-----
Hash: RIPEMD160


> Which leads me back to the surprise observation that the parameter
> was bound as an integer in the first place, when DBD::Pg used to bind
> everything as string unless you told it otherwise. Which DBD::Pg
> version is it that you are using? I would suspect (or hope) that
> maybe there is soon an update release of DBD::Pg that fixes this
> problem by going back to binding everything as string by default (and
> as the tests show PostgreSQL will still convert strings to integer if
> necessary).
>
> Depending on what I (or can someone else update us on this?) find out
> for the DBD::Pg plans, I'll probably start looking into moving the
> parameter binding into the driver adapters. Though it does feel
> pathetic that this is now also not transparent between drivers.

What you are probably looking for is already there, namely:

$dbh->{pg_server_prepare} = 0;

There's good reasons for the casting enforcement in 8.3, although I've
been a sharp critic of the change, and certainly of the suddeness
of it. Another solution to consider is adding the casts back in:

http://people.planetpostgresql.org/peter/index.php?/archives/2008/03.html
(the March 4th entry)

- --
Greg Sabino Mullane greg at turnstep.com
PGP Key: 0x14964AC8 200803202237
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
-----BEGIN PGP SIGNATURE-----

iEYEAREDAAYFAkfjIBYACgkQvJuQZxSWSsiamwCdEbNrC4F4oU7AGHrbHAm1YNXG
HbUAoIRJtGW4brvMKklxZYG6pusbcTqf
=Zawx
-----END PGP SIGNATURE-----


From David.Messina at sbc.su.se  Fri Mar 21 04:36:16 2008
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 21 Mar 2008 09:36:16 +0100
Subject: [Bioperl-l] bug in Bio::SeqIO::fastq or
	Bio::Seq::SeqWithQuality?
In-Reply-To: <195e25cb0803201510i4411fd1ctd1fdff40aecd12c7@mail.gmail.com>
References: <195e25cb0803201510i4411fd1ctd1fdff40aecd12c7@mail.gmail.com>
Message-ID: <628aabb70803210136p11de495p26d0ffaebbc3370e@mail.gmail.com>

Hi Joseph,

This looks like a bug; I saw the same thing here. Could you please submit
this to the bug tracker <http://www.bioperl.org/wiki/Bugs> along with your
test code?
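
In the meantime, if all you need is the trimming itself, here is a
minimal sketch that works on the raw sequence and quality strings
instead of going through the sequence object. trim_fastq_record is a
made-up helper name, not a BioPerl method, and the sketch assumes one
quality character per base. I have not established where the problem
in the object code lies; this simply sidesteps it by cutting both
strings against the original length.

```perl
use strict;
use warnings;

# trim_fastq_record is a hypothetical helper, not part of BioPerl.
# It removes $from_start characters from the front and $from_end
# characters from the back of BOTH strings, measuring against the
# original (untrimmed) length each time.
sub trim_fastq_record {
    my ($seq, $qual, $from_start, $from_end) = @_;
    die "sequence and quality lengths differ\n"
        unless length($seq) == length($qual);
    my $keep = length($seq) - $from_start - $from_end;
    die "nothing left after trimming\n" if $keep <= 0;
    return (substr($seq,  $from_start, $keep),
            substr($qual, $from_start, $keep));
}

# The example record from the report, trimmed by 4 at the start and 3 at the end.
my ($seq, $qual) = trim_fastq_record('tcggacaatatatat', 'fjasfiojeq%!@%@', 4, 3);
print "$seq\n$qual\n";
```

For your example record this prints acaatata and fiojeq%!, which is
what the trimmed quality line should have been.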


Thanks,
Dave


From hlapp at gmx.net  Fri Mar 21 08:52:39 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Fri, 21 Mar 2008 08:52:39 -0400
Subject: [Bioperl-l] [BioSQL-l] postgres 8.3 will not cast text to
	integer any longer
In-Reply-To: <19ecb7a297f64722c4f63f10ed2ebdce@biglumber.com>
References: <19ecb7a297f64722c4f63f10ed2ebdce@biglumber.com>
Message-ID: <C24DE5CA-F433-48A1-BF08-A6D056A2EBCE@gmx.net>

Hi Greg - thanks for your email, it's very helpful.

On Mar 20, 2008, at 10:41 PM, Greg Sabino Mullane wrote:
>>
>> Depending on what I (or can someone else update us on this?) find out
>> for the DBD::Pg plans, I'll probably start looking into moving the
>> parameter binding into the driver adapters. Though it does feel
>> pathetic that this is now also not transparent between drivers.
>
> What you are probably looking for is already there, namely:
>
> $dbh->{pg_server_prepare} = 0;

So disabling server-side prepares will leave values quoted? Having  
server-side prepares would be very useful though, especially for  
Bioperl-db with its many lookup queries that all use similar  
parameter values.

>
> There's good reasons for the casting enforcement in 8.3

I do understand that, but it's also in sharp contrast to other RDBMSs,
which doesn't make it easier for people to choose Pg when they
should, and doesn't help with writing cross-platform database
applications either.

> although I've been a sharp critic of the change, and certainly of  
> the suddeness
> of it. Another solution to consider is adding the casts back in:
>
> http://people.planetpostgresql.org/peter/index.php?/archives/2008/03.html
> (the March 4th entry)


Thanks for this, that helps a lot.

Do you have links to some of the key threads showing what rationale  
went into the decision? (Or should I just search for your name?) I'd  
like to read up on that first before pouring more oil into the fire.  
I suspect that many of those who made the decision are never faced  
with needing to write cross-RDBMS code.

Also, I wonder why this wasn't made a configurable option so it can  
be disabled by a simple config file change (such as the move away  
from automatic OID columns). But obviously this is the wrong list for  
discussing this (though Bioperl-db *is* one of those pieces of  
software that must be cross-RDBMS).

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From baucom at msg.ucsf.edu  Fri Mar 21 16:13:00 2008
From: baucom at msg.ucsf.edu (Albion Baucom)
Date: Fri, 21 Mar 2008 13:13:00 -0700
Subject: [Bioperl-l] SearchIO Performance
Message-ID: <AE9E136C-6938-41DF-A973-3F3A5F4CBA77@msg.ucsf.edu>

Hi. I am pretty new to BioPerl, and have a question about performance  
with regard to Blast (nucleotide) file parsing. My Blast result files  
usually have close to 100 or more sequence hits. Each sequence is  
about 1400 nucleotides long.

After profiling code I wrote, I find that calling the next_result()  
function after creating a search object takes substantially longer  
than non-OO, quick and dirty code I am using to parse the same Blast  
files.

What is substantially longer? Well, the existing code takes about 0.25  
seconds, and the BioPerl call takes about 4.5 seconds. I find that to  
be a dramatic difference, and that kind of time difference becomes  
significant when I have to parse 30 Blast files in a row. I understand  
that SearchIO is parsing the entire file and storing it all for easy  
retrieval later, and maybe this time penalty is what I have to pay for  
that convenience and organization.

I am just wondering if there is anything other than writing custom  
code based on BioPerl to speed this up. Something I might not be aware  
of that I can do ahead of time, or during parsing, to limit what is  
parsed, or facilitate the parsing process. For instance, is there a  
way to "look ahead" and simply parse alignments that meet a specific  
expectancy cutoff?

I confess I have not read the documentation thoroughly (although  
obviously enough to make it do what I want), but am certainly willing  
to do so if someone can point me in the right direction.

Thanks

Albion


From jason at bioperl.org  Fri Mar 21 17:40:00 2008
From: jason at bioperl.org (Jason Stajich)
Date: Fri, 21 Mar 2008 14:40:00 -0700
Subject: [Bioperl-l] SearchIO Performance
In-Reply-To: <AE9E136C-6938-41DF-A973-3F3A5F4CBA77@msg.ucsf.edu>
References: <AE9E136C-6938-41DF-A973-3F3A5F4CBA77@msg.ucsf.edu>
Message-ID: <8448D3AD-82BF-4471-9346-C27DDE95DB4D@bioperl.org>


On Mar 21, 2008, at 1:13 PM, Albion Baucom wrote:

> Hi. I am pretty new to BioPerl, and have a question about  
> performance with regard to Blast (nucleotide) file parsing. My  
> Blast result files usually have close to 100 or more sequence hits.  
> Each sequence is about 1400 nucleotides long.
>
> After profiling code I wrote, I find that calling the next_result()  
> function after creating a search object takes substantially longer  
> than non-OO, quick and dirty code I am using to parse the same  
> Blast files.
>
> What is substantially longer? Well, the existing code takes about  
> 0.25 seconds, and the BioPerl call takes about 4.5 seconds. I find  
> that to be a dramatic difference, and that kind of time difference  
> becomes significant when I have to parse 30 Blast files in a row. I  
> understand that SearchIO is parsing the entire file and storing it  
> all for easy retrieval later, and maybe this time penalty is what I  
> have to pay for that convenience and organization.
>
> I am just wondering if there is anything other than writing custom  
> code based on BioPerl to speed this up. Something I might not be  
> aware of that I can do ahead of time, or during parsing, to limit  
> what is parsed, or facilitate the parsing process. For instance, is  
> there a way to "look ahead" and simply parse alignments that meet a  
> specific expectancy cutoff?
>
> I confess I have not read the documentation thoroughly (although  
> obviously enough to make it do what I want), but am certainly  
> willing to do so if someone can point me in the right direction.
>
We are quite aware of the speed issues.  This is discussed on the  
wiki in brief detail.
http://bioperl.org/wiki/Why_BioPerl_is_slow

It boils down to the object creation, not the parsing (relatively
speaking).  It takes a while because we're creating a lot of objects
under the hood for each alignment.  Sendu has written a pull parser
that doesn't require creation of all the objects until the user
requests them.
As I've said in the past, if someone wrote a SearchIO event listener
that created lightweight objects (or just hashes) instead, this would
also provide a substantial speedup.

In the fall I did some experimentation with array-based instead of
hash-based feature objects and got a pretty decent speedup as well, but
just haven't had any time to roll out a more substantial prototype.
For the inner loops of things it may make sense to substitute a
less-flexible but super-fast object.

I always advocate thinking about what your needs are - if you just
want start/stop of alignments, you can grab this out of a blast
format table with -m9 (NCBI) or --mformat=3 (WUBLAST) and you
can write a fast parser that uses 'split'.
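
A minimal sketch of such a split-based parser, assuming NCBI tabular
output (-m8/-m9) with the standard 12 tab-separated columns.
parse_tabular_hits and the field names are my own labels for
illustration, not anything from BioPerl, and the 1e-10 cutoff is an
arbitrary example value. The e-value filter is applied while reading,
so rows above the cutoff are never stored.

```perl
use strict;
use warnings;

# Column order of NCBI BLAST tabular output (-m8 / -m9).
my @FIELDS = qw(query subject pct_identity aln_length mismatches gap_opens
                q_start q_end s_start s_end evalue bit_score);

# parse_tabular_hits is a hypothetical helper, not part of BioPerl.
# It reads tabular BLAST lines from a filehandle and returns a list of
# hash refs, keeping only hits whose e-value is <= $max_evalue
# (or all hits if $max_evalue is undef).
sub parse_tabular_hits {
    my ($fh, $max_evalue) = @_;
    my @hits;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^#/ or $line !~ /\S/;  # skip -m9 comments, blank lines
        my @cols = split /\t/, $line;
        next unless @cols == @FIELDS;            # skip malformed rows
        my %hit;
        @hit{@FIELDS} = @cols;
        next if defined $max_evalue and $hit{evalue} > $max_evalue;
        push @hits, \%hit;
    }
    return @hits;
}

# Usage: perl parse_hits.pl blast_output.m9
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    for my $hit (parse_tabular_hits($fh, 1e-10)) {
        print join("\t", @{$hit}{qw(subject q_start q_end s_start s_end evalue)}), "\n";
    }
    close $fh;
}
```

This gives you plain hashes, so you lose the SearchIO object model
(no HSP objects, no alignment strings), which is the trade-off for the
speed.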


> Thanks
>
> Albion


From er at xs4all.nl  Fri Mar 21 17:43:47 2008
From: er at xs4all.nl (Erik)
Date: Fri, 21 Mar 2008 22:43:47 +0100 (CET)
Subject: [Bioperl-l] [BioSQL-l] postgres 8.3 - load_seqdatabase.pl /
 swissprot
Message-ID: <16589.156.83.1.157.1206135827.squirrel@webmail.xs4all.nl>

Hi,

PostgreSQL 8.3.1
DBD::Pg 2.3.0
perl 5.8.8

(The following error may have to do with the 8.3 problems
that I reported yesterday (bug 2472) - I don't know)

 I ran biosql-schema/scripts/load_ncbi_taxonomy.pl without
problem.

Then I ran scripts/biosql/load_seqdatabase.pl as:

perl scripts/biosql/load_seqdatabase.pl \
  -driver Pg \
  -dbuser xxxxxxx \
  -dbname bioseqdb \
  -namespace swissprot \
  -format swiss \
   /DATA/ms/ftp.ebi.ac.uk/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.dat

It took two hours to load 26504 records (7%) of
uniprot_sprot.dat (is it expected to be so slow?), then
failed with:

Could not store Q2UXW0:
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: create: object (Bio::Species) failed to insert or to
be found by unique key
STACK: Error::throw
STACK: Bio::Root::Root::throw
/home/aardvark/bin/perl/lib/site_perl/5.8.8/Bio/Root/Root.pm:357
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create
/home/aardvark/bin/perl/lib/site_perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:206
STACK: Bio::DB::Persistent::PersistentObject::create
/home/aardvark/bin/perl/lib/site_perl/5.8.8/Bio/DB/Persistent/PersistentObject.pm:244
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create
/home/aardvark/bin/perl/lib/site_perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:169
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store
/home/aardvark/bin/perl/lib/site_perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251
STACK: Bio::DB::Persistent::PersistentObject::store
/home/aardvark/bin/perl/lib/site_perl/5.8.8/Bio/DB/Persistent/PersistentObject.pm:271
STACK: scripts/biosql/load_seqdatabase.pl:630
-----------------------------------------------------------


I don't know if this is directly related to the 8.3
casting problems I reported yesterday (bug 2472), or a
separate Bio::Species issue


regards,

Erik Rijkers


From bix at sendu.me.uk  Fri Mar 21 19:17:59 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 21 Mar 2008 23:17:59 +0000
Subject: [Bioperl-l] SearchIO Performance
In-Reply-To: <8448D3AD-82BF-4471-9346-C27DDE95DB4D@bioperl.org>
References: <AE9E136C-6938-41DF-A973-3F3A5F4CBA77@msg.ucsf.edu>
	<8448D3AD-82BF-4471-9346-C27DDE95DB4D@bioperl.org>
Message-ID: <47E44227.3050002@sendu.me.uk>

Jason Stajich wrote:
> 
> On Mar 21, 2008, at 1:13 PM, Albion Baucom wrote:
> 
>> Hi. I am pretty new to BioPerl, and have a question about performance 
>> with regard to Blast (nucleotide) file parsing.
[...]
>> What is substantially longer? Well, the existing code takes about 0.25 
>> seconds, and the BioPerl call takes about 4.5 seconds. I find that to 
>> be a dramatic difference, and that kind of time difference becomes 
>> significant when I have to parse 30 Blast files in a row. I understand 
>> that SearchIO is parsing the entire file and storing it all for easy 
>> retrieval later, and maybe this time penalty is what I have to pay for 
>> that convenience and organization.
[...]
> Sendu has written a pull parser that 
> doesn't require creation of all the objects until the user requests them.
> As I've said in the past, if someone wrote SearchIO event-listener that 
> created lightweight objects (or just hashes) instead this would also 
> provide a substantial speedup.

Yeah, you'll need BioPerl 1.5.2 (or the latest from svn) and to set the
format to 'blast_pull'. Depending on the circumstances, and with thoughtful
usage, you can see orders-of-magnitude speedups.

http://doc.bioperl.org/bioperl-live/Bio/SearchIO/blast_pull.html

The only disadvantage compared to the normal parser is that the pull
parser currently only supports NCBI BLASTN and BLASTP.


From hlapp at gmx.net  Sat Mar 22 14:18:45 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 22 Mar 2008 14:18:45 -0400
Subject: [Bioperl-l] Call for Student Applications - NESCent participates in
	the Google Summer of Code
In-Reply-To: <0025B440-EF1E-4632-9DB4-B98489BF3550@duke.edu>
Message-ID: <5AC4F213-8D88-41C6-B380-59B2EF7831F0@gmx.net>

Hi all - just wanted to draw your attention to our Google Summer of  
Code participation this year. One of the projects deals directly with  
BioPerl, another one builds on BioSQL (and could be implemented  
taking advantage of BioPerl or Bio::Phylo, or Biojava).

Cheers,

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================

Phyloinformatics Summer of Code 2008
http://phyloinformatics.net/Phyloinformatics_Summer_of_Code_2008

*** Please disseminate this announcement widely to appropriate students
at your institution ***

The National Evolutionary Synthesis Center (NESCent:
http://www.nescent.org/) is participating in 2008 for the second year
as a mentoring organization in the Google Summer of Code
(http://code.google.com/soc). Through this program, Google provides  
undergraduate, masters, and PhD students with a unique opportunity to  
obtain hands-on experience writing and extending open-source software  
under the mentorship of experienced developers from around the world.

Our goal in participating is to train future researchers and  
developers to not only have awareness and understanding of the value  
of open-source and collaboratively developed software, but also to  
gain the programming and remote collaboration skills needed to  
successfully contribute to such projects. Students will receive a  
stipend from Google, and may work from their home, or home  
institution, for the duration of the 3 month program. Students will  
each have one or more dedicated mentors with expertise in  
phylogenetic methods and open-source software development.

NESCent is particularly targeting students interested in both  
evolutionary biology and software development. Project ideas (see URL  
below) range from visualizing phylogenetic data in R, to development  
of a Mesquite module, web-services for phylogenetic data providers or  
geophylogeny mashups, implementing phyloXML support, navigating  
databases of networks, topology queries for PhyloCode registries, to  
phylogenetic tree mining in a MapReduce framework, and more.

The project ideas are flexible and many can be adjusted in scope to  
match the skills of the student. If the program sounds interesting to  
you but you are unsure whether you have the necessary skills, please  
email the mentors at the address below.  We will work with you to  
find a project that fits your interests and skills.

INQUIRIES:
Email any questions, including self-proposed project ideas, to  
phylosoc {at}
nescent {dot} org.

TO APPLY:
Apply on-line at the Google Summer of Code website
(http://code.google.com/soc/2008), where you will also find GSoC program
rules and eligibility requirements.  The 1-week application period for
students opens on Monday March 24th and runs through Monday, March  
31st, 2008.

Hilmar Lapp and Todd Vision
US National Evolutionary Synthesis Center

=====
URLs:
=====

2008 NESCent Phyloinformatics Summer of Code:
http://phyloinformatics.net/Phyloinformatics_Summer_of_Code_2008

Eligibility requirements:
http://code.google.com/opensource/gsoc/2008/faqs.html#0.1_eligibility

Stipends:
http://code.google.com/opensource/gsoc/2008/faqs.html#0.1_administrivia

To sign up for quarterly NESCent newsletters with announcements about
upcoming programs at the Center:
http://www.nescent.org/about/contact.php


From hlapp at gmx.net  Sat Mar 22 15:30:07 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 22 Mar 2008 15:30:07 -0400
Subject: [Bioperl-l] postgres 8.3 - load_seqdatabase.pl / swissprot
In-Reply-To: <16589.156.83.1.157.1206135827.squirrel@webmail.xs4all.nl>
References: <16589.156.83.1.157.1206135827.squirrel@webmail.xs4all.nl>
Message-ID: <14191FB6-A8DF-4F60-9FF7-FDAA8F3974B8@gmx.net>

Hi Erik,

I suspect that's a separate Bio::Species issue. If you query your  
BioSQL database for the existence of the taxon:

SELECT * FROM taxon WHERE ncbi_taxon_id = 326939;

do you get a result? If not, then for some reason the taxon wasn't  
yet included in the NCBI taxonomy that you loaded.

If yes, then somehow BioPerl didn't properly parse out the taxonID  
from the record. There should have been another message preceding the  
error below, could you post that too? Otherwise, can you rerun with
--printerror as a command line argument?

Note also that you can always specify --safe to go past any loading  
error. In fact that's what I recommend doing unless you want to debug  
why a particular record doesn't load.

BTW I would recommend that you restore the CASTs that were removed in  
Pg 8.3; otherwise you may hit random issues in Bioperl-db whenever a  
parameter value for a string-type column happens to be a number.  
(taxon.ncbi_taxon_id is of type integer)

See http://people.planetpostgresql.org/peter/index.php?/archives/18-Readding-implicit-casts-in-PostgreSQL-8.3.html

as per Greg's email.

	-hilmar

On Mar 21, 2008, at 5:43 PM, Erik wrote:
> Hi,
>
> PostgreSQL 8.3.1
> DBD::Pg 2.3.0
> perl 5.8.8
>
> (The following error may have to do with the 8.3 problems
> that I reported yesterday (bug 2472) - I don't know)
>
>  I ran biosql-schema/scripts/load_ncbi_taxonomy.pl without
> problem.
>
> Then I ran scripts/biosql/load_seqdatabase.pl as:
>
> perl scripts/biosql/load_seqdatabase.pl \
>   -driver Pg \
>   -dbuser xxxxxxx \
>   -dbname bioseqdb \
>   -namespace swissprot \
>   -format swiss \
>    /DATA/ms/ftp.ebi.ac.uk/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.dat
>
> It took two hours to load 26504 records (7%) of
> uniprot_sprot.dat (is it expected to be so slow?), then
> failed with:
>
> Could not store Q2UXW0:
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: create: object (Bio::Species) failed to insert or to
> be found by unique key
> STACK: Error::throw
> STACK: Bio::Root::Root::throw
> /home/aardvark/bin/perl/lib/site_perl/5.8.8/Bio/Root/Root.pm:357
> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create
> /home/aardvark/bin/perl/lib/site_perl/5.8.8/Bio/DB/BioSQL/ 
> BasePersistenceAdaptor.pm:206
> STACK: Bio::DB::Persistent::PersistentObject::create
> /home/aardvark/bin/perl/lib/site_perl/5.8.8/Bio/DB/Persistent/ 
> PersistentObject.pm:244
> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create
> /home/aardvark/bin/perl/lib/site_perl/5.8.8/Bio/DB/BioSQL/ 
> BasePersistenceAdaptor.pm:169
> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store
> /home/aardvark/bin/perl/lib/site_perl/5.8.8/Bio/DB/BioSQL/ 
> BasePersistenceAdaptor.pm:251
> STACK: Bio::DB::Persistent::PersistentObject::store
> /home/aardvark/bin/perl/lib/site_perl/5.8.8/Bio/DB/Persistent/ 
> PersistentObject.pm:271
> STACK: scripts/biosql/load_seqdatabase.pl:630
> -----------------------------------------------------------
>
>
> I don't know if this is directly related to the 8.3
> casting problems I reported yesterday (bug 2472), or a
> separate Bio::Species issue
>
>
> regards,
>
> Erik Rijkers

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From hlapp at gmx.net  Sat Mar 22 16:01:51 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 22 Mar 2008 16:01:51 -0400
Subject: [Bioperl-l] [BioSQL-l] postgres 8.3 - load_seqdatabase.pl /
	swissprot
In-Reply-To: <16589.156.83.1.157.1206135827.squirrel@webmail.xs4all.nl>
References: <16589.156.83.1.157.1206135827.squirrel@webmail.xs4all.nl>
Message-ID: <69D3EA33-810B-40EA-8687-752FA1A34FBF@gmx.net>

Forgot to respond to this:

On Mar 21, 2008, at 5:43 PM, Erik wrote:
> It took two hours to load 26504 records (7%) of uniprot_sprot.dat  
> (is it expected to be so slow?)


When I last loaded those regularly it was a bit faster (~ 5  
seqs/s) but it is in a ballpark that wouldn't raise a red flag for me.
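
As a rough check on the figures in this thread, the arithmetic can be
sketched as below. The inputs are the numbers Erik reported (26504
records in two hours, about 7% of the file); the 7% is approximate, so
the projected total is only a ballpark. The function names are made up
for this illustration.

```perl
use strict;
use warnings;

# Figures quoted in the thread: 26504 records loaded in two hours,
# reported as roughly 7% of uniprot_sprot.dat.
my $records_loaded = 26504;
my $elapsed_secs   = 2 * 3600;
my $fraction_done  = 0.07;    # approximate, as reported

# Average load rate in records per second.
sub load_rate {
    my ($records, $seconds) = @_;
    return $records / $seconds;
}

# Projected total run time in hours, given the fraction completed so far.
sub estimated_total_hours {
    my ($seconds, $fraction) = @_;
    return $seconds / $fraction / 3600;
}

printf "rate: %.2f records/s\n", load_rate($records_loaded, $elapsed_secs);
printf "projected full load: about %.0f hours\n",
    estimated_total_hours($elapsed_secs, $fraction_done);
```

This prints a rate of 3.68 records/s and a projection of about 29 hours
at a constant rate.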

BTW you can make it print statistics using the --logchunk N option,  
where N is the number of seqs after which you want the current count  
and the #recs/s printed.

You may get it to be faster if you tune the database (e.g., make sure  
there is enough memory for index reorganization, transaction log and  
tablespace datafile are on separate disks, etc; fiddling with the  
query optimizer has probably little effect as almost all queries are  
simple lookups or inserts).

That all said, the strength of load_seqdatabase.pl isn't speed. It  
doesn't make use of any bulk upload optimizations, and therefore the  
initial load of a very large database will take its time. The power  
is more in subsequent updates where you can configure what you want  
to happen, and during which the database is never in an inconsistent  
state, so it can run in the background.

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From er at xs4all.nl  Sat Mar 22 16:34:14 2008
From: er at xs4all.nl (Erik)
Date: Sat, 22 Mar 2008 21:34:14 +0100 (CET)
Subject: [Bioperl-l] postgres 8.3 - load_seqdatabase.pl / swissprot
Message-ID: <5975.156.83.1.157.1206218054.squirrel@webmail.xs4all.nl>

On Sat, March 22, 2008 20:30, Hilmar Lapp wrote:

> SELECT * FROM taxon WHERE ncbi_taxon_id = 326939;

No, I don't seem to have that particular id, although I
ran the ncbi load script yesterday just before the sprot. 
Btw, in the meantime I figured out that it was a parsing
error choking on an unexpected period.

You asked for preceding errors, but there were none.

I have now restarted the same uniprot_sprot.dat load with
--safe, which if I understand you correctly will just skip
any non-parsable records.


And wrt the postgres 8.3 casting:

I only added the first cast of the list of Peter Eisentraut:

CREATE FUNCTION pg_catalog.text(integer) RETURNS text
STRICT IMMUTABLE LANGUAGE SQL AS 'SELECT
textin(int4out($1));'; --added 20080322
CREATE CAST (integer AS text) WITH FUNCTION
pg_catalog.text(integer) AS IMPLICIT; --added 20080322

I hope eventually a more durable solution will be found -
I fear this reinstalling of old casting functionality will
generate unexpected problems of its own.

But it seems a good intermediary solution; with it, the
previously failing t/16odba.t succeeds...


Thank you,

Erik Rijkers


From hlapp at gmx.net  Sat Mar 22 17:16:18 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 22 Mar 2008 17:16:18 -0400
Subject: [Bioperl-l] postgres 8.3 - load_seqdatabase.pl / swissprot
In-Reply-To: <5975.156.83.1.157.1206218054.squirrel@webmail.xs4all.nl>
References: <5975.156.83.1.157.1206218054.squirrel@webmail.xs4all.nl>
Message-ID: <3C253027-5A2B-4C0C-9DF6-A0DA84CC96A8@gmx.net>


On Mar 22, 2008, at 4:34 PM, Erik wrote:
> On Sat, March 22, 2008 20:30, Hilmar Lapp wrote:
>
>> SELECT * FROM taxon WHERE ncbi_taxon_id = 326939;
>
> No, I don't seem to have that particular id, although I
> ran the ncbi load script yesterday just before the sprot.

Odd. It's on the NCBI taxonomy browser. Maybe it was just added the  
other day?

> Btw, in the meantime I figured out that it was a parsing
> error choking on an unexpected period.

Do you want to report that to the BioPerl category on
bugzilla.open-bio.org?

>
> You asked for preceding errors, but there were none.
>
> I have now restarted the same uniprot_sprot.dat load with
> --safe, which if I understand you correctly will just skip
> any non-parsable records.

And all records that cause some other database error when inserting.  
Note that if you didn't erase the previously loaded records, you will  
either need to choose a new namespace, or, better, use the --lookup  
and --noupdate flags.

	-hilmar

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From mrphysh at juno.com  Sat Mar 22 18:39:22 2008
From: mrphysh at juno.com (mrphysh at juno.com)
Date: Sat, 22 Mar 2008 22:39:22 GMT
Subject: [Bioperl-l] these objects are pretty cool
Message-ID: <20080322.163922.21808.1@webmail01.vgs.untd.com>


I am starting to understand how to use the objects.  

I am the sort who wants to understand how things work, at least on some level.  I think my understanding would be increased with knowledge of the actual contents of the object.  My book says they are hashes and that makes sense:  field-value.......field-value.......field-value 

But as far as I can tell they cannot be taken apart like a regular hash.

How can I print out the contents?  this makes sense to me:
xxxxxxxxxxxxxxxx
use Bio::SeqIO;

#these objects were made for file input...conversion...file output

while ( my $seq = $out->next_seq() ) {print "$seq\n"; }                        

#or..how about this?

foreach ( my $seqq = $in->next_seq() ) {print "$seqq\n"; }   
xxxxxxxxxxxxxx
this gives back:

Bio::Seq::RichSeq=HASH(0x860dcdc)
Bio::Seq=HASH(0x85f5a20)

Is there some way to slice up the object and look at the parts?

John 


From jason at bioperl.org  Sat Mar 22 18:46:46 2008
From: jason at bioperl.org (Jason Stajich)
Date: Sat, 22 Mar 2008 15:46:46 -0700
Subject: [Bioperl-l] these objects are pretty cool
In-Reply-To: <20080322.163922.21808.1@webmail01.vgs.untd.com>
References: <20080322.163922.21808.1@webmail01.vgs.untd.com>
Message-ID: <DA99BA50-B604-4DE1-BE43-194E91236F5F@bioperl.org>

it's got methods that you need to call to get the data.  Did you try  
looking at any of the howtos - they discuss this sort of thing.

http://bioperl.org/wiki/HOWTOs
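
To illustrate the point about methods, here is a small self-contained
sketch. Toy::Seq is a hypothetical stand-in class written for this
example; it is not BioPerl code and its hash keys do not mirror
BioPerl's internals. It only shows why printing an object gives
something like "Bio::Seq=HASH(0x...)" and why accessor methods (in
BioPerl, ones such as display_id, length and seq) are the way to get at
the data.

```perl
use strict;
use warnings;

# Toy::Seq is a hypothetical stand-in for a sequence class.
package Toy::Seq;

sub new {
    my ($class, %args) = @_;
    # The object is an ordinary hash reference, blessed into the class.
    my $self = { _id => $args{-id}, _seq => $args{-seq} };
    return bless $self, $class;
}

# Accessor methods: the supported way to read the data.
sub display_id { return $_[0]->{_id} }
sub seq        { return $_[0]->{_seq} }
sub length     { return CORE::length($_[0]->{_seq}) }

package main;

my $obj = Toy::Seq->new(-id => 'seq1', -seq => 'ATGCATGC');

# Interpolating the reference prints the class name and a memory
# address, not the contents.
print "$obj\n";

# Calling methods returns the actual content.
printf "%s is %d residues long: %s\n",
    $obj->display_id, $obj->length, $obj->seq;
```

The first print gives Toy::Seq=HASH(0x...) with some address; the second
gives "seq1 is 8 residues long: ATGCATGC".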

-jason
On Mar 22, 2008, at 10:39 PM, mrphysh at juno.com wrote:

>
> I am starting to understand how to use the objects.
>
> I am the sort who wants to understand how things work, at least on
> some level.  I think my understanding would be increased with
> knowledge of the actual contents of the object.  My book says they
> are hashes and that makes sense:
> field-value.......field-value.......field-value
>
> But as far as I can tell they cannot be taken apart like a regular  
> hash.
>
> How can I print out the contents?  this makes sense to me:
> xxxxxxxxxxxxxxxx
> use Bio::SeqIO;
>
> #these objects were made for file input...conversion...file output
>
> while ( my $seq = $out->next_seq() ) {print "$seq\n"; }
>
> #or..how about this?
>
> foreach ( my $seqq = $in->next_seq() ) {print "$seqq\n"; }
> xxxxxxxxxxxxxx
> this gives back:
>
> Bio::Seq::RichSeq=HASH(0x860dcdc)
> Bio::Seq=HASH(0x85f5a20)
>
> Is there some way to slice up the object and look at the parts?
>
> John


From robfsouza at gmail.com  Sat Mar 22 19:11:49 2008
From: robfsouza at gmail.com (Robson Francisco de Souza)
Date: Sat, 22 Mar 2008 20:11:49 -0300
Subject: [Bioperl-l] these objects are pretty cool
In-Reply-To: <DA99BA50-B604-4DE1-BE43-194E91236F5F@bioperl.org>
References: <20080322.163922.21808.1@webmail01.vgs.untd.com>
	<DA99BA50-B604-4DE1-BE43-194E91236F5F@bioperl.org>
Message-ID: <af6a4f100803221611w393e2dd1r965c11afd45978aa@mail.gmail.com>

Hi John,

The methods usually are designed to provide simple means to access the
object's internal data structure. They are intended to hide the
internal data structure, replacing the need to understand it by a
binding to the documented class methods.

If the documentation does not satisfy you and you still want to take a
look at its data structure, try dumping the object with code like

use Bio::SeqIO;
use Data::Dumper;
# $out is the Bio::SeqIO stream from your script, opened on an input file
while ( my $seq = $out->next_seq() ) { print Dumper($seq),"\n"; }

and check out chapters four and five of Programming Perl.
Best,
Robson

PS: watch out for lots of printed output...
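
One way to keep the output manageable is sketched below, using only
core Perl. The nested structure is made-up data blessed into a
hypothetical Toy::Seq class, standing in for a large object;
$Data::Dumper::Maxdepth and $Data::Dumper::Sortkeys are standard
Data::Dumper settings. Listing the keys first also shows that the
object can be taken apart like a regular hash once it is dereferenced.

```perl
use strict;
use warnings;
use Data::Dumper;

# A nested structure standing in for a large object (hypothetical data).
my $obj = bless {
    _id       => 'seq1',
    _features => [ { start => 1, end => 10 }, { start => 20, end => 42 } ],
}, 'Toy::Seq';

# The object is a hash reference, so its top-level fields can be listed.
print join(", ", sort keys %$obj), "\n";

# Sort keys for stable output and stop descending after the top level,
# so nested arrays and hashes are shown only as references.
local $Data::Dumper::Sortkeys = 1;
local $Data::Dumper::Maxdepth = 1;

print Dumper($obj);
```

Raising Maxdepth one level at a time is a reasonable way to explore a
big object without flooding the terminal. Code that reaches into these
keys directly may break when the internals change, so the documented
methods remain the safer route.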

2008/3/22, Jason Stajich <jason at bioperl.org>:
> it's got methods that you need to call to get the data.  Did you try
>  looking at any of the howtos - they discuss this sort of thing.
>
>  http://bioperl.org/wiki/HOWTOs
>
>
>  -jason
>
> On Mar 22, 2008, at 10:39 PM, mrphysh at juno.com wrote:
>
>  >
>  > I am starting to understand how to use the objects.
>  >
>  > I am the sort who wants to understand how things work, at least on
>  > some level.  I think my understanding would be increased with
>  > knowledge of the actual contents of the object.  My book says they
>  > are hashes and that makes sense:
>  > field-value.......field-value.......field-value
>  >
>  > But as far as I can tell they cannot be taken apart like a regular
>  > hash.
>  >
>  > How can I print out the contents?  this makes sense to me:
>  > xxxxxxxxxxxxxxxx
>  > use Bio::SeqIO;
>  >
>  > #these objects were made for file input...conversion...file output
>  >
>  > while ( my $seq = $out->next_seq() ) {print "$seq\n"; }
>  >
>  > #or..how about this?
>  >
>  > foreach ( my $seqq = $in->next_seq() ) {print "$seqq\n"; }
>  > xxxxxxxxxxxxxx
>  > this gives back:
>  >
>  > Bio::Seq::RichSeq=HASH(0x860dcdc)
>  > Bio::Seq=HASH(0x85f5a20)
>  >
>  > Is there some way to slice up the object and look at the parts?
>  >
>  > John


From er at xs4all.nl  Sat Mar 22 19:36:13 2008
From: er at xs4all.nl (Erik)
Date: Sun, 23 Mar 2008 00:36:13 +0100 (CET)
Subject: [Bioperl-l] postgres 8.3 - load_seqdatabase.pl / swissprot
Message-ID: <12669.156.83.1.157.1206228973.squirrel@webmail.xs4all.nl>

Hi Hilmar,

> either need to choose a new namespace, or, better, use the
> --lookup and --noupdate flags.

scripts/biosql/load_seqdatabase.pl is now churning along
to load uniprot_sprot.dat.  I'll try to gather up the
rejected records for further inspection / parser
improvement.

The next thing is performance, it's really intolerably
slow, and I don't think the database is the bottleneck -
isn't it more likely bioperl object heaviness?  I get
continuous near 100% load for 1 cpu (this machine has 2
cpus).

I could give it 10 or more processors; I am thinking I
could cut up the input into 10 (or more) chunks.

Is there anything specific in bioperl/biosql that knows
how to use multiple cores?
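
The chunking idea could be sketched as follows, with core Perl only. It
relies on Swiss-Prot flat-file records ending with a line that contains
just "//", so no record is cut in half. The output naming
("<prefix>.<k>") and the script name are invented for this example, and
whether several load_seqdatabase.pl processes can safely load into the
same database at once is an open question, not something this sketch
settles.

```perl
use strict;
use warnings;

# Distribute the records of a Swiss-Prot-style flat file over $n chunk
# files, round-robin. Returns the number of complete records seen.
sub split_flatfile {
    my ($infile, $prefix, $n) = @_;

    # Open one output handle per chunk.
    my @out;
    for my $k (0 .. $n - 1) {
        open my $fh, '>', "$prefix.$k" or die "cannot write $prefix.$k: $!";
        push @out, $fh;
    }

    open my $in, '<', $infile or die "cannot read $infile: $!";
    my $count = 0;    # complete records seen so far
    while (my $line = <$in>) {
        # The record number modulo the chunk count picks the file.
        print { $out[ $count % $n ] } $line;
        # A line holding only "//" terminates the current record.
        $count++ if $line =~ m{^//\s*$};
    }
    close $in;
    close $_ for @out;
    return $count;
}

# Usage: perl split_flatfile.pl uniprot_sprot.dat chunk 10
if (@ARGV == 3) {
    my $records = split_flatfile(@ARGV);
    print "wrote $records records\n";
}
```

Round-robin assignment keeps the chunks close to equal in record count
without needing to know the total in advance.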

thank you very much for your help

Erik Rijkers


From hlapp at gmx.net  Sat Mar 22 21:40:55 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 22 Mar 2008 21:40:55 -0400
Subject: [Bioperl-l] postgres 8.3 - load_seqdatabase.pl / swissprot
In-Reply-To: <12669.156.83.1.157.1206228973.squirrel@webmail.xs4all.nl>
References: <12669.156.83.1.157.1206228973.squirrel@webmail.xs4all.nl>
Message-ID: <1B55060F-534D-4732-B428-4863DD098654@gmx.net>


On Mar 22, 2008, at 7:36 PM, Erik wrote:
> The next thing is performance, it's really intolerably
> slow, and I don't think the database is the bottleneck -
> isn't it more likely bioperl object heaviness?  I get
> continuous near 100% load for 1 cpu (this machine has 2
> cpus).


Is the database on the same machine? If yes, and a significant  
fraction (~30-50% or even more) of the load is generated by the perl  
script, rather than almost everything coming from the postmaster,  
then indeed the database is not the bottleneck.

Of course, the bioperl object creation overhead takes a toll too. I  
would be surprised though if BioPerl can't parse more than 3.6  
records/s on a modern CPU; you can convince yourself of that though  
by writing a simple script along the lines of the following and see  
how fast that goes:

use Bio::SeqIO;

my $seqio = Bio::SeqIO->new(-file   => '<uniprot_sprot.dat',
                            -format => 'swiss');
my $n = 0;
while (my $seq = $seqio->next_seq) {
	$n++;
	# print something every 5,000 sequences or so
	print "$n\n" if $n % 5000 == 0;
}

But maybe load_seqdatabase.pl or even BioSQL or BioPerl aren't  
suitable for your use-case?

	-hilmar

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cjfields at uiuc.edu  Sun Mar 23 10:09:56 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 23 Mar 2008 09:09:56 -0500
Subject: [Bioperl-l] Using Bioperl book
In-Reply-To: <0AFC1124-EE57-4168-BBF3-98845F0138C9@bioperl.org>
References: <OF134946F0.4C537CC5-ONC1257411.00542F3C-C1257411.00549AFA@LGLimagrain.com>
	<0AFC1124-EE57-4168-BBF3-98845F0138C9@bioperl.org>
Message-ID: <4C401D4F-064C-43F9-A37C-14FA65A96657@uiuc.edu>

Maybe something to discuss at BOSC?

chris

On Mar 19, 2008, at 12:54 PM, Jason Stajich wrote:

> it's probably more than 6 months out. We still haven't finished  
> writing it as life and work continues to intrude on book writing.
>
> -jason
> On Mar 19, 2008, at 8:22 AM, Jorge.DUARTE at biogemma.com wrote:
>
>> Hello,
>>
>> i just found on amazon something about a book "Using Bioperl",  
>> published
>> on the 1st of March 2008 but which is no more available.
>>
>> Does anyone know how to get it ?
>>
>> Many thanks,
>>
>> Jorge.
>>
>> ---
>> Jorge Duarte
>> Bioinformatics Software Engineer
>> BIOGEMMA
>> Z.I. Du Brézet
>> 8, Rue des Frères Lumière
>> 63028 CLERMONT FERRAND Cedex 2
>> FRANCE
>> Tel : +33 (0)4 73 39 60 73
>> Fax : +33 (0)4 73 39 60 71
>> E-mail : jorge.duarte at biogemma.com
>>
>> *****************************************************************
>>       For any support request, please include
>> BIOGEMMA_BioInfo_Service or bioinfo at biogemma.com
>>         among the recipients at first contact
>> *****************************************************************

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Sun Mar 23 10:17:56 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 23 Mar 2008 09:17:56 -0500
Subject: [Bioperl-l] Priorities for a bioperl-1.6 release
In-Reply-To: <47DFE089.1070304@sendu.me.uk>
References: <OF9C92ED19.F51E3401-ON852573F0.006310A3-852573F0.006383D1@gsk.com>
	<47DFE089.1070304@sendu.me.uk>
Message-ID: <FC757880-6FD8-44A5-97BE-F5F37A738C2B@uiuc.edu>


On Mar 18, 2008, at 10:32 AM, Sendu Bala wrote:

> aaron.j.mackey at gsk.com wrote:
>>> Or is the split intended to be 'core' == "anything and everything
>>> that was in 1.4", '????' == "everything else"? In which case,
>>> what's a good name for "modules created after 1.4"? 'crust'? ;)
>> Nah, "icing".
>> a module "use" map might be very useful to help identify "core" vs.
>> other layers of mantle/crust/icing.
>> http://www.perlmonks.org/?node_id=87329 http://search.cpan.org/src/NEILB/pmusage-1.2/
>
> Thanks for those. Neither could quite cope with BioPerl, but I've
> munged them together and hacked up 'module_usage.pl' which I've just
> committed to the maintenance directory of bioperl-live.
>
> module_usage.pl ../Bio
>
> Produces:
> *warning, may crash your browser; download it and view in a dedicated
> image viewer*
> http://bix.sendu.me.uk/files/module_usage.jpeg
> http://bix.sendu.me.uk/files/module_usage.txt
>
> ...
>
> I haven't done any full analysis along these lines and leave as an
> exercise for the interested reader for now ;)

I'm coming into this late (just got back) but I agree, this would be  
very useful.  Your updates based on Aaron's comments help quite a bit.

> Chris Fields wrote:
>> http://www.bioperl.org/wiki/Talk:Proposed_1.6_core_modules
>> I'm pretty flexible on any of that; it's a proposal only and I think
>> some of it may be wrongheaded, but hey, I'm willing to take a few
>> rotten tomatoes.  The key issue is we should try to work out what we
>> mean by 'core' or the core library.  I have a rather extreme view of
>> it as being the bare essentials without external, non-perl core
>> dependencies (only SeqI/PrimarySeqI, AlignI, AnnotationI, SeqFeatureI
>> and required modules for those classes) but I'm sure others would
>> lump in parsers, DB functionality, etc.  I basically suggest placing
>> those (and any stable but potentially non-core code) in a
>> 'bioperl-main', with any unstable or untested code going into a
>> 'bioperl-unstable'.
>
> My thoughts are along these lines:
> # I agree that core should have no external dependencies
> # I agree that it might mostly be interfaces
> # It should represent a framework with all the interfaces (that have
>  stable APIs), directory structure and base classes that everything
>  else relies on
> # It might not do much useful bioinformatics, but provides just about
>  everything needed for a dev to create a new module that does

Yes, that's essentially the idea.

>> In essence, bioperl-main would require core and resemble a stable
>> release; bioperl-unstable would require bioperl-main (and core) and
>> resemble a dev release.  Not sure how versioning would go or if this
>> is a viable option at all, but it's worth discussing.
>
> # I agree that this 3-way split seems reasonable
> # bioperl-main would consist primarily of the 'leaves' of the module
>  tree, mostly parsers and the like which, whilst 'stable' and tested
>  should still be split away from core because the data sources they
>  parse could change format slightly
> # bioperl-unstable, better bioperl-bleed, would feature brand-new
>  stuff, be it new parsers for totally new formats, new APIs that do
>  something not thought of before etc. When they are complete, bug-free
>  and have stood the test of time they get moved into bioperl-main.
>  (It is not a place for all new commits; bug fixes to something in
>  bioperl-main would be committed to bioperl-main)
> # The current splits (bioperl-run, bioperl-network etc.) do not get
>  their own core and bleed variant. Anything they need for core
>  functionality would enter the single bioperl-core, anything new
>  would enter the single bioperl-bleed, and anything stable would
>  be in their own bioperl-[package]
>
> Discuss :)

We can work on updating the plan via the wiki as well as the mail  
list.  I find it easier to track; we can always link back to the mail  
list when needed.

http://www.bioperl.org/wiki/Proposed_1.6_core_modules
http://www.bioperl.org/wiki/Talk:Proposed_1.6_core_modules

chris


From er at xs4all.nl  Sun Mar 23 14:16:05 2008
From: er at xs4all.nl (Erik)
Date: Sun, 23 Mar 2008 19:16:05 +0100 (CET)
Subject: [Bioperl-l] postgres 8.3 - load_seqdatabase.pl / swissprot
Message-ID: <8382.156.83.1.157.1206296165.squirrel@webmail.xs4all.nl>

On Sun, March 23, 2008 02:40, Hilmar Lapp wrote:
> But maybe load_seqdatabase.pl or even BioSQL or BioPerl
> aren't suitable for your use-case?

well, that may turn out to be the case, but I'm not quite
deterred yet.

I am in a situation like many others, I think: microarray,
mass spec, and chipseq (Solexa) data all need
annotation, and while it is easy to retrieve some useful
records from public data sources (entrez, ensembl &
biomart, etc.), it is not so easy to have such high
atomicity in the locally stored annotation data that
fine-grained filtering and sorting on a sql level becomes
possible.  I hope the bioperl parsers, together with the
biosql schema, will give SQL access to all or most data
bits.

And I understand GBrowse can run on top of BioSQL/Pg too,
albeit somewhat preliminary; this is another usage I will
need.

btw, should not all those references to postgres 7.3 be
upgraded to something newer, like 8.2.7 (maybe not yet 8.3
heh) ?  7.3 is not supported anymore by the pg project.

Sprot loaded in 20 hours. Only 170 were rejected - not too
bad.

Thanks,

Erik Rijkers


From hlapp at gmx.net  Sun Mar 23 15:22:46 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sun, 23 Mar 2008 15:22:46 -0400
Subject: [Bioperl-l] postgres 8.3 - load_seqdatabase.pl / swissprot
In-Reply-To: <8382.156.83.1.157.1206296165.squirrel@webmail.xs4all.nl>
References: <8382.156.83.1.157.1206296165.squirrel@webmail.xs4all.nl>
Message-ID: <EF4031EF-FBB5-493E-8D3C-8530E2ADFC20@gmx.net>


On Mar 23, 2008, at 2:16 PM, Erik wrote:
> On Sun, March 23, 2008 02:40, Hilmar Lapp wrote:
>> But maybe load_seqdatabase.pl or even BioSQL or BioPerl
>> aren't suitable for your use-case?
>
> well, that may turn out to be the case, but I'm not quite
> deterred yet.
>
> I am in a situation like many others, I think: microarray,
> mass spec, and chipseq (Solexa) data all need
> annotation, and while it is easy to retrieve some useful
> records from public data sources (entrez, ensembl &
> biomart, etc.), it is not so easy to have such high
> atomicity in the locally stored annotation data that
> fine-grained filtering and sorting on a sql level becomes
> possible.  I hope the bioperl parsers, together with the
> biosql schema, will give SQL access to all or most data
> bits.

If you mean annotation by data bits then yes, it should be fairly  
normalized (possibly more normalized than you want, in fact).

Also, using BioSQL as the sequence and sequence annotation model
add-on to some other database holding your lab data is what many others  
have used it for too.

>
> And I understand GBrowse can run on top of BioSQL/Pg too,
> albeit somewhat preliminary; this is another usage I will
> need.

It can, though keep in mind that that's not the use-case it (BioSQL)  
was built for. If you need to have rapid access to genome intervals  
with 10s of thousands of features and their annotation, you'll have  
to start thinking about a more de-normalized data store to run this off  
of, such as populating a native GBrowse GFF store.

>
> btw, should not all those references to postgres 7.3 be
> upgraded to something newer, like 8.2.7 (maybe not yet 8.3
> heh) ?  7.3 is not supported anymore by the pg project.

Oops, indeed. Where are they?

>
> Sprot loaded in 20 hours. Only 170 were rejected - not too
> bad.

That's great. Would be nice if you can provide some rough summary as  
to why they were rejected (if that's obvious), such as taxon errors,  
or other errors.

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From sac at bioperl.org  Sun Mar 23 18:20:43 2008
From: sac at bioperl.org (Steve Chervitz)
Date: Sun, 23 Mar 2008 15:20:43 -0700
Subject: [Bioperl-l] HitTableWriter error
In-Reply-To: <088B032A-E2EE-4E88-8F4C-A206D2230910@leicester.ac.uk>
References: <088B032A-E2EE-4E88-8F4C-A206D2230910@leicester.ac.uk>
Message-ID: <8f200b4c0803231520o5082f9f5mf7be8cd061faa98f@mail.gmail.com>

Hi Owen

Sorry, I don't have time to look into this right now, but two thoughts:

1) The ResultTableWriter is intended to convert standard blast reports into
a tabular format. So if you already have tabular results, you are sort of
using it "off label", but in principle, it should work.

2) The iteration method is only available to BlastHit objects, since it is
only relevant to PSI-blast results. The fact that you got this error when
working with blast results indicates that the parser did not generate the
correct object type for your hits, using the GenericHit when it should have
used BlastHit. This is just a hunch and would be worth following up on. Can
you submit this as a bug report?
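
One way to test this hunch is to ask each hit object whether it supports
the method before calling it. The sketch below is only an illustration:
Demo::GenericHit and Demo::BlastHit are hypothetical stand-ins (not the
real Bio::Search::Hit classes) so that it runs without BioPerl installed.
With real parser output one would inspect the hit objects the parser
returns instead.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the two hit classes; only one has iteration().
{
    package Demo::GenericHit;
    sub new { return bless {}, shift }
}
{
    package Demo::BlastHit;
    sub new       { return bless { iter => 1 }, shift }
    sub iteration { return $_[0]{iter} }
}

# Return the iteration number if the object supports it, otherwise undef.
sub safe_iteration {
    my ($hit) = @_;
    return $hit->can('iteration') ? $hit->iteration : undef;
}

for my $hit (Demo::GenericHit->new, Demo::BlastHit->new) {
    my $iter = safe_iteration($hit);
    printf "%s: %s\n", ref $hit,
        defined $iter ? "iteration $iter" : "no iteration() method";
}
```

If every hit from a PSI-BLAST report turns out to lack the method, that
would support the idea that the wrong hit class is being created.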

Thanks,
Steve


On Thu, Mar 20, 2008 at 4:56 AM, Owen Lancaster <ol8 at leicester.ac.uk> wrote:

> Hello
>
> I hope you don't mind me emailing you but I have come across a problem
> when trying to use HitTableWriter. The error can be seen below - the
> input for the script is the BLAST tabular output (specified with the
> -m 8 option) from a blastn search.
>
> If you have any idea what the problem might be I would much appreciate
> it! Hope you can help...
>
> Thanks
>
> Owen
>
>
> Using default column map.
>
> ------------- EXCEPTION  -------------
> MSG: Trouble in ResultTableWriter::_set_row_data_func() eval: Can't
> locate object method "iteration" via package
> "Bio::Search::Hit::GenericHit" at (eval 97) line 1, <GEN1> line 2.
>
>
>
> STACK Bio::SearchIO::Writer::ResultTableWriter::__ANON__ /Library/Perl/5.8.8/Bio/SearchIO/Writer/ResultTableWriter.pm:328
> STACK Bio::SearchIO::Writer::HitTableWriter::to_string /Library/Perl/5.8.8/Bio/SearchIO/Writer/HitTableWriter.pm:268
> STACK Bio::SearchIO::write_result /Library/Perl/5.8.8/Bio/SearchIO.pm:331
> STACK Bio::SearchIO::blast::write_result /Library/Perl/5.8.8/Bio/SearchIO/blast.pm:2208
> STACK toplevel ./generate_discordant_tails.pl:62
>
> --------------------------------------
>
>


From greg at turnstep.com  Sun Mar 23 20:42:36 2008
From: greg at turnstep.com (Greg Sabino Mullane)
Date: Mon, 24 Mar 2008 00:42:36 -0000
Subject: [Bioperl-l] [BioSQL-l] postgres 8.3 will not cast text to
	integer any longer
In-Reply-To: <C24DE5CA-F433-48A1-BF08-A6D056A2EBCE@gmx.net>
Message-ID: <4ab14dcc59d7566b55ba87027055e9fd@biglumber.com>


-----BEGIN PGP SIGNED MESSAGE-----
Hash: RIPEMD160


>> Depending on what I (or can someone else update us on this?) find out
>> for the DBD::Pg plans, I'll probably start looking into moving the
>> parameter binding into the driver adapters. Though it does feel
>> pathetic that this is now also not transparent between drivers.
>
> What you are probably looking for is already there, namely:
>
> $dbh->{pg_server_prepare} = 0;

> So disabling server-side prepares will leave values quoted? Having
> server-side prepares would be very useful though, especially for
> Bioperl-db with its many lookup queries that all use similar
> parameter values.

Yes, it forces DBD::Pg to do the quoting itself, which basically means
that everything is shipped to the server as a single SQL string, and
no placeholders are used. In the grand scheme of things, the speed
difference is not large for most queries. Certainly one way would be
to turn this on for 8.3 and above, and slowly migrate the queries/schema
over time.
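
As a rough illustration of "turn this on for 8.3 and above", the sketch
below decides from a server version number whether to switch server-side
prepares off. It assumes DBD::Pg exposes the server version as an integer
in $dbh->{pg_server_version} (e.g. 80300 for 8.3.0), which is worth
checking against the DBD::Pg documentation; the helper name
want_client_side_quoting and the database name in the comment are made up
for this example.

```perl
use strict;
use warnings;

# Decide whether to turn server-side prepares off, given the integer
# server version (assumed form: 80300 for 8.3.0, 80207 for 8.2.7).
sub want_client_side_quoting {
    my ($server_version) = @_;
    return $server_version >= 80300 ? 1 : 0;
}

# Applying it to a live handle would look roughly like this (needs DBI,
# DBD::Pg and a reachable database, so it is left commented out):
#
#   my $dbh = DBI->connect('dbi:Pg:dbname=biosql', $user, $pass);
#   $dbh->{pg_server_prepare} = 0
#       if want_client_side_quoting($dbh->{pg_server_version});

for my $version (80207, 80300, 80301) {
    printf "%d -> pg_server_prepare %s\n", $version,
        want_client_side_quoting($version) ? 'off' : 'left as is';
}
```

Keeping the decision in one small function would make it easy to remove
again once the queries and schema no longer rely on implicit casts.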

>> There's good reasons for the casting enforcement in 8.3

> I do understand that, but it's also a sharp contrast to other RDBMSs
> that doesn't make it easier for people to choose Pg when they
> should, and doesn't help writing cross-platform database applications
> either.

I'm not overly familiar with how other databases treat this, but I've
heard DB2 can be a stickler about this too. I've not dug into the bioperl
code in a while, to be honest, so I'm not sure what sort of queries we're
talking about. Certainly long-term the code and schema should move away
from implicit casting. Maybe a better short-term solution is adding
the more obvious casts (e.g. text<->int) back in.

> Do you have links to some of the key threads showing what rationale
> went into the decision? (Or should I just search for your name?) I'd
> like to read up on that first before pouring more oil into the fire.
> I suspect that many of those who made the decision are never faced
> with needing to write cross-RDBMS code.
>
> Also, I wonder why this wasn't made a configurable option so it can
> be disabled by a simple config file change (such as the move away
> from automatic OID columns). But obviously this is the wrong list for
> discussing this (though Bioperl-db *is* one of those pieces of
> software that must be cross-RDBMS).

I did ask about that, and was told it would not have been easy to do so.
But I agree, a phasing in period (heck, even a warning) would have been
nice. Feel free to pour some oil on the fire, I think this is one of
many apps that has been affected. (I've run across two other major
cross-DB apps (Interchange and MediaWiki) that are struggling with the
same pain. I managed to painfully fix the latter, but the former is way
too complex to tackle at the moment).

I could not find the thread(s?) I weighed in on, but you can find some
relevant discussions by googling "strict-typing benefits grokbase"

- --
Greg Sabino Mullane greg at turnstep.com
PGP Key: 0x14964AC8 200803232039
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
-----BEGIN PGP SIGNATURE-----

iEYEAREDAAYFAkfm+NAACgkQvJuQZxSWSsi4ogCdGNWvCJIzXxb+YKzdm6wwxQMv
p3AAnizkWXoo/rvxv4KVdC8tD0vF87k3
=dNYi
-----END PGP SIGNATURE-----


From er at xs4all.nl  Sun Mar 23 20:45:49 2008
From: er at xs4all.nl (Erik)
Date: Mon, 24 Mar 2008 01:45:49 +0100 (CET)
Subject: [Bioperl-l] postgres 8.3 - load_seqdatabase.pl / swissprot
Message-ID: <19067.156.83.1.157.1206319549.squirrel@webmail.xs4all.nl>

On Sun, March 23, 2008 20:22, Hilmar Lapp wrote:
>
> On Mar 23, 2008, at 2:16 PM, Erik wrote:
>> Sprot loaded in 20 hours. Only 170 were rejected - not
>> too bad.
>
> That's great. Would be nice if you can provide some rough
> summary as to why they were rejected (if that's
> obvious), such as taxon errors,
> or other errors.

see

   http://bugzilla.open-bio.org/show_bug.cgi?id=2474

So I think one easy improvement will be to enlarge that 
varchar(40) column, dbxref.accession. See the following:

select
	dbname
,	accession
,	length(accession)
from 	dbxref
where accession ~ 'Cyc'
order by length(accession) desc
limit 100

patch attached. (which will probably get bug 2389 resolved)

It seems to me bioentry.accession (maybe identifier too?)
needs a similar enlargement.


thanks,

Erikjan
-------------- next part --------------
A non-text attachment was scrubbed...
Name: biosqldb-pl.sql.diff
Type: application/octet-stream
Size: 535 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20080324/a564d7ee/attachment-0003.obj>

From Russell.Smithies at agresearch.co.nz  Wed Mar 26 22:13:16 2008
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Thu, 27 Mar 2008 15:13:16 +1300
Subject: [Bioperl-l] Bioinformatician wanted
Message-ID: <D5DBA313349A4B458528BE63B387F36C06A09345@imail.agresearch.co.nz>

Dear colleagues,

 
It would be appreciated if you could bring the following position to the
attention of potential candidates.

 
Bioinformatician wanted


Many of the world's most amazing scientific discoveries are the result
of someone 'taking a closer look'. It's this inquisitive nature and
relentless search for answers that fuels scientific advancement. And
it's also what we'd like you to apply to us, right now. Mind you, you
won't need to look too hard to discover that AgResearch is the best
place to break all new ground in your career.  


At first glance you'll see we are New Zealand's largest research
institute -world leaders in pastoral research working at the
leading-edge of innovation. Scratch the surface a little more and you'll
find all the diversity and intellectual challenge a Bioinformatician
could ask for. This is a highly collaborative role where you'll be
involved in everything from the analysis of genomic data to the design,
development, implementation and testing of bioinformatics tools.
Knowledge sharing is a pivotal component of our success, so you can also
look forward to acting in a consultant capacity (both internally and
externally) and the autonomy to contribute to scientific publications. 

 
Our people are at the pinnacle in their professions, so with your
biological background, higher qualification in bioinformatics or
computing and your experience in contributing bioinformatics expertise
to research groups, you'll not only fit right in, you'll hit the ground
running. An outstanding communicator, time manager and relationship
builder, you'll also come to us with a thorough knowledge of Unix,
pipeline-development, web based technologies and scripting and
programming languages. 

 
AgResearch is a unique organisation at the forefront of our field, and
as far as your future's concerned, that makes us well worth a closer
look. There are many benefits waiting to be discovered here, so isn't it
time you experienced them? The job description is available online and
applications are invited at www.agresearch.co.nz/recruitment/ Reference
AGR661, or contact Nauman Maqbool for further information.

 
Applications close 11 April 2008.

 
Regards,
Russell


Russell Smithies 
Bioinformatics Applications Developer 
T +64 3 489 9085 
E  russell.smithies at agresearch.co.nz 
AgResearch Farming Food and Health. First 
Te Ahuwhenua, Te Kai me te Whai Ora. Tuatahi 
Invermay  Research Centre 
Puddle Alley, 
Mosgiel, 
New Zealand 
T  +64 3 489 3809   
F  +64 3 489 9174  
www.agresearch.co.nz 




From Marc.Logghe at ablynx.com  Thu Mar 27 09:26:24 2008
From: Marc.Logghe at ablynx.com (Marc Logghe)
Date: Thu, 27 Mar 2008 14:26:24 +0100
Subject: [Bioperl-l] Bio::Search::HSP::GenericHSP::seq_inds
Message-ID: <03C512635899144083CADB0EE222018901800955@alpaca.lan.ablynx.com>

Hi all,

I am a little bit confused about the above mentioned seq_inds() method.
At first, I had the impression that the method returns an array of
positions in the hsp (hit or query) sequence.

At least that is what one would expect looking at the example usage in
the HOWTOs (http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_the_methods
second code block).

Am I correct in believing you can only do this if your hsp query stretch
starts at position 1 of the query sequence?

I think seq_inds() returns a list of positions relative to the query/hit
sequence. So, the code shown in the HOWTO is a kind of special case.

However, I do not understand how seq_inds() is dealing with gaps.

An example. If you blast the worm protein ZK822.4 against swissprot
using blastp at ncbi you get this hsp as top:

 
>sp|Q5VT52|K0460_HUMAN Uncharacterized protein KIAA0460
Length=1461
 
 Score = 35.8 bits (81),  Expect = 0.48, Method: Composition-based
stats.
 Identities = 22/55 (40%), Positives = 32/55 (58%), Gaps = 3/55 (5%)
 
Query  402  IAVEEETKTTKKNKKQ-QQQANKNKNKNKKK--TTIAPEAAIDANIAAEVHTQVL  453
            +A+ E   TT K +KQ ++  NK  NK  KK  T+  P+AA+ + I AE  +Q L
Sbjct  139  VALREALSTTFKTQKQLKENLNKQPNKQWKKSQTSTNPKAALKSKIVAEFRSQAL  193

 
Now, if you call seq_inds(query => 'gap') on that particular hsp object,
you get these positions: 417, 431, 432. Obviously, there is no gap in
the original query sequence at these positions.
How should these numbers be read? Note also that 417, for instance,
is the residue just in front of the gap.

Regards,

Marc

 
From bix at sendu.me.uk  Thu Mar 27 10:46:35 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 27 Mar 2008 14:46:35 +0000
Subject: [Bioperl-l] Bio::Search::HSP::GenericHSP::seq_inds
In-Reply-To: <03C512635899144083CADB0EE222018901800955@alpaca.lan.ablynx.com>
References: <03C512635899144083CADB0EE222018901800955@alpaca.lan.ablynx.com>
Message-ID: <47EBB34B.8010606@sendu.me.uk>

Marc Logghe wrote:
> Hi all,
> 
> I am a little bit confused about the above mentioned seq_inds() method.
> At first, I had the impression that the method returns an array of
> positions in the hsp (hit or query) sequence.

Yes...


> At least that is what one would expect looking at the example usage in
> the HOWTOs (http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_the_methods
> second code block).
> 
> Am I correct in believing you can only do this if your hsp query stretch
> starts at position 1 of the query sequence?

No...


> Query  402  IAVEEETKTTKKNKKQ-QQQANKNKNKNKKK--TTIAPEAAIDANIAAEVHTQVL  453
>             +A+ E   TT K +KQ ++  NK  NK  KK  T+  P+AA+ + I AE  +Q L
> Sbjct  139  VALREALSTTFKTQKQLKENLNKQPNKQWKKSQTSTNPKAALKSKIVAEFRSQAL  193
>
> Now, if you call seq_inds(query => 'gap') on that particular hsp object,
> you get these positions: 417, 431, 432. Obviously, there is no gap in
> the original query sequence at these positions. 
> How do you have to read these numbers ? Remark also that for instance
> 417 is the res just in front of the gap.

Its purpose is to let you know the position in query or subject 
coordinates where something interesting happened in the alignment. So 
seq_inds(query => 'gap') is telling you all the places that a gap starts 
in the alignment in terms of the query coordinates. Hence 417 etc.


(Actually, does 432 make sense? Shouldn't it be 431 twice?)


From Marc.Logghe at ablynx.com  Thu Mar 27 11:09:56 2008
From: Marc.Logghe at ablynx.com (Marc Logghe)
Date: Thu, 27 Mar 2008 16:09:56 +0100
Subject: [Bioperl-l] Bio::Search::HSP::GenericHSP::seq_inds
In-Reply-To: <47EBB34B.8010606@sendu.me.uk>
References: <03C512635899144083CADB0EE222018901800955@alpaca.lan.ablynx.com>
	<47EBB34B.8010606@sendu.me.uk>
Message-ID: <03C512635899144083CADB0EE2220189018009E1@alpaca.lan.ablynx.com>

Hi Sendu, Chris

> > At least that is what one would expect looking at the example usage in
> > the HOWTOs (http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_the_methods
> > second code block).
> >
> > Am I correct in believing you can only do this if your hsp query stretch
> > starts at position 1 of the query sequence?
> 
> No...

<HOWTO>
# put all the conserved matches in query strand into an array
my @str_array = split "",$hsp->query_string;
foreach ( $hsp->seq_inds('query','conserved') ){
  push @conserved,$str_array[$_ - 1];
}
</HOWTO>

$hsp->query_string will return
'IAVEEETKTTKKNKKQ-QQQANKNKNKNKKK--TTIAPEAAIDANIAAEVHTQVL'

In my example using the 'gap' class (instead of 'conserved'), seq_inds()
returns 417, 431 and 432. The corresponding off-by-one indices (416, 430
and 431) do not exist in @str_array, which has only 55 elements.
Therefore, I still think the HOWTO shows a special case where the hsp
query sequence starts at 1 (compared to 402 in my particular example).
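
To make the point concrete, here is a minimal sketch of a lookup that
works for any start coordinate: it walks the aligned query string and
records which residue sits at which query coordinate, skipping gap
characters. The helper name residues_by_position is invented for this
example, and the start value 402 is simply taken from the report above
(in a script it would presumably come from the HSP object).

```perl
use strict;
use warnings;

# Map each query coordinate covered by the HSP to its residue, walking the
# aligned string and skipping gap characters so coordinates stay in step.
sub residues_by_position {
    my ($aligned, $start) = @_;
    my %residue_at;
    my $pos = $start;
    for my $char (split //, $aligned) {
        next if $char eq '-';          # gaps consume no query coordinate
        $residue_at{$pos++} = $char;
    }
    return \%residue_at;
}

my $query      = 'IAVEEETKTTKKNKKQ-QQQANKNKNKNKKK--TTIAPEAAIDANIAAEVHTQVL';
my $residue_at = residues_by_position($query, 402);

# Look up residues by query coordinate rather than by array index.
print join(' ', map { "$_=$residue_at->{$_}" } 402, 417, 418, 431, 432, 453), "\n";
```

With such a map, positions returned in query coordinates can be used
directly as hash keys, with no assumption that the HSP starts at 1.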


> 
> 
> > Query  402  IAVEEETKTTKKNKKQ-QQQANKNKNKNKKK--TTIAPEAAIDANIAAEVHTQVL  453
> >             +A+ E   TT K +KQ ++  NK  NK  KK  T+  P+AA+ + I AE  +Q L
> > Sbjct  139  VALREALSTTFKTQKQLKENLNKQPNKQWKKSQTSTNPKAALKSKIVAEFRSQAL  193
> >
> > Now, if you call seq_inds(query => 'gap') on that particular hsp object,
> > you get these positions: 417, 431, 432. Obviously, there is no gap in
> > the original query sequence at these positions.
> > How do you have to read these numbers ? Remark also that for instance
> > 417 is the res just in front of the gap.
>
> Its purpose is to let you know the position in query or subject
> coordinates where something interesting happened in the alignment. So
> seq_inds(query => 'gap') is telling you all the places that a gap starts
> in the alignment in terms of the query coordinates. Hence 417 etc.

So this means it should be read as: a gap comes after 417?

> 
> 
> (Actually, does 432 make sense? Shouldn't it be 431 twice?)
Don't know, depends on how you have to 'read' this.
Thanks for looking into this.
Regards,
Marc


From cjfields at uiuc.edu  Thu Mar 27 11:05:59 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 27 Mar 2008 10:05:59 -0500
Subject: [Bioperl-l] Bio::Search::HSP::GenericHSP::seq_inds
In-Reply-To: <03C512635899144083CADB0EE222018901800955@alpaca.lan.ablynx.com>
References: <03C512635899144083CADB0EE222018901800955@alpaca.lan.ablynx.com>
Message-ID: <489252B3-2255-45C3-9219-A8F8A0978B89@uiuc.edu>

According to the GenericHSP::seq_inds() POD, seq_inds() reports
residue positions (indices) for the query/subject based on
identity/conservation, i.e. these are the original sequence positions as
determined by the HSP data, not alignment column positions.  'gaps'
should be reported at the position prior to where a gap is inserted.
However, I think something is getting borked when the gap length is
longer than one, so I would partially qualify this as a bug.

Example: When I ran this using bioperl-live it gives a different set  
of gaps indices which appear to be correct.  I reran the BLASTP using  
the web form using your query against swissprot and parsed it.  I got  
slightly different results for the BLAST report (probably differences  
in the query sequence):

 >gi|74746888|sp|Q5VT52|K0460_HUMAN Uncharacterized protein KIAA0460
Length=1461

  Score = 35.8 bits (81),  Expect = 0.47, Method: Composition-based  
stats.
  Identities = 22/55 (40%), Positives = 32/55 (58%), Gaps = 3/55 (5%)

Query  394  IAVEEETKTTKKNKKQ-QQQANKNKNKNKKK--TTIAPEAAIDANIAAEVHTQVL  445
             +A+ E   TT K +KQ ++  NK  NK  KK  T+  P+AA+ + I AE  +Q L
Sbjct  139  VALREALSTTFKTQKQLKENLNKQPNKQWKKSQTSTNPKAALKSKIVAEFRSQAL  193

.....

seq_inds('query' => 'gaps') reports 409,423, and 424, which is  
partially correct, e.g. there is a gap inserted after position 409 and  
423 in the query.  However, no gap is present after 424; I think this  
occurs b/c the gap length is 2.  The other HSPs report similar problems.

chris

P.S. Just saw that Sendu posted; I agree, seq. positions with gap
lengths > 1 should be repeated.  Should be easy to fix that.

On Mar 27, 2008, at 8:26 AM, Marc Logghe wrote:

> Hi all,
>
> I am a little bit confused about the above mentioned seq_inds()  
> method.
> At first, I had the impression that the method returns an array of
> positions in the hsp (hit or query) sequence.
>
> At least that is what one would expect looking at the example usage in
> the HOWTOs (http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_the_methods
> second code block).
>
> Am I correct in believing you can only do this if your hsp query  
> stretch
> starts at position 1 of the query sequence?
>
> I think seq_inds() returns a list of positions relative to the query/ 
> hit
> sequence. So, the code shown in the HOWTO is a kind of special case.
>
> However, I do not understand how seq_inds() is dealing with gaps.
>
> An example. If you blast the worm protein ZK822.4 against swissprot
> using blastp at ncbi you get this hsp as top:
>
>
>
>> sp|Q5VT52|K0460_HUMAN Uncharacterized protein KIAA0460
> Length=1461
>
> Score = 35.8 bits (81),  Expect = 0.48, Method: Composition-based
> stats.
> Identities = 22/55 (40%), Positives = 32/55 (58%), Gaps = 3/55 (5%)
>
> Query  402  IAVEEETKTTKKNKKQ-QQQANKNKNKNKKK--TTIAPEAAIDANIAAEVHTQVL  453
>             +A+ E   TT K +KQ ++  NK  NK  KK  T+  P+AA+ + I AE  +Q L
> Sbjct  139  VALREALSTTFKTQKQLKENLNKQPNKQWKKSQTSTNPKAALKSKIVAEFRSQAL  193
>
>
>
> Now, if you call seq_inds(query => 'gap') on that particular hsp  
> object,
> you get these positions: 417, 431, 432. Obviously, there is no gap in
> the original query sequence at these positions.
> How do you have to read these numbers ? Remark also that for instance
> 417 is the res just in front of the gap.
>
> Regards,
>
> Marc
>
>
>
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Thu Mar 27 12:04:20 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 27 Mar 2008 11:04:20 -0500
Subject: [Bioperl-l] Bio::Search::HSP::GenericHSP::seq_inds
In-Reply-To: <03C512635899144083CADB0EE2220189018009E1@alpaca.lan.ablynx.com>
References: <03C512635899144083CADB0EE222018901800955@alpaca.lan.ablynx.com>
	<47EBB34B.8010606@sendu.me.uk>
	<03C512635899144083CADB0EE2220189018009E1@alpaca.lan.ablynx.com>
Message-ID: <FF47D72B-7CBA-491B-9C9B-9B32FD2AA8BE@uiuc.edu>


On Mar 27, 2008, at 10:09 AM, Marc Logghe wrote:

> Hi Sendu, Chris
>
>>> At least that is what one would expect looking at the example usage in
>>> the HOWTOs (http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_the_methods
>>> second code block).
>>>
>>> Am I correct in believing you can only do this if your hsp query stretch
>>> starts at position 1 of the query sequence?
>>
>> No...
>
> <HOWTO>
> # put all the conserved matches in query strand into an array
> my @str_array = split "",$hsp->query_string;
> foreach ( $hsp->seq_inds('query','conserved') ){
>  push @conserved,$str_array[$_ - 1];
> }
> </HOWTO>
>
> $hsp->query_string will return
> 'IAVEEETKTTKKNKKQ-QQQANKNKNKNKKK--TTIAPEAAIDANIAAEVHTQVL'
>
> In my example using the 'gap' class (instead of 'conserved'),  
> @str_array
> will contain 417, 431 and 432. The off-by-one indices do not exist in
> that array.
> Therefore, I still think the howto shows a special case where the hsp
> query sequence starts at 1 (compared to 402 in my particular example).

We'll have to look at it; it should probably be clarified particularly  
in reference to 'gaps' and use of seq positions vs. HSP (or alignment)  
positions.

Think of it this way; seq_inds() takes 'identical', 'conserved', etc.,  
all of which refer to the original positions (indices) of the sequence  
which fall into the particular category asked for.  In these cases we  
are using the coordinates for query/hit directly from the HSP info in  
the report.  This is done with the express purpose of mapping  
attributes back to the original sequence, be it the query or subject.

Gaps, however, are tricky, since sequence coordinates refer to  
residues (not gaps) when using BLAST.  In this case we use the  
sequence position prior to the gap to note where a gap is inserted.   
The previous results, then, would be wrong as there is no gap inserted  
after 432.  I just committed a fix which just repeats the position  
based on the number of gaps.
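
The convention described above can be sketched in a few lines of plain
Perl. This is not the code committed to bioperl-live, only an
illustration of the rule "report the residue before the gap, once per gap
character"; the function name gap_positions is invented, and the aligned
string and start coordinate are the ones from Marc's report.

```perl
use strict;
use warnings;

# For every gap character in the aligned string, report the coordinate of
# the residue immediately before it; a run of N gaps repeats it N times.
sub gap_positions {
    my ($aligned, $start) = @_;
    my @positions;
    my $last = $start - 1;             # coordinate of the last residue seen
    for my $char (split //, $aligned) {
        if ($char eq '-') { push @positions, $last }
        else              { $last++ }
    }
    return @positions;
}

my $query = 'IAVEEETKTTKKNKKQ-QQQANKNKNKNKKK--TTIAPEAAIDANIAAEVHTQVL';
print join(', ', gap_positions($query, 402)), "\n";   # 417, 431, 431
```

Called with a start of 394 (the coordinates in my rerun of the search),
the same function gives 409, 423, 423.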

>>> Query  402  IAVEEETKTTKKNKKQ-QQQANKNKNKNKKK--TTIAPEAAIDANIAAEVHTQVL  453
>>>             +A+ E   TT K +KQ ++  NK  NK  KK  T+  P+AA+ + I AE  +Q L
>>> Sbjct  139  VALREALSTTFKTQKQLKENLNKQPNKQWKKSQTSTNPKAALKSKIVAEFRSQAL  193
>>>
>>> Now, if you call seq_inds(query => 'gap') on that particular hsp object,
>>> you get these positions: 417, 431, 432. Obviously, there is no gap in
>>> the original query sequence at these positions.
>>> How do you have to read these numbers ? Remark also that for instance
>>> 417 is the res just in front of the gap.
>>
>> Its purpose is to let you know the position in query or subject
>> coordinates where something interesting happened in the alignment. So
>> seq_inds(query => 'gap') is telling you all the places that a gap starts
>> in the alignment in terms of the query coordinates. Hence 417 etc.
>
> So, this means you have to interpret that as a gap is coming after 417 ?

Yes.

>> (Actually, does 432 make sense? Shouldn't it be 431 twice?)
> Don't know, depends on how you have to 'read' this.
> Thanks for looking into this.
> Regards,
> Marc

Repeating the position based on the number of gaps is now the default  
in bioperl-live.  Just working on fixing problems with collapsing  
numbers and tests and everything should be fine.

chris


From hiekeen at gmail.com  Sat Mar 29 12:09:18 2008
From: hiekeen at gmail.com (Jinyan Huang)
Date: Sun, 30 Mar 2008 00:09:18 +0800
Subject: [Bioperl-l] Gene Id converts.
Message-ID: <fb5dae380803290909o6523f71hc479570846594f3@mail.gmail.com>

Hi,

I have a list of GenBank accession IDs. I want to convert these IDs to NCBI IDs.

For example:

From NM_011917 to 2919914.

How can I do it?

Thanks


-- 
Best regards,
Jinyan Huang (ekeen)
School of Life Sciences and Technology, 1302 Room
Tongji University
Siping Road 1239, Shanghai 200092
P.R. China
Tel :0086-21-65981041
Msn: hiekeen at hotmail.com
eMail: hiekeen at gmail.com


From cjfields at uiuc.edu  Sat Mar 29 13:42:50 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 29 Mar 2008 12:42:50 -0500
Subject: [Bioperl-l] Gene Id converts.
In-Reply-To: <fb5dae380803290909o6523f71hc479570846594f3@mail.gmail.com>
References: <fb5dae380803290909o6523f71hc479570846594f3@mail.gmail.com>
Message-ID: <FC7E8E14-83E7-4611-868B-0D1390488C82@uiuc.edu>

There are the GenBank LiveLists (updated every Sunday), which have
accession/version/UID mappings for nuc and protein GenBank records.  I
haven't used them personally but they're worth a look:

ftp://ftp.ncbi.nih.gov/genbank/livelists/

There is also gene2accession, which contains mappings between
accession and UID (though this is more EntrezGene-related, I believe):

ftp://ftp.ncbi.nih.gov/gene/DATA/

Both have documentation detailing formats.  I would recommend using  
one of the above two on a local database setup if you plan on  
converting a large number of accessions.
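
As a starting point for the local route, here is a minimal sketch that
builds an accession-to-GI lookup from tab-delimited lines. The column
layout is an assumption (accession.version in column 3 and GI in column 4,
counting from zero), so check it against the README that ships with the
file; the rows in the example are illustrative placeholders, not real
NCBI data, and the function name build_accession_to_gi is made up.

```perl
use strict;
use warnings;

# Build an accession -> GI lookup from tab-delimited lines. The column
# indices are parameters because the layout assumed by the caller below
# should be verified against the README for the file being parsed.
sub build_accession_to_gi {
    my ($lines, $acc_col, $gi_col) = @_;
    my %gi_for;
    for my $line (@$lines) {
        next if $line =~ /^#/;                 # skip header/comment lines
        chomp(my $copy = $line);
        my @field = split /\t/, $copy;
        my ($acc, $gi) = @field[$acc_col, $gi_col];
        next unless defined $acc && defined $gi && $acc ne '-' && $gi ne '-';
        (my $bare = $acc) =~ s/\.\d+$//;       # drop the ".version" suffix
        $gi_for{$bare} = $gi;
    }
    return \%gi_for;
}

# Illustrative rows only; the pair NM_011917/2919914 is simply the example
# from the question, and the other fields are placeholders.
my @rows = (
    "#tax_id\tGeneID\tstatus\tRNA_nucleotide_accession.version\tRNA_nucleotide_gi\n",
    "0\t0\t-\tNM_011917.1\t2919914\n",
    "0\t0\t-\t-\t-\n",
);
my $gi_for = build_accession_to_gi(\@rows, 3, 4);
print "NM_011917 -> $gi_for->{NM_011917}\n";
```

For the full file one would read line by line from a filehandle rather
than hold all rows in an array, or load the columns into a database table.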

Bio::DB::EUtilities can also do this but is web-based via eutils.
There are a couple of stub examples in the Cookbook HOWTO under
'efetch' on converting accessions to UID (and vice versa), though note
there is no one-to-one correspondence.  You can also convert UIDs to
accessions using 'esummary', but the converse (accession to GI)
requires, strangely, using efetch to grab the UIDs first, then
re-retrieving the accessions via esummary for one-to-one correspondence.

http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook

chris

On Mar 29, 2008, at 11:09 AM, Jinyan Huang wrote:

> Hi,
>
> I have a list of gene bank accession id. I want to convert these ids  
> to NCBI id.
>
> For example:
>
> From NM_011917 to 2919914.
>
> How can I do it?
>
> Thanks
>
>
> -- 
> Best regards,
> Jinyan Huang (ekeen)
> School of Life Sciences and Technology, 1302 Room
> Tongji University
> Siping Road 1239, Shanghai 200092
> P.R. China
> Tel :0086-21-65981041
> Msn: hiekeen at hotmail.com
> eMail: hiekeen at gmail.com

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From pradel.d at gmail.com  Mon Mar 31 10:35:26 2008
From: pradel.d at gmail.com (Damien Pradel)
Date: Mon, 31 Mar 2008 16:35:26 +0200
Subject: [Bioperl-l] Bio::SeqIO - Error in ID value collection
Message-ID: <7db1df730803310735n5777aa08s50d763a62c74d050@mail.gmail.com>

Hello,

I use the SeqIO module to parse EMBL files.
Unfortunately I ran into a problem: the ID was not recognised, and instead of
the ID value I got "unknown_id".

To solve this, I have modified line 189 of the file embl.pm (located in
the SeqIO directory), changing:

if( $line =~ /^ID\s+(\S+)\s+\S+\;\s+([^;]+)\;\s+(\S+)\;/ ) {
       ($name,$mol,$div) = ($1,$2,$3);
   }
   unless( defined $name && length($name) ) {
       $name = "unknown_id";
   }

to:
if( $line =~ /^ID\s+(.+?)\;\s+([^;]+)\;\s+(\S+)\;/ ) {
       ($name,$mol,$div) = ($1,$2,$3);
   }
   unless( defined $name && length($name) ) {
       $name = "unknown_id";
   }

With this modification, the ID value is correctly collected.

Hope it will help.

Damien


From golharam at umdnj.edu  Mon Mar 31 15:31:56 2008
From: golharam at umdnj.edu (Ryan Golhar)
Date: Mon, 31 Mar 2008 15:31:56 -0400
Subject: [Bioperl-l] Very basic implementation of GenBank XML SeqIO module
Message-ID: <47F13C2C.4070909@umdnj.edu>

I have a (very) basic SAX implementation of a SeqIO module to parse 
GenBank XML records.  Right now, it only reads in basic information 
regarding the sequence and the sequence itself.

It does not yet parse the features table.  Should I submit it to be 
included in bioperl or wait until I implement more for the features 
table?  I'm not sure when I'll get around to it though

Ryan


From cjfields at uiuc.edu  Mon Mar 31 16:05:51 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 31 Mar 2008 15:05:51 -0500
Subject: [Bioperl-l] Very basic implementation of GenBank XML SeqIO
	module
In-Reply-To: <47F13C2C.4070909@umdnj.edu>
References: <47F13C2C.4070909@umdnj.edu>
Message-ID: <4A3D5CD8-13D7-4CBF-B89A-CE81B8804C61@uiuc.edu>

You can submit it either to me directly or to bugzilla (start a new  
bug report as an enhancement request, then attach the relevant files).

Does it have a test suite available?  If not, you should try setting  
one up:

http://www.bioperl.org/wiki/HOWTO:Writing_BioPerl_Tests

chris

On Mar 31, 2008, at 2:31 PM, Ryan Golhar wrote:

> I have a (very) basic SAX implementation of a SeqIO module to parse  
> GenBank XML records.  Right now, it only reads in basic information  
> regarding the sequence and the sequence itself.
>
> It does not yet parse the features table.  Should I submit it to be  
> included in bioperl or wait until I implement more for the features  
> table?  I'm not sure when I'll get around to it though
>
> Ryan


From cjfields at uiuc.edu  Mon Mar 31 19:58:44 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 31 Mar 2008 18:58:44 -0500
Subject: [Bioperl-l] Bio::SeqIO - Error in ID value collection
In-Reply-To: <7db1df730803310735n5777aa08s50d763a62c74d050@mail.gmail.com>
References: <7db1df730803310735n5777aa08s50d763a62c74d050@mail.gmail.com>
Message-ID: <629B8FAF-3A1F-41E2-BFF1-A709DDE56A09@uiuc.edu>

The parser no longer has this line; it has been updated to work with  
both old and new format EMBL.  You might want to try updating from  
Subversion or install the nightly build.

http://bioperl.org/DIST/nightly_builds/
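
To illustrate why a single pattern struggles with both layouts, the
sketch below runs the two regular expressions quoted in this thread
against one ID line of each kind. The two sample ID lines are assumptions
chosen to represent the older and newer EMBL layouts, and the helper
id_name is invented for the example; this is not the logic now used in
embl.pm.

```perl
use strict;
use warnings;

# The two patterns from this thread: the one originally in embl.pm and the
# proposed replacement.
my %pattern = (
    original => qr/^ID\s+(\S+)\s+\S+\;\s+([^;]+)\;\s+(\S+)\;/,
    proposed => qr/^ID\s+(.+?)\;\s+([^;]+)\;\s+(\S+)\;/,
);

# Assumed example ID lines in the older and the newer EMBL layout.
my %id_line = (
    old_format => 'ID   TRBG361    standard; RNA; PLN; 1859 BP.',
    new_format => 'ID   X56734; SV 1; linear; mRNA; STD; PLN; 1859 BP.',
);

# Return the captured name, or "unknown_id" when the pattern does not match.
sub id_name {
    my ($regex, $line) = @_;
    my ($name) = $line =~ $regex;
    return defined($name) && length($name) ? $name : 'unknown_id';
}

for my $format (sort keys %id_line) {
    for my $which (sort keys %pattern) {
        printf "%-10s %-8s => '%s'\n", $format, $which,
            id_name($pattern{$which}, $id_line{$format});
    }
}
```

On these sample lines the original pattern fails on the new layout (hence
"unknown_id"), while the proposed one finds the new-layout name but also
swallows the "standard" token in the old layout, so neither covers both
cases on its own.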

chris

On Mar 31, 2008, at 9:35 AM, Damien Pradel wrote:
> Hello,
>
> I use the SeqIO module in order to  parse EMBL files.
> Unfortunately I got a problem: the ID was not recognised because  
> instead of
> the ID value  I get the answer "unknown_id" ...
>
> So to solve this problem I have modified the file embl.pm located in
> directory SeqIO at the line 189 as follow :
>
> if( $line =~ /^ID\s+(\S+)\s+\S+\;\s+([^;]+)\;\s+(\S+)\;/ ) {
>       ($name,$mol,$div) = ($1,$2,$3);
>   }
>   unless( defined $name && length($name) ) {
>       $name = "unknown_id";
>   }
>
> in :
> if( $line =~ /^ID\s+(.+?)\;\s+([^;]+)\;\s+(\S+)\;/ ) {
>       ($name,$mol,$div) = ($1,$2,$3);
>   }
>   unless( defined $name && length($name) ) {
>       $name = "unknown_id";
>   }
>
> With this modification, the ID value is correctly collected.
>
> Hope it will help.
>
> Damien

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From dfog22 at hotmail.com  Wed Mar 26 10:13:25 2008
From: dfog22 at hotmail.com (MathGon)
Date: Wed, 26 Mar 2008 07:13:25 -0700 (PDT)
Subject: [Bioperl-l]  File concatenation
Message-ID: <16301515.post@talk.nabble.com>


For my first post, I will introduce myself. I'm a PhD student in microbiology
focusing on horizontal gene transfer in hyperthermophilic Archaea.

I retrieved a GenBank file for each contig of an unfinished genome. I want to
produce a single GenBank file by concatenating them. I couldn't find
such a script and I'm not experienced enough in Perl to write one...
Do you have another solution or a script I could use?

Best regards...
-- 
View this message in context: http://www.nabble.com/File-concatenation-tp16301515p16301515.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
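
One possible starting point for the question above, as a minimal sketch
in plain Perl (no BioPerl needed). It relies on the fact that each
GenBank record ends with a "//" line, so appending the files one after
another gives a single multi-record file. It does not merge the contigs
into one sequence, which would be a different task. The sub name and the
file names in the usage line are made up for this example.

```perl
use strict;
use warnings;

# Append every input file to $out_path, in the order given.
# Returns the number of record terminators ("//" lines) written.
sub concat_records {
    my ($out_path, @in_paths) = @_;
    my $records = 0;
    open my $out, '>', $out_path or die "Cannot write $out_path: $!";
    for my $path (@in_paths) {
        open my $in, '<', $path or die "Cannot read $path: $!";
        while (my $line = <$in>) {
            # Guard against a file whose last line lacks a newline.
            $line .= "\n" unless $line =~ /\n\z/;
            $records++ if $line =~ m{^//\s*$};
            print {$out} $line;
        }
        close $in;
    }
    close $out or die "Cannot close $out_path: $!";
    return $records;
}

# Usage: perl concat_gbk.pl combined.gbk contig1.gbk contig2.gbk ...
if (@ARGV >= 2) {
    my ($out_path, @in_paths) = @ARGV;
    my $count = concat_records($out_path, @in_paths);
    print "Wrote $count record(s) to $out_path\n";
}
```

The resulting file can then be read record by record, for example with
Bio::SeqIO and its next_seq loop.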


From anjan.purkayastha at gmail.com  Mon Mar  3 17:31:11 2008
From: anjan.purkayastha at gmail.com (ANJAN PURKAYASTHA)
Date: Mon, 3 Mar 2008 12:31:11 -0500
Subject: [Bioperl-l] problem with Bio::Tools::EMBOSS
Message-ID: <f03f40920803030931r7c97159j2d6a4b74e2917184@mail.gmail.com>

hi
i tried to use the perl wrappers for EMBOSS with:

use lib "/Users/anjan/perl_directory/bioperl-1.5.2_102/";
use Bio::Factory::EMBOSS;

however it seems that Bio::Factory::EMBOSS cannot be found in the bioperl
directory mentioned above.

so i tried to install Bio::Factory::EMBOSS from the cpan website. i got the
attached error message.

any ideas on what i need to do to  make this work?
all advice will be appreciated.

tia,

anjan


-- 
ANJAN PURKAYASTHA, PhD.
Senior Computational Biologist
==========================

1101 King Street, Suite 310,
Alexandria, VA 22314.
703.518.8040 (office)
703.740.6939 (mobile)

email:
anjan at vbi.vt.edu;
anjan.purkayastha at gmail.com

http://www.vbi.vt.edu

==========================
-------------- next part --------------
A non-text attachment was scrubbed...
Name: emboss_install_error_message.rtf
Type: application/rtf
Size: 123212 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20080303/08ee41b2/attachment-0004.rtf>

From cjfields at uiuc.edu  Mon Mar  3 18:54:06 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 3 Mar 2008 12:54:06 -0600
Subject: [Bioperl-l] problem with Bio::Tools::EMBOSS
In-Reply-To: <f03f40920803030931r7c97159j2d6a4b74e2917184@mail.gmail.com>
References: <f03f40920803030931r7c97159j2d6a4b74e2917184@mail.gmail.com>
Message-ID: <AC7CFEBA-5699-45A0-9426-8885AEA78216@uiuc.edu>

You'll need to install bioperl-run.  Bio::Factory::EMBOSS is in  
bioperl-run, not the main bioperl distribution (aka bioperl-core).

chris

On Mar 3, 2008, at 11:31 AM, ANJAN PURKAYASTHA wrote:

> hi
> i tried to use the perl wrappers for EMBOSS with:
>
> use lib "/Users/anjan/perl_directory/bioperl-1.5.2_102/";
> use Bio::Factory::EMBOSS;
>
> however it seems that Bio::Factory::EMBOSS cannot be found in the  
> bioperl
> directory mentioned above.
>
> so i tried to install Bio::Factory::EMBOSS from the cpan website. i  
> got the
> attached error message.
>
> any ideas on what i need to do to  make this work?
> all advice will be appreciated.
>
> tia,
>
> anjan
>
>
> -- 
> ANJAN PURKAYASTHA, PhD.
> Senior Computational Biologist
> ==========================
>
> 1101 King Street, Suite 310,
> Alexandria, VA 22314.
> 703.518.8040 (office)
> 703.740.6939 (mobile)
>
> email:
> anjan at vbi.vt.edu;
> anjan.purkayastha at gmail.com
>
> http://www.vbi.vt.edu
>
> ==========================
> < 
> emboss_install_error_message 
> .rtf>_______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From David.Messina at sbc.su.se  Mon Mar  3 19:34:20 2008
From: David.Messina at sbc.su.se (Dave Messina)
Date: Mon, 3 Mar 2008 20:34:20 +0100
Subject: [Bioperl-l] problem with Bio::Tools::EMBOSS
In-Reply-To: <f03f40920803030931r7c97159j2d6a4b74e2917184@mail.gmail.com>
References: <f03f40920803030931r7c97159j2d6a4b74e2917184@mail.gmail.com>
Message-ID: <628aabb70803031134g3263e94fge7131f8862434b23@mail.gmail.com>

Hi Anjan,

Bio::Factory::EMBOSS is not part of the BioPerl core distribution, but
rather part of bioperl-run. For some reason CPAN went for the old (1.4)
version of bioperl-run rather than the current 1.5.2.

And indeed, I seem to run into the same problem:
cpan> d /bioperl/

Distribution    BIRNEY/bioperl-1.2.1.tar.gz
Distribution    BIRNEY/bioperl-1.2.2.tar.gz
Distribution    BIRNEY/bioperl-1.2.3.tar.gz
Distribution    BIRNEY/bioperl-1.2.tar.gz
Distribution    BIRNEY/bioperl-1.4.tar.gz
Distribution    BIRNEY/bioperl-db-0.1.tar.gz
Distribution    BIRNEY/bioperl-ext-1.4.tar.gz
Distribution    BIRNEY/bioperl-gui-0.7.tar.gz
Distribution    BIRNEY/bioperl-run-1.2.2.tar.gz
Distribution    BIRNEY/bioperl-run-1.4.tar.gz
Distribution    BOZO/Fry-Lib-BioPerl-0.15.tar.gz
Distribution    CRAFFI/Bundle-BioPerl-2.1.8.tar.gz
12 items found

but when I ask in a different way the right distributions show up. [Sendu,
any idea what's going on here?]

cpan> ls
SENDU
 5919092 2007-02-14 SENDU/bioperl-1.5.2_102.tar.gz
  320154 2006-12-06 SENDU/bioperl-db-1.5.2_100.tar.gz
   99082 2006-12-06 SENDU/bioperl-network-1.5.2_100.tar.gz
  942093 2006-12-06 SENDU/bioperl-run-1.5.2_100.tar.gz

So try doing

cpan> install SENDU/bioperl-run-1.5.2_100.tar.gz

Or if CPAN refuses to cooperate, you can grab it from here:
http://www.bioperl.org/wiki/Getting_BioPerl#Bioperl_1.5.2.2C_Developer_Release


Dave


From arareko at campus.iztacala.unam.mx  Mon Mar  3 19:25:14 2008
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Mon, 03 Mar 2008 13:25:14 -0600
Subject: [Bioperl-l] problem with Bio::Tools::EMBOSS
In-Reply-To: <f03f40920803030931r7c97159j2d6a4b74e2917184@mail.gmail.com>
References: <f03f40920803030931r7c97159j2d6a4b74e2917184@mail.gmail.com>
Message-ID: <47CC509A.10306@campus.iztacala.unam.mx>

Hi Anjan,

It looks like you are using the latest BioPerl developer release
(bioperl-1.5.2_102) from CPAN, to have Bio::Factory::EMBOSS available
then you should try installing the latest BioPerl-run as well
(bioperl-run-1.5.2_100). After you install it, you'll have to modify
your 'use lib' pragma for your script to work as you expect:

use lib "/Users/anjan/perl_directory/bioperl-run-1.5.2_100/";
use Bio::Factory::EMBOSS;

Hope this helps.

Regards,
Mauricio.


ANJAN PURKAYASTHA wrote:
> hi
> i tried to use the perl wrappers for EMBOSS with:
> 
> use lib "/Users/anjan/perl_directory/bioperl-1.5.2_102/";
> use Bio::Factory::EMBOSS;
> 
> however it seems that Bio::Factory::EMBOSS cannot be found in the bioperl
> directory mentioned above.
> 
> so i tried to install Bio::Factory::EMBOSS from the cpan website. i got the
> attached error message.
> 
> any ideas on what i need to do to  make this work?
> all advice will be appreciated.
> 
> tia,
> 
> anjan
> 
> 
> 
> 
> ------------------------------------------------------------------------
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM
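
A quick way to check which directory a module is actually loaded from,
as a sketch. The two paths are the ones mentioned in this thread.
Listing both is an assumption on my part: Bio::Factory::EMBOSS
presumably still needs the core modules, so the core directory is kept
visible as well. module_path() is a helper invented for this example.

```perl
use strict;
use warnings;

# Directories to search in addition to the default @INC.
# Adjust these to your own layout.
use lib '/Users/anjan/perl_directory/bioperl-1.5.2_102/';
use lib '/Users/anjan/perl_directory/bioperl-run-1.5.2_100/';

# Return the file a module was loaded from, or undef if it cannot
# be loaded at all.
sub module_path {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;
    return eval { require $file; 1 } ? $INC{$file} : undef;
}

for my $module (qw(Bio::Root::Root Bio::Factory::EMBOSS)) {
    my $path = module_path($module);
    printf "%-22s %s\n", $module, defined $path ? $path : 'NOT FOUND';
}
```

If Bio::Root::Root is found but Bio::Factory::EMBOSS is not, the core
distribution is visible and bioperl-run is not.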


From cjfields at uiuc.edu  Mon Mar  3 20:05:16 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 3 Mar 2008 14:05:16 -0600
Subject: [Bioperl-l] problem with Bio::Tools::EMBOSS
In-Reply-To: <628aabb70803031134g3263e94fge7131f8862434b23@mail.gmail.com>
References: <f03f40920803030931r7c97159j2d6a4b74e2917184@mail.gmail.com>
	<628aabb70803031134g3263e94fge7131f8862434b23@mail.gmail.com>
Message-ID: <43EC247B-EC01-483D-82B1-D861590A141A@uiuc.edu>


On Mar 3, 2008, at 1:34 PM, Dave Messina wrote:

> Hi Anjan,
>
> Bio::Factory::EMBOSS is not part of the BioPerl core distribution, but
> rather part of bioperl-run. For some reason CPAN went for the old  
> (1.4)
> version of bioperl-run rather than the current 1.5.2.
>
> And indeed, I seem to run into the same problem:
> cpan> d /bioperl/
>
> Distribution    BIRNEY/bioperl-1.2.1.tar.gz
> Distribution    BIRNEY/bioperl-1.2.2.tar.gz
> Distribution    BIRNEY/bioperl-1.2.3.tar.gz
> Distribution    BIRNEY/bioperl-1.2.tar.gz
> Distribution    BIRNEY/bioperl-1.4.tar.gz
> Distribution    BIRNEY/bioperl-db-0.1.tar.gz
> Distribution    BIRNEY/bioperl-ext-1.4.tar.gz
> Distribution    BIRNEY/bioperl-gui-0.7.tar.gz
> Distribution    BIRNEY/bioperl-run-1.2.2.tar.gz
> Distribution    BIRNEY/bioperl-run-1.4.tar.gz
> Distribution    BOZO/Fry-Lib-BioPerl-0.15.tar.gz
> Distribution    CRAFFI/Bundle-BioPerl-2.1.8.tar.gz
> 12 items found
>
> but when I ask in a different way the right distributions show up.  
> [Sendu,
> any idea what's going on here?]

It's marked as a developer release, which I think requires a full path  
(as you have below) and not just the package name.

chris

> cpan> ls
> SENDU
> 5919092 2007-02-14 SENDU/bioperl-1.5.2_102.tar.gz
>  320154 2006-12-06 SENDU/bioperl-db-1.5.2_100.tar.gz
>   99082 2006-12-06 SENDU/bioperl-network-1.5.2_100.tar.gz
>  942093 2006-12-06 SENDU/bioperl-run-1.5.2_100.tar.gz
>
> So try doing
>
> cpan> install SENDU/bioperl-run-1.5.2_100.tar.gz
>
> Or if CPAN refuses to cooperate, you can grab it from here:
> http://www.bioperl.org/wiki/Getting_BioPerl#Bioperl_1.5.2.2C_Developer_Release
>
>
> Dave


From anjan.purkayastha at gmail.com  Mon Mar  3 19:57:33 2008
From: anjan.purkayastha at gmail.com (ANJAN PURKAYASTHA)
Date: Mon, 3 Mar 2008 14:57:33 -0500
Subject: [Bioperl-l] problem with Bio::Tools::EMBOSS
In-Reply-To: <47CC509A.10306@campus.iztacala.unam.mx>
References: <f03f40920803030931r7c97159j2d6a4b74e2917184@mail.gmail.com>
	<47CC509A.10306@campus.iztacala.unam.mx>
Message-ID: <f03f40920803031157oda9e0ffubc3bfb9070022238@mail.gmail.com>

guys,
thanks! i got bioperl-run to work.
next question, let's say i want to run the palindrome program in emboss
using the bioperl wrapper.   now, palindrome takes in a list of parameter
values- these are fed into emboss as a key-value hash. where do i find the
correct names of the keys to create the input hash?
tia.

anjan

On Mon, Mar 3, 2008 at 2:25 PM, Mauricio Herrera Cuadra <
arareko at campus.iztacala.unam.mx> wrote:

> Hi Anjan,
>
> It looks like you are using the latest BioPerl developer release
> (bioperl-1.5.2_102) from CPAN, to have Bio::Factory::EMBOSS available
> then you should try installing the latest BioPerl-run as well
> (bioperl-run-1.5.2_100). After you install it, you'll have to modify
> your 'use lib' pragma for your script to work as you expect:
>
> use lib "/Users/anjan/perl_directory/bioperl-run-1.5.2_100/";
> use Bio::Factory::EMBOSS;
>
> Hope this helps.
>
> Regards,
> Mauricio.
>
>
> ANJAN PURKAYASTHA wrote:
> > hi
> > i tried to use the perl wrappers for EMBOSS with:
> >
> > use lib "/Users/anjan/perl_directory/bioperl-1.5.2_102/";
> > use Bio::Factory::EMBOSS;
> >
> > however it seems that Bio::Factory::EMBOSS cannot be found in the
> bioperl
> > directory mentioned above.
> >
> > so i tried to install Bio::Factory::EMBOSS from the cpan website. i got
> the
> > attached error message.
> >
> > any ideas on what i need to do to  make this work?
> > all advice will be appreciated.
> >
> > tia,
> >
> > anjan
> >
> >
> >
> >
> > ------------------------------------------------------------------------
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> --
> MAURICIO HERRERA CUADRA
> arareko at campus.iztacala.unam.mx
> Laboratorio de Genética
> Unidad de Morfofisiología y Función
> Facultad de Estudios Superiores Iztacala, UNAM
>
>
>


-- 
ANJAN PURKAYASTHA, PhD.
Senior Computational Biologist
==========================

1101 King Street, Suite 310,
Alexandria, VA 22314.
703.518.8040 (office)
703.740.6939 (mobile)

email:
anjan at vbi.vt.edu;
anjan.purkayastha at gmail.com

http://www.vbi.vt.edu

==========================


From Daniel.Gerlach at medecine.unige.ch  Tue Mar  4 08:48:15 2008
From: Daniel.Gerlach at medecine.unige.ch (Daniel Gerlach)
Date: Tue, 04 Mar 2008 09:48:15 +0100
Subject: [Bioperl-l] Bio::TreeIO rises error "Weak references are not
 implemented in the version of perl"
Message-ID: <47CD0CCF.4060306@medecine.unige.ch>

Hello,

Trying to run Bio::TreeIO by this command:

perl -e 'use Bio::TreeIO'

I get the following error:

Weak references are not implemented in the version of perl at 
/usr/lib/perl5/site_perl/5.8.8/Bio/Tree/Node.pm line 76
BEGIN failed--compilation aborted at 
/usr/lib/perl5/site_perl/5.8.8/Bio/Tree/Node.pm line 76.
Compilation failed in require at 
/usr/lib/perl5/site_perl/5.8.8/Bio/TreeIO/TreeEventBuilder.pm line 65.
BEGIN failed--compilation aborted at 
/usr/lib/perl5/site_perl/5.8.8/Bio/TreeIO/TreeEventBuilder.pm line 65.
Compilation failed in require at 
/usr/lib/perl5/site_perl/5.8.8/Bio/TreeIO.pm line 77.
BEGIN failed--compilation aborted at 
/usr/lib/perl5/site_perl/5.8.8/Bio/TreeIO.pm line 77.
Compilation failed in require at -e line 1.
BEGIN failed--compilation aborted at -e line 1.

I am running perl v5.8.8 on Fedora 8 on a 64bit machine. I installed a 
recent version of bioperl around 5 months ago. Any suggestions as to why 
this module can't be loaded correctly?

Greetings, Daniel


From bix at sendu.me.uk  Tue Mar  4 11:55:32 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 04 Mar 2008 11:55:32 +0000
Subject: [Bioperl-l] Bio::TreeIO rises error "Weak references are not
 implemented in the version of perl"
In-Reply-To: <47CD0CCF.4060306@medecine.unige.ch>
References: <47CD0CCF.4060306@medecine.unige.ch>
Message-ID: <47CD38B4.1070200@sendu.me.uk>

Daniel Gerlach wrote:
> Hello,
> 
> Trying to run Bio::TreeIO by this command:
> 
> perl -e 'use Bio::TreeIO'
> 
> I get the following error:
> 
> Weak references are not implemented in the version of perl 
> [...]
> I am running perl v5.8.8 on Fedora 8 on a 64bit machine. I installed a 
> recent version of bioperl around 5 months ago. Any suggestions as to why 
> this module can't be loaded correctly?

Redhat/Fedora apparently has Perl issues.

First try installing the latest version of Scalar::Util yourself:

perl -MCPAN -e shell
force install Scalar::Util

If that doesn't work, you'll have to download and compile Perl yourself 
from source (don't use Fedora's installation system).
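
To see whether a reinstall of Scalar::Util helped, a short check like
the following may be useful. It is only a sketch: it tries to weaken a
reference and reports the outcome. weaken_works() is a helper invented
for this example; weaken() and isweak() are the functions Scalar::Util
itself provides.

```perl
use strict;
use warnings;
use Scalar::Util ();

# Return 1 if this installation of Scalar::Util can really weaken
# a reference, 0 otherwise.
sub weaken_works {
    my $target = {};
    my $ref    = $target;
    # weaken() dies on installations without weak reference support.
    return 0 unless eval { Scalar::Util::weaken($ref); 1 };
    return Scalar::Util::isweak($ref) ? 1 : 0;
}

printf "Scalar::Util %s: weaken %s\n",
    Scalar::Util->VERSION,
    weaken_works() ? 'works' : 'is NOT available';
```

If this still reports that weaken is not available after the forced
install, the remaining option from the message above is a Perl built
from source.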


From apapanicolaou at ice.mpg.de  Tue Mar  4 12:03:27 2008
From: apapanicolaou at ice.mpg.de (Alexie Papanicolaou)
Date: Tue, 04 Mar 2008 13:03:27 +0100
Subject: [Bioperl-l] Bio/SearchIO/Writer/GbrowseGFF.pm
Message-ID: <47CD3A8F.9050902@ice.mpg.de>

hello all,

1) I was wondering if you would know what this error means and had 
time to help...

Use of uninitialized value in concatenation (.) or string at 
/usr/local/share/perl/5.8.8/Bio/SearchIO/Writer/GbrowseGFF.pm line 287

line 287 is
            else {
                $tags{'Target'} = "$prefix:$seqname $qpmax $qpmin";
            }

this is the header
# $Id: GbrowseGFF.pm,v 1.15.4.1 2006/10/02 23:10:27 sendu Exp $
#
# BioPerl module Bio::SearchIO::Writer::GbrowseGFF.pm


this is how I call it... ( 2.6.18-6-amd64, x86_64, perl, v5.8.8, 
bioperl: tried with both 1.5.2_102 from cvs and checked out svn version 
today)

use Bio::SearchIO::Writer::GbrowseGFF;
use Bio::SearchIO;
if ($program eq "blastn"){
#my $out_gff  = new Bio::SearchIO(-writer => $writer_gff,
 my $out_gff = new Bio::SearchIO(-output_format  => 'GbrowseGFF',
                        -output_cigar   => 1,
                        -output_signif  => 1,
                        -file   => ">$infile.$query.blast.gff");
#my $out_gff_whole  = new Bio::SearchIO(-writer => $writer_gff,
 my $out_gff_whole = new Bio::SearchIO(-output_format  => 'GbrowseGFF',
                        -output_cigar   => 1,
                        -output_signif  => 1,
                           -file   => ">>$infile.blast.gff");
$out_gff->write_result($result);
$out_gff_whole->write_result($result);
}


Where $result is a blast result...

The aim is to parse a multi-query blast report and split it into 
different queries and make another file with all the queries. I'm sure 
i'm forgetting something but I can't figure what...

The GFF file is produced, but I do get the error above...

2) Finally, there is a small bug but I don't think it comes from this 
module? The id attribute is printed out e.g iD=match_sequence31 with iD 
wrongly capitalised...

many thanks for your time
alexie

-- 
--
Alexie Papanicolaou
Entomology
Max Planck Institute for Chemical Ecology
Hans Knoell Str 8
Jena 07745
Germany
Email apapanicolaou at ice.mpg.de
Tel +493641571561


From apapanicolaou at ice.mpg.de  Tue Mar  4 12:04:16 2008
From: apapanicolaou at ice.mpg.de (Alexie Papanicolaou)
Date: Tue, 04 Mar 2008 13:04:16 +0100
Subject: [Bioperl-l] Gbrowse.pm followup
Message-ID: <47CD3AC0.4080801@ice.mpg.de>

Oh the iD bug is fixed in the svn developer branch.

ta
a

-- 
--
Alexie Papanicolaou
Entomology
Max Planck Institute for Chemical Ecology
Hans Knoell Str 8
Jena 07745
Germany
Email apapanicolaou at ice.mpg.de
Tel +493641571561


From cjfields at uiuc.edu  Tue Mar  4 13:16:04 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 4 Mar 2008 07:16:04 -0600
Subject: [Bioperl-l] Bio/SearchIO/Writer/GbrowseGFF.pm
In-Reply-To: <47CD3A8F.9050902@ice.mpg.de>
References: <47CD3A8F.9050902@ice.mpg.de>
Message-ID: <4A68AA28-E508-4257-86E1-393CA9B74082@uiuc.edu>

I have run into a number of problems with the GbrowseGFF module myself  
(I think I committed the ID fix, actually).  It works but needs  
revision and needs better conformity with GFF3.

You can post (1) as a bug and we'll look into it when we can.  It's  
possible (depending on how extensive the fix is) this may have to wait  
until 1.7.

chris

On Mar 4, 2008, at 6:03 AM, Alexie Papanicolaou wrote:

> hello all,
>
> 1) I was wondering if you would you know what this error means and  
> had time to help...
>
> Use of uninitialized value in concatenation (.) or string at /usr/ 
> local/share/perl/5.8.8/Bio/SearchIO/Writer/GbrowseGFF.pm line 287
>
> line 287 is
>          else {
>              $tags{'Target'} = "$prefix:$seqname $qpmax $qpmin";
>          }
>
> this is the header
> # $Id: GbrowseGFF.pm,v 1.15.4.1 2006/10/02 23:10:27 sendu Exp $
> #
> # BioPerl module Bio::SearchIO::Writer::GbrowseGFF.pm
>
>
> this is how I call it... ( 2.6.18-6-amd64, x86_64, perl, v5.8.8,  
> bioperl: tried with both 1.5.2_102 from cvs and checked out svn  
> version today)
>
> use Bio::SearchIO::Writer::GbrowseGFF;
> use Bio::SearchIO;
> if ($program eq "blastn"){
> #my $out_gff  = new Bio::SearchIO(-writer => $writer_gff,
> my $out_gff = new Bio::SearchIO(-output_format  => 'GbrowseGFF',
>                      -output_cigar   => 1,
>                      -output_signif  => 1,
>                      -file   => ">$infile.$query.blast.gff");
> #my $out_gff_whole  = new Bio::SearchIO(-writer => $writer_gff,
> my $out_gff_whole = new Bio::SearchIO(-output_format  => 'GbrowseGFF',
>                      -output_cigar   => 1,
>                      -output_signif  => 1,
>                         -file   => ">>$infile.blast.gff");
> $out_gff->write_result($result);
> $out_gff_whole->write_result($result);
> }
>
>
>
> Where $result is a blast result...
>
> The aim is to parse a multi-query blast report and split it into  
> different queries and make another file with all the queries. I'm  
> sure i'm forgetting something but I can't figure what...
>
> The GFF file is produced, but I do get the error above...
>
> 2) Finally, there is a small bug but I don't think it comes from  
> this module? The id attribute is printed out e.g iD=match_sequence31  
> with iD wrongly capitalised...
>
> many thanks for your time
> alexie
>
> -- 
> --
> Alexie Papanicolaou
> Entomology
> Max Planck Institute for Chemical Ecology
> Hans Knoell Str 8
> Jena 07745
> Germany
> Email apapanicolaou at ice.mpg.de
> Tel +493641571561
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From Daniel.Gerlach at medecine.unige.ch  Tue Mar  4 12:35:03 2008
From: Daniel.Gerlach at medecine.unige.ch (Daniel Gerlach)
Date: Tue, 04 Mar 2008 13:35:03 +0100
Subject: [Bioperl-l] Remove columns containing more than 75% gaps in an
	alignment
References: <200502151616.j1FGGnKr023827@portal.open-bio.org>
Message-ID: <47CD41F7.2000401@medecine.unige.ch>

Hello,

Is it possible to remove only columns containing e.g. more than 75% gaps 
from an alignment? I was thinking at

$aln2 = $aln->remove_gaps('-'[,$all_gaps_columns])

This would allow me to remove all gaps or gap-only columns but not using 
a threshold.

Greetings, Daniel


From Daniel.Gerlach at medecine.unige.ch  Tue Mar  4 13:46:33 2008
From: Daniel.Gerlach at medecine.unige.ch (Daniel Gerlach)
Date: Tue, 04 Mar 2008 14:46:33 +0100
Subject: [Bioperl-l] branch length score - total length of the spanning
	subtree
Message-ID: <47CD52B9.5060906@medecine.unige.ch>

Hello,

I would like to use bioperl to calculate a branch length score for a 
given set of nodes and a tree. I know how to get the total branch length 
by using $tree->total_branch_length, but how could I get the length of 
the subtree spanning some given nodes which are dispersed over the whole 
tree (a subset of nodes from the tree which are not monophyletic)?

Greetings, Daniel


From bix at sendu.me.uk  Tue Mar  4 14:37:53 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 04 Mar 2008 14:37:53 +0000
Subject: [Bioperl-l] branch length score - total length of the spanning
 subtree
In-Reply-To: <47CD52B9.5060906@medecine.unige.ch>
References: <47CD52B9.5060906@medecine.unige.ch>
Message-ID: <47CD5EC1.2020103@sendu.me.uk>

Daniel Gerlach wrote:
> Hello,
> 
> I would like to use bioperl to calculate a branch length score for a 
> given set of nodes and a tree. I know how to get the total branch length 
> by using $tree->total_branch_length, but how could I get the length of 
> the subtree spanning some given nodes which are dispersed over the whole 
> tree (a subset of nodes from the tree which are not monophyletic)?

One 'cheat' way of doing it might be to use splice(-keep_ids => 
\@node_ids) or similar, then run total_branch_length() on that. No idea 
if it will actually give you the right answer though. Let us know! :)


From bix at sendu.me.uk  Tue Mar  4 14:26:10 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 04 Mar 2008 14:26:10 +0000
Subject: [Bioperl-l] Remove columns containing more than 75% gaps in an
 alignment
In-Reply-To: <47CD41F7.2000401@medecine.unige.ch>
References: <200502151616.j1FGGnKr023827@portal.open-bio.org>
	<47CD41F7.2000401@medecine.unige.ch>
Message-ID: <47CD5C02.8060306@sendu.me.uk>

Daniel Gerlach wrote:
> Hello,
> 
> Is it possible to remove only columns containing e.g. more than 75% gaps 
> from an alignment? I was thinking at
> 
> $aln2 = $aln->remove_gaps('-'[,$all_gaps_columns])
> 
> This would allow me to remove all gaps or gap-only columns but not using 
> a threshold.

Well, you can use gap_col_matrix() to decide which columns you don't 
want, and then use remove_columns().
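
The threshold logic itself can be sketched in plain Perl, independent of
BioPerl. The example below works on an alignment held as equal-length
strings and is a stand-in for the gap_col_matrix() / remove_columns()
route, not a use of those methods. The sub names and the sample
alignment are made up for illustration.

```perl
use strict;
use warnings;

# Return the zero-based indices of the columns whose fraction of
# gap characters ('-') is greater than $max_gap_fraction.
sub gappy_columns {
    my ($max_gap_fraction, @rows) = @_;
    return () unless @rows;
    my @drop;
    for my $col (0 .. length($rows[0]) - 1) {
        my $gaps = grep { substr($_, $col, 1) eq '-' } @rows;
        push @drop, $col if $gaps / @rows > $max_gap_fraction;
    }
    return @drop;
}

# Return copies of the rows with the listed columns removed.
sub drop_columns {
    my ($drop, @rows) = @_;
    my %skip = map { $_ => 1 } @$drop;
    return map {
        my $row = $_;
        join '', map { substr($row, $_, 1) }
                 grep { !$skip{$_} } 0 .. length($row) - 1;
    } @rows;
}

# Hypothetical five-sequence alignment.
my @aln = ('AC-GT-A', 'A--GT-A', 'A--GTCA', 'AT-G--A', 'AC-GT-A');

my @drop    = gappy_columns(0.75, @aln);
my @trimmed = drop_columns(\@drop, @aln);

print "Columns removed: @drop\n";
print "$_\n" for @trimmed;
```

With BioPerl, the same column indices would be the ones to hand to
remove_columns(), after checking its documentation for the expected
argument form.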


From hlapp at gmx.net  Tue Mar  4 15:24:13 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 4 Mar 2008 10:24:13 -0500
Subject: [Bioperl-l] Bio/SearchIO/Writer/GbrowseGFF.pm
In-Reply-To: <47CD3A8F.9050902@ice.mpg.de>
References: <47CD3A8F.9050902@ice.mpg.de>
Message-ID: <87808BE4-B6A3-4C7F-A6DC-42ED2686375B@gmx.net>


On Mar 4, 2008, at 7:03 AM, Alexie Papanicolaou wrote:

> Use of uninitialized value in concatenation (.) or string at /usr/ 
> local/share/perl/5.8.8/Bio/SearchIO/Writer/GbrowseGFF.pm line 287
>
> line 287 is
>            else {
>                $tags{'Target'} = "$prefix:$seqname $qpmax $qpmin";
>            }


Note that this is a warning, not an error. However, if none of  
$prefix, $seqname, $qpmax, $qpmin can be undefined (or be equal to an  
empty string, which they will default to if undefined) at this  
position, then there is a problem (and it is before the above line).

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================
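
One way to find out which of the four variables is undefined is to
check them just before the interpolation. A minimal sketch follows; the
variable names come from the quoted line 287, while the values and the
undefined_names() helper are hypothetical.

```perl
use strict;
use warnings;

# Return the names of the entries whose value is undefined,
# in alphabetical order.
sub undefined_names {
    my (%value_for) = @_;
    my @names = grep { !defined $value_for{$_} } sort keys %value_for;
    return @names;
}

# Hypothetical values standing in for the variables on line 287.
my ($prefix, $seqname, $qpmax, $qpmin) = ('match', 'sequence31', 120, undef);

my @missing = undefined_names(
    prefix  => $prefix,
    seqname => $seqname,
    qpmax   => $qpmax,
    qpmin   => $qpmin,
);
print "Undefined before interpolation: @missing\n" if @missing;
```

A temporary check like this inside the module, placed right before the
line in question, would show which value to trace back.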


From hlapp at gmx.net  Tue Mar  4 16:02:02 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 4 Mar 2008 11:02:02 -0500
Subject: [Bioperl-l] branch length score - total length of the spanning
	subtree
In-Reply-To: <47CD5EC1.2020103@sendu.me.uk>
References: <47CD52B9.5060906@medecine.unige.ch> <47CD5EC1.2020103@sendu.me.uk>
Message-ID: <BC04FC01-B12E-43A0-9A8A-B461667BCAA8@gmx.net>


On Mar 4, 2008, at 9:37 AM, Sendu Bala wrote:

> Daniel Gerlach wrote:
>> Hello,
>> I would like to use bioperl to calculate a branch length score for  
>> a given set of nodes and a tree. I know how to get the total  
>> branch length by using $tree->total_branch_length, but how could I  
>> get the length of the subtree spanning some given nodes which are  
>> dispersed over the whole tree (a subset of nodes from the tree  
>> which are not monophyletic)?
>
> One 'cheat' way of doing it might be to use splice(-keep_ids =>  
> \@node_ids) or similar, then run total_branch_length() on that. No  
> idea if it will actually give you the right answer though. Let us  
> know! :)

Related to that, will contract_linear_paths() actually do the right  
thing and adjust branch lengths if it removes internal nodes with  
outdegree 1?

Rutger - does Bio::Phylo handle this correctly?

	-hilmar

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From Daniel.Gerlach at medecine.unige.ch  Tue Mar  4 16:12:53 2008
From: Daniel.Gerlach at medecine.unige.ch (Daniel Gerlach)
Date: Tue, 04 Mar 2008 17:12:53 +0100
Subject: [Bioperl-l] branch length score - total length of the spanning
 subtree
In-Reply-To: <47CD5EC1.2020103@sendu.me.uk>
References: <47CD52B9.5060906@medecine.unige.ch> <47CD5EC1.2020103@sendu.me.uk>
Message-ID: <47CD7505.5080105@medecine.unige.ch>

Hello,

Thanks for the quick answer. I tried:

use Bio::TreeIO;
my $treeio = Bio::TreeIO->new(-format => 'newick',
                   -fh => \*DATA);
my $tree = $treeio->next_tree;
print $tree->total_branch_length,"\n";
$tree->splice(-keep_id => [A,B,E]);
print $tree->total_branch_length,"\n";

__DATA__
(((A:5,B:5)x:2,(C:4,D:4)y:1)z:3,E:10);

Which gives me the message "MSG: After splicing, the original root was 
removed but there are multiple candidates for the new root!" however the 
root E was not removed.

If I do it the complementary way by splicing out all unwanted nodes - 
splice(-remove_id => [C,D]) - I get what I want:

34
25

Greetings, Daniel

Sendu Bala wrote:
> Daniel Gerlach wrote:
>> Hello,
>>
>> I would like to use bioperl to calculate a branch length score for a 
>> given set of nodes and a tree. I know how to get the total branch 
>> length by using $tree->total_branch_length, but how could I get the 
>> length of the subtree spanning some given nodes which are dispersed 
>> over the whole tree (a subset of nodes from the tree which are not 
>> monophyletic)?
> 
> One 'cheat' way of doing it might be to use splice(-keep_ids => 
> \@node_ids) or similar, then run total_branch_length() on that. No idea 
> if it will actually give you the right answer though. Let us know! :)
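
As a cross-check for the numbers above, the spanning-subtree length can
also be computed by hand in plain Perl. This is a sketch that does not
use Bio::TreeIO: the tree from the message is written out as a
child-to-parent table, and a branch counts if some, but not all, of the
kept nodes sit below it. The data layout and sub names are made up for
this example, and the kept node names are assumed to be distinct.

```perl
use strict;
use warnings;

# The tree (((A:5,B:5)x:2,(C:4,D:4)y:1)z:3,E:10); written out by hand:
# each node maps to [parent, length of the branch leading to it].
my %tree = (
    A => ['x', 5],    B => ['x', 5],
    C => ['y', 4],    D => ['y', 4],
    x => ['z', 2],    y => ['z', 1],
    z => ['root', 3], E => ['root', 10],
);

# Sum of all branch lengths in the tree.
sub total_length {
    my ($tree) = @_;
    my $sum = 0;
    $sum += $_->[1] for values %$tree;
    return $sum;
}

# Length of the smallest subtree connecting the given nodes.
sub spanning_length {
    my ($tree, @keep) = @_;
    my %uses;   # branch (named by its child node) => kept nodes below it
    for my $node (@keep) {
        for (my $n = $node; exists $tree->{$n}; $n = $tree->{$n}[0]) {
            $uses{$n}++;
        }
    }
    my $sum = 0;
    for my $n (keys %uses) {
        # A branch shared by every kept node lies above their common
        # ancestor and is not part of the spanning subtree.
        $sum += $tree->{$n}[1] if $uses{$n} < @keep;
    }
    return $sum;
}

print total_length(\%tree), "\n";
print spanning_length(\%tree, qw(A B E)), "\n";
```

For this tree the two values agree with the 34 and 25 reported above
for the -remove_id route.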


From bix at sendu.me.uk  Tue Mar  4 16:37:47 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 04 Mar 2008 16:37:47 +0000
Subject: [Bioperl-l] branch length score - total length of the spanning
 subtree
In-Reply-To: <BC04FC01-B12E-43A0-9A8A-B461667BCAA8@gmx.net>
References: <47CD52B9.5060906@medecine.unige.ch> <47CD5EC1.2020103@sendu.me.uk>
	<BC04FC01-B12E-43A0-9A8A-B461667BCAA8@gmx.net>
Message-ID: <47CD7ADB.6050808@sendu.me.uk>

Hilmar Lapp wrote:
> 
> On Mar 4, 2008, at 9:37 AM, Sendu Bala wrote:
> 
>> Daniel Gerlach wrote:
>>> Hello,
>>> I would like to use bioperl to calculate a branch length score for a 
>>> given set of nodes and a tree. I know how to get the total branch 
>>> length by using $tree->total_branch_length, but how could I get the 
>>> length of the subtree spanning some given nodes which are dispersed 
>>> over the whole tree (a subset of nodes from the tree which are not 
>>> monophyletic)?
>>
>> One 'cheat' way of doing it might be to use splice(-keep_ids => 
>> \@node_ids) or similar, then run total_branch_length() on that. No 
>> idea if it will actually give you the right answer though. Let us 
>> know! :)
> 
> Related to that, will contract_linear_paths() actually do the right 
> thing and adjust branch lengths if it removes internal nodes with 
> outdegree 1?

I think ultimately it boils down to remove_Descendent() being called as 
appropriate which does the branch length alteration. From a glance I 
can't answer your question with certainty, but it 'should' do the right 
thing. It needs to be tested; when I implemented these things I was only 
concerned with tree topology, not branch lengths or anything else.


From David.Messina at sbc.su.se  Tue Mar  4 20:47:06 2008
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 4 Mar 2008 21:47:06 +0100
Subject: [Bioperl-l] problem with Bio::Tools::EMBOSS
In-Reply-To: <f03f40920803031157oda9e0ffubc3bfb9070022238@mail.gmail.com>
References: <f03f40920803030931r7c97159j2d6a4b74e2917184@mail.gmail.com>
	<47CC509A.10306@campus.iztacala.unam.mx>
	<f03f40920803031157oda9e0ffubc3bfb9070022238@mail.gmail.com>
Message-ID: <628aabb70803041247l6c5a52d6n5cab24e7059f15fb@mail.gmail.com>

> where do i find the
> correct names of the keys to create the input hash?


I've never used this module, but from a quick look at the code it appears to
pass on any parameters to palindrome.

I'm guessing you've already done this, but have you tried using the
parameter names and values that palindrome itself asks for?


Dave
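
A sketch of what that could look like, assuming the hash keys are simply
palindrome's own command-line qualifiers with a leading dash (the names
below are the ones "palindrome -help" is expected to list; please check
them against your EMBOSS version). The file names, the values, and the
palindrome_args() helper are placeholders invented for this example. The
run is wrapped in eval so the script degrades gracefully when
bioperl-run or EMBOSS is not installed.

```perl
use strict;
use warnings;

# Build the argument hash for palindrome. Keys are assumed to be the
# program's qualifiers with a leading dash; values are placeholders.
sub palindrome_args {
    my ($infile, $outfile) = @_;
    return {
        -sequence      => $infile,
        -minpallen     => 10,
        -maxpallen     => 100,
        -gaplimit      => 100,
        -nummismatches => 0,
        -outfile       => $outfile,
    };
}

my $args = palindrome_args('input.fasta', 'input.pal');

# Only attempt the run when the wrapper, the program and the input
# file are all available.
my $ran = eval {
    require Bio::Factory::EMBOSS;
    my $app = Bio::Factory::EMBOSS->new->program('palindrome')
        or die "palindrome program not found\n";
    -e $args->{-sequence} or die "input file not found\n";
    $app->run($args);
    1;
};
print $ran ? "palindrome finished\n" : "palindrome not run: $@";
```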


From cjfields at uiuc.edu  Tue Mar  4 21:34:21 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 4 Mar 2008 15:34:21 -0600
Subject: [Bioperl-l] OBDA, Bio::DB::Flat, and bioperl
Message-ID: <FDC728F3-F96C-4072-B448-2F7BF0D11922@uiuc.edu>

I don't know what the current status is for OBDA, but we have several  
bugs listed for Bio::DB::Flat which need someone versed in OBDA to  
look at them (they are all interrelated):

http://bugzilla.open-bio.org/show_bug.cgi?id=2336
http://bugzilla.open-bio.org/show_bug.cgi?id=2337
http://bugzilla.open-bio.org/show_bug.cgi?id=2338
http://bugzilla.open-bio.org/show_bug.cgi?id=2339

If anyone has any input I would greatly appreciate it.  I have been  
trying to stomp as many bugs as possible so we can work on a new  
release.

chris


From bosborne11 at verizon.net  Tue Mar  4 21:42:05 2008
From: bosborne11 at verizon.net (Brian Osborne)
Date: Tue, 04 Mar 2008 16:42:05 -0500
Subject: [Bioperl-l] OBDA, Bio::DB::Flat, and bioperl
In-Reply-To: <FDC728F3-F96C-4072-B448-2F7BF0D11922@uiuc.edu>
References: <FDC728F3-F96C-4072-B448-2F7BF0D11922@uiuc.edu>
Message-ID: <BAD3A0C8-69C4-45FF-A666-527289428A58@verizon.net>

Chris,

I'll take a look at them this weekend.

Brian O.


On Mar 4, 2008, at 4:34 PM, Chris Fields wrote:

> I don't know what the current status is for OBDA, but we have  
> several bugs listed for Bio::DB::Flat which need someone versed in  
> OBDA to look at them (they are all interrelated):
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=2336
> http://bugzilla.open-bio.org/show_bug.cgi?id=2337
> http://bugzilla.open-bio.org/show_bug.cgi?id=2338
> http://bugzilla.open-bio.org/show_bug.cgi?id=2339
>
> If anyone has any input I would greatly appreciate it.  I have been  
> trying to stomp as many bugs as possible so we can work on a new  
> release.
>
> chris
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From anjan.purkayastha at gmail.com  Tue Mar  4 23:52:09 2008
From: anjan.purkayastha at gmail.com (ANJAN PURKAYASTHA)
Date: Tue, 4 Mar 2008 18:52:09 -0500
Subject: [Bioperl-l] problem with Bio::Tools::EMBOSS
In-Reply-To: <628aabb70803041247l6c5a52d6n5cab24e7059f15fb@mail.gmail.com>
References: <f03f40920803030931r7c97159j2d6a4b74e2917184@mail.gmail.com>
	<47CC509A.10306@campus.iztacala.unam.mx>
	<f03f40920803031157oda9e0ffubc3bfb9070022238@mail.gmail.com>
	<628aabb70803041247l6c5a52d6n5cab24e7059f15fb@mail.gmail.com>
Message-ID: <f03f40920803041552u49a68a64gf31b90cd0159ce19@mail.gmail.com>

guys,
thanks for all your input.
i went to the following site:
http://www.koders.com/perl/fid5F28A3DDD453F0DB4995B7DDF304B02DBBACE0A0.aspx?s=calculate

it lists the key names for most of the EMBOSS programs.

thanks,
anjan

On Tue, Mar 4, 2008 at 3:47 PM, Dave Messina <David.Messina at sbc.su.se>
wrote:

>
> where do i find the
> > correct names of the keys to create the input hash?
>
>
>
> I've never used this module, but from a quick look at the code it appears
> to pass on any parameters to palindrome.
>
> I'm guessing you've already done this, but have you tried using the
> parameter names and values that palindrome itself asks for?
>
>
> Dave
>
>


-- 
ANJAN PURKAYASTHA, PhD.
Senior Computational Biologist
==========================

1101 King Street, Suite 310,
Alexandria, VA 22314.
703.518.8040 (office)
703.740.6939 (mobile)

email:
anjan at vbi.vt.edu;
anjan.purkayastha at gmail.com

http://www.vbi.vt.edu

==========================


From staffa at niehs.nih.gov  Wed Mar  5 23:43:30 2008
From: staffa at niehs.nih.gov (Staffa, Nick (NIH/NIEHS))
Date: Wed, 05 Mar 2008 18:43:30 -0500
Subject: [Bioperl-l] SeqIO
Message-ID: <C3F49A52.88AD%staffa@niehs.nih.gov>

So the Howto says that Bio::SeqIO will read almost any known format,
including GCG.
So I created a GCG file with Seqlab and tried to print out its sequence as a
string. (I guessed at the way to get the sequence string.)

#!/usr/bin/perl -w
use strict;
$| = 1;    # unbuffered output
use Bio::SeqIO;

my $number_of_files = @ARGV;
if (!$number_of_files) { print "no files entered\n"; exit; }

foreach my $file (@ARGV) {
    # No -format is given, so Bio::SeqIO has to work out the format itself
    my $seqio_object = Bio::SeqIO->new(-file => $file);
    my $seq_object   = $seqio_object->next_seq;
    my $sequence     = $seq_object->seq;
    print "$sequence\n";
    my $status = &windowscore($sequence);    # windowscore() is not shown here
}

But what it returned was the entire contents of the file with no format
decoding. Have I been deluded?

NewDNALength:810March5,200818:26Type:NCheck:3368..1TGTTCGAATTCCGTGCGGTCCACCT
CCCCTAGGAGCTCAGTGGGCTGGTT51GGATTCCGTGCCATCCCGGCAGGGCAGAGCCTCGGGAGGGGGCGAAGGT
T101GCCCGGGGCCGTGCGCTGGGTGCTGCTGCTGCGGTGGCGGCGGCGGTGCC151TGCGGTTGCAGCGGCTGCT
GGGGTTGCGCGTGGAAACCGCGCCCCGCACT201TGCGGCGGGCGAGCCCATCGCGCCGTAGTACAGGTGCAGAGC
GCTGGGGG251GCGCCAGGATCCCCGGCATCGCAGGGCCCGAGGGGTCCGGCCCCACTCGC301ATGGGGCCAGCG
GGCGGCTCTACGGACACTGCATAGTCCGAGACTGGAGC351GTAAGTGTAGGTGCCGGCCGCCGGGCAGTCCCCTG
GCAGCGGGGCTGCAA401AGAAAGCCGGGTCCTGCTCCACGCCATCCAGCGGGGATGTGTCCGGAGTG451GGCAG
AGGGTAGCCGTCGAGCGCGGGAGCGCCCAGTCCCTGGCAGTCCCG501ATAGTGGGGGCCCATGTGCGGAGACATC
AGCGGAGGACCGGCCGGATAGC551CCGGCTCCGGGAAAGGCAGACCCAGGCCATCCATGGCCACGCGGCCGCCC6
01TCGGGACCAAGCGCGCCGGCCTGGGGCTCGACGAGAGCGTGCAGGAAGCC651TCCCTCCACCCGCTTCATGCG
CTTCACCTGCTTGCGCCGCCGCGGCCGGT701ACTTGTAGTTGGGGTGGTCCTGCATATGCTGCACGCGCAGCCGC
TCGGCC751TCTTCCACGAAGGGCCGCTTCTCTGCCAAGGTCAACGCCTTCCAAGACTT801GCCTGCAGGG


Nick Staffa 
Telephone: 919-316-4569  (NIEHS: 6-4569)
Scientific Computing Support Group
NIEHS Information Technology Support Services Contract
(Science Task Monitor: Roy W. Reter (reter at niehs.nih.gov)
National Institute of Environmental Health Sciences
National Institutes of Health
Research Triangle Park, North Carolina


From cjfields at uiuc.edu  Thu Mar  6 02:22:53 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 5 Mar 2008 20:22:53 -0600
Subject: [Bioperl-l] SeqIO
In-Reply-To: <C3F49A52.88AD%staffa@niehs.nih.gov>
References: <C3F49A52.88AD%staffa@niehs.nih.gov>
Message-ID: <C057CBE1-87E5-4195-94D6-60E706511F4E@uiuc.edu>

I thought GCG format changed somewhere along the way, but maybe I'm
wrong?  Regardless, you'll have to post this as a bug (along with an
example file).

Also, kind of odd that the sequence data wasn't checked...

chris

On Mar 5, 2008, at 5:43 PM, Staffa, Nick (NIH/NIEHS) wrote:

> So the Howto says that Bio::SeqIO will read almost any known format
> including GCG.
> So I create a GCG file with Seqlab and try to printout its sequence  
> as a
> string. ( I did guess at the way to get the sequence string:
>
> #!/usr/bin/perl -w
> use strict;
> $| = 1;
> use Bio::SeqIO;
> my $number_of_files = @ARGV;
> if(!$number_of_files){print "no files entered\n";exit;}
> foreach my $file (@ARGV){
> my $seqio_object = Bio::SeqIO->new(-file => $file);
> my $seq_object = $seqio_object->next_seq;
> my $sequence = $seq_object->seq;
> print "$sequence\n";
> my $status = &windowscore($sequence);
> }
>
> But what it returned was the entire contents of the file with no  
> format
> decoding. Have I been deluded?

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From jason at bioperl.org  Thu Mar  6 02:33:48 2008
From: jason at bioperl.org (Jason Stajich)
Date: Wed, 5 Mar 2008 18:33:48 -0800
Subject: [Bioperl-l] SeqIO
In-Reply-To: <C057CBE1-87E5-4195-94D6-60E706511F4E@uiuc.edu>
References: <C3F49A52.88AD%staffa@niehs.nih.gov>
	<C057CBE1-87E5-4195-94D6-60E706511F4E@uiuc.edu>
Message-ID: <797E42FC-C59F-4431-BAF1-11D3FAE9F9D0@bioperl.org>

You should probably try specifying the format explicitly first, as in
(-format => 'gcg')
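
Applied to the script from the original post, that suggestion might look
like the sketch below. The helper name first_sequence_string is
hypothetical and only introduced here for illustration; the Bio::SeqIO
calls are the same ones the original script uses, with -format added.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;

# Hypothetical helper: read the first sequence from a file, naming the
# format explicitly instead of leaving Bio::SeqIO to work it out.
sub first_sequence_string {
    my ($file, $format) = @_;
    my $seqio = Bio::SeqIO->new(-file => $file, -format => $format);
    my $seq   = $seqio->next_seq
        or die "No sequence found in $file\n";
    return $seq->seq;
}

# Treat every file named on the command line as GCG.
foreach my $file (@ARGV) {
    print first_sequence_string($file, 'gcg'), "\n";
}
```

If this prints only the sequence letters, the parser itself is fine and
the problem lies in how the format was being chosen.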

-j
On Mar 5, 2008, at 6:22 PM, Chris Fields wrote:

> I thought GCG format changed somewhere along the way but maybe
> I'm wrong?  Regardless, you'll have to post this as a bug (along  
> with an example file).
>
> Also, kind of odd that the sequence data wasn't checked...
>
> chris
>
> On Mar 5, 2008, at 5:43 PM, Staffa, Nick (NIH/NIEHS) wrote:
>
>> So the Howto says that Bio::SeqIO will read almost any known format
>> including GCG.
>> So I create a GCG file with Seqlab and try to printout its  
>> sequence as a
>> string. ( I did guess at the way to get the sequence string:
>>
>> #!/usr/bin/perl -w
>> use strict;
>> $| = 1;
>> use Bio::SeqIO;
>> my $number_of_files = @ARGV;
>> if(!$number_of_files){print "no files entered\n";exit;}
>> foreach my $file (@ARGV){
>> my $seqio_object = Bio::SeqIO->new(-file => $file);
>> my $seq_object = $seqio_object->next_seq;
>> my $sequence = $seq_object->seq;
>> print "$sequence\n";
>> my $status = &windowscore($sequence);
>> }
>>
>> But what it returned was the entire contents of the file with no  
>> format
>> decoding. Have I been deluded?


From bosborne11 at verizon.net  Thu Mar  6 02:01:07 2008
From: bosborne11 at verizon.net (Brian Osborne)
Date: Wed, 05 Mar 2008 21:01:07 -0500
Subject: [Bioperl-l] SeqIO
In-Reply-To: <C3F49A52.88AD%staffa@niehs.nih.gov>
References: <C3F49A52.88AD%staffa@niehs.nih.gov>
Message-ID: <19DC527F-3D34-4F3E-9B4C-D2C6011A2C8F@verizon.net>

Nick,

Take a look at the GCG files that are used in the SeqIO tests:

bioperl-live/t/data/test.gcg
bioperl-live/t/data/test_badlf.gcg

Does the file that you created have a format like the format in those  
files? I'm guessing you're going to say 'yes', from the looks of your  
output.

Brian O.


On Mar 5, 2008, at 6:43 PM, Staffa, Nick (NIH/NIEHS) wrote:

> So the Howto says that Bio::SeqIO will read almost any known format
> including GCG.
> So I create a GCG file with Seqlab and try to printout its sequence  
> as a
> string. ( I did guess at the way to get the sequence string:
>
> #!/usr/bin/perl -w
> use strict;
> $| = 1;
> use Bio::SeqIO;
> my $number_of_files = @ARGV;
> if(!$number_of_files){print "no files entered\n";exit;}
> foreach my $file (@ARGV){
> my $seqio_object = Bio::SeqIO->new(-file => $file);
> my $seq_object = $seqio_object->next_seq;
> my $sequence = $seq_object->seq;
> print "$sequence\n";
> my $status = &windowscore($sequence);
> }
>
> But what it returned was the entire contents of the file with no  
> format
> decoding. Have I been deluded?


From staffa at niehs.nih.gov  Thu Mar  6 03:09:11 2008
From: staffa at niehs.nih.gov (Staffa, Nick (NIH/NIEHS))
Date: Wed, 05 Mar 2008 22:09:11 -0500
Subject: [Bioperl-l] SeqIO
In-Reply-To: <797E42FC-C59F-4431-BAF1-11D3FAE9F9D0@bioperl.org>
Message-ID: <C3F4CA87.8074%staffa@niehs.nih.gov>

Verily,
One interpretation of the docs might be: it will read any format if the
format is specified.
I was hoping that I could write a program for which one needn't specify the
format. It'd be more user-friendly and useful.


On 3/5/08 9:33 PM, "Jason Stajich" <jason at bioperl.org> wrote:

> probably you should try specifying the format explicitly first- as in
> (-format => 'gcg')
> 
> -j
> On Mar 5, 2008, at 6:22 PM, Chris Fields wrote:
> 
>> I thought GCG format changed somewhere along the way but maybe
>> I'm wrong?  Regardless, you'll have to post this as a bug (along
>> with an example file).
>> 
>> Also, kind of odd that the sequence data wasn't checked...
>> 
>> chris
>> 
>> On Mar 5, 2008, at 5:43 PM, Staffa, Nick (NIH/NIEHS) wrote:
>> 
>>> So the Howto says that Bio::SeqIO will read almost any known format
>>> including GCG.
>>> So I create a GCG file with Seqlab and try to printout its
>>> sequence as a
>>> string. ( I did guess at the way to get the sequence string:
>>> 
>>> #!/usr/bin/perl -w
>>> use strict;
>>> $| = 1;
>>> use Bio::SeqIO;
>>> my $number_of_files = @ARGV;
>>> if(!$number_of_files){print "no files entered\n";exit;}
>>> foreach my $file (@ARGV){
>>> my $seqio_object = Bio::SeqIO->new(-file => $file);
>>> my $seq_object = $seqio_object->next_seq;
>>> my $sequence = $seq_object->seq;
>>> print "$sequence\n";
>>> my $status = &windowscore($sequence);
>>> }
>>> 
>>> But what it returned was the entire contents of the file with no
>>> format
>>> decoding. Have I been deluded?


From cjfields at uiuc.edu  Thu Mar  6 03:44:14 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 5 Mar 2008 21:44:14 -0600
Subject: [Bioperl-l] SeqIO
In-Reply-To: <1B139406-897F-496F-8709-FCAAD4EFEDE3@verizon.net>
References: <C3F49A52.88AD%staffa@niehs.nih.gov>
	<C057CBE1-87E5-4195-94D6-60E706511F4E@uiuc.edu>
	<1B139406-897F-496F-8709-FCAAD4EFEDE3@verizon.net>
Message-ID: <9146DF9D-C0D6-4F18-9B7E-7BB42FCE0737@uiuc.edu>

Heh, good one!  Though Jason may have worked out the issue (not  
indicating the format explicitly).  Would be worth looking at the  
tested files.

As for dinosaurs, well I can't talk ...

chris

On Mar 5, 2008, at 8:49 PM, Brian Osborne wrote:

> Chris,
>
> Many many years ago, when dinosaurs roamed the earth, only about  
> half of the formats had their own tests. A primitive being saw this  
> and created simple tests for all the 'missing' formats. His thought  
> probably was 'this is better than nothing'. In fact this being  
> assumed that GCG was an outdated and unused format, even as long ago  
> as that time was.
>
> The origins of so much of what we now know as 'Bioperl' are  
> frequently mysterious, or incomprehensible to modern day humans...
>
> Brian O.
>
> On Mar 5, 2008, at 9:22 PM, Chris Fields wrote:
>
>> Also, kind of odd that the sequence data wasn't checked...


From bosborne11 at verizon.net  Thu Mar  6 02:49:26 2008
From: bosborne11 at verizon.net (Brian Osborne)
Date: Wed, 05 Mar 2008 21:49:26 -0500
Subject: [Bioperl-l] SeqIO
In-Reply-To: <C057CBE1-87E5-4195-94D6-60E706511F4E@uiuc.edu>
References: <C3F49A52.88AD%staffa@niehs.nih.gov>
	<C057CBE1-87E5-4195-94D6-60E706511F4E@uiuc.edu>
Message-ID: <1B139406-897F-496F-8709-FCAAD4EFEDE3@verizon.net>

Chris,

Many many years ago, when dinosaurs roamed the earth, only about half  
of the formats had their own tests. A primitive being saw this and  
created simple tests for all the 'missing' formats. His thought  
probably was 'this is better than nothing'. In fact this being assumed  
that GCG was an outdated and unused format, even as long ago as that  
time was.

The origins of so much of what we now know as 'Bioperl' are frequently  
mysterious, or incomprehensible to modern day humans...

Brian O.

On Mar 5, 2008, at 9:22 PM, Chris Fields wrote:

> Also, kind of odd that the sequence data wasn't checked...


From cjfields at uiuc.edu  Thu Mar  6 03:54:15 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 5 Mar 2008 21:54:15 -0600
Subject: [Bioperl-l] SeqIO
In-Reply-To: <C3F4CA87.8074%staffa@niehs.nih.gov>
References: <C3F4CA87.8074%staffa@niehs.nih.gov>
Message-ID: <67C6AE9D-3934-4717-A97A-4C31DB4F7E33@uiuc.edu>

You can leave off the format, but you must append the correct file  
extension for the parser to determine the correct format ('.gcg' for  
GCG, for example).  There is also Bio::Tools::GuessSeqFormat though it  
doesn't cover all formats.
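
A minimal sketch of the second option, for a program that should not ask
the user for a format. The helper name open_seqio_guessing is
hypothetical; the sketch assumes Bio::Tools::GuessSeqFormat returns undef
when it cannot decide, in which case Bio::SeqIO is left to its own
handling (the file extension, as described above).

```perl
use strict;
use warnings;
use Bio::SeqIO;
use Bio::Tools::GuessSeqFormat;

# Hypothetical helper: open a sequence file without the caller naming a
# format. Content-based guessing is tried first; if it gives no answer,
# no -format is passed and Bio::SeqIO decides for itself.
sub open_seqio_guessing {
    my ($file) = @_;
    my $format = Bio::Tools::GuessSeqFormat->new(-file => $file)->guess;
    my @args   = (-file => $file);
    push @args, (-format => $format) if defined $format;
    return Bio::SeqIO->new(@args);
}

foreach my $file (@ARGV) {
    my $seq = open_seqio_guessing($file)->next_seq
        or next;    # skip files that yield no sequence
    print $seq->seq, "\n";
}
```

Since the guesser does not cover every format, a user-supplied format
option is still worth keeping as an override.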

chris

On Mar 5, 2008, at 9:09 PM, Staffa, Nick (NIH/NIEHS) wrote:

> Verily,
> One interpretation of the docs might be: will read any format if the  
> format
> is specified.
> I was hoping that I could write a program that one needn't specify  
> format.
> It'd be more user-friendly and useful.
>
>
> On 3/5/08 9:33 PM, "Jason Stajich" <jason at bioperl.org> wrote:
>
>> probably you should try specifying the format explicitly first- as in
>> (-format => 'gcg')
>>
>> -j
>> On Mar 5, 2008, at 6:22 PM, Chris Fields wrote:
>>
>>> I thought GCG format changed somewhere along the way but maybe
>>> I'm wrong?  Regardless, you'll have to post this as a bug (along
>>> with an example file).
>>>
>>> Also, kind of odd that the sequence data wasn't checked...
>>>
>>> chris
>>>
>>> On Mar 5, 2008, at 5:43 PM, Staffa, Nick (NIH/NIEHS) wrote:
>>>
>>>> So the Howto says that Bio::SeqIO will read almost any known format
>>>> including GCG.
>>>> So I create a GCG file with Seqlab and try to printout its
>>>> sequence as a
>>>> string. ( I did guess at the way to get the sequence string:
>>>>
>>>> #!/usr/bin/perl -w
>>>> use strict;
>>>> $| = 1;
>>>> use Bio::SeqIO;
>>>> my $number_of_files = @ARGV;
>>>> if(!$number_of_files){print "no files entered\n";exit;}
>>>> foreach my $file (@ARGV){
>>>> my $seqio_object = Bio::SeqIO->new(-file => $file);
>>>> my $seq_object = $seqio_object->next_seq;
>>>> my $sequence = $seq_object->seq;
>>>> print "$sequence\n";
>>>> my $status = &windowscore($sequence);
>>>> }
>>>>
>>>> But what it returned was the entire contents of the file with no
>>>> format
>>>> decoding. Have I been deluded?

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From ewijaya at gmail.com  Thu Mar  6 08:16:25 2008
From: ewijaya at gmail.com (Edward Wijaya)
Date: Thu, 6 Mar 2008 16:16:25 +0800
Subject: [Bioperl-l] BioPerl Module to Parse Transfac Flat File Database
Message-ID: <3521d3670803060016r35ac720ar9f2190631ddaf629@mail.gmail.com>

Dear experts,

Is there one? The TRANSFAC text file contains entries like the one below.
In particular, we wish to capture the PWM for each transcription
factor.
Regards,
Edward

__BEGIN__
VV  TRANSFAC MATRIX TABLE, Release 11.1 - licensed - 2007-03-31, (C) Biobase GmbH
XX
//
AC  M00001
XX
ID  V$MYOD_01
XX
DT  19.10.1992 (created); ewi.
DT  22.10.1997 (updated); dbo.
CO  Copyright (C), Biobase GmbH.
XX
NA  MyoD
XX
DE  myoblast determination gene product
XX
BF  T00526; MyoD; Species: mouse, Mus musculus.
BF  T09177; MyoD; Species: mouse, Mus musculus.
XX
P0      A      C      G      T
01      1      2      2      0      S
02      2      1      2      0      R
03      3      0      1      1      A
04      0      5      0      0      C
05      5      0      0      0      A
06      0      0      4      1      G
07      0      1      4      0      G
08      0      0      0      5      T
09      0      0      5      0      G
10      0      1      2      2      K
11      0      2      0      3      Y
12      1      0      3      1      G

....etc....


From watashi at post.com  Thu Mar  6 12:06:42 2008
From: watashi at post.com (Masa Masa)
Date: Thu, 6 Mar 2008 07:06:42 -0500
Subject: [Bioperl-l] failure of add_seqfeature
Message-ID: <20080306120642.4800F16427A@ws1-4.us4.outblaze.com>

Dear experts,

Would anybody know why the following code generates this error:


------------- EXCEPTION  -------------
MSG: Bio::SeqFeature::Generic=HASH(0x94583c0) is not contained within parent feature, and expansion is not valid
STACK Bio::SeqFeature::Generic::add_SeqFeature /usr/lib/perl5/site_perl/5.8.0/Bio/SeqFeature/Generic.pm:767
STACK toplevel test.pl:118

--------------------------------------
15616 15693
79568 83016

=================


use Bio::Graphics;
use Bio::SeqFeature::Generic;
use Bio::SeqIO;


my $bsg = 'Bio::SeqFeature::Generic';

my $unseqfea = $bsg->new( -start=>$from[$i], -end=>$to[$i], -display_name=>'U');

for (my $i=0; $i < @from; $i++) {
    print "$from[$i] $to[$i]\n";
    $unseqfea->add_SeqFeature($bsg->new(-start=>$from[$i],-end=>$to[$i]));
    if ($i > 10) {
        exit;
    }
}



From heikki at sanbi.ac.za  Thu Mar  6 12:20:03 2008
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Thu, 6 Mar 2008 14:20:03 +0200
Subject: [Bioperl-l] SeqIO
In-Reply-To: <C3F4CA87.8074%staffa@niehs.nih.gov>
References: <C3F4CA87.8074%staffa@niehs.nih.gov>
Message-ID: <200803061420.04123.heikki@sanbi.ac.za>


Nick,

This is the regex that Bio::Tools::GuessSeqFormat uses to identify a gcg file:

/Length: .*Type: .*Check: .*\.\.$/

It is the second line in a GCG file. If the first line matches some other
format's regex, this one will not be evaluated.
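
To see what that pattern accepts, here is a small pure-Perl check. Both
test lines are assumptions made up for illustration: the first is a
header written with the spaces the pattern expects, the second is the
space-free text from the output in the original post.

```perl
use strict;
use warnings;

# The pattern quoted above.
my $gcg_header = qr/Length: .*Type: .*Check: .*\.\.$/;

# Hypothetical header lines (not taken from a real file).
my @lines = (
    'NewDNA  Length: 810  March 5, 2008 18:26  Type: N  Check: 3368  ..',
    'NewDNALength:810March5,200818:26Type:NCheck:3368..',
);

# Returns 1 if the line looks like a GCG header to this pattern, else 0.
sub looks_like_gcg_header {
    my ($line) = @_;
    return $line =~ $gcg_header ? 1 : 0;
}

for my $line (@lines) {
    printf "%-5s %s\n", (looks_like_gcg_header($line) ? 'match' : 'no'), $line;
}
```

The first line matches and the second does not, because the pattern
needs a space after "Length:", "Type:" and "Check:". So it is worth
checking whether the header in the file itself still has those spaces.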

Let us know,

	-Heikki

On Thursday 06 March 2008 05:09:11 Staffa, Nick (NIH/NIEHS) wrote:
> Verily,
> One interpretation of the docs might be: will read any format if the format
> is specified.
> I was hoping that I could write a program that one needn't specify format.
> It'd be more user-friendly and useful.
>
> On 3/5/08 9:33 PM, "Jason Stajich" <jason at bioperl.org> wrote:
> > probably you should try specifying the format explicitly first- as in
> > (-format => 'gcg')
> >
> > -j
> >
> > On Mar 5, 2008, at 6:22 PM, Chris Fields wrote:
> >> I thought GCG format changed somewhere along the way but maybe
> >> I'm wrong?  Regardless, you'll have to post this as a bug (along
> >> with an example file).
> >>
> >> Also, kind of odd that the sequence data wasn't checked...
> >>
> >> chris
> >>
> >> On Mar 5, 2008, at 5:43 PM, Staffa, Nick (NIH/NIEHS) wrote:
> >>> So the Howto says that Bio::SeqIO will read almost any known format
> >>> including GCG.
> >>> So I create a GCG file with Seqlab and try to printout its
> >>> sequence as a
> >>> string. ( I did guess at the way to get the sequence string:
> >>>
> >>> #!/usr/bin/perl -w
> >>> use strict;
> >>> $| = 1;
> >>> use Bio::SeqIO;
> >>> my $number_of_files = @ARGV;
> >>> if(!$number_of_files){print "no files entered\n";exit;}
> >>> foreach my $file (@ARGV){
> >>> my $seqio_object = Bio::SeqIO->new(-file => $file);
> >>> my $seq_object = $seqio_object->next_seq;
> >>> my $sequence = $seq_object->seq;
> >>> print "$sequence\n";
> >>> my $status = &windowscore($sequence);
> >>> }
> >>>
> >>> But what it returned was the entire contents of the file with no
> >>> format
> >>> decoding. Have I been deluded?


-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Senior Scientist    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From bix at sendu.me.uk  Thu Mar  6 13:07:21 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 06 Mar 2008 13:07:21 +0000
Subject: [Bioperl-l] BioPerl Module to Parse Transfac Flat File Database
In-Reply-To: <3521d3670803060016r35ac720ar9f2190631ddaf629@mail.gmail.com>
References: <3521d3670803060016r35ac720ar9f2190631ddaf629@mail.gmail.com>
Message-ID: <47CFEC89.1000705@sendu.me.uk>

Edward Wijaya wrote:
> Dear experts,
> 
> Is there any? The TRANSFAC text file which contain entry like this.
> Especially we wish to capture the PWM for each of the Transcription
> factor.

Yes; I've written a module to do this, I just haven't committed it yet 
because certain things aren't quite right in terms of the API. But to 
just grab the PWM it should work fine. If you want I can email you the 
modules.


From sdavis2 at mail.nih.gov  Thu Mar  6 13:40:25 2008
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Thu, 6 Mar 2008 08:40:25 -0500
Subject: [Bioperl-l] BioPerl Module to Parse Transfac Flat File Database
In-Reply-To: <47CFEC89.1000705@sendu.me.uk>
References: <3521d3670803060016r35ac720ar9f2190631ddaf629@mail.gmail.com>
	<47CFEC89.1000705@sendu.me.uk>
Message-ID: <264855a00803060540u1f3d0f92pbab13349595a0eb3@mail.gmail.com>

On Thu, Mar 6, 2008 at 8:07 AM, Sendu Bala <bix at sendu.me.uk> wrote:
> Edward Wijaya wrote:
>  > Dear experts,
>  >
>  > Is there any? The TRANSFAC text file which contain entry like this.
>  > Especially we wich to capture the PWM for each of the Transcription
>  > factor.
>
>  Yes; I've written a module to do this, I just haven't committed it yet
>  because certain things aren't quite right in terms of the API. But to
>  just grab the PWM it should work fine. If you want I can email you the
>  modules.

I believe there is a set of non-bioperl modules called TFBS.  See
here (although I'm not sure this is the most up-to-date site):

http://tfbs.genereg.net/

Sean


From David.Messina at sbc.su.se  Thu Mar  6 14:55:24 2008
From: David.Messina at sbc.su.se (Dave Messina)
Date: Thu, 6 Mar 2008 15:55:24 +0100
Subject: [Bioperl-l] failure of add_seqfeature
In-Reply-To: <20080306120642.4800F16427A@ws1-4.us4.outblaze.com>
References: <20080306120642.4800F16427A@ws1-4.us4.outblaze.com>
Message-ID: <628aabb70803060655k5245296etf5ee2f31755230d3@mail.gmail.com>

Hi Masa,

Could you give us a little more information?

A complete test case (the code you included doesn't run because for example
the @from array doesn't exist) and input file would be helpful, as well as
the version of BioPerl you are using.

Dave


From staffa at niehs.nih.gov  Thu Mar  6 15:23:34 2008
From: staffa at niehs.nih.gov (Staffa, Nick (NIH/NIEHS))
Date: Thu, 06 Mar 2008 10:23:34 -0500
Subject: [Bioperl-l] SeqIO
In-Reply-To: <200803061420.04123.heikki@sanbi.ac.za>
Message-ID: <C3F576A6.88C9%staffa@niehs.nih.gov>

Here's the scoop:
When I use Jason's suggestion (-format => 'gcg'),
my program works without complaint on the original file, which looks like:
!!NA_SEQUENCE 1.0
   NewDNA  Length: 810  March 5, 2008 18:26  Type: N  Check: 3368  ..

       1  TGTTCGAATT CCGTGCGGTC CACCTCCCCT AGGAGCTCAG TGGGCTGGTT
etc.

BUT if I remove the first line to test Bio::Tools::GuessSeqFormat
(which should leave a retro-GCG file, as written before version 11?),
my program runs, but there IS a complaint:
Use of uninitialized value in scalar chomp at
/usr/lib/perl5/site_perl/5.8.5/Bio/SeqIO/gcg.pm line 118, <GEN0> line 1.
BUT
if I remove (-format => 'gcg'), I get no complaint, but the sequence
returned still has its numbers embedded. This affects my calculations.
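
As a stopgap for the embedded numbers, the returned string could be cleaned
by hand. This is only a minimal sketch: strip_gcg_numbering is a hypothetical
helper, not part of BioPerl, and it assumes the header ends at the first ".."
and that every non-letter (position numbers, spaces, and also gap or stop
symbols) can be thrown away.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl function): clean a GCG record that
# came back undecoded as one long string.
sub strip_gcg_numbering {
    my ($raw) = @_;
    # Drop the header, i.e. everything up to and including the first "..".
    # If there is no "..", the string is left as it is at this step.
    (my $body = $raw) =~ s/^.*?\.\.//s;
    # Remove position numbers, whitespace and any other non-letter.
    $body =~ s/[^A-Za-z]//g;
    return $body;
}

print strip_gcg_numbering(
    'NewDNALength:810Type:NCheck:3368..1TGTTCGAATTCCGTG51GGATT'
), "\n";
```

Specifying the format, or getting the guesser to recognise it, is still the
cleaner route; this only rescues the string after the fact.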

Thanks, at least I know what my options are.


Nick Staffa 
Telephone: 919-316-4569  (NIEHS: 6-4569)
Scientific Computing Support Group
NIEHS Information Technology Support Services Contract
(Science Task Monitor: Roy W. Reter (reter at niehs.nih.gov)
National Institute of Environmental Health Sciences
National Institutes of Health
Research Triangle Park, North Carolina


On 3/6/08 7:20 AM, "Heikki Lehvaslaiho" <heikki at sanbi.ac.za> wrote:

> 
> Nick,
> 
> This is the regex that Bio::Tools::GuessSeqFormat uses to identify a gcg file:
> 
> /Length: .*Type: .*Check: .*\.\.$/
> 
> It is the second  line in GCG file. If first line matches to some other format
> regex, this will not not be evaluated.
> 
> Let us know,
> 
> -Heikki
> 
> On Thursday 06 March 2008 05:09:11 Staffa, Nick (NIH/NIEHS) wrote:
>> Verily,
>> One interpretation of the docs might be: will read any format if the format
>> is specified.
>> I was hoping that I could write a program that one needn't specify format.
>> It'd be more user-friendly and useful.
>> 
>> On 3/5/08 9:33 PM, "Jason Stajich" <jason at bioperl.org> wrote:
>>> probably you should try specifying the format explicitly first- as in
>>> (-format => 'gcg')
>>> 
>>> -j
>>> 
>>> On Mar 5, 2008, at 6:22 PM, Chris Fields wrote:
>>>> I thought GCG format changed somewhere along the way but I maybe
>>>> I'm wrong?  Regardless, you'll have to post this as a bug (along
>>>> with an example file).
>>>> 
>>>> Also, kind of odd that the sequence data wasn't checked...
>>>> 
>>>> chris
>>>> 
>>>> On Mar 5, 2008, at 5:43 PM, Staffa, Nick (NIH/NIEHS) wrote:
>>>>> So the Howto says that Bio::SeqIO will read almost any known format
>>>>> including GCG.
>>>>> So I create a GCG file with Seqlab and try to printout its
>>>>> sequence as a
>>>>> string. ( I did guess at the way to get the sequence string:
>>>>> 
>>>>> #!/usr/bin/perl -w
>>>>> use strict;
>>>>> $| = 1;
>>>>> use Bio::SeqIO;
>>>>> my $number_of_files = @ARGV;
>>>>> if(!$number_of_files){print "no files entered\n";exit;}
>>>>> foreach my $file (@ARGV){
>>>>> my $seqio_object = Bio::SeqIO->new(-file => $file);
>>>>> my $seq_object = $seqio_object->next_seq;
>>>>> my $sequence = $seq_object->seq;
>>>>> print "$sequence\n";
>>>>> my $status = &windowscore($sequence);
>>>>> }
>>>>> 
>>>>> But what it returned was the entire contents of the file with no
>>>>> format
>>>>> decoding. Have I been deluded?
>>>>> 
>>>>> NewDNALength:810March5,200818:26Type:NCheck:
>>>>> 3368..1TGTTCGAATTCCGTGCGGTCCACCT
>>>>> CCCCTAGGAGCTCAGTGGGCTGGTT51GGATTCCGTGCCATCCCGGCAGGGCAGAGCCTCGGGAGGGGG
>>>>> CGAAGGT
>>>>> T101GCCCGGGGCCGTGCGCTGGGTGCTGCTGCTGCGGTGGCGGCGGCGGTGCC151TGCGGTTGCAGC
>>>>> GGCTGCT
>>>>> GGGGTTGCGCGTGGAAACCGCGCCCCGCACT201TGCGGCGGGCGAGCCCATCGCGCCGTAGTACAGGT
>>>>> GCAGAGC
>>>>> GCTGGGGG251GCGCCAGGATCCCCGGCATCGCAGGGCCCGAGGGGTCCGGCCCCACTCGC301ATGGG
>>>>> GCCAGCG
>>>>> GGCGGCTCTACGGACACTGCATAGTCCGAGACTGGAGC351GTAAGTGTAGGTGCCGGCCGCCGGGCAG
>>>>> TCCCCTG
>>>>> GCAGCGGGGCTGCAA401AGAAAGCCGGGTCCTGCTCCACGCCATCCAGCGGGGATGTGTCCGGAGTG4
>>>>> 51GGCAG
>>>>> AGGGTAGCCGTCGAGCGCGGGAGCGCCCAGTCCCTGGCAGTCCCG501ATAGTGGGGGCCCATGTGCGG
>>>>> AGACATC
>>>>> AGCGGAGGACCGGCCGGATAGC551CCGGCTCCGGGAAAGGCAGACCCAGGCCATCCATGGCCACGCGG
>>>>> CCGCCC6
>>>>> 01TCGGGACCAAGCGCGCCGGCCTGGGGCTCGACGAGAGCGTGCAGGAAGCC651TCCCTCCACCCGCT
>>>>> TCATGCG
>>>>> CTTCACCTGCTTGCGCCGCCGCGGCCGGT701ACTTGTAGTTGGGGTGGTCCTGCATATGCTGCACGCG
>>>>> CAGCCGC
>>>>> TCGGCC751TCTTCCACGAAGGGCCGCTTCTCTGCCAAGGTCAACGCCTTCCAAGACTT801GCCTGCA
>>>>> GGG
>>>>> 
>>>>> 
>>>>> 
>>>>> Nick Staffa
>>>>> Telephone: 919-316-4569  (NIEHS: 6-4569)
>>>>> Scientific Computing Support Group
>>>>> NIEHS Information Technology Support Services Contract
>>>>> (Science Task Monitor: Roy W. Reter (reter at niehs.nih.gov)
>>>>> National Institute of Environmental Health Sciences
>>>>> National Institutes of Health
>>>>> Research Triangle Park, North Carolina
>>>>> 
>>>>> 
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>> 
>>>> Christopher Fields
>>>> Postdoctoral Researcher
>>>> Lab of Dr. Robert Switzer
>>>> Dept of Biochemistry
>>>> University of Illinois Urbana-Champaign
>>>> 
>>>> 
>>>> 
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> 
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> 


From hlapp at gmx.net  Thu Mar  6 15:26:52 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 6 Mar 2008 10:26:52 -0500
Subject: [Bioperl-l] failure of add_seqfeature
In-Reply-To: <20080306120642.4800F16427A@ws1-4.us4.outblaze.com>
References: <20080306120642.4800F16427A@ws1-4.us4.outblaze.com>
Message-ID: <6BD917FC-803E-471B-A0C4-219286E53C47@gmx.net>

It seems you are adding subfeatures with a location that is not  
within their parent feature location. If that's indeed what you want  
to do, add the 'EXPAND' argument.

Excerpted from the POD of Bio::SeqFeature::Generic:

         Usage   : $feat->add_SeqFeature($subfeat);
                   $feat->add_SeqFeature($subfeat,'EXPAND')
         Function: adds a SeqFeature into the subSeqFeature array.
                   with no 'EXPAND' qualifer, subfeat will be tested
                   as to whether it lies inside the parent, and throw
                   an exception if not.

                   If EXPAND is used, the parent's start/end/strand will
                   be adjusted so that it grows to accommodate the new
                   subFeature
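
A minimal sketch of the EXPAND form, assuming BioPerl is installed. The
coordinates are the two intervals printed in the report below; choosing the
first one as the parent's initial location is an assumption made purely for
illustration.

```perl
use strict;
use warnings;
use Bio::SeqFeature::Generic;

# Parent feature covering only the first interval from the report.
my $parent = Bio::SeqFeature::Generic->new(
    -start        => 15616,
    -end          => 15693,
    -display_name => 'U',
);

# A subfeature that lies outside the parent's current location.
my $sub = Bio::SeqFeature::Generic->new(-start => 79568, -end => 83016);

# Without 'EXPAND' this call throws the exception shown in the report;
# with it, the parent's start/end are adjusted to accommodate $sub.
$parent->add_SeqFeature($sub, 'EXPAND');

printf "parent now spans %d..%d\n", $parent->start, $parent->end;
```

If the parent is not supposed to grow, the alternative is to create it with
a location that already covers every subfeature.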


On Mar 6, 2008, at 7:06 AM, Masa Masa wrote:

> Dear experts,
>
> Would anybody know why the following codes generate an error of:
>
>
> ------------- EXCEPTION  -------------
> MSG: Bio::SeqFeature::Generic=HASH(0x94583c0) is not contained within parent feature, and expansion is not valid
> STACK Bio::SeqFeature::Generic::add_SeqFeature /usr/lib/perl5/site_perl/5.8.0/Bio/SeqFeature/Generic.pm:767
> STACK toplevel test.pl:118
>
> --------------------------------------
> 15616 15693
> 79568 83016
>
> =================
>
>
> use Bio::Graphics;
> use Bio::SeqFeature::Generic;
> use Bio::SeqIO;
>
>
> my $bsg = 'Bio::SeqFeature::Generic';
>
> my $unseqfea = $bsg->new( -start=>$from[$i], -end=>$to[$i], -display_name=>'U');
>
> for (my $i=0; $i < @from; $i++) {
> print "$from[$i] $to[$i]\n";
>     $unseqfea->add_SeqFeature($bsg->new(-start=>$from[$i],-end=>$to[$i]));
>     if ($i > 10) {
>        exit;
>     }
> }
>
> -- 
> Want an e-mail address like mine?
> Get a free e-mail account today at www.mail.com!
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From bix at sendu.me.uk  Thu Mar  6 15:41:49 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 06 Mar 2008 15:41:49 +0000
Subject: [Bioperl-l] BioPerl Module to Parse Transfac Flat File Database
In-Reply-To: <264855a00803060540u1f3d0f92pbab13349595a0eb3@mail.gmail.com>
References: <3521d3670803060016r35ac720ar9f2190631ddaf629@mail.gmail.com>	
	<47CFEC89.1000705@sendu.me.uk>
	<264855a00803060540u1f3d0f92pbab13349595a0eb3@mail.gmail.com>
Message-ID: <47D010BD.4000801@sendu.me.uk>

Sean Davis wrote:
> On Thu, Mar 6, 2008 at 8:07 AM, Sendu Bala <bix at sendu.me.uk> wrote:
>> Edward Wijaya wrote:
>>  > Dear experts,
>>  >
>>  > Is there any? The TRANSFAC text file which contain entry like this.
>>  > Especially we wich to capture the PWM for each of the Transcription
>>  > factor.
>>
>>  Yes; I've written a module to do this, I just haven't committed it yet
>>  because certain things aren't quite right in terms of the API. But to
>>  just grab the PWM it should work fine. If you want I can email you the
>>  modules.
> 
> I believe there are a set of non-bioperl modules called TFBS.  See
> here (although I'm not sure this is the most up-to-date site):
> 
> http://tfbs.genereg.net/

I believe it's out of date enough to not work on the latest Transfac 
data, though I haven't tried to confirm.

At any rate, the Transfac (Pro) database is pretty strange and 
complicated, and the TFBS modules certainly don't let you access 
everything in the way you might want or expect.


From cain.cshl at gmail.com  Thu Mar  6 16:43:35 2008
From: cain.cshl at gmail.com (Scott Cain)
Date: Thu, 06 Mar 2008 11:43:35 -0500
Subject: [Bioperl-l] anonymous cvs?
Message-ID: <1204821815.6689.7.camel@frissell>

Hi All,

So now that the transition to svn is complete (and I like it), should
anonymous cvs still be working?  I believe there was discussion about
keeping it going via mirroring, and I hope that is the case.  It will
make life a little easier for people who want to do automated installs
of GBrowse and would like to use the installer script to get bioperl via
anon cvs.  If anon cvs is no longer available, does anyone have
suggestions for the best route to take for getting command line svn on
Windows?

Thanks,
Scott

-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   cain.cshl at gmail.com
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory


From cain.cshl at gmail.com  Thu Mar  6 16:48:08 2008
From: cain.cshl at gmail.com (Scott Cain)
Date: Thu, 06 Mar 2008 11:48:08 -0500
Subject: [Bioperl-l] anonymous cvs?
In-Reply-To: <1204821815.6689.7.camel@frissell>
References: <1204821815.6689.7.camel@frissell>
Message-ID: <1204822088.6689.8.camel@frissell>

I should have mentioned that I tried it and it is not currently working:

$ cvs -d :pserver:cvs at code.open-bio.org:/home/repository/bioperl checkout bioperl-live 
can't create temporary directory /tmp/cvs-serv32067
No space left on device


On Thu, 2008-03-06 at 11:43 -0500, Scott Cain wrote:
> Hi All,
> 
> So now that the transition to svn is complete (and I like it), should
> anonymous cvs still be working?  I believe there was discussion about
> keeping it going via mirroring, and I hope that is the case.  It will
> make life a little easier for people who want to do automated installs
> of GBrowse and would like to use the installer script to get bioperl via
> anon cvs.  If anon cvs is no longer available, does anyone have
> suggestions for the best route to take for getting command line svn on
> Windows?
> 
> Thanks,
> Scott
> 
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory


From Marc.Logghe at ablynx.com  Thu Mar  6 16:22:10 2008
From: Marc.Logghe at ablynx.com (Marc Logghe)
Date: Thu, 6 Mar 2008 17:22:10 +0100
Subject: [Bioperl-l] SeqIO
In-Reply-To: <C3F576A6.88C9%staffa@niehs.nih.gov>
Message-ID: <03C512635899144083CADB0EE22201890172A836@alpaca.lan.ablynx.com>

Hi Nick,
I don't think you should leave out the -format option. You have to leave
it in but the format should be provided by the B::T::GuessSeqFormat
object.
Something like:

#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;
use Bio::Tools::GuessSeqFormat;

$| = 1;    # unbuffered output
my $number_of_files = @ARGV;
if(!$number_of_files){print "no files entered\n";exit;}
foreach my $file (@ARGV){
  # let the guesser inspect the file, then hand its answer to SeqIO
  my $guesser = Bio::Tools::GuessSeqFormat->new(-file => $file);
  my $seqio_object = Bio::SeqIO->new(-file   => $guesser->file,
                                     -format => $guesser->guess);
  my $seq_object = $seqio_object->next_seq;
  my $sequence = $seq_object->seq;
  print "$sequence\n";
}
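
To see what the guesser keys on for GCG, the regex Heikki quoted in this
thread can be tried directly against the two header lines from Nick's file.
This is plain Perl and only a sketch of that single pattern, not of the whole
guessing procedure inside Bio::Tools::GuessSeqFormat.

```perl
use strict;
use warnings;

# The GCG-identifying pattern as quoted in this thread.
my $gcg_header = qr/Length: .*Type: .*Check: .*\.\.$/;

# The first two lines of the example file posted in this thread.
my @lines = (
    '!!NA_SEQUENCE 1.0',
    '   NewDNA  Length: 810  March 5, 2008 18:26  Type: N  Check: 3368  ..',
);

for my $line (@lines) {
    my $verdict = $line =~ $gcg_header ? 'matches' : 'no match';
    print "$verdict: $line\n";
}
```

Only the "Length: ... Check: ... .." line matches, which fits the remark
that this pattern applies to the second line of a GCG file.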

HTH,
Marc




From stefan.kirov at bms.com  Thu Mar  6 15:51:25 2008
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Thu, 06 Mar 2008 10:51:25 -0500
Subject: [Bioperl-l] BioPerl Module to Parse Transfac Flat File Database
In-Reply-To: <47D010BD.4000801@sendu.me.uk>
References: <3521d3670803060016r35ac720ar9f2190631ddaf629@mail.gmail.com>
	<47CFEC89.1000705@sendu.me.uk>
	<264855a00803060540u1f3d0f92pbab13349595a0eb3@mail.gmail.com>
	<47D010BD.4000801@sendu.me.uk>
Message-ID: <47D012FD.7090600@bms.com>

Sendu Bala wrote:
> Sean Davis wrote:
>> On Thu, Mar 6, 2008 at 8:07 AM, Sendu Bala <bix at sendu.me.uk> wrote:
>>> Edward Wijaya wrote:
>>> > Dear experts,
>>> >
>>> > Is there any? The TRANSFAC text file which contain entry like this.
>>> > Especially we wich to capture the PWM for each of the Transcription
>>> > factor.
>>>
>>> Yes; I've written a module to do this, I just haven't committed it yet
>>> because certain things aren't quite right in terms of the API. But to
>>> just grab the PWM it should work fine. If you want I can email you the
>>> modules.
>>
>> I believe there are a set of non-bioperl modules called TFBS. See
>> here (although I'm not sure this is the most up-to-date site):
>>
>> http://tfbs.genereg.net/
>
> I believe it's out of date enough to not work on the latest Transfac
> data, though I haven't used tried to confirm.
>
> At any rate, the Transfac (Pro) database is pretty strange and
> complicated, and the TFBS modules certainly don't let you access
> everything in the way you might want or expect.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
Also be careful: there is a difference between a PFM (position frequency
matrix) and a PWM (position weight matrix). Most programs I have encountered
that derive a PWM will assume a uniform background distribution (0.25 per
nucleotide) unless you specify your own. You may be comfortable with that,
but you should definitely be aware of it.
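
To make the difference concrete, here is a minimal plain-Perl sketch of one
common way to turn a PFM into a log2-odds PWM. The function name pfm_to_pwm,
the pseudocount scheme and the example counts are all assumptions made for
illustration; TFBS and other tools may compute their weights differently.

```perl
use strict;
use warnings;

# Hypothetical helper: convert a PFM (counts per base per column) into a
# log2-odds PWM against a background, defaulting to 0.25 per nucleotide.
sub pfm_to_pwm {
    my ($pfm, $bg, $pseudo) = @_;
    $bg ||= { A => 0.25, C => 0.25, G => 0.25, T => 0.25 };
    $pseudo = 1 unless defined $pseudo;
    my @bases = sort keys %$pfm;
    my $width = scalar @{ $pfm->{ $bases[0] } };
    my %pwm;
    for my $col (0 .. $width - 1) {
        my $total = 0;
        $total += $pfm->{$_}[$col] for @bases;
        for my $base (@bases) {
            # Background-weighted pseudocount keeps zero counts finite.
            my $freq = ($pfm->{$base}[$col] + $pseudo * $bg->{$base})
                     / ($total + $pseudo);
            $pwm{$base}[$col] = log($freq / $bg->{$base}) / log(2);
        }
    }
    return \%pwm;
}

# Made-up counts for a two-column motif.
my %pfm = ( A => [8, 0], C => [0, 4], G => [0, 4], T => [0, 0] );
my $uniform = pfm_to_pwm(\%pfm);
my $gc_rich = pfm_to_pwm(\%pfm, { A => 0.1, C => 0.4, G => 0.4, T => 0.1 });
printf "A in column 1: uniform %.3f, GC-rich background %.3f\n",
    $uniform->{A}[0], $gc_rich->{A}[0];
```

The same counts give different weights under the two backgrounds, which is
why it matters what background a program silently assumes.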


From jay at jays.net  Thu Mar  6 17:03:51 2008
From: jay at jays.net (Jay Hannah)
Date: Thu, 06 Mar 2008 11:03:51 -0600
Subject: [Bioperl-l] anonymous cvs?
In-Reply-To: <1204821815.6689.7.camel@frissell>
References: <1204821815.6689.7.camel@frissell>
Message-ID: <47D023F7.4000803@jays.net>

Scott Cain wrote:
> It will make life a little easier for people who want to do automated installs
> of GBrowse and would like to use the installer script to get bioperl via
> anon cvs.

Those installer scripts can't use anon SVN instead?

> If anon cvs is no longer available, does anyone have
> suggestions for the best route to take for getting command line svn on
> Windows?
>   

At $work our Windows guys use GUIs for both CVS (repo dead this summer) 
and SVN. Are there command-line (MS-DOS?) CVS clients for Windows? And 
there isn't an SVN equivalent?

j
http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah


From whs at ebi.ac.uk  Thu Mar  6 17:08:51 2008
From: whs at ebi.ac.uk (William Spooner)
Date: Thu, 6 Mar 2008 17:08:51 +0000
Subject: [Bioperl-l] anonymous cvs?
In-Reply-To: <1204821815.6689.7.camel@frissell>
References: <1204821815.6689.7.camel@frissell>
Message-ID: <07E3119E-0354-4E93-9980-3CB2B26DF2BE@ebi.ac.uk>

This will be important for Ensembl as well. As far as I know all of  
their install docs refer to BioPerl's anonymous CVS.

On 6 Mar 2008, at 16:43, Scott Cain wrote:

> Hi All,
>
> So now that the transition to svn is complete (and I like it), should
> anonymous cvs still be working?  I believe there was discussion about
> keeping it going via mirroring, and I hope that is the case.  It will
> make life a little easier for people who want to do automated installs
> of GBrowse and would like to use the installer script to get bioperl  
> via
> anon cvs.  If anon cvs is no longer available, does anyone have
> suggestions for the best route to take for getting command line svn on
> Windows?
>
> Thanks,
> Scott
>
> -- 
> ------------------------------------------------------------------------
> Scott Cain, Ph. D.                                   cain.cshl at gmail.com
> GMOD Coordinator (http://www.gmod.org/)                      
> 216-392-3087
> Cold Spring Harbor Laboratory
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

---
William Spooner
Visiting Scientist
whs at ebi.ac.uk


From MEC at stowers-institute.org  Thu Mar  6 16:58:57 2008
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Thu, 6 Mar 2008 10:58:57 -0600
Subject: [Bioperl-l] BioPerl Module to Parse Transfac Flat File Database
In-Reply-To: <47D010BD.4000801@sendu.me.uk>
References: <3521d3670803060016r35ac720ar9f2190631ddaf629@mail.gmail.com>
	<47CFEC89.1000705@sendu.me.uk>
	<264855a00803060540u1f3d0f92pbab13349595a0eb3@mail.gmail.com>
	<47D010BD.4000801@sendu.me.uk>
Message-ID: <BD62CBAC4395B94096109020651BE2EC11CE393C@exchmb-02.stowers-institute.org>

we use TFBS all the time against data coming from a recent local install of TRANSFAC(r) Professional 11.1 (2007-03-31)

the most recent is 11.4 (2007-12-14)

TFBS::* has the nice advantage that you can interoperate Transfac PWMs with other (say, Jaspar) matrices and/or simple consensus sequence patterns; and it COULD be fairly easily extended to allow interoperation with other sources, say cisRED.  "One interface to rule them all" - bwa ha ha.

However, if you DO have locally installed Transfac (Pro) ($$), and want to use just it, then you should know that you can also call their `match` routines from the unix command line (though this is not documented to my knowledge).  I can supply my cheat sheet or otherwise advise if desired.  Also, if you go this way, I've written the requisite TFMatchOut2GFF to convert TRANSFAC match's output to GFF, if it suits your purpose, which I could release if asked.

If you decide to use TFBS::*, I have written a command-line wrapper for the TFBS Perl modules that might give you a leg up.  I could release it too, if useful.

But I agree: if I recall correctly, TFBS::* was dropped from active development due to issues with data access policies.  And I think the modules no longer work with remotely hosted Transfac.  They did a few years ago, but I believe I tested a while ago and found that they no longer do.

Malcolm Cook
Stowers Institute for Medical Research - Kansas City, Missouri


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Sendu Bala
> Sent: Thursday, March 06, 2008 9:42 AM
> To: Sean Davis
> Cc: bioperl-l at lists.open-bio.org; Edward Wijaya
> Subject: Re: [Bioperl-l] BioPerl Module to Parse Transfac
> Flat File Database
>
> Sean Davis wrote:
> > On Thu, Mar 6, 2008 at 8:07 AM, Sendu Bala <bix at sendu.me.uk> wrote:
> >> Edward Wijaya wrote:
> >>  > Dear experts,
> >>  >
> >>  > Is there any? The TRANSFAC text file which contain
> entry like this.
> >>  > Especially we wich to capture the PWM for each of the
> >> Transcription  > factor.
> >>
> >>  Yes; I've written a module to do this, I just haven't
> committed it
> >> yet  because certain things aren't quite right in terms of
> the API.
> >> But to  just grab the PWM it should work fine. If you want I can
> >> email you the  modules.
> >
> > I believe there are a set of non-bioperl modules called TFBS.  See
> > here (although I'm not sure this is the most up-to-date site):
> >
> > http://tfbs.genereg.net/
>
> I believe it's out of date enough to not work on the latest
> Transfac data, though I haven't used tried to confirm.
>
> At any rate, the Transfac (Pro) database is pretty strange
> and complicated, and the TFBS modules certainly don't let you
> access everything in the way you might want or expect.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


From cjfields at uiuc.edu  Thu Mar  6 17:10:35 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 6 Mar 2008 11:10:35 -0600
Subject: [Bioperl-l] anonymous cvs?
In-Reply-To: <1204821815.6689.7.camel@frissell>
References: <1204821815.6689.7.camel@frissell>
Message-ID: <84E60454-2B09-4F77-9BD6-4B9150304B2D@uiuc.edu>

BioPerl CVS is no longer being updated; you have to use Subversion to  
grab the latest (we have anon. svn set up for this).  We discussed  
syncing svn commits over to cvs but found it way too problematic and  
decided to make a clean break.

The best option I can think of as a replacement (so everyone isn't  
dependent on installing svn to get Gbrowse and bioperl-live) is to get  
a cron job set up which drops a bioperl-live archive into bioperl.org/ 
DIST or bioperl.org/SRC.  We have already talked about doing this for  
nightly builds from svn main trunk; we can probably set that up on our  
end.  Would that be feasible as a fallback in case svn isn't present?

The subversion project page has information on Windows versions:

http://subversion.tigris.org/project_packages.html

chris

On Mar 6, 2008, at 10:43 AM, Scott Cain wrote:

> Hi All,
>
> So now that the transition to svn is complete (and I like it), should
> anonymous cvs still be working?  I believe there was discussion about
> keeping it going via mirroring, and I hope that is the case.  It will
> make life a little easier for people who want to do automated installs
> of GBrowse and would like to use the installer script to get bioperl  
> via
> anon cvs.  If anon cvs is no longer available, does anyone have
> suggestions for the best route to take for getting command line svn on
> Windows?
>
> Thanks,
> Scott
>
> -- 
> ------------------------------------------------------------------------
> Scott Cain, Ph. D.                                   cain.cshl at gmail.com
> GMOD Coordinator (http://www.gmod.org/)                      
> 216-392-3087
> Cold Spring Harbor Laboratory
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cain.cshl at gmail.com  Thu Mar  6 17:22:29 2008
From: cain.cshl at gmail.com (Scott Cain)
Date: Thu, 06 Mar 2008 12:22:29 -0500
Subject: [Bioperl-l] anonymous cvs?
In-Reply-To: <84E60454-2B09-4F77-9BD6-4B9150304B2D@uiuc.edu>
References: <1204821815.6689.7.camel@frissell>
	<84E60454-2B09-4F77-9BD6-4B9150304B2D@uiuc.edu>
Message-ID: <1204824149.6689.14.camel@frissell>

Hi Chris,

I think a nightly generated tarball would be sufficient for my use.  We
used anon cvs to get the latest bioperl and then threw it away once it
was installed, so a tarball is just as good, if not better, since users
wouldn't need to install svn.  Not needing to install svn is a good thing
for all my users, since I think many distributions do not supply it by
default.

Thanks,
Scott


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   cain.cshl at gmail.com
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory


From cain.cshl at gmail.com  Thu Mar  6 17:28:13 2008
From: cain.cshl at gmail.com (Scott Cain)
Date: Thu, 06 Mar 2008 12:28:13 -0500
Subject: [Bioperl-l] anonymous cvs?
In-Reply-To: <47D023F7.4000803@jays.net>
References: <1204821815.6689.7.camel@frissell>  <47D023F7.4000803@jays.net>
Message-ID: <1204824493.6689.19.camel@frissell>

Hi Jay,

It could use anon svn, though svn is considerably less ubiquitous, so it
effectively adds another prerequisite.

For cvs, the GUI WinCVS provides command line cvs as well.  I was
wondering if there was an easy-to-install equivalent for svn, though it
may be moot for me if the powers that be will provide a nightly
tarball :-)

Scott

On Thu, 2008-03-06 at 11:03 -0600, Jay Hannah wrote:
> Scott Cain wrote:
> > It will make life a little easier for people who want to do automated installs
> > of GBrowse and would like to use the installer script to get bioperl via
> > anon cvs.
> 
> Those installer scripts can't use anon SVN instead?
> 
> > If anon cvs is no longer available, does anyone have
> > suggestions for the best route to take for getting command line svn on
> > Windows?
> >   
> 
> At $work our Windows guys use GUIs for both CVS (repo dead this summer) 
> and SVN. Are there command-line (MS-DOS?) CVS clients for Windows? And 
> there isn't an SVN equivalent?
> 
> j
> http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory


From cjfields at uiuc.edu  Thu Mar  6 17:28:36 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 6 Mar 2008 11:28:36 -0600
Subject: [Bioperl-l] anonymous cvs?
In-Reply-To: <1204824149.6689.14.camel@frissell>
References: <1204821815.6689.7.camel@frissell>
	<84E60454-2B09-4F77-9BD6-4B9150304B2D@uiuc.edu>
	<1204824149.6689.14.camel@frissell>
Message-ID: <ADC4C24A-76D7-4390-9794-F1D218CAD8B9@uiuc.edu>

I'm working on the nightly build script now and will post back when  
everything is set up.

chris


Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Thu Mar  6 20:38:22 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 6 Mar 2008 14:38:22 -0600
Subject: [Bioperl-l] anonymous cvs?
In-Reply-To: <ADC4C24A-76D7-4390-9794-F1D218CAD8B9@uiuc.edu>
References: <1204821815.6689.7.camel@frissell>
	<84E60454-2B09-4F77-9BD6-4B9150304B2D@uiuc.edu>
	<1204824149.6689.14.camel@frissell>
	<ADC4C24A-76D7-4390-9794-F1D218CAD8B9@uiuc.edu>
Message-ID: <2F746C5B-902C-4510-AEA3-2C46D4F51E7A@uiuc.edu>

Okay, I have set up nightly builds for bioperl-live, db, network, and  
run here:

http://www.bioperl.org/DIST/nightly_builds/
ftp://ftp.open-bio.org/pub/bioperl/DIST/nightly_builds

At the moment this is running via a crontab off a script in my portal  
account, retrieving everything via anon. svn and bundling it up into  
zip and tarball archives.  I would like to set it up to grab  
everything off dev but I don't want to mess with my ssh setup, so if  
anyone has ideas there...

The script also adds a CHANGELOG file (last 10 commits) and removes  
the .svn directories prior to bundling.  The archive name has the  
subversion revision number and date included; md5 checksums are in the  
SIGNATURES file.  I'll check on it again tomorrow to make sure cron  
ran it.

We can probably set up automated PPM builds as well; might be worth  
testing down the road (we need a way to set defaults for Build args  
prior to getting that running).

chris
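
The packaging steps described above (strip the .svn directories, name
the archive after the revision and date, record md5 checksums in
SIGNATURES) could be sketched in shell roughly as follows. This is a
minimal sketch, not the actual cron script: the function names, the
archive naming pattern, and the use of md5sum are assumptions made for
illustration.

```shell
#!/bin/sh
# Hypothetical sketch of the packaging steps; not the real nightly script.
# The CHANGELOG of the last 10 commits could come from `svn log --limit 10`;
# it is left out here because it needs a repository to talk to.

# Build an archive base name from project, svn revision and date.
# The naming pattern is an assumption.
archive_name() {
    printf '%s-r%s-%s\n' "$1" "$2" "$3"
}

# Remove every .svn directory below the given checkout.
strip_svn_dirs() {
    find "$1" -type d -name .svn -prune -exec rm -rf {} +
}

# Bundle a directory as <name>.tar.gz in out_dir and append its md5
# checksum to the SIGNATURES file in the same directory.
bundle() {
    src_dir=$1
    out_dir=$2
    name=$3
    tar -czf "$out_dir/$name.tar.gz" \
        -C "$(dirname "$src_dir")" "$(basename "$src_dir")"
    ( cd "$out_dir" && md5sum "$name.tar.gz" >> SIGNATURES )
}
```

A cron job would call strip_svn_dirs on a fresh checkout and then
bundle it under the name produced by archive_name, once per project.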


Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Thu Mar  6 21:48:37 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 6 Mar 2008 15:48:37 -0600
Subject: [Bioperl-l] Nightly build archives now available
Message-ID: <BAB6978A-D50E-4C4A-9FC9-EAAD557B935F@uiuc.edu>

We now have nightly bundled archives for bioperl-live, bioperl-db,  
bioperl-run, and bioperl-network running; these will be updated ~ 1:00  
am every night.

http://www.bioperl.org/DIST/nightly_builds/
ftp://ftp.open-bio.org/pub/bioperl/DIST/nightly_builds

The archives are date-stamped and also carry the Subversion revision,
so you can check that an archive includes a particular bug fix.  Each
one also contains a CHANGELOG file covering the last 10 revisions (if
there are any).  The archives are currently built from the anon. svn
repository.

chris
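
For an installer that has to cope with machines lacking svn, the choice
between anonymous svn and the nightly archive could be made with a
check like the one below. This is a minimal sketch: the function name
fetch_method is hypothetical, and what the installer does for each
answer (an anonymous svn checkout, or a download from the
nightly_builds directory listed above) is left out because the exact
repository URL and archive file names are not given here.

```shell
#!/bin/sh
# Hypothetical helper for an installer script.
# Prints "svn" when a command-line svn client is on PATH,
# otherwise prints "tarball" so the caller can fall back to a
# nightly archive.
fetch_method() {
    if command -v svn >/dev/null 2>&1; then
        echo svn
    else
        echo tarball
    fi
}
```

The caller would branch on the printed word, so users without svn
never need to install it.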


From David.Messina at sbc.su.se  Thu Mar  6 23:50:04 2008
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 7 Mar 2008 00:50:04 +0100
Subject: [Bioperl-l] Nightly build archives now available
In-Reply-To: <BAB6978A-D50E-4C4A-9FC9-EAAD557B935F@uiuc.edu>
References: <BAB6978A-D50E-4C4A-9FC9-EAAD557B935F@uiuc.edu>
Message-ID: <628aabb70803061550s24d7d8cfhf80495ea970a6c19@mail.gmail.com>

Very slick and well-thought-out, Chris -- nice job!


Dave


From hlapp at gmx.net  Fri Mar  7 00:06:41 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 6 Mar 2008 19:06:41 -0500
Subject: [Bioperl-l] Nightly build archives now available
In-Reply-To: <BAB6978A-D50E-4C4A-9FC9-EAAD557B935F@uiuc.edu>
References: <BAB6978A-D50E-4C4A-9FC9-EAAD557B935F@uiuc.edu>
Message-ID: <E39ADEB2-1FE7-4024-8188-EFF98F17BF52@gmx.net>

Awesome - thanks for doing this, Chris! -hilmar

On Mar 6, 2008, at 4:48 PM, Chris Fields wrote:

> We now have nightly bundled archives for bioperl-live, bioperl-db,  
> bioperl-run, and bioperl-network running; these will be updated ~  
> 1:00 am every night.
>
> http://www.bioperl.org/DIST/nightly_builds/
> ftp://ftp.open-bio.org/pub/bioperl/DIST/nightly_builds
>
> The archives are date-stamped and also have the Subversion  
> revision, just in case one wanted to ensure they get the correct  
> version for the bug fix.  They also contain a CHANGELOG file for  
> the last 10 revisions (if there are any).  These are currently  
> derived off the anon. svn repository.
>
> chris
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From staffa at niehs.nih.gov  Thu Mar  6 23:27:31 2008
From: staffa at niehs.nih.gov (Staffa, Nick (NIH/NIEHS))
Date: Thu, 06 Mar 2008 18:27:31 -0500
Subject: [Bioperl-l] SeqIO
In-Reply-To: <03C512635899144083CADB0EE22201890172A836@alpaca.lan.ablynx.com>
Message-ID: <C3F5E813.8903%staffa@niehs.nih.gov>

Thanks.
I really appreciate all the interest given and help generated.
That sure sounds like a great idea, but I think
Bio::Tools::GuessSeqFormat needs more RIGOR before it declares itself.
Is there a substitute?
It works great with
>> !!NA_SEQUENCE 1.0
>>    NewDNA  Length: 810  March 5, 2008 18:26  Type: N  Check: 3368  ..
>> 
>>        1  TGTTCGAATT CCGTGCGGTC CACCTCCCCT AGGAGCTCAG TGGGCTGGTT
>> et c.

as seen in:
gir.niehs.nih.gov> CGwindows.pl TestDNA.seq.org | more
guesser guesses gcg
TGTTCGAATTCCGTGCGGTCCACCTCCCCTAGGAGCTCAGTGGGCTGGTTGGATTCCGTGCCATCCCGGCAGGGCA
GAGCCTCGGGA  et c.
(yes, I added
my $file_type = $guesser->guess;
print "guesser guesses $file_type\n";
)

BUT
when applied to a genbank sequence passed thru the Seqlab editor and turned
into GCG, to wit:
!!NA_SEQUENCE 1.0
LOCUS       HSPGK2G      1911 bp    DNA             PRI       12-SEP-1993
DEFINITION  Human testis-specific PGK-2 gene for phosphoglycerate kinase
            (ATP:3-phospho-D-glycerate 1-phosphotransferase, EC 2.7.2.3).
ACCESSION   X05246 Y00261
...
...
BASE COUNT      583 a    367 c    442 g    519 t
ORIGIN

 HSPGK2G  Length: 1911  August 24, 1998 10:56  Type: N  Check: 4156  ..

       1  GCCCCTCAAC AGCAAGTTGG TTCTTCAGCA TTAAGATCCA GGTGTCAGCC
et c.

It thinks it is a flawed PIR:

gir.niehs.nih.gov> CGwindows.pl hspgk2g.seq | more
guesser guesses pir

------------- EXCEPTION  -------------
MSG: PIR stream read attempted without leading '>P1;' [ !!NA_SEQUENCE 1.0
LOCUS       HSPGK2G      1911 bp    DNA             PRI       12-SEP-1993


Must look at why guesser is thinking PIR.


On 3/6/08 11:22 AM, "Marc Logghe" <Marc.Logghe at ablynx.com> wrote:

> Hi Nick,
> I don't think you should leave out the -format option. You have to leave
> it in but the format should be provided by the B::T::GuessSeqFormat
> object.
> Something like:
> 
> #!/usr/bin/perl
> use strict;
> use Bio::SeqIO;
> use Bio::Tools::GuessSeqFormat;
> 
> $| = 1;
> # Require at least one file name on the command line.
> my $number_of_files = @ARGV;
> if (!$number_of_files) { print "no files entered\n"; exit; }
> foreach my $file (@ARGV) {
>   # Guess the format first, then hand it to SeqIO explicitly.
>   my $guesser = Bio::Tools::GuessSeqFormat->new(-file => $file);
>   my $seqio_object = Bio::SeqIO->new(-file   => $guesser->file,
>                                      -format => $guesser->guess);
>   my $seq_object = $seqio_object->next_seq;
>   my $sequence = $seq_object->seq;
>   print "$sequence\n";
> }
> 
> HTH,
> Marc
> 
> 
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>> bounces at lists.open-bio.org] On Behalf Of Staffa, Nick (NIH/NIEHS)
>> Sent: donderdag 6 maart 2008 16:24
>> To: Heikki Lehvaslaiho; bioperl-l at lists.open-bio.org
>> Cc: Chris Fields
>> Subject: Re: [Bioperl-l] SeqIO
>> 
>> Here's the scoop:
>> When I use Jason's suggestion, (-format => 'gcg'),
>> My program works without complaint on the original file that looks
> like:
>> !!NA_SEQUENCE 1.0
>>    NewDNA  Length: 810  March 5, 2008 18:26  Type: N  Check: 3368  ..
>> 
>>        1  TGTTCGAATT CCGTGCGGTC CACCTCCCCT AGGAGCTCAG TGGGCTGGTT
>> et c.
>> 
>> BUT if I remove the first line to test Bio::Tools::GuessSeqFormat,
>> (which should be retro-gcg format (before version 11?)),
>> my program runs, but there IS a complaint:
>> Use of uninitialized value in scalar chomp at
>> /usr/lib/perl5/site_perl/5.8.5/Bio/SeqIO/gcg.pm line 118, <GEN0> line
> 1.
>> BUT
>> If I remove (-format => 'gcg'),  I get no complaint, but the sequence
>> returned still has its numbers embedded. This affects my calculations.
>> 
>> Thanks, at least I know what my options are.
>> 
>> 
>> 
>> Nick Staffa
>> Telephone: 919-316-4569  (NIEHS: 6-4569)
>> Scientific Computing Support Group
>> NIEHS Information Technology Support Services Contract
>> (Science Task Monitor: Roy W. Reter (reter at niehs.nih.gov)
>> National Institute of Environmental Health Sciences
>> National Institutes of Health
>> Research Triangle Park, North Carolina
>


From cjfields at uiuc.edu  Fri Mar  7 04:32:39 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 6 Mar 2008 22:32:39 -0600
Subject: [Bioperl-l] Nightly build archives now available
In-Reply-To: <628aabb70803061550s24d7d8cfhf80495ea970a6c19@mail.gmail.com>
References: <BAB6978A-D50E-4C4A-9FC9-EAAD557B935F@uiuc.edu>
	<628aabb70803061550s24d7d8cfhf80495ea970a6c19@mail.gmail.com>
Message-ID: <5A67E3A9-9997-4A6B-AB07-8403D5FF388E@uiuc.edu>

I would like to get automated PPM builds set up as well but I think we  
have to rework some Build.PL stuff to get that going.

The next thing is to set up a regular script to check test/POD coverage.

chris

On Mar 6, 2008, at 5:50 PM, Dave Messina wrote:

> Very slick and well-thought-out, Chris -- nice job!
>
>
> Dave


From Marc.Logghe at ablynx.com  Fri Mar  7 09:04:35 2008
From: Marc.Logghe at ablynx.com (Marc Logghe)
Date: Fri, 7 Mar 2008 10:04:35 +0100
Subject: [Bioperl-l] SeqIO
In-Reply-To: <C3F5E813.8903%staffa@niehs.nih.gov>
Message-ID: <03C512635899144083CADB0EE22201890172A938@alpaca.lan.ablynx.com>

Ahh, my reply did not make much sense when I took a new look. I was the
one who learnt something here :-)
Did not know that Bio::SeqIO was already using B::T::GuessSeqFormat
under the hood. Learnt as well that you have to be careful with the
filename extension because this seems to have precedence.
Regards,
Marc




From bix at sendu.me.uk  Fri Mar  7 10:32:01 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 07 Mar 2008 10:32:01 +0000
Subject: [Bioperl-l] Nightly build archives now available
In-Reply-To: <5A67E3A9-9997-4A6B-AB07-8403D5FF388E@uiuc.edu>
References: <BAB6978A-D50E-4C4A-9FC9-EAAD557B935F@uiuc.edu>	<628aabb70803061550s24d7d8cfhf80495ea970a6c19@mail.gmail.com>
	<5A67E3A9-9997-4A6B-AB07-8403D5FF388E@uiuc.edu>
Message-ID: <47D119A1.10408@sendu.me.uk>

Chris Fields wrote:
> I would like to get automated PPM builds set up as well but I think we 
> have to rework some Build.PL stuff to get that going.

What's the hold-up on that front?


From heikki at sanbi.ac.za  Fri Mar  7 11:09:25 2008
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Fri, 7 Mar 2008 13:09:25 +0200
Subject: [Bioperl-l] BioSQL V1.0.0 released
Message-ID: <200803071309.25294.heikki@sanbi.ac.za>

BIOSQL V1.0.0 RELEASED
http://news.open-bio.org/archives/2008_03.html#000094


Congratulations, Hilmar!

		-Heikki

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Senior Scientist    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From cjfields at uiuc.edu  Fri Mar  7 13:53:50 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 7 Mar 2008 07:53:50 -0600
Subject: [Bioperl-l] Nightly build archives now available
In-Reply-To: <47D119A1.10408@sendu.me.uk>
References: <BAB6978A-D50E-4C4A-9FC9-EAAD557B935F@uiuc.edu>	<628aabb70803061550s24d7d8cfhf80495ea970a6c19@mail.gmail.com>
	<5A67E3A9-9997-4A6B-AB07-8403D5FF388E@uiuc.edu>
	<47D119A1.10408@sendu.me.uk>
Message-ID: <C71F44BE-CB80-4DED-9F18-8CE80836CFD6@uiuc.edu>

I haven't tried it out yet, to tell the truth.  The worry I have is  
prompting during the build process for database tests, networking, etc.

I have looked for it, but couldn't determine whether we have a way to  
run 'perl Build.PL' and bypass prompts with passed arguments.  The  
only one I could find was 'network', for network tests.

Scott Cain and I have corresponded about this before, i.e. it would be
nice to have boolean flags for each prompt (prereqs, database tests,
scripts, network, etc).  For nightly PPMs I would forgo tests and
include scripts.

chris

On Mar 7, 2008, at 4:32 AM, Sendu Bala wrote:

> Chris Fields wrote:
>> I would like to get automated PPM builds set up as well but I think  
>> we have to rework some Build.PL stuff to get that going.
>
> What's the hold-up on that front?
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From cjfields at uiuc.edu  Fri Mar  7 13:22:27 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 7 Mar 2008 07:22:27 -0600
Subject: [Bioperl-l] BioSQL V1.0.0 released
In-Reply-To: <200803071309.25294.heikki@sanbi.ac.za>
References: <200803071309.25294.heikki@sanbi.ac.za>
Message-ID: <7558F8C6-FE40-4BAE-BA6A-D5039B10F350@uiuc.edu>

Same here.  Great news!

chris

On Mar 7, 2008, at 5:09 AM, Heikki Lehvaslaiho wrote:

> BIOSQL V1.0.0 RELEASED
> http://news.open-bio.org/archives/2008_03.html#000094
>
>
> Congratulations, Hilmar!
>
> 		-Heikki
>
> -- 
> ______ _/      _/_____________________________________________________
>      _/      _/
>     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
>    _/_/_/_/_/  Senior Scientist    skype: heikki_lehvaslaiho
>   _/  _/  _/  SANBI, South African National Bioinformatics Institute
>  _/  _/  _/  University of Western Cape, South Africa
>     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
> ___ _/_/_/_/_/________________________________________________________
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From bix at sendu.me.uk  Fri Mar  7 14:10:08 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 07 Mar 2008 14:10:08 +0000
Subject: [Bioperl-l] Nightly build archives now available
In-Reply-To: <C71F44BE-CB80-4DED-9F18-8CE80836CFD6@uiuc.edu>
References: <BAB6978A-D50E-4C4A-9FC9-EAAD557B935F@uiuc.edu>	<628aabb70803061550s24d7d8cfhf80495ea970a6c19@mail.gmail.com>
	<5A67E3A9-9997-4A6B-AB07-8403D5FF388E@uiuc.edu>
	<47D119A1.10408@sendu.me.uk>
	<C71F44BE-CB80-4DED-9F18-8CE80836CFD6@uiuc.edu>
Message-ID: <47D14CC0.8000104@sendu.me.uk>

Chris Fields wrote:
> I haven't tried it out yet, to tell the truth.  The worry I have is 
> prompting during the build process for database tests, networking, etc.
> 
> I have looked for it, but couldn't determine whether we have a way to 
> run 'perl Build.PL' and bypass prompts with passed arguments.  The only 
> one I could find was 'network', for network tests.
> 
> Scott Cain and I have corresponded about this before, i.e. it would be 
> nice to have boolean flags for each prompt (prereqs, database tests, 
> scripts, network, etc).  For nightly PPMs I would forego tests and 
> include scripts.

I don't quite understand how you're making the nightlies right now, but
you should be using the dist actions:

http://www.bioperl.org/wiki/Making_a_BioPerl_release

I.e.

One time (and one time only):
perl Build.PL (it doesn't matter how you answer the questions)

Then every night:
./Build dist
./Build ppmdist

You then upload the resulting .tar.gz and .zip files.


Only if Build.PL or ModuleBuildBioperl are updated might you need to:
./Build realclean
perl Build.PL
again. But this should be a rare event and even more rarely would it be 
/required/ (probably never).


From bix at sendu.me.uk  Fri Mar  7 14:19:36 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 07 Mar 2008 14:19:36 +0000
Subject: [Bioperl-l] Nightly build archives now available
In-Reply-To: <47D14CC0.8000104@sendu.me.uk>
References: <BAB6978A-D50E-4C4A-9FC9-EAAD557B935F@uiuc.edu>	<628aabb70803061550s24d7d8cfhf80495ea970a6c19@mail.gmail.com>
	<5A67E3A9-9997-4A6B-AB07-8403D5FF388E@uiuc.edu>
	<47D119A1.10408@sendu.me.uk>
	<C71F44BE-CB80-4DED-9F18-8CE80836CFD6@uiuc.edu>
	<47D14CC0.8000104@sendu.me.uk>
Message-ID: <47D14EF8.5090107@sendu.me.uk>

Sendu Bala wrote:
> Chris Fields wrote:
>> I haven't tried it out yet, to tell the truth.  The worry I have is 
>> prompting during the build process for database tests, networking, etc.
>>
>> I have looked for it, but couldn't determine whether we have a way to 
>> run 'perl Build.PL' and bypass prompts with passed arguments.  The 
>> only one I could find was 'network', for network tests.
>>
>> Scott Cain and I have corresponded about this before, i.e. it would be 
>> nice to have boolean flags for each prompt (prereqs, database tests, 
>> scripts, network, etc).  For nightly PPMs I would forego tests and 
>> include scripts.
> 
> I don't quite understand how you're making the nightlys right now, but 
> you should be using the dist actions:
> 
> http://www.bioperl.org/wiki/Making_a_BioPerl_release
> 
> Ie.
> 
> One time (and one time only):
> perl Build.PL (it doesn't matter how you answer the questions)
> 
> Then every night:
> ./Build dist
> ./Build ppmdist
> 
> You then upload the resulting .tar.gz and .zip files.

Ah, having uploaded the various archives you'll have to manually delete 
them before running the dist action the next night, otherwise dist will 
ask you if you want to overwrite them. Otherwise dist asks no questions.
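
A minimal sketch of that nightly step, for illustration only. It assumes
the archives are written into the checkout root and match the patterns
bioperl-*.tar.gz and bioperl-*.zip; the function names and the DRY_RUN
switch are made up for this example and are not part of BioPerl.

```shell
#!/bin/sh
# Sketch: clear yesterday's archives, then run the two dist actions.
# Assumed, not taken from the thread: archive name patterns, DRY_RUN switch.

run() {
    # In dry-run mode print the command, otherwise execute it.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# The body runs in a subshell, so the caller's working directory is untouched.
nightly_dist() (
    cd "$1" || exit 1
    # Remove stale archives so 'Build dist' has nothing to ask about.
    run rm -f -- bioperl-*.tar.gz bioperl-*.zip || exit 1
    run ./Build dist || exit 1
    run ./Build ppmdist
)
```

A cron script would call nightly_dist with the path of a checkout in
which 'perl Build.PL' has already been run once by hand.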


From cjfields at uiuc.edu  Fri Mar  7 14:28:36 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 7 Mar 2008 08:28:36 -0600
Subject: [Bioperl-l] Nightly build archives now available
In-Reply-To: <47D14CC0.8000104@sendu.me.uk>
References: <BAB6978A-D50E-4C4A-9FC9-EAAD557B935F@uiuc.edu>	<628aabb70803061550s24d7d8cfhf80495ea970a6c19@mail.gmail.com>
	<5A67E3A9-9997-4A6B-AB07-8403D5FF388E@uiuc.edu>
	<47D119A1.10408@sendu.me.uk>
	<C71F44BE-CB80-4DED-9F18-8CE80836CFD6@uiuc.edu>
	<47D14CC0.8000104@sendu.me.uk>
Message-ID: <FFE7B6A1-FF8C-4C84-994B-6D9AAD46748E@uiuc.edu>


On Mar 7, 2008, at 8:10 AM, Sendu Bala wrote:

> Chris Fields wrote:
>> I haven't tried it out yet, to tell the truth.  The worry I have is  
>> prompting during the build process for database tests, networking,  
>> etc.
>> I have looked for it, but couldn't determine whether we have a way  
>> to run 'perl Build.PL' and bypass prompts with passed arguments.   
>> The only one I could find was 'network', for network tests.
>> Scott Cain and I have corresponded about this before, i.e. it would  
>> be nice to have boolean flags for each prompt (prereqs, database  
>> tests, scripts, network, etc).  For nightly PPMs I would forego  
>> tests and include scripts.
>
> I don't quite understand how you're making the nightlys right now,  
> but you should be using the dist actions:
>
> http://www.bioperl.org/wiki/Making_a_BioPerl_release
>
> Ie.
>
> One time (and one time only):
> perl Build.PL (it doesn't matter how you answer the questions)
>
> Then every night:
> ./Build dist
> ./Build ppmdist
>
> You then upload the resulting .tar.gz and .zip files.
>
>
> Only if Build.PL or ModuleBuildBioperl are updated might you need to:
> ./Build realclean
> perl Build.PL
> again. But this should be a rare event and even more rarely would it  
> be /required/ (probably never).

I'm not making a distribution; the archives are merely cleaned up svn  
checkouts (no .svn directories).  This is essentially what the  
net_install script would get when installing GBrowse using the 'dev'  
option, except you don't need to install Subversion to get updates.   
Also, at this point we don't have an analogous 'Download tarball'  
setting for browsable svn either, so this is a suitable alternative.

Again, I don't want to deal with prompts while running a cron job  
(this is a bash script), particularly if I can't guarantee the number  
of prompts or the prompting order won't change down the line.  If we  
can set up a way around that using passed args to Build.PL then it  
would make life much easier and we could automate 'Build dist', 'Build  
ppmdist', 'Build testcover', etc.

chris


From bix at sendu.me.uk  Fri Mar  7 14:54:41 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 07 Mar 2008 14:54:41 +0000
Subject: [Bioperl-l] Nightly build archives now available
In-Reply-To: <FFE7B6A1-FF8C-4C84-994B-6D9AAD46748E@uiuc.edu>
References: <BAB6978A-D50E-4C4A-9FC9-EAAD557B935F@uiuc.edu>	<628aabb70803061550s24d7d8cfhf80495ea970a6c19@mail.gmail.com>
	<5A67E3A9-9997-4A6B-AB07-8403D5FF388E@uiuc.edu>
	<47D119A1.10408@sendu.me.uk>
	<C71F44BE-CB80-4DED-9F18-8CE80836CFD6@uiuc.edu>
	<47D14CC0.8000104@sendu.me.uk>
	<FFE7B6A1-FF8C-4C84-994B-6D9AAD46748E@uiuc.edu>
Message-ID: <47D15731.2050000@sendu.me.uk>

Chris Fields wrote:
> On Mar 7, 2008, at 8:10 AM, Sendu Bala wrote:
>> One time (and one time only):
>> perl Build.PL (it doesn't matter how you answer the questions)
>>
>> Then every night:
>> ./Build dist
>> ./Build ppmdist
>>
>> You then upload the resulting .tar.gz and .zip files.
>>
>>
>> Only if Build.PL or ModuleBuildBioperl are updated might you need to:
>> ./Build realclean
>> perl Build.PL
>> again. But this should be a rare event and even more rarely would it 
>> be /required/ (probably never).
> 
> I'm not making a distribution; the archives are merely cleaned up svn 
> checkouts (no .svn directories).  This is essentially what the 
> net_install script would get when installing GBrowse using the 'dev' 
> option, except you don't need to install Subversion to get updates.  
> Also, at this point we don't have an analogous 'Download tarball' 
> setting for browsable svn either, so this is a suitable alternative.

The dist action does what you want. I did a diff on the most recent 
nightly build and the .tar.gz produced by the dist action of a checkout 
of revision 14603:

$ diff -r bioperl-1.5.2_100 bioperl-live
diff -r bioperl-1.5.2_100/Bio/SeqIO/chaos.pm bioperl-live/Bio/SeqIO/chaos.pm
2c2
< # $Date: 2007-06-14 15:16:21 +0100 (Thu, 14 Jun 2007) $
---
 > # $Date: 2007-06-14 10:16:21 -0400 (Thu, 14 Jun 2007) $
Only in bioperl-live/Bio/Tools: WebBlat.pm
Only in bioperl-live: CHANGELOG
Only in bioperl-1.5.2_100: MANIFEST
Only in bioperl-1.5.2_100: META.yml
diff -r bioperl-1.5.2_100/Makefile.PL bioperl-live/Makefile.PL
1,31c1,30
< # Note: this file was auto-generated by Module::Build::Compat version 0.03
[snip]
---
 > #!/usr/bin/perl -w
 >
 > # This is a stub that simply tells you to use Build.PL instead
[snip]
Only in bioperl-live: bioperl.lisp
diff -r bioperl-1.5.2_100/maintenance/cvs2cl_by_file.pl bioperl-live/maintenance/cvs2cl_by_file.pl
29c29
< ## $Date: 2006-11-30 15:57:16 +0000 (Thu, 30 Nov 2006) $
---
 > ## $Date: 2006-11-30 10:57:16 -0500 (Thu, 30 Nov 2006) $

I don't know what's going on with the date differences, but for a file 
found in a folder called '/DIST/nightly_builds/', you want the MANIFEST 
and META.yml files. You also want the Compat version of Build.PL since 
we haven't yet moved to forcing people to use Build.PL.

'./Build dist' does the right thing.


> Again, I don't want to deal with prompts while running a cron job (this 
> is a bash script), particularly if I can't guarantee the number of 
> prompts or the prompting order won't change down the line.  If we can 
> set up a way around that using passed args to Build.PL then it would 
> make life much easier and we could automate 'Build dist', 'Build 
> ppmdist', 'Build testcover', etc.

Again, you only need to run 'perl Build.PL' once and answer the 
questions only once. Then you can svn update and run the actions with no 
more questions to answer. This isn't a problem that needs to be solved. 
It is /supposed/ to be this way. It's ready to use! Please make use of 
it; it's one of the (many) reasons I moved Bioperl over to Build.PL in 
the first place.


From cjfields at uiuc.edu  Fri Mar  7 15:29:11 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 7 Mar 2008 09:29:11 -0600
Subject: [Bioperl-l] Nightly build archives now available
In-Reply-To: <47D15731.2050000@sendu.me.uk>
References: <BAB6978A-D50E-4C4A-9FC9-EAAD557B935F@uiuc.edu>	<628aabb70803061550s24d7d8cfhf80495ea970a6c19@mail.gmail.com>
	<5A67E3A9-9997-4A6B-AB07-8403D5FF388E@uiuc.edu>
	<47D119A1.10408@sendu.me.uk>
	<C71F44BE-CB80-4DED-9F18-8CE80836CFD6@uiuc.edu>
	<47D14CC0.8000104@sendu.me.uk>
	<FFE7B6A1-FF8C-4C84-994B-6D9AAD46748E@uiuc.edu>
	<47D15731.2050000@sendu.me.uk>
Message-ID: <7AFB30F0-4810-4B62-B64A-C712EC4A3872@uiuc.edu>


On Mar 7, 2008, at 8:54 AM, Sendu Bala wrote:

> Chris Fields wrote:
>> On Mar 7, 2008, at 8:10 AM, Sendu Bala wrote:
>>> One time (and one time only):
>>> perl Build.PL (it doesn't matter how you answer the questions)
>>>
>>> Then every night:
>>> ./Build dist
>>> ./Build ppmdist
>>>
>>> You then upload the resulting .tar.gz and .zip files.
>>>
>>>
>>> Only if Build.PL or ModuleBuildBioperl are updated might you need  
>>> to:
>>> ./Build realclean
>>> perl Build.PL
>>> again. But this should be a rare event and even more rarely would  
>>> it be /required/ (probably never).
>> I'm not making a distribution; the archives are merely cleaned up  
>> svn checkouts (no .svn directories).  This is essentially what the  
>> net_install script would get when installing GBrowse using the  
>> 'dev' option, except you don't need to install Subversion to get  
>> updates.  Also, at this point we don't have an analogous 'Download  
>> tarball' setting for browsable svn either, so this is a suitable  
>> alternative.
>
> The dist action does what you want. I did a diff on the most recent  
> nightly build and the .tar.gz produced by the dist action of a  
> checkout of revision 14603:
>
> $ diff -r bioperl-1.5.2_100 bioperl-live
> diff -r bioperl-1.5.2_100/Bio/SeqIO/chaos.pm bioperl-live/Bio/SeqIO/ 
> chaos.pm
> 2c2
> < # $Date: 2007-06-14 15:16:21 +0100 (Thu, 14 Jun 2007) $
> ---
> > # $Date: 2007-06-14 10:16:21 -0400 (Thu, 14 Jun 2007) $
> Only in bioperl-live/Bio/Tools: WebBlat.pm
> Only in bioperl-live: CHANGELOG
> Only in bioperl-1.5.2_100: MANIFEST
> Only in bioperl-1.5.2_100: META.yml
> diff -r bioperl-1.5.2_100/Makefile.PL bioperl-live/Makefile.PL
> 1,31c1,30
> < # Note: this file was auto-generated by Module::Build::Compat  
> version 0.03
> [snip]
> ---
> > #!/usr/bin/perl -w
> >
> > # This is a stub that simply tells you to use Build.PL instead
> [snip]
> Only in bioperl-live: bioperl.lisp
> diff -r bioperl-1.5.2_100/maintenance/cvs2cl_by_file.pl bioperl-live/ 
> maintenance/cvs2cl_by_file.pl
> 29c29
> < ## $Date: 2006-11-30 15:57:16 +0000 (Thu, 30 Nov 2006) $
> ---
> > ## $Date: 2006-11-30 10:57:16 -0500 (Thu, 30 Nov 2006) $
>
> I don't know what's going on with the date differences, but for a  
> file found in a folder called '/DIST/nightly_builds/', you want the  
> MANIFEST and META.yml files. You also want the Compat version of  
> Build.PL since we haven't yet moved to forcing people to use Build.PL.
>
> './Build dist' does the right thing.
>
>
>> Again, I don't want to deal with prompts while running a cron job  
>> (this is a bash script), particularly if I can't guarantee the  
>> number of prompts or the prompting order won't change down the  
>> line.  If we can set up a way around that using passed args to  
>> Build.PL then it would make life much easier and we could automate  
>> 'Build dist', 'Build ppmdist', 'Build testcover', etc.
>
> Again, you only need to run 'perl Build.PL' once and answer the  
> questions only once. Then you can svn update and run the actions  
> with no more questions to answer. This isn't a problem that needs to  
> be solved. It is /supposed/ to be this way. It's ready to use!  
> Please make use of it; it's one of the (many) reasons I moved  
> Bioperl over to Build.PL in the first place.

Then set it up the way you want.  I give up.

chris


From bix at sendu.me.uk  Fri Mar  7 15:43:44 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 07 Mar 2008 15:43:44 +0000
Subject: [Bioperl-l] Nightly build archives now available
In-Reply-To: <7AFB30F0-4810-4B62-B64A-C712EC4A3872@uiuc.edu>
References: <BAB6978A-D50E-4C4A-9FC9-EAAD557B935F@uiuc.edu>	<628aabb70803061550s24d7d8cfhf80495ea970a6c19@mail.gmail.com>
	<5A67E3A9-9997-4A6B-AB07-8403D5FF388E@uiuc.edu>
	<47D119A1.10408@sendu.me.uk>
	<C71F44BE-CB80-4DED-9F18-8CE80836CFD6@uiuc.edu>
	<47D14CC0.8000104@sendu.me.uk>
	<FFE7B6A1-FF8C-4C84-994B-6D9AAD46748E@uiuc.edu>
	<47D15731.2050000@sendu.me.uk>
	<7AFB30F0-4810-4B62-B64A-C712EC4A3872@uiuc.edu>
Message-ID: <47D162B0.5070402@sendu.me.uk>

Chris Fields wrote:
> 
> On Mar 7, 2008, at 8:54 AM, Sendu Bala wrote:
>
>> Again, you only need to run 'perl Build.PL' once and answer the 
>> questions only once. Then you can svn update and run the actions with 
>> no more questions to answer. This isn't a problem that needs to be 
>> solved. It is /supposed/ to be this way. It's ready to use! Please 
>> make use of it; it's one of the (many) reasons I moved Bioperl over to 
>> Build.PL in the first place.
> 
> Then set it up the way you want.  I give up.

I really don't understand that response. I have merely informed you how 
Build.PL and the actions work, since you didn't know. I have informed 
you it already does what you want in terms of automation; there's 
nothing to wait for, no more work to do. I have requested you use it, 
since there is little value in duplicating code and effort.

Now that you have the information, you can make an informed choice as to 
how to proceed, based on your needs. If you have good reasons for 
sticking with your current nightly build process, by all means stick 
with them.

Mainly I just wanted to make clear (as a general point for anyone 
interested) that the questions asked by Build.PL aren't an issue or 
obstacle in terms of automating builds or tests.


From cjfields at uiuc.edu  Fri Mar  7 16:15:31 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 7 Mar 2008 10:15:31 -0600
Subject: [Bioperl-l] Nightly build archives now available
In-Reply-To: <47D162B0.5070402@sendu.me.uk>
References: <BAB6978A-D50E-4C4A-9FC9-EAAD557B935F@uiuc.edu>	<628aabb70803061550s24d7d8cfhf80495ea970a6c19@mail.gmail.com>
	<5A67E3A9-9997-4A6B-AB07-8403D5FF388E@uiuc.edu>
	<47D119A1.10408@sendu.me.uk>
	<C71F44BE-CB80-4DED-9F18-8CE80836CFD6@uiuc.edu>
	<47D14CC0.8000104@sendu.me.uk>
	<FFE7B6A1-FF8C-4C84-994B-6D9AAD46748E@uiuc.edu>
	<47D15731.2050000@sendu.me.uk>
	<7AFB30F0-4810-4B62-B64A-C712EC4A3872@uiuc.edu>
	<47D162B0.5070402@sendu.me.uk>
Message-ID: <AAF0B249-AD91-475A-8A30-4BB3A2CCF951@uiuc.edu>

On Mar 7, 2008, at 9:43 AM, Sendu Bala wrote:

> Chris Fields wrote:
>> On Mar 7, 2008, at 8:54 AM, Sendu Bala wrote:
>>
>>> Again, you only need to run 'perl Build.PL' once and answer the  
>>> questions only once. Then you can svn update and run the actions  
>>> with no more questions to answer. This isn't a problem that needs  
>>> to be solved. It is /supposed/ to be this way. It's ready to use!  
>>> Please make use of it; it's one of the (many) reasons I moved  
>>> Bioperl over to Build.PL in the first place.
>> Then set it up the way you want.  I give up.
>
> I really don't understand that response. I have merely informed you  
> how Build.PL and the actions work, since you didn't know. I have  
> informed you it already does what you want in terms of automation;  
> there's nothing to wait for, no more work to do. I have requested  
> you use it, since there is little value in duplicating code and  
> effort.
>
> Now that you have the information, you can make an informed choice  
> as to how to proceed, based on your needs. If you have good reasons  
> for sticking with your current nightly build process, by all means  
> stick with them.
>
> Mainly I just wanted to make clear (as a general point for anyone  
> interested) that the questions asked by Build.PL aren't an issue or  
> obstacle in terms of automating builds or tests.

It doesn't come across that way; it comes off as pretty  
condescending.  And please don't assume I lack experience with how  
Module::Build works (I have used 'Build ppmdist' and 'Build testcover'  
quite a few times recently, and the next item on my agenda is to fix  
the various issues with Build.PL and database checking, which you  
already know).

So my response is pretty simple; if you feel the need to use  
'Build.PL' to make nightlies, then by all means set it up.  I find it  
much harder to work with the current Build process in an automated way  
using a bash script, so I work around it.  If it makes you happier we  
can switch the directory over to 'nightly_checkouts', but I think  
that's just mincing semantics.

Okay, it's pretty obvious we're not on the same page here.  I'll go  
through it carefully so you understand the problem:

1)  I am running a 'svn co' on anon. svn for the various distros to a  
temp directory.  This is done using a bash script.  If I attempt to  
change into the distribution directory and run 'perl Build.PL' from  
the bash script, I immediately run into permissions issues and several  
odd things:

Checking prerequisites...
  - ERROR: Bio::Root::Version is not installed
(I think you ran Build.PL directly, so will use CPAN to install prerequisites on demand)
CPAN: Storable loaded ok
Going to read /root/.cpan/Metadata
   Database was generated on Tue, 05 Feb 2008 11:30:54 GMT
Warning: You are not allowed to write into directory "/root/.cpan/sources/authors".
     I'll continue, but if you encounter problems, they may be due
     to insufficient permissions.
CPAN: LWP::UserAgent loaded ok
Fetching with LWP:
   ftp://mirror.hiwaay.net/CPAN/authors/01mailrc.txt.gz
LWP failed with code[500] message[Cannot write to '/root/.cpan/sources/authors/01mailrc.txt.gz-8678': Permission denied]
Fetching with Net::FTP:
   ftp://mirror.hiwaay.net/CPAN/authors/01mailrc.txt.gz
Cannot open Local file /root/.cpan/sources/authors/01mailrc.txt.gz: Permission denied
....

2) I suspect, even if I worked around permissions and set up the job  
as root or admin and worked out why it can't find  
'Bio::Root::Version' (?!?), this would still be a terrific pain in the  
*** to deal with as the Build.PL process is expecting answers for each  
and every prompt, and the process differs for each distribution.  Yes,  
I could set something up to deal with that in the script.  No, I  
will not do that as any additions or changes to prompts could break/ 
hang the script or (worse) silently change what the archive contains.   
Hence my indication that passing flags to 'perl Build.PL' would be a  
nice way to work around that.  For that I haven't heard a response, so  
I assume that functionality isn't there (or am I assuming incorrectly?).

So, from where I stand, even if using Build.PL is the /proper/ way to  
do it, it doesn't work as expected using an automated process (i.e.  
cron).  Make sense?

chris


From matthewehodges at gmail.com  Fri Mar  7 16:16:47 2008
From: matthewehodges at gmail.com (Matt)
Date: Fri, 7 Mar 2008 16:16:47 +0000 (UTC)
Subject: [Bioperl-l] Reciprocal blast
Message-ID: <loom.20080307T160014-878@post.gmane.org>

Dear experts,

I want to do a best reciprocal blastp of a fasta protein dataset against the 
protein models of various species also in fasta format. The aim is to have an 
output showing presence/absence. I think this is possible to do using 
perl, but I'm very much a beginner so any help in this would be greatly 
appreciated.
Thanks
Matt


From bix at sendu.me.uk  Fri Mar  7 17:34:17 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 07 Mar 2008 17:34:17 +0000
Subject: [Bioperl-l] Nightly build archives now available
In-Reply-To: <AAF0B249-AD91-475A-8A30-4BB3A2CCF951@uiuc.edu>
References: <BAB6978A-D50E-4C4A-9FC9-EAAD557B935F@uiuc.edu>	<628aabb70803061550s24d7d8cfhf80495ea970a6c19@mail.gmail.com>
	<5A67E3A9-9997-4A6B-AB07-8403D5FF388E@uiuc.edu>
	<47D119A1.10408@sendu.me.uk>
	<C71F44BE-CB80-4DED-9F18-8CE80836CFD6@uiuc.edu>
	<47D14CC0.8000104@sendu.me.uk>
	<FFE7B6A1-FF8C-4C84-994B-6D9AAD46748E@uiuc.edu>
	<47D15731.2050000@sendu.me.uk>
	<7AFB30F0-4810-4B62-B64A-C712EC4A3872@uiuc.edu>
	<47D162B0.5070402@sendu.me.uk>
	<AAF0B249-AD91-475A-8A30-4BB3A2CCF951@uiuc.edu>
Message-ID: <47D17C99.9050009@sendu.me.uk>

Chris Fields wrote:
> 1)  I am running a 'svn co' on anon. svn for the various distros to a 
> temp directory.

Is it important that you do a fresh co every night? Why not do a co once 
and then do a 'svn update' every night? This is the crux of the 
problem: if you choose to simply update, then you only have to get 
'perl Build.PL' to work once.


> If I attempt to change into the distribution directory and run 'perl Build.PL' from the 
> bash script, I immediately run into permissions issues and several odd 
> things:
> 
> Checking prerequisites...
>  - ERROR: Bio::Root::Version is not installed
> (I think you ran Build.PL directly, so will use CPAN to install 
> prerequisites on demand)
> CPAN: Storable loaded ok
> Going to read /root/.cpan/Metadata
>   Database was generated on Tue, 05 Feb 2008 11:30:54 GMT
> Warning: You are not allowed to write into directory 
> "/root/.cpan/sources/authors".
[snip]

I'm assuming this is on portal? The CPAN setup for users is a little 
broken. You need to create /home/yourusername/.cpan/CPAN/MyConfig.pm

$CPAN::Config->{cpan_home} = "/home/yourusername/.cpan/";

Then you can run and configure cpan correctly and install Bundle::CPAN. 
Some of the zlib stuff failed to install for me, but that doesn't seem 
to matter.

Of course, I guess it makes sense for root to just install all of 
Bioperl's prereqs anyway, so that testing can be automated in the future.

Anyway, once you have cpan happy 'perl Build.PL' will run fine. Answer 
'n' to everything and then your cron job just has to call './Build dist'.
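
For illustration, the MyConfig.pm step above could be scripted roughly
as follows. This is a sketch under assumptions: the function name is
made up, the trailing '1;' is the customary true return value for a
Perl file loaded with require, and the function refuses to touch an
existing MyConfig.pm rather than overwrite it.

```shell
#!/bin/sh
# Sketch: create a per-user CPAN config that points cpan_home at the
# user's own directory. write_cpan_myconfig is a hypothetical helper.

write_cpan_myconfig() {
    # Usage: write_cpan_myconfig HOME_DIR
    home_dir="$1"
    conf="$home_dir/.cpan/CPAN/MyConfig.pm"
    if [ -e "$conf" ]; then
        echo "not overwriting existing $conf" >&2
        return 1
    fi
    mkdir -p "$home_dir/.cpan/CPAN" || return 1
    # Unquoted heredoc: $home_dir is expanded by the shell, while the
    # escaped \$ keeps the Perl variable name literal in the file.
    cat > "$conf" <<EOF
\$CPAN::Config->{cpan_home} = "$home_dir/.cpan/";
1;
EOF
}
```

Calling write_cpan_myconfig "$HOME" once, before configuring cpan, would
be the intended use.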


> 2) I suspect, even if I worked around permissions and set up the job as 
> root or admin and worked out why it can't find 'Bio::Root::Version' 
> (?!?), this would still be a terrific pain in the *** to deal with as 
> the Build.PL process is expecting answers for each and every prompt, and 
> the process differs for each distribution.

You won't be running Build.PL in the cron job.


> passing flags to 'perl Build.PL' would be a nice way to work around 
> that.  For that I haven't heard a response, so I assume that 
> functionality isn't there (or am I assuming incorrectly?).

It isn't AFAIK, but my point is that it doesn't need to be (for this 
particular use-case at least).


> So, from where I stand, even if using Build.PL is the /proper/ way to do 
> it, it doesn't work as expected using an automated process (i.e. cron).  
> Make sense?

Only if you can't run 'svn update' instead of 'svn co' each night.


From cjfields at uiuc.edu  Fri Mar  7 18:00:52 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 7 Mar 2008 12:00:52 -0600
Subject: [Bioperl-l] Nightly build archives now available
In-Reply-To: <47D17C99.9050009@sendu.me.uk>
References: <BAB6978A-D50E-4C4A-9FC9-EAAD557B935F@uiuc.edu>	<628aabb70803061550s24d7d8cfhf80495ea970a6c19@mail.gmail.com>
	<5A67E3A9-9997-4A6B-AB07-8403D5FF388E@uiuc.edu>
	<47D119A1.10408@sendu.me.uk>
	<C71F44BE-CB80-4DED-9F18-8CE80836CFD6@uiuc.edu>
	<47D14CC0.8000104@sendu.me.uk>
	<FFE7B6A1-FF8C-4C84-994B-6D9AAD46748E@uiuc.edu>
	<47D15731.2050000@sendu.me.uk>
	<7AFB30F0-4810-4B62-B64A-C712EC4A3872@uiuc.edu>
	<47D162B0.5070402@sendu.me.uk>
	<AAF0B249-AD91-475A-8A30-4BB3A2CCF951@uiuc.edu>
	<47D17C99.9050009@sendu.me.uk>
Message-ID: <BC3ACFAC-F203-4C12-BA1F-6AA762F31BF0@uiuc.edu>


On Mar 7, 2008, at 11:34 AM, Sendu Bala wrote:

> Chris Fields wrote:
>> 1)  I am running a 'svn co' on anon. svn for the various distros to  
>> a temp directory.
>
> Is it important that you do a fresh co every night? Why not do a co  
> once   and then do a 'svn update' every night? This is the crux of  
> the problems: if you choose to simply update, then you only have to  
> get 'perl Build.PL' to work once.

Unless you update Build.PL (which will happen as the distributions  
grow).  Then you need to rerun 'perl Build.PL'.  It seems safer to run  
that each time with a 'pass-through' flag for automated builds.
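
One possible middle ground, sketched here for illustration: after each
'svn update', compare checksums of the build inputs and only rerun
'perl Build.PL' when they have changed. The watched file names
(Build.PL and ModuleBuildBioperl.pm in the checkout root) and the stamp
file name are assumptions, and the function is hypothetical.

```shell
#!/bin/sh
# Sketch: detect whether the build inputs changed since the last run.
# Assumed: watched files live in the checkout root; stamp file name is made up.

build_inputs_changed() {
    # Usage: build_inputs_changed CHECKOUT_DIR
    # Returns 0 if the watched files differ from the recorded state
    # (and records the new state), 1 if nothing changed.
    dir="$1"
    stamp="$dir/.build_inputs.cksum"
    new=$(cd "$dir" && cksum Build.PL ModuleBuildBioperl.pm 2>/dev/null)
    old=$(cat "$stamp" 2>/dev/null)
    if [ "$new" = "$old" ]; then
        return 1
    fi
    printf '%s\n' "$new" > "$stamp"
    return 0
}
```

The first call always reports a change because no stamp exists yet. A
nightly script could run './Build realclean' and 'perl Build.PL' only
when this returns 0, which still leaves the prompts to be handled on
those nights.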

>> If I attempt to change into the distribution directory and run  
>> 'perl Build.PL' from the bash script, I immediately run into  
>> permissions issues and several odd things:
>> Checking prerequisites...
>> - ERROR: Bio::Root::Version is not installed
>> (I think you ran Build.PL directly, so will use CPAN to install  
>> prerequisites on demand)
>> CPAN: Storable loaded ok
>> Going to read /root/.cpan/Metadata
>>  Database was generated on Tue, 05 Feb 2008 11:30:54 GMT
>> Warning: You are not allowed to write into directory "/root/.cpan/ 
>> sources/authors".
> [snip]
>
> I'm assuming this is on portal? The CPAN setup for users is a little  
> broken. You need to create /home/yourusername/.cpan/CPAN/MyConfig.pm
>
> $CPAN::Config->{cpan_home} = "/home/yourusername/.cpan/"
>
> Then you can run and configure cpan correctly and install  
> Bundle::CPAN. Some of the zlib stuff failed to install for me, but  
> that doesn't seem to matter.
>
> Of course, I guess it makes sense for root to just install all of  
> Bioperl's prereqs anyway, so that testing can be automated in the  
> future.
>
> Anyway, once you have cpan happy 'perl Build.PL' will run fine.  
> Answer 'n' to everything and then your cron job just has to call './ 
> Build dist'.

I agree about setting up the prereqs.  I could also (as mentioned  
before) set this up as root.  However, if we go this route we need to  
have 'perl Build.PL' included in the process in order to ensure a  
clean build process each time and to prevent the script from breaking  
whenever someone decides to change Build.PL.

>> 2) I suspect, even if I worked around permissions and set up the  
>> job as root or admin and worked out why it can't find  
>> 'Bio::Root::Version' (?!?), this would still be a terrific pain in  
>> the *** to deal with as the Build.PL process is expecting answers  
>> for each and every prompt, and the process differs for each  
>> distribution.
>
> You won't be running Build.PL in the cron job.

See above.  I don't want to set up something automated which can't be  
maintained in the long term.

>> passing flags to 'perl Build.PL' would be a nice way to work around  
>> that.  For that I haven't heard a response, so I assume that  
>> functionality isn't there (or am I assuming incorrectly?).
>
> It isn't AFAIK, but my point is that it doesn't need to be (for this  
> particular use-case at least).

See above.  There are very good reasons to allow this (and the  
functionality has been requested before, particularly from the GMOD  
crowd).  If I can pass in a single flag (for instance, --defaults,  
which just uses the default arg for each prompt) then it would make  
it /much/ easier.
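
Until such a flag exists, a possible stopgap is sketched below.
Module::Build's prompt() is documented to fall back to the default
answer when PERL_MM_USE_DEFAULT is set to a true value or when it is
not running interactively. Whether every question in BioPerl's Build.PL
goes through prompt() and has a usable default is an assumption, and
CPAN's own install-on-demand prompts are not covered. The function name
and the PERL override are made up for this example.

```shell
#!/bin/sh
# Sketch: run Build.PL without a terminal, asking for default answers.
# Assumed: all Build.PL questions honour PERL_MM_USE_DEFAULT.

configure_with_defaults() {
    # Usage: configure_with_defaults CHECKOUT_DIR
    # PERL may name an alternative interpreter (hypothetical convenience).
    # stdin is detached so nothing can block waiting for input under cron.
    ( cd "$1" && PERL_MM_USE_DEFAULT=1 ${PERL:-perl} Build.PL < /dev/null )
}
```

This accepts whatever the defaults are, so a change to a default in
Build.PL would still silently change what the nightly archive contains.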

>> So, from where I stand, even if using Build.PL is the /proper/ way  
>> to do it, it doesn't work as expected using an automated process  
>> (i.e. cron).  Make sense?
>
> Only if you can't run 'svn update' instead of 'svn co' each night.

I think a single co with updates is feasible (I can do that with the  
current setup; just run the initial co, copy the directory over to a  
temp copy, then go about my business).

I'll leave the nightly build setup as is for now and work on getting  
Build.PL working (something we need anyway for Devel::Cover and  
Pod::Coverage work).

chris


From David.Messina at sbc.su.se  Fri Mar  7 18:14:38 2008
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 7 Mar 2008 19:14:38 +0100
Subject: [Bioperl-l] Reciprocal blast
In-Reply-To: <loom.20080307T160014-878@post.gmane.org>
References: <loom.20080307T160014-878@post.gmane.org>
Message-ID: <628aabb70803071014y6e08a4f9va083ebc8a33439cb@mail.gmail.com>

Hey Matt,

Your question is a little beyond the scope of this mailing list. I don't
know what your bioinformatics background is, but in my experience it's best
to get started hands-on, either in a class or with someone you can sit down
and work through it with. You'll have a million questions, and a mailing
list isn't really suitable for that.

That being said, I would run the blasts on the command-line, parse out the
best hits with BioPerl, and then use hashes to identify mutual best hits.

Briefly, you have two datasets A & B. Format each dataset into a blast
database using xdformat or formatdb. Run two blasts, one with A as query and
B as database and then one with B as query and A as database. The two output
files, each containing multiple Blast reports, can then be processed with
Bio::SearchIO to extract the best hit for each protein.

Read this tutorial for help with that:
http://www.bioperl.org/wiki/HOWTO:SearchIO

Once you get the best hit for each protein, then you can use Perl to find
every instance where two proteins, one from each set, are each other's best
hit. One way would be to create two hashes, one for each set, with query
proteins as keys and best hits as values, and then step through to find the
reciprocal bests.
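
As a rough illustration of the two-hash idea, here is a sketch that
swaps in a different toolchain from the one suggested above: instead of
parsing reports with Bio::SearchIO, it assumes both searches were
written in BLAST's 12-column tabular format (blastall -m 8), where
column 1 is the query, column 2 the subject and column 12 the bit
score, and it uses awk associative arrays in place of Perl hashes. The
function name is made up.

```shell
#!/bin/sh
# Sketch: reciprocal best hits from two tabular BLAST outputs.
# Assumed input: tab-separated, query in $1, subject in $2, bit score in $12.

reciprocal_best_hits() {
    # Usage: reciprocal_best_hits A_vs_B.tsv B_vs_A.tsv
    # Note: the FNR == NR test assumes the first file is not empty.
    awk -F'\t' '
        # First file: best B hit for every A query, by highest bit score.
        FNR == NR {
            if (!($1 in score_ab) || $12 + 0 > score_ab[$1]) {
                score_ab[$1] = $12 + 0
                best_ab[$1] = $2
            }
            next
        }
        # Second file: best A hit for every B query.
        !($1 in score_ba) || $12 + 0 > score_ba[$1] {
            score_ba[$1] = $12 + 0
            best_ba[$1] = $2
        }
        # Keep pairs where each protein is the best hit of the other.
        END {
            for (a in best_ab) {
                b = best_ab[a]
                if ((b in best_ba) && best_ba[b] == a) print a "\t" b
            }
        }
    ' "$1" "$2" | sort
}
```

Each output line is one reciprocal pair. Proteins from set A that never
appear in the first column of the output would be the "absent" ones for
that species.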


Dave


From jay at jays.net  Fri Mar  7 18:51:35 2008
From: jay at jays.net (Jay Hannah)
Date: Fri, 07 Mar 2008 12:51:35 -0600
Subject: [Bioperl-l] Reciprocal blast
In-Reply-To: <628aabb70803071014y6e08a4f9va083ebc8a33439cb@mail.gmail.com>
References: <loom.20080307T160014-878@post.gmane.org>
	<628aabb70803071014y6e08a4f9va083ebc8a33439cb@mail.gmail.com>
Message-ID: <47D18EB7.3060906@jays.net>

Dave Messina wrote:
> Your question is a little beyond the scope of this mailing list. I don't
> know what your bioinformatics background is, but in my experience it's best
> to get started hands-on, either in a class or with someone you can sit down
> and work through it with. You'll have a million questions, and a mailing
> list isn't really suitable for that.
>
> That being said, I would run the blasts on the command-line, parse out the
> best hits with BioPerl, and then use hashes to identify mutual best hits.
>   

Hi Matt,

If you're a glutton for punishment and want to see a ball of Perl that 
automates and tracks stats across my version of "reciprocal blasts" 
(mine is called cross_blast()), help yourself:

   svn checkout svn://vc.jays.net/seqlab seqlab

I abandoned my maiden voyage into bioinformatics, called "SeqLab," as a 
stand-alone entity when the subsequent thousand tasks I worked turned 
out to be unrelated to all the software I had built so far. My naive 
grand unification vision for all of bioinformatics didn't quite work out 
as I had planned. -laugh-  Nowadays I just cherry-pick solutions out of 
its guts on demand.  :)

I'm happy to field any questions you have about that code, if it helps 
you any.

Cheers,

j
http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah


From jay at jays.net  Fri Mar  7 19:43:03 2008
From: jay at jays.net (Jay Hannah)
Date: Fri, 07 Mar 2008 13:43:03 -0600
Subject: [Bioperl-l] Reciprocal blast
In-Reply-To: <47D18EB7.3060906@jays.net>
References: <loom.20080307T160014-878@post.gmane.org>	<628aabb70803071014y6e08a4f9va083ebc8a33439cb@mail.gmail.com>
	<47D18EB7.3060906@jays.net>
Message-ID: <47D19AC7.1060907@jays.net>

Jay Hannah wrote:
> I'm happy to field any questions you have about that code, if it helps 
> you any.

I created a wiki page since I stopped paying the bill on the 
"seqlab.net" domain:   :)

http://clab.ist.unomaha.edu/CLAB/index.php/SeqLab_%28Perl%29

Cheers,

j


From cain.cshl at gmail.com  Fri Mar  7 20:17:29 2008
From: cain.cshl at gmail.com (Scott Cain)
Date: Fri, 07 Mar 2008 15:17:29 -0500
Subject: [Bioperl-l] Nightly build archives now available
In-Reply-To: <BAB6978A-D50E-4C4A-9FC9-EAAD557B935F@uiuc.edu>
References: <BAB6978A-D50E-4C4A-9FC9-EAAD557B935F@uiuc.edu>
Message-ID: <1204921049.6467.9.camel@frissell>

Hi Chris,

Thanks much for this.  I have one observation though: both the http and
ftp directories are empty except for a log file :-/

Also, I saw that you mentioned the 'accept the defaults' option I asked
about in January.  I did implement that on Build.PL at the exact time
that the transition from cvs to svn was happening, so it never got
committed back.  Hopefully I still have it :-)  I'll look around and
commit it when I find it.

Scott

On Thu, 2008-03-06 at 15:48 -0600, Chris Fields wrote:
> We now have nightly bundled archives for bioperl-live, bioperl-db,  
> bioperl-run, and bioperl-network running; these will be updated ~ 1:00  
> am every night.
> 
> http://www.bioperl.org/DIST/nightly_builds/
> ftp://ftp.open-bio.org/pub/bioperl/DIST/nightly_builds
> 
> The archives are date-stamped and also have the Subversion revision,  
> just in case one wanted to ensure they get the correct version for the  
> bug fix.  They also contain a CHANGELOG file for the last 10 revisions  
> (if there are any).  These are currently derived off the anon. svn  
> repository.
> 
> chris
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory


From cjfields at uiuc.edu  Fri Mar  7 20:25:01 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 7 Mar 2008 14:25:01 -0600
Subject: [Bioperl-l] Nightly build archives now available
In-Reply-To: <1204921049.6467.9.camel@frissell>
References: <BAB6978A-D50E-4C4A-9FC9-EAAD557B935F@uiuc.edu>
	<1204921049.6467.9.camel@frissell>
Message-ID: <BCE20903-01C1-4AC3-B8F6-89AC68F6803E@uiuc.edu>

I was testing a few things earlier using 'Build dist' which tanked the  
old archives.  I reran the script manually so everything should be up  
now.

If you have the default setting implemented for Build.PL that would be  
great.  There is a lingering minor issue with Data::Dumper error  
output via perl 5.10, but beyond that it should be fine.

chris

On Mar 7, 2008, at 2:17 PM, Scott Cain wrote:

> Hi Chris,
>
> Thanks much for this.  I have one observation though: both the http  
> and
> ftp directories are empty except for a log file :-/
>
> Also, I saw that you mentioned the 'accept the defaults' option I  
> asked
> about in January.  I did implement that on Build.PL at the exact time
> that the transition from cvs to svn was happening, so I never got
> committed back.  Hopefully I still have it :-)  I'll look around and
> commit it when I find it.
>
> Scott
>
> On Thu, 2008-03-06 at 15:48 -0600, Chris Fields wrote:
>> We now have nightly bundled archives for bioperl-live, bioperl-db,
>> bioperl-run, and bioperl-network running; these will be updated ~  
>> 1:00
>> am every night.
>>
>> http://www.bioperl.org/DIST/nightly_builds/
>> ftp://ftp.open-bio.org/pub/bioperl/DIST/nightly_builds
>>
>> The archives are date-stamped and also have the Subversion revision,
>> just in case one wanted to ensure they get the correct version for  
>> the
>> bug fix.  They also contain a CHANGELOG file for the last 10  
>> revisions
>> (if there are any).  These are currently derived off the anon. svn
>> repository.
>>
>> chris
> -- 
> ------------------------------------------------------------------------
> Scott Cain, Ph. D.                                         cain at cshl.edu
> GMOD Coordinator (http://www.gmod.org/)                      
> 216-392-3087
> Cold Spring Harbor Laboratory
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From N.Haigh at sheffield.ac.uk  Fri Mar  7 21:01:43 2008
From: N.Haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Fri,  7 Mar 2008 21:01:43 +0000
Subject: [Bioperl-l] Nightly build archives now available
In-Reply-To: <BC3ACFAC-F203-4C12-BA1F-6AA762F31BF0@uiuc.edu>
References: <BAB6978A-D50E-4C4A-9FC9-EAAD557B935F@uiuc.edu>
	<628aabb70803061550s24d7d8cfhf80495ea970a6c19@mail.gmail.com>
	<5A67E3A9-9997-4A6B-AB07-8403D5FF388E@uiuc.edu>
	<47D119A1.10408@sendu.me.uk>
	<C71F44BE-CB80-4DED-9F18-8CE80836CFD6@uiuc.edu>
	<47D14CC0.8000104@sendu.me.uk>
	<FFE7B6A1-FF8C-4C84-994B-6D9AAD46748E@uiuc.edu>
	<47D15731.2050000@sendu.me.uk>
	<7AFB30F0-4810-4B62-B64A-C712EC4A3872@uiuc.edu>
	<47D162B0.5070402@sendu.me.uk>
	<AAF0B249-AD91-475A-8A30-4BB3A2CCF951@uiuc.edu>
	<47D17C99.9050009@sendu.me.uk>
	<BC3ACFAC-F203-4C12-BA1F-6AA762F31BF0@uiuc.edu>
Message-ID: <1204923703.47d1ad37a614a@webmail.shef.ac.uk>

Quoting Chris Fields <cjfields at uiuc.edu>:

-- snip --

> 
> I'll leave the nightly build setup as is for now and work on getting  
> Build.PL working (something we need anyway for Devel::Cover and  
> Pod::Coverage work).
> 

One of the test metrics of Devel::Cover is Pod::Coverage, so there's no need
to have a separate Pod::Coverage test :o)

Nath


From cain.cshl at gmail.com  Fri Mar  7 22:25:53 2008
From: cain.cshl at gmail.com (Scott Cain)
Date: Fri, 07 Mar 2008 17:25:53 -0500
Subject: [Bioperl-l] Nightly build archives now available
In-Reply-To: <BCE20903-01C1-4AC3-B8F6-89AC68F6803E@uiuc.edu>
References: <BAB6978A-D50E-4C4A-9FC9-EAAD557B935F@uiuc.edu>
	<1204921049.6467.9.camel@frissell>
	<BCE20903-01C1-4AC3-B8F6-89AC68F6803E@uiuc.edu>
Message-ID: <1204928753.6467.19.camel@frissell>

OK, I added my 'accept the defaults' option.  Use it like this:

  perl Build.PL --accept 1

Scott

On Fri, 2008-03-07 at 14:25 -0600, Chris Fields wrote:
> I was testing a few things earlier using 'Build dist' which tanked the  
> old archives.  I reran the script manually so everything should be up  
> now.
> 
> If you have the default setting implemented for Build.PL that would be  
> great.  There is a lingering minor issue with Data::Dumper error  
> output via perl 5.10, but beyond that it should be fine.
> 
> chris
> 
> On Mar 7, 2008, at 2:17 PM, Scott Cain wrote:
> 
> > Hi Chris,
> >
> > Thanks much for this.  I have one observation though: both the http  
> > and
> > ftp directories are empty except for a log file :-/
> >
> > Also, I saw that you mentioned the 'accept the defaults' option I  
> > asked
> > about in January.  I did implement that on Build.PL at the exact time
> > that the transition from cvs to svn was happening, so I never got
> > committed back.  Hopefully I still have it :-)  I'll look around and
> > commit it when I find it.
> >
> > Scott
> >
> > On Thu, 2008-03-06 at 15:48 -0600, Chris Fields wrote:
> >> We now have nightly bundled archives for bioperl-live, bioperl-db,
> >> bioperl-run, and bioperl-network running; these will be updated ~  
> >> 1:00
> >> am every night.
> >>
> >> http://www.bioperl.org/DIST/nightly_builds/
> >> ftp://ftp.open-bio.org/pub/bioperl/DIST/nightly_builds
> >>
> >> The archives are date-stamped and also have the Subversion revision,
> >> just in case one wanted to ensure they get the correct version for  
> >> the
> >> bug fix.  They also contain a CHANGELOG file for the last 10  
> >> revisions
> >> (if there are any).  These are currently derived off the anon. svn
> >> repository.
> >>
> >> chris
> > -- 
> > ------------------------------------------------------------------------
> > Scott Cain, Ph. D.                                         cain at cshl.edu
> > GMOD Coordinator (http://www.gmod.org/)                      
> > 216-392-3087
> > Cold Spring Harbor Laboratory
> >
> 
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
> 
> 
> 
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   cain.cshl at gmail.com
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory


From n.haigh at sheffield.ac.uk  Sat Mar  8 12:55:39 2008
From: n.haigh at sheffield.ac.uk (Nathan S Haigh)
Date: Sat, 08 Mar 2008 12:55:39 +0000
Subject: [Bioperl-l] Nightly build archives now available
In-Reply-To: <AAF0B249-AD91-475A-8A30-4BB3A2CCF951@uiuc.edu>
References: <BAB6978A-D50E-4C4A-9FC9-EAAD557B935F@uiuc.edu>	<628aabb70803061550s24d7d8cfhf80495ea970a6c19@mail.gmail.com>	<5A67E3A9-9997-4A6B-AB07-8403D5FF388E@uiuc.edu>	<47D119A1.10408@sendu.me.uk>	<C71F44BE-CB80-4DED-9F18-8CE80836CFD6@uiuc.edu>	<47D14CC0.8000104@sendu.me.uk>	<FFE7B6A1-FF8C-4C84-994B-6D9AAD46748E@uiuc.edu>	<47D15731.2050000@sendu.me.uk>	<7AFB30F0-4810-4B62-B64A-C712EC4A3872@uiuc.edu>	<47D162B0.5070402@sendu.me.uk>
	<AAF0B249-AD91-475A-8A30-4BB3A2CCF951@uiuc.edu>
Message-ID: <47D28CCB.50507@sheffield.ac.uk>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Chris Fields wrote:

- -- snip --

> 1)  I am running a 'svn co' on anon. svn for the various distros to a
> temp directory.  This is done using a bash script.  If I attempt to
> change into the distribution directory and run 'perl Build.PL' from the
> bash script, I immediately run into permissions issues and several odd
> things:
> 

- -- snip --

Hi Chris,

Do you need to do any svn commands after the checkout? If not, you can do "svn export" instead:
http://svnbook.red-bean.com/en/1.0/re10.html

This recursively fetches the specified URL without the .svn dirs. However, you then won't be able to run any svn commands on it, as it won't be
a working copy. In return, you save bandwidth and any post-processing otherwise needed to delete all the .svn dirs.

Nath
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFH0ozL9gTv6QYzVL4RAkvcAJ9eSosx3+YWfbBg/KT6+HZrbweGSgCguLCe
ZYtTxSi5q6iiR+sVGDQEZ68=
=uFNP
-----END PGP SIGNATURE-----


From nm249 at cornell.edu  Sat Mar  8 16:48:44 2008
From: nm249 at cornell.edu (Naama Menda)
Date: Sat, 08 Mar 2008 11:48:44 -0500
Subject: [Bioperl-l] Bio::Ontology::OntologyI
Message-ID: <A042F0A0-F007-45C2-A838-7666A85F5836@cornell.edu>

Hi Hilmar,

I have a loading script that uses Bio::Ontology::OntologyI  for  
parsing  obo files and loading terms  into chado schema.
I'm trying to find all relationship types, and it seems that the  
parser looks at the distinct relationship types used by the terms in  
the file,
but not at the ' [Typedef] ' fields (I used 'get_predicate_terms()' ).
This is important for storing the relationships in the right
context; for example, all relationship types defined by Sequence
Ontology should be stored in
the chado schema using the SO cv_id, while other relationship types,  
not defined as Typedef in the obo file, should be stored using the  
'relationship' cv_id.
Without a way to parse Typedefs, I also cannot use Bio::Ontology for  
parsing OBO_REL file (http://www.obofoundry.org/ro/ro.obo).

Is there another function in Bio::Ontology that handles Typedefs? If  
not can one be added?

Thanks!
-Naama Menda


From bix at sendu.me.uk  Sat Mar  8 23:30:40 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Sat, 08 Mar 2008 23:30:40 +0000
Subject: [Bioperl-l] Nightly build archives now available
In-Reply-To: <1204928753.6467.19.camel@frissell>
References: <BAB6978A-D50E-4C4A-9FC9-EAAD557B935F@uiuc.edu>	<1204921049.6467.9.camel@frissell>	<BCE20903-01C1-4AC3-B8F6-89AC68F6803E@uiuc.edu>
	<1204928753.6467.19.camel@frissell>
Message-ID: <47D321A0.9010209@sendu.me.uk>

Scott Cain wrote:
> OK, I added my 'accept the defaults' option.  Use it like this:
> 
>   perl Build.PL --accept 1

Thanks for that Scott, but can you revert and have another go at that 
commit, because you ended up wiping out the recent commits by Chris and 
myself.

Also, rather than individually alter the Bioperl-specific methods like 
choose_scripts(), is there perhaps a cleaner way to catch every prompt, 
perhaps by overriding prompt() itself? Other questions may get added in 
the future, and some existing questions aren't immediately obvious, so 
it would be nice to be sure an automated process like a cron job /never/ 
gets asked a question.
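
A minimal sketch of the "override prompt() itself" idea, assuming the
build class funnels every question through a single prompt($message,
$default) method. The class names My::Build::Base and
My::Build::Unattended and the 'accept' attribute are hypothetical
stand-ins, not the actual BioPerl Build.PL code:

```perl
use strict;
use warnings;

# Hypothetical stand-in for the real build class; it only models an
# interactive prompt() that reads an answer from STDIN.
package My::Build::Base;

sub new {
    my ($class, %args) = @_;
    return bless {%args}, $class;
}

sub prompt {
    my ($self, $message, $default) = @_;
    print "$message [$default] ";
    my $answer = <STDIN>;
    $answer = '' unless defined $answer;
    chomp $answer;
    return length $answer ? $answer : $default;
}

# Subclass that never asks when 'accept' is set: every question,
# present or future, is answered with its default.
package My::Build::Unattended;
our @ISA = ('My::Build::Base');

sub prompt {
    my ($self, $message, $default) = @_;
    return $default if $self->{accept};
    return $self->SUPER::prompt($message, $default);
}

package main;

my $build  = My::Build::Unattended->new(accept => 1);
my $answer = $build->prompt('Install scripts? y/n/a', 'a');
print "answer: $answer\n";
```

Because the override sits on prompt() rather than on each
question-asking method, a cron job would not block on a question
added later, provided that question also goes through prompt().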


From hlapp at gmx.net  Sun Mar  9 21:37:01 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sun, 9 Mar 2008 17:37:01 -0400
Subject: [Bioperl-l] Bio::Ontology::OntologyI
In-Reply-To: <47D2C36C.2020802@cornell.edu>
References: <47D2C36C.2020802@cornell.edu>
Message-ID: <DA1ACCA1-02CA-4662-A50F-DF714A8E05CB@gmx.net>

Naama -

it is the OntologyIO::obo parser that omits the typedefs. Parsing  
rather than skipping those could be added to the code; I also once  
started and almost completed a project to integrate the go-perl .obo
parser into the Bio::OntologyIO framework, but the final touches fell  
victim to moving jobs and the ensuing upheaval.
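
For illustration only, a minimal sketch of what collecting [Typedef]
stanzas from OBO text could look like in plain Perl. This is not the
OntologyIO::obo implementation; the parse_typedefs() helper and the
OBO fragment (including its EX: identifiers) are made up, and the
sketch ignores comments, escapes and repeated tags:

```perl
use strict;
use warnings;

# Made-up OBO fragment: one [Term] stanza and two [Typedef] stanzas.
my $obo = <<'END_OBO';
format-version: 1.2

[Term]
id: EX:0000001
name: example term
relationship: part_of EX:0000002

[Typedef]
id: part_of
name: part of
is_transitive: true

[Typedef]
id: derives_from
name: derives from
END_OBO

# Return an array ref of hashes, one per [Typedef] stanza, mapping
# each tag to its value (a repeated tag keeps only its last value).
sub parse_typedefs {
    my ($text) = @_;
    my (@typedefs, $current, $in_typedef);
    for my $line (split /\n/, $text) {
        if ($line =~ /^\[(\w+)\]\s*$/) {            # stanza header
            $in_typedef = ($1 eq 'Typedef');
            if ($in_typedef) {
                $current = {};
                push @typedefs, $current;
            }
            next;
        }
        next unless $in_typedef;                    # skip other stanzas
        if ($line =~ /^([^:\s]+):\s*(.*?)\s*$/) {   # tag: value
            $current->{$1} = $2;
        }
    }
    return \@typedefs;
}

my $typedefs = parse_typedefs($obo);
print "$_->{id}\t$_->{name}\n" for @$typedefs;
```

A loader could then treat any relationship type whose id appears in
this list as defined by the ontology itself, and fall back to a
generic 'relationship' vocabulary for the rest.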

If all you need to do is parse a .obo-formatted ontology and
traverse it in some way, go-perl might have all you need. If you
need more than that, could you elaborate?

	-hilmar

On Mar 8, 2008, at 11:48 AM, Naama Menda wrote:

> Hi Hilmar,
>
> I have a loading script that uses Bio::Ontology::OntologyI  for  
> parsing  obo files and loading terms  into chado schema.
> I'm trying to find all relationship types, and it seems that the  
> parser looks at the distinct relationship types used by the terms  
> in the file,
> but not at the ' [Typedef] ' fields (I used 'get_predicate_terms()' ).
> This is important for storing the relationships in the right  
> context , for example all relationships types defined by Sequence  
> Ontology should be stored in
> the chado schema using the SO cv_id, while other relationship  
> types, not defined as Typedef in the obo file, should be stored  
> using the 'relationship' cv_id.
> Without a way to parse Typedefs, I also cannot use Bio::Ontology  
> for parsing OBO_REL file (http://www.obofoundry.org/ro/ro.obo).
>
> Is there another function in Bio::Ontology that handles Typedefs?  
> If not can one be added?
>
> Thanks!
> -Naama Menda

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From naama.menda at gmail.com  Mon Mar 10 01:34:05 2008
From: naama.menda at gmail.com (Naama Menda)
Date: Sun, 9 Mar 2008 21:34:05 -0400
Subject: [Bioperl-l] Bio::Ontology::OntologyI
Message-ID: <48F99F4E-F17B-4000-8460-9F2CB9E0D75A@gmail.com>

My main problem is that go-perl does not handle updates, so if I want  
to update GO I need an empty schema. We find it more complicated to  
re-load our annotations than to update cvterms and their related data.
Our loading script compares an existing load of an ontology to the
obo file and updates/inserts/deletes accordingly.
We are now in the process of committing this code to GMOD, and
thought this would be a good opportunity for adding the Typedef
parsing option.

Thanks,
-Naama


On Sun, Mar 9, 2008 at 5:37 PM, Hilmar Lapp <hlapp at gmx.net> wrote:
> Naama -
>
> it is the OntologyIO::obo parser that omits the typedefs. Parsing
> rather than skipping those could be added to the code; I also once
> started and almost completed a project to integrated the go-perl .obo
> parser into the Bio::OntologyIO framework, but the final touches fell
> victim to moving jobs and the ensuing upheaval.
>
> If all you need to do is parsing a .obo-formatted ontology and
> traversing it in some way, go-perl might have all you need. If you
> need more than that, could you elaborate?
>
>         -hilmar
>
> On Mar 8, 2008, at 11:48 AM, Naama Menda wrote:
>
> > Hi Hilmar,
> >
> > I have a loading script that uses Bio::Ontology::OntologyI  for
> > parsing  obo files and loading terms  into chado schema.
> > I'm trying to find all relationship types, and it seems that the
> > parser looks at the distinct relationship types used by the terms
> > in the file,
> > but not at the ' [Typedef] ' fields (I used 'get_predicate_terms()' ).
> > This is important for storing the relationships in the right
> > context , for example all relationships types defined by Sequence
> > Ontology should be stored in
> > the chado schema using the SO cv_id, while other relationship
> > types, not defined as Typedef in the obo file, should be stored
> > using the 'relationship' cv_id.
> > Without a way to parse Typedefs, I also cannot use Bio::Ontology
> > for parsing OBO_REL file (http://www.obofoundry.org/ro/ro.obo).
> >
> > Is there another function in Bio::Ontology that handles Typedefs?
> > If not can one be added?
> >
> > Thanks!
> > -Naama Menda
>
> --
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================


From hlapp at gmx.net  Mon Mar 10 02:13:15 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sun, 9 Mar 2008 22:13:15 -0400
Subject: [Bioperl-l] Bio::Ontology::OntologyI
In-Reply-To: <e6c79c380803091834u534bfb9p9fbf41cb06485e83@mail.gmail.com>
References: <47D2C36C.2020802@cornell.edu>
	<DA1ACCA1-02CA-4662-A50F-DF714A8E05CB@gmx.net>
	<e6c79c380803091834u534bfb9p9fbf41cb06485e83@mail.gmail.com>
Message-ID: <7412B724-57B8-4851-9E71-806722DE8A76@gmx.net>


On Mar 9, 2008, at 9:34 PM, Naama Menda wrote:

> My main problem is that go-perl does not handle updates, so if I  
> want to update GO I need an empty schema. We find it more  
> complicated to re-load our annotations than to update cvterms and  
> their related data.
> Our loading script compares an existing load of an ontology to the  
> obo file and updates/insets/deletes accordingly.

load_ontology.pl in bioperl-db should have all this functionality,  
though of course that doesn't give you the typedef support (yet).

>
> We are now in the process of committing this code to GMOD

Cool - obviously load_ontology.pl doesn't work off of Chado but  
instead uses BioSQL as the schema (though the ontology model is  
*very* similar between the two).

BTW please keep the Bioperl list in the loop, others may have insight  
too or be interested in the information.

	-hilmar

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From hlapp at gmx.net  Mon Mar 10 02:43:13 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sun, 9 Mar 2008 22:43:13 -0400
Subject: [Bioperl-l] Bio::Ontology::OntologyI
In-Reply-To: <e6c79c380803091926q293b2c7aq8a6b07c56c25ac42@mail.gmail.com>
References: <47D2C36C.2020802@cornell.edu>
	<DA1ACCA1-02CA-4662-A50F-DF714A8E05CB@gmx.net>
	<e6c79c380803091834u534bfb9p9fbf41cb06485e83@mail.gmail.com>
	<7412B724-57B8-4851-9E71-806722DE8A76@gmx.net>
	<e6c79c380803091926q293b2c7aq8a6b07c56c25ac42@mail.gmail.com>
Message-ID: <741AD3F0-4EE6-4CF9-A95F-53749AEB0FDA@gmx.net>


On Mar 9, 2008, at 10:26 PM, Naama Menda wrote:
>
> On Sun, Mar 9, 2008 at 10:13 PM, Hilmar Lapp <hlapp at gmx.net> wrote:
>
> On Mar 9, 2008, at 9:34 PM, Naama Menda wrote:
>
>> My main problem is that go-perl does not handle updates, so if I  
>> want to update GO I need an empty schema. We find it more  
>> complicated to re-load our annotations than to update cvterms and  
>> their related data.
>> Our loading script compares an existing load of an ontology to the  
>> obo file and updates/insets/deletes accordingly.
>
> load_ontology.pl in bioperl-db should have all this functionality,  
> though of course that doesn't give you the typedef support (yet).
>
> Will you add this support to obo.pm? I had a look at it and it  
> seems easy to implement.
> Will there be a patch? Or in the next Bioperl release?

If you have ideas for how to implement this we'd be thrilled if you  
can provide a patch.

Most changes in BioPerl happen because of, and by, people who have an itch
to scratch. Seems like this one is right down your alley?

I'd in principle be interested in doing this too but can't give any  
promises as to when I might have time (unless I need it myself :)

>
>
>>
>> We are now in the process of committing this code to GMOD
>
> Cool - obviously load_ontology.pl doesn't work off of Chado but  
> instead uses BioSQL as the schema (though the ontology model is  
> *very* similar between the two).
>
> We  store ontologies in Chado, and that was the reason for writing  
> a new loader.

Looking at it, it seems you wrote a whole new language binding? Did
you find it too difficult to build on one of the existing ones (which
use Class::DBI if I recall correctly, though Scott will have the
details here)?

	-hilmar

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From naama.menda at gmail.com  Mon Mar 10 02:26:04 2008
From: naama.menda at gmail.com (Naama Menda)
Date: Sun, 9 Mar 2008 22:26:04 -0400
Subject: [Bioperl-l] Bio::Ontology::OntologyI
In-Reply-To: <7412B724-57B8-4851-9E71-806722DE8A76@gmx.net>
References: <47D2C36C.2020802@cornell.edu>
	<DA1ACCA1-02CA-4662-A50F-DF714A8E05CB@gmx.net>
	<e6c79c380803091834u534bfb9p9fbf41cb06485e83@mail.gmail.com>
	<7412B724-57B8-4851-9E71-806722DE8A76@gmx.net>
Message-ID: <e6c79c380803091926q293b2c7aq8a6b07c56c25ac42@mail.gmail.com>

On Sun, Mar 9, 2008 at 10:13 PM, Hilmar Lapp <hlapp at gmx.net> wrote:

>
> On Mar 9, 2008, at 9:34 PM, Naama Menda wrote:
>
> My main problem is that go-perl does not handle updates, so if I want to
> update GO I need an empty schema. We find it more complicated to re-load our
> annotations than to update cvterms and their related data.
> Our loading script compares an existing load of an ontology to the obo
> file and updates/insets/deletes accordingly.
>
>
> load_ontology.pl in bioperl-db should have all this functionality, though
> of course that doesn't give you the typedef support (yet).
>

Will you add this support to obo.pm? I had a look at it and it seems easy to
implement.
Will there be a patch? Or in the next Bioperl release?

>
>
> We are now in the process of committing this code to GMOD
>
>
> Cool - obviously load_ontology.pl doesn't work off of Chado but instead
> uses BioSQL as the schema (though the ontology model is *very* similar
> between the two).
>

We  store ontologies in Chado, and that was the reason for writing a new
loader.

>
> BTW please keep the Bioperl list in the loop, others may have insight too
> or be interested in the information.
>
> -hilmar
>
> --
>
Thanks!
-Naama

> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================
>
>
>
>


From akarger at CGR.Harvard.edu  Mon Mar 10 13:33:49 2008
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Mon, 10 Mar 2008 09:33:49 -0400
Subject: [Bioperl-l] Reciprocal blast
References: <loom.20080307T160014-878@post.gmane.org>
Message-ID: <72AF30DC2881964CB911FD08E57157E7367BD5@lsdiv-msxbe-001.nucleus.harvard.edu>

<plug>
There's a cut & paste protocol for Reciprocal best hit blast at

http://sysbio.harvard.edu/csb/resources/computational/scriptome/UNIX/Protocols/Sequences.html

Let me know if you need to tweak things.
</plug>

-Amir Karger 

> -----Original Message-----
> From: Matt [mailto:matthewehodges at gmail.com] 
> Sent: Friday, March 07, 2008 11:17 AM
> To: bioperl-l at bioperl.org
> Subject: [Bioperl-l] Reciprocal blast
> 
> Dear experts,
> 
> I want to do a best reciprocal blastp of a fasta protein 
> dataset against the 
> protein models of various species also in fasta format. The 
> aim is o have an 
> output showing presence/not presence. I think this is 
> possible to do using 
> perl, but i'm very much a beginner so any help in this would 
> be greatly 
> appreciated.
> Thanks
> Matt
> 
> 
> 


From Daniel.Gerlach at medecine.unige.ch  Mon Mar 10 16:13:39 2008
From: Daniel.Gerlach at medecine.unige.ch (Daniel Gerlach)
Date: Mon, 10 Mar 2008 17:13:39 +0100
Subject: [Bioperl-l] Bio::TreeIO - tree object to string
Message-ID: <47D55E33.8060205@medecine.unige.ch>

Dear all,

This is a very basic question. I have a tree object in $tree and want to 
save its newick representation in a variable as a string:

my $out = new Bio::TreeIO(-fh => $tree_string, -format => 'newick');
$out->write_tree($tree);
print $tree_string;

Unfortunately this does not work: it prints the newick tree to
stdout plus the message "Use of uninitialized value in print at ...". It
also prints the tree to stdout if I remove the line "print
$tree_string". The variable $tree_string seems to be empty.

D.


From naama.menda at gmail.com  Mon Mar 10 15:09:12 2008
From: naama.menda at gmail.com (Naama Menda)
Date: Mon, 10 Mar 2008 11:09:12 -0400
Subject: [Bioperl-l] Bio::Ontology::OntologyI
In-Reply-To: <741AD3F0-4EE6-4CF9-A95F-53749AEB0FDA@gmx.net>
References: <47D2C36C.2020802@cornell.edu>
	<DA1ACCA1-02CA-4662-A50F-DF714A8E05CB@gmx.net>
	<e6c79c380803091834u534bfb9p9fbf41cb06485e83@mail.gmail.com>
	<7412B724-57B8-4851-9E71-806722DE8A76@gmx.net>
	<e6c79c380803091926q293b2c7aq8a6b07c56c25ac42@mail.gmail.com>
	<741AD3F0-4EE6-4CF9-A95F-53749AEB0FDA@gmx.net>
Message-ID: <e6c79c380803100809s5ad9648eq976610910e9432f1@mail.gmail.com>

On Sun, Mar 9, 2008 at 10:43 PM, Hilmar Lapp <hlapp at gmx.net> wrote:

>
> On Mar 9, 2008, at 10:26 PM, Naama Menda wrote:
>
>
> On Sun, Mar 9, 2008 at 10:13 PM, Hilmar Lapp <hlapp at gmx.net> wrote:
>
> >
> > On Mar 9, 2008, at 9:34 PM, Naama Menda wrote:
> >
> > My main problem is that go-perl does not handle updates, so if I want to
> > update GO I need an empty schema. We find it more complicated to re-load our
> > annotations than to update cvterms and their related data.
> > Our loading script compares an existing load of an ontology to the obo
> > file and updates/insets/deletes accordingly.
> >
> >
> > load_ontology.pl in bioperl-db should have all this functionality,
> > though of course that doesn't give you the typedef support (yet).
> >
>
> Will you add this support to obo.pm? I had a look at it and it seems easy
> to implement.
> Will there be a patch? Or in the next Bioperl release?
>
>
> If you have ideas for how to implement this we'd be thrilled if you can
> provide a patch.
>
> Most changes in BioPerl happen because and by people who have an itch to
> scratch. Seems like this one is right down your alley?
>
> I'd in principle be interested in doing this too but can't give any
> promises as to when I might have time (unless I need it myself :)
>

I'll try to provide a patch for this. I'll let you know how it goes..

>
>
>
> >
> > We are now in the process of committing this code to GMOD
> >
> >
> > Cool - obviously load_ontology.pl doesn't work off of Chado but instead
> > uses BioSQL as the schema (though the ontology model is *very* similar
> > between the two).
> >
>
> We  store ontologies in Chado, and that was the reason for writing a new
> loader.
>
>
> Looking at it it seems you wrote a whole new language binding? Did you
> find it too difficult to build on one of the existing ones (which use
> Class::DBI if I recall correctly, though Scott will have the details here)
>

We already had most of the classes. We use these for other code at SGN,
since it's all OO perl. All we needed to do is to add some methods and
accessors to  our Chado classes, and  write a loading script that stores the
ontology from the db and the ontology from the file in hashrefs, compare the
2 and insert/update accordingly. Our main concern was for updating
pre-loaded ontologies (for new ontology files GMOD's make-ontologies works
great!)
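
A minimal sketch of the hashref comparison described above, assuming
each ontology is reduced to a hash of term ID => term name. The
diff_terms() helper and the sample terms are hypothetical and are not
taken from the SGN loader:

```perl
use strict;
use warnings;

# Made-up snapshots: terms already in the database vs. terms in the
# new obo file.
my %db_terms   = ('EX:0001' => 'gene', 'EX:0002' => 'exon',
                  'EX:0003' => 'old term');
my %file_terms = ('EX:0001' => 'gene', 'EX:0002' => 'exon region',
                  'EX:0004' => 'new term');

# Work out which term IDs need inserting, updating or deleting so
# that the database ends up matching the file.
sub diff_terms {
    my ($db, $file) = @_;
    my (@insert, @update, @delete);
    for my $id (sort keys %$file) {
        if (!exists $db->{$id}) {
            push @insert, $id;                 # only in the file
        }
        elsif ($db->{$id} ne $file->{$id}) {
            push @update, $id;                 # in both, but changed
        }
    }
    for my $id (sort keys %$db) {
        push @delete, $id unless exists $file->{$id};   # only in db
    }
    return { insert => \@insert, update => \@update, delete => \@delete };
}

my $plan = diff_terms(\%db_terms, \%file_terms);
for my $action (qw(insert update delete)) {
    print "$action: @{ $plan->{$action} }\n";
}
```

A real loader would compare more than the name (definitions,
synonyms, relationships), and would probably mark removed terms
obsolete rather than delete rows that annotations still reference.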


> -hilmar
>
> --
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================
>
>
>
> -Naama


From clarsen at vecna.com  Mon Mar 10 15:56:55 2008
From: clarsen at vecna.com (Christopher Larsen)
Date: Mon, 10 Mar 2008 11:56:55 -0400 (EDT)
Subject: [Bioperl-l] Reciprocal best blast hits / Orthology
Message-ID: <49819.64.47.82.110.1205164615.squirrel@mail.vecna.com>

Matt, Dave,

Regarding reciprocal best blast hits: yes, it's beyond the list, and here's
how and where to go. It seems what you are looking for is actually an
ortholog search. If so, there is more to it than reciprocity and
ranking; other groups are using phylogenetic trees, bootstrap values, etc.

Perhaps check out the Perl code from David Roos and Chris Stoeckert's
work: OrthoMCL. Their group is quite helpful as well.

http://orthomcl.cbil.upenn.edu/cgi-bin/OrthoMclWeb.cgi

The Perl code installs locally and will help you identify a homolog that
should have the same enzyme function or cellular role. Importantly, it
attends to paralogy and pseudogenes as well, so you don't step
into a pit. The site explains more.

The point is, you're on the right track, but there's a group that's been
through what you are doing and can supply you with a working
implementation that's very robust and already uses BioPerl modules, so you
don't have to scratch up some code. Also, you can check out INPARANOID for
the same reasons. Having just been through this, I'm just trying to lead
you to where we went. Right now we point OrthoMCL at a whole folder of
proteomes (*.faa) and it groups them accordingly.

Brian O: Don't know if this folds well into your MCL wiki page or not,
apologies.

If the group wants some post-processing code that shows the
presence/absence of proteins in any one group perhaps we can help too as
there are a few things written that take the raw output directly.
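
For the plain reciprocal-best-hit case, a minimal sketch of such a
presence/absence report in Perl. It assumes the top hit per query
has already been extracted from the two BLAST runs into hashes; the
protein IDs and the reciprocal_best_hits() helper are made up, and
this is not OrthoMCL's method:

```perl
use strict;
use warnings;

# Hypothetical best-hit tables: query ID => top-scoring subject ID,
# one table per search direction.
my %best_a_to_b = (protA1 => 'protB7', protA2 => 'protB3',
                   protA3 => 'protB9');
my %best_b_to_a = (protB7 => 'protA1', protB3 => 'protA5',
                   protB9 => 'protA3');

# Keep a pair only if each protein is the other's best hit.
sub reciprocal_best_hits {
    my ($forward, $reverse) = @_;
    my %rbh;
    for my $query (sort keys %$forward) {
        my $subject = $forward->{$query};
        my $back    = $reverse->{$subject};
        $rbh{$query} = $subject if defined $back && $back eq $query;
    }
    return \%rbh;
}

my $rbh = reciprocal_best_hits(\%best_a_to_b, \%best_b_to_a);

# Presence/absence report for every query protein.
for my $query (sort keys %best_a_to_b) {
    my $status = exists $rbh->{$query}
               ? "present ($rbh->{$query})"
               : 'absent';
    print "$query\t$status\n";
}
```

Here protA2 comes out as absent because its best hit, protB3, points
back to a different protein. With several target species, the same
comparison would be repeated once per species.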

Cheers,

Chris L

=========================
Message: 5

Dear experts,

I want to do a best reciprocal blastp of a fasta protein dataset against the
protein models of various species also in fasta format. The aim is o have an
output showing presence/not presence. I think this is possible to do using
perl, but i'm very much a beginner so any help in this would be greatly
appreciated.

Thanks

Matt

-- 
Christopher Larsen, Ph.D.
Senior Scientist
Research Grants Manager
Vecna Technologies
5004 Lehigh Ave
College Park, MD 20740
240-737-1625


From Kevin.M.Brown at asu.edu  Mon Mar 10 16:17:11 2008
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Mon, 10 Mar 2008 09:17:11 -0700
Subject: [Bioperl-l] Bio::TreeIO - tree object to string
In-Reply-To: <47D55E33.8060205@medecine.unige.ch>
References: <47D55E33.8060205@medecine.unige.ch>
Message-ID: <1A4207F8295607498283FE9E93B775B4048A0825@EX02.asurite.ad.asu.edu>

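You need to pass in either a filehandle or a path to an output file;
otherwise you will see the behavior you are getting.

# -fh expects an open filehandle, so open the output file for writing first
open(my $tree_fh, '>', 'TreeFile.txt') or die "Cannot open TreeFile.txt: $!";
my $out = Bio::TreeIO->new(-fh => $tree_fh, -format => 'newick');

OR

# -file takes a path; the leading '>' opens it for writing
my $out = Bio::TreeIO->new(-file => '>TreeFile.txt', -format => 'newick');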

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> Daniel Gerlach
> Sent: Monday, March 10, 2008 9:14 AM
> To: bioperl-l at portal.open-bio.org
> Subject: [Bioperl-l] Bio::TreeIO - tree object to string
> 
> Dear all,
> 
> This is a very basic question. I have a tree object in $tree 
> and want to 
> save its newick representation in a variable as a string:
> 
> my $out = new Bio::TreeIO(-fh => $tree_string, -format => 'newick');
> $out->write_tree($tree);
> print $tree_string;
> 
> Unfortunately this does not work and he prints out the newick tree on 
> stdout plus the message "Use of uninitialized value in print 
> at ...". He 
> also prints out the tree on the stdout if I remove the line "print 
> $tree_string". The variable $tree_string seems to be empty.
> 
> D.
> 


From bix at sendu.me.uk  Mon Mar 10 16:51:59 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 10 Mar 2008 16:51:59 +0000
Subject: [Bioperl-l] Bio::TreeIO - tree object to string
In-Reply-To: <47D55E33.8060205@medecine.unige.ch>
References: <47D55E33.8060205@medecine.unige.ch>
Message-ID: <47D5672F.6000709@sendu.me.uk>

Daniel Gerlach wrote:
> Dear all,
> 
> This is a very basic question. I have a tree object in $tree and want to 
> save its newick representation in a variable as a string:
> 
> my $out = new Bio::TreeIO(-fh => $tree_string, -format => 'newick');
> $out->write_tree($tree);
> print $tree_string;
> 
> Unfortunately this does not work and he prints out the newick tree on 
> stdout plus the message "Use of uninitialized value in print at ...". He 
> also prints out the tree on the stdout if I remove the line "print 
> $tree_string". The variable $tree_string seems to be empty.

The -fh argument is supposed to be a file handle, not a string. You can 
use whatever standard Perl method you like for attaching a filehandle to 
a scalar.

Eg.
my $tree_string = '';
open(my $fake_fh, "+<", \$tree_string);
my $out = new Bio::TreeIO(-fh => $fake_fh, -format => 'newick');
$out->write_tree($tree);
print $tree_string;

Alternatively, my $tree_string = $tree->simplify_to_leaves_string() 
might give you what you want.
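
A minimal core-Perl sketch of the filehandle-on-a-scalar idea above, kept free of BioPerl so it runs anywhere. The helper name capture_output and the Newick text are made up for illustration; in the real case the code ref would hand the handle to Bio::TreeIO via -fh and call write_tree().

```perl
use strict;
use warnings;

# Run a writer against a filehandle that is backed by a scalar,
# then return whatever the writer printed.
sub capture_output {
    my ($writer) = @_;    # code ref that prints to the handle it is given
    my $buffer = '';
    open(my $fh, '>', \$buffer) or die "Cannot open in-memory handle: $!";
    $writer->($fh);
    close($fh);
    return $buffer;
}

# Stand-in for $out->write_tree($tree); the tree text is invented.
my $newick = capture_output(sub {
    my ($fh) = @_;
    print {$fh} "((A:0.1,B:0.2):0.3,C:0.4);\n";
});

print $newick;
```

The point of the design is that the writer only ever sees an ordinary filehandle, so the same code can target a file, STDOUT, or a string.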


From stephan.rosecker at ish.de  Mon Mar 10 16:27:57 2008
From: stephan.rosecker at ish.de (stephan.rosecker)
Date: Mon, 10 Mar 2008 17:27:57 +0100
Subject: [Bioperl-l] how to get unigene-cluster with bio-db
Message-ID: <web-15326627@mail-be-01.mail01.ish.de>

Dear list,

I am trying to understand how to fetch a unigene cluster with the help of bio-db and a local 
biosql-db, but without success.
I have transferred "Hs.data" with the help of "bp_load_seqdatabase.pl".

#!/usr/local/bin/perl -w

use strict;
use Bio::DB::BioDB;
use Bio::DB::Query::BioQuery;

my $db = Bio::DB::BioDB->new(
         -database => 'biosql',
         -user   => 'postgres',
         -pass   => 'foo',
         -dbname   => 'bioseqdb',
         -host     => 'foo.bar',
         -port     => 5435,    # optional
         -driver   => 'Pg'
);

my $query = Bio::DB::Query::BioQuery->new();

$query->datacollections(
          ["Bio::PrimarySeqI c::subject",
          "Bio::PrimarySeqI p::object",
         "Bio::PrimarySeqI<=>Bio::ClusterI<=>Bio::Ontology::TermI"]);
$query->where(["p.accession_number = 'Hs.2'"]);

my $adp = $db->get_object_adaptor('Bio::PrimarySeqI');
my $adp2 = $db->get_object_adaptor('Bio::ClusterI');
my $qres = $adp->find_by_query($query);
my $qres2 = $adp2->find_by_query($query);

while(my $pseq = $qres->next_object()) {
         print $pseq->accession_number,"\n";
}
while(my $pseq = $qres2->next_object()) {
         print $pseq->accession_number,"\n";
}

Maybe this way is wrong.
Hope you can help me.

stephan
  

From hlapp at gmx.net  Tue Mar 11 02:53:47 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 10 Mar 2008 22:53:47 -0400
Subject: [Bioperl-l] how to get unigene-cluster with bio-db
In-Reply-To: <web-15326627@mail-be-01.mail01.ish.de>
References: <web-15326627@mail-be-01.mail01.ish.de>
Message-ID: <A95440B6-AFFA-4F45-B90F-B74D3C0770C7@gmx.net>

Stephan -

what is the result that you are getting? Do you receive an error? Or  
simply nothing?

(BTW note that the object adaptor that you let execute the query will  
determine what kind of object you get in return. Hence, I'd  
expect your $qres2 to return Bio::ClusterI compliant objects, not  
Bio::PrimarySeqI ones. That is probably not at the root of the  
problem here, though.)

	-hilmar

On Mar 10, 2008, at 12:27 PM, stephan.rosecker wrote:

> Dear list,
>
> I try to understand how to fetch unigene-cluster with help of bio- 
> db and a local biosql-db, but without success.
> I have transfered "Hs.data" with help of "bp_load_seqdatabase.pl".
>
> #!/usr/local/bin/perl -w
>
> use strict;
> use Bio::DB::BioDB;
> use Bio::DB::Query::BioQuery;
>
> my $db = Bio::DB::BioDB->new(
>         -database => 'biosql',
>         -user   => 'postgres',
>         -pass   => 'foo',
>         -dbname   => 'bioseqdb',
>         -host     => 'foo.bar',
>         -port     => 5435,    # optional
>         -driver   => 'Pg'
> );
>
> my $query = Bio::DB::Query::BioQuery->new();
>
> $query->datacollections(
>          ["Bio::PrimarySeqI c::subject",
>          "Bio::PrimarySeqI p::object",
>         "Bio::PrimarySeqI<=>Bio::ClusterI<=>Bio::Ontology::TermI"]);
> $query->where(["p.accession_number = 'Hs.2'"]);
>
> my $adp = $db->get_object_adaptor('Bio::PrimarySeqI');
> my $adp2 = $db->get_object_adaptor('Bio::ClusterI');
> my $qres = $adp->find_by_query($query);
> my $qres2 = $adp2->find_by_query($query);
>
> while(my $pseq = $qres->next_object()) {
>         print $pseq->accession_number,"\n";
> }
> while(my $pseq = $qres2->next_object()) {
>         print $pseq->accession_number,"\n";
> }
>
> Maybe this way is wrong.
> Hope you can help me.
>
> stephan
>  _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From hlapp at gmx.net  Tue Mar 11 03:17:01 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 10 Mar 2008 23:17:01 -0400
Subject: [Bioperl-l] Bio::Ontology::OntologyI
In-Reply-To: <e6c79c380803100809s5ad9648eq976610910e9432f1@mail.gmail.com>
References: <47D2C36C.2020802@cornell.edu>
	<DA1ACCA1-02CA-4662-A50F-DF714A8E05CB@gmx.net>
	<e6c79c380803091834u534bfb9p9fbf41cb06485e83@mail.gmail.com>
	<7412B724-57B8-4851-9E71-806722DE8A76@gmx.net>
	<e6c79c380803091926q293b2c7aq8a6b07c56c25ac42@mail.gmail.com>
	<741AD3F0-4EE6-4CF9-A95F-53749AEB0FDA@gmx.net>
	<e6c79c380803100809s5ad9648eq976610910e9432f1@mail.gmail.com>
Message-ID: <D355F5D0-756C-4CED-A341-496B23DD5A8B@gmx.net>


On Mar 10, 2008, at 11:09 AM, Naama Menda wrote:

>
>
> On Sun, Mar 9, 2008 at 10:43 PM, Hilmar Lapp <hlapp at gmx.net> wrote:
>
> On Mar 9, 2008, at 10:26 PM, Naama Menda wrote:
>>
>> On Sun, Mar 9, 2008 at 10:13 PM, Hilmar Lapp <hlapp at gmx.net> wrote:
>>
>> On Mar 9, 2008, at 9:34 PM, Naama Menda wrote:
>>
>>> My main problem is that go-perl does not handle updates, so if I  
>>> want to update GO I need an empty schema. We find it more  
>>> complicated to re-load our annotations than to update cvterms and  
>>> their related data.
>>> Our loading script compares an existing load of an ontology to  
>>> the obo file and updates/inserts/deletes accordingly.
>>
>> load_ontology.pl in bioperl-db should have all this functionality,  
>> though of course that doesn't give you the typedef support (yet).
>>
>> Will you add this support to obo.pm? I had a look at it and it  
>> seems easy to implement.
>> Will there be a patch? Or in the next Bioperl release?
>
> If you have ideas for how to implement this we'd be thrilled if you  
> can provide a patch.
>
> Most changes in BioPerl happen because and by people who have an  
> itch to scratch. Seems like this one is right down your alley?
>
> I'd in principle be interested in doing this too but can't give any  
> promises as to when I might have time (unless I need it myself :)
>
> I'll try to provide a patch for this. I'll let you know how it goes..

That'd be awesome! Don't hesitate to let us know if you hit bumps.

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From stephan.rosecker at ish.de  Tue Mar 11 11:08:08 2008
From: stephan.rosecker at ish.de (stephan.rosecker)
Date: Tue, 11 Mar 2008 12:08:08 +0100
Subject: [Bioperl-l] time-consuming bp_load_seqdatabase.pl
Message-ID: <web-15341068@mail-be-01.mail01.ish.de>

Dear list,

I have started the "bp_load_seqdatabase.pl" script from the "bioperl-db-1.5.2_100" 
package with the unigene "Hs.data". It runs on a 7 processor machine with 15GB RAM.
The DBMS is postgres on a similar machine.
BioSQL core schema is v1.0.0.

The job has been running since Friday.

./bp_load_seqdatabase.pl --host foo --port 5435 --dbname bioseqdb --dbuser foo --dbpass 
bar --driver Pg --format ClusterIO::unigene ../ncbi/Hs.data

Is it normal that it takes so long?
What are your experiences?

best regards
stephan


From diriano at uni-potsdam.de  Tue Mar 11 11:55:54 2008
From: diriano at uni-potsdam.de (Diego Mauricio Riaño Pachón)
Date: Tue, 11 Mar 2008 12:55:54 +0100
Subject: [Bioperl-l] problem with SearchIO and writer
Message-ID: <47D6734A.5060103@uni-potsdam.de>

Dear all,

I have a small problem parsing a BLAST report with SearchIO and using 
TextResultWriter.

I have a large file with several BLAST results, I instantiate SearchIO as:

my $searchio = new Bio::SearchIO ('-format' => 'blast', '-file' => 
"$blastFile");

##then I extract each individual report as
while (my $result = $searchio->next_result){
   my $writertxt = Bio::SearchIO::Writer::TextResultWriter->new();
   my $blastresult=$writertxt->to_string($result);
   while (my $hit = $result->next_hit){
     print $hit->name."\n";
   }
}

From diriano at uni-potsdam.de  Tue Mar 11 13:31:35 2008
From: diriano at uni-potsdam.de (diriano at uni-potsdam.de)
Date: Tue, 11 Mar 2008 14:31:35 +0100
Subject: [Bioperl-l] problem with SearchIO and writer
Message-ID: <1205242295.47d689b7186ad@webmail.uni-potsdam.de>

Dear all,

I have a small problem parsing a BLAST report with SearchIO and using
TextResultWriter.

I have a large file with several BLAST results, I instantiate SearchIO as:

my $searchio = new Bio::SearchIO ('-format' => 'blast', '-file' =>
"$blastFile");

##then I extract each individual report as
while (my $result = $searchio->next_result){
  my $writertxt = Bio::SearchIO::Writer::TextResultWriter->new();
  my $blastresult=$writertxt->to_string($result);
  while (my $hit = $result->next_hit){
    print $hit->name."\n";
  }
}
-- 

Diego Mauricio Riaño-Pachón
Biologist
Institute of Biology and Biochemistry
University of Potsdam
Karl-Liebknecht-Str. 24-25
Haus 20
14476 Golm
Germany
Tel:0331/977-2809
http://www.geocities.com/dmrp.geo/


From diriano at uni-potsdam.de  Tue Mar 11 14:25:19 2008
From: diriano at uni-potsdam.de (diriano at uni-potsdam.de)
Date: Tue, 11 Mar 2008 15:25:19 +0100
Subject: [Bioperl-l] problem with SearchIO and writer
Message-ID: <1205245519.47d6964fcaa48@webmail.uni-potsdam.de>

Dear all,

Please excuse my previous e-mail, it was incomplete, here it is again:

I have a small problem parsing a BLAST report with SearchIO and using
TextResultWriter.

I have a large file with several BLAST results, I instantiate SearchIO as:

my $searchio = new Bio::SearchIO ('-format' => 'blast', '-file' =>
"$blastFile");

##then I extract each individual report as
while (my $result = $searchio->next_result){
  my $writertxt = Bio::SearchIO::Writer::TextResultWriter->new();
  my $blastresult=$writertxt->to_string($result);
  while (my $hit = $result->next_hit){
    print $hit->name."\n"; #I will do further processing of the HSPs
  }
}

But I do not get any output.  It works if I comment out the lines referencing
$writertxt.  The problem is that I need to extract the whole report so that I
can later insert it ($blastresult) into a database, but I also need to process
each hit and HSP.

Any idea how I can accomplish this?  Any help will be greatly appreciated.
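
One possible explanation, offered as an assumption rather than a confirmed diagnosis: if to_string() walks the result's hits to build the report, the hit iterator is already at its end when the while loop starts. The sketch below shows that effect with a toy class (ToyResult and the free-standing to_string are hypothetical, not BioPerl code). Bio::Search::Result::GenericResult documents a rewind() method, so calling $result->rewind after to_string() may be enough, though that is worth checking against the installed version.

```perl
use strict;
use warnings;

# Toy stand-in for a search result: next_hit() walks an internal index
# and rewind() resets it.
package ToyResult;

sub new {
    my ($class, @hits) = @_;
    return bless { hits => [@hits], index => 0 }, $class;
}

sub next_hit {
    my ($self) = @_;
    return if $self->{index} >= @{ $self->{hits} };
    return $self->{hits}[ $self->{index}++ ];
}

sub rewind { $_[0]{index} = 0 }

package main;

# Plays the role of the writer: it reads every hit to build its text.
sub to_string {
    my ($result) = @_;
    my @names;
    while (defined(my $hit = $result->next_hit)) { push @names, $hit }
    return join(',', @names);
}

my $result = ToyResult->new(qw(hitA hitB));
my $report = to_string($result);    # consumes the iterator

# Iterating right after the writer finds nothing left.
my @before;
while (defined(my $hit = $result->next_hit)) { push @before, $hit }

# Resetting the iterator makes the hits available again.
$result->rewind;
my @after;
while (defined(my $hit = $result->next_hit)) { push @after, $hit }

print "report=$report before=", scalar(@before), " after=", scalar(@after), "\n";
```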

Have a nice day,

Diego
-- 

Diego Mauricio Riano Pachon
Biologist
Institute of Biology and Biochemistry
University of Potsdam
Karl-Liebknecht-Str. 24-25
Haus 20
14476 Golm
Germany
Tel:0331/977-2809
http://www.geocities.com/dmrp.geo/


From sac at bioperl.org  Tue Mar 11 19:04:48 2008
From: sac at bioperl.org (Steve Chervitz)
Date: Tue, 11 Mar 2008 12:04:48 -0700
Subject: [Bioperl-l] BioSQL V1.0.0 released
In-Reply-To: <7558F8C6-FE40-4BAE-BA6A-D5039B10F350@uiuc.edu>
References: <200803071309.25294.heikki@sanbi.ac.za>
	<7558F8C6-FE40-4BAE-BA6A-D5039B10F350@uiuc.edu>
Message-ID: <8f200b4c0803111204m2ca45782w34baa0499d690cb5@mail.gmail.com>

Ditto. Thanks for biting the bullet, Hilmar.

BTW, I put together a little compendium/review of various bioinformatics
data models a few months ago, where I mention BioSQL among others, but I
never really announced it:

http://biodatamodel.org/

I thought about wikifying it to get the community involved in maintaining
it, but haven't gotten around to it yet.

Feedback is welcome.

Cheers,
Steve

On Fri, Mar 7, 2008 at 6:22 AM, Chris Fields <cjfields at uiuc.edu> wrote:

> Same here.  Great news!
>
> chris
>
> On Mar 7, 2008, at 5:09 AM, Heikki Lehvaslaiho wrote:
>
> > BIOSQL V1.0.0 RELEASED
> > http://news.open-bio.org/archives/2008_03.html#000094
> >
> >
> > Congratulations, Hilmar!
> >
> >               -Heikki
> >
> > --
> > ______ _/      _/_____________________________________________________
> >      _/      _/
> >     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
> >    _/_/_/_/_/  Senior Scientist    skype: heikki_lehvaslaiho
> >   _/  _/  _/  SANBI, South African National Bioinformatics Institute
> >  _/  _/  _/  University of Western Cape, South Africa
> >     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
> > ___ _/_/_/_/_/________________________________________________________
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


From cain.cshl at gmail.com  Tue Mar 11 19:29:41 2008
From: cain.cshl at gmail.com (Scott Cain)
Date: Tue, 11 Mar 2008 15:29:41 -0400
Subject: [Bioperl-l] Nightly build archives now available
In-Reply-To: <47D321A0.9010209@sendu.me.uk>
References: <BAB6978A-D50E-4C4A-9FC9-EAAD557B935F@uiuc.edu>
	<1204921049.6467.9.camel@frissell>
	<BCE20903-01C1-4AC3-B8F6-89AC68F6803E@uiuc.edu>
	<1204928753.6467.19.camel@frissell>  <47D321A0.9010209@sendu.me.uk>
Message-ID: <1205263781.6220.37.camel@frissell>

Hi Sendu,

Sorry about that; I diffed the version I had with what was in svn, but
apparently didn't look closely at the results.  Do you happen to know
the best way of reverting with svn?

After it gets reverted one way or the other, I agree that overriding
prompt to include another argument is a good way to go.  That way the
value of $accept can be passed to it and it just does the right thing,
regardless of when/where it is getting called.  I'll do that.

Scott

On Sat, 2008-03-08 at 23:30 +0000, Sendu Bala wrote:
> Scott Cain wrote:
> > OK, I added my 'accept the defaults' option.  Use it like this:
> > 
> >   perl Build.PL --accept 1
> 
> Thanks for that Scott, but can you revert and have another go at that 
> commit, because you ended up wiping out the recent commits by Chris and 
> myself.
> 
> Also, rather than individually alter the Bioperl-specific methods like 
> choose_scripts(), is there perhaps a cleaner way to catch every prompt, 
> perhaps by overriding prompt() itself? Other questions may get added in 
> the future, and some existing questions aren't immediately obvious, so 
> it would be nice to be sure an automated process like a cron job /never/ 
> gets asked a question.
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   cain.cshl at gmail.com
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory


From hlapp at gmx.net  Tue Mar 11 21:34:30 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 11 Mar 2008 17:34:30 -0400
Subject: [Bioperl-l] time-consuming bp_load_seqdatabase.pl
In-Reply-To: <web-15341068@mail-be-01.mail01.ish.de>
References: <web-15341068@mail-be-01.mail01.ish.de>
Message-ID: <B4761ACF-F3D2-42A4-88D9-6C9790C4BAFB@gmx.net>

It won't be fast, as it will create about 6 million bioentries in your  
database. However, running since Friday sounds like it is on the high end.

The first step I recommend doing when running into this kind of  
situation is checking the CPU load that the script generates,  
compared to the load generated by the database server. If the  
script's CPU load is significantly less than ~10% then it is likely  
that your database is too slow.

There are various possible reasons why it may be too slow, ranging  
from limited resources, to grossly suboptimal configuration. If your  
database is running on the same 15GB server then resources should not  
be an issue (assuming that you don't have a totally antiquated CPU  
there). You might still want to check the PostgreSQL config file,  
though. What I would suspect though is that you didn't VACUUM the  
database before and/or during the load. That will make the indexes  
used for lookup increasingly slow as a large amount of data accumulates.
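
As a rough sketch of the VACUUM step, the snippet below only builds the statements; the table list is an assumption (a handful of BioSQL tables that a load touches) and should be adjusted to the schema actually in use. Running them is left as a comment because it needs DBI, DBD::Pg and a live database.

```perl
use strict;
use warnings;

# Build one VACUUM ANALYZE statement per table name.
sub vacuum_statements {
    my (@tables) = @_;
    return map { "VACUUM ANALYZE $_" } @tables;
}

# Assumed subset of BioSQL tables; not an exhaustive list.
my @statements = vacuum_statements(qw(bioentry dbxref term bioentry_dbxref));
print "$_;\n" for @statements;

# With DBI and DBD::Pg installed, each statement could be run as
#   $dbh->do($_) for @statements;
# on a handle opened with AutoCommit => 1, because PostgreSQL refuses
# to run VACUUM inside a transaction block.
```

The same statements can simply be pasted into psql while the load is paused.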

Does this ring a bell?

	-hilmar

On Mar 11, 2008, at 7:08 AM, stephan.rosecker wrote:

> Dear list,
>
> I have started the "bp_load_seqdatabase.pl" script from the
> "bioperl-db-1.5.2_100" package with the unigene
> "Hs.data". It runs on a 7 processor machine with 15GB RAM. The DBMS
> is postgres on a similar machine.
> BioSQL core schema is v1.0.0.
>
> The job has been running since Friday.
>
> ./bp_load_seqdatabase.pl --host foo --port 5435 --dbname bioseqdb -- 
> dbuser foo --dbpass bar --driver Pg --format ClusterIO::unigene ../ 
> ncbi/Hs.data
>
> Is it normal that it takes so long?
> What are your experiences?
>
> best regards
> stephan
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From vuhlhorn at ramapo.edu  Wed Mar 12 18:51:50 2008
From: vuhlhorn at ramapo.edu (Victoria Lyn Uhlhorn)
Date: Wed, 12 Mar 2008 14:51:50 -0400 (EDT)
Subject: [Bioperl-l] HOWTO:Trees module
Message-ID: <20080312145150.ASS93757@msg-1.mail.ramapo.edu>

The following script:

#!/usr/bin/perl -w

use CGI ':standard';
use Bio::Perl;
use Bio::Align::ProteinStatistics;
use Bio::Tree::DistanceFactory;
use Bio::TreeIO;

print header;

print start_html(-bgcolor=>"pink", -title=>('Phylogenetic Tree'),
                 -style=>{-src=>$style}, -class=>Ltitle), p(), 'Tree';

print start_form, hr;

my $alnio= Bio::AlignIO->new(-file => '/Users/glitterchix4u/Sites/CGI-bin/HepatitisSerineProt.clustalw',
                             -format => 'clustalw');
my $profactory = Bio::Tree::DistanceFactory->new(-method => 'NJ');
my $stats = Bio::Align::ProteinStatistics->new;
my $treeout = Bio::TreeIO->new(-format => 'newick');

my $tree;
while(my $aln = $alnio->next_aln)
{
	my $mat = $stats->distance(-method => 'Kimura',
	                           -align => $aln);
	$tree = $profactory->make_tree($mat);
	$treeout->write_tree($tree);
}

#$treeout->print_tree($tree);
print "Tree is: ", $tree->size;

print end_form;
print end_html;

How do I print the tree?  I'm having a hard time printing the tree out.


From bix at sendu.me.uk  Wed Mar 12 23:20:01 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 12 Mar 2008 23:20:01 +0000
Subject: [Bioperl-l] HOWTO:Trees module
In-Reply-To: <20080312145150.ASS93757@msg-1.mail.ramapo.edu>
References: <20080312145150.ASS93757@msg-1.mail.ramapo.edu>
Message-ID: <47D86521.1010705@sendu.me.uk>

Victoria Lyn Uhlhorn wrote:
> my $treeout = Bio::TreeIO->new(-format => 'newick');

> 	$treeout->write_tree($tree);

> How do I print the tree?  I'm having a hard time printing the tree out.

Your TreeIO will write its trees to the file or filehandle you give it. 
But you haven't given it one. Give it one and write_tree() will then 
cause the tree to be 'printed' there.

If you want the trees stored in a string so you can print() them, there 
are ways to open a filehandle onto a scalar variable.


From hlapp at gmx.net  Thu Mar 13 22:51:13 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 13 Mar 2008 18:51:13 -0400
Subject: [Bioperl-l] [BioSQL-l] Loading sequences with novel NCBI taxon
	id
In-Reply-To: <320fb6e00803130806w46148bacm54c3ead9a50b038f@mail.gmail.com>
References: <320fb6e00803130806w46148bacm54c3ead9a50b038f@mail.gmail.com>
Message-ID: <32EB5B0C-4CC8-4C33-9F41-5D4465B6AC48@gmx.net>

(this is more of a bioperl question than a biosql one)

The load_ncbi_taxonomy.pl script is designed to update the taxon  
tables in a non-disruptive way, and if there weren't many changes  
shouldn't actually take that long (except that recalculating the  
nested set values may take a couple of minutes).

Bioperl-db will store the taxon information it finds in the  
Bio::Species object if it can't locate the taxon by lookup, and will  
not raise an error. The problem with this is that it relies on the  
Bio::SeqIO parser to have gotten the species and lineage information  
correct, which is sometimes a wrong assumption for exotic species.  
Most often the error will not manifest itself at the time of storing  
the erroneously parsed information, but when it is re-retrieved and  
used to populate a Bio::Species object.

For the SymAtlas project we had this situation (new species in  
sequence updates that the last NCBI taxonomy update hadn't yet  
brought in) quite regularly. I wrote a SQL script that would fix those  
'haphazard' additions such that load_ncbi_taxonomy would update them  
to their correct values come the next NCBI taxonomy update. I can  
send you the script (it would be for the Oracle version), but I'm not  
sure this is a widely viable strategy.

	-hilmar

On Mar 13, 2008, at 11:06 AM, Peter wrote:

> Dear list,
>
> One of the unresolved issues with Biopython's BioSQL interface is
> dealing with the NCBI taxon ID when loading sequences into the
> database.
>
> As I understand it, ideally before loading any sequences, the user
> will have loaded in the entire NCBI taxonomy using the
> load_ncbi_taxonomy.pl script, as I described here:
> http://biopython.org/wiki/BioSQL#NCBI_Taxonomy
>
> When a new sequence is added to the database with a known taxon id,
> there is no problem.  But what happens if it's a recently sequenced organism
> which isn't defined yet in the BioSQL taxonomy tables?  Could/should
> the user re-run load_ncbi_taxonomy.pl, and then load in their new
> sequence?
>
> Right now in Biopython, due to what appears to have been intended as a
> short term hack, we simply don't record the taxon id at all (!), and I
> would like to fix this (bug 2422).
> http://bugzilla.open-bio.org/show_bug.cgi?id=2422
>
> How do BioPerl et al deal with this issue?  Do they try and update the
> taxonomy tables using the available information in the new record's
> annotation (i.e. the new taxon id and the species name)?  Do they
> lookup the NCBI taxonomy definition via the internet?  Do they throw
> an error and halt?
>
> Thanks,
>
> Peter
> (Biopython)
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biosql-l

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From hlapp at gmx.net  Thu Mar 13 23:41:43 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 13 Mar 2008 19:41:43 -0400
Subject: [Bioperl-l] [BioSQL-l] Loading sequences with novel NCBI taxon
	id
In-Reply-To: <320fb6e00803131613o20eae2b7y325814ef26d2738f@mail.gmail.com>
References: <320fb6e00803130806w46148bacm54c3ead9a50b038f@mail.gmail.com>
	<32EB5B0C-4CC8-4C33-9F41-5D4465B6AC48@gmx.net>
	<320fb6e00803131613o20eae2b7y325814ef26d2738f@mail.gmail.com>
Message-ID: <CEA4F4E7-A66B-4C62-AE32-511E177BC485@gmx.net>


On Mar 13, 2008, at 7:13 PM, Peter wrote:

> On Thu, Mar 13, 2008 at 10:51 PM, Hilmar Lapp <hlapp at gmx.net> wrote:
>> [...]
>>  The load_ncbi_taxonomy.pl script is designed to update the taxon
>>  tables in a non-disruptive way, and if there weren't many changes
>>  shouldn't actually take that long (except that recalculating the
>>  nested set values may take a couple of minutes).
>
> Do you think when faced with a novel taxon id, Biopython/BioPerl/...
> could write some minimal taxonomy entry (without any guess work based
> on the species name), in order to record the sequence's taxon

This is what Bioperl-db does. There isn't any guesswork. If  
Bio::Species has lineage information it will also insert the lineage  
information, though.

> - and then running an improved load_ncbi_taxonomy.pl at a later  
> date would
> sort out the proper taxonomy?

If I remember correctly, the script makes (and hence expects) the  
primary key and the NCBI taxonomy ID to be identical. If your loading  
procedure can achieve that already then load_ncbi_taxonomy.pl should  
pick them up and fix them. You can try that by loading the taxonomy  
through the script, then arbitrarily choose a taxon, create a stub  
bioentry for it and set its taxon_id foreign key to the chosen  
taxon,  change its taxon_name.name to some bogus value (for the  
'scientific name' class, for example) (and feel free to change the  
left_id and right_id values in taxon too), and rerun the script. It  
should fix the change you made, and your bioentry should still point  
to the same taxon (because its primary key did not change, and did  
not get deleted either; otherwise the bioentry would now have a null  
value in the foreign key).

The Bioperl-db way of storing things does not give control over  
primary key assignment to Bioperl-db, so the database will assign it.

> [...]
>>  For the SymAtlas project we had this situation (new species in
>>  sequence updates that the last NCBI taxonomy update hadn't yet
>>  brought in) quite regularly. I wrote a SQL script that would fix those
>>  'haphazard' additions such that load_ncbi_taxonomy would update them
>>  to their correct values come the next NCBI taxonomy update. I can
>>  send you the script (it would be for the Oracle version), but I'm  
>> not
>>  sure this is a widely viable strategy.
>
> So this wasn't integrated with load_ncbi_taxonomy.pl at all?

No, but now that you say it I don't see any reason why I couldn't.  
Maybe that's just what I should do.

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From biopython at maubp.freeserve.co.uk  Thu Mar 13 23:13:32 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 13 Mar 2008 23:13:32 +0000
Subject: [Bioperl-l] [BioSQL-l] Loading sequences with novel NCBI taxon
	id
In-Reply-To: <32EB5B0C-4CC8-4C33-9F41-5D4465B6AC48@gmx.net>
References: <320fb6e00803130806w46148bacm54c3ead9a50b038f@mail.gmail.com>
	<32EB5B0C-4CC8-4C33-9F41-5D4465B6AC48@gmx.net>
Message-ID: <320fb6e00803131613o20eae2b7y325814ef26d2738f@mail.gmail.com>

On Thu, Mar 13, 2008 at 10:51 PM, Hilmar Lapp <hlapp at gmx.net> wrote:
> (this is more of a bioperl question than a biosql one)

Well, yes and no.  And I'm not subscribed to the Bioperl list, nor the
BioJava one, nor the BioRuby one.

>  The load_ncbi_taxonomy.pl script is designed to update the taxon
>  tables in a non-disruptive way, and if there weren't many changes
>  shouldn't actually take that long (except that recalculating the
>  nested set values may take a couple of minutes).

Do you think when faced with a novel taxon id, Biopython/BioPerl/...
could write some minimal taxonomy entry (without any guess work based
on the species name), in order to record the sequence's taxon - and
then running an improved load_ncbi_taxonomy.pl at a later date would
sort out the proper taxonomy?

>  Bioperl-db will store the taxon information it finds in the
>  Bio::Species object if it can't locate the taxon by lookup, and will
>  not raise an error. The problem with this is that it relies on the
>  Bio::SeqIO parser to have gotten the species and lineage information
>  correct, which is sometimes a wrong assumption for exotic species.
>  Most often the error will not manifest itself at the time of storing
>  the erroneously parsed information, but when it is re-retrieved and
>  used to populate a Bio::Species object.

This is what I would like to avoid with Biopython.

>  For the SymAtlas project we had this situation (new species in
>  sequence updates that the last NCBI taxonomy update hadn't yet
>  brought in) quite regularly. I wrote a SQL script that would fix those
>  'haphazard' additions such that load_ncbi_taxonomy would update them
>  to their correct values come the next NCBI taxonomy update. I can
>  send you the script (it would be for the Oracle version), but I'm not
>  sure this is a widely viable strategy.

So this wasn't integrated with load_ncbi_taxonomy.pl at all?

Peter


From hlapp at gmx.net  Fri Mar 14 04:00:40 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Fri, 14 Mar 2008 00:00:40 -0400
Subject: [Bioperl-l] bioperl basics
In-Reply-To: <20080313.195825.6855.0@webmail20.vgs.untd.com>
References: <20080313.195825.6855.0@webmail20.vgs.untd.com>
Message-ID: <0A104B1F-315F-418C-A6DA-84FF04CC438C@gmx.net>

John - you want to send this to the BioPerl mailing list at
bioperl-l at lists.open-bio.org.

Your problem really is a Perl problem though, not a BioPerl one. The most  
likely cause is that you don't have the Cache::FileCache module  
installed, so installing it is what I would do first.

The answer to your question about how to change @INC is to use -I on the  
command line, 'use lib' in your script, or the PERL5LIB  
environment variable.
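
A short sketch of the 'use lib' route, with the other two options shown as comments. The directory /home/john/perl5/lib is a hypothetical placeholder for wherever the modules were actually installed.

```perl
use strict;
use warnings;

# Hypothetical directory holding locally installed modules.
# 'use lib' adds it to the front of @INC at compile time.
use lib '/home/john/perl5/lib';

my $dir   = '/home/john/perl5/lib';
my $found = grep { $_ eq $dir } @INC;
print "$dir is ", ($found ? "" : "not "), "in \@INC\n";

# Equivalent alternatives from the shell:
#   perl -I/home/john/perl5/lib script.pl
#   export PERL5LIB=/home/john/perl5/lib
```

PERL5LIB is the most convenient of the three when several scripts share the same private module directory.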

	-hilmar

On Mar 14, 2008, at 1:58 AM, mrphysh at juno.com wrote:

> I am a molecular biologist studying bioinformatics from a Perl  
> background and making progress.  I am realizing that without  
> tapping into the existing infrastructure, I will be writing code  
> for ever.  Bioperl is the path for me.  I am moving forward.
>
> the error I encounter is
>
> can't locate Cache/FileCache in @INC (@INC contains /etc/perl/
> /usr/local/lib/perl/5.8.8 .....)    and so forth.
>
> I found the files in a home directory.  I must have told the  
> install to put them there...?
>
>
> anyway:  How do I edit this environmental variable..... @INC.  I  
> cannot find anything in my book.
>
> thanks
> john brigham
>
>
> I will be writing code for years and need to tap into the
>
>
>
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biosql-l

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From markjschreiber at gmail.com  Fri Mar 14 13:48:38 2008
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Fri, 14 Mar 2008 21:48:38 +0800
Subject: [Bioperl-l] [BioSQL-l] Loading sequences with novel NCBI taxon
	id
In-Reply-To: <CEA4F4E7-A66B-4C62-AE32-511E177BC485@gmx.net>
References: <320fb6e00803130806w46148bacm54c3ead9a50b038f@mail.gmail.com>
	<32EB5B0C-4CC8-4C33-9F41-5D4465B6AC48@gmx.net>
	<320fb6e00803131613o20eae2b7y325814ef26d2738f@mail.gmail.com>
	<CEA4F4E7-A66B-4C62-AE32-511E177BC485@gmx.net>
Message-ID: <93b45ca50803140648s5098a7d0sec621f448ef03040@mail.gmail.com>

From memory, BioJava will add it if it is not already in there. If the
taxid can be found then the system connects you with whatever is in
that taxid; it doesn't overwrite it.

This has two curious side effects. Because the details associated with
a taxid sometimes change (e.g. the common name changes a lot), you can get
connected to an outdated version (if your record is newer than your
NCBI taxonomy), or you can get connected with a version that is newer
than your record, which means that when you round-trip you don't get
complete identity.

For compatibility across the projects some kind of consensus would be good.

- Mark

On Fri, Mar 14, 2008 at 7:41 AM, Hilmar Lapp <hlapp at gmx.net> wrote:
>
>
> On Mar 13, 2008, at 7:13 PM, Peter wrote:
>
> > On Thu, Mar 13, 2008 at 10:51 PM, Hilmar Lapp <hlapp at gmx.net> wrote:
> >> [...]
>
> >>  The load_ncbi_taxonomy.pl script is designed to update the taxon
> >>  tables in a non-disruptive way, and if there weren't many changes
> >>  shouldn't actually take that long (except that recalculating the
> >>  nested set values may take a couple of minutes).
> >
> > Do you think when faced with a novel taxon id, Biopython/BioPerl/...
> > could write some minimal taxonomy entry (without any guess work based
> > on the species name), in order to record the sequence's taxon
>
> This is what Bioperl-db does. There isn't any guesswork. If
> Bio::Species has lineage information it will also insert the lineage
> information, though.
>
>
> > - and then running an improved load_ncbi_taxonomy.pl at a later
> > date would
> > sort out the proper taxonomy?
>
> If I remember correctly, the script makes (and hence expects) the
> primary key and the NCBI taxonomy ID to be identical. If your loading
> procedure can achieve that already then load_ncbi_taxonomy.pl should
> pick them up and fix them. You can try that by loading the taxonomy
> through the script, then arbitrarily choose a taxon, create a stub
> bioentry for it and set its taxon_id foreign key to the chosen
> taxon,  change its taxon_name.name to some bogus value (for the
> 'scientific name' class, for example) (and feel free to change the
> left_id and right_id values in taxon too), and rerun the script. It
> should fix the change you made, and your bioentry should still point
> to the same taxon (because its primary key did not change, and did
> not get deleted either; otherwise the bioentry would now have a null
> value in the foreign key).
>
> The Bioperl-db way of storing things does not give control over
> primary key assignment to Bioperl-db, so the database will assign it.
>
> > [...]
>
> >>  For the SymAtlas project we had this situation (new species in
> >>  sequence updates that the last NCBI taxonomy update hadn't yet
> >>  brought in) quite regularly. I wrote a SQL script that would fix those
> >>  'haphazard' additions such that load_ncbi_taxonomy would update them
> >>  to their correct values come the next NCBI taxonomy update. I can
> >>  send you the script (it would be for the Oracle version), but I'm
> >> not
> >>  sure this is a widely viable strategy.
> >
> > So this wasn't integrated with load_ncbi_taxonomy.pl at all?
>
> No, but now that you say it I don't see any reason why I couldn't.
> Maybe that's just what I should do.
>
>        -hilmar
>
> --
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================


From cjfields at uiuc.edu  Fri Mar 14 14:31:09 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 14 Mar 2008 09:31:09 -0500
Subject: [Bioperl-l] [BioSQL-l] Loading sequences with novel NCBI taxon
	id
In-Reply-To: <93b45ca50803140648s5098a7d0sec621f448ef03040@mail.gmail.com>
References: <320fb6e00803130806w46148bacm54c3ead9a50b038f@mail.gmail.com>
	<32EB5B0C-4CC8-4C33-9F41-5D4465B6AC48@gmx.net>
	<320fb6e00803131613o20eae2b7y325814ef26d2738f@mail.gmail.com>
	<CEA4F4E7-A66B-4C62-AE32-511E177BC485@gmx.net>
	<93b45ca50803140648s5098a7d0sec621f448ef03040@mail.gmail.com>
Message-ID: <CE3675B2-2AFD-46AA-A348-16C9FEA51E0E@uiuc.edu>

The counter to that perspective (using new sequences with old tax
info) would be to regularly update the NCBI taxonomy, particularly
just before adding new sequences.  Hilmar mentioned that
once tax is loaded it doesn't take as long to update, so you could set
up a cron job to update regularly.

I remember someone mentioning weekly or monthly updates on the list
quite a while ago, but I'm unsure how often NCBI updates tax
information (i.e. with every release, monthly, weekly, etc).  I can
see instances popping up where you used an up-to-date taxonomy but
a new sequence contains a tax ID not present in it.  I think bioperl-db
handles these but I'm not sure what the other Bio* projects do.

chris


Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From markjschreiber at gmail.com  Sat Mar 15 00:56:37 2008
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Sat, 15 Mar 2008 08:56:37 +0800
Subject: [Bioperl-l] [BioSQL-l] Loading sequences with novel NCBI taxon
	id
In-Reply-To: <CE3675B2-2AFD-46AA-A348-16C9FEA51E0E@uiuc.edu>
References: <320fb6e00803130806w46148bacm54c3ead9a50b038f@mail.gmail.com>
	<32EB5B0C-4CC8-4C33-9F41-5D4465B6AC48@gmx.net>
	<320fb6e00803131613o20eae2b7y325814ef26d2738f@mail.gmail.com>
	<CEA4F4E7-A66B-4C62-AE32-511E177BC485@gmx.net>
	<93b45ca50803140648s5098a7d0sec621f448ef03040@mail.gmail.com>
	<CE3675B2-2AFD-46AA-A348-16C9FEA51E0E@uiuc.edu>
Message-ID: <93b45ca50803141756m3d7f022cnb57bd39f37270682@mail.gmail.com>

I agree. A regular update would be best.

Of course if your BioSQL db is limited to one or a few organisms you can
just keep a fragment of the db.

- Mark



From charles-listes+bioperl at plessy.org  Mon Mar 17 04:13:11 2008
From: charles-listes+bioperl at plessy.org (Charles Plessy)
Date: Mon, 17 Mar 2008 13:13:11 +0900
Subject: [Bioperl-l] Are all recommended modules equally important ?
Message-ID: <20080317041311.GA3784@kunpuu.plessy.org>

Dear Bioperl developers,

In the Debian Project, we distribute packages for Bioperl and need to
express their dependency on other Perl modules with "Depends",
"Recommends" and "Suggests" levels. For the moment, everything that is
listed in the "recommends" hash of Build.PL is "Recommended" by our
Debian package. This means that they will be installed by default when
installing Bioperl, but that users can force their removal if needed.

Being "Recommended" also means in Debian that if the recommended module
is not available, then the Debian bioperl package will not meet our
internal quality criteria for being part of our stable release.
Therefore I would like to know if you think that some of the modules
recommended by Bioperl through the "recommends" hash of Build.PL are less
important than others, i.e. that we can just "Suggest" them in our
dependency system. "Suggested" packages are not installed by default.

The complete definition of the meaning of "Depends", "Recommends" and
"Suggests" for Debian packages can be found in the section 7.2 of the
Debian policy:

http://www.debian.org/doc/debian-policy/ch-relationships.html

Debian distributes versions 1.4 and 1.5.2 of Bioperl, and is considering
using 1.5.2 in its next stable release. We welcome your comments on this
as well.

The Debian package for Bioperl 1.4: http://packages.debian.org/lenny/bioperl
and for Bioperl 1.5.2: http://packages.debian.org/sid/bioperl

(A copy of this email has been sent to the mailing list of the
Debian-Med project).

Have a nice day,

-- 
Charles Plessy
http://charles.plessy.org
Wakō, Saitama, Japan


From David.Messina at sbc.su.se  Mon Mar 17 15:38:28 2008
From: David.Messina at sbc.su.se (Dave Messina)
Date: Mon, 17 Mar 2008 16:38:28 +0100
Subject: [Bioperl-l] Are all recommended modules equally important ?
In-Reply-To: <20080317041311.GA3784@kunpuu.plessy.org>
References: <20080317041311.GA3784@kunpuu.plessy.org>
Message-ID: <628aabb70803170838i53a49aacuf507880d518326d@mail.gmail.com>

Hi Charles,

Thanks for your note.

All of the BioPerl 'recommended' modules involve optional functionality, so
I would think all of them would map to 'suggested' under Debian so they
won't be installed by default.

For everyone else, this is the list of recommended modules he's talking
about:
Ace
Class::AutoClass
Clone
Convert::Binary::C
Data::Stag::XMLWriter
GD
GD::SVG
Graph
HTML::Entities
HTML::Parser
HTTP::Request::Common
LWP::UserAgent
PostScript::TextBlock
Set::Scalar
SOAP::Lite
Spreadsheet::ParseExcel
Storable
SVG
SVG::Graph
Text::Shellwords
URI::Escape
XML::DOM::XPath
XML::Parser
XML::Parser::PerlSAX
XML::SAX
XML::SAX::Writer
XML::Twig
XML::Writer


> Debian distributes versions 1.4 and 1.5.2 of Bioperl, but considers
> using 1.5.2 in its next stable release. We welcome your comments on this
> as well.
>

I think the consensus here would be that 1.5.2 is the appropriate version of
Bioperl to use in the next stable release of Debian. Although we've started
to work toward Bioperl 1.6, that release will be at least a few months off,
and 1.4, while technically our most recent 'stable' release, is waaay out of
date.


Dave


From mrphysh at juno.com  Mon Mar 17 22:27:21 2008
From: mrphysh at juno.com (mrphysh at juno.com)
Date: Mon, 17 Mar 2008 22:27:21 GMT
Subject: [Bioperl-l] bioperl email list
Message-ID: <20080317.162721.27257.1@webmail19.vgs.untd.com>

Hello bioperl people.

I am a Perl programmer/molecular biologist/nice guy.  I am wandering around within the bioinformatics arena and making progress.

I am realizing that I will be writing code forever unless I can tap into the existing infrastructure.  For me that appears to be bioperl.

I would like to be part of the bioperl community.  I subscribed to the bioperl list and got a return email, but have never received an email.  Did I do something wrong?  Could you look into this please?

The truth is:  these objects are blowing me away and I need help.

John S. Brigham
13810 Braun Drive
Golden, Colorado 80401
303-216-0994
mrphysh2juno.com 


From hlapp at gmx.net  Tue Mar 18 03:44:11 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 17 Mar 2008 23:44:11 -0400
Subject: [Bioperl-l] Are all recommended modules equally important ?
In-Reply-To: <628aabb70803170838i53a49aacuf507880d518326d@mail.gmail.com>
References: <20080317041311.GA3784@kunpuu.plessy.org>
	<628aabb70803170838i53a49aacuf507880d518326d@mail.gmail.com>
Message-ID: <F7A93439-501D-420A-BD68-442E3587CB71@gmx.net>

On Mar 17, 2008, at 11:38 AM, Dave Messina wrote:

> Hi Charles,
>
> Thanks for your note.
>
> All of the BioPerl 'recommended' modules involve optional  
> functionality, so
> I would think all of them would map to 'suggested' under Debian so  
> they
> won't be installed by default.

I would probably elevate LWP to 'recommended.' Other than that I agree.

	-hilmar


-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From bix at sendu.me.uk  Tue Mar 18 09:29:10 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 18 Mar 2008 09:29:10 +0000
Subject: [Bioperl-l] Are all recommended modules equally important ?
In-Reply-To: <F7A93439-501D-420A-BD68-442E3587CB71@gmx.net>
References: <20080317041311.GA3784@kunpuu.plessy.org>	<628aabb70803170838i53a49aacuf507880d518326d@mail.gmail.com>
	<F7A93439-501D-420A-BD68-442E3587CB71@gmx.net>
Message-ID: <47DF8B66.8020509@sendu.me.uk>

Hilmar Lapp wrote:
> On Mar 17, 2008, at 11:38 AM, Dave Messina wrote:
> 
>> Hi Charles,
>>
>> Thanks for your note.
>>
>> All of the BioPerl 'recommended' modules involve optional 
>> functionality, so
>> I would think all of them would map to 'suggested' under Debian so they
>> won't be installed by default.
> 
> I would probably elevate LWP to 'recommended.' Other than that I agree.

I looked at the most used external modules. Used 6 times or more:

   Data::Dumper => used 55 times
   Carp => used 51 times
   IO::String => used 25 times
   Symbol => used 19 times
   File::Spec => used 17 times
   HTTP::Request::Common => used 17 times
   POSIX => used 12 times
   DB_File => used 11 times
   Fcntl => used 11 times
   IO::File => used 11 times
   Exporter => used 10 times
   File::Temp => used 9 times
   Dumpvalue => used 8 times
   LWP::UserAgent => used 8 times
   Scalar::Util => used 8 times
   URI::Escape => used 8 times
   File::Basename => used 6 times
   File::Path => used 6 times
   XML::Writer => used 6 times

I can never remember how to figure out which of those is included with 
perl 5.6.1.

Except maybe XML::Writer, if we do want to promote anything to 
recommended, I suppose it would be those above.
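
One way to answer the "which of these ship with perl" question is the
Module::CoreList module. This is a suggestion that is not part of the
thread, and on older perls Module::CoreList itself has to be installed
from CPAN. The sketch below reports the first perl release that bundled
each module, which can then be compared against 5.6.1 by eye; the module
list is just a sample taken from the table above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Module::CoreList;

# A sample of the modules from the usage table above.
my @modules = qw(
    Data::Dumper Carp IO::String File::Spec
    HTTP::Request::Common LWP::UserAgent XML::Writer
);

for my $module (@modules) {
    # first_release() returns the perl version that first bundled the
    # module, or undef if it has never been part of the perl core.
    my $first = Module::CoreList->first_release($module);
    printf "%-24s %s\n", $module,
        defined $first ? "first bundled with perl $first"
                       : 'not bundled with perl';
}
```

Anything reported as not bundled is a candidate for an explicit
dependency declaration in a package.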

I also agree with everything Dave said; if it's easier everything can be 
'suggested'.


(I reckon most if not all of the Data::Dumper and Carp usages should be 
removed)


From David.Messina at sbc.su.se  Tue Mar 18 14:30:02 2008
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 18 Mar 2008 15:30:02 +0100
Subject: [Bioperl-l] bioperl email list
In-Reply-To: <20080317.162721.27257.1@webmail19.vgs.untd.com>
References: <20080317.162721.27257.1@webmail19.vgs.untd.com>
Message-ID: <628aabb70803180730g2aae7ae0u18a9e6a360c725fa@mail.gmail.com>

Hi John,

Welcome to BioPerl!


>  I subscribed to the bioperl list and got a return email, but have never
> received an email.


As I'm sure you know, most of the time when email doesn't show up, it's
because it's been filtered as spam. I'm assuming you looked for this already,
though.

If that's not it, then I suggest trying to log in to the mailing list server
here:
http://bioperl.org/mailman/listinfo/bioperl-l

Log in by entering your email address (the one you subscribed with) in the
last field on that page and clicking the "Unsubscribe or edit options"
button.

On the next page that comes up, type your password in the first field on the
page.

If for some reason you haven't been subscribed to the list properly, then
you will get an error here. Otherwise, you will be taken to your membership
configuration page. There you can verify, among other options, that mail
delivery is enabled.


> The truth is:  these object are blowing me away and I need help.


BioPerl does have a bit of a learning curve, but fortunately there are some
good tutorials that should help you to get started. If you haven't already,
visit the HOWTO section of bioperl.org. Check out the one on BioPerl for
beginners, and then you might follow up with the SeqIO and SearchIO HOWTOs
which cover how to read and write sequences and sequence alignment program
output.

Also, there's lots of great example code in the examples folder of the
BioPerl distribution. I find looking at how other people use BioPerl is very
helpful in understanding what objects are used for what.

Finally, I'll plug the BioPerl Deobfuscator, which is a class browser for
BioPerl and available at: http://bioperl.org/cgi-bin/deob_interface.cgi

BioPerl classes tend to have multiple levels of inheritance, and the
Deobfuscator lets you see all of the methods available to objects of a given
class.


Dave


From bix at sendu.me.uk  Tue Mar 18 15:32:25 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 18 Mar 2008 15:32:25 +0000
Subject: [Bioperl-l] Priorities for a bioperl-1.6 release
In-Reply-To: <OF9C92ED19.F51E3401-ON852573F0.006310A3-852573F0.006383D1@gsk.com>
References: <OF9C92ED19.F51E3401-ON852573F0.006310A3-852573F0.006383D1@gsk.com>
Message-ID: <47DFE089.1070304@sendu.me.uk>

aaron.j.mackey at gsk.com wrote:
>> Or is the split intended to be 'core' == "anything and everything
>> that was in 1.4", '????' == "everything else"? In which case,
>> what's a good name for "modules created after 1.4"? 'crust'? ;)
> 
> Nah, "icing".
> 
> a module "use" map might be very useful to help identify "core" vs.
> other layers of mantle/crust/icing.
> 
> http://www.perlmonks.org/?node_id=87329 
> http://search.cpan.org/src/NEILB/pmusage-1.2/

Thanks for those. Neither could quite cope with BioPerl, but I've munged
them together and hacked up 'module_usage.pl' which I've just committed
to the maintenance directory of bioperl-live.

module_usage.pl ../Bio

Produces:
  *warning, may crash your browser; download it and view in a dedicated
image viewer*
http://bix.sendu.me.uk/files/module_usage.jpeg
http://bix.sendu.me.uk/files/module_usage.txt

First I considered what modules each BioPerl package (aka class, module)
'uses' (what modules does it load via 'use', 'require' or inherit from
via 'use base', excluding external (non-BioPerl) modules), then grouped
together packages that have identical usage. The graph shows all the
groups with more than one member as nodes and edges from them pointing
to the individual packages that they use. The set of those individual
packages pointed to by groups also have edges showing their
use-relationship to other members of the set (only). Members of the set
are also shaded in red. The saturation of the shade indicates how many
packages use that package (so dark red packages are used a lot).

(I had to simplify in this way because otherwise GraphViz bailed on me.
If anyone can come up with nicer simplification/visualisation systems,
please do! It's important to note that there is lots of information loss
in my scheme, so you can't rely on the graph alone.)
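
The grouping step described above can be sketched roughly as follows.
This is not the code in module_usage.pl; it is a minimal illustration
under the assumption that the usage map has already been collected, and
the package names (Bio::Example::*) and the group_by_usage() helper are
hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical usage map: package => list of BioPerl modules it loads.
my %uses = (
    'Bio::Example::ParserA' => [qw(Bio::Example::Base Bio::Example::Util)],
    'Bio::Example::ParserB' => [qw(Bio::Example::Util Bio::Example::Base)],
    'Bio::Example::Tool'    => [qw(Bio::Example::Base)],
);

# Group packages whose sets of used modules are identical. The sorted,
# comma-joined module list serves as the group key, so the order in
# which a package loads its modules does not matter.
sub group_by_usage {
    my ($uses) = @_;
    my %groups;
    for my $package (sort keys %$uses) {
        my $key = join ',', sort @{ $uses->{$package} };
        push @{ $groups{$key} }, $package;
    }
    return \%groups;
}

my $groups = group_by_usage(\%uses);
for my $key (sort keys %$groups) {
    my @members = @{ $groups->{$key} };
    next unless @members > 1;    # only multi-member groups become nodes
    print join(', ', @members), " use: $key\n";
}
```

Each printed group corresponds to one node in the graph, with edges to
the modules named in its key.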

Getting to the question on how to decide what is 'core' and on what
basis to split things up, first consider the darker red packages. Next
consider how many groups point to it. Finally consider the membership of
those groups: are they all highly related, or are they from different
'parts' of BioPerl?

For example, Bio::Graphics::Glyph::generic is dark red and has 3 groups
pointing to it, but all the members of those groups are
Bio::Graphics::Glyph*. You could imagine that Bio::Graphics::Glyph (or
Bio::Graphics?) could be split off cleanly if desired and not kept in
core. Bio::SimpleAlign, on the other hand, whilst not being quite as
dark a red, has 7 attached groups with members from Bio::AlignIO,
Bio::Search and Bio::Tools. You could easily argue it is more
fundamental to BioPerl and should be in core. In turn, the things that
Bio::SimpleAlign points to would also have to be in core.

I haven't done any full analysis along these lines and leave it as an
exercise for the interested reader for now ;)


Chris Fields wrote:
> http://www.bioperl.org/wiki/Talk:Proposed_1.6_core_modules
> 
> I'm pretty flexible on any of that; it's a proposal only and I think
> some of it may be wrongheaded, but hey, I'm willing to take a few
> rotten tomatoes.  The key issue is we should try to work out what we
> mean by 'core' or the core library.  I have a rather extreme view of
> it as being the bare essentials without external, non-perl core
> dependencies (only SeqI/PrimarySeqI, AlignI, AnnotationI, SeqFeatureI
> and required modules for those classes) but I'm sure others would
> lump in parsers, DB functionality, etc.  I basically suggest placing
> those (and any stable but potentially non-core code) in a
> 'bioperl-main', with any unstable or untested code going into a
> 'bioperl-unstable'.

My thoughts are along these lines:
# I agree that core should have no external dependencies
# I agree that it might mostly be interfaces
# It should represent a framework with all the interfaces (that have
   stable APIs), directory structure and base classes that everything
   else relies on
# It might not do much useful bioinformatics, but provides just about
   everything needed for a dev to create a new module that does


> In essence, bioperl-main would require core and resemble a stable
> release; bioperl-unstable would require bioperl-main (and core) and
> resemble a dev release.  Not sure how versioning would go or if this
> is a viable option at all, but it's worth discussing.

# I agree that this 3-way split seems reasonable
# bioperl-main would consist primarily of the 'leaves' of the module
   tree, mostly parsers and the like which, whilst 'stable' and tested
   should still be split away from core because the data sources they
   parse could change format slightly
# bioperl-unstable, better bioperl-bleed, would feature brand-new
   stuff, be it new parsers for totally new formats, new APIs that do
   something not thought of before etc. When they are complete, bug-free
   and have stood the test of time they get moved into bioperl-main.
   (It is not a place for all new commits; bug fixes to something in
   bioperl-main would be committed to bioperl-main)
# The current splits (bioperl-run, bioperl-network etc.) do not get
   their own core and bleed variant. Anything they need for core
   functionality would enter the single bioperl-core, anything new
   would enter the single bioperl-bleed, and anything stable would
   be in their own bioperl-[package]

Discuss :)


From snoze.pa at gmail.com  Tue Mar 18 18:27:47 2008
From: snoze.pa at gmail.com (snoze pa)
Date: Tue, 18 Mar 2008 13:27:47 -0500
Subject: [Bioperl-l] BioSQL V1.0.0 released
In-Reply-To: <8f200b4c0803111204m2ca45782w34baa0499d690cb5@mail.gmail.com>
References: <200803071309.25294.heikki@sanbi.ac.za>
	<7558F8C6-FE40-4BAE-BA6A-D5039B10F350@uiuc.edu>
	<8f200b4c0803111204m2ca45782w34baa0499d690cb5@mail.gmail.com>
Message-ID: <10f848910803181127u4acc307akbf9ec8513349e311@mail.gmail.com>

Thanks, Hilmar. I am still wondering whether my old problem was fixed. It is
related to NR databases mixing files from different databases.

On Tue, Mar 11, 2008 at 2:04 PM, Steve Chervitz <sac at bioperl.org> wrote:

> Ditto. Thanks for biting the bullet, Hilmar.
>
> BTW, I put together a little compendium/review of various bioinformatics
> data models a few months ago, where I mention BioSQL among others, but I
> never really announced it:
>
> http://biodatamodel.org/
>
> I thought about wikifying it to get the community involved in maintaining
> it, but haven't gotten around to it yet.
>
> Feedback is welcome.
>
> Cheers,
> Steve
>
> On Fri, Mar 7, 2008 at 6:22 AM, Chris Fields <cjfields at uiuc.edu> wrote:
>
> > Same here.  Great news!
> >
> > chris
> >
> > On Mar 7, 2008, at 5:09 AM, Heikki Lehvaslaiho wrote:
> >
> > > BIOSQL V1.0.0 RELEASED
> > > http://news.open-bio.org/archives/2008_03.html#000094
> > >
> > >
> > > Congratulations, Hilmar!
> > >
> > >               -Heikki
> > >
> > > --
> > > ______ _/      _/_____________________________________________________
> > >      _/      _/
> > >     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
> > >    _/_/_/_/_/  Senior Scientist    skype: heikki_lehvaslaiho
> > >   _/  _/  _/  SANBI, South African National Bioinformatics Institute
> > >  _/  _/  _/  University of Western Cape, South Africa
> > >     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
> > > ___ _/_/_/_/_/________________________________________________________
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l at lists.open-bio.org
> > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> > Christopher Fields
> > Postdoctoral Researcher
> > Lab of Dr. Robert Switzer
> > Dept of Biochemistry
> > University of Illinois Urbana-Champaign


From darin.london at duke.edu  Tue Mar 18 18:16:58 2008
From: darin.london at duke.edu (darin.london at duke.edu)
Date: Tue, 18 Mar 2008 13:16:58 -0500
Subject: [Bioperl-l] BOSC 2008 Announcement and Call For Submissions
Message-ID: <200803181816.m2IIGwOL007248@tenero.duhs.duke.edu>


BOSC 2008 Call for Abstracts

The 9th annual Bioinformatics Open Source Conference (BOSC 2008) will take place in Toronto, Ontario, Canada, as one of several Special Interest Group (SIG) meetings occurring in conjunction with the 16th annual Intelligent Systems for Molecular Biology Conference (ISMB 2008).

The Bioinformatics Open Source Conference (BOSC) is sponsored by the Open Bioinformatics Foundation (O|B|F), a non-profit group dedicated to promoting the practice and philosophy of Open Source software development within the biological research community. Many Open Source bioinformatics packages are widely used by the research community across many application areas and form a cornerstone in enabling research in the genomic and post-genomic era. Open source bioinformatics software has facilitated rapid innovation and dissemination of new computational methods as well as informatics infrastructure. Since the work of the Open Source Bioinformatics Community represents some of the most cutting edge of Bioinformatics in general, the overall theme for the conference this year is "Tackling Hard Problems with Emerging Technologies". Topics under this umbrella include cyberinfrastructure, grid computing and workflow management and discovery, and visualization. We will also have a series of update talks about the main Open Source Bioinformatics Software suites.

One of the hallmarks of BOSC is the coming together of the open source developer community in one location. A face-to-face meeting of this community creates synergy where participants can work together to create use cases, prototype working code, or run bootcamps for developers from other projects as short, informal, and hands-on tutorials in new software packages and emerging technologies. In short, BOSC is not just a conference for presentations of completed work, but is a dynamic meeting where collaborative work gets done.

This year, BOSC is accepting abstract submissions on the conference theme "Tackling Hard Problems with Emerging Technologies". The conference theme reflects that there are new technologies emerging on both the scientific front (new sequencing technologies, etc.) and the IT front (workflows, mashup/web 2.0, improvements in all of the major programming languages, etc.), which may allow the open source community to solve problems that were previously intractable. Abstracts may be submitted for the following topics.

1. Cyberinfrastructure - We are interested in presentations on topics dealing with the development of infrastructure on the web to facilitate software and data re-use (mashups, or traditional), interoperability and inter-process communication, system/service discovery, and data movement and modeling in distributed systems. This may include peer-to-peer systems of data transfer, Web Services, various flavors of data representation (SOAP, JSON, XML, others), and technologies commonly referred to under the Web 2.0 paradigm (e.g. folksonomies/tagging, user-based content generation, content feeds, and Social Networking).

2. Grid Computing and Workflow Management and Discovery - We particularly invite talks that report progress in making workflow systems easier to use and on how to do distributed-collaborative research, e.g. workflows that encompass the coordination of systems running in different parts of the world.

3. Visualization - Visualization is a maturing area of open source software development. We particularly invite talks that demonstrate innovative visualization systems in the context of workflows.

4. Open Source Software - Speakers will present talks on the use, development, or philosophy of open source software in bioinformatics.

5. Bio* Open Source Project Updates - We invite abstracts from the representatives of the open source projects sponsored by or affiliated to the O|B|F (see Projects).


Please consult the official BOSC 2008 website at http://www.open-bio.org/wiki/Upcoming_BOSC_conference  for all updates and extra information.

Submission Process:
All abstracts must be submitted through our Open Conference Systems site (http://events.open-bio.org/BOSC2008/openconf.php).
The form will ask for a small Abstract Text to be pasted into it, and a full paper.  The small Abstract text should be a summary, while the longer abstract (should provide more details, including the open-source license requirement details)
Full-length abstracts are limited to one page with one inch (2.5 cm) margins on the top, sides, and bottom.  The full-length abstract should include the title, authors, and affiliations.  We prefer your abstract to be in PDF format, although plain t

Important Dates:
May 11: Abstract submission deadline.
June 2: Notification of accepted talks.
June 4: Early registration discount cut-off.
July 18-19: BOSC 2008!

We hope to see you at BOSC 2008!

Kam Dahlquist and Darin London
BOSC 2008 Co-organizers

			 
From hlapp at gmx.net  Tue Mar 18 19:07:54 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 18 Mar 2008 15:07:54 -0400
Subject: [Bioperl-l] BioSQL V1.0.0 released
In-Reply-To: <10f848910803181127u4acc307akbf9ec8513349e311@mail.gmail.com>
References: <200803071309.25294.heikki@sanbi.ac.za>
	<7558F8C6-FE40-4BAE-BA6A-D5039B10F350@uiuc.edu>
	<8f200b4c0803111204m2ca45782w34baa0499d690cb5@mail.gmail.com>
	<10f848910803181127u4acc307akbf9ec8513349e311@mail.gmail.com>
Message-ID: <0B1635C3-0BD4-449C-9C52-FD8D07E9D669@gmx.net>

Can you point me to the bug report or a mailing list thread?

	-hilmar

On Mar 18, 2008, at 2:27 PM, snoze pa wrote:

> Thanks, Hilmar. I am still wondering whether my old problem was fixed. It is
> related to NR databases mixing files from different databases.
>
> On Tue, Mar 11, 2008 at 2:04 PM, Steve Chervitz <sac at bioperl.org>  
> wrote:
>
>> Ditto. Thanks for biting the bullet, Hilmar.
>>
>> BTW, I put together a little compendium/review of various  
>> bioinformatics
>> data models a few months ago, where I mention BioSQL among others,  
>> but I
>> never really announced it:
>>
>> http://biodatamodel.org/
>>
>> I thought about wikifying it to get the community involved in  
>> maintaining
>> it, but haven't gotten around to it yet.
>>
>> Feedback is welcome.
>>
>> Cheers,
>> Steve
>>
>> On Fri, Mar 7, 2008 at 6:22 AM, Chris Fields <cjfields at uiuc.edu>  
>> wrote:
>>
>>> Same here.  Great news!
>>>
>>> chris
>>>
>>> On Mar 7, 2008, at 5:09 AM, Heikki Lehvaslaiho wrote:
>>>
>>>> BIOSQL V1.0.0 RELEASED
>>>> http://news.open-bio.org/archives/2008_03.html#000094
>>>>
>>>>
>>>> Congratulations, Hilmar!
>>>>
>>>>               -Heikki
>>>>
>>>> --
>>>> ______ _/      _/ 
>>>> _____________________________________________________
>>>>      _/      _/
>>>>     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
>>>>    _/_/_/_/_/  Senior Scientist    skype: heikki_lehvaslaiho
>>>>   _/  _/  _/  SANBI, South African National Bioinformatics  
>>>> Institute
>>>>  _/  _/  _/  University of Western Cape, South Africa
>>>>     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
>>>> ___ _/_/_/_/_/ 
>>>> ________________________________________________________
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>> Christopher Fields
>>> Postdoctoral Researcher
>>> Lab of Dr. Robert Switzer
>>> Dept of Biochemistry
>>> University of Illinois Urbana-Champaign

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From snoze.pa at gmail.com  Tue Mar 18 20:33:08 2008
From: snoze.pa at gmail.com (snoze pa)
Date: Tue, 18 Mar 2008 15:33:08 -0500
Subject: [Bioperl-l] NCBI taxonomy database
Message-ID: <10f848910803181333g49d1567dl99d76daf8ef88cc1@mail.gmail.com>

Dear Users,
 How can I use the NCBI taxonomy database in BioPerl? Any suggestions?
Thanks in advance,
s


From aaron.j.mackey at gsk.com  Tue Mar 18 16:23:41 2008
From: aaron.j.mackey at gsk.com (aaron.j.mackey at gsk.com)
Date: Tue, 18 Mar 2008 12:23:41 -0400
Subject: [Bioperl-l] Priorities for a bioperl-1.6 release
In-Reply-To: <47DFE089.1070304@sendu.me.uk>
Message-ID: <OF5F812950.DB7A9F1E-ON85257410.00593111-85257410.005A0FAB@gsk.com>

Very cool.  I can envision this being printed as a laminated poster to put 
up next to the periodic table of Perl Elements
(http://www.ozonehouse.com/mark/blog/code/PeriodicTable.html)

One GraphViz trick you could try would be to group Bio::X::* (nodes and 
your collection groups sharing common Bio::X:: prefixes) together as 
subgraphs; that should quickly show you which edges go outside of the 
various "domains", and which are entirely self contained.

You could also try to distinguish "use base" relationships (i.e. 
inheritance) vs. "use Bio::X" (delegation, composition, etc.) vs. "require 
Bio::X" wrapped in an eval (optional use if available) by various edge 
colorings -- this might help to further break things up if we can guess at 
the intended "use" of any Bio::X by Bio::Y.
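
To make the three edge types concrete, here is a minimal sketch of how a
scanner might tell them apart. It is a regex heuristic over source text, not
a real Perl parser, and classify_deps() and the Bio::Y example package are
hypothetical names used only for illustration (this is not the code in
module_usage.pl). A module that is both inherited from and separately
"use"d is reported as inheritance.

```perl
use strict;
use warnings;

# Classify the Bio::* dependencies declared in a chunk of Perl source.
# Returns a hash ref: module name => 'inherit' | 'optional' | 'use'.
sub classify_deps {
    my ($source) = @_;
    my %kind;

    # Inheritance: use base qw(Bio::A Bio::B);  or  use base 'Bio::A';
    while ($source =~ /^\s*use\s+base\s+(.+?);/msg) {
        my $list = $1;    # copy before running another match
        $kind{$_} = 'inherit' for $list =~ /(Bio::[\w:]+)/g;
    }

    # Optional use: a require wrapped in an eval block.
    while ($source =~ /eval\s*\{[^}]*?\brequire\s+(Bio::[\w:]+)/g) {
        $kind{$1} ||= 'optional';
    }

    # Everything else loaded with a plain use or require at line start.
    while ($source =~ /^\s*(?:use|require)\s+(Bio::[\w:]+)/mg) {
        $kind{$1} ||= 'use';
    }
    return \%kind;
}

# Small demonstration on an invented package.
my $example = <<'END_SOURCE';
package Bio::Y;
use strict;
use base qw(Bio::Root::Root Bio::SeqI);
use Bio::SimpleAlign;
eval { require Bio::Ext::Align; };
END_SOURCE

my $deps = classify_deps($example);
printf "%-20s %s\n", $_, $deps->{$_} for sort keys %$deps;
```

The returned kind could then be mapped to an edge colour when the graph is
written out.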

-Aaron

"Sendu Bala" <bix at sendu.me.uk> wrote on 03/18/2008 11:32:25 AM:

> aaron.j.mackey at gsk.com wrote:
> >> Or is the split intended to be 'core' == "anything and everything
> >> that was in 1.4", '????' == "everything else"? In which case,
> >> what's a good name for "modules created after 1.4"? 'crust'? ;)
> > 
> > Nah, "icing".
> > 
> > a module "use" map might be very useful to help identify "core" vs.
> > other layers of mantle/crust/icing.
> > 
> > http://www.perlmonks.org/?node_id=87329 
> > http://search.cpan.org/src/NEILB/pmusage-1.2/
> 
> Thanks for those. Neither could quite cope with BioPerl, but I've munged
> them together and hacked up 'module_usage.pl' which I've just committed
> to the maintenance directory of bioperl-live.
> 
> module_usage.pl ../Bio
> 
> Produces:
>   *warning, may crash your browser; download it and view in a dedicated
> image viewer*
> http://bix.sendu.me.uk/files/module_usage.jpeg
> http://bix.sendu.me.uk/files/module_usage.txt
> 
> First I considered what modules each BioPerl package (aka class, module)
> 'uses' (what modules does it load via 'use', 'require' or inherit from
> via 'use base', excluding external (non-BioPerl) modules), then grouped
> together packages that have identical usage. The graph shows all the
> groups with more than one member as nodes and edges from them pointing
> to the individual packages that they use. The set of those individual
> packages pointed to by groups also have edges showing their
> use-relationship to other members of the set (only). Members of the set
> are also shaded in red. The saturation of the shade indicates how many
> packages use that package (so dark red packages are used a lot).
> 
> (I had to simplify in this way because otherwise GraphViz bailed on me.
> If anyone can come with nicer simplification/visualisation systems,
> please do! It's important to note that there is lots of information loss
> in my scheme, so you can't rely on the graph alone.)
> 
> Getting to the question on how to decide what is 'core' and on what
> basis to split things up, first consider the darker red packages. Next
> consider how many groups point to it. Finally consider the membership of
> those groups: are they all highly related, or are they from different
> 'parts' of BioPerl?
> 
> For example, Bio::Graphics::Glyph::generic is dark red and has 3 groups
> pointing to it, but all the members of those groups are
> Bio::Graphics::Glyph*. You could imagine that Bio::Graphics::Glyph (or
> Bio::Graphics?) could be split off cleanly if desired and not kept in
> core. Bio::SimpleAlign, on the other hand, whilst not being quite as
> dark a red, has 7 attached groups with members from Bio::AlignIO,
> Bio::Search and Bio::Tools. You could easily argue it is more
> fundamental to BioPerl and should be in core. In turn, the things that
> Bio::SimpleAlign points to would also have to be in core.
> 
> I haven't done any full analysis along these lines and leave as an
> exercise for the interested reader for now ;)
> 
> 
> Chris Fields wrote:
> > http://www.bioperl.org/wiki/Talk:Proposed_1.6_core_modules
> > 
> > I'm pretty flexible on any of that; it's a proposal only and I think
> > some of it may be wrongheaded, but hey, I'm willing to take a few
> > rotten tomatoes.  The key issue is we should try to work out what we
> > mean by 'core' or the core library.  I have a rather extreme view of
> > it as being the bare essentials without external, non-perl core
> > dependencies (only SeqI/PrimarySeqI, AlignI, AnnotationI, SeqFeatureI
> > and required modules for those classes) but I'm sure others would
> > lump in parsers, DB functionality, etc.  I basically suggest placing
> > those (and any stable but potentially non-core code) in a
> > 'bioperl-main', with any unstable or untested code going into a
> > 'bioperl-unstable'.
> 
> My thoughts are along these lines:
> # I agree that core should have no external dependencies
> # I agree that it might mostly be interfaces
> # It should represent a framework with all the interfaces (that have
>    stable APIs), directory structure and base classes that everything
>    else relies on
> # It might not do much useful bioinformatics, but provides just about
>    everything needed for a dev to create a new module that does
> 
> 
> > In essence, bioperl-main would require core and resemble a stable
> > release; bioperl-unstable would require bioperl-main (and core) and
> > resemble a dev release.  Not sure how versioning would go or if this
> > is a viable option at all, but it's worth discussing.
> 
> # I agree that this 3-way split seems reasonable
> # bioperl-main would consist primarily of the 'leaves' of the module
>    tree, mostly parsers and the like which, whilst 'stable' and tested
>    should still be split away from core because the data sources they
>    parse could change format slightly
> # bioperl-unstable, better bioperl-bleed, would feature brand-new
>    stuff, be it new parsers for totally new formats, new APIs that do
>    something not thought of before etc. When they are complete, bug-free
>    and have stood the test of time they get moved into bioperl-main.
>    (It is not a place for all new commits; bug fixes to something in
>    bioperl-main would be committed to bioperl-main)
> # The current splits (bioperl-run, bioperl-network etc.) do not get
>    their own core and bleed variant. Anything they need for core
>    functionality would enter the single bioperl-core, anything new
>    would enter the single bioperl-bleed, and anything stable would
>    be in their own bioperl-[package]
> 
> Discuss :)
> 
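
The grouping step Sendu describes above (packages with identical usage
collapsed into one node, then counting how many multi-member groups point
at each used package) can be sketched roughly as follows. This is only an
illustration under my own assumptions: group_by_usage() and
groups_pointing_at() are hypothetical names, the package names and
dependency lists are invented, and none of this is taken from
module_usage.pl itself.

```perl
use strict;
use warnings;

# Collapse packages whose (de-duplicated, sorted) dependency lists are
# identical. Input: package => [used packages]. Output: hash ref mapping a
# comma-joined dependency signature to the packages sharing it.
sub group_by_usage {
    my (%uses) = @_;
    my %group;
    for my $pkg (sort keys %uses) {
        my %seen;
        my @unique = grep { !$seen{$_}++ } @{ $uses{$pkg} };
        my $signature = join ',', sort @unique;
        push @{ $group{$signature} }, $pkg;
    }
    return \%group;
}

# For each used package, count the groups with more than one member that
# point at it (only those groups are drawn as nodes in the scheme above).
sub groups_pointing_at {
    my ($group) = @_;
    my %count;
    for my $signature (keys %$group) {
        next unless @{ $group->{$signature} } > 1;
        $count{$_}++ for split /,/, $signature;
    }
    return \%count;
}

# Invented example data.
my $groups = group_by_usage(
    'Bio::X::a' => [ 'Bio::Base', 'Bio::Util' ],
    'Bio::X::b' => [ 'Bio::Util', 'Bio::Base' ],
    'Bio::Y::c' => ['Bio::Base'],
    'Bio::Y::d' => ['Bio::Base'],
    'Bio::Z::e' => ['Bio::Util'],
);
my $pointed = groups_pointing_at($groups);
print "$_ is used by $pointed->{$_} group(s)\n" for sort keys %$pointed;
```

A package pointed at by many groups whose members come from different parts
of the tree would be a candidate for core under the heuristic described
above.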


From David.Messina at sbc.su.se  Tue Mar 18 21:23:18 2008
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 18 Mar 2008 22:23:18 +0100
Subject: [Bioperl-l] NCBI taxonomy database
In-Reply-To: <10f848910803181333g49d1567dl99d76daf8ef88cc1@mail.gmail.com>
References: <10f848910803181333g49d1567dl99d76daf8ef88cc1@mail.gmail.com>
Message-ID: <628aabb70803181423g305db155r9d66c114f38c64b6@mail.gmail.com>

Hi snoze,

I think you will want to take a look at the docs for the Bio::Taxon module
http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Taxon.html


and these scripts:

scripts/taxa/local_taxonomydb_query.PLS
<http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/scripts/taxa/local_taxonomydb_query.PLS?rev=HEAD&cvsroot=bioperl&content-type=text/vnd.viewcvs-markup>
  Script that accesses a local taxonomy database and retrieves species or
  TaxonIDs <http://www.bioperl.org/wiki/TaxonID>.

scripts/taxa/query_entrez_taxa.PLS
<http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/scripts/taxa/query_entrez_taxa.PLS?rev=HEAD&cvsroot=bioperl&content-type=text/vnd.viewcvs-markup>
  Demonstrate how to retrieve the NCBI <http://www.bioperl.org/wiki/NCBI>
  TaxonID for a given species. Also retrieve TaxonID for a given accession
  number <http://www.bioperl.org/wiki/Accession_number>.

scripts/taxa/taxid4species.PLS
<http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/scripts/taxa/taxid4species.PLS?rev=HEAD&cvsroot=bioperl&content-type=text/vnd.viewcvs-markup>
  Retrieve the NCBI <http://en.wikipedia.org/wiki/NCBI> TaxonID for a given
  species.
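
As a rough starting point, the sketch below walks from a species name up to
the root of the taxonomy through a Bio::DB::Taxonomy handle. It assumes a
BioPerl version whose Bio::DB::Taxonomy provides get_taxonid(), get_taxon()
and ancestor(); lineage_names() and entrez_taxonomy_db() are hypothetical
helper names of my own, and the Entrez variant needs network access.

```perl
use strict;
use warnings;

# Collect scientific names from the root down to the given species.
# $db is expected to behave like a Bio::DB::Taxonomy handle.
sub lineage_names {
    my ($db, $species) = @_;
    my $taxonid = $db->get_taxonid($species);
    return unless defined $taxonid;    # name not found

    my $taxon = $db->get_taxon(-taxonid => $taxonid);
    my @names;
    while ($taxon) {
        unshift @names, $taxon->scientific_name;
        $taxon = $db->ancestor($taxon);    # false once we pass the root
    }
    return @names;
}

# Build a handle that queries NCBI over the network. BioPerl is loaded
# lazily so the rest of this file compiles without it.
sub entrez_taxonomy_db {
    require Bio::DB::Taxonomy;
    return Bio::DB::Taxonomy->new(-source => 'entrez');
}

# Example call (requires BioPerl and a network connection):
#   print join('; ', lineage_names(entrez_taxonomy_db(), 'Homo sapiens')), "\n";
```

For a local copy of the NCBI dump files, the 'flatfile' source of
Bio::DB::Taxonomy (pointed at nodes.dmp and names.dmp) should be usable in
place of 'entrez'; the scripts listed above show both routes.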

Dave


From alexl at users.sourceforge.net  Wed Mar 19 08:32:38 2008
From: alexl at users.sourceforge.net (Alex Lancaster)
Date: Wed, 19 Mar 2008 01:32:38 -0700
Subject: [Bioperl-l] Priorities for a bioperl-1.6 release
In-Reply-To: <47DFE089.1070304@sendu.me.uk> (Sendu Bala's message of "Tue\,
	18 Mar 2008 15\:32\:25 +0000")
References: <OF9C92ED19.F51E3401-ON852573F0.006310A3-852573F0.006383D1@gsk.com>
	<47DFE089.1070304@sendu.me.uk>
Message-ID: <42eja7azbt.fsf@allele2.eebweb.arizona.edu>

>>>>> "SB" == Sendu Bala  writes:

[...]

SB> # I agree that this 3-way split seems reasonable
SB> # bioperl-main would consist primarily of the 'leaves' of the module
SB>   tree, mostly parsers and the like which, whilst 'stable' and tested
SB>   should still be split away from core because the data sources they
SB>   parse could change format slightly
SB> # bioperl-unstable, better bioperl-bleed, would feature brand-new
SB>   stuff, be it new parsers for totally new formats, new APIs that do
SB>   something not thought of before etc. When they are complete, bug-free
SB>   and have stood the test of time they get moved into bioperl-main.
SB>   (It is not a place for all new commits; bug fixes to something in
SB>   bioperl-main would be committed to bioperl-main)
SB> # The current splits (bioperl-run, bioperl-network etc.) do not get
SB>   their own core and bleed variant. Anything they need for core
SB>   functionality would enter the single bioperl-core, anything new
SB>   would enter the single bioperl-bleed, and anything stable would
SB>   be in their own bioperl-[package]

SB> Discuss :)

While on the subject of how to split up the bioperl package, spare a
thought for downstream package maintainers.  The Fedora package for the
bioperl "core" that I now maintain is currently a single package which
makes it easy to get reviewed, included in the distribution and
updated/maintained.  (bioperl-run is a separate package).

While I agree that bioperl is now perhaps a little too monolithic, I
think splitting it up in a too fine-grained manner like CPAN might
go too far the other way.  For Fedora, each package would then need to
be reviewed and updated separately.  Similar issues might apply for
other distros (such as Debian/Ubuntu).

I think something similar to the three-way split proposed sounds like
a good compromise, so long as a "basic" user of Bioperl can install
most of the functionality in the current "bioperl" package in (at
most) 2-3 packages.

One model to look at might be the gstreamer model which has a "core"
(gstreamer) and "gstreamer-plugins-base", "gstreamer-plugins-good",
"gstreamer-plugins-bad" and "gstreamer-plugins-ugly" modules for
plugins, see:

http://gstreamer.net/

Alex


From charles-listes+bioperl at plessy.org  Wed Mar 19 10:01:59 2008
From: charles-listes+bioperl at plessy.org (Charles Plessy)
Date: Wed, 19 Mar 2008 19:01:59 +0900
Subject: [Bioperl-l] Are all recommended modules equally important ?
In-Reply-To: <47DF8B66.8020509@sendu.me.uk>
References: <20080317041311.GA3784@kunpuu.plessy.org>
	<628aabb70803170838i53a49aacuf507880d518326d@mail.gmail.com>
	<F7A93439-501D-420A-BD68-442E3587CB71@gmx.net>
	<47DF8B66.8020509@sendu.me.uk>
Message-ID: <20080319100159.GD29304@kunpuu.plessy.org>

On Tue, Mar 18, 2008 at 09:29:10AM +0000, Sendu Bala wrote:
> 
> I looked at the most used external modules. Used 6 times or more:
> 
>   Data::Dumper => used 55 times
>   Carp => used 51 times
>   IO::String => used 25 times
>   Symbol => used 19 times
>   File::Spec => used 17 times
>   HTTP::Request::Common => used 17 times
>   POSIX => used 12 times
>   DB_File => used 11 times
>   Fcntl => used 11 times
>   IO::File => used 11 times
>   Exporter => used 10 times
>   File::Temp => used 9 times
>   Dumpvalue => used 8 times
>   LWP::UserAgent => used 8 times
>   Scalar::Util => used 8 times
>   URI::Escape => used 8 times
>   File::Basename => used 6 times
>   File::Path => used 6 times
>   XML::Writer => used 6 times

Dear Sendu,

Thanks a lot for this analysis! We will downgrade all modules except
those you listed to the priority 'Suggested'. In terms of the Debian
package, this means keeping only libio-string-perl, libwww-perl,
liburi-perl and libxml-writer-perl in our 'Recommends' field, as the
others are provided by our perl package itself.

Thanks a lot for the advice,

-- 
Charles Plessy
Debian-Med packaging team
Wakō, Saitama, Japan


From bix at sendu.me.uk  Wed Mar 19 13:27:11 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 19 Mar 2008 13:27:11 +0000
Subject: [Bioperl-l] Priorities for a bioperl-1.6 release
In-Reply-To: <OF5F812950.DB7A9F1E-ON85257410.00593111-85257410.005A0FAB@gsk.com>
References: <OF5F812950.DB7A9F1E-ON85257410.00593111-85257410.005A0FAB@gsk.com>
Message-ID: <47E114AF.8030801@sendu.me.uk>

aaron.j.mackey at gsk.com wrote:
> One GraphViz trick you could try would be to group Bio::X::* (nodes and 
> your collection groups sharing common Bio::X:: prefixes) together as 
> subgraphs; that should quickly show you which edges go outside of the 
> various "domains", and which are entirely self contained.

Not quite sure if I used the 'trick' you were thinking of, but I now 
'cluster' them as you describe. It's no longer quite as attractively 
proportioned, but I suppose it's more useful :)


> you could also try to distinguish "use base" relationships (i.e. 
> inheritance) vs. "use Bio::X" (delegation, composition, etc.) vs. "require 
> Bio::X" wrapped in an eval (optional use if available) by various edge 
> colorings -- this might help to further break things up if we can guess at 
> the intended "use" of any Bio::X by Bio::Y.

I haven't distinguished the eval require cases, but now edges are green 
for inheritance and blue for use/require.
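
For anyone who wants to experiment with the layout, here is a minimal sketch
of writing such a graph as GraphViz DOT text by hand, with one cluster per
Bio::X prefix and the green/blue edge colouring described above. It is not
the code in module_usage.pl; deps_to_dot() and the two example edges are
hypothetical and only show the shape of the output.

```perl
use strict;
use warnings;

# Turn a list of [from, to, kind] edges into DOT text.
# kind is 'inherit' (drawn green) or 'use' (drawn blue).
sub deps_to_dot {
    my (@edges) = @_;
    my %color = (inherit => 'green', use => 'blue');

    # Bucket every node by its second-level namespace, e.g.
    # Bio::Graphics::Glyph::generic goes into the Bio::Graphics cluster.
    my %cluster;
    for my $edge (@edges) {
        for my $node (@$edge[0, 1]) {
            my ($prefix) = $node =~ /^(Bio::\w+)/;
            $prefix = 'other' unless defined $prefix;
            $cluster{$prefix}{$node} = 1;
        }
    }

    my $dot = "digraph module_usage {\n";
    my $n   = 0;
    for my $prefix (sort keys %cluster) {
        # dot draws a box only around subgraphs named "cluster...".
        $dot .= sprintf "  subgraph cluster_%d {\n", $n++;
        $dot .= "    label=\"$prefix\";\n";
        $dot .= "    \"$_\";\n" for sort keys %{ $cluster{$prefix} };
        $dot .= "  }\n";
    }
    for my $edge (@edges) {
        my ($from, $to, $kind) = @$edge;
        my $c = $color{$kind} || 'black';
        $dot .= "  \"$from\" -> \"$to\" [color=$c];\n";
    }
    return $dot . "}\n";
}

# Invented edges, just to show the output format.
print deps_to_dot(
    [ 'Bio::AlignIO::clustalw', 'Bio::SimpleAlign', 'use' ],
    [ 'Bio::SimpleAlign',       'Bio::Root::Root',  'inherit' ],
);
```

Saving the text to a file and running something like
"dot -Tsvg module_usage.dot -o module_usage.svg" may be an easier route to a
scalable image than going through ps, though I have not tried it on a graph
of this size.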


I updated the jpeg:
*warning, may crash your browser; download it and view in a dedicated
  image viewer*
http://bix.sendu.me.uk/files/module_usage.jpeg


If someone wants to mess with the script so it will output a sane ps 
file for conversion to pdf, please do so. I can't figure out how to get 
it to work correctly.


From Jorge.DUARTE at biogemma.com  Wed Mar 19 15:32:44 2008
From: Jorge.DUARTE at biogemma.com (Jorge.DUARTE at biogemma.com)
Date: Wed, 19 Mar 2008 16:32:44 +0100
Subject: [Bioperl-l] how to go from AlignIO to Variation ?
Message-ID: <OF678A6861.0EF7D6FC-ONC1257411.0054F38B-C1257411.00558060@LGLimagrain.com>

Dear Bioperl-users,

Could someone give me a hint on how to find SNPs in alignments using 
BioPerl objects?

I found several modules capable of representing Sequence Variations,
but could not understand how to go from an "Align" object to a "Variation" 
object.

Any help would be much appreciated,

Thanks,

Jorge.

--- 
Jorge Duarte
Bioinformatics Software Engineer
BIOGEMMA
Z.I. Du Brézet
8, Rue des Frères Lumière
63028 CLERMONT FERRAND Cedex 2
FRANCE
Tel : +33 (0)4 73 39 60 73
Fax : +33 (0)4 73 39 60 71
E-mail : jorge.duarte at biogemma.com


From avilella at gmail.com  Wed Mar 19 16:59:47 2008
From: avilella at gmail.com (Albert Vilella)
Date: Wed, 19 Mar 2008 16:59:47 +0000
Subject: [Bioperl-l] how to go from AlignIO to Variation ?
In-Reply-To: <OF678A6861.0EF7D6FC-ONC1257411.0054F38B-C1257411.00558060@LGLimagrain.com>
References: <OF678A6861.0EF7D6FC-ONC1257411.0054F38B-C1257411.00558060@LGLimagrain.com>
Message-ID: <358f4d650803190959h744f33f5ha345897565a071b0@mail.gmail.com>

Hi Jorge,

(good to see an ex-EBI in the bioperl-ml :-) )

You can use the method aln_to_population in Bio::PopGen::Utilities:

my $pop = Bio::PopGen::Utilities->aln_to_population($aln);

http://www.bioperl.org/wiki/HOWTO:PopGen#Allele_data_from_Alignments_using_Bio::AlignIO_and_Bio::PopGen::Utilities
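
A slightly fuller sketch, in case it helps. The first routine is plain Perl
and only illustrates what "finding SNPs in an alignment" amounts to (columns
where more than one base occurs); it is not what BioPerl does internally.
The second wraps the BioPerl route with named parameters. Both sub names are
hypothetical, and the choice to skip '-' and 'N' is my own assumption.

```perl
use strict;
use warnings;

# Report alignment columns in which more than one base is observed.
# Takes aligned sequence strings of equal length; returns a list of
# { position => 1-based column, alleles => [sorted bases] }.
sub variable_columns {
    my (@rows) = @_;
    return unless @rows;
    my @sites;
    for my $col (0 .. length($rows[0]) - 1) {
        my %seen;
        for my $row (@rows) {
            my $base = uc substr($row, $col, 1);
            next if $base eq '-' || $base eq 'N';    # skip gaps, unknowns
            $seen{$base}++;
        }
        push @sites, { position => $col + 1, alleles => [ sort keys %seen ] }
            if scalar(keys %seen) > 1;
    }
    return @sites;
}

# BioPerl route: read the first alignment from a file and convert it to a
# Bio::PopGen::Population. Modules are loaded lazily so this file compiles
# even where BioPerl is not installed.
sub population_from_alignment_file {
    my ($file, $format) = @_;
    require Bio::AlignIO;
    require Bio::PopGen::Utilities;
    my $in  = Bio::AlignIO->new(-file => $file, -format => $format);
    my $aln = $in->next_aln;
    return Bio::PopGen::Utilities->aln_to_population(-alignment => $aln);
}

# Demonstration of the plain-Perl routine on three toy sequences.
for my $site (variable_columns('ACGT-A', 'ACCTTA', 'ATGTTN')) {
    print "column $site->{position}: @{ $site->{alleles} }\n";
}
```

With the resulting population object, $pop->get_marker_names should list the
sites that were turned into markers; the HOWTO linked above covers the
options in more detail.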

Cheers,

    Albert.

On Wed, Mar 19, 2008 at 3:32 PM,  <Jorge.DUARTE at biogemma.com> wrote:
> Dear Bioperl-users,
>
>  could someone give me a hint on how to find SNPs in alignments using
>  bioperl objects ?
>
>  I found several modules capable of representing Sequence Variations,
>  but could not understand how to go from an "Align" object to a "Variation"
>  object.
>
>  Any help would be much appreciated,
>
>  Thanks,
>
>  Jorge.
>
>  ---
>  Jorge Duarte
>  Bioinformatics Software Engineer
>  BIOGEMMA
>  Z.I. Du Brézet
>  8, Rue des Frères Lumière
>  63028 CLERMONT FERRAND Cedex 2
>  FRANCE
>  Tel : +33 (0)4 73 39 60 73
>  Fax : +33 (0)4 73 39 60 71
>  E-mail : jorge.duarte at biogemma.com


From Jorge.DUARTE at biogemma.com  Wed Mar 19 15:22:57 2008
From: Jorge.DUARTE at biogemma.com (Jorge.DUARTE at biogemma.com)
Date: Wed, 19 Mar 2008 16:22:57 +0100
Subject: [Bioperl-l] Using Bioperl book
Message-ID: <OF134946F0.4C537CC5-ONC1257411.00542F3C-C1257411.00549AFA@LGLimagrain.com>

Hello,

I just found a listing on Amazon for a book, "Using Bioperl", published 
on the 1st of March 2008, but it is no longer available.

Does anyone know how to get it?

Many thanks,

Jorge.

--- 
Jorge Duarte
Bioinformatics Software Engineer
BIOGEMMA
Z.I. Du Brézet
8, Rue des Frères Lumière
63028 CLERMONT FERRAND Cedex 2
FRANCE
Tel : +33 (0)4 73 39 60 73
Fax : +33 (0)4 73 39 60 71
E-mail : jorge.duarte at biogemma.com

*****************************************************************
       For any support request, please include
BIOGEMMA_BioInfo_Service or bioinfo at biogemma.com
         among the recipients when making first contact
*****************************************************************


From jason at bioperl.org  Wed Mar 19 17:54:16 2008
From: jason at bioperl.org (Jason Stajich)
Date: Wed, 19 Mar 2008 10:54:16 -0700
Subject: [Bioperl-l] Using Bioperl book
In-Reply-To: <OF134946F0.4C537CC5-ONC1257411.00542F3C-C1257411.00549AFA@LGLimagrain.com>
References: <OF134946F0.4C537CC5-ONC1257411.00542F3C-C1257411.00549AFA@LGLimagrain.com>
Message-ID: <0AFC1124-EE57-4168-BBF3-98845F0138C9@bioperl.org>

It's probably more than 6 months out. We still haven't finished  
writing it, as life and work continue to intrude on book writing.

-jason
On Mar 19, 2008, at 8:22 AM, Jorge.DUARTE at biogemma.com wrote:

> Hello,
>
> i just found on amazon something about a book "Using Bioperl",  
> published
> on the 1st of March 2008 but which is no more available.
>
> Does anyone know how to get it ?
>
> Many thanks,
>
> Jorge.
>
> ---
> Jorge Duarte
> Bioinformatics Software Engineer
> BIOGEMMA
> Z.I. Du Brézet
> 8, Rue des Frères Lumière
> 63028 CLERMONT FERRAND Cedex 2
> FRANCE
> Tel : +33 (0)4 73 39 60 73
> Fax : +33 (0)4 73 39 60 71
> E-mail : jorge.duarte at biogemma.com


From smarkel at accelrys.com  Wed Mar 19 17:53:53 2008
From: smarkel at accelrys.com (Scott Markel)
Date: Wed, 19 Mar 2008 10:53:53 -0700
Subject: [Bioperl-l] Using Bioperl book
In-Reply-To: <OF134946F0.4C537CC5-ONC1257411.00542F3C-C1257411.00549AFA@LGLimagrain.com>
References: <OF134946F0.4C537CC5-ONC1257411.00542F3C-C1257411.00549AFA@LGLimagrain.com>
Message-ID: <OF4972A434.60C53CF4-ON88257411.006214B5-88257411.00625152@accelrys.com>

Jorge,

This is a book that Jason Stajich, Ewan Birney, and I are writing.
We're behind.  So it's not that the book is no longer available,
but that it's not yet available.  Hopefully later this year or
early in 2009.

Scott

Scott Markel, Ph.D.
Principal Bioinformatics Architect  email:  smarkel at accelrys.com
Accelrys (SciTegic R&D)             mobile: +1 858 205 3653
10188 Telesis Court, Suite 100      voice:  +1 858 799 5603
San Diego, CA 92121                 fax:    +1 858 799 5222
USA                                 web:    http://www.accelrys.com

http://www.linkedin.com/in/smarkel
Board of Directors: International Society for Computational Biology
Co-chair: ISCB Publications Committee
Associate Editor: PLoS Computational Biology
Editorial Board: Briefings in Bioinformatics




From me at hongyu.org  Thu Mar 20 18:54:53 2008
From: me at hongyu.org (Hongyu Zhang)
Date: Thu, 20 Mar 2008 11:54:53 -0700 (PDT)
Subject: [Bioperl-l] Bio::DB::GenBank module
Message-ID: <501654.68882.qm@web51412.mail.re2.yahoo.com>

Dear all,

It seems that some important methods in the Bio::DB::GenBank module, such as get_Seq_by_acc(), have been discontinued. The corresponding entries have no content beneath their names. How come?
 
Best,

Hongyu Zhang, Ph.D.
 Ceres Inc., Thousand Oaks, CA
Cell: 805-405-5394
Fax: 866-447-8750


From joseph.fass at gmail.com  Thu Mar 20 22:10:33 2008
From: joseph.fass at gmail.com (Joseph Fass)
Date: Thu, 20 Mar 2008 15:10:33 -0700
Subject: [Bioperl-l] bug in Bio::SeqIO::fastq or Bio::Seq::SeqWithQuality?
Message-ID: <195e25cb0803201510i4411fd1ctd1fdff40aecd12c7@mail.gmail.com>

I've written code to trim a certain number of bases (and, possibly,
associated qualities) from fasta (or fastq) format sequences, using:

$seq->seq($seq->subseq($a+1,$len-$b));
and, if it's fastq:
$seq->qual($seq->subqual($a+1,$len-$b));
where:
$len = $seq->length; # defined before changing $seq->seq
$a is the number of bases to trim off the beginning of the sequence
$b is the number of bases to trim off the end of the sequence

The code works for sequences, but for qualities I get a trimmed series of
quality characters that is the correct length and is at the correct
position, but has a number of characters (equal to $a) at the *end* of the
series changed to '!' ... i.e.:

@fake header 1
tcggacaatatatat
+
fjasfiojeq%!@%@

becomes:

@fake header 1 trimmed by 4 at beginning and 3 at end
acaatata
+fake header 1 trimmed by 4 at beginning and 3 at end
fioj!!!!

Since the relevant section of code is short, I'll post it:

my $in  = Bio::SeqIO->new(-file => "<$opt_i", -format => $format);
my $out = Bio::SeqIO->new(-file => ">$opt_o", -format => $format);
my $seq_length;
while (my $seq = $in->next_seq()) {
  $seq->desc($seq->desc()." trimmed by $opt_b at beginning and $opt_e at end");
  $seq_length = $seq->length;
  $seq->seq($seq->subseq($opt_b+1,$seq_length-$opt_e));
  if ($format eq 'fastq') {
    # if fastq, trim qualities then write out in fastq format
    $seq->qual($seq->subqual($opt_b+1,$seq_length-$opt_e));
    $out->write_fastq($seq);
  }
  else {
    # just write out sequence in fasta format
    $out->write_seq($seq);
  }
}

Why should the same process work for ->seq and ->subseq but not ->qual and
->subqual?  Please enlighten me ...
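For reference, the result the trim should produce can be reproduced on plain strings, without BioPerl. This is only a minimal sketch: trim_ends is a hypothetical helper (not a BioPerl method), and it treats the FASTQ quality line as an ordinary string rather than as a list of quality values.

```perl
use strict;
use warnings;

# Hypothetical helper: drop $lead characters from the start and $tail
# characters from the end of a string. Returns '' if nothing is left.
sub trim_ends {
    my ($string, $lead, $tail) = @_;
    my $keep = length($string) - $lead - $tail;
    return '' if $keep <= 0;
    return substr($string, $lead, $keep);
}

# The example record from the message above.
my $bases = 'tcggacaatatatat';
my $quals = 'fjasfiojeq%!@%@';

# Trim 4 at the beginning and 3 at the end, as in the example.
my $trimmed_bases = trim_ends($bases, 4, 3);
my $trimmed_quals = trim_ends($quals, 4, 3);

print "$trimmed_bases\n";   # acaatata
print "$trimmed_quals\n";   # fiojeq%!
```

Both strings keep positions 5 through 12 of the original record, so the quality line should end in "eq%!" rather than "!!!!".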


-- 
Joseph Fass
jnfass -at- gmail.com (personal) || joseph.fass -at- gmail.com(professional)
970.227.5928 (c) || 530.752.2698 (w)


From hlapp at gmx.net  Thu Mar 20 22:49:41 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 20 Mar 2008 18:49:41 -0400
Subject: [Bioperl-l] [BioSQL-l] postgres 8.3 will not cast text to
	integer any longer
In-Reply-To: <5095.156.83.1.251.1206041052.squirrel@webmail.xs4all.nl>
References: <5095.156.83.1.251.1206041052.squirrel@webmail.xs4all.nl>
Message-ID: <0F80B40B-0232-4367-8433-992588B6E71B@gmx.net>

Hi Erik, thanks for the report. Given the error message, it looks  
more like the integer (which in reality is a string) can't be  
automatically converted to a string.

That would be equally interesting, though. DBI I thought used to bind  
all parameters as string by default, but maybe that has changed?

The parameter values are indeed all bound generically (and the query  
is created dynamically too), and I'm leaving it up to the DBD drivers  
to do the "Right Thing". I could obviously force everything into type  
string, but that is likely to have its own repercussions on various  
RDBMSs.

So could you file this as a bug report on bugzilla.open-bio.org  
(category bioperl-db, this is actually not a BioSQL problem), and run  
the following test on your 8.3 instance (which minor version actually?):

CREATE TABLE t1 (a varchar(10), b text, c integer);

SELECT * from t1 WHERE a = 1;
SELECT * from t1 WHERE b = 1;
SELECT * from t1 WHERE c = '1';

INSERT INTO t1 (a,b,c) VALUES ('a','b',1);

SELECT * from t1 WHERE a = 1;
SELECT * from t1 WHERE b = 1;
SELECT * from t1 WHERE c = '1';

SELECT * from t1 WHERE a = 1::text;
SELECT * from t1 WHERE b = 1::text;
SELECT * from t1 WHERE c = integer '1';

DROP TABLE t1;

These all work fine on my 8.1.4 instance.

	-hilmar

On Mar 20, 2008, at 3:24 PM, Erik wrote:
> Hi,
>
> (latest BioSQL, bioperl-db, and bioperl-live installed.)
>
> Postgres 8.3 will not auto-cast text (='character
> varying') to integer any longer, which causes test
> t/16odba.t to fail:
>
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: error while executing query in
> Bio::DB::BioSQL::SeqAdaptor::find_by_query: ERROR:
> operator does not exist: character varying = integer
> LINE 1: ...eq.taxon_id FROM bioentry seq WHERE
> seq.identifier = 5456929
>
> It seems likely to cause many similar statements to fail;
> how should this be solved?
>
> I tried to fix it but I couldn't find the place where the
> statement/clauses are put together.
>
>
> Thanks,
>
> Erik Rijkers
>
>
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biosql-l

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From er at xs4all.nl  Thu Mar 20 23:30:03 2008
From: er at xs4all.nl (Erik)
Date: Fri, 21 Mar 2008 00:30:03 +0100 (CET)
Subject: [Bioperl-l] [BioSQL-l] postgres 8.3 will not cast text to
 integer any longer
Message-ID: <15786.156.83.1.157.1206055803.squirrel@webmail.xs4all.nl>

On Thu, March 20, 2008 23:49, Hilmar Lapp wrote:
> Hi Erik, thanks for the report. Given the error message,
> it looks
> more like the integer (which in reality is a string) can't
> be automatically converted to a string.

you are right, of course :)


Here is the postgres 8.3.1 result of your sql statements:

CREATE TABLE t1 (a varchar(10), b text, c integer);

SELECT * from t1 WHERE a = 1;   -- fails in 8.3.1
SELECT * from t1 WHERE b = 1;   -- fails in 8.3.1
SELECT * from t1 WHERE c = '1'; -- ok

INSERT INTO t1 (a,b,c) VALUES ('a','b',1);

SELECT * from t1 WHERE a = 1;   -- fails in 8.3.1
SELECT * from t1 WHERE b = 1;   -- fails in 8.3.1
SELECT * from t1 WHERE c = '1'; -- ok

SELECT * from t1 WHERE a = 1::text;     -- ok
SELECT * from t1 WHERE b = 1::text;     -- ok
SELECT * from t1 WHERE c = integer '1'; -- ok

The failure is always (virtually) the same:
ERROR:  operator does not exist: character varying = integer
LINE 1: SELECT * from t1 WHERE a = 1;
                                 ^
HINT:  No operator matches the given name and argument
type(s). You might need to add explicit type casts.


Then there is the cast function: for instance, I can let
the test in t/16odba.t proceed faultlessly with

 $seq = $biodb->get_Seq_by_id( "cast(5456929 as text)" );


I am also doubtful/curious as to how this would affect the
various loading scripts which I was going to use - I want
to set up a GBrowse with human/mouse/flybase sequence
annotation to show ChipSeq data against.

But one thing at a time, I guess...


> So could you file this as a bug report on
> bugzilla.open-bio.org
> (category bioperl-db, this is actually not a BioSQL
> problem),

I'll make an entry in bugzilla/bioperl-db.


Thanks for your quick reply!


Erik Rijkers


From David.Messina at sbc.su.se  Thu Mar 20 23:39:49 2008
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 21 Mar 2008 00:39:49 +0100
Subject: [Bioperl-l] Bio::DB::GenBank module
In-Reply-To: <501654.68882.qm@web51412.mail.re2.yahoo.com>
References: <501654.68882.qm@web51412.mail.re2.yahoo.com>
Message-ID: <628aabb70803201639y33df19a6ib83967c33dd90b7f@mail.gmail.com>

Hi Hongyu,

Those methods are inherited. get_Seq_by_acc(), for example, comes from
Bio::DB::WebDBSeqI.

The BioPerl Deobfuscator is one way to see where the methods a given class
has are actually coded. Here's the Deobfuscator view of
Bio::DB::GenBank.<http://bioperl.org/cgi-bin/deob_interface.cgi?Search=Search&module=Bio%3A%3ADB%3A%3AGenBank&sort_order=by+method&search_string=Bio%3A%3ADB%3A%3AGenBank>
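The same question ("which class actually codes this method?") can also be answered from Perl itself. The following is a minimal sketch under stated assumptions: defining_class is a hypothetical helper, and the Toy:: packages are stand-ins for the real class hierarchy, so that the example runs without BioPerl installed. It relies on the core mro module, which requires Perl 5.10 or later.

```perl
use strict;
use warnings;
use mro;    # core module (Perl 5.10+); provides mro::get_linear_isa

# Hypothetical helper: return the first class in $class's inheritance
# chain that defines $method itself, or nothing if no class does.
sub defining_class {
    my ($class, $method) = @_;
    for my $candidate (@{ mro::get_linear_isa($class) }) {
        no strict 'refs';
        return $candidate if defined &{"${candidate}::${method}"};
    }
    return;
}

# Toy stand-ins (assumptions, not the real BioPerl classes): the parent
# codes the method, the child only inherits it.
package Toy::WebDB;
sub get_Seq_by_acc { my ($self, $acc) = @_; return "sequence for $acc" }

package Toy::GenBank;
our @ISA = ('Toy::WebDB');

package main;

print defining_class('Toy::GenBank', 'get_Seq_by_acc'), "\n";   # Toy::WebDB
```

With BioPerl loaded, the same helper could be pointed at 'Bio::DB::GenBank' instead of the toy class.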


Dave


From hlapp at gmx.net  Fri Mar 21 00:34:42 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 20 Mar 2008 20:34:42 -0400
Subject: [Bioperl-l] postgres 8.3 will not cast text to integer any
	longer
In-Reply-To: <15786.156.83.1.157.1206055803.squirrel@webmail.xs4all.nl>
References: <15786.156.83.1.157.1206055803.squirrel@webmail.xs4all.nl>
Message-ID: <987C9C0E-840B-44AD-B3E9-0FC2809FF4F4@gmx.net>


On Mar 20, 2008, at 7:30 PM, Erik wrote:
> Here is the postgres 8.3.1 result of your sql statements:
>
> CREATE TABLE t1 (a varchar(10), b text, c integer);
>
> SELECT * from t1 WHERE a = 1;   -- fails in 8.3.1
> SELECT * from t1 WHERE b = 1;	  -- fails in 8.3.1
> SELECT * from t1 WHERE c = '1'; -- ok
>
> [...]
> The failure is always (virtually) the same:
> ERROR:  operator does not exist: character varying = integer
> LINE 1: SELECT * from t1 WHERE a = 1;
>                                  ^
> HINT:  No operator matches the given name and argument
> type(s). You might need to add explicit type casts.


So it's indeed the backend that changed behavior. It's actually  
documented as I see now:

http://www.postgresql.org/docs/8.3/static/release-8-3.html

scroll to section E.2.2. Migration to Version 8.3, E.2.2.1. General,  
and the first item there:

<quote>
Non-character data types are no longer automatically cast to TEXT  
(Peter, Tom)

Previously, if a non-character value was supplied to an operator or
function that requires text input, it was automatically cast to text,
for most (though not all) built-in data types. This no longer
happens: an explicit cast to text is now required for all
non-character-string types.
</quote>

I can see the arguments there but this will prevent upgrading to 8.3
for many, many applications, and the comments from the Pg developers
('fix your SQL to use casts') that I've seen on the mailing
lists are just not helpful. Fixing the SQL is just not an option for
many legacy applications.

In the case of Bioperl-db it's very non-trivial, because all of a
sudden we would be changing from a hands-off and
let-the-driver-figure-it-out approach to forcing types everywhere.

So I think at this point with this change I have to declare
Bioperl-db officially incompatible with PostgreSQL 8.3+ until we've
found a solution to this, which is too bad because it seems 8.3 has
some really nice performance features added.

One possible solution might be to create a CAST in the database  
(namely the one that was taken away, restoring behavior to pre-8.3).  
Another possibility is to move the parameter binding method into the  
driver adaptor which would then delegate to the DBI method but would  
be overridden for the PostgreSQL adapter to force all bindings to  
type string.
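The second possibility (delegating by default, overriding for PostgreSQL) might look roughly like the sketch below. Everything in it is an assumption made for illustration: the Sketch:: package names and the bind_value method are hypothetical and are not Bioperl-db's actual classes, and Sketch::FakeSth merely records calls in place of a real DBI statement handle so that the example runs without a database. The only real interface mirrored here is DBI's bind_param($index, $value, $type) call; 12 is the numeric code DBI exports as SQL_VARCHAR. Whether forcing string bindings is actually sufficient is exactly the open question in this thread.

```perl
use strict;
use warnings;

# Hypothetical base driver: binds generically and leaves the type to
# the DBD driver (the hands-off approach).
package Sketch::Driver::Base;
sub new { my ($class) = @_; return bless {}, $class; }
sub bind_value {
    my ($self, $sth, $index, $value) = @_;
    return $sth->bind_param($index, $value);
}

# Hypothetical PostgreSQL driver: same interface, but every binding is
# stringified and passed with an explicit string type.
package Sketch::Driver::Pg;
our @ISA = ('Sketch::Driver::Base');
use constant SQL_VARCHAR => 12;   # same value as DBI's SQL_VARCHAR
sub bind_value {
    my ($self, $sth, $index, $value) = @_;
    return $sth->bind_param($index, "$value", SQL_VARCHAR);
}

# Stand-in for a DBI statement handle; it only records bind_param calls.
package Sketch::FakeSth;
sub new { my ($class) = @_; return bless { calls => [] }, $class; }
sub bind_param {
    my ($self, @args) = @_;
    push @{ $self->{calls} }, [@args];
    return 1;
}

package main;

my $sth = Sketch::FakeSth->new;
Sketch::Driver::Base->new->bind_value($sth, 1, 5456929);
Sketch::Driver::Pg->new->bind_value($sth, 1, 5456929);

for my $call (@{ $sth->{calls} }) {
    my ($index, $value, $type) = @$call;
    printf "placeholder %d: value %s, type %s\n",
        $index, $value, defined $type ? $type : 'driver default';
}
```

The calling code stays the same for every RDBMS; only the driver object decides whether a type is passed along.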

Which leads me back to the surprise observation that the parameter  
was bound as an integer in the first place, when DBD::Pg used to bind  
everything as string unless you told it otherwise. Which DBD::Pg  
version is it that you are using? I would suspect (or hope) that  
maybe there is soon an update release of DBD::Pg that fixes this  
problem by going back to binding everything as string by default (and  
as the tests show PostgreSQL will still convert strings to integer if  
necessary).

Depending on what I (or can someone else update us on this?) find out  
for the DBD::Pg plans, I'll probably start looking into moving the  
parameter binding into the driver adapters. Though it does feel  
pathetic that this is now also not transparent between drivers.

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From er at xs4all.nl  Fri Mar 21 00:51:43 2008
From: er at xs4all.nl (Erik)
Date: Fri, 21 Mar 2008 01:51:43 +0100 (CET)
Subject: [Bioperl-l] postgres 8.3 will not cast text to integer any
	longer
Message-ID: <4483.156.83.1.157.1206060703.squirrel@webmail.xs4all.nl>

On Fri, March 21, 2008 01:34, Hilmar Lapp wrote:
>
> So I think at this point with this change I have to
> declare Bioperl-
> db officially incompatible with PostgreSQL 8.3+ until
> we've found a
> solution to this, which is too bad because it seems 8.3
> has some
> really nice performance features added.

Pg 8.3 is indeed very noticeably faster, and it has other
excellent new features like full text indexing. (This also
means that downgrading is not really an option.)


> Which DBD::Pg version is it that you are using?

DBD::Pg 2.3.0


Thanks,

Erik Rijkers


From hlapp at gmx.net  Fri Mar 21 01:36:50 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 20 Mar 2008 21:36:50 -0400
Subject: [Bioperl-l] postgres 8.3 will not cast text to integer any
	longer
In-Reply-To: <4483.156.83.1.157.1206060703.squirrel@webmail.xs4all.nl>
References: <4483.156.83.1.157.1206060703.squirrel@webmail.xs4all.nl>
Message-ID: <071CB899-AB3E-40B8-9477-82AE98DB88B1@gmx.net>


On Mar 20, 2008, at 8:51 PM, Erik wrote:
> On Fri, March 21, 2008 01:34, Hilmar Lapp wrote:
>>
>> So I think at this point with this change I have to declare  
>> Bioperl-db officially incompatible with PostgreSQL 8.3+ until  
>> we've found a solution to this, which is too bad because it seems  
>> 8.3 has some really nice performance features added.
>
> Pg 8.3 is indeed very noticably faster, and it has other
> excellent new features like full text indexing. (This also
> makes that downgrading is not really an option)

Right, I saw that too. It is, however, just migrated from what was a  
contrib module before, so downgrading and using the contrib module is  
an option.

Furthermore, folding these new features together with a behavior  
change that is backwards incompatible was a choice the PostgreSQL  
people made, not we.

We also aren't doing poor typing that deserves fixing; we're just not  
doing any typing by treating everything as a string. This is the Perl  
paradigm.

At this point it's actually unclear to me how this new behavior is  
compatible with untyped scripting languages unless you know the type  
of each column that you're binding a value for, because if you  
actually force typecasts to string for everything you get an error if  
an integer is indeed what's needed.

I'm wondering what I'm missing.

	-hilmar

BTW what does the following query yield on your 8.3.1 database:

select s.typname as source, t.typname as target, f.proname as function,
       c.castcontext
from pg_cast c, pg_type s, pg_type t, pg_proc f
where c.castsource = s.oid and c.casttarget = t.oid and c.castfunc = f.oid
and t.typname = 'text';

On my 8.1.4 database I get:

   source    | target | function | castcontext
-------------+--------+----------+-------------
  bpchar      | text   | text     | i
  char        | text   | text     | i
  name        | text   | text     | i
  int8        | text   | text     | i
  int2        | text   | text     | i
  int4        | text   | text     | i
  oid         | text   | text     | i
  float4      | text   | text     | i
  float8      | text   | text     | i
  macaddr     | text   | text     | e
  cidr        | text   | text     | e
  inet        | text   | text     | e
  date        | text   | text     | i
  time        | text   | text     | i
  timestamp   | text   | text     | i
  timestamptz | text   | text     | i
  interval    | text   | text     | i
  timetz      | text   | text     | i
  numeric     | text   | text     | i
(19 rows)

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From greg at turnstep.com  Fri Mar 21 02:41:10 2008
From: greg at turnstep.com (Greg Sabino Mullane)
Date: Fri, 21 Mar 2008 02:41:10 -0000
Subject: [Bioperl-l] [BioSQL-l] postgres 8.3 will not cast text to
	integer any longer
In-Reply-To: <987C9C0E-840B-44AD-B3E9-0FC2809FF4F4@gmx.net>
Message-ID: <19ecb7a297f64722c4f63f10ed2ebdce@biglumber.com>


-----BEGIN PGP SIGNED MESSAGE-----
Hash: RIPEMD160


> Which leads me back to the surprise observation that the parameter
> was bound as an integer in the first place, when DBD::Pg used to bind
> everything as string unless you told it otherwise. Which DBD::Pg
> version is it that you are using? I would suspect (or hope) that
> maybe there is soon an update release of DBD::Pg that fixes this
> problem by going back to binding everything as string by default (and
> as the tests show PostgreSQL will still convert strings to integer if
> necessary).
>
> Depending on what I (or can someone else update us on this?) find out
> for the DBD::Pg plans, I'll probably start looking into moving the
> parameter binding into the driver adapters. Though it does feel
> pathetic that this is now also not transparent between drivers.

What you are probably looking for is already there, namely:

$dbh->{pg_server_prepare} = 0;

There are good reasons for the casting enforcement in 8.3, although I've
been a sharp critic of the change, and certainly of the suddenness
of it. Another solution to consider is adding the casts back in:

http://people.planetpostgresql.org/peter/index.php?/archives/2008/03.html
(the March 4th entry)

- --
Greg Sabino Mullane greg at turnstep.com
PGP Key: 0x14964AC8 200803202237
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
-----BEGIN PGP SIGNATURE-----

iEYEAREDAAYFAkfjIBYACgkQvJuQZxSWSsiamwCdEbNrC4F4oU7AGHrbHAm1YNXG
HbUAoIRJtGW4brvMKklxZYG6pusbcTqf
=Zawx
-----END PGP SIGNATURE-----


From David.Messina at sbc.su.se  Fri Mar 21 08:36:16 2008
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 21 Mar 2008 09:36:16 +0100
Subject: [Bioperl-l] bug in Bio::SeqIO::fastq or
	Bio::Seq::SeqWithQuality?
In-Reply-To: <195e25cb0803201510i4411fd1ctd1fdff40aecd12c7@mail.gmail.com>
References: <195e25cb0803201510i4411fd1ctd1fdff40aecd12c7@mail.gmail.com>
Message-ID: <628aabb70803210136p11de495p26d0ffaebbc3370e@mail.gmail.com>

Hi Joseph,

This looks like a bug; I saw the same thing here. Could you please submit
this to the bug tracker <http://www.bioperl.org/wiki/Bugs> along with your
test code?


Thanks,
Dave


From hlapp at gmx.net  Fri Mar 21 12:52:39 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Fri, 21 Mar 2008 08:52:39 -0400
Subject: [Bioperl-l] [BioSQL-l] postgres 8.3 will not cast text to
	integer any longer
In-Reply-To: <19ecb7a297f64722c4f63f10ed2ebdce@biglumber.com>
References: <19ecb7a297f64722c4f63f10ed2ebdce@biglumber.com>
Message-ID: <C24DE5CA-F433-48A1-BF08-A6D056A2EBCE@gmx.net>

Hi Greg - thanks for your email, it's very helpful.

On Mar 20, 2008, at 10:41 PM, Greg Sabino Mullane wrote:
>>
>> Depending on what I (or can someone else update us on this?) find out
>> for the DBD::Pg plans, I'll probably start looking into moving the
>> parameter binding into the driver adapters. Though it does feel
>> pathetic that this is now also not transparent between drivers.
>
> What you are probably looking for is already there, namely:
>
> $dbh->{pg_server_prepare} = 0;

So disabling server-side prepares will leave values quoted? Having  
server-side prepares would be very useful though, especially for  
Bioperl-db with its many lookup queries that all use similar  
parameter values.

>
> There's good reasons for the casting enforcement in 8.3

I do understand that, but it's also a sharp contrast to other RDBMSs
that doesn't make it easier for people to choose Pg when they
should, and doesn't help with writing cross-platform database
applications either.

> although I've been a sharp critic of the change, and certainly of  
> the suddeness
> of it. Another solution to consider is adding the casts back in:
>
> http://people.planetpostgresql.org/peter/index.php?/archives/2008/03.html
> (the March 4th entry)


Thanks for this, that helps a lot.

Do you have links to some of the key threads showing what rationale  
went into the decision? (Or should I just search for your name?) I'd  
like to read up on that first before pouring more oil into the fire.  
I suspect that many of those who made the decision are never faced  
with needing to write cross-RDBMS code.

Also, I wonder why this wasn't made a configurable option so it can  
be disabled by a simple config file change (such as the move away  
from automatic OID columns). But obviously this is the wrong list for  
discussing this (though Bioperl-db *is* one of those pieces of  
software that must be cross-RDBMS).

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From baucom at msg.ucsf.edu  Fri Mar 21 20:13:00 2008
From: baucom at msg.ucsf.edu (Albion Baucom)
Date: Fri, 21 Mar 2008 13:13:00 -0700
Subject: [Bioperl-l] SearchIO Performance
Message-ID: <AE9E136C-6938-41DF-A973-3F3A5F4CBA77@msg.ucsf.edu>

Hi. I am pretty new to BioPerl, and have a question about performance  
with regard to Blast (nucleotide) file parsing. My Blast result files  
usually have close to 100 or more sequence hits. Each sequence is  
about 1400 nucleotides long.

After profiling code I wrote, I find that calling the next_result()  
function after creating a search object takes substantially longer  
than non-OO, quick and dirty code I am using to parse the same Blast  
files.

What is substantially longer? Well, the existing code takes about 0.25  
seconds, and the BioPerl call takes about 4.5 seconds. I find that to  
be a dramatic difference, and that kind of time difference becomes  
significant when I have to parse 30 Blast files in a row. I understand  
that SearchIO is parsing the entire file and storing it all for easy  
retrieval later, and maybe this time penalty is what I have to pay for  
that convenience and organization.

I am just wondering if there is anything other than writing custom  
code based on BioPerl to speed this up. Something I might not be aware  
of that I can do ahead of time, or during parsing, to limit what is  
parsed, or facilitate the parsing process. For instance, is there a  
way to "look ahead" and simply parse alignments that meet a specific  
expectancy cutoff?

I confess I have not read the documentation thoroughly (although  
obviously enough to make it do what I want), but am certainly willing  
to do so if someone can point me in the right direction.

Thanks

Albion


From jason at bioperl.org  Fri Mar 21 21:40:00 2008
From: jason at bioperl.org (Jason Stajich)
Date: Fri, 21 Mar 2008 14:40:00 -0700
Subject: [Bioperl-l] SearchIO Performance
In-Reply-To: <AE9E136C-6938-41DF-A973-3F3A5F4CBA77@msg.ucsf.edu>
References: <AE9E136C-6938-41DF-A973-3F3A5F4CBA77@msg.ucsf.edu>
Message-ID: <8448D3AD-82BF-4471-9346-C27DDE95DB4D@bioperl.org>


On Mar 21, 2008, at 1:13 PM, Albion Baucom wrote:

> Hi. I am pretty new to BioPerl, and have a question about  
> performance with regard to Blast (nucleotide) file parsing. My  
> Blast result files usually have close to 100 or more sequence hits.  
> Each sequence is about 1400 nucleotides long.
>
> After profiling code I wrote, I find that calling the next_result()  
> function after creating a search object takes substantially longer  
> than non-OO, quick and dirty code I am using to parse the same  
> Blast files.
>
> What is substantially longer? Well, the existing code takes about  
> 0.25 seconds, and the BioPerl call takes about 4.5 seconds. I find  
> that to be a dramatic difference, and that kind of time difference  
> becomes significant when I have to parse 30 Blast files in a row. I  
> understand that SearchIO is parsing the entire file and storing it  
> all for easy retrieval later, and maybe this time penalty is what I  
> have to pay for that convenience and organization.
>
> I am just wondering if there is anything other than writing custom  
> code based on BioPerl to speed this up. Something I might not be  
> aware of that I can do ahead of time, or during parsing, to limit  
> what is parsed, or facilitate the parsing process. For instance, is  
> there a way to "look ahead" and simply parse alignments that meet a  
> specific expectancy cutoff?
>
> I confess I have not read the documentation thoroughly (although  
> obviously enough to make it do what I want), but am certainly  
> willing to do so if someone can point me in the right direction.
>
We are quite aware of the speed issues.  This is discussed on the  
wiki in brief detail.
http://bioperl.org/wiki/Why_BioPerl_is_slow

It boils down to the object creation, not the parsing (relatively
speaking).  It takes a while because we're creating a lot of objects
under the hood for each alignment.  Sendu has written a pull parser
that doesn't require creation of all the objects until the user
requests them.
As I've said in the past, if someone wrote a SearchIO event listener
that created lightweight objects (or just hashes) instead, this would
also provide a substantial speedup.

In the fall I did some experimentation with array-based instead of
hash-based feature objects and got a pretty decent speedup as well, but
I just haven't had any time to roll out a more substantial
prototype.  For the inner loops of things it may make sense to
substitute a less flexible but super-fast object.

I always advocate thinking about what your needs are - if you just
want start/stop of alignments, you can grab this out of a BLAST
tabular report with the -m9 (NCBI) or --mformat =3 (WUBLAST) option
and write a fast parser that uses 'split'.
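A split-based parser of that kind could be sketched as follows. It assumes the standard twelve-column layout of NCBI tabular output (query, subject, percent identity, alignment length, mismatches, gap openings, query start/end, subject start/end, e-value, bit score), with comment lines starting with '#' as produced by -m9. The parse_tabular name, the hash keys, and the example lines are made up for illustration, and the e-value cutoff is one way to keep only alignments below a chosen expectation value.

```perl
use strict;
use warnings;

# Assumed column order of NCBI BLAST tabular output (-m8 / -m9).
my @COLUMNS = qw(query subject pct_identity aln_length mismatches gap_opens
                 q_start q_end s_start s_end evalue bit_score);

# Hypothetical helper: turn tabular lines into plain hashes, keeping only
# hits whose e-value is at or below $max_evalue.
sub parse_tabular {
    my ($lines, $max_evalue) = @_;
    my @hits;
    for my $line (@$lines) {
        chomp(my $copy = $line);
        next if $copy =~ /^#/ or $copy !~ /\S/;   # skip comments and blanks
        my @fields = split /\t/, $copy;
        next unless @fields == @COLUMNS;          # skip malformed lines
        my %hit;
        @hit{@COLUMNS} = @fields;
        # Defensive: some BLAST reports abbreviate 1e-100 as "e-100".
        $hit{evalue} = "1$hit{evalue}" if $hit{evalue} =~ /^e/i;
        next if $hit{evalue} > $max_evalue;
        push @hits, \%hit;
    }
    return \@hits;
}

# Made-up example lines; real input would be read from the report file.
my @example = (
    "# BLASTN example comment line\n",
    "q1\ts1\t98.50\t1400\t21\t0\t1\t1400\t5\t1404\t0.0\t2500\n",
    "q1\ts2\t80.00\t50\t10\t0\t10\t59\t100\t149\t0.5\t30.2\n",
);

my $hits = parse_tabular(\@example, 1e-5);
printf "%s %s %d-%d\n", @{$_}{qw(query subject q_start q_end)} for @$hits;
```

Because each hit is just a hash, no BioPerl objects are created, which is where most of the time goes according to the explanation above.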


> Thanks
>
> Albion


From er at xs4all.nl  Fri Mar 21 21:43:47 2008
From: er at xs4all.nl (Erik)
Date: Fri, 21 Mar 2008 22:43:47 +0100 (CET)
Subject: [Bioperl-l] [BioSQL-l] postgres 8.3 - load_seqdatabase.pl /
 swissprot
Message-ID: <16589.156.83.1.157.1206135827.squirrel@webmail.xs4all.nl>

Hi,

PostgreSQL 8.3.1
DBD::Pg 2.3.0
perl 5.8.8

(The following error may have to do with the 8.3 problems
that I reported yesterday (bug 2472) - I don't know)

 I ran biosql-schema/scripts/load_ncbi_taxonomy.pl without
problem.

Then I ran scripts/biosql/load_seqdatabase.pl as:

perl scripts/biosql/load_seqdatabase.pl \
  -driver Pg \
  -dbuser xxxxxxx \
  -dbname bioseqdb \
  -namespace swissprot \
  -format swiss \
   /DATA/ms/ftp.ebi.ac.uk/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.dat

It took two hours to load 26504 records (7%) of
uniprot_sprot.dat (is it expected to be so slow?), then
failed with:

Could not store Q2UXW0:
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: create: object (Bio::Species) failed to insert or to
be found by unique key
STACK: Error::throw
STACK: Bio::Root::Root::throw
/home/aardvark/bin/perl/lib/site_perl/5.8.8/Bio/Root/Root.pm:357
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create
/home/aardvark/bin/perl/lib/site_perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:206
STACK: Bio::DB::Persistent::PersistentObject::create
/home/aardvark/bin/perl/lib/site_perl/5.8.8/Bio/DB/Persistent/PersistentObject.pm:244
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create
/home/aardvark/bin/perl/lib/site_perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:169
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store
/home/aardvark/bin/perl/lib/site_perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251
STACK: Bio::DB::Persistent::PersistentObject::store
/home/aardvark/bin/perl/lib/site_perl/5.8.8/Bio/DB/Persistent/PersistentObject.pm:271
STACK: scripts/biosql/load_seqdatabase.pl:630
-----------------------------------------------------------


I don't know if this is directly related to the 8.3
casting problems I reported yesterday (bug 2472), or a
separate Bio::Species issue


regards,

Erik Rijkers


From bix at sendu.me.uk  Fri Mar 21 23:17:59 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 21 Mar 2008 23:17:59 +0000
Subject: [Bioperl-l] SearchIO Performance
In-Reply-To: <8448D3AD-82BF-4471-9346-C27DDE95DB4D@bioperl.org>
References: <AE9E136C-6938-41DF-A973-3F3A5F4CBA77@msg.ucsf.edu>
	<8448D3AD-82BF-4471-9346-C27DDE95DB4D@bioperl.org>
Message-ID: <47E44227.3050002@sendu.me.uk>

Jason Stajich wrote:
> 
> On Mar 21, 2008, at 1:13 PM, Albion Baucom wrote:
> 
>> Hi. I am pretty new to BioPerl, and have a question about performance 
>> with regard to Blast (nucleotide) file parsing.
[...]
>> What is substantially longer? Well, the existing code takes about 0.25 
>> seconds, and the BioPerl call takes about 4.5 seconds. I find that to 
>> be a dramatic difference, and that kind of time difference becomes 
>> significant when I have to parse 30 Blast files in a row. I understand 
>> that SearchIO is parsing the entire file and storing it all for easy 
>> retrieval later, and maybe this time penalty is what I have to pay for 
>> that convenience and organization.
[...]
> Sendu has written a pull parser that 
> doesn't require creation of all the objects until the user requests them.
> As I've said in the past, if someone wrote SearchIO event-listener that 
> created lightweight objects (or just hashes) instead this would also 
> provide a substantial speedup.

Yeah, you'll need BioPerl 1.5.2 (or the latest from svn) and to set the
format to 'blast_pull'. Depending on the circumstances and with thoughtful
usage, you can see orders of magnitude speed up.

http://doc.bioperl.org/bioperl-live/Bio/SearchIO/blast_pull.html

The only disadvantage compared to the normal parser is that the pull
parser currently only supports NCBI BLASTN and BLASTP.


From hlapp at gmx.net  Sat Mar 22 18:18:45 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 22 Mar 2008 14:18:45 -0400
Subject: [Bioperl-l] Call for Student Applications - NESCent participates in
	the Google Summer of Code
In-Reply-To: <0025B440-EF1E-4632-9DB4-B98489BF3550@duke.edu>
Message-ID: <5AC4F213-8D88-41C6-B380-59B2EF7831F0@gmx.net>

Hi all - just wanted to draw your attention to our Google Summer of  
Code participation this year. One of the projects deals directly with  
BioPerl, another one builds on BioSQL (and could be implemented  
taking advantage of BioPerl or Bio::Phylo, or Biojava).

Cheers,

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================

Phyloinformatics Summer of Code 2008
http://phyloinformatics.net/Phyloinformatics_Summer_of_Code_2008

*** Please disseminate this announcement widely to appropriate students
at your institution ***

The National Evolutionary Synthesis Center (NESCent:
http://www.nescent.org/) is participating in 2008 for the second year
as a mentoring organization in the Google Summer of Code
(http://code.google.com/soc). Through this program, Google provides
undergraduate, masters, and PhD students with a unique opportunity to
obtain hands-on experience writing and extending open-source software
under the mentorship of experienced developers from around the world.

Our goal in participating is to train future researchers and  
developers to not only have awareness and understanding of the value  
of open-source and collaboratively developed software, but also to  
gain the programming and remote collaboration skills needed to  
successfully contribute to such projects. Students will receive a  
stipend from Google, and may work from their home, or home  
institution, for the duration of the 3 month program. Students will  
each have one or more dedicated mentors with expertise in  
phylogenetic methods and open-source software development.

NESCent is particularly targeting students interested in both  
evolutionary biology and software development. Project ideas (see URL  
below) range from visualizing phylogenetic data in R, to development  
of a Mesquite module, web-services for phylogenetic data providers or  
geophylogeny mashups, implementing phyloXML support, navigating  
databases of networks, topology queries for PhyloCode registries, to  
phylogenetic tree mining in a MapReduce framework, and more.

The project ideas are flexible and many can be adjusted in scope to  
match the skills of the student. If the program sounds interesting to  
you but you are unsure whether you have the necessary skills, please  
email the mentors at the address below.  We will work with you to  
find a project that fits your interests and skills.

INQUIRIES:
Email any questions, including self-proposed project ideas, to  
phylosoc {at}
nescent {dot} org.

TO APPLY:
Apply on-line at the Google Summer of Code website
(http://code.google.com/soc/2008), where you will also find GSoC program
rules and eligibility requirements.  The 1-week application period for
students opens on Monday March 24th and runs through Monday, March  
31st, 2008.

Hilmar Lapp and Todd Vision
US National Evolutionary Synthesis Center

=====
URLs:
=====

2008 NESCent Phyloinformatics Summer of Code:
http://phyloinformatics.net/Phyloinformatics_Summer_of_Code_2008

Eligibility requirements:
http://code.google.com/opensource/gsoc/2008/faqs.html#0.1_eligibility

Stipends:
http://code.google.com/opensource/gsoc/2008/faqs.html#0.1_administrivia

To sign up for quarterly NESCent newsletters with announcements about
upcoming programs at the Center:
http://www.nescent.org/about/contact.php


From hlapp at gmx.net  Sat Mar 22 19:30:07 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 22 Mar 2008 15:30:07 -0400
Subject: [Bioperl-l] postgres 8.3 - load_seqdatabase.pl / swissprot
In-Reply-To: <16589.156.83.1.157.1206135827.squirrel@webmail.xs4all.nl>
References: <16589.156.83.1.157.1206135827.squirrel@webmail.xs4all.nl>
Message-ID: <14191FB6-A8DF-4F60-9FF7-FDAA8F3974B8@gmx.net>

Hi Erik,

I suspect that's a separate Bio::Species issue. If you query your  
BioSQL database for the existence of the taxon:

SELECT * FROM taxon WHERE ncbi_taxon_id = 326939;

do you get a result? If not, then for some reason the taxon wasn't  
yet included in the NCBI taxonomy that you loaded.

If yes, then somehow BioPerl didn't properly parse out the taxonID
from the record. There should have been another message preceding the
error below, could you post that too? Otherwise, can you rerun with
--printerror as command line argument?

Note also that you can always specify --safe to go past any loading  
error. In fact that's what I recommend doing unless you want to debug  
why a particular record doesn't load.

BTW I would recommend that you restore the CASTs that were removed in  
Pg 8.3; otherwise you may hit random issues in Bioperl-db whenever a  
parameter value for a string-type column happens to be a number.  
(taxon.ncbi_taxon_id is of type integer)

See http://people.planetpostgresql.org/peter/index.php?/archives/18-Readding-implicit-casts-in-PostgreSQL-8.3.html

as per Greg's email.

	-hilmar

On Mar 21, 2008, at 5:43 PM, Erik wrote:
> Hi,
>
> PostgreSQL 8.3.1
> DBD::Pg 2.3.0
> perl 5.8.8
>
> (The following error may have to do with the 8.3 problems
> that I reported yesterday (bug 2472) - I don't know)
>
>  I ran biosql-schema/scripts/load_ncbi_taxonomy.pl without
> problem.
>
> Then I ran scripts/biosql/load_seqdatabase.pl as:
>
> perl scripts/biosql/load_seqdatabase.pl \
>   -driver Pg \
>   -dbuser xxxxxxx \
>   -dbname bioseqdb \
>   -namespace swissprot \
>   -format swiss \
>    /DATA/ms/ftp.ebi.ac.uk/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.dat
>
> It took two hours to load 26504 records (7%) of
> uniprot_sprot.dat (is it expected to be so slow?), then
> failed with:
>
> Could not store Q2UXW0:
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: create: object (Bio::Species) failed to insert or to
> be found by unique key
> STACK: Error::throw
> STACK: Bio::Root::Root::throw
> /home/aardvark/bin/perl/lib/site_perl/5.8.8/Bio/Root/Root.pm:357
> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create
> /home/aardvark/bin/perl/lib/site_perl/5.8.8/Bio/DB/BioSQL/ 
> BasePersistenceAdaptor.pm:206
> STACK: Bio::DB::Persistent::PersistentObject::create
> /home/aardvark/bin/perl/lib/site_perl/5.8.8/Bio/DB/Persistent/ 
> PersistentObject.pm:244
> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create
> /home/aardvark/bin/perl/lib/site_perl/5.8.8/Bio/DB/BioSQL/ 
> BasePersistenceAdaptor.pm:169
> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store
> /home/aardvark/bin/perl/lib/site_perl/5.8.8/Bio/DB/BioSQL/ 
> BasePersistenceAdaptor.pm:251
> STACK: Bio::DB::Persistent::PersistentObject::store
> /home/aardvark/bin/perl/lib/site_perl/5.8.8/Bio/DB/Persistent/ 
> PersistentObject.pm:271
> STACK: scripts/biosql/load_seqdatabase.pl:630
> -----------------------------------------------------------
>
>
> I don't know if this is directly related to the 8.3
> casting problems I reported yesterday (bug 2472), or a
> separate Bio::Species issue
>
>
> regards,
>
> Erik Rijkers
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From hlapp at gmx.net  Sat Mar 22 20:01:51 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 22 Mar 2008 16:01:51 -0400
Subject: [Bioperl-l] [BioSQL-l] postgres 8.3 - load_seqdatabase.pl /
	swissprot
In-Reply-To: <16589.156.83.1.157.1206135827.squirrel@webmail.xs4all.nl>
References: <16589.156.83.1.157.1206135827.squirrel@webmail.xs4all.nl>
Message-ID: <69D3EA33-810B-40EA-8687-752FA1A34FBF@gmx.net>

Forgot to respond to this:

On Mar 21, 2008, at 5:43 PM, Erik wrote:
> It took two hours to load 26504 records (7%) of uniprot_sprot.dat  
> (is it expected to be so slow?)


When I last loaded those regularly it was a bit faster (~ 5  
seqs/s) but it is in a ballpark that wouldn't raise a red flag for me.

BTW you can make it print statistics using the --logchunk N option,  
where N is the number of seqs after which you want the current count  
and the #recs/s printed.

You may get it to be faster if you tune the database (e.g., make sure  
there is enough memory for index reorganization, transaction log and  
tablespace datafile are on separate disks, etc; fiddling with the  
query optimizer has probably little effect as almost all queries are  
simple lookups or inserts).

That all said, the strength of load_seqdatabase.pl isn't speed. It  
doesn't make use of any bulk upload optimizations, and therefore the  
initial load of a very large database will take its time. The power  
is more in subsequent updates where you can configure what you want  
to happen, and during which the database is never in an inconsistent  
state, so it can run in the background.

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From er at xs4all.nl  Sat Mar 22 20:34:14 2008
From: er at xs4all.nl (Erik)
Date: Sat, 22 Mar 2008 21:34:14 +0100 (CET)
Subject: [Bioperl-l] postgres 8.3 - load_seqdatabase.pl / swissprot
Message-ID: <5975.156.83.1.157.1206218054.squirrel@webmail.xs4all.nl>

On Sat, March 22, 2008 20:30, Hilmar Lapp wrote:

> SELECT * FROM taxon WHERE ncbi_taxon_id = 326939;

No, I don't seem to have that particular id, although I
ran the ncbi load script yesterday just before the sprot. 
Btw, in the meantime I figured out that it was a parsing
error choking on an unexpected period.

You asked for preceding errors, but there were none.

I have now restarted the same uniprot_sprot.dat load with
--safe, which if I understand you correctly will just skip
any non-parsable records.


And wrt the postgres 8.3 casting:

I only added the first cast of the list of Peter Eisentraut:

CREATE FUNCTION pg_catalog.text(integer) RETURNS text
STRICT IMMUTABLE LANGUAGE SQL AS 'SELECT
textin(int4out($1));'; --added 20080322
CREATE CAST (integer AS text) WITH FUNCTION
pg_catalog.text(integer) AS IMPLICIT; --added 20080322

I hope eventually a more durable solution will be found -
I fear this reinstalling of old casting functionality will
generate unexpected problems of its own.

But it seems a good intermediary solution; with it, the
previously failing t/16odba.t succeeds...


Thank you,

Erik Rijkers


From hlapp at gmx.net  Sat Mar 22 21:16:18 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 22 Mar 2008 17:16:18 -0400
Subject: [Bioperl-l] postgres 8.3 - load_seqdatabase.pl / swissprot
In-Reply-To: <5975.156.83.1.157.1206218054.squirrel@webmail.xs4all.nl>
References: <5975.156.83.1.157.1206218054.squirrel@webmail.xs4all.nl>
Message-ID: <3C253027-5A2B-4C0C-9DF6-A0DA84CC96A8@gmx.net>


On Mar 22, 2008, at 4:34 PM, Erik wrote:
> On Sat, March 22, 2008 20:30, Hilmar Lapp wrote:
>
>> SELECT * FROM taxon WHERE ncbi_taxon_id = 326939;
>
> No, I don't seem to have that particular id, although I
> ran the ncbi load script yesterday just before the sprot.

Odd. It's on the NCBI taxonomy browser. Maybe it was just added the  
other day?

> Btw, in the meantime I figured out that it was a parsing
> error choking on an unexpected period.

Do you want to report that to the BioPerl category on
bugzilla.open-bio.org?

>
> You asked for preceding errors, but there were none.
>
> I have now restarted the same uniprot_sprot.dat load with
> --safe, which if I understand you correctly will just skip
> any non-parsable records.

And all records that cause some other database error when inserting.  
Note that if you didn't erase the previously loaded records, you will  
either need to choose a new namespace, or, better, use the --lookup  
and --noupdate flags.

	-hilmar

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From mrphysh at juno.com  Sat Mar 22 22:39:22 2008
From: mrphysh at juno.com (mrphysh at juno.com)
Date: Sat, 22 Mar 2008 22:39:22 GMT
Subject: [Bioperl-l] these objects are pretty cool
Message-ID: <20080322.163922.21808.1@webmail01.vgs.untd.com>


I am starting to understand how to use the objects.  

I am the sort who wants to understand how things work, at least on some level.  I think my understanding would be increased with knowledge of the actual contents of the object.  My book says they are hashes and that makes sense:  field-value.......field-value.......field-value 

But as far as I can tell they cannot be taken apart like a regular hash.

How can I print out the contents?  this makes sense to me:
xxxxxxxxxxxxxxxx
use Bio::SeqIO;

#these objects were made for file input...conversion...file output

while ( my $seq = $out->next_seq() ) {print "$seq\n"; }                        

#or..how about this?

foreach ( my $seqq = $in->next_seq() ) {print "$seqq\n"; }   
xxxxxxxxxxxxxx
this gives back:

Bio::Seq::RichSeq=HASH(0x860dcdc)
Bio::Seq=HASH(0x85f5a20)

Is there some way to slice up the object and look at the parts?

John 


From jason at bioperl.org  Sat Mar 22 22:46:46 2008
From: jason at bioperl.org (Jason Stajich)
Date: Sat, 22 Mar 2008 15:46:46 -0700
Subject: [Bioperl-l] these objects are pretty cool
In-Reply-To: <20080322.163922.21808.1@webmail01.vgs.untd.com>
References: <20080322.163922.21808.1@webmail01.vgs.untd.com>
Message-ID: <DA99BA50-B604-4DE1-BE43-194E91236F5F@bioperl.org>

it's got methods that you need to call to get the data.  Did you try  
looking at any of the howtos - they discuss this sort of thing.

http://bioperl.org/wiki/HOWTOs

-jason
On Mar 22, 2008, at 10:39 PM, mrphysh at juno.com wrote:

>
> I am starting to understand how to use the objects.
>
> I am the sort who wants to understand how things work, at least on  
> some level.  I think my understanding wold be increased with  
> knowledge of the actual contents of the object.  My book says they  
> are hashes and that makes sense:  field-value.......field- 
> value.......field-value
>
> But as far as I can tell they cannot be taken apart like a regular  
> hash.
>
> How can I print out the contents?  this makes sense to me:
> xxxxxxxxxxxxxxxx
> use Bio::SeqIO;
>
> #these objects were made for file input...conversion...file output
>
> while ( my $seq = $out->next_seq() ) {print "$seq\n"; }
>
> #or..how about this?
>
> foreach ( my $seqq = $in->next_seq() ) {print "$seqq\n"; }
> xxxxxxxxxxxxxx
> this gives back:
>
> Bio::Seq::RichSeq=HASH(0x860dcdc)
> Bio::Seq=HASH(0x85f5a20)
>
> Is there some way to slice up the object and look at the parts?
>
> John


From robfsouza at gmail.com  Sat Mar 22 23:11:49 2008
From: robfsouza at gmail.com (Robson Francisco de Souza)
Date: Sat, 22 Mar 2008 20:11:49 -0300
Subject: [Bioperl-l] these objects are pretty cool
In-Reply-To: <DA99BA50-B604-4DE1-BE43-194E91236F5F@bioperl.org>
References: <20080322.163922.21808.1@webmail01.vgs.untd.com>
	<DA99BA50-B604-4DE1-BE43-194E91236F5F@bioperl.org>
Message-ID: <af6a4f100803221611w393e2dd1r965c11afd45978aa@mail.gmail.com>

Hi John,

The methods usually are designed to provide simple means to access the
object's internal data structure. They are intended to hide the
internal data structure, replacing the need to understand it by a
binding to the documented class methods.

If the documentation does not satisfy you and you still want to take a
look at its data structure, try dumping the object with code like

use Bio::SeqIO;
use Data::Dumper;
while ( my $seq = $out->next_seq() ) { print Dumper($seq),"\n"; }

and check out chapters four and five of Programming Perl.
Best,
Robson

PS: watch out for lots of printed output...
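
One way to keep that output manageable is to cap the dump depth. The
following is only a minimal sketch using core Perl: Toy::Seq is a
made-up stand-in for a sequence object (it is not a BioPerl class), and
the depth of 2 is an arbitrary choice. $Data::Dumper::Maxdepth and
$Data::Dumper::Sortkeys are standard Data::Dumper settings.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical stand-in for a BioPerl object: a blessed hash with accessors.
package Toy::Seq;

sub new {
    my ($class, %args) = @_;
    my $self = {
        id       => $args{id},
        seq      => $args{seq},
        features => [ { type => 'gene' } ],   # nested data, as in real objects
    };
    return bless $self, $class;
}
sub id  { $_[0]->{id} }
sub seq { $_[0]->{seq} }

package main;

my $obj = Toy::Seq->new(id => 'toy1', seq => 'ATGC');

# Preferred: go through the documented methods.
print $obj->id, "\t", $obj->seq, "\n";

# For curiosity: dump the internals, but do not descend past two levels.
local $Data::Dumper::Maxdepth = 2;
local $Data::Dumper::Sortkeys = 1;
my $dump = Dumper($obj);
print $dump;
```

The same two settings should apply unchanged when dumping a real
sequence object returned by next_seq().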

2008/3/22, Jason Stajich <jason at bioperl.org>:
> it's got methods that you need to call to get the data.  Did you try
>  looking at any of the howtos - they discuss this sort of thing.
>
>  http://bioperl.org/wiki/HOWTOs
>
>
>  -jason
>
> On Mar 22, 2008, at 10:39 PM, mrphysh at juno.com wrote:
>
>  >
>  > I am starting to understand how to use the objects.
>  >
>  > I am the sort who wants to understand how things work, at least on
>  > some level.  I think my understanding wold be increased with
>  > knowledge of the actual contents of the object.  My book says they
>  > are hashes and that makes sense:  field-value.......field-
>  > value.......field-value
>  >
>  > But as far as I can tell they cannot be taken apart like a regular
>  > hash.
>  >
>  > How can I print out the contents?  this makes sense to me:
>  > xxxxxxxxxxxxxxxx
>  > use Bio::SeqIO;
>  >
>  > #these objects were made for file input...conversion...file output
>  >
>  > while ( my $seq = $out->next_seq() ) {print "$seq\n"; }
>  >
>  > #or..how about this?
>  >
>  > foreach ( my $seqq = $in->next_seq() ) {print "$seqq\n"; }
>  > xxxxxxxxxxxxxx
>  > this gives back:
>  >
>  > Bio::Seq::RichSeq=HASH(0x860dcdc)
>  > Bio::Seq=HASH(0x85f5a20)
>  >
>  > Is there some way to slice up the object and look at the parts?
>  >
>  > John


From er at xs4all.nl  Sat Mar 22 23:36:13 2008
From: er at xs4all.nl (Erik)
Date: Sun, 23 Mar 2008 00:36:13 +0100 (CET)
Subject: [Bioperl-l] postgres 8.3 - load_seqdatabase.pl / swissprot
Message-ID: <12669.156.83.1.157.1206228973.squirrel@webmail.xs4all.nl>

Hi Hilmar,

> either need to choose a new namespace, or, better, use the
> --lookup and --noupdate flags.

scripts/biosql/load_seqdatabase.pl is now churning along
to load uniprot_sprot.dat.  I'll try to gather up the
rejected records for further inspection / parser
improvement.

The next thing is performance, it's really intolerably
slow, and I don't think the database is the bottleneck -
isn't it more likely bioperl object heaviness?  I get
continuous near 100% load for 1 cpu (this machine has 2
cpus).

I could give it 10 or more processors; I am thinking I
could cut up the input into 10 (or more) chunks.

Is there anything specific in bioperl/biosql that knows
how to use multiple cores?
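
A minimal sketch of the chunking I have in mind, in core Perl only. It
relies on Swiss-Prot flat-file records each ending with a line
containing just "//". The name split_swiss and the chunk file naming
are made up for illustration, and whether several load_seqdatabase.pl
processes can safely load such chunks at the same time is an
assumption I have not tested.

```perl
use strict;
use warnings;

# Distribute Swiss-Prot flat-file records round-robin over $n_chunks
# files named "$prefix.0", "$prefix.1", ...; returns the record count.
sub split_swiss {
    my ($infile, $n_chunks, $prefix) = @_;

    my @out;
    for my $i (0 .. $n_chunks - 1) {
        open(my $fh, '>', "$prefix.$i") or die "cannot write $prefix.$i: $!";
        push @out, $fh;
    }

    open(my $in, '<', $infile) or die "cannot read $infile: $!";
    my ($n, $fh) = (0, $out[0]);
    while (my $line = <$in>) {
        print {$fh} $line;
        # A line consisting of "//" terminates the current record,
        # so the next record goes to the next chunk file.
        if ($line =~ m{^//\s*$}) {
            $n++;
            $fh = $out[$n % $n_chunks];
        }
    }
    close $_ for $in, @out;
    return $n;
}

# Usage: perl split_swiss.pl uniprot_sprot.dat 10
if (@ARGV == 2) {
    my ($file, $chunks) = @ARGV;
    my $count = split_swiss($file, $chunks, "$file.chunk");
    print "wrote $count records into $chunks chunk files\n";
}
```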

thank you very much for your help

Erik Rijkers


From hlapp at gmx.net  Sun Mar 23 01:40:55 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 22 Mar 2008 21:40:55 -0400
Subject: [Bioperl-l] postgres 8.3 - load_seqdatabase.pl / swissprot
In-Reply-To: <12669.156.83.1.157.1206228973.squirrel@webmail.xs4all.nl>
References: <12669.156.83.1.157.1206228973.squirrel@webmail.xs4all.nl>
Message-ID: <1B55060F-534D-4732-B428-4863DD098654@gmx.net>


On Mar 22, 2008, at 7:36 PM, Erik wrote:
> The next thing is performance, it's really intolerably
> slow, and I don't think the database is the bottleneck -
> isn't it more likely bioperl object heaviness?  I get
> continuous near 100% load for 1 cpu (this machine has 2
> cpus).


Is the database on the same machine? If yes, and a significant  
fraction (~30-50% or even more) of the load is generated by the perl  
script, rather than almost everything coming from the postmaster,  
then indeed the database is not the bottleneck.

Of course, the bioperl object creation overhead takes a toll too. I  
would be surprised though if BioPerl can't parse more than 3.6  
records/s on a modern CPU; you can convince yourself of that though  
by writing a simple script along the lines of the following and see  
how fast that goes:

use Bio::SeqIO;

my $seqio = Bio::SeqIO->new(-file   => '<uniprot_sprot.dat',
                            -format => 'swiss');
my $n = 0;
while (my $seq = $seqio->next_seq) {
	$n++;
	# print a progress line every 5,000 sequences
	print "$n\n" if $n % 5000 == 0;
}
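
To put an actual number on it, a timing wrapper along these lines might
help. This is only a sketch: records_per_second is a made-up helper
name, and the stand-in iterator is there just so it runs without
BioPerl installed. Time::HiRes is a core module.

```perl
use strict;
use warnings;
use Time::HiRes ();

# Calls $next until it returns undef and returns (count, records/second).
# $next is any iterator closure; if $chunk is set, a progress line goes
# to STDERR every $chunk records.
sub records_per_second {
    my ($next, $chunk) = @_;
    my $n     = 0;
    my $start = Time::HiRes::time();
    while (defined(my $rec = $next->())) {
        $n++;
        if ($chunk && $n % $chunk == 0) {
            my $so_far = (Time::HiRes::time() - $start) || 1e-9;
            printf STDERR "%d records, %.1f recs/s\n", $n, $n / $so_far;
        }
    }
    my $elapsed = (Time::HiRes::time() - $start) || 1e-9;   # avoid division by zero
    return ($n, $n / $elapsed);
}

# Stand-in iterator so the sketch runs on its own.
my @fake = (1 .. 12_000);
my ($count, $rate) = records_per_second(sub { shift @fake }, 5_000);
printf "%d records at %.1f recs/s\n", $count, $rate;
```

With BioPerl loaded, the closure would be sub { $seqio->next_seq }
instead of the stand-in.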

But maybe load_seqdatabase.pl or even BioSQL or BioPerl aren't  
suitable for your use-case?

	-hilmar

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cjfields at uiuc.edu  Sun Mar 23 14:09:56 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 23 Mar 2008 09:09:56 -0500
Subject: [Bioperl-l] Using Bioperl book
In-Reply-To: <0AFC1124-EE57-4168-BBF3-98845F0138C9@bioperl.org>
References: <OF134946F0.4C537CC5-ONC1257411.00542F3C-C1257411.00549AFA@LGLimagrain.com>
	<0AFC1124-EE57-4168-BBF3-98845F0138C9@bioperl.org>
Message-ID: <4C401D4F-064C-43F9-A37C-14FA65A96657@uiuc.edu>

Maybe something to discuss at BOSC?

chris

On Mar 19, 2008, at 12:54 PM, Jason Stajich wrote:

> it's probably more than 6 months out. We still haven't finished  
> writing it as life and work continues to intrude on book writing.
>
> -jason
> On Mar 19, 2008, at 8:22 AM, Jorge.DUARTE at biogemma.com wrote:
>
>> Hello,
>>
>> i just found on amazon something about a book "Using Bioperl",  
>> published
>> on the 1st of March 2008 but which is no more available.
>>
>> Does anyone know how to get it ?
>>
>> Many thanks,
>>
>> Jorge.
>>
>> ---
>> Jorge Duarte
>> Bioinformatics Software Engineer
>> BIOGEMMA
>> Z.I. Du Brézet
>> 8, Rue des Frères Lumière
>> 63028 CLERMONT FERRAND Cedex 2
>> FRANCE
>> Tel : +33 (0)4 73 39 60 73
>> Fax : +33 (0)4 73 39 60 71
>> E-mail : jorge.duarte at biogemma.com
>>
>> *****************************************************************
>>       For any support request, please include
>> BIOGEMMA_BioInfo_Service or bioinfo at biogemma.com
>>         among the recipients on first contact
>> *****************************************************************

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Sun Mar 23 14:17:56 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 23 Mar 2008 09:17:56 -0500
Subject: [Bioperl-l] Priorities for a bioperl-1.6 release
In-Reply-To: <47DFE089.1070304@sendu.me.uk>
References: <OF9C92ED19.F51E3401-ON852573F0.006310A3-852573F0.006383D1@gsk.com>
	<47DFE089.1070304@sendu.me.uk>
Message-ID: <FC757880-6FD8-44A5-97BE-F5F37A738C2B@uiuc.edu>


On Mar 18, 2008, at 10:32 AM, Sendu Bala wrote:

> aaron.j.mackey at gsk.com wrote:
>>> Or is the split intended to be 'core' == "anything and everything
>>> that was in 1.4", '????' == "everything else"? In which case,
>>> what's a good name for "modules created after 1.4"? 'crust'? ;)
>> Nah, "icing".
>> a module "use" map might be very useful to help identify "core" vs.
>> other layers of mantle/crust/icing.
>> http://www.perlmonks.org/?node_id=87329 http://search.cpan.org/src/NEILB/pmusage-1.2/
>
> Thanks for those. Neither could quite cope with BioPerl, but I've  
> munged
> them together and hacked up 'module_usage.pl' which I've just  
> committed
> to the maintenance directory of bioperl-live.
>
> module_usage.pl ../Bio
>
> Produces:
> *warning, may crash your browser; download it and view in a dedicated
> image viewer*
> http://bix.sendu.me.uk/files/module_usage.jpeg
> http://bix.sendu.me.uk/files/module_usage.txt
>
> ...
>
> I haven't done any full analysis along these lines and leave as an
> exercise for the interested reader for now ;)

I'm coming into this late (just got back) but I agree, this would be  
very useful.  Your updates based on Aaron's comments help quite a bit.

> Chris Fields wrote:
>> http://www.bioperl.org/wiki/Talk:Proposed_1.6_core_modules
>> I'm pretty flexible on any of that; it's a proposal only and I think
>> some of it may be wrongheaded, but hey, I'm willing to take a few
>> rotten tomatoes.  The key issue is we should try to work out what we
>> mean by 'core' or the core library.  I have a rather extreme view of
>> it as being the bare essentials without external, non-perl core
>> dependencies (only SeqI/PrimarySeqI, AlignI, AnnotationI, SeqFeatureI
>> and required modules for those classes) but I'm sure others would
>> lump in parsers, DB functionality, etc.  I basically suggest placing
>> those (and any stable but potentially non-core code) in a
>> 'bioperl-main', with any unstable or untested code going into a
>> 'bioperl-unstable'.
>
> My thoughts are along these lines:
> # I agree that core should have no external dependencies
> # I agree that it might mostly be interfaces
> # It should represent a framework with all the interfaces (that have
>  stable APIs), directory structure and base classes that everything
>  else relies on
> # It might not do much useful bioinformatics, but provides just about
>  everything needed for a dev to create a new module that does

Yes, that's essentially the idea.

>> In essence, bioperl-main would require core and resemble a stable
>> release; bioperl-unstable would require bioperl-main (and core) and
>> resemble a dev release.  Not sure how versioning would go or if this
>> is a viable option at all, but it's worth discussing.
>
> # I agree that this 3-way split seems reasonable
> # bioperl-main would consist primarily of the 'leaves' of the module
>  tree, mostly parsers and the like which, whilst 'stable' and tested
>  should still be split away from core because the data sources they
>  parse could change format slightly
> # bioperl-unstable, better bioperl-bleed, would feature brand-new
>  stuff, be it new parsers for totally new formats, new APIs that do
>  something not thought of before etc. When they are complete, bug-free
>  and have stood the test of time they get moved into bioperl-main.
>  (It is not a place for all new commits; bug fixes to something in
>  bioperl-main would be committed to bioperl-main)
> # The current splits (bioperl-run, bioperl-network etc.) do not get
>  their own core and bleed variant. Anything they need for core
>  functionality would enter the single bioperl-core, anything new
>  would enter the single bioperl-bleed, and anything stable would
>  be in their own bioperl-[package]
>
> Discuss :)

We can work on updating the plan via the wiki as well as the mail  
list.  I find it easier to track; we can always link back to the mail  
list when needed.

http://www.bioperl.org/wiki/Proposed_1.6_core_modules
http://www.bioperl.org/wiki/Talk:Proposed_1.6_core_modules

chris


From er at xs4all.nl  Sun Mar 23 18:16:05 2008
From: er at xs4all.nl (Erik)
Date: Sun, 23 Mar 2008 19:16:05 +0100 (CET)
Subject: [Bioperl-l] postgres 8.3 - load_seqdatabase.pl / swissprot
Message-ID: <8382.156.83.1.157.1206296165.squirrel@webmail.xs4all.nl>

On Sun, March 23, 2008 02:40, Hilmar Lapp wrote:
> But maybe load_seqdatabase.pl or even BioSQL or BioPerl
> aren't suitable for your use-case?

well, that may turn out to be the case, but I'm not quite
deterred yet.

I am in a situation like many others, I think: microarray,
mass spec, and chipseq (Solexa) data all need
annotation, and while it is easy to retrieve some useful
records from public data sources (entrez, ensembl &
biomart, etc.), it is not so easy to have such high
atomicity in the locally stored annotation data that
fine-grained filtering and sorting on a sql level becomes
possible.  I hope the bioperl parsers, together with the
biosql schema, will give SQL access to all or most data
bits.

And I understand GBrowse can run on top of BioSQL/Pg too,
albeit somewhat preliminary; this is another usage I will
need.

btw, should not all those references to postgres 7.3 be
upgraded to something newer, like 8.2.7 (maybe not yet 8.3
heh) ?  7.3 is not supported anymore by the pg project.

Sprot loaded in 20 hours. Only 170 were rejected - not too
bad.

Thanks,

Erik Rijkers


From hlapp at gmx.net  Sun Mar 23 19:22:46 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sun, 23 Mar 2008 15:22:46 -0400
Subject: [Bioperl-l] postgres 8.3 - load_seqdatabase.pl / swissprot
In-Reply-To: <8382.156.83.1.157.1206296165.squirrel@webmail.xs4all.nl>
References: <8382.156.83.1.157.1206296165.squirrel@webmail.xs4all.nl>
Message-ID: <EF4031EF-FBB5-493E-8D3C-8530E2ADFC20@gmx.net>


On Mar 23, 2008, at 2:16 PM, Erik wrote:
> On Sun, March 23, 2008 02:40, Hilmar Lapp wrote:
>> But maybe load_seqdatabase.pl or even BioSQL or BioPerl
>> aren't suitable for your use-case?
>
> well, that may turn out to be the case, but I'm not quite
> deterred yet.
>
> I am in a situation like many others, I think: microarray,
> mass spec, and chipseq (Solexa) data all need
> annotation,and while it is easy to retrieve some useful
> records from public data sources (entrez, ensembl &
> biomart, etc.), it is not so easy to have such high
> atomicity in the locally stored annotation data that
> fine-grained filtering and sorting on a sql level becomes
> possible.  I hope the bioperl parsers, together with the
> biosql schema, will give SQL access to all or most data
> bits.

If you mean annotation by data bits then yes, it should be fairly  
normalized (possibly more normalized than you want, in fact).

Also, using BioSQL as the sequence and sequence annotation model
add-on to some other database holding your lab data is what many
others have used it for too.

>
> And I understand GBrowse can run on top of BioSQL/Pg too,
> albeit somewhat preliminary; this is another usage I will
> need.

It can, though keep in mind that that's not the use-case it (BioSQL)  
was built for. If you need to have rapid access to genome intervals  
with 10s of thousands of features and their annotation, you'll have  
to start thinking about a more de-normalized data store to run this  
off of, such as populating a native GBrowse GFF store.

>
> btw, should not all those references to postgres 7.3 be
> upgraded to something newer, like 8.2.7 (maybe not yet 8.3
> heh) ?  7.3 is not supported anymore by the pg project.

Oops, indeed. Where are they?

>
> Sprot loaded in 20 hours. Only 170 were rejected - not too
> bad.

That's great. Would be nice if you can provide some rough summary as  
to why they were rejected (if that's obvious), such as taxon errors,  
or other errors.

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From sac at bioperl.org  Sun Mar 23 22:20:43 2008
From: sac at bioperl.org (Steve Chervitz)
Date: Sun, 23 Mar 2008 15:20:43 -0700
Subject: [Bioperl-l] HitTableWriter error
In-Reply-To: <088B032A-E2EE-4E88-8F4C-A206D2230910@leicester.ac.uk>
References: <088B032A-E2EE-4E88-8F4C-A206D2230910@leicester.ac.uk>
Message-ID: <8f200b4c0803231520o5082f9f5mf7be8cd061faa98f@mail.gmail.com>

Hi Owen

Sorry, I don't have time to look into this right now, but two thoughts:

1) The ResultTableWriter is intended to convert standard blast reports into
a tabular format. So if you already have tabular results, you are sort of
using it "off label", but in principle, it should work.

2) The iteration method is only available to BlastHit objects, since it is
only relevant to PSI-blast results. The fact that you got this error when
working with blast results indicates that the parser did not generate the
correct object type for your hits, using the GenericHit when it should have
used BlastHit. This is just a hunch and would be worth following up on. Can
you submit this as a bug report?

Thanks,
Steve


On Thu, Mar 20, 2008 at 4:56 AM, Owen Lancaster <ol8 at leicester.ac.uk> wrote:

> Hello
>
> I hope you don't mind me emailing you but I have come across a problem
> when trying to use HitTableWriter. The error can be seen below - the
> input for the script is the BLAST tabular output (specified with the -
> m 8 option) from a blastn search.
>
> If you have any idea what the problem might be I would much appreciate
> it! Hope you can help...
>
> Thanks
>
> Owen
>
>
> Using default column map.
>
> ------------- EXCEPTION  -------------
> MSG: Trouble in ResultTableWriter::_set_row_data_func() eval: Can't
> locate object method "iteration" via package
> "Bio::Search::Hit::GenericHit" at (eval 97) line 1, <GEN1> line 2.
>
>
>
> STACK Bio::SearchIO::Writer::ResultTableWriter::__ANON__ /Library/Perl/
> 5.8.8/Bio/SearchIO/Writer/ResultTableWriter.pm:328
> STACK Bio::SearchIO::Writer::HitTableWriter::to_string /Library/Perl/
> 5.8.8/Bio/SearchIO/Writer/HitTableWriter.pm:268
> STACK Bio::SearchIO::write_result /Library/Perl/5.8.8/Bio/SearchIO.pm:
> 331
> STACK Bio::SearchIO::blast::write_result /Library/Perl/5.8.8/Bio/
> SearchIO/blast.pm:2208
> STACK toplevel ./generate_discordant_tails.pl:62
>
> --------------------------------------
>
>


From greg at turnstep.com  Mon Mar 24 00:42:36 2008
From: greg at turnstep.com (Greg Sabino Mullane)
Date: Mon, 24 Mar 2008 00:42:36 -0000
Subject: [Bioperl-l] [BioSQL-l] postgres 8.3 will not cast text to
	integer any longer
In-Reply-To: <C24DE5CA-F433-48A1-BF08-A6D056A2EBCE@gmx.net>
Message-ID: <4ab14dcc59d7566b55ba87027055e9fd@biglumber.com>


-----BEGIN PGP SIGNED MESSAGE-----
Hash: RIPEMD160


>> Depending on what I (or can someone else update us on this?) find out
>> for the DBD::Pg plans, I'll probably start looking into moving the
>> parameter binding into the driver adapters. Though it does feel
>> pathetic that this is now also not transparent between drivers.
>
> What you are probably looking for is already there, namely:
>
> $dbh->{pg_server_prepare} = 0;

> So disabling server-side prepares will leave values quoted? Having
> server-side prepares would be very useful though, especially for
> Bioperl-db with its many lookup queries that all use similar
> parameter values.

Yes, it forces DBD::Pg to do the quoting itself, which basically means
that everything is shipped to the server as a single SQL string, and
no placeholders are used. In the grand scheme of things, the speed
difference is not large for most queries. Certainly one way would be
to turn this on for 8.3 and above, and slowly migrate the queries/schema
over time.
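
A rough sketch of the "turn this on for 8.3 and above" idea, in plain Perl. It assumes DBD::Pg exposes the numeric pg_server_version handle attribute (for example 80300 for 8.3.0) next to pg_server_prepare; both helper names are made up for illustration and this is not code from Bioperl-db.

```perl
use strict;
use warnings;

# Hypothetical helper: 0 (DBD::Pg does the quoting itself) for
# PostgreSQL 8.3 and later, 1 (server-side prepares) for older servers.
sub server_prepare_setting {
    my ($server_version) = @_;    # numeric, e.g. 80205 for 8.2.5
    return $server_version >= 80300 ? 0 : 1;
}

# Hypothetical helper: apply the setting to a connected DBD::Pg handle.
# ASSUMPTION: the handle exposes pg_server_version as described above.
sub configure_prepares {
    my ($dbh) = @_;
    $dbh->{pg_server_prepare} =
        server_prepare_setting( $dbh->{pg_server_version} );
    return $dbh;
}
```

With a real connection this would be called once, right after connecting and before any statement is prepared.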

>> There's good reasons for the casting enforcement in 8.3

> I do understand that, but it's also a sharp contrast to other RDBMSs
> that doesn't make it easier for people to choose Pg when they
> should, and doesn't help writing cross-platform database applications
> either.

I'm not overly familiar with how other databases treat this, but I've
heard DB2 can be a stickler about this too. I've not dug into the bioperl
code in a while, to be honest, so I'm not sure what sort of queries we're
talking about. Certainly long-term the code and schema should move away
from implicit casting. Maybe a better short-term solution is adding
the more obvious casts (e.g. text<->int) back in.

> Do you have links to some of the key threads showing what rationale
> went into the decision? (Or should I just search for your name?) I'd
> like to read up on that first before pouring more oil into the fire.
> I suspect that many of those who made the decision are never faced
> with needing to write cross-RDBMS code.
>
> Also, I wonder why this wasn't made a configurable option so it can
> be disabled by a simple config file change (such as the move away
> from automatic OID columns). But obviously this is the wrong list for
> discussing this (though Bioperl-db *is* one of those pieces of
> software that must be cross-RDBMS).

I did ask about that, and was told it would not have been easy to do so.
But I agree, a phasing-in period (heck, even a warning) would have been
nice. Feel free to pour some oil on the fire, I think this is one of
many apps that have been affected. I've run across two other major
cross-DB apps (Interchange and MediaWiki) that are struggling with the
same pain. I managed to painfully fix the latter, but the former is way
too complex to tackle at the moment.

I could not find the thread(s?) I weighed in on, but you can find some
relevant discussions by googling "strict-typing benefits grokbase".

- --
Greg Sabino Mullane greg at turnstep.com
PGP Key: 0x14964AC8 200803232039
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
-----BEGIN PGP SIGNATURE-----

iEYEAREDAAYFAkfm+NAACgkQvJuQZxSWSsi4ogCdGNWvCJIzXxb+YKzdm6wwxQMv
p3AAnizkWXoo/rvxv4KVdC8tD0vF87k3
=dNYi
-----END PGP SIGNATURE-----


From er at xs4all.nl  Mon Mar 24 00:45:49 2008
From: er at xs4all.nl (Erik)
Date: Mon, 24 Mar 2008 01:45:49 +0100 (CET)
Subject: [Bioperl-l] postgres 8.3 - load_seqdatabase.pl / swissprot
Message-ID: <19067.156.83.1.157.1206319549.squirrel@webmail.xs4all.nl>

On Sun, March 23, 2008 20:22, Hilmar Lapp wrote:
>
> On Mar 23, 2008, at 2:16 PM, Erik wrote:
>> Sprot loaded in 20 hours. Only 170 were rejected - not
>> too bad.
>
> That's great. Would be nice if you can provide some rough
> summary as to why they were rejected (if that's obvious),
> such as taxon errors, or other errors.

see

   http://bugzilla.open-bio.org/show_bug.cgi?id=2474

So I think one easy improvement will be to enlarge that 
varchar(40) column, dbxref.accession. See the following:

select
	dbname
,	accession
,	length(accession)
from 	dbxref
where accession ~ 'Cyc'
order by length(accession) desc
limit 100

patch attached. (which will probably get bug 2389 resolved)

It seems to me bioentry.accession (maybe identifier too?)
needs a similar enlargement.
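
A small sketch of how overlong accessions could be spotted before loading, in plain Perl. The width of 40 is the dbxref.accession limit mentioned above; the helper name and the sample values are made up for illustration.

```perl
use strict;
use warnings;

# Return the accessions that would not fit into a varchar column of the
# given width.
sub too_long_accessions {
    my ( $max_length, @accessions ) = @_;
    return grep { length($_) > $max_length } @accessions;
}

# Illustrative, made-up values; real ones would come from the input file.
my @sample = ( 'P12345', 'X' x 41 );
print "$_\n" for too_long_accessions( 40, @sample );
```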


thanks,

Erikjan
-------------- next part --------------
A non-text attachment was scrubbed...
Name: biosqldb-pl.sql.diff
Type: application/octet-stream
Size: 535 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20080324/a564d7ee/attachment-0004.obj>

From Russell.Smithies at agresearch.co.nz  Thu Mar 27 02:13:16 2008
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Thu, 27 Mar 2008 15:13:16 +1300
Subject: [Bioperl-l] Bioinformatician wanted
Message-ID: <D5DBA313349A4B458528BE63B387F36C06A09345@imail.agresearch.co.nz>

Dear colleagues,

 
It would be appreciated if you could bring the following position to the
attention of potential candidates.

 
Bioinformatician wanted


Many of the world's most amazing scientific discoveries are the result
of someone 'taking a closer look'. It's this inquisitive nature and
relentless search for answers that fuels scientific advancement. And
it's also what we'd like you to apply to us, right now. Mind you, you
won't need to look too hard to discover that AgResearch is the best
place to break all new ground in your career.  


At first glance you'll see we are New Zealand's largest research
institute -world leaders in pastoral research working at the
leading-edge of innovation. Scratch the surface a little more and you'll
find all the diversity and intellectual challenge a Bioinformatician
could ask for. This is a highly collaborative role where you'll be
involved in everything from the analysis of genomic data to the design,
development, implementation and testing of bioinformatics tools.
Knowledge sharing is a pivotal component of our success, so you can also
look forward to acting in consultant capacity (both internally and
externally) and the autonomy to contribute to scientific publications. 

 
Our people are at the pinnacle in their professions, so with your
biological background, higher qualification in bioinformatics or
computing and your experience in contributing bioinformatics expertise
to research groups, you'll not only fit right in, you'll hit the ground
running. An outstanding communicator, time manager and relationship
builder, you'll also come to us with a thorough knowledge of Unix,
pipeline-development, web based technologies and scripting and
programming languages. 

 
AgResearch is a unique organisation at the forefront of our field, and
as far as your future's concerned, that makes us well worth a closer
look. There are many benefits waiting to be discovered here, so isn't it
time you experienced them? The job description is available online and
applications are invited at www.agresearch.co.nz/recruitment/ Reference
AGR661, or contact Nauman Maqbool for further information.

 
Applications close 11 April 2008.

 
Regards,
Russell


Russell Smithies 
Bioinformatics Applications Developer 
T +64 3 489 9085 
E  russell.smithies at agresearch.co.nz 
AgResearch Farming Food and Health. First 
Te Ahuwhenua, Te Kai me te Whai Ora. Tuatahi 
Invermay  Research Centre 
Puddle Alley, 
Mosgiel, 
New Zealand 
T  +64 3 489 3809   
F  +64 3 489 9174  
www.agresearch.co.nz 




From Marc.Logghe at ablynx.com  Thu Mar 27 13:26:24 2008
From: Marc.Logghe at ablynx.com (Marc Logghe)
Date: Thu, 27 Mar 2008 14:26:24 +0100
Subject: [Bioperl-l] Bio::Search::HSP::GenericHSP::seq_inds
Message-ID: <03C512635899144083CADB0EE222018901800955@alpaca.lan.ablynx.com>

Hi all,

I am a little bit confused about the above mentioned seq_inds() method.
At first, I had the impression that the method returns an array of
positions in the hsp (hit or query) sequence.

At least that is what one would expect looking at the example usage in
the HOWTOs (http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_the_methods
second code block).

Am I correct in believing you can only do this if your hsp query stretch
starts at position 1 of the query sequence?

I think seq_inds() returns a list of positions relative to the query/hit
sequence. So, the code shown in the HOWTO is a kind of special case.

However, I do not understand how seq_inds() is dealing with gaps.

An example. If you blast the worm protein ZK822.4 against swissprot
using blastp at ncbi you get this hsp as top:

 
>sp|Q5VT52|K0460_HUMAN Uncharacterized protein KIAA0460
Length=1461
 
 Score = 35.8 bits (81),  Expect = 0.48, Method: Composition-based
stats.
 Identities = 22/55 (40%), Positives = 32/55 (58%), Gaps = 3/55 (5%)
 
Query  402  IAVEEETKTTKKNKKQ-QQQANKNKNKNKKK--TTIAPEAAIDANIAAEVHTQVL  453
            +A+ E   TT K +KQ ++  NK  NK  KK  T+  P+AA+ + I AE  +Q L
Sbjct  139  VALREALSTTFKTQKQLKENLNKQPNKQWKKSQTSTNPKAALKSKIVAEFRSQAL  193

 
Now, if you call seq_inds(query => 'gap') on that particular hsp object,
you get these positions: 417, 431, 432. Obviously, there is no gap in
the original query sequence at these positions. 
How do you have to read these numbers ? Remark also that for instance
417 is the res just in front of the gap.

Regards,

Marc

 
From bix at sendu.me.uk  Thu Mar 27 14:46:35 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 27 Mar 2008 14:46:35 +0000
Subject: [Bioperl-l] Bio::Search::HSP::GenericHSP::seq_inds
In-Reply-To: <03C512635899144083CADB0EE222018901800955@alpaca.lan.ablynx.com>
References: <03C512635899144083CADB0EE222018901800955@alpaca.lan.ablynx.com>
Message-ID: <47EBB34B.8010606@sendu.me.uk>

Marc Logghe wrote:
> Hi all,
> 
> I am a little bit confused about the above mentioned seq_inds() method.
> At first, I had the impression that the method returns an array of
> positions in the hsp (hit or query) sequence.

Yes...


> At least that is what one would expect looking at the example usage in
> the HOWTOs (http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_the_methods
> second code block).
> 
> Am I correct in believing you can only do this if your hsp query stretch
> starts at position 1 of the query sequence?

No...


> Query  402  IAVEEETKTTKKNKKQ-QQQANKNKNKNKKK--TTIAPEAAIDANIAAEVHTQVL  453
>             +A+ E   TT K +KQ ++  NK  NK  KK  T+  P+AA+ + I AE  +Q L
> Sbjct  139  VALREALSTTFKTQKQLKENLNKQPNKQWKKSQTSTNPKAALKSKIVAEFRSQAL  193
>
> Now, if you call seq_inds(query => 'gap') on that particular hsp object,
> you get these positions: 417, 431, 432. Obviously, there is no gap in
> the original query sequence at these positions. 
> How do you have to read these numbers ? Remark also that for instance
> 417 is the res just in front of the gap.

Its purpose is to let you know the position in query or subject 
coordinates where something interesting happened in the alignment. So 
seq_inds(query => 'gap') is telling you all the places that a gap starts 
in the alignment in terms of the query coordinates. Hence 417 etc.


(Actually, does 432 make sense? Shouldn't it be 431 twice?)


From Marc.Logghe at ablynx.com  Thu Mar 27 15:09:56 2008
From: Marc.Logghe at ablynx.com (Marc Logghe)
Date: Thu, 27 Mar 2008 16:09:56 +0100
Subject: [Bioperl-l] Bio::Search::HSP::GenericHSP::seq_inds
In-Reply-To: <47EBB34B.8010606@sendu.me.uk>
References: <03C512635899144083CADB0EE222018901800955@alpaca.lan.ablynx.com>
	<47EBB34B.8010606@sendu.me.uk>
Message-ID: <03C512635899144083CADB0EE2220189018009E1@alpaca.lan.ablynx.com>

Hi Sendu, Chris

> > At least that is what one would expect looking at the example usage in
> > the HOWTOs (http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_the_methods
> > second code block).
> >
> > Am I correct in believing you can only do this if your hsp query stretch
> > starts at position 1 of the query sequence?
> 
> No...

<HOWTO>
# put all the conserved matches in query strand into an array
my @str_array = split "",$hsp->query_string;
foreach ( $hsp->seq_inds('query','conserved') ){
  push @conserved,$str_array[$_ - 1];
}
</HOWTO>

$hsp->query_string will return
'IAVEEETKTTKKNKKQ-QQQANKNKNKNKKK--TTIAPEAAIDANIAAEVHTQVL'

In my example using the 'gap' class (instead of 'conserved'), @str_array
will contain 417, 431 and 432. The off-by-one indices do not exist in
that array.
Therefore, I still think the howto shows a special case where the hsp
query sequence starts at 1 (compared to 402 in my particular example). 
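
One way to make such a loop independent of where the HSP starts is to translate sequence coordinates into columns of the aligned string first. A minimal sketch in plain Perl, with no BioPerl calls: residues_at is a hypothetical helper, and the coordinates are assumed to be original-sequence positions of the kind seq_inds() returns.

```perl
use strict;
use warnings;

# Map original-sequence coordinates to the residues in an aligned HSP string.
# $aligned : aligned string as printed in the report (gaps as '-')
# $start   : sequence coordinate of the first residue in $aligned
# @coords  : sequence coordinates to look up
sub residues_at {
    my ( $aligned, $start, @coords ) = @_;
    my %column_of;    # sequence coordinate -> 0-based column in $aligned
    my $coord = $start;
    my @chars = split //, $aligned;
    for my $col ( 0 .. $#chars ) {
        next if $chars[$col] eq '-';    # gap columns consume no coordinate
        $column_of{ $coord++ } = $col;
    }
    # Coordinates outside the HSP map to undef.
    return map { exists $column_of{$_} ? $chars[ $column_of{$_} ] : undef }
        @coords;
}

my $query = 'IAVEEETKTTKKNKKQ-QQQANKNKNKNKKK--TTIAPEAAIDANIAAEVHTQVL';
my @res   = residues_at( $query, 402, 402, 417, 418, 432 );
print join( ',', @res ), "\n";
```

The lookup table is built once per HSP, so the same helper works whether the query stretch starts at 1 or at 402.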


> 
> 
> > Query  402  IAVEEETKTTKKNKKQ-QQQANKNKNKNKKK--TTIAPEAAIDANIAAEVHTQVL  453
> >             +A+ E   TT K +KQ ++  NK  NK  KK  T+  P+AA+ + I AE  +Q L
> > Sbjct  139  VALREALSTTFKTQKQLKENLNKQPNKQWKKSQTSTNPKAALKSKIVAEFRSQAL  193
> >
> > Now, if you call seq_inds(query => 'gap') on that particular hsp object,
> > you get these positions: 417, 431, 432. Obviously, there is no gap in
> > the original query sequence at these positions.
> > How do you have to read these numbers ? Remark also that for instance
> > 417 is the res just in front of the gap.
> 
> Its purpose is to let you know the position in query or subject
> coordinates where something interesting happened in the alignment. So
> seq_inds(query => 'gap') is telling you all the places that a gap starts
> in the alignment in terms of the query coordinates. Hence 417 etc.

So, this means you have to interpret that as a gap is coming after 417 ?

> 
> 
> (Actually, does 432 make sense? Shouldn't it be 431 twice?)
Don't know, depends on how you have to 'read' this.
Thanks for looking into this.
Regards,
Marc


From cjfields at uiuc.edu  Thu Mar 27 15:05:59 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 27 Mar 2008 10:05:59 -0500
Subject: [Bioperl-l] Bio::Search::HSP::GenericHSP::seq_inds
In-Reply-To: <03C512635899144083CADB0EE222018901800955@alpaca.lan.ablynx.com>
References: <03C512635899144083CADB0EE222018901800955@alpaca.lan.ablynx.com>
Message-ID: <489252B3-2255-45C3-9219-A8F8A0978B89@uiuc.edu>

According to the GenericHSP::seq_inds() POD, seq_inds() reports  
residue positions (indices) for the query/subject based on identity/ 
conservation, i.e. these are from the original sequence positions as  
determined by the HSP data, not alignment column positions.  'gaps'  
should be reported at the position prior to where a gap is inserted.   
However I think something is getting borked when the gap length is  
longer than one, so I would partially qualify this as a bug.

Example: When I ran this using bioperl-live it gives a different set  
of gaps indices which appear to be correct.  I reran the BLASTP using  
the web form using your query against swissprot and parsed it.  I got  
slightly different results for the BLAST report (probably differences  
in the query sequence):

 >gi|74746888|sp|Q5VT52|K0460_HUMAN Uncharacterized protein KIAA0460
Length=1461

  Score = 35.8 bits (81),  Expect = 0.47, Method: Composition-based  
stats.
  Identities = 22/55 (40%), Positives = 32/55 (58%), Gaps = 3/55 (5%)

Query  394  IAVEEETKTTKKNKKQ-QQQANKNKNKNKKK--TTIAPEAAIDANIAAEVHTQVL  445
             +A+ E   TT K +KQ ++  NK  NK  KK  T+  P+AA+ + I AE  +Q L
Sbjct  139  VALREALSTTFKTQKQLKENLNKQPNKQWKKSQTSTNPKAALKSKIVAEFRSQAL  193

.....

seq_inds('query' => 'gaps') reports 409,423, and 424, which is  
partially correct, e.g. there is a gap inserted after position 409 and  
423 in the query.  However, no gap is present after 424; I think this  
occurs b/c the gap length is 2.  The other HSPs report similar problems.

chris

P.S. Just saw that Sendu posted; I agree, seq. positions with gap  
lengths > 1 should be repeated.  Should be easy to fix that.

On Mar 27, 2008, at 8:26 AM, Marc Logghe wrote:

> Hi all,
>
> I am a little bit confused about the above mentioned seq_inds()  
> method.
> At first, I had the impression that the method returns an array of
> positions in the hsp (hit or query) sequence.
>
> At least that is what one would expect looking at the example usage in
> the HOWTOs (http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_the_methods
> second code block).
>
> Am I correct in believing you can only do this if your hsp query  
> stretch
> starts at position 1 of the query sequence?
>
> I think seq_inds() returns a list of positions relative to the query/ 
> hit
> sequence. So, the code shown in the HOWTO is a kind of special case.
>
> However, I do not understand how seq_inds() is dealing with gaps.
>
> An example. If you blast the worm protein ZK822.4 against swissprot
> using blastp at ncbi you get this hsp as top:
>
>
>
>> sp|Q5VT52|K0460_HUMAN Uncharacterized protein KIAA0460
> Length=1461
>
> Score = 35.8 bits (81),  Expect = 0.48, Method: Composition-based
> stats.
> Identities = 22/55 (40%), Positives = 32/55 (58%), Gaps = 3/55 (5%)
>
> Query  402  IAVEEETKTTKKNKKQ-QQQANKNKNKNKKK--TTIAPEAAIDANIAAEVHTQVL  453
>             +A+ E   TT K +KQ ++  NK  NK  KK  T+  P+AA+ + I AE  +Q L
> Sbjct  139  VALREALSTTFKTQKQLKENLNKQPNKQWKKSQTSTNPKAALKSKIVAEFRSQAL  193
>
>
>
> Now, if you call seq_inds(query => 'gap') on that particular hsp  
> object,
> you get these positions: 417, 431, 432. Obviously, there is no gap in
> the original query sequence at these positions.
> How do you have to read these numbers ? Remark also that for instance
> 417 is the res just in front of the gap.
>
> Regards,
>
> Marc
>
>
>
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Thu Mar 27 16:04:20 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 27 Mar 2008 11:04:20 -0500
Subject: [Bioperl-l] Bio::Search::HSP::GenericHSP::seq_inds
In-Reply-To: <03C512635899144083CADB0EE2220189018009E1@alpaca.lan.ablynx.com>
References: <03C512635899144083CADB0EE222018901800955@alpaca.lan.ablynx.com>
	<47EBB34B.8010606@sendu.me.uk>
	<03C512635899144083CADB0EE2220189018009E1@alpaca.lan.ablynx.com>
Message-ID: <FF47D72B-7CBA-491B-9C9B-9B32FD2AA8BE@uiuc.edu>


On Mar 27, 2008, at 10:09 AM, Marc Logghe wrote:

> Hi Sendu, Chris
>
>>> At least that is what one would expect looking at the example usage in
>>> the HOWTOs (http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_the_methods
>>> second code block).
>>>
>>> Am I correct in believing you can only do this if your hsp query stretch
>>> starts at position 1 of the query sequence?
>>
>> No...
>
> <HOWTO>
> # put all the conserved matches in query strand into an array
> my @str_array = split "",$hsp->query_string;
> foreach ( $hsp->seq_inds('query','conserved') ){
>  push @conserved,$str_array[$_ - 1];
> }
> </HOWTO>
>
> $hsp->query_string will return
> 'IAVEEETKTTKKNKKQ-QQQANKNKNKNKKK--TTIAPEAAIDANIAAEVHTQVL'
>
> In my example using the 'gap' class (instead of 'conserved'),  
> @str_array
> will contain 417, 431 and 432. The off-by-one indices do not exist in
> that array.
> Therefore, I still think the howto shows a special case where the hsp
> query sequence starts at 1 (compared to 402 in my particular example).

We'll have to look at it; it should probably be clarified particularly  
in reference to 'gaps' and use of seq positions vs. HSP (or alignment)  
positions.

Think of it this way; seq_inds() takes 'identical', 'conserved', etc.,  
all of which refer to the original positions (indices) of the sequence  
which fall into the particular category asked for.  In these cases we  
are using the coordinates for query/hit directly from the HSP info in  
the report.  This is done with the express purpose of mapping  
attributes back to the original sequence, be it the query or subject.

Gaps, however, are tricky, since sequence coordinates refer to  
residues (not gaps) when using BLAST.  In this case we use the  
sequence position prior to the gap to note where a gap is inserted.   
The previous results, then, would be wrong as there is no gap inserted  
after 432.  I just committed a fix which just repeats the position  
based on the number of gaps.
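
The convention described above (report the residue position just before a gap, once per gap character) can be sketched in plain Perl. This only illustrates the rule; it is not the code committed to bioperl-live, and gap_positions is a hypothetical name.

```perl
use strict;
use warnings;

# For each gap character in an aligned string, report the coordinate of the
# last residue before it, so a gap of length N yields the same position N times.
# $start is the sequence coordinate of the first residue in $aligned.
sub gap_positions {
    my ( $aligned, $start ) = @_;
    my $last = $start - 1;    # coordinate of the most recent residue seen
    my @positions;
    for my $char ( split //, $aligned ) {
        if ( $char eq '-' ) { push @positions, $last }
        else                { $last++ }
    }
    return @positions;
}

print join( ', ', gap_positions(
    'IAVEEETKTTKKNKKQ-QQQANKNKNKNKKK--TTIAPEAAIDANIAAEVHTQVL', 402 ) ), "\n";
```

For the query string in this thread, starting at 402, this yields 417, 431, 431.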

>>> Query  402  IAVEEETKTTKKNKKQ-QQQANKNKNKNKKK--TTIAPEAAIDANIAAEVHTQVL  453
>>>             +A+ E   TT K +KQ ++  NK  NK  KK  T+  P+AA+ + I AE  +Q L
>>> Sbjct  139  VALREALSTTFKTQKQLKENLNKQPNKQWKKSQTSTNPKAALKSKIVAEFRSQAL  193
>>>
>>> Now, if you call seq_inds(query => 'gap') on that particular hsp object,
>>> you get these positions: 417, 431, 432. Obviously, there is no gap in
>>> the original query sequence at these positions.
>>> How do you have to read these numbers ? Remark also that for instance
>>> 417 is the res just in front of the gap.
>>
>> Its purpose is to let you know the position in query or subject
>> coordinates where something interesting happened in the alignment. So
>> seq_inds(query => 'gap') is telling you all the places that a gap starts
>> in the alignment in terms of the query coordinates. Hence 417 etc.
>
> So, this means you have to interpret that as a gap is coming after 417 ?

Yes.

>> (Actually, does 432 make sense? Shouldn't it be 431 twice?)
> Don't know, depends on how you have to 'read' this.
> Thanks for looking into this.
> Regards,
> Marc

Repeating the position based on the number of gaps is now the default  
in bioperl-live.  Just working on fixing problems with collapsing  
numbers and tests and everything should be fine.

chris


From hiekeen at gmail.com  Sat Mar 29 16:09:18 2008
From: hiekeen at gmail.com (Jinyan Huang)
Date: Sun, 30 Mar 2008 00:09:18 +0800
Subject: [Bioperl-l] Gene Id converts.
Message-ID: <fb5dae380803290909o6523f71hc479570846594f3@mail.gmail.com>

Hi,

I have a list of gene bank accession id. I want to convert these ids to NCBI id.

For example:

From NM_011917 to 2919914.

How can I do it?

Thanks


-- 
Best regards,
Jinyan Huang (ekeen)
School of Life Sciences and Technology, 1302 Room
Tongji University
Siping Road 1239, Shanghai 200092
P.R. China
Tel :0086-21-65981041
Msn: hiekeen at hotmail.com
eMail: hiekeen at gmail.com


From cjfields at uiuc.edu  Sat Mar 29 17:42:50 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 29 Mar 2008 12:42:50 -0500
Subject: [Bioperl-l] Gene Id converts.
In-Reply-To: <fb5dae380803290909o6523f71hc479570846594f3@mail.gmail.com>
References: <fb5dae380803290909o6523f71hc479570846594f3@mail.gmail.com>
Message-ID: <FC7E8E14-83E7-4611-868B-0D1390488C82@uiuc.edu>

There are the GenBank LiveLists (updated every Sunday), which have  
accession/version/UID mappings for nuc and protein GenBank records.  I  
haven't used them personally but they're worth a look:

ftp://ftp.ncbi.nih.gov/genbank/livelists/

There is also gene2accession, which contains mappings between  
accession and UID (though this is more EntrezGene-related, I believe):

ftp://ftp.ncbi.nih.gov/gene/DATA/

Both have documentation detailing formats.  I would recommend using  
one of the above two on a local database setup if you plan on  
converting a large number of accessions.
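
A minimal sketch of the local-lookup approach in plain Perl. It assumes a tab-delimited gene2accession-style file with the RNA accession.version in column 4 and the RNA GI in column 5 (1-based), and '-' for missing values; that layout should be checked against the README shipped with the file. The helper name and the sample rows are made up.

```perl
use strict;
use warnings;

# Build an accession -> GI lookup from gene2accession-style lines.
# ASSUMPTION: tab-delimited, RNA accession.version in column 4 and RNA GI
# in column 5 (1-based); verify against the README before relying on it.
sub build_accession_to_gi {
    my ($fh) = @_;
    my %gi_for;
    while ( my $line = <$fh> ) {
        next if $line =~ /^#/;    # skip header/comment lines
        chomp $line;
        my @col = split /\t/, $line;
        next if @col < 5;
        my ( $acc_ver, $gi ) = @col[ 3, 4 ];
        next if $acc_ver eq '-' or $gi eq '-';
        ( my $acc = $acc_ver ) =~ s/\.\d+$//;    # drop the version suffix
        $gi_for{$acc}     = $gi;                 # lookup without version
        $gi_for{$acc_ver} = $gi;                 # lookup with version
    }
    return \%gi_for;
}

# Made-up rows purely for illustration (not real NCBI data).
my $toy = join "\n",
    "#tax_id\tGeneID\tstatus\tRNA_acc\tRNA_gi",
    "0\t1\t-\tXX_000001.1\t111",
    "0\t2\t-\t-\t-";
open my $in, '<', \$toy or die "cannot open in-memory file: $!";
my $lookup = build_accession_to_gi($in);
print "XX_000001 => $lookup->{XX_000001}\n";
```

Since one accession can carry several versions over time, the unversioned key simply keeps the last row seen; a stricter script would track versions explicitly.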

Bio::DB::EUtilities can also do this but is web-based via eutils.  
There are a couple of stub examples in the Cookbook HOWTO under  
'efetch' on converting accessions to UIDs (and vice versa), though note  
there is no one-to-one correspondence.  You can also convert UIDs to  
accessions using 'esummary', but the converse (accession to GI)  
requires, strangely, using efetch to grab the UIDs first, then  
re-retrieving the accessions via esummary for one-to-one correspondence.

http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook

chris

On Mar 29, 2008, at 11:09 AM, Jinyan Huang wrote:

> Hi,
>
> I have a list of gene bank accession id. I want to convert these ids  
> to NCBI id.
>
> For example:
>
> From NM_011917 to 2919914.
>
> How can I do it?
>
> Thanks
>
>
> -- 
> Best regards,
> Jinyan Huang (ekeen)
> School of Life Sciences and Technology, 1302 Room
> Tongji University
> Siping Road 1239, Shanghai 200092
> P.R. China
> Tel :0086-21-65981041
> Msn: hiekeen at hotmail.com
> eMail: hiekeen at gmail.com

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From pradel.d at gmail.com  Mon Mar 31 14:35:26 2008
From: pradel.d at gmail.com (Damien Pradel)
Date: Mon, 31 Mar 2008 16:35:26 +0200
Subject: [Bioperl-l] Bio::SeqIO - Error in ID value collection
Message-ID: <7db1df730803310735n5777aa08s50d763a62c74d050@mail.gmail.com>

Hello,

I use the SeqIO module to parse EMBL files.
Unfortunately I ran into a problem: the ID was not recognised, and instead of
the ID value I get "unknown_id".

To solve this I modified the file embl.pm, located in the
SeqIO directory, at line 189, changing:

if( $line =~ /^ID\s+(\S+)\s+\S+\;\s+([^;]+)\;\s+(\S+)\;/ ) {
    ($name,$mol,$div) = ($1,$2,$3);
}
unless( defined $name && length($name) ) {
    $name = "unknown_id";
}

to:

if( $line =~ /^ID\s+(.+?)\;\s+([^;]+)\;\s+(\S+)\;/ ) {
    ($name,$mol,$div) = ($1,$2,$3);
}
unless( defined $name && length($name) ) {
    $name = "unknown_id";
}

With this modification, the ID value is correctly collected.
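
To see what each pattern captures, the two regular expressions can be run side by side on one ID line per layout. The sketch below is plain Perl; the ID lines use a made-up accession and are only meant to mimic the older "name standard; mol; div;" layout and the newer semicolon-separated one, and id_name is a hypothetical helper.

```perl
use strict;
use warnings;

# The two patterns from the message above.
my $original = qr/^ID\s+(\S+)\s+\S+\;\s+([^;]+)\;\s+(\S+)\;/;
my $modified = qr/^ID\s+(.+?)\;\s+([^;]+)\;\s+(\S+)\;/;

# Return the captured name, or "unknown_id" when the pattern does not match.
sub id_name {
    my ( $pattern, $line ) = @_;
    my ($name) = $line =~ $pattern;
    return ( defined $name && length $name ) ? $name : 'unknown_id';
}

# Illustrative ID lines with a made-up accession, one per layout.
my %line = (
    old => 'ID   XX000001   standard; mRNA; HUM; 500 BP.',
    new => 'ID   XX000001; SV 1; linear; mRNA; STD; HUM; 500 BP.',
);

for my $layout (qw(old new)) {
    printf "%s layout: original => '%s', modified => '%s'\n", $layout,
        id_name( $original, $line{$layout} ),
        id_name( $modified, $line{$layout} );
}
```

On these sample lines the original pattern matches only the older layout, while the modified one returns the bare accession for the newer layout but would capture "XX000001   standard" as the name on the older one. A parser that has to accept both layouts therefore needs one pattern per layout.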

Hope it will help.

Damien


From golharam at umdnj.edu  Mon Mar 31 19:31:56 2008
From: golharam at umdnj.edu (Ryan Golhar)
Date: Mon, 31 Mar 2008 15:31:56 -0400
Subject: [Bioperl-l] Very basic implementation of GenBank XML SeqIO module
Message-ID: <47F13C2C.4070909@umdnj.edu>

I have a (very) basic SAX implementation of a SeqIO module to parse 
GenBank XML records.  Right now, it only reads in basic information 
regarding the sequence and the sequence itself.

It does not yet parse the features table.  Should I submit it to be 
included in bioperl or wait until I implement more for the features 
table?  I'm not sure when I'll get around to it though

Ryan


From cjfields at uiuc.edu  Mon Mar 31 20:05:51 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 31 Mar 2008 15:05:51 -0500
Subject: [Bioperl-l] Very basic implementation of GenBank XML SeqIO
	module
In-Reply-To: <47F13C2C.4070909@umdnj.edu>
References: <47F13C2C.4070909@umdnj.edu>
Message-ID: <4A3D5CD8-13D7-4CBF-B89A-CE81B8804C61@uiuc.edu>

You can submit it either to me directly or to bugzilla (start a new  
bug report as an enhancement request, then attach the relevant files).

Does it have a test suite available?  If not, you should try setting  
one up:

http://www.bioperl.org/wiki/HOWTO:Writing_BioPerl_Tests

chris

On Mar 31, 2008, at 2:31 PM, Ryan Golhar wrote:

> I have a (very) basic SAX implementation of a SeqIO module to parse  
> GenBank XML records.  Right now, it only reads in basic information  
> regarding the sequence and the sequence itself.
>
> It does not yet parse the features table.  Should I submit it to be  
> included in bioperl or wait until I implement more for the features  
> table?  I'm not sure when I'll get around to it though
>
> Ryan


From cjfields at uiuc.edu  Mon Mar 31 23:58:44 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 31 Mar 2008 18:58:44 -0500
Subject: [Bioperl-l] Bio::SeqIO - Error in ID value collection
In-Reply-To: <7db1df730803310735n5777aa08s50d763a62c74d050@mail.gmail.com>
References: <7db1df730803310735n5777aa08s50d763a62c74d050@mail.gmail.com>
Message-ID: <629B8FAF-3A1F-41E2-BFF1-A709DDE56A09@uiuc.edu>

The parser no longer has this line; it has been updated to work with  
both old and new format EMBL.  You might want to try updating from  
Subversion or install the nightly build.

http://bioperl.org/DIST/nightly_builds/

chris

On Mar 31, 2008, at 9:35 AM, Damien Pradel wrote:
> Hello,
>
> I use the SeqIO module in order to  parse EMBL files.
> Unfortunately I got a problem: the ID was not recognised because  
> instead of
> the ID value  I get the answer "unknown_id" ...
>
> So to solve this problem I have modified the file embl.pm located in
> directory SeqIO at the line 189 as follow :
>
> if( $line =~ /^ID\s+(\S+)\s+\S+\;\s+([^;]+)\;\s+(\S+)\;/ ) {
>       ($name,$mol,$div) = ($1,$2,$3);
>   }
>   unless( defined $name && length($name) ) {
>       $name = "unknown_id";
>   }
>
> in :
> if( $line =~ /^ID\s+(.+?)\;\s+([^;]+)\;\s+(\S+)\;/ ) {
>       ($name,$mol,$div) = ($1,$2,$3);
>   }
>   unless( defined $name && length($name) ) {
>       $name = "unknown_id";
>   }
>
> With this modification, the ID value is correctly collected.
>
> Hope it will help.
>
> Damien

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From dfog22 at hotmail.com  Wed Mar 26 14:13:25 2008
From: dfog22 at hotmail.com (MathGon)
Date: Wed, 26 Mar 2008 07:13:25 -0700 (PDT)
Subject: [Bioperl-l]  File concatenation
Message-ID: <16301515.post@talk.nabble.com>


For my first post, I will introduce myself. I'm a PhD student in microbiology
focusing on horizontal gene transfer in hyperthermophilic Archaea.

I have retrieved a GenBank file for each contig of an unfinished genome. I want to
produce a single GenBank file by concatenating them. I didn't manage to find
such a script and I'm not trained enough in Perl to write it...
Do you have another solution or a script for me?
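
If a multi-record file is enough (one record per contig, each ending with a "//" line), plain concatenation can be sketched in core Perl as below. The helper name and file names are placeholders. Merging the contigs into one record with a single joined sequence is a different job and would need a sequence library such as Bio::SeqIO.

```perl
use strict;
use warnings;

# Append each GenBank flat file to one output file. Every record is expected
# to end with a "//" line; one is added if a non-empty input file lacks it.
sub concatenate_genbank {
    my ( $out_path, @in_paths ) = @_;
    open my $out, '>', $out_path or die "cannot write $out_path: $!";
    for my $path (@in_paths) {
        open my $in, '<', $path or die "cannot read $path: $!";
        my $last = '';
        while ( my $line = <$in> ) {
            $line .= "\n" unless $line =~ /\n\z/;    # repair a missing final newline
            print {$out} $line;
            $last = $line if $line =~ /\S/;          # remember last non-blank line
        }
        close $in;
        print {$out} "//\n" unless $last eq '' or $last =~ m{^//\s*$};
    }
    close $out or die "cannot close $out_path: $!";
    return $out_path;
}

# Usage (file names are placeholders):
# concatenate_genbank( 'all_contigs.gbk', glob('contig_*.gbk') );
```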

Best regards...
-- 
View this message in context: http://www.nabble.com/File-concatenation-tp16301515p16301515.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.