From brian_osborne at cognia.com  Sat Jan  1 11:18:08 2005
From: brian_osborne at cognia.com (Brian Osborne)
Date: Sat Jan  1 11:15:04 2005
Subject: [Bioperl-l] Bioperl in 2005
In-Reply-To: <001201c4ee95$8f739130$6400a8c0@GOLHARMOBILE1>
Message-ID: <GAEDKMGOKFBLJPKCLKCCIEKEEGAA.brian_osborne@cognia.com>

Ryan,

You could post it to bioperl-l; someone will commit it to CVS.

Brian O.


-----Original Message-----
From: bioperl-l-bounces@portal.open-bio.org
[mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Ryan Golhar
Sent: Thursday, December 30, 2004 12:33 PM
To: 'Jason Stajich'; 'Bioperl List'
Subject: RE: [Bioperl-l] Bioperl in 2005


Hi all,

I'd like to contribute a parser module to parse Spidey results.  I took
the sim4 parser and modified it a little bit to properly read in spidey
results.  Everything else about it works the same as the sim4 parser as
far as I can tell.

How can I contribute this module?


-----
Ryan Golhar
Computational Biologist
The Informatics Institute at
The University of Medicine & Dentistry of NJ

Phone: 973-972-5034
Fax: 973-972-7412
Email: golharam@umdnj.edu

-----Original Message-----
From: bioperl-l-bounces@portal.open-bio.org
[mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Jason
Stajich
Sent: Wednesday, December 29, 2004 5:46 PM
To: Bioperl List; bioperl-announce-l@bioperl.org
Subject: [Bioperl-l] Bioperl in 2005


I just wanted to use the end of the year as a chance to reflect on what 
we've accomplished in 2004 and think about what 2005 holds for Bioperl.

What happened in 2004?
First of all, this year has been really productive at a level 
perhaps only appreciated by the folks who read the bioperl-guts-l list 
which lists the CVS commits.  New modules, bugfixes and code 
improvements have been steadily making their way into the codebase.  
Not only has there been lots of traffic, but more people are 
contributing code and fixes.

We have also seen increased contributions to the HOWTOs which we hope 
will be an effective place to explain how to use sets of modules to 
complete a particular task.  We are continually working to improve the 
documentation.  This is a balance between a developer trying to get 
something accomplished for their own research and wanting other people 
to use their code (and not wanting to field lots of emails about a 
particular module).  Open source software written solely by volunteers
suffers from a reward system which values code over 
documentation and writing tutorials.  We welcome ideas on changes which 
would help this and are currently thinking about ways to reward the 
productive documenters as well as coders.

We had a chance to have a 5-day Bootcamp in June thanks to Sylvain 
Foisy, the University of Montreal and the Quebec Bioinformatics Network 
(BioneQ).  We hope to do another one of these in 2006. If there is 
general interest in more widespread Bioperl tutorials, please forward 
your suggestions to me or the bioperl list and we can consider how something 
like this could be organized in conjunction with a conference or 
meeting.


How popular is Bioperl?
The 2002 paper has 60+ citations according to Web of Science and we're 
seeing use in a broader context than just sequence analysis.  At least 
one published paper about modules which were already part of the 
codebase has appeared suggesting software availability and 
collaboration can happen prior to publication.   The website 
consistently gets around 300,000 hits per month, which isn't bad 
considering that the content doesn't change very much and this is just 
a site for one toolkit for a specific aspect of science.  The bioperl-l 
mailing list has seen an average of 341 messages per month (not correcting 
for spam), and a lot of questions have been answered and ideas hashed 
out there.


How can you help out?
I want to use this chance to also appeal to those of you who use Bioperl and 
have been sitting on your hands waiting to jump in.  It is a 
collaborative project that only works if new people jump in and 
contribute ideas and manpower.  We've had many examples of people who 
have just jumped on board the project, fixed some bugs, contributed a 
module and went on their merry way.  We've also had other people who 
have jumped in, contributed code, and found themselves fully engaged in 
the project and its internal workings almost immediately.  Not to wax 
poetic, but it was about 5 years ago that fresh out of college, I 
started reading the mailing list, read Steve Chervitz's email plea for 
people to "ask not what Bioperl can do for you, ask what you can do for 
Bioperl" 
(http://bioperl.org/pipermail/bioperl-l/1999-December/003354.html) and 
just jumped right in.  I can only hope to influence some more folks who 
might have wanted to contribute but were waiting for the invitation.  
Well come on over, we'd love to have you taking part.

   As for some specifics:
   - Parsing of species information out of the ORGANISM lines in 
SwissProt, GenBank, and EMBL is pretty spotty and could use some work.
   - Some more parsers for formats that people have asked for, such as a
Spidey parser (NCBI's mRNA -> genomic alignment tool)
   - Work on the Structure modules for dealing with protein structure 
data
   - Integrate new applications into bioperl-run and further clean up the
existing modules so they are more consistent
   - Volunteer to be the next release master.

What does the future hold for Bioperl?
We expect to have a 1.5 release of bioperl in the first quarter of 2005 - 
this is the domain of Aaron Mackey, who agreed to be the release master 
(he has his hands full right now, but I'm sure he will ask for help when 
he needs it).  This should incorporate many new modules and bug fixes 
but also be compatible with the 1.4 API.  Details on the schedule 
for 1.5 will follow sometime after the holidays.

The future depends entirely on who steps up to work on the project next 
year.  In 2005, I am resolving to step back from the front line of 
answering mailing list questions.  This is partly to finish my PhD 
research and focus on building more specific tools to support my 
research questions, but also because it is time for other people to 
contribute, share the spotlight, and be a know-it-all.  Bioperl is very 
much a labor of love and it is an integral part of the tools I use in my 
own work, so I expect to focus more directly on the things I need in the 
coming year and help out where I can.

My hope is that some of the new folks who have stepped up to contribute 
will help by continuing the course we have set to have high quality 
releases, a full test suite, POD documentation for every module, and 
overall documentation for using modules in HOWTOs and tutorials.  If 
there are new or unexplored areas the project should consider I hope 
that you will speak up and suggest them.

There is discussion afoot that a new Bioperl object model may be 
born.  This has been called Bioperl2 and Bioperl-NG.  The idea is that it 
would try to create a leaner and cleaner code base which does 
things like event-based parsing and autogenerated code for things like 
getters/setters, and could do things faster and more easily than we 
currently can.  Generally there is a lot of legacy code and legacy design 
in Bioperl and it would be beneficial to have a project that was free 
of these constraints.  At the same time there is an expectation that a 
project like this would also need to achieve something the current 
Bioperl API cannot do, so it is incumbent on the new project to 
have goals that are higher than what Bioperl can do.


Thank you
I'd like to finally thank some people who have done a lot this year.  
Of course I'm not going to remember to name everyone, but I just wanted 
to highlight some folks who have endeavored not only to get the toolkit to 
do what they want, but also to help other people get started with 
it.

The people who have kept the project going.  These are the usual suspects 
who have labored to do the dirty grunt work: cleaning up boring bugs, 
adding documentation, preparing a release, keeping the servers going, 
etc.  They code too, but I wanted to highlight that they have really 
been critical to keeping the project going by doing the things that 
most people don't want to bother with.
Brian Osborne
Aaron Mackey
Chris Dagdigian
Kyle Jenson  (mailing list and site searching at 
http://search.open-bio.org)

Some usual suspects who have been helping maintain their modules and 
generally being Bioperl knowledgeable on the list:
Scott Cain
Steve Chervitz
Allen Day
Donald Jackson
Stefan Kirov
Hilmar Lapp
Josh Lauricha
Heikki Lehvaslaiho
Chris Mungall
Jurgen Plentinckx
Lincoln Stein

There are several new people who have taken up the slack as those 
before them have drifted onto other commitments (metaphoric slack of 
course, not trying to accuse anyone of being a 'slacker').  Thanks for 
jumping in, fixing bugs, running tests, giving feedback, and just 
getting involved.  It is really encouraging when the project can be a 
2-way street and not just a one-way flow of information going out from a 
few people who post answers to the list.
Richard Adams
Sean Davis
Rob Edwards
Nathan Haigh
Marc Logghe
Barry Moore
Remo Sanges
James Thompson
Koen van der Drift (Bioperl available via fink on OS X)

Thanks also to Peter van Heusden and Electric Genetics, who are 
undertaking a code audit of Bioperl and should have lots of helpful 
feedback for us.

I've probably forgotten some people, please post a followup if I have 
neglected someone as I would like you to be recognized for your work 
since we don't give out a whole lot else right now.

A safe and prosperous New Year to you all.

Jason Stajich on behalf of the Bioperl core developers.
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

_______________________________________________
Bioperl-l mailing list
Bioperl-l@portal.open-bio.org
http://portal.open-bio.org/mailman/listinfo/bioperl-l



From tex at biocompute.net  Sat Jan  1 02:32:37 2005
From: tex at biocompute.net (James Thompson)
Date: Sat Jan  1 17:40:43 2005
Subject: [Bioperl-l] Re: Questions about Bio::AlignIO::maf
In-Reply-To: <003001c4ee13$b28a8bb0$7347d90a@imcb.astar.edu.sg>
Message-ID: <Pine.LNX.4.44.0501010205170.11219-100000@biosysadmin.com>

Alison,

You're right about this; I just committed the fix to maf.pm. This also fixed
a range problem in the AlignIO.t test script and the associated humor.maf
test data file, and I have committed fixes for those as well.

Thanks for the fix. :)
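The conversion Alison describes can be sketched in plain Perl. This is an illustration under stated assumptions, not the code committed to maf.pm: the sub name `maf_to_one_based` is hypothetical, and the minus-strand branch assumes UCSC's MAF description, in which `start` on a '-' strand line counts from the start of the reverse-complemented source sequence of length `srcSize`.

```perl
use strict;
use warnings;

# Hypothetical helper: convert the coordinates of one MAF "s" line
# (zero-based start, size, strand, srcSize) into one-based, inclusive
# forward-strand coordinates.
sub maf_to_one_based {
    my ($start, $size, $strand, $src_size) = @_;
    # Assumption (UCSC MAF description): on the '-' strand, start is relative
    # to the reverse-complemented source, so map it back to a zero-based
    # forward-strand start first.
    my $fwd_start0 = $strand eq '-' ? $src_size - $start - $size : $start;
    my $begin = $fwd_start0 + 1;        # zero-based -> one-based
    my $end   = $fwd_start0 + $size;    # same as $begin + $size - 1 (inclusive)
    return ($begin, $end);
}

# Usage: a '+' strand block starting at 100 with 10 aligned bases.
printf "%d-%d\n", maf_to_one_based(100, 10, '+', 1000);    # prints 101-110
```

The point Alison raised shows up in the arithmetic: once 1 is added to the zero-based start, the inclusive end is `begin + size - 1`, which equals the original zero-based `start + size`.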

Cheers,

James Thompson

On Thu, 30 Dec 2004, Lee Ping Alison wrote:

> Hi Mr Thompson,
> 
> Thanks for the reply. I understand the need for the one-based inclusive
> coordinate system now; also partly because the major genome browsers use
> that. However, since you're using inclusive coords, then shouldn't you add 1
> to $start first before calculating $end, since $start is zero-based?
> 
> Alison.
> 
> ----- Original Message -----
> From: "James Thompson" <tex@biocompute.net>
> To: "Lee Ping Alison" <g0404203@nus.edu.sg>
> Cc: "Allen Day" <allenday@ucla.edu>; "Bioperl" <bioperl-l@bioperl.org>
> Sent: Wednesday, December 29, 2004 3:30 PM
> Subject: Re: [Bioperl-l] Re: Questions about Bio::AlignIO::maf
> 
> 
> > Alison (and Allen),
> >
> > I was the aforementioned bug fixer. :)
> >
> > Sorry if there's any confusion on this, but AFAIK Bioperl uses a
> > one-based inclusive coordinate system. While maf may have its own
> > opinions on the best way to do coordinates, maf is only one of the
> > formats that are supported by Bio::AlignIO.  The consensus in Bioperl
> > appears to be that it makes more sense to use one consistent coordinate
> > system within all of the modules rather than catering to the opinions
> > and idiosyncrasies of all of the possible file formats. If we did not
> > fix the off-by-one bug in maf.pm, then there would be consistency issues
> > with Bio::Align::AlignI objects created from different file formats.
> >
> > Here's a link to a message from the mailing list that seems relevant to
> > the topic at hand:
> >
> > http://bioperl.org/pipermail/bioperl-l/2002-June/008309.html
> >
> > Cheers,
> >
> > James Thompson
> >
> > On Wed, 29 Dec 2004, Lee Ping Alison wrote:
> >
> > > Hi,
> > >
> > > Mr Day, thanks a lot for helping me with my queries.
> > >
> > > I've just obtained the most recent bioperl-live code via cvs with the
> > > bug fixes you've mentioned. I'm wondering why the off-by-one bug fix
> > > (end = start+size-1) was necessary. I'm thinking that "end = start+size"
> > > is correct, because the MAF file format by UCSC states that coordinates
> > > are half-open, zero-based. And I have understood it as the coordinates
> > > in "maf" module should be (start, end] (start exclusive, end inclusive).
> > > I've also tried several coordinates that agree with UCSC Genome Browser
> > > which uses [start, end]. Hence, in my opinion the bug fix was not
> > > necessary.
> > >
> > > Will someone please enlighten me on this?
> > >
> > > Thank you very much!
> > >
> > > Alison.
> > >
> > >   ----- Original Message -----
> > >   From: Allen Day
> > >   To: Lee Ping Alison
> > >   Cc: Bioperl
> > >   Sent: 29 December, 2004 3:34 PM
> > >   Subject: Re: Questions about Bio::AlignIO::maf
> > >
> > >
> > >   Hi Alison,
> > >
> > >   I did not add strand information as I didn't need it at the time of
> > >   writing.  However, I believe this has come up on list recently and
> > >   someone has already patched in strand support, as well as a fix for
> > >   an off-by-one bug in my code.  Can whoever did these patches recently
> > >   pipe in?  Thanks.
> > >
> > >   Alison, please keep the bioperl list CCed in your reply.
> > >
> > >   -Allen
> > >
> > >   On Wed, 29 Dec 2004, Lee Ping Alison wrote:
> > >
> > >   > Dear Mr Day,
> > >   >
> > >   > While reading the Bioperl 1.4 documentation for the
> > >   > "Bio::AlignIO::maf" module, I found your email address and I have
> > >   > some questions about how to use "maf."
> > >   >
> > >   > Am I right to say that the strand information of each sequence in
> > >   > an "maf" file is not recorded when the LocatableSeq object is
> > >   > created in the next_aln() method? I observed that $strand was not
> > >   > one of the arguments in the call to the constructor.
> > >   >
> > >   > If yes, what is the reason for not using the strand information?
> > >   > And subsequently, if I need to retrieve the strand information, how
> > >   > should I go about it?
> > >   >
> > >   > Thank you very much for answering my queries.
> > >   >
> > >   > Best Regards,
> > >   > Alison
> > >   > (Institute of Molecular and Cell Biology, Singapore)
> > >
> > >
> >
> >
> 
> 


From rj144331 at bcm.tmc.edu  Sun Jan  2 17:56:06 2005
From: rj144331 at bcm.tmc.edu (rj144331)
Date: Sun Jan  2 17:52:51 2005
Subject: [Bioperl-l] Extract sequences from .msf
Message-ID: <41D671E1@webmail.bcm.tmc.edu>

Hi,
I am a second-year graduate student at Baylor College of Medicine, Houston, TX, 
majoring in bioinformatics. I would like to know how to use Perl to extract the 
protein sequences from an HTML page containing a multiple sequence alignment 
and store them in FASTA format. Any help would be appreciated.

Thanks,
regards,
Rupashree Jayashankar
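If the alignment can be saved as plain text, BioPerl's Bio::AlignIO (which has an 'msf' format) together with Bio::SeqIO is the usual route. As a dependency-free illustration, here is a minimal core-Perl sketch. It assumes the HTML markup has already been stripped so only the MSF text is left, that sequence names are declared on "Name:" header lines before the "//" separator, and that '.', '~' and '-' are gap symbols to be dropped; `msf_to_fasta` and the file name in the usage comment are hypothetical.

```perl
use strict;
use warnings;

# Minimal sketch: convert plain MSF alignment text to FASTA with gaps removed.
# Assumptions: HTML tags are already stripped, names are declared on "Name:"
# lines before the "//" separator, and '.', '~' and '-' are gap symbols.
sub msf_to_fasta {
    my ($text) = @_;
    my (@order, %seq);
    my $in_body = 0;
    for my $line (split /\n/, $text) {
        if (!$in_body) {
            # Header lines like " Name: seq1  Len: 50 ..." declare each sequence.
            if ($line =~ /^\s*Name:\s+(\S+)/) {
                push @order, $1;
                $seq{$1} = '';
            }
            $in_body = 1 if $line =~ m{^\s*//};
            next;
        }
        # Alignment lines are a declared name followed by blocks of residues;
        # blank and position-ruler lines are skipped because their first
        # token is not a declared name.
        my ($name, $rest) = $line =~ /^\s*(\S+)\s+(.*)$/ or next;
        next unless exists $seq{$name};
        $rest =~ s/[\s.~-]//g;    # drop spaces and gap symbols
        $seq{$name} .= $rest;
    }
    # Emit records in header order, one sequence line per record.
    return join '', map { ">$_\n$seq{$_}\n" } @order;
}

# Usage (hypothetical file name):
# my $text = do { local $/; open my $in, '<', 'alignment.msf' or die $!; <$in> };
# print msf_to_fasta($text);
```

Keying on the declared names keeps the parser from mistaking ruler lines for sequences; if you want the aligned (gapped) sequences instead, drop the substitution.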


From hlapp at gnf.org  Sun Jan  2 19:28:52 2005
From: hlapp at gnf.org (Hilmar Lapp)
Date: Sun Jan  2 19:25:37 2005
Subject: [Bioperl-l] score in seqfeature
Message-ID: <71B3DC9D-5D1E-11D9-827C-000A959EB4C4@gnf.org>

Allen et al, what are the (GFF3-driven?) plans for storing the score 
property introduced by SeqFeature::Generic?

The reason I'm asking is that it doesn't get (de-)serialized in 
bioperl-db because it's neither defined on SeqFeatureI nor has it been 
internally stored as a tag/value pair. I'd like to fix this issue, either 
by pulling it into the annotation bundle in 
SeqFeature::AnnotationAdapter, or by some other means that may be 
friendlier or more useful to GFF3 minds.

	-hilmar
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From rasa at obj.hopto.org  Mon Jan  3 09:41:10 2005
From: rasa at obj.hopto.org (Rasa Gulbinaite)
Date: Mon Jan  3 09:38:56 2005
Subject: [Bioperl-l] Fasta headers
Message-ID: <1267.81.7.113.128.1104763270.squirrel@81.7.113.128>

Hello,

I'm new to bioperl and a bit confused by fasta file headers. I'm working
with SNPs and I would like to get only the fasta headers from the fasta
file, not the sequences. What would be the best way to do this? Thank you.

Rasa

From venkat at calmail.berkeley.edu  Mon Jan  3 09:23:18 2005
From: venkat at calmail.berkeley.edu (Venky Nandagopal)
Date: Mon Jan  3 09:53:49 2005
Subject: [Bioperl-l] Bio::DB::Fasta errors
Message-ID: <opsj0584cy5by0lc@nirnaetharnoediad>

I've been noticing some errors with Bio::DB::Fasta indices. Working with  
different assemblies of a genome, I've been creating symlinks latest/ to  
the latest assembly directory, and genome.fasta to the latest assembly  
fasta file. When I pass latest/genome.fasta to Bio::DB::Fasta, I get a  
genome.fasta.index file, and retrieval works in my script. But then when I  
run a different analysis on it, or access the same file after a while, I  
get undef for sequences I know are in the database. Reindexing  
will fix the problem. I'm not certain if this is simply due to the  
symlinks, or a more general issue with Bio::DB::Fasta. Does anyone have  
suggestions?
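One thing worth ruling out is an index that stays attached to the symlink path while the link itself gets repointed to another assembly. The sketch below rests on that assumption; it is not a diagnosis of Bio::DB::Fasta. It resolves the link with Cwd's `abs_path`, so each real assembly file gets its own index, and treats the `.index` file next to the real file (the suffix is taken from the `genome.fasta.index` name reported above) as stale when it is missing or older than the FASTA. `resolve_and_check` is a hypothetical helper.

```perl
use strict;
use warnings;
use Cwd qw(abs_path);

# Hypothetical helper: follow every symlink in $path to the real FASTA file
# and report whether the "<file>.index" next to it is missing or older than
# the FASTA itself.
sub resolve_and_check {
    my ($path) = @_;
    my $real  = abs_path($path);
    my $index = "$real.index";
    # -M is the file's age in days, so a larger value means an older file.
    my $stale = !(-e $index) || (-M $index) > (-M $real);
    return ($real, $stale);
}

# Usage (requires BioPerl; check the -reindex option against the
# Bio::DB::Fasta documentation for your release):
# my ($real, $stale) = resolve_and_check('latest/genome.fasta');
# my $db = Bio::DB::Fasta->new($real, -reindex => $stale);
```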

Venky


-- 
___
Venky Nandagopal
Graduate Student
Eisen Lab
UC Berkeley
From birney at ebi.ac.uk  Mon Jan  3 09:59:13 2005
From: birney at ebi.ac.uk (Ewan Birney)
Date: Mon Jan  3 09:55:53 2005
Subject: [Bioperl-l] Fasta headers
In-Reply-To: <1267.81.7.113.128.1104763270.squirrel@81.7.113.128>
Message-ID: <Pine.OSX.4.44.0501031456550.412-100000@ewan-birneys-computer.local>


On Mon, 3 Jan 2005, Rasa Gulbinaite wrote:

> Hello,
>
> I'm new to bioperl and a bit confused by fasta file headers. I'm working
> with SNPs and I would like to get only the fasta headers from the fasta
> file, not the sequences. What would be the best way to do this? Thank you.

The display_id() and desc() methods on a sequence object have this (the
first word of the header and the rest of the line) - eg:


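use Bio::SeqIO;

my $seqin = Bio::SeqIO->new( -file => 'my_filename', -format => 'fasta' );

while ( my $seq = $seqin->next_seq() ) {
   # display_id() is the first word after '>'; desc() is the rest of the line
   my $header = $seq->display_id() . ' ' . $seq->desc();
   print "$header\n";
}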


>
> Rasa
>
>
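For comparison, the same ID/description split that BioPerl's sequence objects expose through display_id() and desc() can be reproduced in core Perl when BioPerl is not at hand. This is a minimal sketch, assuming the ID is the first whitespace-delimited word after '>' and the description is the rest of the line; `fasta_headers` is a hypothetical name.

```perl
use strict;
use warnings;

# Collect [id, description] pairs from the '>' header lines of a FASTA
# stream, splitting each header at the first run of whitespace.
sub fasta_headers {
    my ($fh) = @_;
    my @headers;
    while (my $line = <$fh>) {
        next unless $line =~ /^>/;    # skip sequence lines
        chomp $line;
        my ($id, $desc) = $line =~ /^>\s*(\S*)\s*(.*)$/;
        push @headers, [$id, $desc];
    }
    return @headers;
}

# Usage:
# open my $in, '<', 'my_filename' or die "my_filename: $!";
# printf "%s\t%s\n", @$_ for fasta_headers($in);
```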

From sdavis2 at mail.nih.gov  Mon Jan  3 10:27:05 2005
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Mon Jan  3 10:24:15 2005
Subject: [Bioperl-l] Fasta headers
References: <1267.81.7.113.128.1104763270.squirrel@81.7.113.128>
Message-ID: <000901c4f1a8$ae4ba7d0$7d75f345@WATSON>

Rasa,

You can parse the fasta file with SeqIO, but if you only want the headers 
"as-is", something like this from the command line might do:

cat fastafile.fa | perl -ne 'print if /^>/'

Sorry, I didn't test this....

Sean

----- Original Message ----- 
From: "Rasa Gulbinaite" <rasa@obj.hopto.org>
To: <bioperl-l@portal.open-bio.org>
Sent: Monday, January 03, 2005 9:41 AM
Subject: [Bioperl-l] Fasta headers


> Hello,
>
> I'm new to bioperl and a bit confused by fasta file headers. I'm working
> with SNPs and I would like to get only the fasta headers from the fasta
> file, not the sequences. What would be the best way to do this? Thank you.
>
> Rasa
>
> 


From rousse at ccr.jussieu.fr  Mon Jan  3 10:53:38 2005
From: rousse at ccr.jussieu.fr (Guillaume Rousse)
Date: Mon Jan  3 10:50:14 2005
Subject: [Bioperl-l] Need help for implementing a new TreeIO module 
Message-ID: <41D96A82.8070202@ccr.jussieu.fr>

I'm trying to implement a Bio::TreeIO module for parsing 
Algorithm::Cluster::treecluster output, and I need some help with the 
tree event builder module. Reading the code, I understand I can add 
elements of type 'tree', 'branch-length', 'id', 'node', and 'leaf', and 
also add characters. However, I don't really understand how it works...

Basically, I know all the leaves directly from the given parameters. Then I 
parse a result table line by line, each new line being an internal node 
whose length is given. So I guess the code should be similar to:

$self->_eventHandler->start_document;
$self->_eventHandler->start_element( {'Name' => 'tree'} );

# leaves
foreach my $label (@{$self->{_labels}}) {
     $self->_eventHandler->start_element( {'Name' => 'leaf'} );
     $self->_eventHandler->characters($label);
     $self->_eventHandler->end_element( {'Name' => 'leaf'} );
}

# nodes
foreach my $line (@{$self->{_result}}) {
     $self->_eventHandler->start_element( {'Name' => 'node'} );
     # this node results from the merge of two already existing leaves
     # or nodes with a known distance
     $self->_eventHandler->end_element( {'Name' => 'node'} );
}

$self->_eventHandler->end_element( {'Name' => 'tree'} );
my $tree = $self->_eventHandler->end_document;

Any help appreciated.
-- 
Any circuit design must contain at least one part which is obsolete, two 
parts which are unobtainable and three parts which are still under 
development
		-- Murphy's Laws on Technology n°23
From rousse at ccr.jussieu.fr  Mon Jan  3 10:59:43 2005
From: rousse at ccr.jussieu.fr (Guillaume Rousse)
Date: Mon Jan  3 10:56:22 2005
Subject: [Bioperl-l] Installing bioperl-ext-1.4
In-Reply-To: <1102911185.41bd16d165145@webmail2.ec.auckland.ac.nz>
References: <200412102127.iBALPiKu021926@portal.open-bio.org>
	<1102911185.41bd16d165145@webmail2.ec.auckland.ac.nz>
Message-ID: <41D96BEF.9020601@ccr.jussieu.fr>

bcur001@ec.auckland.ac.nz wrote:
> I am wanting to run code to do smith-waterman alignment. From what I can see, I
> need the EMBOSS suite, which appears to come as part of bioperl-ext-1.4.
> 
> I have installed bioperl-1.4 fine. when I attempt to install bioperl-ext-1.4
> however, I encounter problems. I've worked my way through a few initial errors,
> finding and installing the staden library and the Inline pm (both of which
> appear to have installed fine), I have, however, finally been stumped. Upon
> attempting to run `perl Makefile.PL` from the bioperl-ext-1.4/ directory, I get
> the following:
> 
> Writing Makefile for Bio::Ext::Align
> Found Staden io_lib "libread" in /usr/local/lib ...
> Automatically using the Read.h found in /usr/local/include/io_lib ...
> Writing Makefile for Bio::SeqIO::staden::read
> Writing Makefile for Bio
> One or more DATA sections were not processed by Inline.
Sorry, I missed this post.

Unless you have really good reasons not to, you'd better use the official 
contrib packages for EMBOSS, io_lib and bioperl (I'm the maintainer) 
instead of attempting manual installs. EMBOSS is installed with better 
defaults than the default installation script provides, io_lib is patched 
for wrong headers, and Bioperl has every needed dependency packaged. Only 
bioperl-ext is missing, because I never succeeded in building it due to a 
problem in Inline::MakeMaker.

Just try:
urpmi emboss perl-Bioperl-Run libio_lib1-devel

to have everything installed with the needed dependencies.

(forget my initial private mail, it was sent too early)
-- 
A bad dinner with your wife is worth more than a good one in the company 
of your mother-in-law.
		-- A law for married men
From jason.stajich at duke.edu  Mon Jan  3 11:13:34 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Mon Jan  3 11:10:53 2005
Subject: [Bioperl-l] Need help for implementing a new TreeIO module 
In-Reply-To: <41D96A82.8070202@ccr.jussieu.fr>
References: <41D96A82.8070202@ccr.jussieu.fr>
Message-ID: <6ACDABCF-5DA2-11D9-B9B4-000393C44276@duke.edu>

Guillaume -

Ironic - I was just starting to download a bunch of jpackage stuff and 
seeing your name everywhere....  Trying to get FOP working so I can try 
to build our docbook HOWTOs on linux.

The thing is you need to build the tree by connecting the nodes, so the 
order they are created in is very important.  You can't just build the 
leaves first and then the (internal) nodes later.  You need to build 
from the top down - if you read a newick format from left to right, 
that is exactly how we are building the tree up using the 
EventListener.

In a way the builder basically assumes you already have the tree 
built, just encoded.  So you start with a root node, you add children.  
For each child you add more children where appropriate until you get to 
a leaf node and you are done with that recursion.
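The top-down order described above can be illustrated without the TreeIO event builder. The plain-Perl sketch below turns a merge table into a Newick string by starting at the root (the last merge) and recursing into each child until it reaches a leaf. The table layout is an assumption to check against the Algorithm::Cluster documentation: each row is [left, right, distance], a non-negative index names a leaf, and a negative index -k names the node created by row k. Branch lengths assume an ultrametric tree in which an internal node sits at half its merge distance. All sub names are hypothetical.

```perl
use strict;
use warnings;

# Assumed ultrametric heights: leaves at 0, an internal node at distance/2.
# A negative index -k refers to the node created by row k (i.e. $rows->[k-1]).
sub node_height {
    my ($rows, $idx) = @_;
    return $idx >= 0 ? 0 : $rows->[-$idx - 1][2] / 2;
}

# Top-down recursion: emit this node, then descend into each child
# until a leaf is reached.
sub subtree_newick {
    my ($labels, $rows, $idx, $parent_height) = @_;
    my $len = $parent_height - node_height($rows, $idx);
    return "$labels->[$idx]:$len" if $idx >= 0;
    my ($left, $right) = @{ $rows->[-$idx - 1] }[0, 1];
    my $h = node_height($rows, $idx);
    return '(' . subtree_newick($labels, $rows, $left, $h) . ','
         . subtree_newick($labels, $rows, $right, $h) . "):$len";
}

# The root is the node created by the last merge; it gets no branch length.
sub merge_table_to_newick {
    my ($labels, $rows) = @_;
    my $root = -scalar(@$rows);
    my ($left, $right) = @{ $rows->[-1] }[0, 1];
    my $h = node_height($rows, $root);
    return '(' . subtree_newick($labels, $rows, $left, $h) . ','
         . subtree_newick($labels, $rows, $right, $h) . ');';
}

# Usage: A and B merge at distance 2, then C joins them at distance 6.
print merge_table_to_newick([qw(A B C)], [[0, 1, 2], [2, -1, 6]]), "\n";
# prints (C:3,(A:1,B:1):2);
```

Because each node is written before its children, the output follows the same left-to-right, root-first order that a Newick reader walks; the resulting string could then be handed to Bio::TreeIO's 'newick' parser.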

-jason
On Jan 3, 2005, at 10:53 AM, Guillaume Rousse wrote:

> I'm trying to implement a Bio::TreeIO module for parsing 
> Algorithm::Cluster::treecluster output, and I need some help with the 
> tree event builder module. Reading the code, I understand I can add 
> elements of type 'tree', 'branch-length', 'id', 'node', and 'leaf', 
> and also add characters. However, I don't really understand how it 
> works...
>
> Basically, I know all the leaves directly from the given parameters. Then 
> I parse a result table line by line, each new line being an internal 
> node whose length is given. So I guess the code should be similar to:
>
> $self->_eventHandler->start_document;
> $self->_eventHandler->start_element( {'Name' => 'tree'} );
>
> # leaves
> foreach my $label (@{$self->{_labels}}) {
>     $self->_eventHandler->start_element( {'Name' => 'leaf'} );
>     $self->_eventHandler->characters($label);
>     $self->_eventHandler->end_element( {'Name' => 'leaf'} );
> }
>
> # nodes
> foreach my $line (@{$self->{_result}}) {
>     $self->_eventHandler->start_element( {'Name' => 'node'} );
>     # this node results from the merge of two already existing leaves
>     # or nodes with a known distance
>     $self->_eventHandler->end_element( {'Name' => 'node'} );
> }
>
> $self->_eventHandler->end_element( {'Name' => 'tree'} );
> my $tree = $self->_eventHandler->end_document;
>
> Any help appreciated.
> -- 
> Any circuit design must contain at least one part which is obsolete, 
> two parts which are unobtainable and three parts which are still under 
> development
> 		-- Murphy's Laws on Technology n°23
>
>
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/


From cain at cshl.org  Mon Jan  3 11:34:17 2005
From: cain at cshl.org (Scott Cain)
Date: Mon Jan  3 11:31:14 2005
Subject: [Bioperl-l] Re: Bioperl-l Digest, Vol 21, Issue 1
In-Reply-To: <200501031551.j03FpdKs019038@portal.open-bio.org>
References: <200501031551.j03FpdKs019038@portal.open-bio.org>
Message-ID: <1104770057.3258.27.camel@localhost.localdomain>

Hi Hilmar,

SeqFeature::Annotated (which is what FeatureIO::gff is using) has a
score method that stores the score as an Annotation::SimpleValue, which
means it is used like this:

  my $score = $feature->score->value;

which seems to work well for GFF3 (since scores are by definition a
single value in GFF3).

Scott


On Mon, 2005-01-03 at 10:51 -0500, bioperl-l-request@portal.open-bio.org
wrote:
> Date: Sun, 2 Jan 2005 16:28:52 -0800
> From: Hilmar Lapp <hlapp@gnf.org>
> Subject: [Bioperl-l] score in seqfeature
> To: Allen Day <allenday@ucla.edu>, Bioperl <bioperl-l@bioperl.org>
> Message-ID: <71B3DC9D-5D1E-11D9-827C-000A959EB4C4@gnf.org>
> Content-Type: text/plain; charset=US-ASCII; format=flowed
> 
> Allen et al, what are the (GFF3-driven?) plans for storing the score 
> property introduced by SeqFeature::Generic?
> 
> The reason I'm asking is that it doesn't get (de-)serialized in 
> bioperl-db because it's neither defined on SeqFeatureI nor has it been 
> internally stored as a tag/value pair. I'd like to fix this issue, either 
> by pulling it into the annotation bundle in 
> SeqFeature::AnnotationAdapter, or by some other means that may be 
> friendlier or more useful to GFF3 minds.
> 
> 	-hilmar
> -- 
> -------------------------------------------------------------
> Hilmar Lapp                            email: lapp at gnf.org
> GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
> -------------------------------------------------------------

-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain@cshl.org
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory

From rousse at ccr.jussieu.fr  Mon Jan  3 11:38:28 2005
From: rousse at ccr.jussieu.fr (Guillaume Rousse)
Date: Mon Jan  3 11:35:05 2005
Subject: [Bioperl-l] Need help for implementing a new TreeIO module
In-Reply-To: <6ACDABCF-5DA2-11D9-B9B4-000393C44276@duke.edu>
References: <41D96A82.8070202@ccr.jussieu.fr>
	<6ACDABCF-5DA2-11D9-B9B4-000393C44276@duke.edu>
Message-ID: <41D97504.7020808@ccr.jussieu.fr>

Jason Stajich wrote:
> Guillaume -
> 
> Ironic - I was just starting to download a bunch of jpackage stuff and 
> seeing your name everywhere....  Trying to get FOP working so I can try 
> to build our docbook HOWTOs on linux.
Funny :)
Actually, I'm not involved anymore in jpackage project, I left Java 
since I discovered perl two years ago. If you need help, you'd better 
ask on the mailing lists, even if they are currently down due to 
migration problems on the boxes hosting the project.

> The thing is you need to build the tree by connecting the nodes, so the 
> order they are created in is very important.  You can't just build the 
> leaves first and then the (internal) nodes later.  You need to build 
> from the top down - if you read a newick format from left to right, that 
> is exactly how we are building the tree up using the EventListener.
> 
> In a way the builder basically assumes you already have the tree 
> built, just encoded.  So you start with a root node, you add children.  
> For each child you add more children where appropriate until you get to 
> a leaf node and you are done with that recursion.
OK, thanks for the explanations. However, I don't understand how to add 
branch length information. I guess leaf labels are just introduced 
using the characters() method, right?
-- 
You aren't Superman
		-- Murphy's Bush Fire Brigade Laws n°22
From brian_osborne at cognia.com  Mon Jan  3 11:44:39 2005
From: brian_osborne at cognia.com (Brian Osborne)
Date: Mon Jan  3 11:41:26 2005
Subject: [Bioperl-l] Fasta headers
In-Reply-To: <1267.81.7.113.128.1104763270.squirrel@81.7.113.128>
Message-ID: <GAEDKMGOKFBLJPKCLKCCCELEEGAA.brian_osborne@cognia.com>

Rasa,

On Unix:

$ grep '^>' fasta-file

Anywhere, with Perl:

$ perl -ne 'print if /^>/' fasta-file


Brian O.


-----Original Message-----
From: bioperl-l-bounces@portal.open-bio.org
[mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Rasa
Gulbinaite
Sent: Monday, January 03, 2005 9:41 AM
To: bioperl-l@portal.open-bio.org
Subject: [Bioperl-l] Fasta headers


Hello,

I'm new to bioperl and a bit confused by fasta file headers. I'm working
with SNPs and I would like to get only the fasta headers from the fasta
file, not the sequences. What would be the best way to do this? Thank you.

Rasa



From davidg at lsi.upc.edu  Mon Jan  3 12:00:21 2005
From: davidg at lsi.upc.edu (David García Cortés)
Date: Mon Jan  3 11:57:09 2005
Subject: [Bioperl-l] Problems parsing Accesion number in FASTA format.
Message-ID: <001201c4f1b5$b9b46b40$cf1e5393@Davidg>

Hello.

I have the "nr" database in FASTA format (downloaded from the NCBI website), and I want to retrieve the accession number of each sequence in that database, so I do the following:

my $seqsfich  = Bio::SeqIO->new(-file=>"nr.fa", '-format' => 'Fasta');
 
 while (my $seq = $seqsfich->next_seq()) {  
    print STDOUT "Sequence accession number: ", $seq->accession, "\n";
   }

But the results I get are:

Sequence accession number: unknown
Sequence accession number: unknown
Sequence accession number: unknown
Sequence accession number: unknown
etc...

Here you can see a fragment of the "nr.fa" file:
>gi|2695847|emb|CAA73704.1| immunoglobulin heavy chain [Acipenser baerii]
MGILTALCIIMTALSSVRSDVVLTESGPAVIKPGESHKLSCKASGFTFSSAYMSWVRQAPGKGLEWVAYIYSGGSSTYYA
QSVQGRFAISRDDSNSMLYLQMNSLKTEDTAVYYCARGGLGWSLDYWGKGTMITVTSATPSPPTVFPLMESCCLSDISGP
VATGCLATGFCLPPRPSRGLINLEKL
>gi|2695851|emb|CAA73709.1| immunoglobulin heavy chain [Acipenser baerii]
MGILTALCIIMTALSSVRSDVVLTESGPAVVKPGESHKLSCKAAGFTFSSYWMGWVRQTPGKGLEWVSIISAGGSTYYAP
SVEGRFTISRDNSNSMLYLQMNSLKTEDTAMYYCARKPETGSYGNISFEHWGKGTMITVTSATPSPPTVFPLMQACCSVD
VTGPSATGCLATEF
>gi|2695853|emb|CAA73712.1| immunoglobulin heavy chain [Acipenser baerii]
MGILTALCIIMTALSSVRSDVVLTESGPAVIKPGESHKLSCKASGFTFSSNNMGWVRQAPGKGLEWVSTISYSVNAYYAQ
SVQGRFTISRDDSNSMLYLQMNSLKTEDSAVYYCARESNFNRFDYWGSGTMVTVTNATPSPPTVFPLMQACCSVDVTGPS
ATGCLATEF

I suppose the accession numbers are: CAA73704.1, CAA73709.1, CAA73712.1, etc. (??)
How can I get Bioperl to parse and recognize them?

Thanks in advance.

--
David García Cortés
Instituto Nacional de Bioinformática (INB)
Nodo Computacional GNHC-2 UPC-CIRI
c/. Jordi Girona 1-3              
Modul C6-E201                   Tel.  : 934 011 650
E-08034 Barcelona               Fax   : 934 017 014
Catalunya (Spain)               e-mail: davidg@lsi.upc.edu


From jason.stajich at duke.edu  Mon Jan  3 12:01:21 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Mon Jan  3 11:57:58 2005
Subject: [Bioperl-l] Need help for implementing a new TreeIO module
In-Reply-To: <41D97504.7020808@ccr.jussieu.fr>
References: <41D96A82.8070202@ccr.jussieu.fr>
	<6ACDABCF-5DA2-11D9-B9B4-000393C44276@duke.edu>
	<41D97504.7020808@ccr.jussieu.fr>
Message-ID: <17F1ACAA-5DA9-11D9-B9B4-000393C44276@duke.edu>


On Jan 3, 2005, at 11:38 AM, Guillaume Rousse wrote:
>> The thing is you need to build the tree by connecting the nodes, so 
>> the order they are created in is very important.  You can't just 
>> build the leaves first and then the (internal) nodes later.  You need 
>> to build from the top down - if you read a newick format from left to 
>> right, that is exactly how we are building the tree up using the 
>> EventListener.
>> In a way the builder basically assumes you already have the tree 
>> built, just encoded.  So you start with a root node, you add 
>> children.  For each child you add more children where appropriate 
>> until you get to a leaf node and you are done with that recursion.
> OK, thanks for the explanations. However, I don't understand how to 
> add branch length informations. I guess leave labels are just 
> introduced using characters() method, right ?

You set a branch length for a node with a 'branch_length' element inside 
that node, as in the examples below. The 'leaf' event is sort of a hack - 
I can't remember why I had to introduce it - I think it was to deal with 
labeled internal nodes.

So to build a leaf node with branch_length $branch_length and name 
$idstring you want to do:
# leaf node
$self->_eventHandler->start_element({'Name' => 'node'});
$self->_eventHandler->start_element( { 'Name' => 'branch_length'});
$self->_eventHandler->characters($branch_length);
$self->_eventHandler->end_element( {'Name' => 'branch_length'});
$self->_eventHandler->start_element( { 'Name' => 'id'});
$self->_eventHandler->characters($idstring);
$self->_eventHandler->end_element( {'Name' => 'id'});
$self->_eventHandler->start_element({'Name' => 'leaf'});
$self->_eventHandler->characters(1);
$self->_eventHandler->end_element({'Name' => 'leaf'});
$self->_eventHandler->end_element({'Name' => 'node'});

To build an internal node which has a branch length but no label for 
example:
# Internal Node
$self->_eventHandler->start_element({'Name' => 'node'});
$self->_eventHandler->start_element( { 'Name' => 'branch_length'});
$self->_eventHandler->characters($branch_length);
$self->_eventHandler->end_element( {'Name' => 'branch_length'});
$self->_eventHandler->start_element({'Name' => 'leaf'});
$self->_eventHandler->characters(0);
$self->_eventHandler->end_element({'Name' => 'leaf'});
$self->_eventHandler->end_element({'Name' => 'node'});

See the 'characters' function in Bio::TreeIO::TreeEventHandler for the 
different field names and event labels that can be used.

If you want to build a node with two leaves, you first have to start 
a 'tree' section to tell the handler that this is nested data.
Start a 'tree' event, build the node (like the internal node section 
just above), then build two leaf nodes (like the leaf node section 
above), then end the 'tree' event.  'tree' is an unfortunate name for 
the event, but I don't feel like changing it - a throwback from when I 
thought I'd only need an initial 'tree' and just 'node' events.

$self->_eventHandler->start_document;
$self->_eventHandler->start_element({'Name' => 'tree'});
# do internal node
   # do leaf node
   # do leaf node
$self->_eventHandler->end_element({'Name' => 'tree'});
return $self->_eventHandler->end_document;
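The start_element/characters/end_element triples above repeat a lot, so a module author might wrap them in small helpers. A minimal sketch under stated assumptions: RecordingHandler, emit_field, emit_node and emit_tree are hypothetical names; RecordingHandler only logs the events it receives (it is not Bio::TreeIO::TreeEventHandler, whose end_document returns the tree); and the calling order in the usage simply follows the description in this message, so check Bio::TreeIO::newick for the order the real handler expects:

```perl
use strict;
use warnings;

# Hypothetical stand-in for the TreeIO event handler: it only records
# the events it receives so the emitted sequence can be inspected.
package RecordingHandler;
sub new            { my ($class) = @_; return bless { events => [] }, $class }
sub start_document { my ($self) = @_; push @{ $self->{events} }, 'start_document' }
sub end_document   { my ($self) = @_; push @{ $self->{events} }, 'end_document'; return $self->{events} }
sub start_element  { my ($self, $data) = @_; push @{ $self->{events} }, "start:$data->{Name}" }
sub end_element    { my ($self, $data) = @_; push @{ $self->{events} }, "end:$data->{Name}" }
sub characters     { my ($self, $text) = @_; push @{ $self->{events} }, "chars:$text" }

package main;

# Emit one named field as start_element / characters / end_element.
sub emit_field {
    my ($h, $name, $value) = @_;
    $h->start_element({ 'Name' => $name });
    $h->characters($value);
    $h->end_element({ 'Name' => $name });
}

# Emit a node in the same field order as the examples above:
# branch_length (if given), id (if given), then the leaf flag.
sub emit_node {
    my ($h, %arg) = @_;
    $h->start_element({ 'Name' => 'node' });
    emit_field($h, 'branch_length', $arg{branch_length}) if defined $arg{branch_length};
    emit_field($h, 'id', $arg{id}) if defined $arg{id};
    emit_field($h, 'leaf', $arg{leaf} ? 1 : 0);
    $h->end_element({ 'Name' => 'node' });
}

# Wrap whatever events $body emits in a document and a 'tree' element.
sub emit_tree {
    my ($h, $body) = @_;
    $h->start_document;
    $h->start_element({ 'Name' => 'tree' });
    $body->($h);
    $h->end_element({ 'Name' => 'tree' });
    return $h->end_document;
}

# Usage: an internal node and two leaves, in the order described above.
my $log = emit_tree(RecordingHandler->new, sub {
    my ($h) = @_;
    emit_node($h, branch_length => 0.5, leaf => 0);              # internal node
    emit_node($h, branch_length => 0.1, id => 'A', leaf => 1);   # leaf
    emit_node($h, branch_length => 0.2, id => 'B', leaf => 1);   # leaf
});
print join("\n", @$log), "\n";
```

Keeping the field order inside one helper means every node is emitted the same way, so switching from the recording stand-in to the real handler only changes the object passed in.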


Hmm - I guess I need to go back and document the event system here and 
in SearchIO if people are going to develop with it.


--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/


From hlapp at gmx.net  Mon Jan  3 12:18:31 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon Jan  3 12:15:36 2005
Subject: [Bioperl-l] Problems parsing Accesion number in FASTA format.
In-Reply-To: <001201c4f1b5$b9b46b40$cf1e5393@Davidg>
Message-ID: <7D7AC00B-5DAB-11D9-8820-000A959EB4C4@gmx.net>

The FASTA parser only sets display_id. It doesn't set the accession  
number, and it doesn't set primary_id either. IMO, this is the correct  
behaviour, because the identifier in FASTA headers can come in all  
sorts of formats.

If what you want is to print the identifier part of the description  
line, print $seq->display_id(). If what you want is to extract the  
accession number, then parse it out from what display_id returns, using  
the format you expect it to be in.

	-hilmar

(BTW technically, CAA73704.1 is not the accession - CAA73704 is and 1  
is the version; just to illustrate)
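Following that suggestion, here is a minimal sketch in plain Perl that parses the accession and version out of a display_id. It assumes NCBI-style IDs such as gi|2695847|emb|CAA73704.1| and handles only a few database tags (gb, emb, dbj, sp, ref), an assumption based on the NCBI identifier syntax. split_ncbi_id is a hypothetical name, and an unrecognized ID is returned unchanged as the accession:

```perl
use strict;
use warnings;

# Hypothetical helper: given a FASTA display_id such as
# "gi|2695847|emb|CAA73704.1|", return (accession, version).
sub split_ncbi_id {
    my ($id) = @_;
    # Assumed subset of NCBI database tags, followed by accession[.version]
    if ($id =~ /(?:^|\|)(?:gb|emb|dbj|sp|ref)\|([^|.]+)(?:\.(\d+))?/) {
        return ($1, $2);    # $2 is undef when there is no ".version"
    }
    return ($id, undef);    # unrecognized format: leave the ID as-is
}

for my $id ('gi|2695847|emb|CAA73704.1|', 'gi|2695851|emb|CAA73709.1|') {
    my ($acc, $ver) = split_ncbi_id($id);
    printf "%s version %s\n", $acc, defined $ver ? $ver : 'none';
}
```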

-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


From jason.stajich at duke.edu  Mon Jan  3 12:18:45 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Mon Jan  3 12:16:19 2005
Subject: [Bioperl-l] Problems parsing Accesion number in FASTA format.
In-Reply-To: <001201c4f1b5$b9b46b40$cf1e5393@Davidg>
References: <001201c4f1b5$b9b46b40$cf1e5393@Davidg>
Message-ID: <861D8C94-5DAB-11D9-B9B4-000393C44276@duke.edu>

I thought someone was going to centralize this function at some point.   
Right now there is a _get_accession_version function in  
Bio::SearchIO::blast.  Perhaps someone would care to make a utility  
module which can export a bunch of useful functions like this?

use strict;
use warnings;
use Bio::SeqIO;

my $seqsfich = Bio::SeqIO->new(-file => "nr.fa", -format => 'Fasta');

while (my $seq = $seqsfich->next_seq()) {
    # pull accession and version out of e.g. gi|2695847|emb|CAA73704.1|
    my ($acc, $ver) = _get_accession_version($seq->display_id);
    $seq->accession_number($acc);
    $seq->version($ver);
    print STDOUT "Sequence accession number: ", $seq->accession_number, "\n";
}

sub _get_accession_version {
    my $id = shift;

    # handle the case when this is accidentally called as a class method
    if (ref($id) && $id->isa('Bio::SearchIO')) {
        $id = shift;
    }
    return undef unless defined $id;
    my ($acc, $version);
    if ($id =~ /(gb|emb|dbj|sp|pdb|bbs|ref|lcl)\|(.*)\|(.*)/) {
        ($acc, $version) = split /\./, $2;
    } elsif ($id =~ /(pir|prf|pat|gnl)\|(.*)\|(.*)/) {
        ($acc, $version) = split /\./, $3;
    } else {
        # punt, not matching the db's at
        # ftp://ftp.ncbi.nih.gov/blast/db/README
        # Database Name                     Identifier Syntax
        # ============================      ========================
        # GenBank                           gb|accession|locus
        # EMBL Data Library                 emb|accession|locus
        # DDBJ, DNA Database of Japan       dbj|accession|locus
        # NBRF PIR                          pir||entry
        # Protein Research Foundation       prf||name
        # SWISS-PROT                        sp|accession|entry name
        # Brookhaven Protein Data Bank      pdb|entry|chain
        # Patents                           pat|country|number
        # GenInfo Backbone Id               bbs|number
        # General database identifier       gnl|database|identifier
        # NCBI Reference Sequence           ref|accession|locus
        # Local Sequence identifier         lcl|identifier
        $acc = $id;
    }
    return ($acc, $version);
}

--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/


From hlapp at gmx.net  Mon Jan  3 12:20:33 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon Jan  3 12:17:18 2005
Subject: [Bioperl-l] Re: score in seqfeature
In-Reply-To: <1104770057.3258.27.camel@localhost.localdomain>
Message-ID: <C6A5484F-5DAB-11D9-8820-000A959EB4C4@gmx.net>

So, what does this mean for SeqFeature::Generic's future? Any input or  
comments, e.g. from the people who've had opinions on that before?

	-hilmar

BTW the digest title as email subject sucks. Sorry.

On Monday, January 3, 2005, at 08:34  AM, Scott Cain wrote:

> Hi Hilmar,
>
> SeqFeature::Annotated (which is what FeatureIO::gff is using) has a
> score method that stores the score as an Annotation::SimpleValue, which
> means it is used like this:
>
>   my $score = $feature->score->value;
>
> which seems to work well for GFF3 (since scores are by definition a
> single value in GFF3).
>
> Scott
>
>
> On Mon, 2005-01-03 at 10:51 -0500, bioperl-l-request@portal.open-bio.org
> wrote:
>> Date: Sun, 2 Jan 2005 16:28:52 -0800
>> From: Hilmar Lapp <hlapp@gnf.org>
>> Subject: [Bioperl-l] score in seqfeature
>> To: Allen Day <allenday@ucla.edu>, Bioperl <bioperl-l@bioperl.org>
>> Message-ID: <71B3DC9D-5D1E-11D9-827C-000A959EB4C4@gnf.org>
>> Content-Type: text/plain; charset=US-ASCII; format=flowed
>>
>> Allen et al, what are the (GFF3-driven?) plans for storing the score
>> property introduced by SeqFeature::Generic?
>>
>> The reason I'm asking is that it doesn't get (de-)serialized in
>> bioperl-db because it's neither defined on SeqFeatureI nor has it been
>> internally stored as a tag/value pair. I'd like to fix this issue, either
>> by pulling it into the annotation bundle in
>> SeqFeature::AnnotationAdapter, or by some other means that maybe is
>> friendlier or more useful to GFF3 minds.
>>
>> 	-hilmar
>> --  
>> -------------------------------------------------------------
>> Hilmar Lapp                            email: lapp at gnf.org
>> GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
>> -------------------------------------------------------------
>
> --  
> ------------------------------------------------------------------------
> Scott Cain, Ph. D.                                         cain@cshl.org
> GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
> Cold Spring Harbor Laboratory
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


From davidg at lsi.upc.edu  Mon Jan  3 12:59:14 2005
From: davidg at lsi.upc.edu (David García Cortés)
Date: Mon Jan  3 12:55:54 2005
Subject: [Bioperl-l] Problems parsing Accesion number in FASTA format.
References: <001201c4f1b5$b9b46b40$cf1e5393@Davidg>
	<861D8C94-5DAB-11D9-B9B4-000393C44276@duke.edu>
Message-ID: <003401c4f1bd$f2e603d0$cf1e5393@Davidg>

Thank you very much. Now it works!!!! :-)



From qfdong at iastate.edu  Mon Jan  3 13:08:26 2005
From: qfdong at iastate.edu (Qunfeng)
Date: Mon Jan  3 13:08:27 2005
Subject: [Bioperl-l] parse long organism name
In-Reply-To: <AA0AC734-5046-11D9-AFFD-000A959EB4C4@gmx.net>
References: <6.1.2.0.2.20041216165914.039e5758@qfdong.mail.iastate.edu>
	<AA0AC734-5046-11D9-AFFD-000A959EB4C4@gmx.net>
Message-ID: <6.1.2.0.2.20050103113653.03ab1020@qfdong.mail.iastate.edu>

I didn't get any error msg.

When I parse the organism name with the following methods:

my $organism = $seq_object->species->binomial();
my $species = $seq_object->species->species();
my $genus = $seq_object->species->genus();
my $common_name = $seq_object->species->common_name();

I got the following value

$organism as Paphiopedilum 'Dark
$species as   Paphiopedilum
$genus as 'Dark
$common_name as  Paphiopedilum 'Dark Roller' x Paphiopedilum 
rothschildianum

So, the common_name is correct, while binomial(), species(), and genus() all 
assume that the name is in correct "Genus species" form.

Qunfeng

At 10:14 AM 12/17/2004, Hilmar Lapp wrote:
>What's the error that you get, if any?
>
>         -hilmar
>
>On Thursday, December 16, 2004, at 03:00  PM, Qunfeng wrote:
>
>>For example,
>>http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&val=47776109
>>
>>It has a LONG name:
>>Paphiopedilum 'Dark Roller' x Paphiopedilum rothschildianum
>>(http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=232838)
>>
>>Is there any way in Bioperl to parse out that long name from a GenBank
>>format file?
>>
>>Thanks!
>>
>>Qunfeng
>--
>-------------------------------------------------------------

From brian_osborne at cognia.com  Mon Jan  3 13:11:05 2005
From: brian_osborne at cognia.com (Brian Osborne)
Date: Mon Jan  3 13:08:29 2005
Subject: [Bioperl-l] Problems parsing Accesion number in FASTA format.
In-Reply-To: <001201c4f1b5$b9b46b40$cf1e5393@Davidg>
Message-ID: <GAEDKMGOKFBLJPKCLKCCKELJEGAA.brian_osborne@cognia.com>

David,

The information you need is returned by the display_id() and desc() methods:
display_id() returns the first word of the header line (what /^>(\S+)/
captures), and desc() returns the rest of the line (what /^>\S+\s+(.+)/
captures).
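Applied to the first header from the question, those two patterns give the following. A minimal sketch in plain Perl that mimics, rather than calls, what the FASTA parser stores:

```perl
use strict;
use warnings;

my $header = '>gi|2695847|emb|CAA73704.1| immunoglobulin heavy chain [Acipenser baerii]';

# First whitespace-delimited word after '>' (what display_id() holds)
my ($display_id) = $header =~ /^>(\S+)/;
# Everything after that first word (what desc() holds)
my ($desc) = $header =~ /^>\S+\s+(.+)/;

print "display_id: $display_id\n";
print "desc:       $desc\n";
```

The accession is therefore buried inside display_id and still has to be parsed out of it.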


Brian O.



From hlapp at gmx.net  Mon Jan  3 13:41:35 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon Jan  3 13:38:22 2005
Subject: [Bioperl-l] parse long organism name
In-Reply-To: <6.1.2.0.2.20050103113653.03ab1020@qfdong.mail.iastate.edu>
Message-ID: <186AB058-5DB7-11D9-8BB0-000A959EB4C4@gmx.net>

To be honest I'm not even sure what binomial is supposed to return 
here. The problem originates from the fact that binomial, species, and 
genus don't store their values redundantly but rather access the 
classification array (kingdom->order->blah->foo etc) at the expected 
locations. common_name, on the contrary, does store its value itself.

I don't feel I'm suited to take this on. If anybody else does please 
don't hesitate to come forward. My gut reaction would be to push more 
towards using the taxonomy classes by Jason et al over the Bio::Species 
class. I'd hope that model would be able to handle such weirdnesses 
better.

	-hilmar

-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


From golharam at umdnj.edu  Mon Jan  3 10:09:28 2005
From: golharam at umdnj.edu (Ryan Golhar)
Date: Mon Jan  3 14:14:29 2005
Subject: [Bioperl-l] Bioperl in 2005
In-Reply-To: <GAEDKMGOKFBLJPKCLKCCIEKEEGAA.brian_osborne@cognia.com>
Message-ID: <000901c4f1a6$38dcb4f0$3400a8c0@GOLHARMOBILE1>

Good idea...

I've attached two files to this message:  Results.pm and Exon.pm.  They
belong in Bio::Tools::Spidey.  If the attachments don't come through,
I'll paste their contents in a message then...

They are used to parse the output of Spidey and work essentially in the
same manner that Bio::Tools::Sim4 works.

Ryan

-----Original Message-----
From: Brian Osborne [mailto:brian_osborne@cognia.com] 
Sent: Saturday, January 01, 2005 11:18 AM
To: golharam@umdnj.edu; 'Jason Stajich'; 'Bioperl List'
Subject: RE: [Bioperl-l] Bioperl in 2005


Ryan,

You could post it to bioperl-l; someone will commit it to CVS.

Brian O.


-----Original Message-----
From: bioperl-l-bounces@portal.open-bio.org
[mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Ryan Golhar
Sent: Thursday, December 30, 2004 12:33 PM
To: 'Jason Stajich'; 'Bioperl List'
Subject: RE: [Bioperl-l] Bioperl in 2005


Hi all,

I'd like to contribute a parser module to parse Spidey results.  I took
the sim4 parser and modified a little bit to properly read in spidey
results.  Everything else about it works the same as the sim4 parser as
far as I can tell.

How can I contribute this module?


-----
Ryan Golhar
Computational Biologist
The Informatics Institute at
The University of Medicine & Dentistry of NJ

Phone: 973-972-5034
Fax: 973-972-7412
Email: golharam@umdnj.edu

-----Original Message-----
From: bioperl-l-bounces@portal.open-bio.org
[mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Jason
Stajich
Sent: Wednesday, December 29, 2004 5:46 PM
To: Bioperl List; bioperl-announce-l@bioperl.org
Subject: [Bioperl-l] Bioperl in 2005


I just wanted to use the end of the year as a chance to reflect on what 
we've accomplished in 2004 and think about what 2005 holds for Bioperl.

What happened in 2004?
First of all, this year has been really productive at a level 
perhaps only appreciated by the folks who read the bioperl-guts-l list 
which lists the CVS commits.  New modules, bugfixes and code 
improvements have been steadily making their way into the codebase.  
Not only has there been lots of traffic, but more people are 
contributing code and fixes.

We have also seen increased contributions to the HOWTOs which we hope 
will be an effective place to explain how to use sets of modules to 
complete a particular task.  We are continually working to improve the 
documentation.  This is a balance between a developer trying to get 
something accomplished for their own research and wanting other people 
to use their code (and not wanting to field lots of emails about a 
particular module).  Open source software written solely by volunteers
suffers from a reward system which values code over 
documentation and writing tutorials.  We welcome ideas on changes which 
would help this and are currently thinking about ways to reward the 
productive documenters as well as coders.

We were able to hold a 5-day Bootcamp in June thanks to Sylvain 
Foisy, the University of Montreal and the Quebec Bioinformatics Network 
(BioneQ).  We hope to do another one of these in 2006.  If there is 
general interest in more widespread Bioperl tutorials, please forward 
your suggestions to me or the bioperl list and we can consider how 
something like this could be organized in conjunction with a conference 
or meeting.


How popular is Bioperl?
The 2002 paper has 60+ citations according to Web of Science and we're 
seeing use in a broader context than just sequence analysis.  At least 
one published paper about modules which were already part of the 
codebase has appeared, suggesting that software availability and 
collaboration can happen prior to publication.  The website 
consistently gets around 300,000 hits per month, which isn't bad 
considering that the content doesn't change very much and this is just 
a site for one toolkit for a specific aspect of science.  The bioperl-l 
mailing list has seen an average of 341 mails per month (not correcting 
for spam), with a lot of questions answered and ideas hashed 
out.


How can you help out?
I want to use this chance to also appeal to those who use Bioperl and 
have been sitting on their hands waiting to jump in.  It is a 
collaborative project that only works if new people jump in and 
contribute ideas and manpower.  We've had many examples of people who 
have just jumped on board the project, fixed some bugs, contributed a 
module and went on their merry way.  We've also had other people who 
have jumped in, contributed code, and found themselves fully engaged in 
the project and its internal workings almost immediately.  Not to wax 
poetic, but it was about 5 years ago that fresh out of college, I 
started reading the mailing list, read Steve Chervitz's email plea for 
people to "ask not what Bioperl can do for you, ask what you can do for 
Bioperl" 
(http://bioperl.org/pipermail/bioperl-l/1999-December/003354.html) and 
just jumped right in.  I can only hope to influence some more folks who 
might have wanted to contribute but were waiting for the invitation.  
Well come on over, we'd love to have you taking part.

   As for some specifics:
   - Parsing of species information out of the ORGANISM lines in 
SwissProt, GenBank, and EMBL is pretty spotty and could use some work.
   - More parsers for formats that people have asked for, such as a Spidey
parser (NCBI's mRNA -> genomic alignment tool).
   - Work on the Structure modules for dealing with protein structure 
data.
   - Integrate new applications into bioperl-run and further clean up the
existing modules so they are more consistent.
   - Volunteer to be the next release master.

What does the future hold for Bioperl?
We expect to have a 1.5 release of bioperl in the first quarter of 2005; 
this is the domain of Aaron Mackey, who agreed to be the release master 
(he has his hands full right now, but I'm sure he will ask for help when 
he needs it).  This should incorporate many new modules and bug fixes 
but also be compatible with the 1.4 API.  Details on the schedule 
for 1.5 will follow sometime after the holidays.

The future depends entirely on who steps up to work on the project next 
year.  In 2005, I am resolving to step back from the front line of 
mailing list question answering.  This is partly so I can finish my PhD 
research and focus on building more specific tools to support my 
research questions, but also because it is time for other people to 
contribute, share the spotlight, and be a know-it-all.  Bioperl is very much a 
labor of love and it is an integral part of the tools I use in my own 
work so I expect to focus more directly on those things I need in the 
coming year and help out where I can.

My hope is that some of the new folks who have stepped up to contribute 
will help by continuing the course we have set to have high quality 
releases, a full test suite, POD documentation for every module, and 
overall documentation for using modules in HOWTOs and tutorials.  If 
there are new or unexplored areas the project should consider I hope 
that you will speak up and suggest them.

There is discussion afoot that a new Bioperl object model may be 
born.  This has been called Bioperl2 and Bioperl-NG.  The idea is that it 
would try to create a leaner and cleaner code base which does 
things like event-based parsing and autogenerated code for things like 
getters/setters, and could do things faster and more easily than we can 
currently.  Generally there is a lot of legacy code and legacy design 
in Bioperl and it would be beneficial to have a project that was free 
of these constraints.  At the same time there is an expectation that a 
project like this would need to achieve something that the current 
Bioperl API cannot do, so it is incumbent on the new project to 
have goals that are higher than what Bioperl can already do.


Thank you
I'd like to finally thank some people who have done a lot this year.  
Of course I'm not going to remember to name everyone, but I just wanted 
to highlight some folks who have endeavored not only to get the toolkit to 
do what they want, but also to help out other people get started with 
it.

The people who have kept the project going.  These are the usual suspects 
who have labored to do the dirty grunt work: cleaning up boring bugs, 
adding documentation, preparing releases, keeping the servers going, 
etc.  They code too, but I wanted to highlight that they have really 
been critical to keeping the project going by doing the things that 
most people don't want to bother with.
Brian Osborne
Aaron Mackey
Chris Dagdigian
Kyle Jenson  (mailing list and site searching at 
http://search.open-bio.org)

Some usual suspects who have been helping maintain their modules and 
generally being Bioperl knowledgeable on the list:
Scott Cain
Steve Chervitz
Allen Day
Donald Jackson
Stefan Kirov
Hilmar Lapp
Josh Lauricha
Heikki Lehvaslaiho
Chris Mungall
Jurgen Plentinckx
Lincoln Stein

There are several new people who have taken up the slack as those 
before them have drifted onto other commitments (metaphoric slack of 
course, not trying to accuse anyone of being a 'slacker').  Thanks for 
jumping in, fixing bugs, running tests, giving feedback, and just 
getting involved.  It is really encouraging when the project can be a 
2-way street and not just a one-way flow of information going out from a 
few people who post answers to the list.
Richard Adams
Sean Davis
Rob Edwards
Nathan Haigh
Marc Logghe
Barry Moore
Remo Sanges
James Thompson
Koen van der Drift (Bioperl available via fink on OS X)

Thanks also to Peter van Heusden and Electric Genetics, who are 
undertaking a code audit of Bioperl and should have plenty of helpful 
feedback for us.

I've probably forgotten some people, please post a followup if I have 
neglected someone as I would like you to be recognized for your work 
since we don't give out a whole lot else right now.

A safe and prosperous New Year to you all.

Jason Stajich on behalf of the Bioperl core developers.
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

_______________________________________________
Bioperl-l mailing list
Bioperl-l@portal.open-bio.org
http://portal.open-bio.org/mailman/listinfo/bioperl-l

-------------- next part --------------
A non-text attachment was scrubbed...
Name: Exon.pm
Type: application/octet-stream
Size: 5201 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050103/65408f91/Exon-0001.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Results.pm
Type: application/octet-stream
Size: 12607 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050103/65408f91/Results-0001.obj
From Peter.Robinson at t-online.de  Mon Jan  3 17:40:01 2005
From: Peter.Robinson at t-online.de (Peter Robinson)
Date: Mon Jan  3 17:36:25 2005
Subject: [Bioperl-l] Entrez Gene and bioperl-db
In-Reply-To: <2ED9C47A-5898-11D9-AC01-000A959EB4C4@gmx.net>
References: <2ED9C47A-5898-11D9-AC01-000A959EB4C4@gmx.net>
Message-ID: <1104792001.3186.17.camel@localhost.localdomain>

Hi Bioperlers, hi Hilmar,

After some thinking I have embarked on a lex/yacc parser for the Entrez
Gene ASN.1 format as the path of least resistance, although I am not sure
how that would fit in to BioPerl. If anyone is interested in this (or
has a better idea of how to go about it..), please drop me a line.

In the meantime I have been looking at writing code to parse some of the
"easy" Entrez gene documents, starting off with gene_info. This file
includes the NCBI taxon id for each entry. I would like to convert this
to a Bio::Species object to pass to the following
	my $seq = $self->sequence_factory->create(
			     -verbose => $self->verbose(),
			     -accession_number => $geneID,  
			     -desc => $description,
			     -display_id => $symbol,
			     -species =>  ??? 
			     -annotation => $ann);

I also looked at the Bio::Taxonomy::FactoryI code, which appears to be
intended for this sort of thing. However, the code for that is pretty preliminary. Is
anyone working on this at the moment? Or is there a better way of doing
this (it seems a shame not to provide the actual species name if one has
the taxid...)

best

Peter


On Tue, 2004-12-28 at 07:17, Hilmar Lapp wrote:
> Great to hear that someone is giving this a shot. Yes, at this point it
> appears that NCBI is only offering the ASN.1, not a conversion to XML.
> Their asn2xml tool will not work with this ASN.1 format either; I just
> checked it to be sure. They do seem to be mulling over the option of XML
> in the Gene FAQ, though. Maybe if enough people get in their ears they
> will spend some effort towards that. After all, the Entrez Gene web
> interface can display XML on demand, even though it looks fairly
> hideous.
> 
> There is no ASN.1 support in bioperl at all. Also, ASN.1 support in  
> perl is actually thin - there is Convert::ASN1 at version 0.18 two  
> years ago that I could find ... doesn't make me feel warm and fuzzy.
> 
> In the absence of any XML available from NCBI, gene_info might be the  
> best start. An option could be to check for the presence of the other  
> tab-delimited files and use those that are present. These are  
> tab-delimited and hence the format itself is trivial so you can focus  
> entirely on setting up a Bio::Seq plus annotation that's  
> comparable/compatible to what the current SeqIO::locuslink does.
> 
> My $0.02 (worth less and less almost every day).
> 
> 	-hilmar
> 
> On Thursday, December 23, 2004, at 10:51  AM, Peter Robinson wrote:
> 
> > Hi,
> >
> > I have been thinking about giving a BioPerl EntrezGene parser a try
> > since I have been a heavy user of LocusLink to date. One issue is that
> > the files that correspond to LL_tmpl (which was a flat file) are now in
> > ASN.1 format:
> > http://www.ncbi.nlm.nih.gov/entrez/query/static/help/genehelp.html#query
> > Although I saw some mention of ASN support in Bioperl by googling, I
> > can't seem to find any module that does this in the present
> > distribution. What is the status on that? In any case, I will be
> > working on this in the next month or two and if anything nice comes of
> > it I will send it to you / BioPerl.
> >
> > best wishes & happy holidays
> >
> > Peter
> >
> > On Tue, 2004-12-14 at 09:00, Hilmar Lapp wrote:
> >> Since load_seqdatabase.pl will use bioperl's SeqIO parsers for parsing
> >> any input file, what you're asking is whether or not there is a SeqIO
> >> parser for NCBI Gene.
> >>
> >> The answer to that question is no, not yet. Anybody who feels motivated
> >> is welcome to give it a try ... Since I'll need it, I'll write the
> >> parser if nobody else does within the next 3 months, but I'm not going
> >> to promise when exactly this will happen.
> >>
> >> 	-hilmar
> >>
> >> On Monday, December 13, 2004, at 08:03  AM, Law, Annie wrote:
> >>
> >>> Hi,
> >>>
> >>> I was wondering, with regard to bioperl-db (the scripts, the schema and
> >>> load_seqdatabase.pl), whether there has been any preparation for
> >>> integrating Entrez Gene information when LocusLink is phased out? Or, if
> >>> it has already been changed, could somebody point me to the
> >>> documentation or changed code?
> >>>
> >>> Thanks,
> >>> Annie.
> >>> _______________________________________________
> >>> Bioperl-l mailing list
> >>> Bioperl-l@portal.open-bio.org
> >>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
> >>>
> >>>
> > -- 
> > Peter N. Robinson
> > peter.robinson@t-online.de
> > peter.robinson@charite.de
> > http://www.charite.de/ch/medgen/robinson/
> >
> >
-- 
Peter N. Robinson
peter.robinson@t-online.de
peter.robinson@charite.de
http://www.charite.de/ch/medgen/robinson/

From jason.stajich at duke.edu  Tue Jan  4 09:33:30 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Tue Jan  4 09:34:55 2005
Subject: [Bioperl-l] Re: bug 1727
In-Reply-To: <1104835270.13556.9.camel@sb289.gbf-braunschweig.de>
References: <41D32343.4070007@gbf.de>
	<348C2892-59E3-11D9-B264-000393C44276@gmail.com>
	<1104835270.13556.9.camel@sb289.gbf-braunschweig.de>
Message-ID: <9ADB6CF6-5E5D-11D9-9C0C-000393C44276@duke.edu>

There is some guessing done if you do not supply a -format => 
$formatstring when you initialize the Bio::SeqIO object.
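Jason's reply implies that no format guessing happens once `-format` is given. Guido's test suggests that the fasta reader then accepts input whose header does not begin with '>'. If an early sanity check is wanted, one option is a small pre-check before constructing the Bio::SeqIO object. The sketch below is a hypothetical helper (`looks_like_fasta` is not a BioPerl function) and a deliberately loose heuristic. It requires that the first non-blank line start with '>' and that later lines contain only sequence-like characters. That is enough to reject a header such as `gene_name>`.

```perl
use strict;
use warnings;

# Return true if the stream looks like FASTA: the first non-blank line
# must start with '>', and every later non-header line may contain only
# letters, '*', '-' or whitespace. A loose heuristic, not a validator.
sub looks_like_fasta {
    my ($fh) = @_;
    my $seen_header = 0;
    while (my $line = <$fh>) {
        next if $line =~ /^\s*$/;              # skip blank lines
        if ($line =~ /^>/) {
            $seen_header = 1;
            next;
        }
        return 0 unless $seen_header;          # sequence before any header
        return 0 unless $line =~ /^[A-Za-z*\-\s]+$/;
    }
    return $seen_header;                       # empty input is not FASTA
}
```

The check consumes the filehandle, so the file has to be reopened before it is passed to Bio::SeqIO. For a regular file, rewinding the handle with `seek` also works.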

Please direct the questions to the mailing list.

-jason
On Jan 4, 2005, at 5:41 AM, Guido Dieterich wrote:

>  Hi Jason,
>
>  does Bio::SeqIO check whether a file or filehandle is in the
> appropriate format, e.g. FASTA?
>  It seems not, right?
>
>  Guido
>
>
>
>
>
>
> Bio::SeqIO::swiss I believe.
>
> -jason
> On Dec 29, 2004, at 4:36 PM, gdi wrote:
>
> > Hi jason,
> >
> > in which module is (was) the bug?
> >
> > Best regards Guido
> >
> >
> --
> Jason Stajich
> jason.stajich-at-gmail.com or jason-at-bioperl.org
> http://jason.open-bio.org
>
> -- 
>
>
> Dr. Guido Dieterich
> Dipl.-Biologe
>
> BioComputing
> SB  - Strukturbiologie                                     \==-|
> GBF - Gesellschaft fuer Biotechnologische Forschung         \=/   0010010010100101110010
>       German Research Centre for Biotechnology              /-\
>                                                            /-==|  0010100100111101010010
> WWW: http://www.gbf.de         _/_/_/   _/_/_/   _/_/_/   |==-/
> EMAIL: gdi@gbf.de            _/    _/  _/   _/  _/         \=/    0100100100010010010101
>                             _/        _/   _/  _/          /\
> Mascheroder Weg 1          _/  _/    _/_/_/   _/_/_/      /=-\    1101001010100101010101
> D-38124 Braunschweig      _/    _/  _/   _/  _/
> Tel: +(49) 531 6181 745  _/    _/  _/   _/  _/
> FAX: +(49) 531 2612 388   _/_/_/  _/_/_/   _/
>
> http://www.gbf.de/sb
>
>
> It is not enough to know; one must also apply.
> It is not enough to will; one must also do.
> JOHANN WOLFGANG VON GOETHE
> German poet
> (1749 - 1832)
>
>
>
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: text/enriched
Size: 2492 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050104/dd97b16d/attachment.bin
From rousse at ccr.jussieu.fr  Tue Jan  4 08:11:36 2005
From: rousse at ccr.jussieu.fr (Guillaume Rousse)
Date: Tue Jan  4 09:34:57 2005
Subject: [Bioperl-l] Need help for implementing a new TreeIO module
In-Reply-To: <17F1ACAA-5DA9-11D9-B9B4-000393C44276@duke.edu>
References: <41D96A82.8070202@ccr.jussieu.fr>	<6ACDABCF-5DA2-11D9-B9B4-000393C44276@duke.edu>	<41D97504.7020808@ccr.jussieu.fr>
	<17F1ACAA-5DA9-11D9-B9B4-000393C44276@duke.edu>
Message-ID: <41DA9608.9080900@ccr.jussieu.fr>

Jason Stajich wrote:
> If you want to build a node with two leaves, first you have to start 
> with a 'tree' section to tell the handler that this is nested data.
> Start a 'tree' event, build the node (like the section just above), then 
> build two leaf nodes (like the leaf node section above), then end the 
> 'tree' event.  'tree' is an unfortunate name for the event but don't 
> feel like changing it - a throwback from when I thought I'd only need an 
> initial 'tree' and just 'node' events.
> 
> $self->_eventHandler->start_document;
> $self->_eventHandler->start_element({'Name' => 'tree'});
> # do internal node
>   # do leaf node
>   # do leaf node
> $self->_eventHandler->end_element({'Name' => 'tree'});
> return $self->_eventHandler->end_document;
OK, done, but I still have an issue: each internal node connecting 
two leaves produces a third, intermediate leaf. I don't know if the 
problem comes from me or from bioperl. Here is my code, along with a 
test script.

If you don't want to install Algorithm::Cluster to test, the input data 
looks something like this:
  -1:   5   4   0.000
  -2:   7   6   0.000
  -3:  10  11   0.010
  -4:   2   0   0.090
  -5:  -3  12   0.095
  -6:   1  -4   0.115
  -7:  -5   9   0.143
  -8:  -1   3   0.250
  -9:  -2  -7   0.618
-10:  -8  -6   0.639
-11:   8 -10   5.805
-12:  -9 -11  28.056
where the first column is the internal node id, the second and third are 
the ids of that node's two children (negative ids refer to other internal 
nodes, non-negative ids to leaves), and the fourth is the distance 
between the children.
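The row layout above can be turned directly into a Newick string. Row k describes internal node -(k+1), and non-negative ids index into the label list. This gives an independent check of the tree structure without going through the event handler. The sketch below is a hypothetical helper and not part of Bio::TreeIO. Its branch-length convention is an assumption: leaves sit at height 0, each internal node at its linkage distance, and each branch is the parent height minus the child height. That differs from cluster.pm, which uses the parent row's linkage distance directly.

```perl
use strict;
use warnings;

# Convert treecluster-style output into a Newick string.
#   $result   : array ref of [left, right] pairs; row $k is node -($k+1)
#   $linkdist : array ref; $linkdist->[$k] is the distance joining row $k
#   $labels   : array ref of leaf names, indexed by non-negative ids
sub cluster_to_newick {
    my ($result, $linkdist, $labels) = @_;
    my $root_row = $#{$result};                  # last row joins everything
    return _subtree($result, $linkdist, $labels, -($root_row + 1)) . ';';
}

sub _subtree {
    my ($result, $linkdist, $labels, $id) = @_;
    return $labels->[$id] if $id >= 0;           # leaf: just its label
    my $row    = -$id - 1;                       # node -1 is row 0, etc.
    my $height = $linkdist->[$row];
    my @parts;
    for my $child (@{ $result->[$row] }) {
        my $child_height = $child >= 0 ? 0 : $linkdist->[-$child - 1];
        push @parts, sprintf('%s:%g',
            _subtree($result, $linkdist, $labels, $child),
            $height - $child_height);
    }
    return '(' . join(',', @parts) . ')';
}
```

Each internal node here gets exactly its two children, so the string is a useful reference when comparing against the tree built through the event handler. It could be read back with Bio::TreeIO using `-format => 'newick'`, for example through an IO::String filehandle.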

I also patched svggraph to use parameters instead of hard-coded values, 
and to allow some normalisation of the branch lengths, in such a 
way that it would be easy to add new normalisation functions, including 
arbitrary code. Patch attached too.
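The attached patch handles normalisation with a CASE block that only knows `log`. One way to make new normalisation functions easy to plug in, including arbitrary code, is a dispatch table keyed by name that also accepts a code reference. The sketch below is hypothetical: `make_normalizer` and the `sqrt` entry are illustrative and not part of svggraph.pm. It also treats an absent `-normalize` as "leave lengths unchanged", which avoids comparing an undefined value with `eq`.

```perl
use strict;
use warnings;

# Named branch-length normalisers; add entries here to support new names.
my %NORMALIZER = (
    log  => sub { log($_[0] + 1) },    # same transform as the patch
    sqrt => sub { sqrt($_[0]) },       # illustrative extra entry
);

# Resolve a -normalize argument into a code ref: undef means "leave lengths
# alone", a code ref is used as-is, and a string is looked up by name.
sub make_normalizer {
    my ($spec) = @_;
    return sub { $_[0] } unless defined $spec;
    return $spec if ref $spec eq 'CODE';
    my $fn = $NORMALIZER{$spec}
        or die "unknown normalisation '$spec'\n";
    return $fn;
}
```

With the code ref resolved once in `_initialize`, the per-child CASE block in `_decorateRoot` could shrink to something like `$length = $normalize->($length)`.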
-- 
No flight ever leaves on time unless you are running late and need the 
delay to make the flight
		-- Murphy's Laws for Frequent Flyers n°1
-------------- next part --------------
#!/usr/bin/perl 

use Algorithm::Cluster;
use Bio::TreeIO;
use strict;

my $weight =  [ 1,1 ];

my $data =  [
	[ 1.1, 1.2 ],
	[ 1.4, 1.3 ],
	[ 1.1, 1.5 ],
	[ 2.0, 1.5 ],
	[ 1.7, 1.9 ],
	[ 1.7, 1.9 ],
	[ 5.7, 5.9 ],
	[ 5.7, 5.9 ],
	[ 3.1, 3.3 ],
	[ 5.4, 5.3 ],
	[ 5.1, 5.5 ],
	[ 5.0, 5.5 ],
	[ 5.1, 5.2 ],
];

my $mask =  [
	[ 1, 1 ],
	[ 1, 1 ],
	[ 1, 1 ],
	[ 1, 1 ],
	[ 1, 1 ],
	[ 1, 1 ],
	[ 1, 1 ],
	[ 1, 1 ],
	[ 1, 1 ],
	[ 1, 1 ],
	[ 1, 1 ],
	[ 1, 1 ],
	[ 1, 1 ],
];

my $labels = [ qw/a b c d e f g h i j k l m/ ];

my %params = (
	applyscale =>         0,
	transpose  =>         0,
	method     =>       'a',
	dist       =>       'e',
	data      =>    $data,
	mask      =>    $mask,
	weight    =>  $weight,
);

my ($result, $linkdist);
my ($i,$j);

($result, $linkdist) = Algorithm::Cluster::treecluster(%params);

$i=0;
foreach(@{$result}) {
	printf("%3d: %3d %3d %7.3f\n",-1-$i,$_->[0],$_->[1],$linkdist->[$i]);
	++$i;
}

my $in = Bio::TreeIO->new(
    -format   => 'cluster',
    -result   => $result,
    -linkdist => $linkdist,
    -labels   => $labels,
);
my $out = Bio::TreeIO->new(
    -format => 'svggraph',
    -file   => '>output.svg'
);
$out->write_tree($in->next_tree());
-------------- next part --------------
# $Id$
#
# BioPerl module for Bio::TreeIO::cluster
#
# Contributed by Guillaume Rousse <Guillaume-dot-Rousse-at-inria-dot-fr>
#
# Copyright INRIA
#
# You may distribute this module under the same terms as perl itself

# POD documentation - main docs before the code

=head1 NAME

Bio::TreeIO::cluster - A TreeIO driver module for parsing Algorithm::Cluster::treecluster output

=head1 SYNOPSIS

  # do not use this module directly
  use Bio::TreeIO;
  use Algorithm::Cluster;
  my ($result, $linkdist) = Algorithm::Cluster::treecluster(
    distances => $matrix
  );
  my $treeio = new Bio::TreeIO(
    -format   => 'cluster',
    -result   =>  $result,
    -linkdist => $linkdist,
    -labels   => $labels
  );
  my $tree = $treeio->next_tree;

=head1 DESCRIPTION

This is a driver module for parsing Algorithm::Cluster::treecluster output.

=head1 FEEDBACK

=head2 Mailing Lists

User feedback is an integral part of the evolution of this and other
Bioperl modules. Send your comments and suggestions preferably to
the Bioperl mailing list.  Your participation is much appreciated.

  bioperl-l@bioperl.org              - General discussion
  http://bioperl.org/MailList.shtml  - About the mailing lists

=head2 Reporting Bugs

Report bugs to the Bioperl bug tracking system to help us keep track
of the bugs and their resolution. Bug reports can be submitted via
the web:

  http://bugzilla.bioperl.org/

=head1 AUTHOR - Guillaume Rousse

Email Guillaume-dot-Rousse-at-inria-dot-fr

=head1 APPENDIX

The rest of the documentation details each of the object methods.
Internal methods are usually preceded with a _

=cut


# Let the code begin...


package Bio::TreeIO::cluster;
use vars qw(@ISA);
use strict;

use Bio::TreeIO;
use Bio::Event::EventGeneratorI;
use IO::String;

@ISA = qw(Bio::TreeIO);

sub _initialize {
  my ($self, %args) = @_;
  $self->{_result}   = $args{'-result'};
  $self->{_linkdist} = $args{'-linkdist'};
  $self->{_labels}   = $args{'-labels'};
  $self->SUPER::_initialize(%args);
}

=head2 next_tree

 Title   : next_tree
 Usage   : my $tree = $treeio->next_tree
 Function: Gets the next tree in the stream
 Returns : Bio::Tree::TreeI
 Args    : none


=cut

sub next_tree {
    my ($self) = @_;

    $self->_eventHandler->start_document();

    # build tree from the root
    $self->_eventHandler->start_element({Name => 'tree'});
    $self->_recurse(-1, 0);
    $self->_recurse(-1, 1);
    $self->_eventHandler->end_element({Name => 'tree'});

    return $self->_eventHandler->end_document;
}

sub _recurse {
    my ($self, $line, $column) = @_;

    my $id  = $self->{_result}->[$line]->[$column];
    if ($id >= 0) {
	# leaf
	$self->debug("leaf $id\n");
	$self->debug("distance $self->{_linkdist}->[$line]\n");
	$self->debug("label $self->{_labels}->[$id]\n");
	$self->_eventHandler->start_element({Name => 'node'});
	$self->_eventHandler->start_element({Name => 'branch_length'});
	$self->_eventHandler->characters($self->{_linkdist}->[$line]);
	$self->_eventHandler->end_element({Name => 'branch_length'});
	$self->_eventHandler->start_element({Name => 'id'});
	$self->_eventHandler->characters($self->{_labels}->[$id]);
	$self->_eventHandler->end_element({Name => 'id'});
	$self->_eventHandler->start_element({Name => 'leaf'});
	$self->_eventHandler->characters(1);
	$self->_eventHandler->end_element({Name => 'leaf'});
	$self->_eventHandler->end_element({Name => 'node'});
    } else {
	# internal node
	$self->debug("internal node $id\n");
	$self->debug("distance $self->{_linkdist}->[$line]\n");
	$self->_eventHandler->start_element({Name => 'node'});
	$self->_eventHandler->start_element({Name => 'branch_length'});
	$self->_eventHandler->characters($self->{_linkdist}->[$line]);
	$self->_eventHandler->end_element({Name => 'branch_length'});
	$self->_eventHandler->start_element({Name => 'leaf'});
	$self->_eventHandler->characters(0);
	$self->_eventHandler->end_element({Name => 'leaf'});
	$self->_eventHandler->start_element({Name => 'tree'});
	my $child_id = - ($id + 1);
	$self->_recurse($child_id, 0);
	$self->_recurse($child_id, 1);
	$self->_eventHandler->end_element({Name => 'tree'});
	$self->_eventHandler->end_element({Name => 'node'});
    }
}

=head2 write_tree

 Title   : write_tree
 Usage   :
 Function: Sorry not possible with this format
 Returns : none
 Args    : none


=cut

sub write_tree{
    $_[0]->throw("Sorry the format 'cluster' can only be used as an input format");
}

1;
-------------- next part --------------
--- /usr/lib/perl5/vendor_perl/5.8.6/Bio/TreeIO/svggraph.pm	2003-11-28 07:27:16.000000000 +0100
+++ Bio/TreeIO/svggraph.pm	2005-01-04 13:57:14.265334869 +0100
@@ -86,22 +86,16 @@
 
 @ISA = qw(Bio::TreeIO );
 
-=head2 new
-
- Title   : new
- Usage   : my $obj = new Bio::TreeIO::svggraph();
- Function: Builds a new Bio::TreeIO::svggraph object 
- Returns : Bio::TreeIO::svggraph
- Args    :
-
-
-=cut
-
-sub new {
-  my($class,@args) = @_;
-
-  my $self = $class->SUPER::new(@args);
-
+sub _initialize {
+  my ($self, %args) = @_;
+  $self->{_width}        = $args{'-width'} || 1600;
+  $self->{_height}       = $args{'-height'} || 1000;
+  $self->{_margin}       = $args{'-margin'} || 30;
+  $self->{_stroke}       = $args{'-stroke'} || 'black';
+  $self->{_stroke_width} = $args{'-stroke_width'} || 2;
+  $self->{_font_size}    = $args{'-font_size'} || '10px';
+  $self->{_normalize}    = $args{'-normalize'};
+  $self->SUPER::_initialize(%args);
 }
 
 =head2 write_tree
@@ -116,28 +110,35 @@
 
 sub write_tree{
    my ($self,$tree) = @_;
-   my $line = _write_tree_Helper($tree->get_root_node);
+   my $line = $self->_write_tree_Helper($tree->get_root_node);
    $self->_print($line. "\n");
    $self->flush if $self->_flush_on_write && defined $self->_fh;
    return;
 }
 
 sub _write_tree_Helper {
-   my ($node) = @_;
+   my ($self,$node) = @_;
 
-   #this needs to be parameterized
-   my $graph = SVG::Graph->new(width=>1600,height=>1000,margin=>30);
+   my $graph = SVG::Graph->new(
+       width  => $self->{_width},
+       height => $self->{_height},
+       margin => $self->{_margin}
+   );
 
    my $group0 = $graph->add_frame;
    my $tree = SVG::Graph::Data::Tree->new;
    my $root = SVG::Graph::Data::Node->new;
    $root->name($node->id);
-   _decorateRoot($root, $node->each_Descendent());
+   $self->_decorateRoot($root, $node->each_Descendent());
    $tree->root($root);
    $group0->add_data($tree);
 
-   #this needs to be parameterized
-   $group0->add_glyph('tree', stroke=>'black','stroke-width'=>2,'font-size'=>'10px');
+   $group0->add_glyph(
+       'tree',
+       'stroke'       => $self->{_stroke},
+       'stroke-width' => $self->{_stroke_width},
+       'font-size'    => $self->{_font_size}
+   );
 
    return($graph->draw);
 }
@@ -156,16 +157,21 @@
 =cut
 
 sub _decorateRoot{
-  my $previousNode = shift;
-  my @children = @_;
-   foreach my $child (@children)
-	 {
-	   my $currNode = SVG::Graph::Data::Node->new;
-	   $currNode->branch_label($child->id);
-	   $currNode->branch_length($child->branch_length);
-	   $previousNode->add_daughter($currNode);
-	   _decorateRoot($currNode, $child->each_Descendent());
-	 }
+  my ($self,$previousNode,@children) = @_;
+  foreach my $child (@children) {
+    my $currNode = SVG::Graph::Data::Node->new;
+    $currNode->branch_label($child->id);
+    my $length = $child->branch_length;
+    CASE: {
+      if ($self->{_normalize} eq 'log') {
+	$length = log($length + 1);
+	last CASE;
+      }
+    }
+    $currNode->branch_length($length);
+    $previousNode->add_daughter($currNode);
+    $self->_decorateRoot($currNode, $child->each_Descendent());
+  }
 }
 
 =head2 next_tree
From jason.stajich at duke.edu  Tue Jan  4 10:45:53 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Tue Jan  4 10:42:27 2005
Subject: [Bioperl-l] Need help for implementing a new TreeIO module
In-Reply-To: <41DA9608.9080900@ccr.jussieu.fr>
References: <41D96A82.8070202@ccr.jussieu.fr>	<6ACDABCF-5DA2-11D9-B9B4-000393C44276@duke.edu>	<41D97504.7020808@ccr.jussieu.fr>
	<17F1ACAA-5DA9-11D9-B9B4-000393C44276@duke.edu>
	<41DA9608.9080900@ccr.jussieu.fr>
Message-ID: <B74F0B72-5E67-11D9-9C0C-000393C44276@duke.edu>

Okay. It is committed in CVS.  Can see about getting you a CVS account 
if you want to adapt this more.

I made the argument initialization a little more bioperl-like (using 
_rearrange).   Your example code produces a sensible tree I believe, 
can you confirm that it works fine on your end too? (it may take up to 
15 minutes for the anon CVS repository to sync from the read-write 
one).

-jason


--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

From rasa at obj.hopto.org  Tue Jan  4 11:31:07 2005
From: rasa at obj.hopto.org (Rasa Gulbinaite)
Date: Tue Jan  4 11:28:52 2005
Subject: [Bioperl-l] Fasta headers
In-Reply-To: <GAEDKMGOKFBLJPKCLKCCCELEEGAA.brian_osborne@cognia.com>
References: <1267.81.7.113.128.1104763270.squirrel@81.7.113.128>
	<GAEDKMGOKFBLJPKCLKCCCELEEGAA.brian_osborne@cognia.com>
Message-ID: <1221.81.7.113.128.1104856267.squirrel@81.7.113.128>

Thank you all very much. It was really helpful.


Rasa

> Rasa,
>
> On Unix:
>
>>grep '>' fasta-file
>
> Anywhere, with Perl:
>
>>perl -ne 'print if /^>/' fasta-file
>
>
> Brian O.
>
>
> -----Original Message-----
> From: bioperl-l-bounces@portal.open-bio.org
> [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Rasa
> Gulbinaite
> Sent: Monday, January 03, 2005 9:41 AM
> To: bioperl-l@portal.open-bio.org
> Subject: [Bioperl-l] Fasta headers
>
>
> Hello,
>
> I'm new to bioperl and a bit confused by fasta file headers. I'm working
> with SNPs and I would like to get only the fasta headers from the fasta
> file, not the sequences. What would be the best way to do this? Thank you.
>
> Rasa
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
>
>
>
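Brian's one-liners print the header lines verbatim. Sometimes the identifier and description are wanted separately. By the usual convention, the ID is the text after '>' up to the first whitespace, and the rest is the description; this is roughly what Bio::SeqIO's fasta reader reports as display_id and desc. A small pure-Perl helper can do the split. The sketch below is hypothetical (`fasta_headers` is not a BioPerl function). Anchoring the match with `^>` also keeps it from picking up a stray '>' in the middle of a line, which `grep '>'` would match.

```perl
use strict;
use warnings;

# Collect FASTA header lines from a filehandle, splitting each into an ID
# (text up to the first whitespace) and an optional description.
sub fasta_headers {
    my ($fh) = @_;
    my @headers;
    while (my $line = <$fh>) {
        next unless $line =~ /^>(\S*)\s*(.*?)\s*$/;
        push @headers, { id => $1, desc => $2 };
    }
    return @headers;
}

# Example: print "ID<tab>description" for the file named on the command line.
if (@ARGV) {
    open my $in, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!\n";
    print "$_->{id}\t$_->{desc}\n" for fasta_headers($in);
}
```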


From dcn208 at nyu.edu  Tue Jan  4 12:06:50 2005
From: dcn208 at nyu.edu (Damion Colin Nero)
Date: Tue Jan  4 14:50:55 2005
Subject: [Bioperl-l] Script Request
Message-ID: <4c46b64c3694.4c36944c46b6@nyu.edu>

An HTML attachment was scrubbed...
URL: http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050104/5764a4c2/attachment.htm
From Guido.Dieterich at gbf.de  Tue Jan  4 09:44:58 2005
From: Guido.Dieterich at gbf.de (Guido Dieterich)
Date: Tue Jan  4 14:51:03 2005
Subject: [Bioperl-l] format checking
Message-ID: <1104849898.13555.13.camel@sb289.gbf-braunschweig.de>

Hi Bioperl folks,

I tried this:

    $seqIO = Bio::SeqIO->new(-fh     => \*FH,
                             -format => 'fasta');

with this as a FASTA file:

    gene_name>
    PPPPGGGAAAAA   # any sequence

and it does not produce an error.

Does Bio::SeqIO check whether a file or filehandle is in the
appropriate format, e.g. FASTA? It seems not, right?

Guido

From rousse at ccr.jussieu.fr  Tue Jan  4 11:55:55 2005
From: rousse at ccr.jussieu.fr (Guillaume Rousse)
Date: Tue Jan  4 14:51:12 2005
Subject: [Bioperl-l] Need help for implementing a new TreeIO module
In-Reply-To: <B74F0B72-5E67-11D9-9C0C-000393C44276@duke.edu>
References: <41D96A82.8070202@ccr.jussieu.fr>	<6ACDABCF-5DA2-11D9-B9B4-000393C44276@duke.edu>	<41D97504.7020808@ccr.jussieu.fr>
	<17F1ACAA-5DA9-11D9-B9B4-000393C44276@duke.edu>
	<41DA9608.9080900@ccr.jussieu.fr>
	<B74F0B72-5E67-11D9-9C0C-000393C44276@duke.edu>
Message-ID: <41DACA9B.9090108@ccr.jussieu.fr>

Jason Stajich wrote:
> Okay. It is committed in CVS.  Can see about getting you a CVS account 
> if you want to adapt this more.

> 
> I made the argument initialization a little more bioperl-like (using 
> _rearrange).   Your example code produces a sensible tree I believe, can 
> you confirm that it works fine on your end too? (it may take up to 15 
> minutes for the anon CVS repository to sync from the read-write one).
It works, apart from the issue I reported earlier about the final internal nodes.
I'm attaching the graphical output and a dump of the tree.
-- 
Undetectable errors are infinite in variety, in contrast to detectable 
errors, which by definition are limited.
	-- Murphy's Laws of Computer Programming no. 14
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test.svg.bz2
Type: application/octet-stream
Size: 1343 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050104/2ce0ada9/test.svg.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test.dump.bz2
Type: application/octet-stream
Size: 1087 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050104/2ce0ada9/test.dump.obj
From sdavis2 at mail.nih.gov  Tue Jan  4 15:21:42 2005
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Tue Jan  4 15:20:01 2005
Subject: [Bioperl-l] Script Request
In-Reply-To: <4c46b64c3694.4c36944c46b6@nyu.edu>
References: <4c46b64c3694.4c36944c46b6@nyu.edu>
Message-ID: <3F7EC9F4-5E8E-11D9-B000-000D933565E8@mail.nih.gov>

Damion,

While not a direct answer, you might look at the PDL library.  PDL has 
numeric functions for working with matrices.

As for averaging, you can simply add the numbers that you are
interested in and divide by the number of elements.  There are many
ways to do this in Perl.  Check out:

http://www.bu.edu/linguistics/UG/course/lx865/lab-perl.html

Sean

On Jan 4, 2005, at 12:06 PM, Damion Colin Nero wrote:

> I am looking for a perl script that can average small groups of
> numbers down columns (i.e. 50 out of 500 numbers).  I know this is a
> simple thing to do but I am a new user and have been having problems
> with it so if you have a script that I could use or that might be a
> good reference please let me know.  thanks.
>
> Damion Nero
> Coruzzi Lab
> Department of Biology
> New York University
> 766 Waverly building
> New York, NY 10003-6688
> Tel: (212) 998-3963
> email: dcn208@nyu.edu
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l

From razi at genet.sickkids.on.ca  Tue Jan  4 15:44:15 2005
From: razi at genet.sickkids.on.ca (Razi Khaja)
Date: Tue Jan  4 15:40:52 2005
Subject: [Bioperl-l] Script Request
In-Reply-To: <4c46b64c3694.4c36944c46b6@nyu.edu>
Message-ID: <20050104204415.31989.qmail@web51607.mail.yahoo.com>

Not sure if this is what you want, but try the script below.  You should also read O'Reilly's Learning Perl: http://www.oreilly.com/catalog/lperl3/
 
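#!/usr/bin/perl
use strict;
use warnings;

my( $file, $col ) = @ARGV;
die "usage: $0 <file> <column (0-based)>\n" unless defined $col;

my $total=0;
my $n=0;

open( my $fh, '<', $file ) or die "Cannot open $file: $!\n";
while( my $line = <$fh> ) {
    # split(' ') splits on whitespace and ignores leading whitespace,
    # so column 0 is the first real field
    my @field = split(' ', $line);
    next unless defined $field[$col];    # skip blank or short lines
    $total += $field[$col];
    $n++;
}
close( $fh );

die "No values found in column $col\n" unless $n;
my $avg = $total / $n;
print "$avg\n";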
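The script above averages an entire column. Damion asked about averaging small groups down a column (e.g. 50 out of 500 numbers), so here is a minimal sketch for that, assuming the groups are consecutive, non-overlapping blocks of a fixed size. `block_averages` is a hypothetical helper name, and the column extraction uses the same whitespace splitting as the script.

```perl
use strict;
use warnings;

# Hypothetical helper: average consecutive, non-overlapping blocks of
# $size values; a final short block is averaged over the values it has.
sub block_averages {
    my ( $size, @values ) = @_;
    die "block size must be positive\n" unless $size > 0;
    my @averages;
    while (@values) {
        my @block = splice( @values, 0, $size );    # take the next block
        my $sum = 0;
        $sum += $_ for @block;
        push @averages, $sum / @block;              # @block = block length
    }
    return @averages;
}

# Pull column 1 (0-based) out of whitespace-separated lines, then average
# it in blocks of 2.
my @lines  = ( "a 1", "b 3", "c 5", "d 7", "e 9" );
my @column = map { my @f = split ' '; $f[1] } @lines;
print join( ' ', block_averages( 2, @column ) ), "\n";    # prints "2 6 9"
```

To use it on a real file, push `$field[$col]` onto an array inside the while loop instead of summing it, then call `block_averages(50, @values)`.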

Razi

 
Damion Colin Nero <dcn208@nyu.edu> wrote:
I am looking for a perl script that can average small groups of numbers down columns (i.e. 50 out of 500 numbers).  I know this is a simple thing to do but I am a new user and have been having problems with it so if you have a script that I could use or that might be a good reference please let me know.  thanks.

Damion Nero 
Coruzzi Lab 
Department of Biology 
New York University 
766 Waverly building 
New York, NY 10003-6688 
Tel: (212) 998-3963 
email: dcn208@nyu.edu 



/**
 * Razi Khaja, Bioinformatics Analyst
 * The Hospital for Sick Children, Toronto
 * The Centre for Applied Genomics, www.tcag.ca
 * Tel 416-813-7032, Fax 416-813-8319
 */
From brian_osborne at cognia.com  Tue Jan  4 15:46:03 2005
From: brian_osborne at cognia.com (Brian Osborne)
Date: Tue Jan  4 15:43:23 2005
Subject: [Bioperl-l] format checking
In-Reply-To: <1104849898.13555.13.camel@sb289.gbf-braunschweig.de>
Message-ID: <GAEDKMGOKFBLJPKCLKCCIEMPEGAA.brian_osborne@cognia.com>

Guido,

The answer is sometimes yes and sometimes no: Bioperl is not consistent
about this. This is Bug 1508:

http://bugzilla.bioperl.org/show_bug.cgi?id=1508
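Until that is resolved, one option is to run a cheap sanity check yourself before handing a file to Bio::SeqIO. The sketch below is plain Perl, not part of Bioperl; `fasta_problems` is a hypothetical helper, and it assumes you only want to catch gross errors such as sequence data appearing before any '>' header (as in Guido's example, where the '>' is at the wrong end of the line). It does not check the residues themselves.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bioperl): returns a list of problems
# found in FASTA text read from $fh. It checks only that no sequence data
# appears before the first '>' header and that each header has an ID.
sub fasta_problems {
    my ($fh) = @_;
    my @problems;
    my $seen_header = 0;
    while ( my $line = <$fh> ) {
        chomp $line;
        next if $line =~ /^\s*$/;    # ignore blank lines
        if ( $line =~ /^>(\S*)/ ) {
            push @problems, sprintf( "line %d: header without an ID", $. )
                unless length $1;
            $seen_header = 1;
        }
        elsif ( !$seen_header ) {
            push @problems,
                sprintf( "line %d: sequence data before any '>' header", $. );
        }
    }
    return @problems;
}

# Guido's input, with the '>' at the end of the ID instead of the start.
my $text = "gene_name>\nPPPPGGGAAAAA\n";
open( my $fh, '<', \$text ) or die "cannot open in-memory file: $!";
print "$_\n" for fasta_problems($fh);
```

The check consumes the filehandle, so for a real file you would either open it again or `seek($fh, 0, 0)` before creating the SeqIO object.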


Brian O.



From jason.stajich at duke.edu  Tue Jan  4 16:03:42 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Tue Jan  4 16:00:21 2005
Subject: [Bioperl-l] Entrez Gene and bioperl-db
In-Reply-To: <1104871954.3102.24.camel@localhost.localdomain>
References: <2ED9C47A-5898-11D9-AC01-000A959EB4C4@gmx.net>
	<1104792001.3186.17.camel@localhost.localdomain>
	<0F5A3AE4-5DDA-11D9-AA3C-000393C44276@duke.edu>
	<1104871954.3102.24.camel@localhost.localdomain>
Message-ID: <1DA5FD5C-5E94-11D9-9C0C-000393C44276@duke.edu>


On Jan 4, 2005, at 3:52 PM, Peter Robinson wrote:

> Hi Jason,
>
> thanks for the advice. It seems as if the documentation of
> Bio::DB::Taxonomy is a bit out of sync.
>  my $db = new Bio::DB::Taxonomy(-source => 'flatfile',
>                                  -nodesfile => $nodesfile,
>                                  -namesfile => $namefile);
> What does 'flatfile' refer to here? It is not apparent upon looking at 
> the code for new.
>
See Bio::DB::Taxonomy::flatfile for more information.  As I mentioned
in my earlier mail, flatfile is for when you have downloaded the
taxonomy DB from NCBI.  It lets you run queries locally against an
indexed (BerkeleyDB via DB_File) version of the file.

You probably need the most up-to-date version of the modules; it works
fine for me with both the entrez and flatfile code, but you may have to
upgrade from the 1.4.0 release.  Code from CVS or the bioperl-1.5 RC1
release should work fine.
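If all you need is the scientific name for a taxon id (as in Peter's gene_info case), the names file from the same NCBI taxonomy download (the file you would pass as -namesfile) is simple enough to read directly. The sketch below assumes the standard names.dmp layout, with fields separated by tab-pipe-tab and each line ending in tab-pipe; `read_scientific_names` is a hypothetical helper. It does not walk the classification, which needs the nodes file as well.

```perl
use strict;
use warnings;

# Sketch: build a taxid => scientific name map from NCBI's names.dmp.
# Assumed line layout:
#   tax_id \t|\t name_txt \t|\t unique name \t|\t name class \t|
sub read_scientific_names {
    my ($fh) = @_;
    my %name_of;
    while ( my $line = <$fh> ) {
        chomp $line;
        $line =~ s/\t\|$//;    # drop the trailing "\t|"
        my ( $taxid, $name, undef, $class ) = split /\t\|\t/, $line, -1;
        next unless defined $class && $class eq 'scientific name';
        $name_of{$taxid} = $name;
    }
    return \%name_of;
}

# Two lines in names.dmp style: a scientific name and a common name.
my $dump = "9606\t|\tHomo sapiens\t|\t\t|\tscientific name\t|\n"
         . "9606\t|\thuman\t|\t\t|\tgenbank common name\t|\n";
open( my $fh, '<', \$dump ) or die "cannot open in-memory file: $!";
my $names = read_scientific_names($fh);
print "$names->{9606}\n";    # prints "Homo sapiens"
```

For the real file you would open names.dmp with `open(my $fh, '<', $path)` instead of the in-memory string; the hash then holds one entry per taxon, so expect it to use a fair amount of memory.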


> I had somewhat better luck using the entrez version, but I got a 
> pretty amusing error
> message:
>
> MSG: can't create a species object for Homo sapiens (human) because it
> isn't a species but is a '' instead
>
> ###
> Full error and a dump of the script follow:
>
> my $db = new Bio::DB::Taxonomy(-source => 'entrez'); #
> my $taxaid = $db->get_taxonid('Homo sapiens');
> my $species = $db->get_Taxonomy_Node(-taxonid => '9606');
> print Dumper($species);
>
> ###
>
> Use of uninitialized value in string eq at
> /usr/local/share/perl/5.8.4/Bio/DB/Taxonomy/entrez.pm line 192.
> Use of uninitialized value in sprintf at
> /usr/local/share/perl/5.8.4/Bio/DB/Taxonomy/entrez.pm line 201.
>
> -------------------- WARNING ---------------------
> MSG: can't create a species object for Homo sapiens (human) because it
> isn't a species but is a '' instead
> ---------------------------------------------------
> Use of uninitialized value in string eq at
> /usr/local/share/perl/5.8.4/Bio/DB/Taxonomy/entrez.pm line 192.
> Use of uninitialized value in sprintf at
> /usr/local/share/perl/5.8.4/Bio/DB/Taxonomy/entrez.pm line 201.
>
> -------------------- WARNING ---------------------
> MSG: can't create a species object for Homo sapiens (human) because it
> isn't a species but is a '' instead
> ---------------------------------------------------
> $VAR1 = {
>           'TaxId' => '9606',
>           'Division' => 'mammals',
>           'GeneNumber' => '32775',
>           'Rank' => 'species',
>           'ProtNumber' => '247791',
>           'ScientificName' => 'Homo sapiens',
>           'CommonName' => 'human',
>           'NucNumber' => '9025800',
>           'GenNumber' => '25',
>           'StructNumber' => '5638'
>         };
> peter@anna:~/programs/bioperlTest$
>
>
> --best, peter
>
> On Mon, 2005-01-03 at 23:51, Jason Stajich wrote:
>> Bio::DB::Taxonomy is the factory code - it is pretty easy to get a
>> species object (or equivalent) using this code.  But you cannot (or
>> could not when I wrote this, not sure of the current status) get the
>> full classification from the NCBI taxonomy retrieval via cgi.  i.e. you
>> can only get genus and species for a taxon id and I don't know how to
>> walk up the hierarchy using the web API.  Earlier emails to NCBI seemed
>> to indicate this is all they intended to provide, but not sure what the
>> current status is.
>>
>>   my $db = new Bio::DB::Taxonomy(-source => 'entrez'); # use NCBI Entrez over HTTP
>>   my $taxaid = $db->get_taxonid('Homo sapiens');
>>   my $taxonnode = $db->get_Taxonomy_Node(-taxonid => '9606');
>>
>> You can get the full classification if you use the
>> Bio::DB::Taxonomy::flatfile factory which requires you to have
>> downloaded the taxonomy db flatfile from NCBI.  Since this is more
>> reliable (and faster) it is what I have tended to use for grouping sets
>> of seqDB search results, etc.
>>
>> -jason
>> On Jan 3, 2005, at 5:40 PM, Peter Robinson wrote:
>>
>>> Hi Bioperlers, hi Hilmar,
>>>
>>> after some thinking I have embarked on a lex/yacc parser for the Entrez
>>> Gene ASN.1 format as the way of least resistance, although I am not sure
>>> how that would fit in to BioPerl. If anyone is interested in this (or
>>> has a better idea of how to go about it..), please drop me a line.
>>>
>>> In the meantime I have been looking at writing code to parse some of the
>>> "easy" Entrez gene documents, starting off with gene_info. This file
>>> includes the NCBI taxon id for each entry. I would like to convert this
>>> to a Bio::Species object to pass to the following
>>> 	my $seq = $self->sequence_factory->create(
>>> 			     -verbose => $self->verbose(),
>>> 			     -accession_number => $geneID,
>>> 			     -desc => $description,
>>> 			     -display_id => $symbol,
>>> 			     -species =>  ???
>>> 			     -annotation => $ann);
>>>
>>> and saw the Bio::Taxonomy::FactoryI code, which appears to want to do
>>> this sort of thing. However, the code for that is pretty preliminary. Is
>>> anyone working on this at the moment? Or is there a better way of doing
>>> this (it seems a shame not to provide the actual species name if one has
>>> the taxid...)
>>>
>>> best
>>>
>>> Peter
>>>
>>>
>>>
>>> On Tue, 2004-12-28 at 07:17, Hilmar Lapp wrote:
>>>> Great to hear that someone is giving this a shot. Yes, at this point it
>>>> appears that NCBI is only offering the ASN.1, not a conversion to XML.
>>>> Their asn2xml tool will not work with this ASN.1 format either, just
>>>> checked it to be sure. They do seem to be mulling the option of XML
>>>> though on the Gene FAQ. Maybe if enough people get in their ears they
>>>> will spend some effort towards that. After all, the entrez gene web
>>>> interface can display XML on demand - even though it looks fairly
>>>> hideous.
>>>>
>>>> There is no ASN.1 support in bioperl at all. Also, ASN.1 support in
>>>> perl is actually thin - there is Convert::ASN1 at version 0.18 two
>>>> years ago that I could find ... doesn't make me feel warm and fuzzy.
>>>>
>>>> In the absence of any XML available from NCBI, gene_info might be the
>>>> best start. An option could be to check for the presence of the other
>>>> tab-delimited files and use those that are present. These are
>>>> tab-delimited and hence the format itself is trivial so you can focus
>>>> entirely on setting up a Bio::Seq plus annotation that's
>>>> comparable/compatible to what the current SeqIO::locuslink does.
>>>>
>>>> My $0.02 (worth less and less almost every day).
>>>>
>>>> 	-hilmar
>>>>
>>>> On Thursday, December 23, 2004, at 10:51  AM, Peter Robinson wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I have been thinking about giving a BioPerl EntrezGene parser a try
>>>>> since I have been a heavy user of locus link to date. One issue is that
>>>>> the files that correspond to LL_tmpl (which was a flat file) are now in
>>>>> asn format
>>>>> http://www.ncbi.nlm.nih.gov/entrez/query/static/help/genehelp.html#query
>>>>> Although I saw some mention of ASN support in Bioperl by googling, I
>>>>> can't seem to find any module that does this in the present
>>>>> distribution. What is the status on that? In any case, I will be working
>>>>> on this in the next month or two and if anything nice comes of it I will
>>>>> send it to you / BioPerl.
>>>>>
>>>>> best wishes & happy holidays
>>>>>
>>>>> Peter
>>>>>
>>>>> On Tue, 2004-12-14 at 09:00, Hilmar Lapp wrote:
>>>>>> Since load_seqdatabase.pl will use bioperl's SeqIO parsers for parsing
>>>>>> any input file, what you're asking is whether or not there is a SeqIO
>>>>>> parser for NCBI Gene.
>>>>>>
>>>>>> The answer to that question is no, not yet. Anybody who feels motivated
>>>>>> is welcome to give it a try ... Since I'll need it, I'll write the
>>>>>> parser if nobody else does within the next 3 months, but I'm not going
>>>>>> to promise when exactly this will happen.
>>>>>>
>>>>>> 	-hilmar
>>>>>>
>>>>>> On Monday, December 13, 2004, at 08:03  AM, Law, Annie wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I was wondering with regards to bioperl-db the scripts and schema
>>>>>>> and
>>>>>>> load_seqdatabase.pl has there been preparation for integration of
>>>>>>> Entrez
>>>>>>> gene information when locuslink is phased out?  Or if it has
>>>>>>> already
>>>>>>> been
>>>>>>> changed could somebody point
>>>>>>> me to the documentation or changed code?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Annie.
>>>>>>>
>>>>>>>
>>>>> -- 
>>>>> Peter N. Robinson
>>>>> peter.robinson@t-online.de
>>>>> peter.robinson@charite.de
>>>>> http://www.charite.de/ch/medgen/robinson/
>>>>>
>>>>>
>>> -- 
>>> Peter N. Robinson
>>> peter.robinson@t-online.de
>>> peter.robinson@charite.de
>>> http://www.charite.de/ch/medgen/robinson/
>>>
>>>
>>>
>> --
>> Jason Stajich
>> jason.stajich at duke.edu
>> http://www.duke.edu/~jes12/
> -- 
> Peter N. Robinson
> peter.robinson@t-online.de
> peter.robinson@charite.de
> http://www.charite.de/ch/medgen/robinson/
>
>
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

From jateu001 at uni-duesseldorf.de  Wed Jan  5 07:10:31 2005
From: jateu001 at uni-duesseldorf.de (Jan Teune)
Date: Wed Jan  5 07:07:31 2005
Subject: [Bioperl-l] Bio::Biblio
Message-ID: <41DBD937.3090506@uni-duesseldorf.de>

Hello @ all,
I'm writing a small script to fetch PubMed articles. Since about two weeks
before Christmas, I have had problems fetching articles. Below are my code
and the error message:

#!/usr/bin/perl -w
use Bio::Biblio;
my $pmid = "15542139";
my $biblio = new Bio::Biblio(
                 -access    => 'soap',
                 -location    => 'http://industry.ebi.ac.uk/soap/openBQS',
                 -destroy_on_exit => '0',
                 );
my $citation = $biblio->get_by_id($pmid);
print $citation;

The error message:
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: --- TRANSPORT ERROR ---
502 Proxy Error
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/share/perl5/Bio/Root/Root.pm:328
STACK: try{} block /usr/share/perl5/Bio/DB/Biblio/soap.pm:119
STACK: SOAP::Lite::call /usr/share/perl5/SOAP/Lite.pm:3006
STACK: try{} block /usr/share/perl5/SOAP/Lite.pm:2950
STACK: Bio::DB::Biblio::soap::get_by_id /usr/share/perl5/Bio/DB/Biblio/soap.pm:368
STACK: ./bibliotest.pl:9
-----------------------------------------------------------

I'm happy for any kind of help,

Jan  :-)

From khufaz83 at yahoo.com  Wed Jan  5 09:34:53 2005
From: khufaz83 at yahoo.com (hafiz hafiz)
Date: Wed Jan  5 09:31:40 2005
Subject: [Bioperl-l] Change format  in CGI.
Message-ID: <20050105143453.93548.qmail@web52509.mail.yahoo.com>

Hi,

I have built a sequence search using SeqIO that runs from a web page
(URL), but I still can't change the output format with SeqIO there.
Why?

Please see this URL:

http://tiara.cs.usm.my/Bioprotein.html


First select searching by sequence, then select the SwissProt library
and enter a sequence. After that you will see the XML, SwissProt, FASTA
and PIR menu; that is the format-conversion function. It still does not
work through the web page, but it works well from our terminal.

From suzuki at cbl.umces.edu  Wed Jan  5 12:51:18 2005
From: suzuki at cbl.umces.edu (Marcelino Suzuki)
Date: Wed Jan  5 12:48:05 2005
Subject: [Bioperl-l] OS X bioperl, staden/read, install problems
Message-ID: <67117554-5F42-11D9-AC27-0003939E064E@cbl.umces.edu>

	Well, I tried all I could, but I keep getting the same error as
Srikanth Patury:

http://bioperl.org/pipermail/bioperl-guts-l/2004-June/016855.html

	I got to the point where there are no errors in the make step of bioperl-ext.

	I installed io_lib (1.8.11)
	I copied os.h and config.h to /usr/local/include/io_lib
	I changed os.h to remove the "<" and ">" around config.h
	I ran ranlib /usr/local/lib/libread.a
	after an error message, I also ran
	ranlib /usr/local/bioperl-ext-1.4/Bio/Ext/Align/libs/libsw.a

	Is there anything else that needs to be done?

	I am using perl 5.8.1

	Thanks

	Marcelino Suzuki

	suzuki at cbl dot umces dot edu

From Marc.Logghe at devgen.com  Wed Jan  5 14:10:12 2005
From: Marc.Logghe at devgen.com (Marc Logghe)
Date: Wed Jan  5 14:07:17 2005
Subject: [Bioperl-l] Bio::Biblio
Message-ID: <BEE28BF86078B6429D6C780635718E21B3A085@morelia.be.devgen.com>

Happy New Year to you all!

Jan, there is nothing wrong with your code.
It is just that there are some electricity problems at the EBI, which have caused the SOAP server to be down for a while.
So, most of the SOAP services are currently unavailable.
You can find the original announcement on the Taverna mailing list:
http://sourceforge.net/mailarchive/forum.php?thread_id=6274312&forum_id=35847

So, I guess there is nothing you can do besides waiting ;-)
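If you would rather have a long-running script ride out short outages like this than die on the first transport error, one option is to wrap the call in eval and retry. The sketch below is plain Perl; `with_retries` is a hypothetical helper, and the attempt count and delay in the commented usage line are arbitrary choices, not recommendations from the EBI.

```perl
use strict;
use warnings;

# Hypothetical helper: run $code up to $tries times, sleeping $delay seconds
# between failed attempts; rethrow the last error if every attempt fails.
sub with_retries {
    my ( $tries, $delay, $code ) = @_;
    my $last_error;
    for my $attempt ( 1 .. $tries ) {
        my @result = eval { $code->() };
        return wantarray ? @result : $result[0] unless $@;
        $last_error = $@;
        warn "attempt $attempt of $tries failed: $last_error";
        sleep $delay if $attempt < $tries;
    }
    die $last_error;
}

# In Jan's script this might look like (not run here; it needs Bioperl and
# a reachable server):
#   my $citation = with_retries( 5, 60, sub { $biblio->get_by_id($pmid) } );

# Self-contained demo: a call that fails twice, then succeeds.
my $calls  = 0;
my $answer = with_retries( 3, 0, sub {
    $calls++;
    die "502 Proxy Error\n" if $calls < 3;    # simulate the outage
    return "ok after $calls calls";
} );
print "$answer\n";
```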

HTH,
Marc




From golharam at umdnj.edu  Wed Jan  5 15:41:33 2005
From: golharam at umdnj.edu (Ryan Golhar)
Date: Wed Jan  5 15:33:03 2005
Subject: [Bioperl-l] Error parsing Genbank file
Message-ID: <007501c4f366$f1a14bb0$a6028a0a@GOLHARMOBILE1>

Hi all,

I have a GenBank file that Bio::SeqIO::genbank is choking on.  The
entry is just a WGS entry referencing a bunch of other entries.  It fails
on line 492 with the error "Unexpected error in feature table for
Skipping feature, attempting to recover".

I'm using the following code:

#!/usr/bin/perl

use strict;
use Bio::SeqIO;

my $usage = "$0 <genbank file> <fasta file>\n";
my $file = shift or die $usage;
my $outfilename = shift or die $usage;

my $infile = Bio::SeqIO->new('-file' => "<$file",
			    '-format' => "genbank");

my $outfile = Bio::SeqIO->new('-file' => ">$outfilename",
			    '-format' => "fasta");

while (my $seq = $infile->next_seq) {
#	print STDERR $seq->accession_number,"\n";
	
	$outfile->write_seq($seq);
}

Here is the contents of the genbank entry:

LOCUS       CAAB01000000           12381 rc    DNA     linear   VRT 22-AUG-2002
DEFINITION  Takifugu rubripes whole genome shotgun sequencing project.
ACCESSION   CAAB00000000
VERSION     CAAB00000000.1  GI:22418063
KEYWORDS    WGS.
SOURCE      Takifugu rubripes (Fugu rubripes)
  ORGANISM  Takifugu rubripes
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Actinopterygii; Neopterygii; Teleostei; Euteleostei; Neoteleostei;
            Acanthomorpha; Acanthopterygii; Percomorpha; Tetraodontiformes;
            Tetradontoidea; Tetraodontidae; Takifugu.
REFERENCE   1  (bases 1 to 12381)
  AUTHORS   The Fugu Genome Sequencing Consortium.
  TITLE     Direct Submission
  JOURNAL   Submitted (01-JUL-2002) The Fugu Genome Sequencing Consortium,
            http://www.fugubase.org/ http://www.jgi.doe.gov/fugu
COMMENT     The Takifugu rubripes whole genome shotgun (WGS) project has the
            project accession CAAB00000000.  This version of the project (01)
            has the accession number CAAB01000000, and consists of sequences
            CAAB01000001-CAAB01012381.
FEATURES             Location/Qualifiers
     source          1..12381
                     /organism="Takifugu rubripes"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:31033"
WGS         CAAB01000001-CAAB01012381
//


-----
Ryan Golhar
Computational Biologist
The Informatics Institute at
The University of Medicine & Dentistry of NJ

Phone: 973-972-5034
Fax: 973-972-7412
Email: golharam@umdnj.edu

From kmdaily at indiana.edu  Wed Jan  5 15:48:57 2005
From: kmdaily at indiana.edu (Daily, Kenneth Michael)
Date: Wed Jan  5 15:46:08 2005
Subject: [Bioperl-l] reading multiple swissprot records from a single file
Message-ID: <BE45276F7BBEFE49B80193E22C5B1FED01B05FAE@iu-mssg-mbx04.exchange.iu.edu>

I'm having trouble using bioperl to parse a file containing multiple (thousands of) Swiss-Prot records. Is there a way to do this with SeqIO and such? The way I understand it, if I use a filehandle to read in the data, it still expects only one record in the file. Can I use a FH to read in a record, which ends with //, and then put this variable into a SeqIO object to manipulate it? I need to look at each record and decide whether to keep it based on the features it has. I have a program using standard parsing techniques but want to do this with bioperl if possible. Thanks for any help.

Kenny Daily
IU School of Informatics
kmdaily at indiana dot edu


From brian_osborne at cognia.com  Wed Jan  5 16:07:28 2005
From: brian_osborne at cognia.com (Brian Osborne)
Date: Wed Jan  5 16:06:14 2005
Subject: [Bioperl-l] reading multiple swissprot records from a single file
In-Reply-To: <BE45276F7BBEFE49B80193E22C5B1FED01B05FAE@iu-mssg-mbx04.exchange.iu.edu>
Message-ID: <GAEDKMGOKFBLJPKCLKCCOEOKEGAA.brian_osborne@cognia.com>

Kenny,

It would be something like:

use strict;
use Bio::SeqIO;

my $seqio = Bio::SeqIO->new(-file => "sprot42.dat", -format => "swiss");

while (my $seqobj = $seqio->next_seq) {
	# you now have a sequence object; you can check its features
}

This would be the "standard" way. SeqIO also accepts a filehandle,
but I don't think there's any need to do it that way here.

Brian O.



From jason.stajich at duke.edu  Wed Jan  5 16:36:55 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Wed Jan  5 16:33:27 2005
Subject: [Bioperl-l] Error parsing Genbank file
In-Reply-To: <007501c4f366$f1a14bb0$a6028a0a@GOLHARMOBILE1>
References: <007501c4f366$f1a14bb0$a6028a0a@GOLHARMOBILE1>
Message-ID: <EBA1F676-5F61-11D9-AC45-000393C44276@duke.edu>

We can't parse WGS entries yet.  The fix needed is very similar to how
we handle CONTIG entries, if you want to have a go at fixing it.
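Until genbank.pm handles WGS lines, one workaround, assuming you only need the ordinary records and can drop WGS master entries (like CAAB01000000 above, which only references other entries), is to filter those records out before Bio::SeqIO sees them. This is not the CONTIG-style fix described above; it simply skips such entries. `skip_wgs_records` is a hypothetical helper that assumes each record ends with a line containing only '//'.

```perl
use strict;
use warnings;

# Hypothetical helper: copy GenBank records from $in to $out, skipping WGS
# master records (those with a line starting with "WGS"). Returns the
# number of records skipped.
sub skip_wgs_records {
    my ( $in, $out ) = @_;
    my $skipped = 0;
    local $/ = "\n//\n";    # read one record (ending in a "//" line) at a time
    while ( my $record = <$in> ) {
        if ( $record =~ /^WGS\s/m ) {
            $skipped++;
            next;
        }
        print {$out} $record;
    }
    return $skipped;
}

# Demo: one ordinary record and one WGS master record.
my $gb = "LOCUS       A1\nFEATURES    Location/Qualifiers\nORIGIN\n        1 acgt\n//\n"
       . "LOCUS       CAAB01000000\nWGS         CAAB01000001-CAAB01012381\n//\n";
open( my $in, '<', \$gb ) or die "cannot open in-memory input: $!";
my $kept = '';
open( my $out, '>', \$kept ) or die "cannot open in-memory output: $!";
my $n = skip_wgs_records( $in, $out );
close $out;
print "skipped $n record(s)\n";
```

You could write the filtered text to a temporary file and pass that file to `Bio::SeqIO->new(-file => ..., -format => 'genbank')` as in Ryan's script.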

--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

From jason.stajich at duke.edu  Wed Jan  5 16:44:57 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Wed Jan  5 16:41:27 2005
Subject: [Bioperl-l] reading multiple swissprot records from a single file
In-Reply-To: <BE45276F7BBEFE49B80193E22C5B1FED01B05FAE@iu-mssg-mbx04.exchange.iu.edu>
References: <BE45276F7BBEFE49B80193E22C5B1FED01B05FAE@iu-mssg-mbx04.exchange.iu.edu>
Message-ID: <0ABA4437-5F63-11D9-AC45-000393C44276@duke.edu>

It reads a stream of records delimited by '//' and processes them one
at a time.  You just keep calling next_seq until it gets to the end of
the file or filehandle.  That is why we typically write the usage as a
while loop.
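Bio::SeqIO does this splitting for you, so you don't need anything extra to read the file. Purely to illustrate the record-at-a-time idea, here is the same '//'-delimited reading in plain Perl using the input record separator `$/`; a cheap text test like this could also pre-screen records before full parsing. `each_record` is a hypothetical helper, and the two records are made-up minimal examples.

```perl
use strict;
use warnings;

# Hypothetical helper: call $callback once per record read from $fh, where
# each record runs up to and including a line containing only "//".
sub each_record {
    my ( $fh, $callback ) = @_;
    local $/ = "\n//\n";    # input record separator: one record per read
    my $count = 0;
    while ( my $record = <$fh> ) {
        $callback->($record);
        $count++;
    }
    return $count;
}

# Two tiny Swiss-Prot-like records; only the first has an FT (feature) line.
my $swiss = "ID   TEST1_HUMAN\nFT   DOMAIN 1 10\n//\n"
          . "ID   TEST2_HUMAN\n//\n";
open( my $fh, '<', \$swiss ) or die "cannot open in-memory file: $!";

my @with_features;
my $total = each_record( $fh, sub {
    my ($record) = @_;
    return unless $record =~ /^FT/m;    # keep only records with features
    my ($id) = $record =~ /^ID\s+(\S+)/m;
    push @with_features, $id;
} );
print "$total records; with FT lines: @with_features\n";
```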

For example, if you wanted to make a new file containing only your
keepers:

use strict;
use Bio::SeqIO;

my $in  = Bio::SeqIO->new(-format => 'swiss', -file => 'sprot42.dat');
my $out = Bio::SeqIO->new(-format => 'swiss', -file => '>keepers.swiss');

while( my $seq = $in->next_seq ) {
   my $keep = 0;
   for my $feature ( $seq->get_SeqFeatures ) {
      # figure out if the feature criteria are met; if so, set $keep = 1
   }
   if( $keep ) {
      $out->write_seq($seq);
   }
}

If you want to use a filehandle instead of a file, just use the -fh 
parameter instead of -file.  See Bio::Root::IO for more information.

This is useful, for example, if you are streaming from zcat (zcat reads 
gzipped files and produces a stream of the unzipped data):

  # the trailing '|' is necessary to tell perl to pipe the output
  open(FH, "zcat sprot42.dat.gz |") || die("could not open file with zcat");
  my $in = Bio::SeqIO->new(-fh => \*FH, -format=> 'swiss');

Or save the handle in a variable:

  my $fh;
  # the trailing '|' is necessary to tell perl to pipe the output
  open($fh, "zcat sprot42.dat.gz |") || die("could not open file with zcat");
  my $in = Bio::SeqIO->new(-fh => $fh, -format=> 'swiss');
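
For the curious, here is a plain-Perl sketch of what reading a 
'//'-delimited stream one record at a time amounts to.  It is not how 
Bio::SeqIO is implemented; it only assumes each record ends with a line 
containing just '//', and the next_record helper is hypothetical.  Because 
it takes a filehandle, it works the same on a file or a zcat pipe.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): return the next record from a
# flat-file stream whose records each end with a line containing only '//'.
sub next_record {
    my ($fh) = @_;
    local $/ = "\n//\n";            # read up to and including the terminator line
    while (defined(my $chunk = <$fh>)) {
        next unless $chunk =~ /\S/; # skip whitespace-only leftovers at the end
        return $chunk;
    }
    return;                         # end of stream
}

# Usage: an in-memory filehandle stands in for a file or a zcat pipe.
my $data = "ID   REC1\nSQ   SEQUENCE\n//\nID   REC2\nSQ   SEQUENCE\n//\n";
open(my $fh, '<', \$data) or die "could not open in-memory data: $!";
while (my $rec = next_record($fh)) {
    my ($id) = $rec =~ /^ID\s+(\S+)/m;
    print "$id\n";                  # prints REC1, then REC2
}
close($fh);
```

Setting $/ locally keeps the change from leaking into the rest of the 
program, which still reads line by line.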


-jason
On Jan 5, 2005, at 3:48 PM, Daily, Kenneth Michael wrote:

> I'm having trouble using bioperl to parse a file with multiple 
> (thousands) of swissprot records in them. Is there a way to do this 
> with SeqIO and such? The way I understand it, if I use a filehandle to 
> read in the data, it still is expecting only one record in the file. 
> Can I use a FH to read in a record, which ends with //, then put this 
> variable into a SeqIO object to manipulate it? I need to look at each 
> record and decide if I want to keep it based on the features it has. I 
> have a program using standard parsing techniques but want to do this 
> with bioperl if possible. Thanks for any help.
>
> Kenny Daily
> IU School of Informatics
> kmdaily at indiana dot edu
>
>
>
>
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

From allenday at ucla.edu  Wed Jan  5 18:41:43 2005
From: allenday at ucla.edu (Allen Day)
Date: Wed Jan  5 17:39:42 2005
Subject: [Bioperl-l] Bio::Biblio
In-Reply-To: <BEE28BF86078B6429D6C780635718E21B3A085@morelia.be.devgen.com>
References: <BEE28BF86078B6429D6C780635718E21B3A085@morelia.be.devgen.com>
Message-ID: <Pine.LNX.4.58.0501051539560.10382@sumo.ctrl.ucla.edu>

Jan,

You might consider using the NCBI database via their 'eutils' service 
instead.  Have a look at Bio::DB::Biblio::eutils.  I find it's more 
reliable and that PubMed is more up-to-date and complete than that EBI 
server.

-Allen


On Wed, 5 Jan 2005, Marc Logghe wrote:

> Happy New Year to you all !
> 
> Jan, there is nothing to worry about concerning your code.
> It is just that there are some electricity problems at the EBI which caused the soap server to be down for a while.
> So, most of the soap services are currently unavailable.
> The original message you find on the Taverna mailing list:
> http://sourceforge.net/mailarchive/forum.php?thread_id=6274312&forum_id=35847
> 
> So, I guess there is nothing that you can do besides waiting ;-)
> 
> HTH,
> Marc
> 
> 
> -----Oorspronkelijk bericht-----
> Van: bioperl-l-bounces@portal.open-bio.org namens Jan Teune
> Verzonden: wo 5-1-2005 13:10
> Aan: bioperl-l@bioperl.org
> Onderwerp: [Bioperl-l] Bio::Biblio
>  
> Hello @ all,
> I'm writing a small script to fetch PubMed-Articles. Since two weeks 
> before Christmas, I have a problem to fetch Articles. Below is some Code 
> and the Error-Message:
> 
> #!/usr/bin/perl -w
> use Bio::Biblio;
> my $pmid = "15542139";
> my $biblio = new Bio::Biblio(
>                  -access    => 'soap',
>                  -location    => 'http://industry.ebi.ac.uk/soap/openBQS',
>                  -destroy_on_exit => '0',
>                  );
> my $citation = $biblio->get_by_id($pmid);
> print $citation;
> The Error-Message:
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: --- TRANSPORT ERROR ---
> 502 Proxy Error
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /usr/share/perl5/Bio/Root/Root.pm:328
> STACK: try{} block /usr/share/perl5/Bio/DB/Biblio/soap.pm:119
> STACK: SOAP::Lite::call /usr/share/perl5/SOAP/Lite.pm:3006
> STACK: try{} block /usr/share/perl5/SOAP/Lite.pm:2950
> STACK: Bio::DB::Biblio::soap::get_by_id 
> /usr/share/perl5/Bio/DB/Biblio/soap.pm:368
> STACK: ./bibliotest.pl:9
> -----------------------------------------------------------
> 
> I'm happy for any kind of help,
> 
> Jan  :-)
> 
> 
> 
> 
> 
> 
From wes.barris at csiro.au  Wed Jan  5 17:56:08 2005
From: wes.barris at csiro.au (Wes Barris)
Date: Wed Jan  5 17:52:40 2005
Subject: [Bioperl-l] SeqIO fails on masked sequences
In-Reply-To: <!~!UENERkVCMDkAAQACAAAAAAAAAAAAAAAAABgAAAAAAAAA+EsXQZcrCEGeBpZF7/IE7sKAAAAQAAAAU3JE01X0+U+HvsvN+JqCZAEAAAAA@ukonline.co.uk>
References: <!~!UENERkVCMDkAAQACAAAAAAAAAAAAAAAAABgAAAAAAAAA+EsXQZcrCEGeBpZF7/IE7sKAAAAQAAAAU3JE01X0+U+HvsvN+JqCZAEAAAAA@ukonline.co.uk>
Message-ID: <41DC7088.7010101@csiro.au>

Nathan Haigh wrote:

> Ok, the "bug" seems to have been introduced in the last update to Bio::PrimarySeq.pm (v1.83) where X was added to the list of
> ambiguous characters in the _guess_alphabet subroutine.
> 
> Brian - do you remember why/what this was for?
> 
> Nathan

Hi Nathan,

I was just curious if you have found anything out regarding this?

> 
> 
> 
>>-----Original Message-----
>>From: Marc Logghe [mailto:Marc.Logghe@devgen.com]
>>Sent: 16 December 2004 10:44
>>To: nathanhaigh@ukonline.co.uk; Wes Barris
>>Cc: Bioperl Mailing List
>>Subject: RE: [Bioperl-l] SeqIO fails on masked sequences
>>
>>
>>>When I use the script you supplied, I get the exception shown below.
>>>
>>>I'll try to get to the bottom of this.
>>>
>>>In the meantime, what OS are you both using and what version
>>>of Bioperl?
>>>
>>
>>Ah, yes that explains. Too much fiddling with PERL5LIB is not good ;-)
>>I did not realize I was actually using bioperl 1.4.0. There it worked.
>>It fails indeed when using bioperl-release-1-5-0-rc1.
>>Apologies for confusing you people.
>>Cheers,
>>Marc
>>
>>
> 
> 
> 
> 
> 


-- 
Wes Barris
E-Mail: Wes.Barris@csiro.au
From amackey at pcbi.upenn.edu  Wed Jan  5 18:28:24 2005
From: amackey at pcbi.upenn.edu (Aaron J. Mackey)
Date: Wed Jan  5 18:25:16 2005
Subject: [Bioperl-l] OS X bioperl, staden/read, install problems
In-Reply-To: <67117554-5F42-11D9-AC27-0003939E064E@cbl.umces.edu>
References: <67117554-5F42-11D9-AC27-0003939E064E@cbl.umces.edu>
Message-ID: <41DC7818.8060104@pcbi.upenn.edu>


Try editing Bio/SeqIO/staden/read.pm to include "-lz" in LIBS

-Aaron

Marcelino Suzuki wrote:

>     Well,  I tried all I could, but I keep getting the same error as  
> srikanth patury:
> 
> http://bioperl.org/pipermail/bioperl-guts-l/2004-June/016855.html
> 
>     Got to the point I got no errors in the make step of bioperl-ext
> 
>     I installed io_lib (1.8.11)
>     I copied os.h and config.h to /usr/local/include/io_lib
>     I changed os.h to remove the "<:" and ">" around config.h
>     I ranlib /usr/local/lib/libread.a
>     after an error message I also
>     ranlib /usr/local/bioperl-ext-1.4/Bio/Ext/Align/libs/libsw.a
> 
>     Is there anything else that needs to be done?
> 
>     I am using perl 5.8.1
> 
>     Thanks
> 
>     Marcelino suzuki
> 
>     suzuki at cbl dot umces dot edu
> 
From rfsouza at cecm.usp.br  Wed Jan  5 16:08:45 2005
From: rfsouza at cecm.usp.br (rfsouza@cecm.usp.br)
Date: Wed Jan  5 18:37:00 2005
Subject: [Bioperl-l] Bug in SeqIO/swiss.pm
Message-ID: <34156.143.107.52.69.1104959325.squirrel@webmail.cecm.usp.br>

Hi,

I have found what might be a bug in the SeqIO parser for Swissprot flat files
(swiss.pm). The error message printed is

Invalid [] range "a-S" before HERE mark in regex m/^Cotton leaf curl
Gezira virus - [Okra-S << HERE hambat]$/ at
/home/users/rfsouza/projects/geral/lib/perl5/site_perl/5.8.1/Bio/SeqIO/swiss.pm
line 985, <GEN0> line 10.

and the Swissprot entry is pasted below. The problem is a match operator
at line 985:

984 #if the organism belongs to taxid 32644 then no Bio::Species object.
985 return if grep { /^$binomial$/ } @Unknown_names;

I managed to fix this, and got swiss.pm to parse the entire Uniprot release
2.1, by adding this line

$binomial =~ s/(\[|\])/\\$1/g;

just before line 985. Would anybody like to add this fix to the CVS
version of swiss.pm? Since this is the only entry, out of 1520915 entries
in Uniprot, that swiss.pm was not able to parse, I wonder whether it is an
annotation error in Uniprot, violating their own standard...
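
The substitution above escapes only square brackets, so an organism name 
containing another regex metacharacter, such as '(' or '+', could still 
fail or mis-match.  A minimal sketch of two more general alternatives, 
assuming the list being searched looks like swiss.pm's @Unknown_names 
(the list contents and subroutine names below are made up):

```perl
use strict;
use warnings;

# Hypothetical stand-in for swiss.pm's list of "unknown organism" names.
my @Unknown_names = ('unknown', 'unidentified');

# \Q...\E (quotemeta) escapes every regex metacharacter in $binomial,
# not just '[' and ']', so the name is matched literally.
sub is_unknown_quoted {
    my ($binomial) = @_;
    return scalar grep { /^\Q$binomial\E$/ } @Unknown_names;
}

# Because the pattern is anchored at both ends, a plain string comparison
# expresses essentially the same test without the regex engine.
sub is_unknown_eq {
    my ($binomial) = @_;
    return scalar grep { $_ eq $binomial } @Unknown_names;
}

my $name = 'Cotton leaf curl Gezira virus - [Okra-Shambat]';
print is_unknown_quoted($name), ' ', is_unknown_eq($name), "\n";   # prints "0 0"
```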

Greetings and happy new year :).
Robson

#==============

ID   Q8UYF6         STANDARD;      PRT;   258 AA.
AC   Q8UYF6;
DT   01-MAR-2002 (TrEMBLrel. 20, Created)
DT   01-MAR-2002 (TrEMBLrel. 20, Last sequence update)
DT   01-MAR-2004 (TrEMBLrel. 26, Last annotation update)
DE   Coat protein.
OS   Cotton leaf curl Gezira virus - [Okra-Shambat].
OC   Viruses; ssDNA viruses; Geminiviridae; Begomovirus.
OX   NCBI_TaxID=268964;
RN   [1]
RP   SEQUENCE FROM N.A.
RA   Idris A.M., Brown J.K.;
RT   "Molecular analysis of cotton leaf curl virus-Sudan reveals an
RT   evolutionary history of recombination.";
RL   Virus Genes 0:0-0(2002).
DR   EMBL; AY036008; AAK64541.1; -.
DR   GO; GO:0019028; C:viral capsid; IEA.
DR   GO; GO:0005198; F:structural molecule activity; IEA.
DR   InterPro; IPR000650; Gem_coat_AR1.
DR   InterPro; IPR000263; GV_A/BR1_coat.
DR   Pfam; PF00844; Gemini_coat; 1.
DR   PRINTS; PR00224; GEMCOATAR1.
DR   PRINTS; PR00223; GEMCOATARBR1.
DR   ProDom; PD000901; Gem_coat_AR1; 1.
KW   Coat protein.
SQ   SEQUENCE   258 AA;  29778 MW;  6FB1960A9D8763DD CRC64;
     MSKRPADIII STPASKVRRR LNFDSPGLSS ARAPTVLVTN KRRSWTNRPT YRKPRMYRMY
     RSPDVPKGCE GPCKVQSYEQ RDDIKHTGIV RCVSDVTKGV GITHRTGKRF TIKSIYILGK
     VWMDDNIKKQ NHTNNVMFFL VRDRRPYGNS PLDFGQVFNM FDNEPSTATV KNDLRDHFQV
     LRKFTATVIG GPSGMKEQAL VRRFYRINSQ IVYNHQEAGK FENHTENAIL LYMACTHASN
     PVYATLKIRI YFYDSVSN
//


From jason.stajich at duke.edu  Wed Jan  5 20:14:51 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Wed Jan  5 20:12:01 2005
Subject: [Bioperl-l] Bug in SeqIO/swiss.pm
In-Reply-To: <34156.143.107.52.69.1104959325.squirrel@webmail.cecm.usp.br>
References: <34156.143.107.52.69.1104959325.squirrel@webmail.cecm.usp.br>
Message-ID: <5DA94780-5F80-11D9-A383-000393C44276@duke.edu>

Thanks for the report.  I believe I fixed this in my Nov 22 commit 
(revision 1.84 of Bio/SeqIO/swiss.pm), so the fix will be in bioperl 1.5 
and is available now from the code in CVS.

-jason
On Jan 5, 2005, at 4:08 PM, rfsouza@cecm.usp.br wrote:

> Hi,
>
> I have found what might be a bug in the SeqIO parser for Swissprot  
> flat files
> (swiss.pm). The error message printed is
>
> Invalid [] range "a-S" before HERE mark in regex m/^Cotton leaf curl
> Gezira virus - [Okra-S << HERE hambat]$/ at
> /home/users/rfsouza/projects/geral/lib/perl5/site_perl/5.8.1/Bio/ 
> SeqIO/swiss.pm
> line 985, <GEN0> line 10.
>
> and the Swissprot entry is pasted below. The problem is a match  
> operator
> at line 985:
>
> 984 #if the organism belongs to taxid 32644 then no Bio::Species  
> object.
> 985 return if grep { /^$binomial$/ } @Unknown_names;
>
> I managed to fix this and have swiss.pm to parse the entire Uniprot  
> release
> 2.1 by adding this line
>
> $binomial =~ s/(\[|\])/\\$1/g;
>
> just before line 985. Would anybody like to add this fix to the CVS
> version of swiss.pm? Since this is the only entry which swiss.pm was  
> not
> able to
> parse, out of 1520915 entries in Uniprot, I was considering if it is  
> not an
> annotation error in Uniprot, violating their own standard...
>
> Greeting and happy new year :).
> Robson
>
> #==============
>
> ID   Q8UYF6         STANDARD;      PRT;   258 AA.
> AC   Q8UYF6;
> DT   01-MAR-2002 (TrEMBLrel. 20, Created)
> DT   01-MAR-2002 (TrEMBLrel. 20, Last sequence update)
> DT   01-MAR-2004 (TrEMBLrel. 26, Last annotation update)
> DE   Coat protein.
> OS   Cotton leaf curl Gezira virus - [Okra-Shambat].
> OC   Viruses; ssDNA viruses; Geminiviridae; Begomovirus.
> OX   NCBI_TaxID=268964;
> RN   [1]
> RP   SEQUENCE FROM N.A.
> RA   Idris A.M., Brown J.K.;
> RT   "Molecular analysis of cotton leaf curl virus-Sudan reveals an
> RT   evolutionary history of recombination.";
> RL   Virus Genes 0:0-0(2002).
> DR   EMBL; AY036008; AAK64541.1; -.
> DR   GO; GO:0019028; C:viral capsid; IEA.
> DR   GO; GO:0005198; F:structural molecule activity; IEA.
> DR   InterPro; IPR000650; Gem_coat_AR1.
> DR   InterPro; IPR000263; GV_A/BR1_coat.
> DR   Pfam; PF00844; Gemini_coat; 1.
> DR   PRINTS; PR00224; GEMCOATAR1.
> DR   PRINTS; PR00223; GEMCOATARBR1.
> DR   ProDom; PD000901; Gem_coat_AR1; 1.
> KW   Coat protein.
> SQ   SEQUENCE   258 AA;  29778 MW;  6FB1960A9D8763DD CRC64;
>      MSKRPADIII STPASKVRRR LNFDSPGLSS ARAPTVLVTN KRRSWTNRPT YRKPRMYRMY
>      RSPDVPKGCE GPCKVQSYEQ RDDIKHTGIV RCVSDVTKGV GITHRTGKRF TIKSIYILGK
>      VWMDDNIKKQ NHTNNVMFFL VRDRRPYGNS PLDFGQVFNM FDNEPSTATV KNDLRDHFQV
>      LRKFTATVIG GPSGMKEQAL VRRFYRINSQ IVYNHQEAGK FENHTENAIL LYMACTHASN
>      PVYATLKIRI YFYDSVSN
> //
>
>
>
>
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

From suzuki at cbl.umces.edu  Wed Jan  5 21:20:57 2005
From: suzuki at cbl.umces.edu (Marcelino Suzuki)
Date: Wed Jan  5 21:17:37 2005
Subject: [Bioperl-l] OS X bioperl, staden/read, install problems
In-Reply-To: <41DC7818.8060104@pcbi.upenn.edu>
References: <67117554-5F42-11D9-AC27-0003939E064E@cbl.umces.edu>
	<41DC7818.8060104@pcbi.upenn.edu>
Message-ID: <99CA1AE8-5F89-11D9-AC27-0003939E064E@cbl.umces.edu>

Aaron, your suggestion worked, but I had to do some tweaking.  Editing 
Bio/SeqIO/staden/read.pm in the directory that I untarred from 
current_ext_stable.tar gave no errors for 'make test' and 'make install', 
but I still got errors running a test file.  So I also edited the 
installed copy, /Library/Perl/5.8.1/Bio/SeqIO/staden/read.pm, as suggested 
and then ran make in that directory.  I have no errors now, so I assume it 
is finally installed.

This was really quite a puzzle, so I hope the following will help  
someone else who is trying to install bioperl-ext under OS X (I guess  
the same is true for other systems).

For anyone else trying to install bioperl-ext:
	
I did not have Inline so:

	sudo perl -MCPAN -e 'install Inline'

Then I installed the Staden io_lib 1.8.11, available at  
ftp://ftp.mrc-lmb.cam.ac.uk/pub/staden/io_lib/ , after untarring it in  
/usr/local:

	cd /usr/local/io_lib-1.8.11
	./configure
	sudo make
	sudo make install

No problems at all under OS X Panther.

Following numerous suggestions from the web, I moved both os.h and  
config.h from /usr/local/io_lib-1.8.11 to /usr/local/include/io_lib.  I  
edited os.h to remove the "<:" and ">" around config.h.

Then I untarred current_ext_stable.tar from the bioperl distribution  
in /usr/local, and:

	cd /usr/local/bioperl-ext-1.4
	
++++ I edited (following Aaron Mackey's suggestion)  
/usr/local/bioperl-ext-1.4/Bio/SeqIO/staden/read.pm and added the -lz  
option to LIBS in line 81, then

	ranlib /usr/local/lib/libread.a
	perl Makefile.PL IOLIB_LIB=/usr/local/lib IOLIB_INC=/usr/local/include/io_lib
	make
	
I got an error message and did
	
	ranlib /usr/local/bioperl-ext-1.4/Bio/Ext/Align/libs/libsw.a
	make
	make test
	make install

No errors, so I ran a test and still got the same old error:

	The extension 'Bio::SeqIO::staden::read' is not properly installed in path:
	   '/Library/Perl/5.8.1'

I repeated everything after ++++ above in the directory /Library/Perl

The 'make test' step did not work, but now I no longer get the error  
when running my perl test script.
	
Interestingly, I tried to install the whole thing the exact same way on  
my PowerBook, and I am getting yet a different type of error, like the  
one in http://bioperl.org/pipermail/bioperl-l/2004-January/014481.htm


	Marcelino


On Jan 5, 2005, at 6:28 PM, Aaron J. Mackey wrote:

>
> Try editing Bio/SeqIO/staden/read.pm to include "-lz" in LIBS
>
> -Aaron
>
> Marcelino Suzuki wrote:
>
>>     Well,  I tried all I could, but I keep getting the same error as   
>> srikanth patury:
>> http://bioperl.org/pipermail/bioperl-guts-l/2004-June/016855.html
>>     Got to the point I got no errors in the make step of bioperl-ext
>>     I installed io_lib (1.8.11)
>>     I copied os.h and config.h to /usr/local/include/io_lib
>>     I changed os.h to remove the "<:" and ">" around config.h
>>     I ranlib /usr/local/lib/libread.a
>>     after an error message I also
>>     ranlib /usr/local/bioperl-ext-1.4/Bio/Ext/Align/libs/libsw.a
>>     Is there any other thing that needs to be done.
>>     I am using perl 5.8.1
>>     Thanks
>>     Marcelino suzuki
>>     suzuki at cbl dot umces dot edu
>>
============================================================================
             oOOOOo           Marcelino Suzuki,  Assistant Professor
           oOOO               Chesapeake Biological Lab - Univ of Maryland Center Environm Science
        oOOOOOo.              PO Box 38, One Williams St Solomons, MD 20688
     .oOOOOOOOOOo.            suzuki@cbl.umces.edu  -  http://cbl.umces.edu
   .oOOOOOOOOOOOOOOooo..      Ph 410-326-7291   FAX 410-326-7341
0000000000000000000000000000000000000000000000000000000000000000000000000000

From kishua2000 at hotmail.com  Thu Jan  6 04:26:30 2005
From: kishua2000 at hotmail.com (kishua2000 kishua)
Date: Thu Jan  6 11:31:20 2005
Subject: [Bioperl-l] Can't get length
Message-ID: <BAY23-F7D262208CCF5A24F39ACAC0930@phx.gbl>

An HTML attachment was scrubbed...
URL: http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050106/4fa704d4/attachment.htm
From tex at biocompute.net  Wed Jan  5 20:56:53 2005
From: tex at biocompute.net (James Thompson)
Date: Thu Jan  6 12:05:21 2005
Subject: [Bioperl-l] Can't get length
In-Reply-To: <BAY23-F7D262208CCF5A24F39ACAC0930@phx.gbl>
Message-ID: <Pine.LNX.4.44.0501052054570.19229-100000@biosysadmin.com>

Try using $seq->length() instead of $seq.length(). If you still have problems,
mail the list again and be sure to include your input file. Also, unless you're
using a very large sequence file, use the 'fasta' format rather than the
'largefasta' format.
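
In Perl, '.' is the string-concatenation operator, so $seq.length() 
stringifies the object and appends the result of the built-in length 
function instead of calling the sequence's length method; '->' is what 
invokes a method.  A minimal sketch, with a hypothetical Toy::Seq class 
standing in for a BioPerl sequence object:

```perl
use strict;
use warnings;

# Hypothetical class standing in for a BioPerl sequence object.
package Toy::Seq;
sub new {
    my ($class, $residues) = @_;
    return bless { seq => $residues }, $class;
}
# Method named like the BioPerl one; CORE:: makes the built-in call explicit.
sub length {
    my ($self) = @_;
    return CORE::length($self->{seq});
}

package main;

my $seq = Toy::Seq->new('ACGTACGTAC');
print $seq->length(), "\n";    # method call via '->': prints 10
```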

Cheers,

James Thompson

On Thu, 6 Jan 2005, kishua2000 kishua wrote:

> Hello,
>
> I'm using seqIO object to load a DNA sequence in a fasta-like format (">"
> + some info, in the header). The header doesn't contain any info about
> the length of the sequence.
>
> $in  = Bio::SeqIO->new(-file =>$fastaFile , '-format' => 'largefasta');
> Then I load my sequence to a seq object
>
> my $seq = $in->next_seq()
>
> but when I try to get the length of the sequence I get 0
>
> $len=$seq.length(); #----> 0
>
> so how to get simple length ?
>
> 
> ________________________________________________________________________________
> Don't just search. Find. MSN Search Check out the new MSN Search!
> 

From paulo.david at netvisao.pt  Thu Jan  6 12:10:40 2005
From: paulo.david at netvisao.pt (Paulo Almeida)
Date: Thu Jan  6 12:06:43 2005
Subject: [Bioperl-l] Can't get length
In-Reply-To: <BAY23-F7D262208CCF5A24F39ACAC0930@phx.gbl>
References: <BAY23-F7D262208CCF5A24F39ACAC0930@phx.gbl>
Message-ID: <41DD7110.3040005@netvisao.pt>

Hi,

Did you try $seq->length ?

-Paulo

kishua2000 kishua wrote:

> Hello,
>  
> I'm using seqIO object to load a DNA sequence in a fasta-like format 
> (">" + some info, in the header). The header doesn't contain any info 
> about the length of the sequence.
>  
> $in  = Bio::SeqIO->new(-file =>$fastaFile , '-format' => 'largefasta');
> Then I load my sequence to a seq object
>  
> my $seq = $in->next_seq()
>  
> but when I try to get the length of the sequence I get 0
>  
> $len=$seq.length(); #----> 0
>  
> so how to get simple length ?

From golharam at umdnj.edu  Thu Jan  6 16:21:18 2005
From: golharam at umdnj.edu (Ryan Golhar)
Date: Thu Jan  6 16:14:07 2005
Subject: [Bioperl-l] Error parsing Genbank file
In-Reply-To: <EBA1F676-5F61-11D9-AC45-000393C44276@duke.edu>
Message-ID: <001101c4f435$a9722290$a6028a0a@GOLHARMOBILE1>

What is the fix for CONTIG entries....

BTW- I'm new to bioperl...

Ryan

-----Original Message-----
From: Jason Stajich [mailto:jason.stajich@duke.edu] 
Sent: Wednesday, January 05, 2005 4:37 PM
To: golharam@umdnj.edu
Cc: 'Bioperl List'
Subject: Re: [Bioperl-l] Error parsing Genbank file


We can't parse WGS files.  The fix it needs is very similar to how we 
handle CONTIG entries if you want to have a go at fixing it.

On Jan 5, 2005, at 3:41 PM, Ryan Golhar wrote:

> Hi all,
>
> I have a Genbank file that Bio::SeqIO::genbank is choking on.  The 
> entry is just a WGS entry referencing a bunch of other entries.  It 
> dies on line 492 with the error "Unexpected error in feature table for
> Skipping feature, attempting to recover".
>
> I'm using the following code:
>
> #!/usr/bin/perl
>
> use strict;
> use Bio::SeqIO;
>
> my $usage = "$0 <genbank file> <fasta file>\n";
> my $file = shift or die $usage;
> my $outfilename = shift or die $usage;
>
> my $infile = Bio::SeqIO->new('-file' => "<$file",
> 			    '-format' => "genbank");
>
> my $outfile = Bio::SeqIO->new(-'file' => ">$outfilename",
> 			    '-format' => "fasta");
>
> while (my $seq = $infile->next_seq) {
> #	print STDERR $seq->accession_number,"\n";
> 	
> 	$outfile->write_seq($seq);
> }
>
> Here is the contents of the genbank entry:
>
> LOCUS       CAAB01000000           12381 rc    DNA     linear   VRT 22-AUG-2002
> DEFINITION  Takifugu rubripes whole genome shotgun sequencing project.
> ACCESSION   CAAB00000000
> VERSION     CAAB00000000.1  GI:22418063
> KEYWORDS    WGS.
> SOURCE      Takifugu rubripes (Fugu rubripes)
>   ORGANISM  Takifugu rubripes
>             Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
>             Actinopterygii; Neopterygii; Teleostei; Euteleostei; Neoteleostei;
>             Acanthomorpha; Acanthopterygii; Percomorpha; Tetraodontiformes;
>             Tetradontoidea; Tetraodontidae; Takifugu.
> REFERENCE   1  (bases 1 to 12381)
>   AUTHORS   The Fugu Genome Sequencing Consortium.
>   TITLE     Direct Submission
>   JOURNAL   Submitted (01-JUL-2002) The Fugu Genome Sequencing Consortium,
>             http://www.fugubase.org/ http://www.jgi.doe.gov/fugu
> COMMENT     The Takifugu rubripes whole genome shotgun (WGS) project has the
>             project accession CAAB00000000.  This version of the project (01)
>             has the accession number CAAB01000000, and consists of sequences
>             CAAB01000001-CAAB01012381.
> FEATURES             Location/Qualifiers
>      source          1..12381
>                      /organism="Takifugu rubripes"
>                      /mol_type="genomic DNA"
>                      /db_xref="taxon:31033"
> WGS         CAAB01000001-CAAB01012381
> //
>
>
>
> -----
> Ryan Golhar
> Computational Biologist
> The Informatics Institute at
> The University of Medicine & Dentistry of NJ
>
> Phone: 973-972-5034
> Fax: 973-972-7412
> Email: golharam@umdnj.edu
>
>
>
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

From jason.stajich at duke.edu  Thu Jan  6 17:14:04 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Thu Jan  6 17:10:55 2005
Subject: [Bioperl-l] Error parsing Genbank file
In-Reply-To: <001101c4f435$a9722290$a6028a0a@GOLHARMOBILE1>
References: <001101c4f435$a9722290$a6028a0a@GOLHARMOBILE1>
Message-ID: <468B79D3-6030-11D9-BE08-000393C44276@duke.edu>

Fixed in CVS.  You can grab the changes from http://cvs.open-bio.org/


Index: Bio/SeqIO/genbank.pm
===================================================================
RCS file: /home/repository/bioperl/bioperl-live/Bio/SeqIO/genbank.pm,v
retrieving revision 1.116
diff -r1.116 genbank.pm
71a72
 >  wgs             - Should contain a Bio::Annotation::SimpleValue object
465,466c466
<             last if(($buffer =~ /^BASE/o) || ($buffer =~ /^ORIGIN/o) ||
<                     ($buffer =~ /^CONTIG/o) );
---
 >             last if( $buffer =~ /^BASE|ORIGIN|CONTIG|WGS/o);
517a518,522
 >         } elsif( s/^WGS\s+// ) {
 >             chomp;
 >             $annotation->add_Annotation(
 >                 'wgs',
 >                 Bio::Annotation::SimpleValue->new(-value => $_));
522c527,528
<         }
---
 >
 >                                      } else { warn($_); }
775a782,788
 >       # deal with WGS
 >       foreach my $wgs ( $seq->annotation->get_Annotations('wgs') ) {
 >           $self->_print(sprintf ("%-11s %s\n",'WGS',
 >                                  $wgs->value));
 >           $self->_show_dna(0);
 >       }
 >
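
One observation on the new line 466: in a Perl regex, alternation binds 
more loosely than the ^ anchor, so /^BASE|ORIGIN|CONTIG|WGS/ anchors only 
BASE, whereas the replaced lines anchored each keyword separately.  Whether 
any line reaching that loop can contain one of the other words mid-line is 
not clear from the diff alone; if the intent is to anchor all four, 
grouping them does it.  A small standalone illustration (the subroutine 
names are made up; the test line is the KEYWORDS line of Ryan's entry):

```perl
use strict;
use warnings;

# As written in the diff: '|' has the lowest precedence, so this means
# /^BASE/ or /ORIGIN/ or /CONTIG/ or /WGS/, with only BASE anchored.
sub stops_loose    { return $_[0] =~ /^BASE|ORIGIN|CONTIG|WGS/ ? 1 : 0 }

# Grouping the alternatives applies the '^' anchor to every keyword.
sub stops_anchored { return $_[0] =~ /^(?:BASE|ORIGIN|CONTIG|WGS)/ ? 1 : 0 }

my $line = "KEYWORDS    WGS.\n";    # mentions WGS but does not start with it
print stops_loose($line), ' ', stops_anchored($line), "\n";   # prints "1 0"
```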
On Jan 6, 2005, at 4:21 PM, Ryan Golhar wrote:

> What is the fix for CONTIG entries....
>
> BTW- I'm new to bioperl...
>
> Ryan
>
> -----Original Message-----
> From: Jason Stajich [mailto:jason.stajich@duke.edu]
> Sent: Wednesday, January 05, 2005 4:37 PM
> To: golharam@umdnj.edu
> Cc: 'Bioperl List'
> Subject: Re: [Bioperl-l] Error parsing Genbank file
>
>
> We can't parse WGS files.  The fix it needs is very similar to how we
> handle CONTIG entries if you want to have a go at fixing it.
>
> On Jan 5, 2005, at 3:41 PM, Ryan Golhar wrote:
>
>> Hi all,
>>
>> I have a Genbank file that Bio::SeqIO::genbank is choking on.  The
>> entry is just a WGS entry referencing a bunch of other entries.  It
>> dies on line 492 with the error "Unexpected error in feature table for
>> Skipping feature, attempting to recover".
>>
>> I'm using the following code:
>>
>> #!/usr/bin/perl
>>
>> use strict;
>> use Bio::SeqIO;
>>
>> my $usage = "$0 <genbank file> <fasta file>\n";
>> my $file = shift or die $usage;
>> my $outfilename = shift or die $usage;
>>
>> my $infile = Bio::SeqIO->new('-file' => "<$file",
>> 			    '-format' => "genbank");
>>
>> my $outfile = Bio::SeqIO->new(-'file' => ">$outfilename",
>> 			    '-format' => "fasta");
>>
>> while (my $seq = $infile->next_seq) {
>> #	print STDERR $seq->accession_number,"\n";
>> 	
>> 	$outfile->write_seq($seq);
>> }
>>
>> Here is the contents of the genbank entry:
>>
>> LOCUS       CAAB01000000           12381 rc    DNA     linear   VRT 22-AUG-2002
>> DEFINITION  Takifugu rubripes whole genome shotgun sequencing project.
>> ACCESSION   CAAB00000000
>> VERSION     CAAB00000000.1  GI:22418063
>> KEYWORDS    WGS.
>> SOURCE      Takifugu rubripes (Fugu rubripes)
>>   ORGANISM  Takifugu rubripes
>>             Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
>>             Actinopterygii; Neopterygii; Teleostei; Euteleostei; Neoteleostei;
>>             Acanthomorpha; Acanthopterygii; Percomorpha; Tetraodontiformes;
>>             Tetradontoidea; Tetraodontidae; Takifugu.
>> REFERENCE   1  (bases 1 to 12381)
>>   AUTHORS   The Fugu Genome Sequencing Consortium.
>>   TITLE     Direct Submission
>>   JOURNAL   Submitted (01-JUL-2002) The Fugu Genome Sequencing Consortium,
>>             http://www.fugubase.org/ http://www.jgi.doe.gov/fugu
>> COMMENT     The Takifugu rubripes whole genome shotgun (WGS) project has the
>>             project accession CAAB00000000.  This version of the project (01)
>>             has the accession number CAAB01000000, and consists of sequences
>>             CAAB01000001-CAAB01012381.
>> FEATURES             Location/Qualifiers
>>      source          1..12381
>>                      /organism="Takifugu rubripes"
>>                      /mol_type="genomic DNA"
>>                      /db_xref="taxon:31033"
>> WGS         CAAB01000001-CAAB01012381
>> //
>>
>>
>>
>> -----
>> Ryan Golhar
>> Computational Biologist
>> The Informatics Institute at
>> The University of Medicine & Dentistry of NJ
>>
>> Phone: 973-972-5034
>> Fax: 973-972-7412
>> Email: golharam@umdnj.edu
>>
>>
>>
> --
> Jason Stajich
> jason.stajich at duke.edu
> http://www.duke.edu/~jes12/
>
>
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

From Peter.Robinson at t-online.de  Thu Jan  6 15:44:27 2005
From: Peter.Robinson at t-online.de (Peter Robinson)
Date: Thu Jan  6 20:02:46 2005
Subject: [Bioperl-l] Entrez Gene and bioperl-db
In-Reply-To: <1DA5FD5C-5E94-11D9-9C0C-000393C44276@duke.edu>
References: <2ED9C47A-5898-11D9-AC01-000A959EB4C4@gmx.net>
	<1104792001.3186.17.camel@localhost.localdomain>
	<0F5A3AE4-5DDA-11D9-AA3C-000393C44276@duke.edu>
	<1104871954.3102.24.camel@localhost.localdomain>
	<1DA5FD5C-5E94-11D9-9C0C-000393C44276@duke.edu>
Message-ID: <1105044266.3084.27.camel@localhost.localdomain>

Dear Bioperlers,

I have started looking at writing some modules to parse the new Entrez
gene, which is kind of an expanded LocusLink. The really interesting
files are species specific and are in the ASN.1 format, and I am still
experimenting around with the best way of parsing them. To get started,
I am looking at the tab-delimited flat files. It seems to me that it
would be interesting to be able to parse gene_info and gene2accession
using the Bio::SeqIO system; the other files, such as gene2unigene, seem
less suited for this (the latter has just two entries which could be
parsed ad hoc easily enough).
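
As a rough illustration of the tab-delimited parsing described above, here 
is a minimal sketch that turns one gene_info line into a hash.  The column 
names, the '-' placeholder for empty fields, and the '|' separator inside 
Synonyms are assumptions to check against the '#Format:' header of the 
file you download; the helper name is hypothetical and this is not the 
proposed Bio::SeqIO::geneinfo module.

```perl
use strict;
use warnings;

# Assumed column order; verify against the '#Format:' header line
# at the top of the gene_info file you actually download.
my @COLUMNS = qw(tax_id GeneID Symbol LocusTag Synonyms dbXrefs
                 chromosome map_location description);

# Hypothetical helper: turn one tab-delimited gene_info line into a hash ref.
sub parse_gene_info_line {
    my ($line) = @_;
    chomp $line;
    return if $line =~ /^#/ || $line !~ /\S/;   # skip header/comments and blanks
    my @fields = split /\t/, $line, -1;          # -1 keeps empty trailing fields
    my %rec;
    @rec{@COLUMNS} = @fields[0 .. $#COLUMNS];
    # Assumption: '-' marks an empty field and '|' separates multiple values.
    my $syn = defined $rec{Synonyms} ? $rec{Synonyms} : '-';
    $rec{Synonyms} = [ grep { $_ ne '-' } split /\|/, $syn ];
    return \%rec;
}

# Usage: index records by GeneID while streaming an open filehandle $fh.
# my %by_gene;
# while (my $line = <$fh>) {
#     my $rec = parse_gene_info_line($line) or next;
#     $by_gene{ $rec->{GeneID} } = $rec;
# }
```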

In any case, I am sending a proposed module Bio::SeqIO::geneinfo.pm as
well as a test script (which contains a small excerpt of gene_info in
the data section) for comments and criticism to the list. I am presently
working on another module for Bio::SeqIO::gene2accession and plan to
write a demo script using both modules to convert NCBI accession numbers
to MGI accession numbers (which is something one might want to do in
order to use Gene Ontology for affymetrix data, although one needs
additional work for probesets which are only related to ESTs).

For the moment it seemed better to just parse in the NCBI taxon id into
the Bio::Species object (only this info is supplied by gene_info), and
expect users who need the information to use the taxonomy support of
other Bioperl modules in their scripts.

I will continue to work on parsing the species specific ASN.1 files, but
I will be trying a combination of lex/yacc/C to do this. If that works I
will look into trying perl support for lex/yacc for potential use in
Bioperl, but since I am not sure how long this will take me, I do not
want to scare off anyone else who would like to give this a shot.

best,
peter


On Tue, 2005-01-04 at 22:03, Jason Stajich wrote:
> On Jan 4, 2005, at 3:52 PM, Peter Robinson wrote:
> 
> > Hi Jason,
> >
> > thanks for the advice. It seems as if the documentation of
> > Bio::DB::Taxonomy is a bit out of sync.
> >  my $db = new Bio::DB::Taxonomy(-source => 'flatfile',
> >                                  -nodesfile => $nodesfile,
> >                                  -namesfile => $namefile);
> > What does 'flatfile' refer to here? It is not apparent upon looking at 
> > the code for new.
> >
> See Bio::DB::Taxonomy::flatfile for more information.  As I mentioned
> in the mail I sent, flatfile is for using a taxonomy DB downloaded from
> NCBI.  This lets you run it locally using an indexed (BerkeleyDB via
> DB_File) version of the file.
> 
> You probably need the most up-to-date version of the modules; it works
> fine for me with both the entrez and flatfile code, but you may have to
> upgrade from the 1.4.0 release. Code from CVS or the bioperl-1.5 RC1
> code should work fine.
> 
> 
> 
> > I had somewhat better luck using the entrez version, but I got a
> > pretty amusing error message:
> >
> > MSG: can't create a species object for Homo sapiens (human) because it
> > isn't a species but is a '' instead
> >
> > ###
> > Full error and a dump of the script follow:
> >
> > my $db = new Bio::DB::Taxonomy(-source => 'entrez'); #
> > my $taxaid = $db->get_taxonid('Homo sapiens');
> > my $species = $db->get_Taxonomy_Node(-taxonid => '9606');
> > print Dumper($species);
> >
> > ###
> >
> > Use of uninitialized value in string eq at
> > /usr/local/share/perl/5.8.4/Bio/DB/Taxonomy/entrez.pm line 192.
> > Use of uninitialized value in sprintf at
> > /usr/local/share/perl/5.8.4/Bio/DB/Taxonomy/entrez.pm line 201.
> >
> > -------------------- WARNING ---------------------
> > MSG: can't create a species object for Homo sapiens (human) because it
> > isn't a species but is a '' instead
> > ---------------------------------------------------
> > Use of uninitialized value in string eq at
> > /usr/local/share/perl/5.8.4/Bio/DB/Taxonomy/entrez.pm line 192.
> > Use of uninitialized value in sprintf at
> > /usr/local/share/perl/5.8.4/Bio/DB/Taxonomy/entrez.pm line 201.
> >
> > -------------------- WARNING ---------------------
> > MSG: can't create a species object for Homo sapiens (human) because it
> > isn't a species but is a '' instead
> > ---------------------------------------------------
> > $VAR1 = {
> >           'TaxId' => '9606',
> >           'Division' => 'mammals',
> >           'GeneNumber' => '32775',
> >           'Rank' => 'species',
> >           'ProtNumber' => '247791',
> >           'ScientificName' => 'Homo sapiens',
> >           'CommonName' => 'human',
> >           'NucNumber' => '9025800',
> >           'GenNumber' => '25',
> >           'StructNumber' => '5638'
> >         };
> > peter@anna:~/programs/bioperlTest$
> >
> >
> > --best, peter
> >
> > On Mon, 2005-01-03 at 23:51, Jason Stajich wrote:
> >> Bio::DB::Taxonomy is the factory code - it is pretty easy to get a
> >> species object (or equivalent) using this code.  But you cannot (or
> >> could not when I wrote this, not sure of the current status) get the
> >> full classification from the NCBI taxonomy retrieval via cgi, i.e. you
> >> can only get genus and species for a taxon id and I don't know how to
> >> walk up the hierarchy using the web API.  Earlier emails to NCBI seemed
> >> to indicate this is all they intended to provide, but not sure what the
> >> current status is.
> >>
> >>   my $db        = new Bio::DB::Taxonomy(-source => 'entrez'); # use NCBI Entrez over HTTP
> >>   my $taxaid    = $db->get_taxonid('Homo sapiens');
> >>   my $taxonnode = $db->get_Taxonomy_Node(-taxonid => '9606');
> >>
> >> You can get the full classification if you use the
> >> Bio::DB::Taxonomy::flatfile factory, which requires you to have
> >> downloaded the taxonomy db flatfile from NCBI.  Since this is more
> >> reliable (and faster), it is what I have tended to use for grouping
> >> sets of seqDB search results, etc.
> >>
> >> -jason
> >> On Jan 3, 2005, at 5:40 PM, Peter Robinson wrote:
> >>
> >>> Hi Bioperlers, hi Hilmar,
> >>>
> >>> after some thinking I have embarked on a lex/yacc parser for the Entrez
> >>> Gene ASN.1 format as the path of least resistance, although I am not
> >>> sure how that would fit into BioPerl. If anyone is interested in this
> >>> (or has a better idea of how to go about it...), please drop me a line.
> >>>
> >>> In the meantime I have been looking at writing code to parse some of the
> >>> "easy" Entrez Gene documents, starting off with gene_info. This file
> >>> includes the NCBI taxon id for each entry. I would like to convert this
> >>> to a Bio::Species object to pass to the following:
> >>> 	my $seq = $self->sequence_factory->create(
> >>> 			     -verbose => $self->verbose(),
> >>> 			     -accession_number => $geneID,
> >>> 			     -desc => $description,
> >>> 			     -display_id => $symbol,
> >>> 			     -species =>  ???
> >>> 			     -annotation => $ann);
> >>>
> >>> and saw the Bio::Taxonomy::FactoryI code, which appears to want to do
> >>> this sort of thing. However, the code for that is pretty preliminary. Is
> >>> anyone working on this at the moment? Or is there a better way of doing
> >>> this? (It seems a shame not to provide the actual species name if one
> >>> has the taxid...)
> >>>
> >>> best
> >>>
> >>> Peter
> >>>
> >>>
> >>>
> >>> On Tue, 2004-12-28 at 07:17, Hilmar Lapp wrote:
> >>>> Great to hear that someone is giving this a shot. Yes, at this point it
> >>>> appears that NCBI is only offering the ASN.1, not a conversion to XML.
> >>>> Their asn2xml tool will not work with this ASN.1 format either; I just
> >>>> checked it to be sure. Judging from the Gene FAQ, they do seem to be
> >>>> mulling the option of XML, though. Maybe if enough people get in their
> >>>> ears they will spend some effort towards that. After all, the Entrez
> >>>> Gene web interface can display XML on demand, even though it looks
> >>>> fairly hideous.
> >>>>
> >>>> There is no ASN.1 support in bioperl at all. Also, ASN.1 support in
> >>>> perl is actually thin: the only thing I could find is Convert::ASN1,
> >>>> whose latest version (0.18) is two years old ... which doesn't make me
> >>>> feel warm and fuzzy.
> >>>>
> >>>> In the absence of any XML available from NCBI, gene_info might be the
> >>>> best start. An option could be to check for the presence of the other
> >>>> tab-delimited files and use those that are present. Because these files
> >>>> are tab-delimited, the format itself is trivial, so you can focus
> >>>> entirely on setting up a Bio::Seq plus annotation that's
> >>>> comparable/compatible to what the current SeqIO::locuslink does.
> >>>>
> >>>> My $0.02 (worth less and less almost every day).
> >>>>
> >>>> 	-hilmar
> >>>>
> >>>> On Thursday, December 23, 2004, at 10:51  AM, Peter Robinson wrote:
> >>>>
> >>>>> Hi,
> >>>>>
> >>>>> I have been thinking about giving a BioPerl EntrezGene parser a try,
> >>>>> since I have been a heavy user of LocusLink to date. One issue is that
> >>>>> the files that correspond to LL_tmpl (which was a flat file) are now in
> >>>>> ASN.1 format:
> >>>>> http://www.ncbi.nlm.nih.gov/entrez/query/static/help/genehelp.html#query
> >>>>> Although I saw some mention of ASN support in Bioperl by googling, I
> >>>>> can't seem to find any module that does this in the present
> >>>>> distribution. What is the status on that? In any case, I will be
> >>>>> working on this in the next month or two, and if anything nice comes of
> >>>>> it I will send it to you / BioPerl.
> >>>>>
> >>>>> best wishes & happy holidays
> >>>>>
> >>>>> Peter
> >>>>>
> >>>>> On Tue, 2004-12-14 at 09:00, Hilmar Lapp wrote:
> >>>>>> Since load_seqdatabase.pl will use bioperl's SeqIO parsers for parsing
> >>>>>> any input file, what you're asking is whether or not there is a SeqIO
> >>>>>> parser for NCBI Gene.
> >>>>>>
> >>>>>> The answer to that question is no, not yet. Anybody who feels
> >>>>>> motivated is welcome to give it a try ... Since I'll need it, I'll
> >>>>>> write the parser if nobody else does within the next 3 months, but I'm
> >>>>>> not going to promise when exactly this will happen.
> >>>>>>
> >>>>>> 	-hilmar
> >>>>>>
> >>>>>> On Monday, December 13, 2004, at 08:03  AM, Law, Annie wrote:
> >>>>>>
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> I was wondering, with regard to bioperl-db (the scripts, the schema
> >>>>>>> and load_seqdatabase.pl), whether there has been preparation for
> >>>>>>> integrating Entrez Gene information once LocusLink is phased out. Or,
> >>>>>>> if it has already been changed, could somebody point me to the
> >>>>>>> documentation or changed code?
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> Annie.
> >>>>>>> _______________________________________________
> >>>>>>> Bioperl-l mailing list
> >>>>>>> Bioperl-l@portal.open-bio.org
> >>>>>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
> >>>>>>>
> >>>>>>>
> >>>>> -- 
> >>>>> Peter N. Robinson
> >>>>> peter.robinson@t-online.de
> >>>>> peter.robinson@charite.de
> >>>>> http://www.charite.de/ch/medgen/robinson/
> >>>>>
> >>>>>
> >>> -- 
> >>> Peter N. Robinson
> >>> peter.robinson@t-online.de
> >>> peter.robinson@charite.de
> >>> http://www.charite.de/ch/medgen/robinson/
> >>>
> >>>
> >>>
> >> --
> >> Jason Stajich
> >> jason.stajich at duke.edu
> >> http://www.duke.edu/~jes12/
> > -- 
> > Peter N. Robinson
> > peter.robinson@t-online.de
> > peter.robinson@charite.de
> > http://www.charite.de/ch/medgen/robinson/
> >
> >
> --
> Jason Stajich
> jason.stajich at duke.edu
> http://www.duke.edu/~jes12/
> 
-- 
Peter N. Robinson
peter.robinson@t-online.de
peter.robinson@charite.de
http://www.charite.de/ch/medgen/robinson/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: geneinfo.pm
Type: application/x-perl
Size: 10931 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050106/6754c375/geneinfo.bin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: geneinfotest.pl
Type: application/x-perl
Size: 11184 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050106/6754c375/geneinfotest.bin
From skirov at utk.edu  Thu Jan  6 21:33:05 2005
From: skirov at utk.edu (Stefan A Kirov)
Date: Thu Jan  6 21:29:57 2005
Subject: [Bioperl-l] Entrez Gene and bioperl-db
In-Reply-To: <1105044266.3084.27.camel@localhost.localdomain>
References: <2ED9C47A-5898-11D9-AC01-000A959EB4C4@gmx.net>
	<1104792001.3186.17.camel@localhost.localdomain>
	<0F5A3AE4-5DDA-11D9-AA3C-000393C44276@duke.edu>
	<1104871954.3102.24.camel@localhost.localdomain>
	<1DA5FD5C-5E94-11D9-9C0C-000393C44276@duke.edu>
	<1105044266.3084.27.camel@localhost.localdomain>
Message-ID: <Pine.GSO.4.58.0501062026170.1209@moe.usg.utk.edu>

Peter,
Why can't the UniGene data be added as a Bio::Annotation object, for
example? Also, would you mind if I give you a hand? I am also doing some
Entrez Gene DB parsing.
Hilmar,
Getting back to your post, I have some concerns about automatic
parsing of multiple files (if I got this right...). If one downloads
the whole Entrez Gene set and all is OK, I don't see why this can't be
done. But if something goes wrong (and occasionally it will), it will be
really hard for users to notice that they are missing parts of the data.
Of course this could be handled through warnings, but what about people
who intentionally parse only part of the DB? I guess one could add
something like -suppress_warning=>1/0.
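A minimal sketch of the -suppress_warning idea follows, in core Perl. It assumes a hypothetical helper, `open_companion_files`, that opens whichever Entrez Gene companion files are present in a directory (gene2accession and gene2unigene by default) and warns about each missing one unless `-suppress_warning` is true. The argument names mirror the BioPerl `-name => value` style but are not an existing BioPerl API.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: open whichever Entrez Gene companion files exist in
# -dir; warn about each one that cannot be opened unless -suppress_warning.
sub open_companion_files {
    my (%args) = @_;
    my $dir   = $args{-dir};
    my @names = @{ $args{-files} || [qw(gene2accession gene2unigene)] };
    my %handles;
    for my $name (@names) {
        my $path = File::Spec->catfile($dir, $name);
        if (open my $fh, '<', $path) {
            $handles{$name} = $fh;
        }
        elsif (!$args{-suppress_warning}) {
            warn "Entrez Gene file '$path' could not be opened ($!); "
               . "its data will be missing\n";
        }
    }
    return \%handles;
}
```

With this design, a user who deliberately works with a partial download passes `-suppress_warning => 1`. Everyone else gets a warning naming each missing file, instead of silently receiving incomplete records.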
Another issue that comes to mind is that the stream approach is fine for
people who want to process the whole DB. But if you need a particular
record, I guess you could index the files, but that is a totally
different game. Any volunteers?
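For single-record access, one simple option is a byte-offset index. Scan gene_info once, remember where each GeneID's line starts, and later `seek` straight to it. Below is a minimal in-memory sketch in core Perl that assumes GeneID is the second tab-separated column. The function names are hypothetical. A persistent variant could store the offsets in a hash tied to a DB_File database instead of a plain hash.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET);

# GeneID => byte offset of its line, so single records can later be
# fetched with seek() instead of re-reading the whole stream.
sub build_offset_index {
    my ($fh) = @_;
    my %offset;
    while (1) {
        my $pos  = tell $fh;            # offset of the line about to be read
        my $line = <$fh>;
        last unless defined $line;
        next if $line =~ /^#/;          # skip header/comment lines
        my (undef, $gene_id) = split /\t/, $line, 3;
        $offset{$gene_id} = $pos if defined $gene_id;
    }
    return \%offset;
}

# Return the raw (chomped) gene_info line for $gene_id, or undef if unknown.
sub fetch_record {
    my ($fh, $index, $gene_id) = @_;
    my $pos = $index->{$gene_id};
    return undef unless defined $pos;
    seek($fh, $pos, SEEK_SET) or die "seek failed: $!";
    my $line = <$fh>;
    chomp $line;
    return $line;
}
```

The returned line could then be handed to whatever parses a single gene_info record.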


On Thu, 6 Jan 2005, Peter Robinson wrote:

>Dear Bioperlers,
>
>I have started looking at writing some modules to parse the new Entrez
>gene, which is kind of an expanded LocusLink. The really interesting
>files are species specific and are in the ASN.1 format, and I am still
>experimenting around with the best way of parsing them. To get started,
>I am looking at the tab-delimited flat files. It seems to me that it
>would be interesting to be able to parse gene_info and gene2accession
>using the Bio::SeqIO system, the other files such as gene2unigene seem
>less suited for this (the latter has just two entries which could be
>parsed ad hoc easily enough).
>
>In any case, I am sending a proposed module Bio::SeqIO::geneinfo.pm as
>well as a test script (which contains a small excerpt of gene_info in
>the data section) for comments and criticism to the list. I am presently
>working on another module for Bio::SeqIO::gene2accession and plan to
>write a demo script using both modules to convert NCBI accession numbers
>to MGI accession numbers (which is something one might want to do in
>order to use Gene Ontology for affymetrix data, although one needs
>additional work for probesets which are only related to ESTs).
>
>For the moment it seemed better to just parse in the NCBI taxon id into
>the Bio::Species object (only this info is supplied by gene_info), and
>expect users who need the information to use the taxonomy support of
>other Bioperl modules in their scripts.
>
>I will continue to work on parsing the species specific ASN.1 files, but
>I will be trying a combination of lex/yacc/C to do this. If that works I
>will look into trying perl support for lex/yacc for potential use in
>Bioperl, but since I am not sure how long this will take me, I do not
>want to scare off anyone else who would like to give this a shot.
>
>best,
>peter
>
>
>On Tue, 2005-01-04 at 22:03, Jason Stajich wrote:
>> On Jan 4, 2005, at 3:52 PM, Peter Robinson wrote:
>>
>> > Hi Jason,
>> >
>> > thanks for the advice. It seems as if the documentation of
>> > Bio::DB::Taxonomy is a bit out of sync.
>> >  my $db = new Bio::DB::Taxonomy(-source => 'flatfile',
>> >                                  -nodesfile => $nodesfile,
>> >                                  -namesfile => $namefile);
>> > What does 'flatfile' refer to here? It is not apparent upon looking at
>> > the code for new.
>> >
>> See Bio::DB::Taxonomy::flatfile for more information.  As I mentioned
>> in the mail I sent, flatfile is for downloading the taxonomy DB from
>> NCBI.  This lets you run it locally using an indexed (BerkeleyDB via
>> DB_File) version of the file.
>>
>> You probably need the most up-to-date version of the modules - works fine
>> for me for both the entrez and flatfile code, but you may have to
>> upgrade off of the 1.4.0 release. Code from CVS or the bioperl-1.5 RC1
>> code should work fine.
>>
>>
>>
>> > I had somewhat better luck using the entrez version, but I got a
>> > pretty amusing error
>> > message:
>> >
>> > MSG: can't create a species object for Homo sapiens (human) because it
>> > isn't a species but is a '' instead
>> >
>> > ###
>> > Full error and a dump of the script follow:
>> >
>> > my $db = new Bio::DB::Taxonomy(-source => 'entrez'); #
>> > my $taxaid = $db->get_taxonid('Homo sapiens');
>> > my $species = $db->get_Taxonomy_Node(-taxonid => '9606');
>> > print Dumper($species);
>> >
>> > ###
>> >
>> > Use of uninitialized value in string eq at
>> > /usr/local/share/perl/5.8.4/Bio/DB/Taxonomy/entrez.pm line 192.
>> > Use of uninitialized value in sprintf at
>> > /usr/local/share/perl/5.8.4/Bio/DB/Taxonomy/entrez.pm line 201.
>> >
>> > -------------------- WARNING ---------------------
>> > MSG: can't create a species object for Homo sapiens (human) because it
>> > isn't a species but is a '' instead
>> > ---------------------------------------------------
>> > Use of uninitialized value in string eq at
>> > /usr/local/share/perl/5.8.4/Bio/DB/Taxonomy/entrez.pm line 192.
>> > Use of uninitialized value in sprintf at
>> > /usr/local/share/perl/5.8.4/Bio/DB/Taxonomy/entrez.pm line 201.
>> >
>> > -------------------- WARNING ---------------------
>> > MSG: can't create a species object for Homo sapiens (human) because it
>> > isn't a species but is a '' instead
>> > ---------------------------------------------------
>> > $VAR1 = {
>> >           'TaxId' => '9606',
>> >           'Division' => 'mammals',
>> >           'GeneNumber' => '32775',
>> >           'Rank' => 'species',
>> >           'ProtNumber' => '247791',
>> >           'ScientificName' => 'Homo sapiens',
>> >           'CommonName' => 'human',
>> >           'NucNumber' => '9025800',
>> >           'GenNumber' => '25',
>> >           'StructNumber' => '5638'
>> >         };
>> > peter@anna:~/programs/bioperlTest$
>> >
>> >
>> > --best, peter
>> >
>> > On Mon, 2005-01-03 at 23:51, Jason Stajich wrote:
>> >> Bio::DB::Taxonomy is the factory code - it is pretty easy to get a
>> >> species object (or equivalent) using this code.  But you cannot (or
>> >> could not when I wrote this, not sure of the current status) get the
>> >> full classification from the NCBI taxonomy retrieval via cgi.  i.e.
>> >> you
>> >> can only get genus and species for a taxon id and I don't know how to
>> >> walk up the hierarchy using the web API.  Earlier emails to NCBI
>> >> seemed
>> >> to indicate this is all they intended to provide, but not sure what
>> >> the
>> >> current status is.
>> >>
>> >>   my $db = new Bio::DB::Taxonomy(-source => 'entrez'); # use NCBI Entrez over HTTP
>> >>    my $taxaid = $db->get_taxonid('Homo sapiens');
>> >>    my $taxonnode = $db->get_Taxonomy_Node(-taxonid => '9606');
>> >>
>> >> You can get the full classification if you use the
>> >> Bio::DB::Taxonomy::flatfile factory which requires you to have
>> >> downloaded the taxonomy db flatfile from NCBI.  Since this is more
>> >> reliable (and faster) it is what I have tended to use for grouping
>> >> sets
>> >> of seqDB search results, etc.
>> >>
>> >> -jason
>> >> On Jan 3, 2005, at 5:40 PM, Peter Robinson wrote:
>> >>
>> >>> Hi Bioperlers, hi Hilmar,
>> >>>
>> >>> after some thinking I have embarked on a lex/yacc parser for the
>> >>> Entrez
>> >>> Gene ASN.1 format as the way of least resistance, although I am not
>> >>> sure
>> >>> how that would fit in to BioPerl. If anyone is interested in this (or
>> >>> has a better idea of how to go about it..), please drop me a line.
>> >>>
>> >>> In the meantime I have been looking at writing code to parse some of
>> >>> the
>> >>> "easy" Entrez gene documents, starting off with gene_info. This file
>> >>> includes the NCBI taxon id for each entry. I would like to convert
>> >>> this
>> >>> to a Bio::Species object to pass to the following
>> >>> 	my $seq = $self->sequence_factory->create(
>> >>> 			     -verbose => $self->verbose(),
>> >>> 			     -accession_number => $geneID,
>> >>> 			     -desc => $description,
>> >>> 			     -display_id => $symbol,
>> >>> 			     -species =>  ???
>> >>> 			     -annotation => $ann);
>> >>>
>> >>> and saw the Bio::Taxonomy::FactoryI code, which appears to want to do
>> >>> this sort of thing. However, the code for that is pretty preliminary.
>> >>> Is
>> >>> anyone working on this at the moment? Or is there a better way of
>> >>> doing
>> >>> this (it seems a shame not to provide the actual species name if one
>> >>> has
>> >>> the taxid...)
>> >>>
>> >>> best
>> >>>
>> >>> Peter
>> >>>
>> >>>
>> >>>
>> >>> On Tue, 2004-12-28 at 07:17, Hilmar Lapp wrote:
>> >>>> Great to hear that someone is giving this a shot. Yes at this point
>> >>>> it
>> >>>> appears that NCBI is only offering the ASN.1, not a conversion to
>> >>>> XML.
>> >>>> Their asn2xml tool will not work with this ASN.1 format either, just
>> >>>> checked it to be sure. They do seem to be mulling the option of XML
>> >>>> though on the Gene FAQ. Maybe if enough people get in their ears
>> >>>> they
>> >>>> will spend some effort towards that. After all, the entrez gene web
>> >>>> interface can display XML on demand - even though it looks fairly
>> >>>> hideous.
>> >>>>
>> >>>> There is no ASN.1 support in bioperl at all. Also, ASN.1 support in
>> >>>> perl is actually thin - there is Convert::ASN1 at version 0.18 two
>> >>>> years ago that I could find ... doesn't make me feel warm and fuzzy.
>> >>>>
>> >>>> In the absence of any XML available from NCBI, gene_info might be
>> >>>> the
>> >>>> best start. An option could be to check for the presence of the
>> >>>> other
>> >>>> tab-delimited files and use those that are present. These are
>> >>>> tab-delimited and hence the format itself is trivial so you can
>> >>>> focus
>> >>>> entirely on setting up a Bio::Seq plus annotation that's
>> >>>> comparable/compatible to what the current SeqIO::locuslink does.
>> >>>>
>> >>>> My $0.02 (worth less and less almost every day).
>> >>>>
>> >>>> 	-hilmar
>> >>>>
>> >>>> On Thursday, December 23, 2004, at 10:51  AM, Peter Robinson wrote:
>> >>>>
>> >>>>> Hi,
>> >>>>>
>> >>>>> I have been thinking about giving a BioPerl EntrezGene parser a try
>> >>>>> since
>> >>>>> I have been a heavy user of locus link to date. One issue is that
>> >>>>> the
>> >>>>> files that correspond to LL_tmpl (which was a flat file) are now in
>> >>>>> asn
>> >>>>> format
>> >>>>> http://www.ncbi.nlm.nih.gov/entrez/query/static/help/genehelp.html#query
>> >>>>> Although I saw some mention of ASN support in Bioperl by googling,
>> >>>>> I
>> >>>>> can't seem to find any module that does this in the present
>> >>>>> distribution. What is the status on that? In any case, I will be
>> >>>>> working
>> >>>>> on this in the next month or two and if anything nice comes of it I
>> >>>>> will
>> >>>>> send it to you / BioPerl.
>> >>>>>
>> >>>>> best wishes & happy holidays
>> >>>>>
>> >>>>> Peter
>> >>>>>
>> >>>>> On Tue, 2004-12-14 at 09:00, Hilmar Lapp wrote:
>> >>>>>> Since load_seqdatabase.pl will use bioperl's SeqIO parsers for
>> >>>>>> parsing
>> >>>>>> any input file, what you're asking is whether or not there is a
>> >>>>>> SeqIO
>> >>>>>> parser for NCBI Gene.
>> >>>>>>
>> >>>>>> The answer to that question is no, not yet. Anybody who feels
>> >>>>>> motivated
>> >>>>>> is welcome to give it a try ... Since I'll need it, I'll write the
>> >>>>>> parser if nobody else does within the next 3 months, but I'm not
>> >>>>>> going
>> >>>>>> to promise when exactly this will happen.
>> >>>>>>
>> >>>>>> 	-hilmar
>> >>>>>>
>> >>>>>> On Monday, December 13, 2004, at 08:03  AM, Law, Annie wrote:
>> >>>>>>
>> >>>>>>> Hi,
>> >>>>>>>
>> >>>>>>> I was wondering with regards to bioperl-db the scripts and schema
>> >>>>>>> and
>> >>>>>>> load_seqdatabase.pl has there been preparation for integration of
>> >>>>>>> Entrez
>> >>>>>>> gene information when locuslink is phased out?  Or if it has
>> >>>>>>> already
>> >>>>>>> been
>> >>>>>>> changed could somebody point
>> >>>>>>> me to the documentation or changed code?
>> >>>>>>>
>> >>>>>>> Thanks,
>> >>>>>>> Annie.
>> >>>>>>>
>> >>>>>>>
>> >>>>> --
>> >>>>> Peter N. Robinson
>> >>>>> peter.robinson@t-online.de
>> >>>>> peter.robinson@charite.de
>> >>>>> http://www.charite.de/ch/medgen/robinson/
>> >>>>>
>> >>>>>
>> >>> --
>> >>> Peter N. Robinson
>> >>> peter.robinson@t-online.de
>> >>> peter.robinson@charite.de
>> >>> http://www.charite.de/ch/medgen/robinson/
>> >>>
>> >>>
>> >>>
>> >> --
>> >> Jason Stajich
>> >> jason.stajich at duke.edu
>> >> http://www.duke.edu/~jes12/
>> > --
>> > Peter N. Robinson
>> > peter.robinson@t-online.de
>> > peter.robinson@charite.de
>> > http://www.charite.de/ch/medgen/robinson/
>> >
>> >
>> --
>> Jason Stajich
>> jason.stajich at duke.edu
>> http://www.duke.edu/~jes12/
>>
>--
>Peter N. Robinson
>peter.robinson@t-online.de
>peter.robinson@charite.de
>http://www.charite.de/ch/medgen/robinson/
>
From Peter.Robinson at t-online.de  Fri Jan  7 01:51:03 2005
From: Peter.Robinson at t-online.de (Peter Robinson)
Date: Fri Jan  7 01:47:20 2005
Subject: [Bioperl-l] Entrez Gene and bioperl-db
In-Reply-To: <Pine.GSO.4.58.0501062026170.1209@moe.usg.utk.edu>
References: <2ED9C47A-5898-11D9-AC01-000A959EB4C4@gmx.net>
	<1104792001.3186.17.camel@localhost.localdomain>
	<0F5A3AE4-5DDA-11D9-AA3C-000393C44276@duke.edu>
	<1104871954.3102.24.camel@localhost.localdomain>
	<1DA5FD5C-5E94-11D9-9C0C-000393C44276@duke.edu>
	<1105044266.3084.27.camel@localhost.localdomain>
	<Pine.GSO.4.58.0501062026170.1209@moe.usg.utk.edu>
Message-ID: <1105080663.3142.16.camel@localhost.localdomain>

Hi Stefan,
I'm happy to team up with you on Entrez Gene parsing. Since gene2unigene
has entries of the form "geneid\tunigeneid", it didn't seem worth the
trouble to put this information into a Bio::Annotation object in
isolation. On the other hand, parsing multiple Entrez Gene files at once
in order to synthesize various kinds of information about an Entrez Gene
id seemed to depart from the style of the rest of the Bio::SeqIO code.
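For comparison, collecting gene2unigene into a per-GeneID list takes only a few lines of core Perl. Each list could then be attached to the matching sequence, for example as one Bio::Annotation::DBLink per cluster. The two-column layout (GeneID, UniGene cluster) is taken from the description above. Skipping '#' lines and the function name `unigene_by_geneid` are assumptions.

```perl
use strict;
use warnings;

# GeneID => [UniGene cluster ids], from a gene2unigene stream whose lines are
# assumed to be "GeneID<TAB>UniGene_cluster", with '#' header lines skipped.
sub unigene_by_geneid {
    my ($fh) = @_;
    my %clusters;
    while (my $line = <$fh>) {
        next if $line =~ /^#/;
        chomp $line;
        my ($gene_id, $cluster) = split /\t/, $line;
        next unless defined $cluster && length $cluster;
        push @{ $clusters{$gene_id} }, $cluster;   # one gene may map to several clusters
    }
    return \%clusters;
}
```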

Suggestions/thoughts, anyone?

-peter

On Fri, 2005-01-07 at 03:33, Stefan A Kirov wrote:
> Peter,
> Why can't the UniGene data be added as a Bio::Annotation object, for
> example? Also, would you mind if I give you a hand? I am also doing some
> Entrez Gene DB parsing.
> Hilmar,
> Getting back to your post, I have some concerns about automatic
> parsing of multiple files (if I got this right...). If one downloads
> the whole Entrez Gene set and all is OK, I don't see why this can't be
> done. But if something goes wrong (and occasionally it will), it will be
> really hard for users to notice that they are missing parts of the data.
> Of course this could be handled through warnings, but what about people
> who intentionally parse only part of the DB? I guess one could add
> something like -suppress_warning=>1/0.
> Another issue that comes to mind is that the stream approach is fine for
> people who want to process the whole DB. But if you need a particular
> record, I guess you could index the files, but that is a totally
> different game. Any volunteers?
> 
> 
> On Thu, 6 Jan 2005, Peter Robinson wrote:
> 
> >Dear Bioperlers,
> >
> >I have started looking at writing some modules to parse the new Entrez
> >gene, which is kind of an expanded LocusLink. The really interesting
> >files are species specific and are in the ASN.1 format, and I am still
> >experimenting around with the best way of parsing them. To get started,
> >I am looking at the tab-delimited flat files. It seems to me that it
> >would be interesting to be able to parse gene_info and gene2accession
> >using the Bio::SeqIO system, the other files such as gene2unigene seem
> >less suited for this (the latter has just two entries which could be
> >parsed ad hoc easily enough).
> >
> >In any case, I am sending a proposed module Bio::SeqIO::geneinfo.pm as
> >well as a test script (which contains a small excerpt of gene_info in
> >the data section) for comments and criticism to the list. I am presently
> >working on another module for Bio::SeqIO::gene2accession and plan to
> >write a demo script using both modules to convert NCBI accession numbers
> >to MGI accession numbers (which is something one might want to do in
> >order to use Gene Ontology for affymetrix data, although one needs
> >additional work for probesets which are only related to ESTs).
> >
> >For the moment it seemed better to just parse in the NCBI taxon id into
> >the Bio::Species object (only this info is supplied by gene_info), and
> >expect users who need the information to use the taxonomy support of
> >other Bioperl modules in their scripts.
> >
> >I will continue to work on parsing the species specific ASN.1 files, but
> >I will be trying a combination of lex/yacc/C to do this. If that works I
> >will look into trying perl support for lex/yacc for potential use in
> >Bioperl, but since I am not sure how long this will take me, I do not
> >want to scare off anyone else who would like to give this a shot.
> >
> >best,
> >peter
> >
> >
> >On Tue, 2005-01-04 at 22:03, Jason Stajich wrote:
> >> On Jan 4, 2005, at 3:52 PM, Peter Robinson wrote:
> >>
> >> > Hi Jason,
> >> >
> >> > thanks for the advice. It seems as if the documentation of
> >> > Bio::DB::Taxonomy is a bit out of sync.
> >> >  my $db = new Bio::DB::Taxonomy(-source => 'flatfile',
> >> >                                  -nodesfile => $nodesfile,
> >> >                                  -namesfile => $namefile);
> >> > What does 'flatfile' refer to here? It is not apparent upon looking at
> >> > the code for new.
> >> >
> >> See Bio::DB::Taxonomy::flatfile for more information.  As I mentioned
> >> in the mail I sent, flatfile is for downloading the taxonomy DB from
> >> NCBI.  This lets you run it locally using an indexed (BerkeleyDB via
> >> DB_File) version of the file.
> >>
> >> You probably need the most up-to-date version of the modules - works fine
> >> for me for both the entrez and flatfile code, but you may have to
> >> upgrade off of the 1.4.0 release. Code from CVS or the bioperl-1.5 RC1
> >> code should work fine.
> >>
> >>
> >>
> >> > I had somewhat better luck using the entrez version, but I got a
> >> > pretty amusing error
> >> > message:
> >> >
> >> > MSG: can't create a species object for Homo sapiens (human) because it
> >> > isn't a species but is a '' instead
> >> >
> >> > ###
> >> > Full error and a dump of the script follow:
> >> >
> >> > my $db = new Bio::DB::Taxonomy(-source => 'entrez'); #
> >> > my $taxaid = $db->get_taxonid('Homo sapiens');
> >> > my $species = $db->get_Taxonomy_Node(-taxonid => '9606');
> >> > print Dumper($species);
> >> >
> >> > ###
> >> >
> >> > Use of uninitialized value in string eq at
> >> > /usr/local/share/perl/5.8.4/Bio/DB/Taxonomy/entrez.pm line 192.
> >> > Use of uninitialized value in sprintf at
> >> > /usr/local/share/perl/5.8.4/Bio/DB/Taxonomy/entrez.pm line 201.
> >> >
> >> > -------------------- WARNING ---------------------
> >> > MSG: can't create a species object for Homo sapiens (human) because it
> >> > isn't a species but is a '' instead
> >> > ---------------------------------------------------
> >> > Use of uninitialized value in string eq at
> >> > /usr/local/share/perl/5.8.4/Bio/DB/Taxonomy/entrez.pm line 192.
> >> > Use of uninitialized value in sprintf at
> >> > /usr/local/share/perl/5.8.4/Bio/DB/Taxonomy/entrez.pm line 201.
> >> >
> >> > -------------------- WARNING ---------------------
> >> > MSG: can't create a species object for Homo sapiens (human) because it
> >> > isn't a species but is a '' instead
> >> > ---------------------------------------------------
> >> > $VAR1 = {
> >> >           'TaxId' => '9606',
> >> >           'Division' => 'mammals',
> >> >           'GeneNumber' => '32775',
> >> >           'Rank' => 'species',
> >> >           'ProtNumber' => '247791',
> >> >           'ScientificName' => 'Homo sapiens',
> >> >           'CommonName' => 'human',
> >> >           'NucNumber' => '9025800',
> >> >           'GenNumber' => '25',
> >> >           'StructNumber' => '5638'
> >> >         };
> >> > peter@anna:~/programs/bioperlTest$
> >> >
> >> >
> >> > --best, peter
> >> >
> >> > On Mon, 2005-01-03 at 23:51, Jason Stajich wrote:
> >> >> Bio::DB::Taxonomy is the factory code - it is pretty easy to get a
> >> >> species object (or equivalent) using this code.  But you cannot (or
> >> >> could not when I wrote this, not sure of the current status) get the
> >> >> full classification from the NCBI taxonomy retrieval via cgi, i.e. you
> >> >> can only get genus and species for a taxon id and I don't know how to
> >> >> walk up the hierarchy using the web API.  Earlier emails to NCBI seemed
> >> >> to indicate this is all they intended to provide, but I'm not sure what
> >> >> the current status is.
> >> >>
> >> >>   use Bio::DB::Taxonomy;
> >> >>
> >> >>   my $db = new Bio::DB::Taxonomy(-source => 'entrez'); # use NCBI Entrez over HTTP
> >> >>   my $taxaid = $db->get_taxonid('Homo sapiens');
> >> >>   my $taxonnode = $db->get_Taxonomy_Node(-taxonid => '9606');
> >> >>
> >> >> You can get the full classification if you use the
> >> >> Bio::DB::Taxonomy::flatfile factory, which requires you to have
> >> >> downloaded the taxonomy db flatfile from NCBI.  Since this is more
> >> >> reliable (and faster) it is what I have tended to use for grouping sets
> >> >> of seqDB search results, etc.
> >> >>
> >> >> -jason
> >> >> On Jan 3, 2005, at 5:40 PM, Peter Robinson wrote:
> >> >>
> >> >>> Hi Bioperlers, hi Hilmar,
> >> >>>
> >> >>> after some thinking I have embarked on a lex/yacc parser for the Entrez
> >> >>> Gene ASN.1 format as the way of least resistance, although I am not sure
> >> >>> how that would fit into BioPerl. If anyone is interested in this (or
> >> >>> has a better idea of how to go about it..), please drop me a line.
> >> >>>
> >> >>> In the meantime I have been looking at writing code to parse some of the
> >> >>> "easy" Entrez gene documents, starting off with gene_info. This file
> >> >>> includes the NCBI taxon id for each entry. I would like to convert this
> >> >>> to a Bio::Species object to pass to the following
> >> >>> 	my $seq = $self->sequence_factory->create(
> >> >>> 			     -verbose => $self->verbose(),
> >> >>> 			     -accession_number => $geneID,
> >> >>> 			     -desc => $description,
> >> >>> 			     -display_id => $symbol,
> >> >>> 			     -species =>  ???
> >> >>> 			     -annotation => $ann);
> >> >>>
> >> >>> and saw the Bio::Taxonomy::FactoryI code, which appears to want to do
> >> >>> this sort of thing. However, the code for that is pretty preliminary. Is
> >> >>> anyone working on this at the moment? Or is there a better way of doing
> >> >>> this (it seems a shame not to provide the actual species name if one has
> >> >>> the taxid...)
> >> >>>
> >> >>> best
> >> >>>
> >> >>> Peter
> >> >>>
> >> >>>
> >> >>>
> >> >>> On Tue, 2004-12-28 at 07:17, Hilmar Lapp wrote:
> >> >>>> Great to hear that someone is giving this a shot. Yes, at this point it
> >> >>>> appears that NCBI is only offering the ASN.1, not a conversion to XML.
> >> >>>> Their asn2xml tool will not work with this ASN.1 format either; I just
> >> >>>> checked it to be sure. They do seem to be mulling the option of XML
> >> >>>> though on the Gene FAQ. Maybe if enough people get in their ears they
> >> >>>> will spend some effort towards that. After all, the Entrez Gene web
> >> >>>> interface can display XML on demand, even though it looks fairly
> >> >>>> hideous.
> >> >>>>
> >> >>>> There is no ASN.1 support in bioperl at all. Also, ASN.1 support in
> >> >>>> perl is actually thin - there is Convert::ASN1 at version 0.18 two
> >> >>>> years ago that I could find ... doesn't make me feel warm and fuzzy.
> >> >>>>
> >> >>>> In the absence of any XML available from NCBI, gene_info might be the
> >> >>>> best start. An option could be to check for the presence of the other
> >> >>>> tab-delimited files and use those that are present. These are
> >> >>>> tab-delimited and hence the format itself is trivial, so you can focus
> >> >>>> entirely on setting up a Bio::Seq plus annotation that's
> >> >>>> comparable/compatible to what the current SeqIO::locuslink does.
> >> >>>>
> >> >>>> My $0.02 (worth less and less almost every day).
> >> >>>>
> >> >>>> 	-hilmar
> >> >>>>
> >> >>>> On Thursday, December 23, 2004, at 10:51  AM, Peter Robinson wrote:
> >> >>>>
> >> >>>>> Hi,
> >> >>>>>
> >> >>>>> I have been thinking about giving a BioPerl EntrezGene parser a try
> >> >>>>> since I have been a heavy user of LocusLink to date. One issue is that
> >> >>>>> the files that correspond to LL_tmpl (which was a flat file) are now in
> >> >>>>> ASN.1 format:
> >> >>>>> http://www.ncbi.nlm.nih.gov/entrez/query/static/help/genehelp.html#query
> >> >>>>> Although I saw some mention of ASN support in Bioperl by googling, I
> >> >>>>> can't seem to find any module that does this in the present
> >> >>>>> distribution. What is the status on that? In any case, I will be working
> >> >>>>> on this in the next month or two and if anything nice comes of it I will
> >> >>>>> send it to you / BioPerl.
> >> >>>>>
> >> >>>>> best wishes & happy holidays
> >> >>>>>
> >> >>>>> Peter
> >> >>>>>
> >> >>>>> On Tue, 2004-12-14 at 09:00, Hilmar Lapp wrote:
> >> >>>>>> Since load_seqdatabase.pl will use bioperl's SeqIO parsers for parsing
> >> >>>>>> any input file, what you're asking is whether or not there is a SeqIO
> >> >>>>>> parser for NCBI Gene.
> >> >>>>>>
> >> >>>>>> The answer to that question is no, not yet. Anybody who feels motivated
> >> >>>>>> is welcome to give it a try ... Since I'll need it, I'll write the
> >> >>>>>> parser if nobody else does within the next 3 months, but I'm not going
> >> >>>>>> to promise when exactly this will happen.
> >> >>>>>>
> >> >>>>>> 	-hilmar
> >> >>>>>>
> >> >>>>>> On Monday, December 13, 2004, at 08:03  AM, Law, Annie wrote:
> >> >>>>>>
> >> >>>>>>> Hi,
> >> >>>>>>>
> >> >>>>>>> I was wondering, with regard to bioperl-db (the scripts, the schema and
> >> >>>>>>> load_seqdatabase.pl), whether there has been any preparation for
> >> >>>>>>> integrating Entrez Gene information when LocusLink is phased out. Or,
> >> >>>>>>> if it has already been changed, could somebody point me to the
> >> >>>>>>> documentation or changed code?
> >> >>>>>>>
> >> >>>>>>> Thanks,
> >> >>>>>>> Annie.
> >> >>>>>>> _______________________________________________
> >> >>>>>>> Bioperl-l mailing list
> >> >>>>>>> Bioperl-l@portal.open-bio.org
> >> >>>>>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
> >> >>>>>>>
> >> >>>>>>>
> >> >>>>> --
> >> >>>>> Peter N. Robinson
> >> >>>>> peter.robinson@t-online.de
> >> >>>>> peter.robinson@charite.de
> >> >>>>> http://www.charite.de/ch/medgen/robinson/
> >> >>>>>
> >> >>>>>
> >> >>> --
> >> >>> Peter N. Robinson
> >> >>> peter.robinson@t-online.de
> >> >>> peter.robinson@charite.de
> >> >>> http://www.charite.de/ch/medgen/robinson/
> >> >>>
> >> >>>
> >> >>>
> >> >> --
> >> >> Jason Stajich
> >> >> jason.stajich at duke.edu
> >> >> http://www.duke.edu/~jes12/
> >> > --
> >> > Peter N. Robinson
> >> > peter.robinson@t-online.de
> >> > peter.robinson@charite.de
> >> > http://www.charite.de/ch/medgen/robinson/
> >> >
> >> >
> >> --
> >> Jason Stajich
> >> jason.stajich at duke.edu
> >> http://www.duke.edu/~jes12/
> >>
> >--
> >Peter N. Robinson
> >peter.robinson@t-online.de
> >peter.robinson@charite.de
> >http://www.charite.de/ch/medgen/robinson/
> >
-- 
Peter N. Robinson
peter.robinson@t-online.de
peter.robinson@charite.de
http://www.charite.de/ch/medgen/robinson/
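Jason's point above, that the flatfile factory can give the full classification, follows from the layout of NCBI's taxonomy dump: nodes.dmp links every tax_id to its parent, so a lineage is just a walk up to the root. The sketch below is plain Perl, not the Bio::DB::Taxonomy::flatfile code; it assumes the "\t|\t"-separated nodes.dmp and names.dmp layout, the helper names (split_dmp, load_taxonomy, lineage) are hypothetical, and the sample lines are a toy excerpt with intermediate ancestors omitted.

```perl
use strict;
use warnings;

# Split one line of an NCBI taxonomy dump (nodes.dmp / names.dmp).
# Assumed format: fields separated by "\t|\t", each line ending in "\t|".
sub split_dmp {
    my ($line) = @_;
    chomp $line;
    $line =~ s/\t\|$//;
    return split /\t\|\t/, $line;
}

# nodes.dmp gives tax_id -> parent tax_id and rank;
# names.dmp gives tax_id -> name, of which only the scientific name is kept.
sub load_taxonomy {
    my ($nodes_lines, $names_lines) = @_;
    my (%parent, %rank, %name);
    for my $line (@$nodes_lines) {
        my ($id, $parent_id, $rank) = split_dmp($line);
        $parent{$id} = $parent_id;
        $rank{$id}   = $rank;
    }
    for my $line (@$names_lines) {
        my ($id, $name_txt, undef, $class) = split_dmp($line);
        $name{$id} = $name_txt if $class eq 'scientific name';
    }
    return (\%parent, \%rank, \%name);
}

# Follow parent links from a tax_id up to the root (tax_id 1, which is its
# own parent) and return the names, most general first.
sub lineage {
    my ($id, $parent, $name) = @_;
    my (@path, %seen);
    while (defined $id && !$seen{$id}++) {
        unshift @path, $name->{$id} // $id;
        last if $id eq '1' || !exists $parent->{$id};
        $id = $parent->{$id};
    }
    return @path;
}

# Toy excerpt: intermediate ancestors (e.g. the subfamily between Hominidae
# and Homo) are omitted, so 9604 points straight at the root here.
my @nodes = (
    "1\t|\t1\t|\tno rank\t|",
    "9604\t|\t1\t|\tfamily\t|",
    "9605\t|\t9604\t|\tgenus\t|",
    "9606\t|\t9605\t|\tspecies\t|",
);
my @names = (
    "1\t|\troot\t|\t\t|\tscientific name\t|",
    "9604\t|\tHominidae\t|\t\t|\tscientific name\t|",
    "9605\t|\tHomo\t|\t\t|\tscientific name\t|",
    "9606\t|\tHomo sapiens\t|\t\t|\tscientific name\t|",
    "9606\t|\thuman\t|\t\t|\tgenbank common name\t|",
);

my ($parent, $rank, $name) = load_taxonomy(\@nodes, \@names);
print join(' > ', lineage('9606', $parent, $name)), "\n";
print "$name->{9606} has rank: $rank->{9606}\n";
```

Against the real dump files the same walk works unchanged, but the hashes become large, which is one reason to keep them in an on-disk index (the flatfile module uses DB_File for this, according to Jason).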

From michael.watson at bbsrc.ac.uk  Fri Jan  7 05:11:13 2005
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Fri Jan  7 05:11:05 2005
Subject: [Bioperl-l] Error parsing blast results with blasttable
Message-ID: <8975119BCD0AC5419D61A9CF1A923E95E89AA5@iahce2knas1.iah.bbsrc.reserved>

Hi

I have looked through the archives and this problem did come up once
before, but without resolution (as far as I can see).  I'm using
bioperl-1.4 and NCBI BLAST with the -m 8 option, using SearchIO and the
blasttable format.

What I see is this:

------------- EXCEPTION  -------------
MSG: Undefined sub-sequence (3264,3268). Valid range = 3252 - 3268
STACK Bio::Search::HSP::HSPI::matches
/usr/local/bioperl-1.4/Bio/Search/HSP/HSPI.pm:711
STACK (eval) /usr/local/bioperl-1.4/Bio/Search/SearchUtils.pm:365
STACK Bio::Search::SearchUtils::_adjust_contigs
/usr/local/bioperl-1.4/Bio/Search/SearchUtils.pm:364
STACK Bio::Search::SearchUtils::tile_hsps
/usr/local/bioperl-1.4/Bio/Search/SearchUtils.pm:170
STACK Bio::Search::Hit::GenericHit::start
/usr/local/bioperl-1.4/Bio/Search/Hit/GenericHit.pm:899
STACK main::parse_blast ../split_and_blast.pl:65
STACK toplevel ../split_and_blast.pl:32

--------------------------------------

The code is as you would expect:

while (my $result = $searchio->next_result) {
        while (my $hit = $result->next_hit) {
                my $start = $hit->start;
        }
}

And it is that call to $hit->start that sets off the whole trace.

Any ideas?

Thanks
Mick

Michael Watson
Head of Informatics
Institute for Animal Health,
Compton Laboratory,
Compton,
Newbury,
Berkshire RG20 7NN
UK

Phone : +44 (0)1635 578411 ext. 2535
Mobile: +44 (0)7990 827831
E-mail: michael.watson@bbsrc.ac.uk
  

From Marc.Logghe at devgen.com  Fri Jan  7 05:41:12 2005
From: Marc.Logghe at devgen.com (Marc Logghe)
Date: Fri Jan  7 05:38:02 2005
Subject: [Bioperl-l] Error parsing blast results with blasttable
Message-ID: <BEE28BF86078B6429D6C780635718E219050A3@morelia.be.devgen.com>

> while (my $result = $searchio->next_result) {
>         while(my $hit = $result->next_hit) {
>                 my $start  = $hit->start;
> 
> And it is that call to $hit->start that sets off the whole trace.
> 
> Any ideas?


Hi Mick,
Have you tried one of these?

my $start = $hit->start('sbjct');  # or 'query' or 'hit'. The latter is the same as 'sbjct'

or 

my $start = $hit->hsp->start('sbjct');


I think in all cases it defaults to 'query'. So it should not crash but give you the start position of the query.
I am afraid I can't explain the crash, sorry.

Marc

From michael.watson at bbsrc.ac.uk  Fri Jan  7 05:50:08 2005
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Fri Jan  7 05:49:00 2005
Subject: [Bioperl-l] Error parsing blast results with blasttable
Message-ID: <8975119BCD0AC5419D61A9CF1A923E950121B91C@iahce2knas1.iah.bbsrc.reserved>

Hi

Having done some more tests with this:

$hit->start()

actually returns a string which is the concatenation of query start and
subject end!  (btw I'm using the "-m 8" option) - surely this isn't the
desired behaviour????

If I change it to:

$hit->start('query')

Then I get the correct start back, but I still get the stack trace
error.

The two co-ordinate sets which cause the problem (3264-3268 and
3252-3268) are on adjacent lines in the file (3252-3268 is the next line
after 3264-3268) and are to the SAME subject, ie they are two HSPs of
the same hit (in theory) but they are to two VERY different parts of the
query.

I'm guessing the way blasttable handles multiple HSPs is causing the
trouble.
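Since the stack trace shows GenericHit::start calling SearchUtils::tile_hsps, one possible workaround while the bug is chased down is to read the -m 8 table directly and take each hit's extent from its HSPs without any tiling. A minimal plain-Perl sketch, assuming NCBI's 12-column -m 8 layout; hit_spans is a hypothetical helper, not part of BioPerl, and the input lines are made up.

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Group tabular BLAST (-m 8) lines by query and subject id and report the
# outer coordinates of each hit's HSPs on both sequences. Assumed columns:
# qid sid %identity length mismatches gapopens qstart qend sstart send evalue bitscore
sub hit_spans {
    my @lines = @_;
    my (%hit, @order);
    for my $line (@lines) {
        next if $line =~ /^\s*(?:#|$)/;          # skip comments and blank lines
        chomp $line;
        my ($q, $s, $qs, $qe, $ss, $se) = (split /\t/, $line)[0, 1, 6 .. 9];
        my $key = "$q\t$s";
        push @order, $key unless exists $hit{$key};
        my $h = $hit{$key} ||= { query => $q, hit => $s, hsps => 0 };
        $h->{hsps}++;
        # min/max cope with minus-strand HSPs, where start > end
        $h->{qstart} = min(grep { defined } $h->{qstart}, $qs, $qe);
        $h->{qend}   = max(grep { defined } $h->{qend},   $qs, $qe);
        $h->{hstart} = min(grep { defined } $h->{hstart}, $ss, $se);
        $h->{hend}   = max(grep { defined } $h->{hend},   $ss, $se);
    }
    return map { $hit{$_} } @order;
}

# Made-up example: two HSPs of query1 against subjA, far apart on the query.
my @m8 = (
    "query1\tsubjA\t98.00\t50\t1\t0\t1\t50\t101\t150\t1e-20\t95.0\n",
    "query1\tsubjA\t95.00\t40\t2\t0\t900\t939\t151\t190\t1e-12\t70.0\n",
    "query1\tsubjB\t90.00\t30\t3\t0\t10\t39\t60\t31\t1e-05\t40.0\n",
);
for my $h (hit_spans(@m8)) {
    printf "%s vs %s: %d HSP(s), query %d-%d, hit %d-%d\n",
        @$h{qw(query hit hsps qstart qend hstart hend)};
}
```

Unlike tile_hsps, this neither merges overlapping HSPs nor computes coverage; it only reports the outermost coordinates, which may be all that a start position is needed for here.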

Mick


From nathanhaigh at ukonline.co.uk  Fri Jan  7 06:39:16 2005
From: nathanhaigh at ukonline.co.uk (Nathan Haigh)
Date: Fri Jan  7 06:37:42 2005
Subject: [Bioperl-l] RE: SeqIO fails on masked sequences
In-Reply-To: <GAEDKMGOKFBLJPKCLKCCIEPBEGAA.brian_osborne@cognia.com>
Message-ID: <!~!UENERkVCMDkAAQACAAAAAAAAAAAAAAAAABgAAAAAAAAA+EsXQZcrCEGeBpZF7/IE7sKAAAAQAAAAsv4vITWl4EipjvWFr5eI0QEAAAAA@ukonline.co.uk>

There appears to be an anomaly with Bio::SeqIO::fasta. If the SeqIO object's alphabet is set, next_seq() nevertheless leaves the
sequence's alphabet undef and then proceeds to guess the alphabet again, so code like the following does not work:

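use Bio::SeqIO;
my $seq_in = Bio::SeqIO->new(-format => $format, -fh => \*DATA);
$seq_in->alphabet('protein');
while (my $seq = $seq_in->next_seq) {
    print $seq->alphabet, "\n";    # expected 'protein', but the alphabet is guessed again
}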

Should setting the SeqIO object's alphabet be honoured even if it is set to the wrong type or the sequences are not of that
alphabet?

 
I have a bug fix that allows you to set the alphabet through the SeqIO object, but it doesn't do any sort of checking to see if all
the seqs in the object are of the correct type. Essentially, the alphabet is set in one of the following ways:

1) If the SeqIO object's alphabet is set, e.g. with $seq_in->alphabet('dna'), all the seqs that belong to the $seq_in object obtain
their alphabet from the SeqIO object, dna in this case, irrespective of whether or not they are actually protein.

2) If alphabet has not been set in this way, the first sequence is used to guess the alphabet of the SeqIO object, from which all
the sequences obtain their alphabet.
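The two rules above amount to a simple precedence. A minimal sketch in plain Perl, working on raw strings rather than Bio::Seq objects; assign_alphabets and the naive guesser are hypothetical illustrations, not the patched Bio::SeqIO code.

```perl
use strict;
use warnings;

# Proposed precedence: an alphabet set on the stream wins; otherwise the
# first sequence is guessed once and every later sequence inherits that
# guess. No per-sequence check is made.
sub assign_alphabets {
    my ($stream_alphabet, $guess, @seqs) = @_;
    my $alphabet = $stream_alphabet;
    my @result;
    for my $seq (@seqs) {
        $alphabet //= $guess->($seq);    # runs only while still undefined
        push @result, $alphabet;
    }
    return @result;
}

# Deliberately naive stand-in guesser, for illustration only.
my $naive_guess = sub { $_[0] =~ /^[ACGTN]+$/i ? 'dna' : 'protein' };

my @seqs = ('ACGTACGT', 'MKVLAAGW');
print join(',', assign_alphabets(undef,     $naive_guess, @seqs)), "\n";
print join(',', assign_alphabets('protein', $naive_guess, @seqs)), "\n";
```

In the first call the second, protein-like sequence is labelled dna because it inherits the guess made for the first one, which is the limitation listed below.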

 
Possible limitations:

1) All seqs in the SeqIO object must be of the same type; no test is done to detect when this is not the case.

 
Does this sound ok and reasonable?

Nathan

 
-----Original Message-----
From: Brian Osborne [mailto:brian_osborne@cognia.com] 
Sent: 06 January 2005 12:25
To: nathanhaigh@ukonline.co.uk
Subject: RE: SeqIO fails on masked sequences

 
Nathan,

 
The idea is that a sequence with a high proportion of X is more likely to be DNA than protein. The examples I had in mind are
unfinished genomic sequence, and there are countless entries in Genbank/EMBL like this. So, someone wrote in and said that their
genomic sequence was being characterized as protein since the fraction of [gatc] was less than 85%; it was mostly X. By contrast, there
are no protein sequences with X in them in these public databases, if I'm not mistaken. So I maintain that in the world of public
databases this is the way to go.
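The trade-off described here can be seen in a toy version of the rule. This is a simplified sketch, not BioPerl's _guess_alphabet; the set of ignored characters and the use of A/C/G/T only are assumptions, while the 85% threshold and the decision to ignore X come from the discussion.

```perl
use strict;
use warnings;

# Simplified alphabet guess (not BioPerl's actual _guess_alphabet code):
# drop characters that carry no signal, including X, then call the sequence
# DNA if at least 85% of what is left is A, C, G or T.
sub guess_alphabet {
    my ($str) = @_;
    (my $clean = uc $str) =~ s/[-.?X]//g;    # assumed set of ignored characters
    return undef unless length $clean;       # nothing left, e.g. fully masked
    my $acgt = ($clean =~ tr/ACGT//);        # count A/C/G/T without modifying
    return $acgt / length($clean) >= 0.85 ? 'dna' : 'protein';
}

for my $s ('ACGTACGTXXXXXXXX', 'MKVLLAG', 'XXXXXXXX') {
    printf "%-18s %s\n", $s, guess_alphabet($s) // 'undetermined';
}
```

Without X in the ignored set, the first example would score 8 of 16 characters (50%) as A/C/G/T and be called protein, which is the genomic case described above; with it, a fully masked sequence is left with nothing to score at all.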

 
Now if you venture into the world of sequence analysis it's going to be a different story, since you'll likely mask protein with X,
not N, obviously. May I ask, if this person knows his/her sequence is protein then why doesn't s/he set its alphabet to "protein"?
Or why don't they mask with A or Z or O or something?

 
There'll be problems either way. What is one's reference? Public sequences or the less well-defined set of possible sequences?

 
Brian O.

-----Original Message-----
From: Nathan Haigh [mailto:nathanhaigh@ukonline.co.uk]
Sent: Wednesday, January 05, 2005 7:38 PM
To: 'Brian Osborne'
Subject: FW: SeqIO fails on masked sequences

You committed a change to Bio::PrimarySeq where 'X' was added to the class of characters that are stripped out of sequences in the
_guess_alphabet subroutine. Do you know why sequences containing X were causing a problem, and why X was added to the class of
chars?

 
It's causing a problem for someone who has a sequence that contains all masked chars (i.e. all X's), which should still be
"guessable" as protein.

 
Cheers

Nathan



From sdavis2 at mail.nih.gov  Fri Jan  7 06:41:49 2005
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Fri Jan  7 06:38:47 2005
Subject: [Bioperl-l] Entrez Gene and bioperl-db
In-Reply-To: <1105080663.3142.16.camel@localhost.localdomain>
References: <2ED9C47A-5898-11D9-AC01-000A959EB4C4@gmx.net>
	<1104792001.3186.17.camel@localhost.localdomain>
	<0F5A3AE4-5DDA-11D9-AA3C-000393C44276@duke.edu>
	<1104871954.3102.24.camel@localhost.localdomain>
	<1DA5FD5C-5E94-11D9-9C0C-000393C44276@duke.edu>
	<1105044266.3084.27.camel@localhost.localdomain>
	<Pine.GSO.4.58.0501062026170.1209@moe.usg.utk.edu>
	<1105080663.3142.16.camel@localhost.localdomain>
Message-ID: <1E370C03-60A1-11D9-AD91-000D933565E8@mail.nih.gov>

I think the power of bioperl is in dealing with entire Entrez Gene 
objects.  For dealing with gene_info, gene2unigene, or generifs files 
in isolation, I'm not sure that an object model is necessary or 
efficient.  However, as many of us do want to deal with Gene objects, I 
think that having a parser that constructs these rich objects is 
important.  That said, I think there may be a NEED for two parsers, one for
species-specific ASN.1 files and one for the tab-delimited files.  The 
ASN.1 parser fits the SeqIO model rather well, I would suppose, but is 
limited by the fact that each species must be downloaded and parsed 
separately.  However, for the vast majority of folks dealing with only 
one or two species, the ease of downloading a single, self-contained 
file for a species or two of interest and passing the file through an 
ASN.1 Gene parser is quite appealing.  Then, for the comparative 
genomicists or those with a need for more than a few species, the 
tab-delimited option could be made available for parsing the text 
files.  Despite my second sentence above, I agree with Stefan that 
having a parser that deals with each text file in isolation (with the 
only required file being gene_info) is quite appealing, allowing the 
user to have a way to choose what files to parse and add to the object. 
(This is only important because of the number of Gene records and
needing to complete the parse/object construction in a reasonable 
amount of time.)
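A parser along these lines, with gene_info as the only required file and the other tab-delimited files merged in when the user supplies them, could look roughly like the following plain-Perl sketch; Stefan's -suppress_warning idea appears as an option. load_genes is a hypothetical helper that builds plain hashes rather than Bio::Seq objects, the gene_info column positions are an assumption, and the records are made up.

```perl
use strict;
use warnings;

# Merge gene_info (required) with gene2unigene (optional) into one record
# per GeneID. Assumed gene_info columns: tax_id, GeneID, Symbol, ... with the
# description in column 9; gene2unigene lines assumed to be "GeneID<TAB>cluster".
# Each source may be a file name or a reference to a string.
sub load_genes {
    my (%args) = @_;
    my %gene;

    open my $info, '<', $args{gene_info} or die "cannot read gene_info: $!";
    while (my $line = <$info>) {
        next if $line =~ /^#/;                     # header line
        chomp $line;
        my @f = split /\t/, $line;
        $gene{$f[1]} = { tax_id => $f[0], symbol  => $f[2],
                         desc   => $f[8], unigene => [] };
    }
    close $info;

    if (my $file = $args{gene2unigene}) {
        if (open my $ug, '<', $file) {
            while (my $line = <$ug>) {
                next if $line =~ /^#/;
                chomp $line;
                my ($id, $cluster) = split /\t/, $line;
                push @{ $gene{$id}{unigene} }, $cluster if $gene{$id};
            }
            close $ug;
        }
        elsif (!$args{suppress_warning}) {
            # tell the user that this part of the data is missing
            warn "gene2unigene not read ($file): $!\n";
        }
    }
    return \%gene;
}

# Made-up records in the assumed layout.
my $gene_info = "#header\n"
    . "1234\t11\tabcA\t-\t-\t-\t1\t-\tmade-up gene A\n"
    . "1234\t12\tabcB\t-\t-\t-\t1\t-\tmade-up gene B\n";
my $g2u = "11\tXx.1\n11\tXx.2\n";

my $genes = load_genes(gene_info => \$gene_info, gene2unigene => \$g2u);
for my $id (sort { $a <=> $b } keys %$genes) {
    my $g = $genes->{$id};
    printf "%s %s (taxon %s): %s [%s]\n", $id, $g->{symbol}, $g->{tax_id},
        $g->{desc}, join(',', @{ $g->{unigene} });
}
```

Keeping the optional file optional, but warning unless told otherwise when it cannot be read, addresses the concern about users silently missing part of the data.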

I know that having two parsers is not ideal (and that suggesting this 
is a bit of a cop-out), but NCBI has chosen a path that may necessitate 
both solutions to meet the needs of all users.  I would also certainly 
be willing to help out.

Sean

On Jan 7, 2005, at 1:51 AM, Peter Robinson wrote:

> Hi Stefan,
> happy to team up with you for Entrez Gene parsing. Since gene2unigene
> has entries of the form "geneid\tunigeneid", it didn't seem worth the
> trouble putting this information into a Bio::Annotation object in
> isolation. On the other hand, parsing multiple Entrez Gene files at once
> in order to synthesize various forms of information about an Entrez Gene
> id seemed to depart from the style of the rest of Bio::SeqIO code.
>
> Suggestions/thoughts, anyone?
>
> -peter
>
> On Fri, 2005-01-07 at 03:33, Stefan A Kirov wrote:
>> Peter,
>> Why can't UniGene be added as a Bio::Annotation object, for example?
>> Peter,
>> would you mind if I give you a hand, as I am also doing some Entrez Gene
>> DB parsing.
>> Hilmar,
>> Getting back to your post, I have some concern about automatic
>> parsing of multiple files (if I got this right...). Say one downloads
>> the whole Entrez Gene set and all is OK: I don't see why this can't be
>> done. But if something goes wrong (and occasionally it will), it will be
>> really hard for the user to realize that parts of the data are missing. Of
>> course this could be done through warnings, but what about people who
>> intentionally parse part of the DB? I guess one can add something like
>> -suppress_warning=>1/0.
>> Another issue that comes to mind: the stream approach is fine for
>> people interested in the whole DB. But if you need a particular record,
>> I guess you could index the files, but this is a totally different game.
>> Any volunteers?
>>
>>
>> On Thu, 6 Jan 2005, Peter Robinson wrote:
>>
>>> Dear Bioperlers,
>>>
>>> I have started looking at writing some modules to parse the new Entrez
>>> Gene, which is kind of an expanded LocusLink. The really interesting
>>> files are species-specific and are in the ASN.1 format, and I am still
>>> experimenting with the best way of parsing them. To get started,
>>> I am looking at the tab-delimited flat files. It seems to me that it
>>> would be interesting to be able to parse gene_info and gene2accession
>>> using the Bio::SeqIO system; the other files, such as gene2unigene, seem
>>> less suited for this (the latter has just two entries which could be
>>> parsed ad hoc easily enough).
>>>
>>> In any case, I am sending a proposed module Bio::SeqIO::geneinfo.pm as
>>> well as a test script (which contains a small excerpt of gene_info in
>>> the data section) for comments and criticism to the list. I am presently
>>> working on another module for Bio::SeqIO::gene2accession and plan to
>>> write a demo script using both modules to convert NCBI accession numbers
>>> to MGI accession numbers (which is something one might want to do in
>>> order to use Gene Ontology for Affymetrix data, although one needs
>>> additional work for probesets which are only related to ESTs).
>>>
>>> For the moment it seemed better to just parse the NCBI taxon id into
>>> the Bio::Species object (only this info is supplied by gene_info), and
>>> expect users who need the information to use the taxonomy support of
>>> other Bioperl modules in their scripts.
>>>
>>> I will continue to work on parsing the species-specific ASN.1 files, but
>>> I will be trying a combination of lex/yacc/C to do this. If that works I
>>> will look into trying perl support for lex/yacc for potential use in
>>> Bioperl, but since I am not sure how long this will take me, I do not
>>> want to scare off anyone else who would like to give this a shot.
>>>
>>> best,
>>> peter
>>>
>>>
>>> On Tue, 2005-01-04 at 22:03, Jason Stajich wrote:
>>>> On Jan 4, 2005, at 3:52 PM, Peter Robinson wrote:
>>>>
>>>>> Hi Jason,
>>>>>
>>>>> thanks for the advice. It seems as if the documentation of
>>>>> Bio::DB::Taxonomy is a bit out of sync.
>>>>>  my $db = new Bio::DB::Taxonomy(-source    => 'flatfile',
>>>>>                                 -nodesfile => $nodesfile,
>>>>>                                 -namesfile => $namefile);
>>>>> What does 'flatfile' refer to here? It is not apparent upon looking at
>>>>> the code for new.
>>>>>
>>>> See Bio::DB::Taxonomy::flatfile for more information.  As I mentioned
>>>> in the mail I sent, flatfile is for use with the taxonomy DB downloaded
>>>> from NCBI.  This lets you run it locally using an indexed (BerkeleyDB via
>>>> DB_File) version of the file.
>>>>
>>>> You probably need the most up-to-date version of the modules - it works
>>>> fine for me for both the entrez and flatfile code, but you may have to
>>>> upgrade off of the 1.4.0 release. Code from CVS or the bioperl-1.5 RC1
>>>> code should work fine.
>>>>
>>>>
>>>>
>>>>> I had somewhat better luck using the entrez version, but I got a
>>>>> pretty amusing error
>>>>> message:
>>>>>
>>>>> MSG: can't create a species object for Homo sapiens (human) 
>>>>> because it
>>>>> isn't a species but is a '' instead
>>>>>
>>>>> ###
>>>>> Full error and a dump of the script follow:
>>>>>
>>>>> my $db = new Bio::DB::Taxonomy(-source => 'entrez'); #
>>>>> my $taxaid = $db->get_taxonid('Homo sapiens');
>>>>> my $species = $db->get_Taxonomy_Node(-taxonid => '9606');
>>>>> print Dumper($species);
>>>>>
>>>>> ###
>>>>>
>>>>> Use of uninitialized value in string eq at
>>>>> /usr/local/share/perl/5.8.4/Bio/DB/Taxonomy/entrez.pm line 192.
>>>>> Use of uninitialized value in sprintf at
>>>>> /usr/local/share/perl/5.8.4/Bio/DB/Taxonomy/entrez.pm line 201.
>>>>>
>>>>> -------------------- WARNING ---------------------
>>>>> MSG: can't create a species object for Homo sapiens (human) 
>>>>> because it
>>>>> isn't a species but is a '' instead
>>>>> ---------------------------------------------------
>>>>> Use of uninitialized value in string eq at
>>>>> /usr/local/share/perl/5.8.4/Bio/DB/Taxonomy/entrez.pm line 192.
>>>>> Use of uninitialized value in sprintf at
>>>>> /usr/local/share/perl/5.8.4/Bio/DB/Taxonomy/entrez.pm line 201.
>>>>>
>>>>> -------------------- WARNING ---------------------
>>>>> MSG: can't create a species object for Homo sapiens (human) 
>>>>> because it
>>>>> isn't a species but is a '' instead
>>>>> ---------------------------------------------------
>>>>> $VAR1 = {
>>>>>           'TaxId' => '9606',
>>>>>           'Division' => 'mammals',
>>>>>           'GeneNumber' => '32775',
>>>>>           'Rank' => 'species',
>>>>>           'ProtNumber' => '247791',
>>>>>           'ScientificName' => 'Homo sapiens',
>>>>>           'CommonName' => 'human',
>>>>>           'NucNumber' => '9025800',
>>>>>           'GenNumber' => '25',
>>>>>           'StructNumber' => '5638'
>>>>>         };
>>>>> peter@anna:~/programs/bioperlTest$
>>>>>
>>>>>
>>>>> --best, peter
>>>>>
>>>>> On Mon, 2005-01-03 at 23:51, Jason Stajich wrote:
>>>>>> Bio::DB::Taxonomy is the factory code - it is pretty easy to get a
>>>>>> species object (or equivalent) using this code.  But you cannot 
>>>>>> (or
>>>>>> could not when I wrote this, not sure of the current status) get 
>>>>>> the
>>>>>> full classification from the NCBI taxonomy retrieval via cgi.  
>>>>>> i.e.
>>>>>> you
>>>>>> can only get genus and species for a taxon id and I don't know 
>>>>>> how to
>>>>>> walk up the hierarchy using the web API.  Earlier emails to NCBI
>>>>>> seemed
>>>>>> to indicate this is all they intended to provide, but not sure 
>>>>>> what
>>>>>> the
>>>>>> current status is.
>>>>>>
>>>>>>   my $db = new Bio::DB::Taxonomy(-source => 'entrez'); # use NCBI
>>>>>> Entrez
>>>>>> over HTTP
>>>>>>    my $taxaid = $db->get_taxonid('Homo sapiens');
>>>>>>    my $taxonnode = $db->get_Taxonomy_Node(-taxonid => '9606');
>>>>>>
>>>>>> You can get the full classification if you use the
>>>>>> Bio::DB::Taxonomy::flatfile factory which requires you to have
>>>>>> downloaded the taxonomy db flatfile from NCBI.  Since this is more
>>>>>> reliable (and faster) it is what I have tended to use for grouping
>>>>>> sets
>>>>>> of seqDB search results, etc.
>>>>>>
>>>>>> -jason
>>>>>> On Jan 3, 2005, at 5:40 PM, Peter Robinson wrote:
>>>>>>
>>>>>>> Hi Bioperlers, hi Hilmar,
>>>>>>>
>>>>>>> after some thinking I have embarked on a lex/yacc parser for the
>>>>>>> Entrez
>>>>>>> Gene ASN.1 format as the way of least resistance, although I am 
>>>>>>> not
>>>>>>> sure
>>>>>>> how that would fit in to BioPerl. If anyone is interested in 
>>>>>>> this (or
>>>>>>> has a better idea of how to go about it..), please drop me a 
>>>>>>> line.
>>>>>>>
>>>>>>> In the meantime I have been looking at writing code to parse 
>>>>>>> some of
>>>>>>> the
>>>>>>> "easy" Entrez gene documents, starting off with gene_info. This 
>>>>>>> file
>>>>>>> includes the NCBI taxon id for each entry. I would like to 
>>>>>>> convert
>>>>>>> this
>>>>>>> to a Bio::Species object to pass to the following
>>>>>>> 	my $seq = $self->sequence_factory->create(
>>>>>>> 			     -verbose => $self->verbose(),
>>>>>>> 			     -accession_number => $geneID,
>>>>>>> 			     -desc => $description,
>>>>>>> 			     -display_id => $symbol,
>>>>>>> 			     -species =>  ???
>>>>>>> 			     -annotation => $ann);
>>>>>>>
>>>>>>> and saw the Bio::Taxonomy::FactoryI code, which appears to want 
>>>>>>> to do
>>>>>>> this sort of thing. However, the code for that is pretty 
>>>>>>> preliminary.
>>>>>>> Is
>>>>>>> anyone working on this at the moment? Or is there a better way of
>>>>>>> doing
>>>>>>> this (it seems a shame not to provide the actual species name if 
>>>>>>> one
>>>>>>> has
>>>>>>> the taxid...)
>>>>>>>
>>>>>>> best
>>>>>>>
>>>>>>> Peter
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Tue, 2004-12-28 at 07:17, Hilmar Lapp wrote:
>>>>>>>> Great to hear that someone is giving this a shot. Yes at this 
>>>>>>>> point
>>>>>>>> is
>>>>>>>> appears that NCBI is only offering the ASN.1, not a conversion 
>>>>>>>> to
>>>>>>>> XML.
>>>>>>>> Their asn2xml tool will not work with this ASN.1 format either, 
>>>>>>>> just
>>>>>>>> checked it to be sure. They do seem to be mulling the option of 
>>>>>>>> XML
>>>>>>>> though on the Gene FAQ. Maybe if enough people get in their ears
>>>>>>>> they
>>>>>>>> will spend some effort towards that. After all, the entrez gene 
>>>>>>>> web
>>>>>>>> interface can display XML on demand - even though it looks 
>>>>>>>> fairly
>>>>>>>> hideous.
>>>>>>>>
>>>>>>>> There is no ASN.1 support in bioperl at all. Also, ASN.1 
>>>>>>>> support in
>>>>>>>> perl is actually thin - there is Convert::ASN1 at version 0.18 
>>>>>>>> two
>>>>>>>> years ago that I could find ... doesn't make me feel warm and 
>>>>>>>> fuzzy.
>>>>>>>>
>>>>>>>> In the absence of any XML available from NCBI, gene_info might 
>>>>>>>> be
>>>>>>>> the
>>>>>>>> best start. An option could be to check for the presence of the
>>>>>>>> other
>>>>>>>> tab-delimited files and use those that are present. These are
>>>>>>>> tab-delimited and hence the format itself is trivial so you can
>>>>>>>> focus
>>>>>>>> entirely on setting up a Bio::Seq plus annotation that's
>>>>>>>> comparable/compatible to what the current SeqIO::locuslink does.
>>>>>>>>
>>>>>>>> My $0.02 (worth less and less almost every day).
>>>>>>>>
>>>>>>>> 	-hilmar
>>>>>>>>
>>>>>>>> On Thursday, December 23, 2004, at 10:51  AM, Peter Robinson 
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I have been thinking about given a BioPerl EntrezGene parser a 
>>>>>>>>> try
>>>>>>>>> since
>>>>>>>>> I have been a heavy user of locus link to date. One issue is 
>>>>>>>>> that
>>>>>>>>> the
>>>>>>>>> files that correspond to LL_tmpl (which was a flat file) are 
>>>>>>>>> now in
>>>>>>>>> asn
>>>>>>>>> format
>>>>>>>>> http://www.ncbi.nlm.nih.gov/entrez/query/static/help/
>>>>>>>>> genehelp.html#query
>>>>>>>>> Although I saw some mention of ASN support in Bioperl by 
>>>>>>>>> googling,
>>>>>>>>> I
>>>>>>>>> can't seem to find any module that does this in the present
>>>>>>>>> distribution. What is the status on that? In any case, I will 
>>>>>>>>> be
>>>>>>>>> working
>>>>>>>>> on this in the next month or two and if anything nice comes of 
>>>>>>>>> it I
>>>>>>>>> will
>>>>>>>>> send it to you / BioPerpl.
>>>>>>>>>
>>>>>>>>> best wishes & happy holidays
>>>>>>>>>
>>>>>>>>> Peter
>>>>>>>>>
>>>>>>>>> On Tue, 2004-12-14 at 09:00, Hilmar Lapp wrote:
>>>>>>>>>> Since load_seqdatabase.pl will use bioperl's SeqIO parsers for
>>>>>>>>>> parsing
>>>>>>>>>> any input file, what you're asking is whether or not there is 
>>>>>>>>>> a
>>>>>>>>>> SeqIO
>>>>>>>>>> parser for NCBI Gene.
>>>>>>>>>>
>>>>>>>>>> The answer to that question is no, not yet. Anybody who feels
>>>>>>>>>> motivated
>>>>>>>>>> is welcome to give it a try ... Since I'll need it, I'll 
>>>>>>>>>> write the
>>>>>>>>>> parser if nobody else does within the next 3 months, but I'm 
>>>>>>>>>> not
>>>>>>>>>> going
>>>>>>>>>> to promise when exactly this will happen.
>>>>>>>>>>
>>>>>>>>>> 	-hilmar
>>>>>>>>>>
>>>>>>>>>> On Monday, December 13, 2004, at 08:03  AM, Law, Annie wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> I was wondering, with regard to bioperl-db (the scripts, the schema,
>>>>>>>>>>> and load_seqdatabase.pl), whether there has been any preparation for
>>>>>>>>>>> integrating Entrez Gene information when LocusLink is phased out. Or,
>>>>>>>>>>> if it has already been changed, could somebody point me to the
>>>>>>>>>>> documentation or changed code?
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Annie.
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> Bioperl-l mailing list
>>>>>>>>>>> Bioperl-l@portal.open-bio.org
>>>>>>>>>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Peter N. Robinson
>>>>>>>>> peter.robinson@t-online.de
>>>>>>>>> peter.robinson@charite.de
>>>>>>>>> http://www.charite.de/ch/medgen/robinson/
>>>>>>>>>
>>>>>>>>>
>>>>>>> --
>>>>>>> Peter N. Robinson
>>>>>>> peter.robinson@t-online.de
>>>>>>> peter.robinson@charite.de
>>>>>>> http://www.charite.de/ch/medgen/robinson/
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>> --
>>>>>> Jason Stajich
>>>>>> jason.stajich at duke.edu
>>>>>> http://www.duke.edu/~jes12/
>>>>> --
>>>>> Peter N. Robinson
>>>>> peter.robinson@t-online.de
>>>>> peter.robinson@charite.de
>>>>> http://www.charite.de/ch/medgen/robinson/
>>>>>
>>>>>
>>>> --
>>>> Jason Stajich
>>>> jason.stajich at duke.edu
>>>> http://www.duke.edu/~jes12/
>>>>
>>> --
>>> Peter N. Robinson
>>> peter.robinson@t-online.de
>>> peter.robinson@charite.de
>>> http://www.charite.de/ch/medgen/robinson/
>>>
> -- 
> Peter N. Robinson
> peter.robinson@t-online.de
> peter.robinson@charite.de
> http://www.charite.de/ch/medgen/robinson/
>

From jason.stajich at duke.edu  Fri Jan  7 08:03:47 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Fri Jan  7 08:00:43 2005
Subject: [Bioperl-l] Error parsing blast results with blasttable
In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E950121B91C@iahce2knas1.iah.bbsrc.reserved>
References: <8975119BCD0AC5419D61A9CF1A923E950121B91C@iahce2knas1.iah.bbsrc.reserved>
Message-ID: <91955B6C-60AC-11D9-ACAB-000393C44276@duke.edu>

$hit->start is a convenience function that first tiles the HSPs and then
gives you the smallest start.

If you look at the documentation for the method, you'll see that calling it
without a type gives you the start in both the query and the hit.

  Usage     : $sbjct->start( [seq_type] );
  Purpose   : Gets the start coordinate for the query, sbjct, or both sequences
            : in the BlastHit object. If there is more than one HSP, the lowest start
            : value of all HSPs is returned.
  Example   : $qbeg = $sbjct->start('query');
            : $sbeg = $sbjct->start('hit');
            : ($qbeg, $sbeg) = $sbjct->start();
  Returns   : scalar context: integer
            : array context without args: list of two integers (queryStart, sbjctStart)
            : Array context can be "induced" by providing an argument of 'list' or 'array'.
  Argument  : In scalar context: seq_type = 'query' or 'hit' or 'sbjct' (default = 'query')
              ('sbjct' is synonymous with 'hit')
  Throws    : n/a
  Comments  : This method requires that all HSPs be tiled. If there is more than one
            : HSP and they have not already been tiled, they will be tiled first automatically.
            : Remember that the start and end coordinates of all HSPs are
            : normalized so that start < end. Strand information can be
            : obtained by calling $hit->strand().
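What "the lowest start value of all HSPs" means can be illustrated without BioPerl. This is a minimal pure-Perl sketch, assuming each HSP has been reduced to a hypothetical hash of query and hit start/end (already normalized so start < end). It only takes the minimum start and maximum end; it does not perform the tiling step that the BioPerl method runs first. The hit coordinates echo the ones from the bug report; the query coordinates are invented.

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Hypothetical HSP records for a single hit (query coordinates invented).
my @hsps = (
    { q_start => 1520, q_end => 1524, h_start => 3264, h_end => 3268 },
    { q_start =>   15, q_end =>   31, h_start => 3252, h_end => 3268 },
);

# Lowest start and highest end over all HSPs, per sequence type.
my $query_start = min map { $_->{q_start} } @hsps;
my $query_end   = max map { $_->{q_end}   } @hsps;
my $hit_start   = min map { $_->{h_start} } @hsps;
my $hit_end     = max map { $_->{h_end}   } @hsps;

print "query: $query_start-$query_end\n";
print "hit:   $hit_start-$hit_end\n";
```

Note that the resulting query span covers everything between the two HSPs, which is rarely what you want when the HSPs hit very different parts of the query.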

I don't know why you are seeing concatenated positions unless you are 
somehow getting it in array context and then turning it into a string.
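One plausible route to a "concatenated" start is calling the method in list context, for example as an argument to print, which joins its arguments using the output separator $, (empty by default). The minimal pure-Perl sketch below uses a hypothetical stand-in, start_like(), that mimics the documented behaviour (one value in scalar context, two in list context); it is not the BioPerl implementation, and the coordinates are invented.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a method documented to return one value in
# scalar context and (queryStart, hitStart) in list context.
sub start_like {
    my ($query_start, $hit_start) = (15, 3252);
    return wantarray ? ($query_start, $hit_start) : $query_start;
}

my $scalar = start_like();            # scalar context: just the query start
my @pair   = start_like();            # list context: both starts
my $shown  = join '', start_like();   # list context joined with no separator,
                                      # which is what print() shows by default

print "scalar: $scalar\n";            # scalar: 15
print "list:   @pair\n";              # list:   15 3252
print "joined: $shown\n";             # joined: 153252
```

Passing an explicit type such as 'query' or assigning to a scalar first avoids the ambiguity.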


I don't really use this; if I want tiled HSPs, I use WU-BLAST with the
-links option and build compatible HSP groups.

What are you trying to get - the smallest hit or query start? Just the 
start/end for HSPs?

If this is somehow a blasttable-specific problem, I will try to see if I
can figure out why.

-jason

On Jan 7, 2005, at 5:50 AM, michael watson ((IAH-C)) wrote:

> Hi
>
> Having done some more tests with this:
>
> $hit->start()
>
> Actually returns a string which is the concatenation of query start and
> subject end!  (btw I'm using the "-m 8" option) - surely this isn't the
> desired option????
>
> If I change it to:
>
> $hit->start('query')
>
> Then I get the correct start back, but I still get the stack trace
> error.
>
> The two co-ordinate sets which cause the problem (3264-3268 and
> 3252-3268) are on adjacent lines in the file (3252-3268 is the next line
> after 3264-3268) and are to the SAME subject, ie they are two HSPs of
> the same hit (in theory) but they are to two VERY different parts of the
> query.
>
> I'm guessing the way blasttable handles multiple HSPs is causing the
> trouble.
>
> Mick
>
> -----Original Message-----
> From: Marc Logghe [mailto:Marc.Logghe@devgen.com]
> Sent: 07 January 2005 10:41
> To: michael watson (IAH-C); Bioperl List
> Subject: RE: [Bioperl-l] Error parsing blast results with blasttable
>
>
>> while (my $result = $searchio->next_result) {
>>         while(my $hit = $result->next_hit) {
>>                 my $start  = $hit->start;
>>
>> And it is that call to $hit->start that sets off the whole trace.
>>
>> Any ideas?
>
>
> Hi Mick,
> Have you tried one of these ?:
>
> my $start = $hit->start('sbjct');  # or 'query' or 'hit'. Latter is same as 'sbjct'
>
> or
>
> my $start = $hit->hsp->start('sbjct');
>
>
> I think in all cases it defaults to 'query'. So it should not crash but
> give you the start position of the query. I am afraid I can't explain
> the crash, sorry.
>
> Marc
>
>
>
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

From michael.watson at bbsrc.ac.uk  Fri Jan  7 08:21:25 2005
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Fri Jan  7 08:20:40 2005
Subject: [Bioperl-l] Error parsing blast results with blasttable
Message-ID: <8975119BCD0AC5419D61A9CF1A923E95E89AB3@iahce2knas1.iah.bbsrc.reserved>

Hi

I submitted a bug which contains some blasttable output, example code,
and the error produced.

Mick

-----Original Message-----
From: Jason Stajich [mailto:jason.stajich@duke.edu] 
Sent: 07 January 2005 13:04
To: michael watson (IAH-C)
Cc: Bioperl List; Marc Logghe
Subject: Re: [Bioperl-l] Error parsing blast results with blasttable


$hit->start is a convenience function that first tiles the HSPs and then
gives you the smallest start.

If you look at the documentation for the method, you'll see that calling it
without a type gives you the start in both the query and the hit.

  Usage     : $sbjct->start( [seq_type] );
  Purpose   : Gets the start coordinate for the query, sbjct, or both sequences
            : in the BlastHit object. If there is more than one HSP, the lowest start
            : value of all HSPs is returned.
  Example   : $qbeg = $sbjct->start('query');
            : $sbeg = $sbjct->start('hit');
            : ($qbeg, $sbeg) = $sbjct->start();
  Returns   : scalar context: integer
            : array context without args: list of two integers (queryStart, sbjctStart)
            : Array context can be "induced" by providing an argument of 'list' or 'array'.
  Argument  : In scalar context: seq_type = 'query' or 'hit' or 'sbjct' (default = 'query')
              ('sbjct' is synonymous with 'hit')
  Throws    : n/a
  Comments  : This method requires that all HSPs be tiled. If there is more than one
            : HSP and they have not already been tiled, they will be tiled first automatically.
            : Remember that the start and end coordinates of all HSPs are
            : normalized so that start < end. Strand information can be
            : obtained by calling $hit->strand().

I don't know why you are seeing concatenated positions unless you are 
somehow getting it in array context and then turning it into a string.


I don't really use this; if I want tiled HSPs, I use WU-BLAST with the
-links option and build compatible HSP groups.

What are you trying to get - the smallest hit or query start? Just the 
start/end for HSPs?

If this is somehow a blasttable-specific problem, I will try to see if I
can figure out why.

-jason

On Jan 7, 2005, at 5:50 AM, michael watson ((IAH-C)) wrote:

> Hi
>
> Having done some more tests with this:
>
> $hit->start()
>
> Actually returns a string which is the concatenation of query start 
> and subject end!  (btw I'm using the "-m 8" option) - surely this 
> isn't the desired option????
>
> If I change it to:
>
> $hit->start('query')
>
> Then I get the correct start back, but I still get the stack trace 
> error.
>
> The two co-ordinate sets which cause the problem (3264-3268 and
> 3252-3268) are on adjacent lines in the file (3252-3268 is the next line
> after 3264-3268) and are to the SAME subject, ie they are two HSPs of
> the same hit (in theory) but they are to two VERY different parts of the
> query.
>
> I'm guessing the way blasttable handles multiple HSPs is causing the 
> trouble.
>
> Mick
>
> -----Original Message-----
> From: Marc Logghe [mailto:Marc.Logghe@devgen.com]
> Sent: 07 January 2005 10:41
> To: michael watson (IAH-C); Bioperl List
> Subject: RE: [Bioperl-l] Error parsing blast results with blasttable
>
>
>> while (my $result = $searchio->next_result) {
>>         while(my $hit = $result->next_hit) {
>>                 my $start  = $hit->start;
>>
>> And it is that call to $hit->start that sets off the whole trace.
>>
>> Any ideas?
>
>
> Hi Mick,
> Have you tried one of these ?:
>
> my $start = $hit->start('sbjct');  # or 'query' or 'hit'. Latter is same as 'sbjct'
>
> or
>
> my $start = $hit->hsp->start('sbjct');
>
>
> I think in all cases it defaults to 'query'. So it should not crash 
> but give you the start position of the query. I am afraid I can't 
> explain the crash, sorry.
>
> Marc
>
>
>
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/


From jason.stajich at duke.edu  Fri Jan  7 08:29:51 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Fri Jan  7 08:26:22 2005
Subject: [Bioperl-l] Error parsing blast results with blasttable
In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E95E89AB3@iahce2knas1.iah.bbsrc.reserved>
References: <8975119BCD0AC5419D61A9CF1A923E95E89AB3@iahce2knas1.iah.bbsrc.reserved>
Message-ID: <358EC678-60B0-11D9-ACAB-000393C44276@duke.edu>

Right - but back to my question - what do you want to be getting out?  
Do you want the smallest HSP start position if you are calling 
$hit->start('query')?  Are you hoping for fancy HSP tiling?

I'm pretty sure the problem you are showing has to do with being unable
to build a single compatible tiling path for a set of HSPs. I just
think the code for doing this is too brittle. There may also be a
bug, since the blasttable parser has less data available to it, and that
may be the cause as well, so it will have to be investigated nonetheless.
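To see how two HSPs like the ones in Mick's report could defeat a single tiling path, here is a minimal pure-Perl sketch of one simple compatibility test. The rule is an assumption for illustration, not BioPerl's tiling algorithm: two same-strand HSPs count as compatible only if they do not overlap on either sequence and occur in the same order along the query and the hit. The hit ranges reuse 3264-3268 and 3252-3268 from the report (assuming those are subject coordinates); all query ranges and the third HSP are hypothetical.

```perl
use strict;
use warnings;

# True if the closed intervals [a1, a2] and [b1, b2] share any position.
sub overlaps {
    my ($a1, $a2, $b1, $b2) = @_;
    return $a1 <= $b2 && $b1 <= $a2;
}

# Hypothetical compatibility rule: no overlap on either sequence, and
# the two HSPs appear in the same order on the query and on the hit.
sub compatible {
    my ($x, $y) = @_;
    return 0 if overlaps(@$x{qw(q_start q_end)}, @$y{qw(q_start q_end)});
    return 0 if overlaps(@$x{qw(h_start h_end)}, @$y{qw(h_start h_end)});
    my $query_order = $x->{q_start} <=> $y->{q_start};
    my $hit_order   = $x->{h_start} <=> $y->{h_start};
    return $query_order == $hit_order ? 1 : 0;
}

my $hsp1 = { q_start => 1520, q_end => 1524, h_start => 3264, h_end => 3268 };
my $hsp2 = { q_start =>   15, q_end =>   31, h_start => 3252, h_end => 3268 };
my $hsp3 = { q_start =>  200, q_end =>  240, h_start => 4000, h_end => 4040 };

print compatible($hsp1, $hsp2) ? "compatible\n" : "incompatible\n";  # incompatible
print compatible($hsp2, $hsp3) ? "compatible\n" : "incompatible\n";  # compatible
```

Under this rule the first pair overlaps on the hit while sitting far apart on the query, so no single colinear path can contain both.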

-jason
On Jan 7, 2005, at 8:21 AM, michael watson ((IAH-C)) wrote:

> Hi
>
> I submitted a bug which contains some blasttable output, example code,
> and the error produced.
>
> Mick
>
> -----Original Message-----
> From: Jason Stajich [mailto:jason.stajich@duke.edu]
> Sent: 07 January 2005 13:04
> To: michael watson (IAH-C)
> Cc: Bioperl List; Marc Logghe
> Subject: Re: [Bioperl-l] Error parsing blast results with blasttable
>
>
> $hit->start is a convenience function that first tiles the HSPs and then
> gives you the smallest start.
>
> If you look at the documentation for the method, you'll see that calling it
> without a type gives you the start in both the query and the hit.
>
>   Usage     : $sbjct->start( [seq_type] );
>   Purpose   : Gets the start coordinate for the query, sbjct, or both sequences
>             : in the BlastHit object. If there is more than one HSP, the lowest start
>             : value of all HSPs is returned.
>   Example   : $qbeg = $sbjct->start('query');
>             : $sbeg = $sbjct->start('hit');
>             : ($qbeg, $sbeg) = $sbjct->start();
>   Returns   : scalar context: integer
>             : array context without args: list of two integers (queryStart, sbjctStart)
>             : Array context can be "induced" by providing an argument of 'list' or 'array'.
>   Argument  : In scalar context: seq_type = 'query' or 'hit' or 'sbjct' (default = 'query')
>               ('sbjct' is synonymous with 'hit')
>   Throws    : n/a
>   Comments  : This method requires that all HSPs be tiled. If there is more than one
>             : HSP and they have not already been tiled, they will be tiled first automatically.
>             : Remember that the start and end coordinates of all HSPs are
>             : normalized so that start < end. Strand information can be
>             : obtained by calling $hit->strand().
>
> I don't know why you are seeing concatenated positions unless you are
> somehow getting it in array context and then turning it into a string.
>
>
> I don't really use this; if I want tiled HSPs, I use WU-BLAST with the
> -links option and build compatible HSP groups.
>
> What are you trying to get - the smallest hit or query start? Just the
> start/end for HSPs?
>
> If this is somehow a blasttable-specific problem, I will try to see if I
> can figure out why.
>
> -jason
>
> On Jan 7, 2005, at 5:50 AM, michael watson ((IAH-C)) wrote:
>
>> Hi
>>
>> Having done some more tests with this:
>>
>> $hit->start()
>>
>> Actually returns a string which is the concatenation of query start
>> and subject end!  (btw I'm using the "-m 8" option) - surely this
>> isn't the desired option????
>>
>> If I change it to:
>>
>> $hit->start('query')
>>
>> Then I get the correct start back, but I still get the stack trace
>> error.
>>
>> The two co-ordinate sets which cause the problem (3264-3268 and
>> 3252-3268) are on adjacent lines in the file (3252-3268 is the next line
>> after 3264-3268) and are to the SAME subject, ie they are two HSPs of
>> the same hit (in theory) but they are to two VERY different parts of the
>> query.
>>
>> I'm guessing the way blasttable handles multiple HSPs is causing the
>> trouble.
>>
>> Mick
>>
>> -----Original Message-----
>> From: Marc Logghe [mailto:Marc.Logghe@devgen.com]
>> Sent: 07 January 2005 10:41
>> To: michael watson (IAH-C); Bioperl List
>> Subject: RE: [Bioperl-l] Error parsing blast results with blasttable
>>
>>
>>> while (my $result = $searchio->next_result) {
>>>         while(my $hit = $result->next_hit) {
>>>                 my $start  = $hit->start;
>>>
>>> And it is that call to $hit->start that sets off the whole trace.
>>>
>>> Any ideas?
>>
>>
>> Hi Mick,
>> Have you tried one of these ?:
>>
>> my $start = $hit->start('sbjct');  # or 'query' or 'hit'. Latter is same as 'sbjct'
>>
>> or
>>
>> my $start = $hit->hsp->start('sbjct');
>>
>>
>> I think in all cases it defaults to 'query'. So it should not crash
>> but give you the start position of the query. I am afraid I can't
>> explain the crash, sorry.
>>
>> Marc
>>
>>
>>
> --
> Jason Stajich
> jason.stajich at duke.edu
> http://www.duke.edu/~jes12/
>
>
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

From michael.watson at bbsrc.ac.uk  Fri Jan  7 08:34:21 2005
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Fri Jan  7 08:31:57 2005
Subject: [Bioperl-l] Error parsing blast results with blasttable
Message-ID: <8975119BCD0AC5419D61A9CF1A923E95E89AB4@iahce2knas1.iah.bbsrc.reserved>

Actually, the very first answer from Marc solved a bug in my script -
I'm simply trying to get the query start and end of the HSPs, I don't
need them to be linked together into a coherent hit object with multiple
HSPs, I'd be happy with them separate.  I'm not trying to do anything
fancy, just mark up the HSPs as features on the query sequence.

When running the exact same script using "blast" format instead of
"blasttable" I get exactly what I need; I was trying to use blasttable
for efficiency's sake, though.
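Since each line of -m 8 output already describes one HSP, query coordinates for feature markup can in principle be read line by line. This is a minimal pure-Perl illustration, not a replacement for Bio::SearchIO. It assumes the standard 12 tab-separated columns (query id, subject id, % identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, e-value, bit score); the sample lines, IDs, e-values and scores are invented.

```perl
use strict;
use warnings;

# Invented -m 8 lines: two HSPs for the same query/subject pair, then
# one HSP against another subject. Columns are tab-separated.
my @lines = (
    join("\t", qw(query1 subjA 100.00 5  0 0 1520 1524 3264 3268 2.1  10.0)),
    join("\t", qw(query1 subjA 94.12  17 1 0 15   31   3252 3268 0.5  26.3)),
    join("\t", qw(query1 subjB 88.00  25 3 0 40   64   900  924  1e-4 40.1)),
);

my %features;    # query id => list of [start, end, subject] features
for my $line (@lines) {
    my ($qid, $sid, @f) = split /\t/, $line;
    my ($q_start, $q_end) = @f[4, 5];   # columns 7 and 8 of the line
    # Defensive: normalize so start <= end in case a report lists them
    # in descending order.
    ($q_start, $q_end) = ($q_end, $q_start) if $q_start > $q_end;
    push @{ $features{$qid} }, [ $q_start, $q_end, $sid ];
}

# One feature per HSP, sorted by query start.
for my $qid (sort keys %features) {
    for my $feat (sort { $a->[0] <=> $b->[0] } @{ $features{$qid} }) {
        print join("\t", $qid, @$feat), "\n";
    }
}
```

Each HSP stays a separate feature, which matches the goal of marking up HSPs individually rather than tiling them into a single hit span.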

Thanks
Mick

-----Original Message-----
From: Jason Stajich [mailto:jason.stajich@duke.edu] 
Sent: 07 January 2005 13:30
To: michael watson (IAH-C)
Cc: Bioperl List; Marc Logghe
Subject: Re: [Bioperl-l] Error parsing blast results with blasttable


Right - but back to my question - what do you want to be getting out?  
Do you want the smallest HSP start position if you are calling 
$hit->start('query')?  Are you hoping for fancy HSP tiling?

I'm pretty sure the problem you are showing has to do with being unable
to build a single compatible tiling path for a set of HSPs. I just
think the code for doing this is too brittle. There may also be a
bug, since the blasttable parser has less data available to it, and that
may be the cause as well, so it will have to be investigated nonetheless.

-jason
On Jan 7, 2005, at 8:21 AM, michael watson ((IAH-C)) wrote:

> Hi
>
> I submitted a bug which contains some blasttable output, example code,
> and the error produced.
>
> Mick
>
> -----Original Message-----
> From: Jason Stajich [mailto:jason.stajich@duke.edu]
> Sent: 07 January 2005 13:04
> To: michael watson (IAH-C)
> Cc: Bioperl List; Marc Logghe
> Subject: Re: [Bioperl-l] Error parsing blast results with blasttable
>
>
> $hit->start is a convenience function that first tiles the HSPs and then
> gives you the smallest start.
>
> If you look at the documentation for the method, you'll see that calling it
> without a type gives you the start in both the query and the hit.
>
>   Usage     : $sbjct->start( [seq_type] );
>   Purpose   : Gets the start coordinate for the query, sbjct, or both sequences
>             : in the BlastHit object. If there is more than one HSP, the lowest start
>             : value of all HSPs is returned.
>   Example   : $qbeg = $sbjct->start('query');
>             : $sbeg = $sbjct->start('hit');
>             : ($qbeg, $sbeg) = $sbjct->start();
>   Returns   : scalar context: integer
>             : array context without args: list of two integers (queryStart, sbjctStart)
>             : Array context can be "induced" by providing an argument of 'list' or 'array'.
>   Argument  : In scalar context: seq_type = 'query' or 'hit' or 'sbjct' (default = 'query')
>               ('sbjct' is synonymous with 'hit')
>   Throws    : n/a
>   Comments  : This method requires that all HSPs be tiled. If there is more than one
>             : HSP and they have not already been tiled, they will be tiled first automatically.
>             : Remember that the start and end coordinates of all HSPs are
>             : normalized so that start < end. Strand information can be
>             : obtained by calling $hit->strand().
>
> I don't know why you are seeing concatenated positions unless you are 
> somehow getting it in array context and then turning it into a string.
>
>
> I don't really use this; if I want tiled HSPs, I use WU-BLAST with the
> -links option and build compatible HSP groups.
>
> What are you trying to get - the smallest hit or query start? Just the
> start/end for HSPs?
>
> If this is somehow a blasttable-specific problem, I will try to see if I
> can figure out why.
>
> -jason
>
> On Jan 7, 2005, at 5:50 AM, michael watson ((IAH-C)) wrote:
>
>> Hi
>>
>> Having done some more tests with this:
>>
>> $hit->start()
>>
>> Actually returns a string which is the concatenation of query start 
>> and subject end!  (btw I'm using the "-m 8" option) - surely this 
>> isn't the desired option????
>>
>> If I change it to:
>>
>> $hit->start('query')
>>
>> Then I get the correct start back, but I still get the stack trace 
>> error.
>>
>> The two co-ordinate sets which cause the problem (3264-3268 and
>> 3252-3268) are on adjacent lines in the file (3252-3268 is the next line
>> after 3264-3268) and are to the SAME subject, ie they are two HSPs of
>> the same hit (in theory) but they are to two VERY different parts of the
>> query.
>>
>> I'm guessing the way blasttable handles multiple HSPs is causing the 
>> trouble.
>>
>> Mick
>>
>> -----Original Message-----
>> From: Marc Logghe [mailto:Marc.Logghe@devgen.com]
>> Sent: 07 January 2005 10:41
>> To: michael watson (IAH-C); Bioperl List
>> Subject: RE: [Bioperl-l] Error parsing blast results with blasttable
>>
>>
>>> while (my $result = $searchio->next_result) {
>>>         while(my $hit = $result->next_hit) {
>>>                 my $start  = $hit->start;
>>>
>>> And it is that call to $hit->start that sets off the whole trace.
>>>
>>> Any ideas?
>>
>>
>> Hi Mick,
>> Have you tried one of these ?:
>>
>> my $start = $hit->start('sbjct');  # or 'query' or 'hit'. Latter is same as 'sbjct'
>>
>> or
>>
>> my $start = $hit->hsp->start('sbjct');
>>
>>
>> I think in all cases it defaults to 'query'. So it should not crash 
>> but give you the start position of the query. I am afraid I can't 
>> explain the crash, sorry.
>>
>> Marc
>>
>>
>>
> --
> Jason Stajich
> jason.stajich at duke.edu
> http://www.duke.edu/~jes12/
>
>
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/


From jason.stajich at duke.edu  Fri Jan  7 08:48:30 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Fri Jan  7 08:44:58 2005
Subject: [Bioperl-l] Error parsing blast results with blasttable
In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E95E89AB4@iahce2knas1.iah.bbsrc.reserved>
References: <8975119BCD0AC5419D61A9CF1A923E95E89AB4@iahce2knas1.iah.bbsrc.reserved>
Message-ID: <D0B29DEA-60B2-11D9-ACAB-000393C44276@duke.edu>


On Jan 7, 2005, at 8:34 AM, michael watson ((IAH-C)) wrote:

> Actually, the very first answer from Marc solved a bug in my script -
> I'm simply trying to get the query start and end of the HSPs, I don't
> need them to be linked together into a coherent hit object with
> multiple HSPs, I'd be happy with them separate.  I'm not trying to do
> anything fancy, just mark up the HSPs as features on the query sequence.
>
> When running the exact same script using "blast" format instead of
> "blasttable" I get exactly what I need; I was trying to use blasttable
> for efficiency's sake, though.
>

Sure, that is the right thing to do. But calling $hit->start and
$hit->end is not really doing what you want, so don't use them.


If you want the start and end of HSPs, you need to call those methods on
the HSPs themselves. If you look at the SearchIO HOWTO, you'll see this
construct:

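use Bio::SearchIO;

# 'report.m8' is a placeholder name for your -m 8 (blasttable) output file
my $searchio = Bio::SearchIO->new(-format => 'blasttable',
                                  -file   => 'report.m8');
while (my $result = $searchio->next_result) {
    while (my $hit = $result->next_hit) {
        while (my $hsp = $hit->next_hsp) {
            print $hsp->query->start, " ", $hsp->query->end, "\n";
        }
    }
}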

That will work and get you the HSP start and end for the query. Use
$hsp->hit->start and $hsp->hit->end to get the start/end of the hit
sequence coordinates in the HSP.

-jason


> Thanks
> Mick
>
> -----Original Message-----
> From: Jason Stajich [mailto:jason.stajich@duke.edu]
> Sent: 07 January 2005 13:30
> To: michael watson (IAH-C)
> Cc: Bioperl List; Marc Logghe
> Subject: Re: [Bioperl-l] Error parsing blast results with blasttable
>
>
> Right - but back to my question - what do you want to be getting out?
> Do you want the smallest HSP start position if you are calling
> $hit->start('query')?  Are you hoping for fancy HSP tiling?
>
> I'm pretty sure the problem you are showing has to do with being unable
> to build a single compatible tiling path for a set of HSPs. I just
> think the code for doing this is too brittle. There may also be a
> bug, since the blasttable parser has less data available to it, and that
> may be the cause as well, so it will have to be investigated nonetheless.
>
> -jason
> On Jan 7, 2005, at 8:21 AM, michael watson ((IAH-C)) wrote:
>
>> Hi
>>
>> I submitted a bug which contains some blasttable output, example code,
>> and the error produced.
>>
>> Mick
>>
>> -----Original Message-----
>> From: Jason Stajich [mailto:jason.stajich@duke.edu]
>> Sent: 07 January 2005 13:04
>> To: michael watson (IAH-C)
>> Cc: Bioperl List; Marc Logghe
>> Subject: Re: [Bioperl-l] Error parsing blast results with blasttable
>>
>>
>> $hit->start is a convenience function that first tiles the HSPs and then
>> gives you the smallest start.
>>
>> If you look at the documentation for the method, you'll see that calling it
>> without a type gives you the start in both the query and the hit.
>>
>>   Usage     : $sbjct->start( [seq_type] );
>>   Purpose   : Gets the start coordinate for the query, sbjct, or both sequences
>>             : in the BlastHit object. If there is more than one HSP, the lowest start
>>             : value of all HSPs is returned.
>>   Example   : $qbeg = $sbjct->start('query');
>>             : $sbeg = $sbjct->start('hit');
>>             : ($qbeg, $sbeg) = $sbjct->start();
>>   Returns   : scalar context: integer
>>             : array context without args: list of two integers (queryStart, sbjctStart)
>>             : Array context can be "induced" by providing an argument of 'list' or 'array'.
>>   Argument  : In scalar context: seq_type = 'query' or 'hit' or 'sbjct' (default = 'query')
>>               ('sbjct' is synonymous with 'hit')
>>   Throws    : n/a
>>   Comments  : This method requires that all HSPs be tiled. If there is more than one
>>             : HSP and they have not already been tiled, they will be tiled first automatically.
>>             : Remember that the start and end coordinates of all HSPs are
>>             : normalized so that start < end. Strand information can be
>>             : obtained by calling $hit->strand().
>>
>> I don't know why you are seeing concatenated positions unless you are
>> somehow getting it in array context and then turning it into a string.
>>
>>
>> I don't really use this; if I want tiled HSPs, I use WU-BLAST with the
>> -links option and build compatible HSP groups.
>>
>> What are you trying to get - the smallest hit or query start? Just the
>> start/end for HSPs?
>>
>> If this is somehow a blasttable-specific problem, I will try to see if I
>> can figure out why.
>>
>> -jason
>>
>> On Jan 7, 2005, at 5:50 AM, michael watson ((IAH-C)) wrote:
>>
>>> Hi
>>>
>>> Having done some more tests with this:
>>>
>>> $hit->start()
>>>
>>> Actually returns a string which is the concatenation of query start
>>> and subject end!  (btw I'm using the "-m 8" option) - surely this
>>> isn't the desired option????
>>>
>>> If I change it to:
>>>
>>> $hit->start('query')
>>>
>>> Then I get the correct start back, but I still get the stack trace
>>> error.
>>>
>>> The two co-ordinate sets which cause the problem (3264-3268 and
>>> 3252-3268) are on adjacent lines in the file (3252-3268 is the next line
>>> after 3264-3268) and are to the SAME subject, ie they are two HSPs of
>>> the same hit (in theory) but they are to two VERY different parts of the
>>> query.
>>>
>>> I'm guessing the way blasttable handles multiple HSPs is causing the
>>> trouble.
>>>
>>> Mick
>>>
>>> -----Original Message-----
>>> From: Marc Logghe [mailto:Marc.Logghe@devgen.com]
>>> Sent: 07 January 2005 10:41
>>> To: michael watson (IAH-C); Bioperl List
>>> Subject: RE: [Bioperl-l] Error parsing blast results with blasttable
>>>
>>>
>>>> while (my $result = $searchio->next_result) {
>>>>         while(my $hit = $result->next_hit) {
>>>>                 my $start  = $hit->start;
>>>>
>>>> And it is that call to $hit->start that sets off the whole trace.
>>>>
>>>> Any ideas?
>>>
>>>
>>> Hi Mick,
>>> Have you tried one of these ?:
>>>
>>> my $start = $hit->start('sbjct');  # or 'query' or 'hit'. Latter is same as 'sbjct'
>>>
>>> or
>>>
>>> my $start = $hit->hsp->start('sbjct');
>>>
>>>
>>> I think in all cases it defaults to 'query'. So it should not crash
>>> but give you the start position of the query. I am afraid I can't
>>> explain the crash, sorry.
>>>
>>> Marc
>>>
>>>
>>>
>> --
>> Jason Stajich
>> jason.stajich at duke.edu
>> http://www.duke.edu/~jes12/
>>
>>
> --
> Jason Stajich
> jason.stajich at duke.edu
> http://www.duke.edu/~jes12/
>
>
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

From michael.watson at bbsrc.ac.uk  Fri Jan  7 08:51:18 2005
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Fri Jan  7 08:48:50 2005
Subject: [Bioperl-l] Error parsing blast results with blasttable
Message-ID: <8975119BCD0AC5419D61A9CF1A923E95E89AB7@iahce2knas1.iah.bbsrc.reserved>

OK, I was under the impression (mistakenly) that hits from blasttable
output didn't have HSPs.

-----Original Message-----
From: Jason Stajich [mailto:jason.stajich@duke.edu] 
Sent: 07 January 2005 13:49
To: michael watson (IAH-C)
Cc: Bioperl List; Marc Logghe
Subject: Re: [Bioperl-l] Error parsing blast results with blasttable


On Jan 7, 2005, at 8:34 AM, michael watson ((IAH-C)) wrote:

> Actually, the very first answer from Marc solved a bug in my script - 
> I'm simply trying to get the query start and end of the HSPs, I don't 
> need them to be linked together into a coherent hit object with 
> multiple HSPs, I'd be happy with them separate.  I'm not trying to do 
> anything fancy, just mark up the HSPs as features on the query 
> sequence.
>
> When running the exact same script using "blast" format instead of
> "blasttable" I get exactly what I need; I was trying to use blasttable
> for efficiency's sake, though.
>

Sure, that is the right thing to do. But calling $hit->start and
$hit->end is not really doing what you want, so don't use them.


If you want the start and end of HSPs, you need to call those methods on
the HSPs themselves. If you look at the SearchIO HOWTO, you'll see this
construct:

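use Bio::SearchIO;

# 'report.m8' is a placeholder name for your -m 8 (blasttable) output file
my $searchio = Bio::SearchIO->new(-format => 'blasttable',
                                  -file   => 'report.m8');
while (my $result = $searchio->next_result) {
    while (my $hit = $result->next_hit) {
        while (my $hsp = $hit->next_hsp) {
            print $hsp->query->start, " ", $hsp->query->end, "\n";
        }
    }
}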

That will work and get you the HSP start and end for the query. Use
$hsp->hit->start and $hsp->hit->end to get the start/end of the hit
sequence coordinates in the HSP.

-jason


> Thanks
> Mick
>
> -----Original Message-----
> From: Jason Stajich [mailto:jason.stajich@duke.edu]
> Sent: 07 January 2005 13:30
> To: michael watson (IAH-C)
> Cc: Bioperl List; Marc Logghe
> Subject: Re: [Bioperl-l] Error parsing blast results with blasttable
>
>
> Right - but back to my question - what do you want to be getting out? 
> Do you want the smallest HSP start position if you are calling 
> $hit->start('query')?  Are you hoping for fancy HSP tiling?
>
> I'm pretty sure the problem you are showing has to do with being unable
> to build a single compatible tiling path for a set of HSPs. I just
> think the code for doing this is too brittle. There may also be a
> bug, since the blasttable parser has less data available to it, and that
> may be the cause as well, so it will have to be investigated nonetheless.
>
> -jason
> On Jan 7, 2005, at 8:21 AM, michael watson ((IAH-C)) wrote:
>
>> Hi
>>
>> I submitted a bug which contains some blasttable output, example
>> code, and the error produced.
>>
>> Mick
>>
>> -----Original Message-----
>> From: Jason Stajich [mailto:jason.stajich@duke.edu]
>> Sent: 07 January 2005 13:04
>> To: michael watson (IAH-C)
>> Cc: Bioperl List; Marc Logghe
>> Subject: Re: [Bioperl-l] Error parsing blast results with blasttable
>>
>>
>> $hit->start is a convenience function which first tiles the HSPs and
>> then gives you the smallest start.
>>
>> If you look at the documentation for the method you see that not 
>> giving it a type will give you start in the query and hit.
>>
>>   Usage     : $sbjct->start( [seq_type] );
>>   Purpose   : Gets the start coordinate for the query, sbjct, or both sequences
>>             : in the BlastHit object. If there is more than one HSP, the lowest start
>>             : value of all HSPs is returned.
>>   Example   : $qbeg = $sbjct->start('query');
>>             : $sbeg = $sbjct->start('hit');
>>             : ($qbeg, $sbeg) = $sbjct->start();
>>   Returns   : scalar context: integer
>>             : array context without args: list of two integers (queryStart, sbjctStart)
>>             : Array context can be "induced" by providing an argument of 'list' or 'array'.
>>   Argument  : In scalar context: seq_type = 'query' or 'hit' or 'sbjct' (default = 'query')
>>               ('sbjct' is synonymous with 'hit')
>>   Throws    : n/a
>>   Comments  : This method requires that all HSPs be tiled. If there is more than one
>>             : HSP and they have not already been tiled, they will be tiled first automatically.
>>             : Remember that the start and end coordinates of all HSPs are
>>             : normalized so that start < end. Strand information can be
>>             : obtained by calling $hit->strand().
>>
>> I don't know why you are seeing concatenated positions unless you are
>> somehow getting it in array context and then turning it into a
>> string.
>>
>>
>> I really don't use this; if I want tiled HSPs I use WU-BLAST with the
>> -links option and build compatible HSP groups.
>>
>> What are you trying to get - the smallest hit or query start? Just the
>> start/end for HSPs?
>>
>> If this is somehow a blasttable-specific problem, I will try to see if I
>> can figure out why.
>>
>> -jason
>>
>> On Jan 7, 2005, at 5:50 AM, michael watson ((IAH-C)) wrote:
>>
>>> Hi
>>>
>>> Having done some more tests with this:
>>>
>>> $hit->start()
>>>
>>> Actually returns a string which is the concatenation of query start
>>> and subject end!  (btw I'm using the "-m 8" option) - surely this
>>> isn't the intended behaviour?
>>>
>>> If I change it to:
>>>
>>> $hit->start('query')
>>>
>>> Then I get the correct start back, but I still get the stack trace 
>>> error.
>>>
>>> The two co-ordinate sets which cause the problem (3264-3268 and
>>> 3252-3268) are on adjacent lines in the file (3252-3268 is the next 
>>> line after 3264-3268) and are to the SAME subject, ie they are two 
>>> HSPs of the same hit (in theory) but they are to two VERY different 
>>> parts of the query.
>>>
>>> I'm guessing the way blasttable handles multiple HSPs is causing the
>>> trouble.
>>>
>>> Mick
>>>
>>> -----Original Message-----
>>> From: Marc Logghe [mailto:Marc.Logghe@devgen.com]
>>> Sent: 07 January 2005 10:41
>>> To: michael watson (IAH-C); Bioperl List
>>> Subject: RE: [Bioperl-l] Error parsing blast results with blasttable
>>>
>>>
>>>> while (my $result = $searchio->next_result) {
>>>>         while(my $hit = $result->next_hit) {
>>>>                 my $start  = $hit->start;
>>>>
>>>> And it is that call to $hit->start that sets off the whole trace.
>>>>
>>>> Any ideas?
>>>
>>>
>>> Hi Mick,
>>> Have you tried one of these ?:
>>>
>>> my $start = $hit->start('sbjct');  # or 'query' or 'hit'; 'hit' is the same as 'sbjct'
>>>
>>> or
>>>
>>> my $start = $hit->hsp->start('sbjct');
>>>
>>>
>>> I think in all cases it defaults to 'query'. So it should not crash 
>>> but give you the start position of the query. I am afraid I can't 
>>> explain the crash, sorry.
>>>
>>> Marc
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l@portal.open-bio.org 
>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>>
>> --
>> Jason Stajich
>> jason.stajich at duke.edu
>> http://www.duke.edu/~jes12/
>>
>>
> --
> Jason Stajich
> jason.stajich at duke.edu
> http://www.duke.edu/~jes12/
>
>
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/


From golharam at umdnj.edu  Fri Jan  7 14:36:35 2005
From: golharam at umdnj.edu (Ryan Golhar)
Date: Fri Jan  7 14:28:01 2005
Subject: [Bioperl-l] libgd
Message-ID: <005d01c4f4f0$3349f5f0$a6028a0a@GOLHARMOBILE1>

Hi all,

Not sure where this should be posted, so forgive me if I'm posting it in
the wrong place.

I'm trying to use Bioperl with RedHat Enterprise Linux v3.  RedHat
provides gd v1.8, however bioperl requires > 2.something.  

So, I downloaded gd 2.0.33 and rebuilt it using a spec file I found for
version 2.0.16 (I think).

When I tried upgrading the package using:
[root@hydrogen i386]# rpm -Uvh gd-2.0.33-1.i386.rpm gd-devel-2.0.33-1.i386.rpm

I get the error:
error: Failed dependencies:
        libgd.so.1.8 is needed by (installed) glibc-utils-2.3.2-95.30

If I do a 'ls -lp /usr/lib/libgd*', I get:

-rw-r--r--    1 root     root       212978 Jun 17  2003 /usr/lib/libgd.a
lrwxrwxrwx    1 root     root           14 Jan  7 13:23 /usr/lib/libgd.so -> libgd.so.1.8.4
lrwxrwxrwx    1 root     root           14 Jan  7 13:23 /usr/lib/libgd.so.1 -> libgd.so.1.8.4
lrwxrwxrwx    1 root     root           14 Jan  7 13:23 /usr/lib/libgd.so.1.8 -> libgd.so.1.8.4
-rwxr-xr-x    1 root     root       183332 Jun 17  2003 /usr/lib/libgd.so.1.8.4


If I do a 'rpm -qpl gd-2.0.33-1.i386.rpm', I get:

/usr/lib/libgd.so.1
/usr/lib/libgd.so.1.8
/usr/lib/libgd.so.2
/usr/lib/libgd.so.2.0.0
/usr/share/doc/gd-2.0.33
/usr/share/doc/gd-2.0.33/README-JPEG.TXT
/usr/share/doc/gd-2.0.33/README.TXT
/usr/share/doc/gd-2.0.33/entities.html
/usr/share/doc/gd-2.0.33/index.html

Here is the spec file I'm using (minus some unnecessary stuff).  Any
ideas?

Summary: A graphics library for quick creation of PNG, GIF or JPEG images.
Name: gd
Version: 2.0.33
Release: 1
URL: http://www.boutell.com/gd/
Source0: http://www.boutell.com/gd/http/gd-%{version}.tar.gz
License: BSD-style
Group: System Environment/Libraries
BuildRoot: %{_tmppath}/%{name}-root
Prereq: /sbin/ldconfig
BuildPrereq: freetype-devel, libjpeg-devel, libpng-devel, zlib-devel
%define shlibver %(echo %{version} | cut -f-2 -d.)

%prep
%setup -q

%build
./configure --prefix=$RPM_BUILD_ROOT/usr
make

%install
[ "$RPM_BUILD_ROOT" != "/" ] && rm -fr $RPM_BUILD_ROOT
make install
(cd $RPM_BUILD_ROOT/usr/lib && ln -s libgd.so.2.0.0 libgd.so.1)
(cd $RPM_BUILD_ROOT/usr/lib && ln -s libgd.so.2.0.0 libgd.so.1.8)
rm -rf $RPM_BUILD_ROOT%{_libdir}/libgd.la

%clean
[ "$RPM_BUILD_ROOT" != "/" ] && rm -fr $RPM_BUILD_ROOT

%post -p /sbin/ldconfig

%postun -p /sbin/ldconfig

%files
%defattr(-,root,root)
%doc README.TXT README-JPEG.TXT index.html entities.html
%{_libdir}/*.so.*

%files progs
%defattr(-,root,root)
%{_bindir}/*

%files devel
%defattr(-,root,root)
%{_includedir}/*
%{_libdir}/*.so
%{_libdir}/*.a


-----
Ryan Golhar
Computational Biologist
The Informatics Institute at
The University of Medicine & Dentistry of NJ

Phone: 973-972-5034
Fax: 973-972-7412
Email: golharam@umdnj.edu

From golharam at umdnj.edu  Fri Jan  7 16:39:56 2005
From: golharam at umdnj.edu (Ryan Golhar)
Date: Fri Jan  7 16:31:23 2005
Subject: [Bioperl-l] (no subject)
Message-ID: <005e01c4f501$6f0be740$a6028a0a@GOLHARMOBILE1>

Hi all,

I have a bunch of protein ids, and I'm attempting to obtain the CDS that
corresponds to each id.  I can locate the sequence feature in the GenBank
file; however, when I call $feature->spliced_seq, I get an
error 'cannot get remote location for ... without a valid
Bio::DB::RandomAccessI database handle'.  It's at line 546 of
Bio/SeqFeatureI.pm.

I suspect the problem is because in the GenBank file, this particular
entry reads:

join(AF072550.1:61..103,AF072550.1:5359..5524)

I'm wondering if the accession number is throwing the parser off.  Does
anyone have any experience with this?

Ryan

From jason.stajich at duke.edu  Fri Jan  7 16:45:24 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Fri Jan  7 16:41:55 2005
Subject: [Bioperl-l] getting remote sequence features with spliced_seq
In-Reply-To: <005e01c4f501$6f0be740$a6028a0a@GOLHARMOBILE1>
References: <005e01c4f501$6f0be740$a6028a0a@GOLHARMOBILE1>
Message-ID: <7008278A-60F5-11D9-ACAB-000393C44276@duke.edu>

Pass in a Bio::DB::GenBank handle to achieve this magic.

use Bio::DB::GenBank;
my $dbh = Bio::DB::GenBank->new();
my $seq = $feature->spliced_seq($dbh);

 From Bio::SeqFeatureI

        spliced_seq

          Title   : spliced_seq

          Usage   : $seq = $feature->spliced_seq()
                    $seq = $feature_with_remote_locations->spliced_seq($db_for_seqs)

          Function: Provides a sequence of the feature which is the most
                    semantically "relevant" feature for this sequence. A default
                    implementation is provided which for simple cases returns just
                    the sequence, but for split cases, loops over the split location
                    to return the sequence. In the case of split locations with
                    remote locations, eg

                    join(AB000123:5567-5589,80..1144)

                    in the case when a database object is passed in, it will attempt
                    to retrieve the sequence from the database object, and "Do the right thing",
                    however if no database object is provided, it will generate the correct
                    number of N's (DNA) or X's (protein, though this is unlikely).

                    This function is deliberately "magical" attempting to second guess
                    what a user wants as "the" sequence for this feature.

                    Implementing classes are free to override this method with their
                    own magic if they have a better idea what the user wants.

          Args    : [optional] A L<Bio::DB::RandomAccessI> compliant object if
                               one needs to retrieve remote seqs.
                    [optional] boolean if the locations should not be sorted
                               by start location.
          Returns : A L<Bio::PrimarySeqI> object
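Why a database handle is needed here: every segment in Ryan's join() is prefixed with AF072550.1, so the exons live in a different GenBank entry and spliced_seq() has to fetch that entry to assemble the CDS. The hypothetical helper below (plain Perl, not a BioPerl function) lists which accessions a location string refers to, i.e. what the handle would have to supply. Its regular expression is a simplification that assumes accessions start with a letter:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): list the accessions that a
# GenBank/EMBL location string refers to remotely, i.e. segments written
# as ACCESSION[.VERSION]:start..end. Local segments such as 80..1144 are ignored.
sub remote_accessions {
    my ($location) = @_;
    my (%seen, @accs);
    # letters/digits/underscore, optional .version, then a colon
    while ($location =~ /([A-Za-z][A-Za-z0-9_]*(?:\.\d+)?):/g) {
        push @accs, $1 unless $seen{$1}++;    # keep first occurrence only
    }
    return @accs;
}

my @accs = remote_accessions('join(AF072550.1:61..103,AF072550.1:5359..5524)');
print "@accs\n";    # prints "AF072550.1"
```

If this list is non-empty, a Bio::DB::RandomAccessI handle such as the Bio::DB::GenBank object above is what lets the remote segments be filled in.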


On Jan 7, 2005, at 4:39 PM, Ryan Golhar wrote:

> Hi all,
>
> I have a bunch of protein ids, and I'm attempting to obtain the cds 
> that
> corresponds to the id.  I can locate the sequence feature in the 
> genbank
> file, however, when I make a call to  $feature->spliced_seq, I can an
> error 'cannot get remote location for ... without a valid
> Bio::DB::RandomAccessI database handle'.  Its line 546 of
> Bio::SeqFeatureI.pm.
>
> I suspect the problem is because in the genbank file, this particular
> entry reads:
>
> Join(AF072550.1:61..103,AF072550.1:5359..5524)
>
> I'm wondering if the accession number is throwing the parser off.  Does
> anyone have any experience with this?
>
> Ryan
>
>
>
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

From tex at biocompute.net  Fri Jan  7 03:23:47 2005
From: tex at biocompute.net (James Thompson)
Date: Fri Jan  7 18:31:41 2005
Subject: [Bioperl-l] libgd
In-Reply-To: <005d01c4f4f0$3349f5f0$a6028a0a@GOLHARMOBILE1>
Message-ID: <Pine.LNX.4.44.0501070317340.11722-100000@biosysadmin.com>

Ryan,

If you don't need glibc-utils, you can try uninstalling that package and then
installing your gd-devel packages. That would be a quick and easy fix, but it
may be too much to hope for. I can't imagine why glibc utilities would need
libgd, but that's just me.

Another option is to use the Bioperl RPM from http://biolinux.org/bioperl.html,
IIRC it contains a 2.0.x version of the GD library in it. You may want to try
this on a testing system before risking a production system, but for what it's
worth I've successfully used the RPM on RedHat 9.0 and Fedora Core 2. 

Best of luck solving your problem.

Cheers,

James Thompson


From hlapp at gmx.net  Sat Jan  8 02:37:36 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat Jan  8 02:34:12 2005
Subject: [Bioperl-l] RE: SeqIO fails on masked sequences
In-Reply-To: <!~!UENERkVCMDkAAQACAAAAAAAAAAAAAAAAABgAAAAAAAAA+EsXQZcrCEGeBpZF7/IE7sKAAAAQAAAAsv4vITWl4EipjvWFr5eI0QEAAAAA@ukonline.co.uk>
Message-ID: <2AA3B49A-6148-11D9-947F-000A959EB4C4@gmx.net>

You should not require by default that all sequences in one file be of 
the same type (alphabet). We never have required this, nor documented 
that it is a (not enforced) requirement, and so there may be people out 
there relying on this 'feature'.

	-hilmar

On Friday, January 7, 2005, at 03:39  AM, Nathan Haigh wrote:

> There appears to be an anomaly with Bio::SeqIO::fasta. If the SeqIO
> object's alphabet is set, next_seq() results in this being undef
> and then proceeds to guess the alphabet again, therefore things like the
> following do not work:
>
> my $seq_in  = Bio::SeqIO->new(-format=>$format, -fh => \*DATA);
>
> $seq_in->alphabet('protein');
>
> Should setting the SeqIO object's alphabet be honoured even if it is 
> set to the wrong type or the sequences are not of that
> alphabet?
>
>
>
> I have a bug fix, that allows you to set the alphabet through the 
> SeqIO object, but it doesn't do any sort of checking to see if all
> the seqs in the object are of the correct type. Essentially, the 
> alphabet is set in one of the following ways:
>
> 1) if the SeqIO object is set using e.g. $seq_in->alphabet('dna'); all 
> the seqs that belong to the $seq_in object obtain their
> alphabet from the SeqIO object, dna in this case, irrespective of 
> whether or not it is actually protein.
>
> 2) If alphabet has not been set in this way, the first sequence is 
> used to guess the alphabet of the SeqIO object, from which all
> the sequences obtain their alphabet.
>
>
>
> Possible limitations:
>
> 1)     all seqs in the SeqIO object can only be of the same type - no 
> testing done to see if this is not the case.
>
>
>
> Does this sound ok and reasonable?
>
> Nathan
>
>
>
> -----Original Message-----
> From: Brian Osborne [mailto:brian_osborne@cognia.com]
> Sent: 06 January 2005 12:25
> To: nathanhaigh@ukonline.co.uk
> Subject: RE: SeqIO fails on masked sequences
>
>
>
> Nathan,
>
>
>
> The idea is that a sequence with a high proportion of X is more likely 
> to be DNA than protein. The examples I had in mind are
> unfinished genomic sequence, and there are countless entries in 
> Genbank/EMBL like this. So, someone wrote in and said that their
> genomic sequence was being characterized as protein since the fraction 
> [gatc] was less than 85%, it was mostly X. By contrast, there
> are no protein sequences with X in them in these public databases, if 
> I'm not mistaken. So I maintain that in the world of public
> databases this is the way to go.
>
>
>
> Now if you venture into the world of sequence analysis it's going to 
> be a different story, since you'll likely mask protein with X,
> not N, obviously. May I ask, if this person knows his/her sequence is 
> protein then why doesn't s/he set its alphabet to "protein"?
> Or why don't they mask with A or Z or O or something?
>
>
>
> There'll be problems either way. What is one's reference? Public
> sequence or the less well-defined set of possible sequences?
>
>
>
> Brian O.
>
> -----Original Message-----
> From: Nathan Haigh [mailto:nathanhaigh@ukonline.co.uk]
> Sent: Wednesday, January 05, 2005 7:38 PM
> To: 'Brian Osborne'
> Subject: FW: SeqIO fails on masked sequences
>
> You committed a change to Bio::PrimarySeq where 'X' was added to the 
> class of characters that are stripped out of sequences in the
> _guess_alphabet subroutine. Do you know why sequences containing X 
> were causing a problem, and why X was added to the class of
> chars?
>
>
>
> It's causing a problem for someone who has a sequence that contains
> all masked chars (i.e. all X's), which should still be
> "guessable" as protein.
>
>
>
> Cheers
>
> Nathan
>
>
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------
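The precedence discussed in this thread (an explicitly set alphabet wins; otherwise guess) can be sketched in plain Perl. This is a hypothetical illustration, not BioPerl's `_guess_alphabet`: the 85% threshold comes from Brian's description, stripping X follows the change Nathan describes, and returning undef for a fully masked sequence is an assumption about how the undecidable case could be surfaced. Guessing each sequence separately keeps mixed-alphabet files working, as Hilmar asks:

```perl
use strict;
use warnings;

# Hypothetical sketch; NOT the actual Bio::PrimarySeq::_guess_alphabet code.
# Returns 'dna', 'rna', 'protein', or undef when nothing is left to judge.
sub guess_alphabet {
    my ($str, $explicit) = @_;
    return $explicit if defined $explicit;    # an explicitly set alphabet wins

    # strip gaps, '?', stop '*' and the X mask character
    (my $core = uc $str) =~ s/[-.?*X]//g;
    return undef unless length $core;         # e.g. an all-X (fully masked) sequence

    # count residues that look like nucleotides (N counted as an ambiguous base)
    my $nuc = ($core =~ tr/ACGTUN//);
    return 'protein' if $nuc / length($core) < 0.85;
    return ($core =~ /U/ && $core !~ /T/) ? 'rna' : 'dna';
}

# Guess per sequence, so a file mixing DNA and protein still works.
for my $s ('ATGCGTACGTTAGC', 'MKVLAAGIVGLLLA', 'XXXXXXXXXX') {
    my $alpha = guess_alphabet($s);
    printf "%-16s %s\n", $s, defined $alpha ? $alpha : '(cannot guess)';
}
```

Under these assumptions an all-X sequence is reported as undecidable rather than silently called DNA or protein, and passing the alphabet explicitly resolves it.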


From hlapp at gmx.net  Sat Jan  8 02:45:28 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat Jan  8 02:41:53 2005
Subject: [Bioperl-l] Entrez Gene and bioperl-db
In-Reply-To: <Pine.GSO.4.58.0501062026170.1209@moe.usg.utk.edu>
Message-ID: <43B77B65-6149-11D9-947F-000A959EB4C4@gmx.net>


On Thursday, January 6, 2005, at 06:33  PM, Stefan A Kirov wrote:

> Hilmar,
> Getting back to your post, I have some concern about automatic
> parsing of multiple files (if I got this right...). Say if one downloads
> the whole Entrez Gene stuff and all is OK, I don't see why this can't be
> done. But if something goes wrong (and occasionally it will), it will be
> really hard for the user to realize that he is missing parts of the data.

By going wrong, do you mean partial downloads resulting from interrupted
file transfer sessions? If so, then this is no different from parsing
other (e.g. GenBank) downloaded and therefore possibly truncated files.
If by wrong you mean certain files are absent, then yes, I mean that
their presence is optional, and certainly the parser could warn, unless
warnings are suppressed.

> [...]
> Another issue that comes to mind is that the stream approach is fine for
> people with the whole DB on their minds. But if you need a particular
> record, I guess you could index the files, but this is a totally different
> game.

right. You'd write a Bio::Index::<name> module for this.

	-hilmar
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------
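The Bio::Index::<name> route Hilmar mentions rests on a simple idea: record the byte offset where each record starts, then seek() to it on demand. A plain-Perl sketch of that idea follows; it is not the Bio::Index API, and the "GeneID: n" layout is a made-up toy format:

```perl
use strict;
use warnings;

# Minimal sketch of offset indexing (the general idea, not the Bio::Index API).
# Assumption: each record begins with a line matching $start_re, whose first
# capture group is the record id.
sub build_index {
    my ($fh, $start_re) = @_;
    my %offset;
    my $pos = tell $fh;                          # offset of the line about to be read
    while (my $line = <$fh>) {
        $offset{$1} = $pos if $line =~ $start_re;
        $pos = tell $fh;
    }
    return \%offset;
}

sub fetch_record {
    my ($fh, $index, $id, $start_re) = @_;
    my $off = $index->{$id};
    return unless defined $off;
    seek $fh, $off, 0 or die "seek failed: $!";  # also clears EOF
    my $record = <$fh>;                          # the record's first line
    while (my $line = <$fh>) {
        last if $line =~ $start_re;              # next record starts here
        $record .= $line;
    }
    return $record;
}

my $data = "GeneID: 1\nsymbol A1BG\nGeneID: 2\nsymbol A2M\n";
open my $fh, '<', \$data or die $!;              # in-memory file for the example
my $re  = qr/^GeneID: (\d+)/;
my $idx = build_index($fh, $re);
print fetch_record($fh, $idx, 2, $re);
```

Only the offsets are kept in memory, so a single record can be pulled from a large file without re-reading it from the top.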


From hlapp at gmx.net  Sat Jan  8 13:09:56 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat Jan  8 13:06:58 2005
Subject: [Bioperl-l] Entrez Gene and bioperl-db
In-Reply-To: <1105080663.3142.16.camel@localhost.localdomain>
Message-ID: <80BA8B98-61A0-11D9-947F-000A959EB4C4@gmx.net>


On Thursday, January 6, 2005, at 10:51  PM, Peter Robinson wrote:

> On the other hand, parsing multiple Entrez Gene files at once
> in order to synthesize various forms of information about an Entrez Gene
> id seemed to depart from the style of the rest of Bio::SeqIO code.

I don't think so at all. It only appears so because most other formats 
happen to come in a single file. The OntologyIO GO parser e.g. takes 
any number of files.

	-hilmar
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


From claratsm at hkusua.hku.hk  Sun Jan  9 11:02:36 2005
From: claratsm at hkusua.hku.hk (claratsm@hkusua.hku.hk)
Date: Sun Jan  9 14:50:34 2005
Subject: [Bioperl-l] Problem about bioperl SeqIO
Message-ID: <1105286556.41e1559cb72f3@imp.webmail.hku.hk>

Hi,

I am a new user of bioperl and I have encountered some problems when using it
for programming. As I am dealing with a very large file (chromosome-sized
contig data), I use
    my $seqio_mfa = Bio::SeqIO->new('-file' => $seq_file, 
                                    '-format' => 'largefasta');
However, whenever I get a subseq, it generates one temp file. The number of
temp files generated becomes so large (even though most are empty) that the
program stops with an exception, something like "cannot create temp file any more".
Can I delete the temp files while my program is running? How can I get the
temp file name of each new sequence generated? If I can get the temp file
name, is it safe to delete the file using the rmdir function?

Can anybody help me!!
Thank you


From Marc.Logghe at devgen.com  Sun Jan  9 15:20:18 2005
From: Marc.Logghe at devgen.com (Marc Logghe)
Date: Sun Jan  9 15:22:17 2005
Subject: [Bioperl-l] Problem about bioperl SeqIO
Message-ID: <BEE28BF86078B6429D6C780635718E21B3A091@morelia.be.devgen.com>

Hi,
Not sure, but the temporary files should be deleted as soon as the objects are destroyed.
Maybe you keep a lot of Bio::Seq::LargePrimarySeq references in scope, so that the temp files for all of them are kept open?
You *can* get to the filename, but you are not meant to, because it is a private method (_filename).

HTH,
Marc
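Marc's point about scope can be illustrated with File::Temp, whose object interface removes the file when the object is destroyed. This is a generic sketch of that behaviour, not the code Bio::Seq::LargePrimarySeq uses internally. Note also that Perl's rmdir only removes empty directories; a single file is removed with unlink:

```perl
use strict;
use warnings;
use File::Temp ();

# Illustration only: a File::Temp object deletes its file when the last
# reference to it goes away (UNLINK defaults to true for the OO interface).
my $path;
{
    my $tmp = File::Temp->new();
    print {$tmp} "ACGT" x 10;
    $path = $tmp->filename;
    print "inside scope, exists: ", (-e $path ? 1 : 0), "\n";
}   # $tmp goes out of scope here, so the file is removed
print "after scope, exists: ", (-e $path ? 1 : 0), "\n";

# Keeping every object in a long-lived array keeps every file alive,
# which is how temp files can pile up inside a long loop.
my @kept = map { File::Temp->new() } 1 .. 3;
print "still present: ", scalar(grep { -e $_->filename } @kept), "\n";
```

Letting each sequence object go out of scope (or dropping it from any array) once its subseq has been used is what allows its temp file to be cleaned up.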


-----Original Message-----
From: bioperl-l-bounces@portal.open-bio.org on behalf of claratsm@hkusua.hku.hk
Sent: Sun 9-1-2005 17:02
To: bioperl-l@portal.open-bio.org
Subject: [Bioperl-l] Problem about bioperl SeqIO
 
Hi,

I am a new user of bioperl and I have encountered some problems when using it
for programming. As i try to deal with a very large file, as large as a large
chromosome contigs data, so I use 
    my $seqio_mfa = Bio::SeqIO->new('-file' => $seq_file, 
                                    '-format' => 'largefasta');
However, whenever I get a subseq, it generates one temp file. And the number of
temp files generated is too large (even though most are empty) such that the
program stop with exception...something like cannot create temp file any more.
Am I able to delete the temp file during the running of my program? How can i
get the temp file name of each new sequence generated? If i can get the temp
file name, is it safe to delete the file using rmdir function?

Can anybody help me!!
Thank you




From Peter.Robinson at t-online.de  Sun Jan  9 17:01:02 2005
From: Peter.Robinson at t-online.de (Peter Robinson)
Date: Sun Jan  9 16:57:08 2005
Subject: [Bioperl-l] Entrez Gene and bioperl-db
In-Reply-To: <80BA8B98-61A0-11D9-947F-000A959EB4C4@gmx.net>
References: <80BA8B98-61A0-11D9-947F-000A959EB4C4@gmx.net>
Message-ID: <1105308062.3757.14.camel@localhost.localdomain>

I meant that there is information about a single gene spread across
various Entrez Gene files, so if one were to parse them all at once, one
would have to keep a lot of info in memory, especially since the order
of the entries is not necessarily the same across files; for instance,
gene2unigene is ordered according to the UniGene identifiers, and
gene2accession is not; if one wanted to add the unigene info to all
entries in one fell swoop, this would seem to require keeping entries
either in memory or in some indexed file. 
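The memory cost Peter describes can be kept down by holding only the smaller table in a hash and streaming the larger one past it. A hypothetical plain-Perl sketch; the column layouts (gene2unigene as GeneID then cluster, gene2accession with GeneID in the second column) are assumptions to check against NCBI's README, and the sample rows are invented:

```perl
use strict;
use warnings;

# Hypothetical sketch. Assumed layouts:
#   gene2unigene:   GeneID <tab> UniGene_cluster
#   gene2accession: tax_id <tab> GeneID <tab> ...
sub load_unigene {
    my ($fh) = @_;
    my %unigene;
    while (my $line = <$fh>) {
        next if $line =~ /^#/;                   # skip header lines
        chomp $line;
        my ($gene_id, $cluster) = split /\t/, $line;
        next unless defined $cluster;
        push @{ $unigene{$gene_id} }, $cluster;  # a gene may map to several clusters
    }
    return \%unigene;
}

sub annotate_accessions {
    my ($fh, $unigene) = @_;
    my @out;
    while (my $line = <$fh>) {                   # streamed, never held whole
        next if $line =~ /^#/;
        chomp $line;
        my @f = split /\t/, $line;
        next unless @f >= 2;
        my $clusters = $unigene->{ $f[1] } || [];
        push @out, [ $f[1], join(',', @$clusters) ];
    }
    return \@out;
}

# In-memory example data (made up)
my $g2u = "#GeneID\tUniGene_cluster\n1\tHs.11111\n2\tHs.22222\n";
my $g2a = "#tax_id\tGeneID\tstatus\n9606\t2\tREVIEWED\n9606\t1\tREVIEWED\n9606\t3\t-\n";
open my $ufh, '<', \$g2u or die $!;
open my $afh, '<', \$g2a or die $!;
my $map = load_unigene($ufh);
for my $row (@{ annotate_accessions($afh, $map) }) {
    print join("\t", @$row), "\n";
}
```

Because the lookup is keyed on GeneID, the differing sort orders of the two files do not matter; only the smaller table has to fit in memory.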

In contrast, the ontology files you mention are more independent of one
another, so there is no particular difficulty in combining flat files
for the three subontologies.

I am starting to think that it might make the most sense to concentrate
on the ASN.1 files. I think it should be reasonably simple to do this
with a kind of recursive descent strategy, either using some CPAN
modules or, perhaps better, self-rolled. At the moment I have not seen any
modules that appear to be great candidates for lexing the ASN.1 text
(ideas anyone?).

-peter

On Sat, 2005-01-08 at 19:09, Hilmar Lapp wrote:
> On Thursday, January 6, 2005, at 10:51  PM, Peter Robinson wrote:
> 
> > On the other hand, parsing multiple Entrez Gene files at once
> > in order to synthesize various forms of infomration about an Entrez 
> > Gene
> > id seemed to depart from the style of the rest of Bio::SeqIO code.
> 
> I don't think so at all. It only appears so because most other formats 
> happen to come in a single file. The OntologyIO GO parser e.g. takes 
> any number of files.
> 
> 	-hilmar
-- 
Peter N. Robinson
peter.robinson@t-online.de
peter.robinson@charite.de
http://www.charite.de/ch/medgen/robinson/

From wes.barris at csiro.au  Sun Jan  9 18:42:33 2005
From: wes.barris at csiro.au (Wes Barris)
Date: Sun Jan  9 18:39:03 2005
Subject: [Bioperl-l] RE: SeqIO fails on masked sequences
In-Reply-To: <2AA3B49A-6148-11D9-947F-000A959EB4C4@gmx.net>
References: <2AA3B49A-6148-11D9-947F-000A959EB4C4@gmx.net>
Message-ID: <41E1C169.4010302@csiro.au>

Hilmar Lapp wrote:
> You should not require by default that all sequences in one file be of 
> the same type (alphabet). We never have required this, nor documented 
> that it is a (not enforced) requirement, and so there may be people out 
> there relying on this 'feature'.

Mixing both DNA and protein sequences in one file and then attempting
to process it seems like kind of a bizarre thing to want to do.  If
the alphabet is explicitly specified, isn't there a way to make that
take precedence?

> 
>     -hilmar
> 
> On Friday, January 7, 2005, at 03:39  AM, Nathan Haigh wrote:
> 
>> There appears to be an anomaly with Bio::Seq::fasta. If the SeqIO 
>> object's alphabet is set, next_seq() results in this being undef
>> and then proceeds to guess the alphabet again, therefore this like the 
>> following do not work:
>>
>> my $seq_in  = Bio::SeqIO->new(-format=>$format, -fh => \*DATA);
>>
>> $seq_in->alphabet('protein');
>>
>> Should setting the SeqIO object's alphabet be honoured even if it is 
>> set to the wrong type or the sequences are not of that
>> alphabet?
>>
>>
>>
>> I have a bug fix, that allows you to set the alphabet through the 
>> SeqIO object, but it doesn't do any sort of checking to see if all
>> the seqs in the object are of the correct type. Essentially, the 
>> alphabet is set in one of the following ways:
>>
>> 1) if the SeqIO object is set using e.g. $seq_in->alphabet('dna'); all 
>> the seqs that belong to the $seq_in object obtain their
>> alphabet from the SeqIO object, dna in this case, irrespective of 
>> whether or not it is actually protein.
>>
>> 2) If alphabet has not been set in this way, the first sequence is 
>> used to guess the alphabet of the SeqIO object, from which all
>> the sequences obtain their alphabet.
>>
>>
>>
>> Possible limitations:
>>
>> 1)     all seqs in the SeqIO object can only be of the same type - no 
>> testing done to see if this is not the case.
>>
>>
>>
>> Does this sound ok and reasonable?
>>
>> Nathan
>>
>>
>>
>> -----Original Message-----
>> From: Brian Osborne [mailto:brian_osborne@cognia.com]
>> Sent: 06 January 2005 12:25
>> To: nathanhaigh@ukonline.co.uk
>> Subject: RE: SeqIO fails on masked sequences
>>
>>
>>
>> Nathan,
>>
>>
>>
>> The idea is that a sequence with a high proportion of X is more likely 
>> to be DNA than protein. The examples I had in mind are
>> unfinished genomic sequence, and there are countless entries in 
>> Genbank/EMBL like this. So, someone wrote in and said that their
>> genomic sequence was being characterized as protein since the fraction 
>> [gatc] was less than 85%, it was mostly X. By contrast, there
>> are no protein sequences with X in them in these public databases, if 
>> I'm not mistaken. So I maintain that in the world of public
>> databases this is the way to go.
>>
>>
>>
>> Now if you venture into the world of sequence analysis it's going to 
>> be a different story, since you'll likely mask protein with X,
>> not N, obviously. May I ask, if this person knows his/her sequence is 
>> protein then why doesn't s/he set its alphabet to "protein"?
>> Or why don't they mask with A or Z or O or something?
>>
>>
>>
>> There'll be problems either way. What is one's reference? Public
>> sequence or the less well-defined set of possible sequences?
>>
>>
>>
>> Brian O.
>>
>> -----Original Message-----
>> From: Nathan Haigh [mailto:nathanhaigh@ukonline.co.uk]
>> Sent: Wednesday, January 05, 2005 7:38 PM
>> To: 'Brian Osborne'
>> Subject: FW: SeqIO fails on masked sequences
>>
>> You committed a change to Bio::PrimarySeq where 'X' was added to the 
>> class of characters that are stripped out of sequences in the
>> _guess_alphabet subroutine. Do you know why sequences containing X 
>> were causing a problem, and why X was added to the class of
>> chars?
>>
>>
>>
>> It's causing a problem for someone who has a sequence that contains
>> all masked chars (i.e. all X's), which should still be
>> "guessable" as protein.
>>
>>
>>
>> Cheers
>>
>> Nathan
>>
>> ---
>> avast! Antivirus: Outbound message clean.
>> Virus Database (VPS): 0501-0, 04/01/2005
>> Tested on: 06/01/2005 00:36:20
>> avast! is copyright (c) 2000-2003 ALWIL Software.
>> http://www.avast.com
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l@portal.open-bio.org
>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
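A simplified sketch of the guessing rule Brian describes may help here: strip X (and gap characters) before counting, then call the sequence DNA if more than 85% of what remains is A/C/G/T/N. This is plain Perl, not the actual Bio::PrimarySeq::_guess_alphabet code; the exact set of stripped characters, the RNA branch and returning undef for a fully masked sequence are assumptions made for illustration. It shows both sides of the thread: mostly-X genomic sequence comes out as DNA, while an all-X sequence leaves nothing to count.

```perl
use strict;
use warnings;

# Simplified sketch of the alphabet-guessing heuristic discussed in this
# thread. NOT the real Bio::PrimarySeq code: the stripped character class,
# the RNA branch and the undef return for an all-masked sequence are
# assumptions for illustration.
sub guess_alphabet_sketch {
    my ($str) = @_;
    # Strip gap, unknown and mask characters, including X/x
    (my $clean = $str) =~ s/[-.?X]//gi;
    my $total = length $clean;
    # Nothing left to count (e.g. a fully X-masked sequence): give up
    return undef if $total == 0;
    my $acgtn = ($clean =~ tr/ACGTNacgtn//);    # tr/// returns a count
    my $u     = ($clean =~ tr/Uu//);
    return 'dna' if $acgtn / $total > 0.85;
    return 'rna' if ($acgtn + $u) / $total > 0.85;
    return 'protein';
}

for my $s ('ACGTXXXXXXXXXXXX', 'MKTAYIAKQR', 'XXXXXXXX') {
    my $guess = guess_alphabet_sketch($s);
    printf "%-18s %s\n", $s, defined $guess ? $guess : '(cannot guess)';
}
```

The undef case is where an explicitly set alphabet has to take over, which is what the rest of the thread argues about.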


-- 
Wes Barris
E-Mail: Wes.Barris@csiro.au
From nathanhaigh at ukonline.co.uk  Sun Jan  9 19:35:08 2005
From: nathanhaigh at ukonline.co.uk (Nathan Haigh)
Date: Sun Jan  9 19:31:55 2005
Subject: [Bioperl-l] RE: SeqIO fails on masked sequences
In-Reply-To: <41E1C169.4010302@csiro.au>
Message-ID: <!~!UENERkVCMDkAAQACAAAAAAAAAAAAAAAAABgAAAAAAAAA+EsXQZcrCEGeBpZF7/IE7sKAAAAQAAAAeVeNDtxHsUODBP3z/PTkvQEAAAAA@ukonline.co.uk>

> -----Original Message-----
> From: Wes Barris [mailto:wes.barris@csiro.au]
> Sent: 09 January 2005 23:43
> To: Hilmar Lapp
> Cc: nathanhaigh@ukonline.co.uk; 'Bioperl list'; 'Brian Osborne'
> Subject: Re: [Bioperl-l] RE: SeqIO fails on masked sequences
> 
> Hilmar Lapp wrote:
> > You should not require by default that all sequences in one file be of
> > the same type (alphabet). We never have required this, nor documented
> > that it is a (not enforced) requirement, and so there may be people out
> > there relying on this 'feature'.
> 
> Mixing both DNA and protein sequences in one file and then attempting
> to process it seems like kind of a bizarre thing to want to do.  If
> the alphabet is explicitly specified, isn't there a way to make that
> take precedence?

Why are you then able to set the alphabet of a SeqIO object if whenever you call next_seq() it tries to guess the alphabet of the
sequence anyway? It seems more logical to me, that the user can specify the alphabet without worrying about bioperl guessing it, and
getting it wrong, or not setting it at all.

> 
> >
> >     -hilmar
> >
> > On Friday, January 7, 2005, at 03:39  AM, Nathan Haigh wrote:
> >
> >> There appears to be an anomaly with Bio::SeqIO::fasta. If the SeqIO
> >> object's alphabet is set, next_seq() resets it to undef
> >> and then proceeds to guess the alphabet again, so code like the
> >> following does not work:
> >>
> >> my $seq_in  = Bio::SeqIO->new(-format=>$format, -fh => \*DATA);
> >>
> >> $seq_in->alphabet('protein');
> >>
> >> Should setting the SeqIO object's alphabet be honoured even if it is
> >> set to the wrong type or the sequences are not of that
> >> alphabet?
> >>
> >>
> >>
> >> I have a bug fix, that allows you to set the alphabet through the
> >> SeqIO object, but it doesn't do any sort of checking to see if all
> >> the seqs in the object are of the correct type. Essentially, the
> >> alphabet is set in one of the following ways:
> >>
> >> 1) if the SeqIO object is set using e.g. $seq_in->alphabet('dna'); all
> >> the seqs that belong to the $seq_in object obtain their
> >> alphabet from the SeqIO object, dna in this case, irrespective of
> >> whether or not it is actually protein.
> >>
> >> 2) If alphabet has not been set in this way, the first sequence is
> >> used to guess the alphabet of the SeqIO object, from which all
> >> the sequences obtain their alphabet.
> >>
> >>
> >>
> >> Possible limitations:
> >>
> >> 1)     all seqs in the SeqIO object can only be of the same type - no
> >> testing done to see if this is not the case.
> >>
> >>
> >>
> >> Does this sound ok and reasonable?
> >>
> >> Nathan
> >>
> >>
> >>
> >> -----Original Message-----
> >> From: Brian Osborne [mailto:brian_osborne@cognia.com]
> >> Sent: 06 January 2005 12:25
> >> To: nathanhaigh@ukonline.co.uk
> >> Subject: RE: SeqIO fails on masked sequences
> >>
> >>
> >>
> >> Nathan,
> >>
> >>
> >>
> >> The idea is that a sequence with a high proportion of X is more likely
> >> to be DNA than protein. The examples I had in mind are
> >> unfinished genomic sequence, and there are countless entries in
> >> Genbank/EMBL like this. So, someone wrote in and said that their
> >> genomic sequence was being characterized as protein since the fraction
> >> [gatc] was less than 85%, it was mostly X. By contrast, there
> >> are no protein sequences with X in them in these public databases, if
> >> I'm not mistaken. So I maintain that in the world of public
> >> databases this is the way to go.
> >>
> >>
> >>
> >> Now if you venture into the world of sequence analysis it's going to
> >> be a different story, since you'll likely mask protein with X,
> >> not N, obviously. May I ask, if this person knows his/her sequence is
> >> protein then why doesn't s/he set its alphabet to "protein"?
> >> Or why don't they mask with A or Z or O or something?
> >>
> >>
> >>
> >> There'll be problems either way. What is one's reference? Public
> >> sequence or the less well-defined set of possible sequences?
> >>
> >>
> >>
> >> Brian O.
> >>
> >> -----Original Message-----
> >> From: Nathan Haigh [mailto:nathanhaigh@ukonline.co.uk]
> >> Sent: Wednesday, January 05, 2005 7:38 PM
> >> To: 'Brian Osborne'
> >> Subject: FW: SeqIO fails on masked sequences
> >>
> >> You committed a change to Bio::PrimarySeq where 'X' was added to the
> >> class of characters that are stripped out of sequences in the
> >> _guess_alphabet subroutine. Do you know why sequences containing X
> >> were causing a problem, and why X was added to the class of
> >> chars?
> >>
> >>
> >>
> >> It's causing a problem for someone who has a sequence that contains
> >> all masked chars (i.e. all X's), which should still be
> >> "guessable" as protein.
> >>
> >>
> >>
> >> Cheers
> >>
> >> Nathan
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> 
> 
> --
> Wes Barris
> E-Mail: Wes.Barris@csiro.au
> 
> 



From wes.barris at csiro.au  Sun Jan  9 20:05:13 2005
From: wes.barris at csiro.au (Wes Barris)
Date: Sun Jan  9 20:03:10 2005
Subject: [Bioperl-l] RE: SeqIO fails on masked sequences
In-Reply-To: <!~!UENERkVCMDkAAQACAAAAAAAAAAAAAAAAABgAAAAAAAAA+EsXQZcrCEGeBpZF7/IE7sKAAAAQAAAAeVeNDtxHsUODBP3z/PTkvQEAAAAA@ukonline.co.uk>
References: <!~!UENERkVCMDkAAQACAAAAAAAAAAAAAAAAABgAAAAAAAAA+EsXQZcrCEGeBpZF7/IE7sKAAAAQAAAAeVeNDtxHsUODBP3z/PTkvQEAAAAA@ukonline.co.uk>
Message-ID: <41E1D4C9.1020806@csiro.au>

Nathan Haigh wrote:

>>-----Original Message-----
>>From: Wes Barris [mailto:wes.barris@csiro.au]
>>Sent: 09 January 2005 23:43
>>To: Hilmar Lapp
>>Cc: nathanhaigh@ukonline.co.uk; 'Bioperl list'; 'Brian Osborne'
>>Subject: Re: [Bioperl-l] RE: SeqIO fails on masked sequences
>>
>>Hilmar Lapp wrote:
>>
>>>You should not require by default that all sequences in one file be of
>>>the same type (alphabet). We never have required this, nor documented
>>>that it is a (not enforced) requirement, and so there may be people out
>>>there relying on this 'feature'.
>>
>>Mixing both DNA and protein sequences in one file and then attempting
>>to process it seems like kind of a bizarre thing to want to do.  If
>>the alphabet is explicitly specified, isn't there a way to make that
>>take precedence?
> 
> 
> Why are you then able to set the alphabet of a SeqIO object if whenever you call next_seq() it tries to guess the alphabet of the
> sequence anyway? It seems more logical to me, that the user can specify the alphabet without worrying about bioperl guessing it, and
> getting it wrong, or not setting it at all.

I am guessing that you meant to direct this question to Hilmar because
I agree with you.  If one specifies the alphabet, bioperl should not
subsequently try to guess it.

> 
> 
>>>    -hilmar
>>>
>>>On Friday, January 7, 2005, at 03:39  AM, Nathan Haigh wrote:
>>>
>>>
>>>>There appears to be an anomaly with Bio::SeqIO::fasta. If the SeqIO
>>>>object's alphabet is set, next_seq() resets it to undef
>>>>and then proceeds to guess the alphabet again, so code like the
>>>>following does not work:
>>>>
>>>>my $seq_in  = Bio::SeqIO->new(-format=>$format, -fh => \*DATA);
>>>>
>>>>$seq_in->alphabet('protein');
>>>>
>>>>Should setting the SeqIO object's alphabet be honoured even if it is
>>>>set to the wrong type or the sequences are not of that
>>>>alphabet?
>>>>
>>>>
>>>>
>>>>I have a bug fix, that allows you to set the alphabet through the
>>>>SeqIO object, but it doesn't do any sort of checking to see if all
>>>>the seqs in the object are of the correct type. Essentially, the
>>>>alphabet is set in one of the following ways:
>>>>
>>>>1) if the SeqIO object is set using e.g. $seq_in->alphabet('dna'); all
>>>>the seqs that belong to the $seq_in object obtain their
>>>>alphabet from the SeqIO object, dna in this case, irrespective of
>>>>whether or not it is actually protein.
>>>>
>>>>2) If alphabet has not been set in this way, the first sequence is
>>>>used to guess the alphabet of the SeqIO object, from which all
>>>>the sequences obtain their alphabet.
>>>>
>>>>
>>>>
>>>>Possible limitations:
>>>>
>>>>1)     all seqs in the SeqIO object can only be of the same type - no
>>>>testing done to see if this is not the case.
>>>>
>>>>
>>>>
>>>>Does this sound ok and reasonable?
>>>>
>>>>Nathan
>>>>
>>>>
>>>>
>>>>-----Original Message-----
>>>>From: Brian Osborne [mailto:brian_osborne@cognia.com]
>>>>Sent: 06 January 2005 12:25
>>>>To: nathanhaigh@ukonline.co.uk
>>>>Subject: RE: SeqIO fails on masked sequences
>>>>
>>>>
>>>>
>>>>Nathan,
>>>>
>>>>
>>>>
>>>>The idea is that a sequence with a high proportion of X is more likely
>>>>to be DNA than protein. The examples I had in mind are
>>>>unfinished genomic sequence, and there are countless entries in
>>>>Genbank/EMBL like this. So, someone wrote in and said that their
>>>>genomic sequence was being characterized as protein since the fraction
>>>>[gatc] was less than 85%, it was mostly X. By contrast, there
>>>>are no protein sequences with X in them in these public databases, if
>>>>I'm not mistaken. So I maintain that in the world of public
>>>>databases this is the way to go.
>>>>
>>>>
>>>>
>>>>Now if you venture into the world of sequence analysis it's going to
>>>>be a different story, since you'll likely mask protein with X,
>>>>not N, obviously. May I ask, if this person knows his/her sequence is
>>>>protein then why doesn't s/he set its alphabet to "protein"?
>>>>Or why don't they mask with A or Z or O or something?
>>>>
>>>>
>>>>
>>>>There'll be problems either way. What is one's reference? Public
>>>>sequence or the less well-defined set of possible sequences?
>>>>
>>>>
>>>>
>>>>Brian O.
>>>>
>>>>-----Original Message-----
>>>>From: Nathan Haigh [mailto:nathanhaigh@ukonline.co.uk]
>>>>Sent: Wednesday, January 05, 2005 7:38 PM
>>>>To: 'Brian Osborne'
>>>>Subject: FW: SeqIO fails on masked sequences
>>>>
>>>>You committed a change to Bio::PrimarySeq where 'X' was added to the
>>>>class of characters that are stripped out of sequences in the
>>>>_guess_alphabet subroutine. Do you know why sequences containing X
>>>>were causing a problem, and why X was added to the class of
>>>>chars?
>>>>
>>>>
>>>>
>>>>It's causing a problem for someone who has a sequence that contains
>>>>all masked chars (i.e. all X's), which should still be
>>>>"guessable" as protein.
>>>>
>>>>
>>>>
>>>>Cheers
>>>>
>>>>Nathan
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>
>>
>>--
>>Wes Barris
>>E-Mail: Wes.Barris@csiro.au
>>
>>
> 
> 
> 
> 
> 


-- 
Wes Barris
E-Mail: Wes.Barris@csiro.au
From hlapp at gmx.net  Mon Jan 10 03:13:35 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon Jan 10 03:11:55 2005
Subject: [Bioperl-l] RE: SeqIO fails on masked sequences
In-Reply-To: <41E1D4C9.1020806@csiro.au>
Message-ID: <866FB518-62DF-11D9-911B-000A959EB4C4@gmx.net>


On Sunday, January 9, 2005, at 05:05  PM, Wes Barris wrote:

>>> Hilmar Lapp wrote:
>>>
>>>> You should not require by default that all sequences in one file be 
>>>> of
>>>> the same type (alphabet). We never have required this, nor 
>>>> documented
>>>> that it is a (not enforced) requirement, and so there may be people 
>>>> out
>>>> there relying on this 'feature'.
>>>
>>> Mixing both DNA and protein sequences in one file and then attempting
>>> to process it seems like kind of a bizarre thing to want to do.  If
>>> the alphabet is explicitly specified, isn't there a way to make that
>>> take precedence?
>> Why are you then able to set the alphabet of a SeqIO object if 
>> whenever you call next_seq() it tries to guess the alphabet of the
>> sequence anyway? It seems more logical to me, that the user can 
>> specify the alphabet without worrying about bioperl guessing it, and
>> getting it wrong, or not setting it at all.
>
> I am guessing that you meant to direct this question to Hilmar because
> I agree with you.  If one specifies the alphabet, bioperl should not
> subsequently try to guess it.

Right, that's what I agree with too. If an alphabet set for the stream 
gets reset to undef after every sequence then I'd call that a bug.

My point was, if the user doesn't specify the alphabet, then don't make 
assumptions that you don't absolutely have to make. You had suggested 
to guess the alphabet from the first sequence in this case and then 
assume every subsequent sequence in that stream will have that same 
alphabet. That's what I think is not a good idea and not necessary 
either. If the user doesn't preset the alphabet, just keep on guessing 
for every new sequence.
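The policy agreed here (a preset stream alphabet always wins and is never reset; otherwise each record is guessed on its own, never inferred from the first one) can be sketched as below. SketchStream, its next_seq method and the guesser callback are hypothetical names invented for this illustration, not the Bio::SeqIO API, and the stand-in guesser is deliberately crude.

```perl
use strict;
use warnings;

# Hypothetical sketch of the policy agreed in this thread; the package
# and method names are invented for illustration, not the Bio::SeqIO API.
package SketchStream;

sub new {
    my ($class, %args) = @_;
    # records:  array ref of raw sequence strings
    # alphabet: optional user-preset alphabet for the whole stream
    # guesser:  code ref used only when no alphabet was preset
    return bless {
        records  => $args{records} || [],
        alphabet => $args{alphabet},
        guesser  => $args{guesser},
    }, $class;
}

# Returns [sequence, alphabet] for the next record, or undef when exhausted.
sub next_seq {
    my ($self) = @_;
    my $str = shift @{ $self->{records} };
    return undef unless defined $str;
    # A preset alphabet always wins and is never reset between records;
    # otherwise guess afresh for this record only.
    my $alphabet = defined $self->{alphabet}
        ? $self->{alphabet}
        : $self->{guesser}->($str);
    return [ $str, $alphabet ];
}

package main;

# Deliberately crude stand-in guesser (assumption): DNA if only ACGTN.
my $crude = sub { $_[0] =~ /^[ACGTN]+$/i ? 'dna' : 'protein' };

my @records = ('ACGTACGT', 'MKTAYIAKQR', 'XXXXXXXX');
for my $preset (undef, 'protein') {
    my $in = SketchStream->new(records  => [@records],
                               alphabet => $preset,
                               guesser  => $crude);
    while (my $rec = $in->next_seq) {
        printf "%-8s %-12s %s\n", defined $preset ? $preset : 'none',
               $rec->[0], $rec->[1];
    }
}
```

Guessing per record keeps mixed files, such as a FASTA dump containing both nucleotide and protein entries, working without an explicit alphabet, while a user who knows better can still override the guess.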

Mixing alphabets is indeed bizarre but people who do bizarre things are 
everywhere.

	-hilmar

-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


From nathanhaigh at ukonline.co.uk  Mon Jan 10 03:50:02 2005
From: nathanhaigh at ukonline.co.uk (Nathan Haigh)
Date: Mon Jan 10 03:46:40 2005
Subject: [Bioperl-l] RE: SeqIO fails on masked sequences
In-Reply-To: <866FB518-62DF-11D9-911B-000A959EB4C4@gmx.net>
Message-ID: <!~!UENERkVCMDkAAQACAAAAAAAAAAAAAAAAABgAAAAAAAAA+EsXQZcrCEGeBpZF7/IE7sKAAAAQAAAAlFUv3RREqkuAwfgoVlKckwEAAAAA@ukonline.co.uk>

> -----Original Message-----
> From: Hilmar Lapp [mailto:hlapp@gmx.net]
> Sent: 10 January 2005 08:14
> To: Wes Barris
> Cc: nathanhaigh@ukonline.co.uk; 'Bioperl list'; 'Brian Osborne'
> Subject: Re: [Bioperl-l] RE: SeqIO fails on masked sequences
> 
> 
> On Sunday, January 9, 2005, at 05:05  PM, Wes Barris wrote:
> 
> >>> Hilmar Lapp wrote:
> >>>
> >>>> You should not require by default that all sequences in one file be
> >>>> of
> >>>> the same type (alphabet). We never have required this, nor
> >>>> documented
> >>>> that it is a (not enforced) requirement, and so there may be people
> >>>> out
> >>>> there relying on this 'feature'.
> >>>
> >>> Mixing both DNA and protein sequences in one file and then attempting
> >>> to process it seems like kind of a bizarre thing to want to do.  If
> >>> the alphabet is explicitly specified, isn't there a way to make that
> >>> take precedence?
> >> Why are you then able to set the alphabet of a SeqIO object if
> >> whenever you call next_seq() it tries to guess the alphabet of the
> >> sequence anyway? It seems more logical to me, that the user can
> >> specify the alphabet without worrying about bioperl guessing it, and
> >> getting it wrong, or not setting it at all.
> >
> > I am guessing that you meant to direct this question to Hilmar because
> > I agree with you.  If one specifies the alphabet, bioperl should not
> > subsequently try to guess it.
> 
> Right, that's what I agree with too. If an alphabet set for the stream
> gets reset to undef after every sequence then I'd call that a bug.
> 

agreed :o)

> My point was, if the user doesn't specify the alphabet, then don't make
> assumptions that you don't absolutely have to make. You had suggested
> to guess the alphabet from the first sequence in this case and then
> assume every subsequent sequence in that stream will have that same
> alphabet. That's what I think is not a good idea and not necessary
> either. If the user doesn't preset the alphabet, just keep on guessing
> for every new sequence.
> 

Hmm, yes I think the former was what I had suggested, but soon realised this wasn't a good thing and forgot to correct myself later.
I'll get this fix ready today hopefully.

Nath

> Mixing alphabets is indeed bizarre but people who do bizarre things are
> everywhere.
> 
> 	-hilmar
> 
> --
> -------------------------------------------------------------
> Hilmar Lapp                            email: lapp at gnf.org
> GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
> -------------------------------------------------------------
> 
> 
> 
> 



From taerwin at tpg.com.au  Mon Jan 10 18:47:47 2005
From: taerwin at tpg.com.au (Tim Erwin)
Date: Mon Jan 10 18:46:46 2005
Subject: [Bioperl-l] Storing Blast object in a local database
Message-ID: <1105400867.4274.4.camel@bacp4>

Hi all,

Is it possible to store a blast object
(Bio::Search::Result::BlastResult) in a mysql database?

Any pointers would be appreciated.

Regards,

Tim

From barry.moore at genetics.utah.edu  Mon Jan 10 20:10:28 2005
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Mon Jan 10 20:06:57 2005
Subject: [Bioperl-l] BioDB.pm
Message-ID: <41E32784.6060409@genetics.utah.edu>

I've just installed bioperl 1.4 (bioperl-core, bioperl-run and 
bioperl-db) on a new system (Debian woody). I run a test script that 
works fine on my old system and get an error that BioDB.pm can't be 
found.  Sure enough BioDB.pm isn't on my new system, but it is on the 
old (also bioperl 1.4 Debian woody).  I look in cvs and BioDB.pm is 
there, but I look in the distribution downloaded from bioperl.org and it 
seems to be missing BioDB.pm and several other files?  I can get the 
files from cvs, but is this an error in the distribution file?

Barry

-- 
Barry Moore
Dept. of Human Genetics
University of Utah
Salt Lake City, UT


From smarkel at scitegic.com  Mon Jan 10 21:05:04 2005
From: smarkel at scitegic.com (Scott Markel)
Date: Mon Jan 10 21:01:59 2005
Subject: [Bioperl-l] RE: SeqIO fails on masked sequences
In-Reply-To: <41E1C169.4010302@csiro.au>
References: <2AA3B49A-6148-11D9-947F-000A959EB4C4@gmx.net>
	<41E1C169.4010302@csiro.au>
Message-ID: <41E33450.5080809@scitegic.com>

PDB distributes a FASTA file of the sequences associated with
the structures in the database.  The FASTA file contains both
nucleotides and proteins.  See pdb_seqres.txt in
ftp://ftp.rcsb.org/pub/pdb/derived_data/.

Scott

Wes Barris wrote:
> Hilmar Lapp wrote:
> 
>> You should not require by default that all sequences in one file be of 
>> the same type (alphabet). We never have required this, nor documented 
>> that it is a (not enforced) requirement, and so there may be people 
>> out there relying on this 'feature'.
> 
> 
> Mixing both DNA and protein sequences in one file and then attempting
> to process it seems like kind of a bizarre thing to want to do.  If
> the alphabet is explicitly specified, isn't there a way to make that
> take precedence?
> 
>>
>>     -hilmar
-- 
Scott Markel, Ph.D.
Principal Bioinformatics Architect  email:  smarkel@scitegic.com
SciTegic Inc.                       mobile: +1 858 205 3653
9665 Chesapeake Drive, Suite 401    voice:  +1 858 279 8800, ext. 253
San Diego, CA 92123                 fax:    +1 858 279 8804
USA                                 web:    http://www.scitegic.com

From hlapp at gnf.org  Mon Jan 10 23:03:52 2005
From: hlapp at gnf.org (Hilmar Lapp)
Date: Mon Jan 10 23:00:22 2005
Subject: [Bioperl-l] BioDB.pm
In-Reply-To: <41E32784.6060409@genetics.utah.edu>
References: <41E32784.6060409@genetics.utah.edu>
Message-ID: <CE257821-6385-11D9-8B3A-000A95AE92B0@gnf.org>

Did the test script that you ran come with bioperl? Bio::DB::BioDB 
comes with bioperl-db, and is not needed for anything else. Also, 
bioperl-db is not included in the bioperl 1.4 distribution. If you want 
it, you do need to obtain from CVS at this point. Let me know if you 
have problems with that.

Also, if you do want to use bioperl-db I do recommend you obtain 
bioperl 1.4 from the CVS branch as well, or otherwise wait for the 1.5 
release. The 1.4.0 release has problems in the interpro and GO ontology 
parsers, and 1.4.1 was never released in anticipation of 1.5.

	-hilmar

On Jan 10, 2005, at 5:10 PM, Barry Moore wrote:

> I've just installed bioperl 1.4 (bioperl-core, bioperl-run and 
> bioperl-db) on a new system (Debian woody). I run a test script that 
> works fine on my old system and get an error that BioDB.pm can't be 
> found.  Sure enough BioDB.pm isn't on my new system, but it is on the 
> old (also bioperl 1.4 Debian woody).  I look in cvs and BioDB.pm is 
> there, but I look in the distribution downloaded from bioperl.org and 
> it seems to be missing BioDB.pm and several other files?  I can get 
> the files from cvs, but is this an error in the distribution file?
>
> Barry
>
> -- 
> Barry Moore
> Dept. of Human Genetics
> University of Utah
> Salt Lake City, UT
>
>
>
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From Marc.Logghe at devgen.com  Tue Jan 11 04:18:17 2005
From: Marc.Logghe at devgen.com (Marc Logghe)
Date: Tue Jan 11 04:18:12 2005
Subject: [Bioperl-l] Storing Blast object in a local database
Message-ID: <BEE28BF86078B6429D6C780635718E219050AF@morelia.be.devgen.com>

Hi,

> Is it possible to store a blast object
> (Bio::Search::Result::BlastResult) in a mysql database?
> 
> Any pointers would be appreciated.

I only know that the biosql schema contains a SIMILARITY table that should be suited to store similarity results. However:
a) no fields are available to store the homology strings (query, hsp, consensus) and b) no API code is available (yet) to load the objects. Guess Hilmar can tell more about this.
It is possible however to store blast results in a GFF or Chado database.
Quite a while ago (before the gbrowse plugin Aligner.pm existed) we turned blast results into GFF format. We used tags to store the homology strings. Of course, this also meant writing a customized plugin in order to dump the alignments afterwards. BTW, Bioperl contains a script to turn SearchIO results into GFF (bp_search2gff.pl), but it needs some adaptations if you also want to keep the homology strings.
Like I already mentioned, gbrowse actually stores alignments (bla(s)t results) in Chado and these can be dumped using the Aligner plugin. See for yourself at http://www.wormbase.org/db/seq/gbrowse/wormbase?name=I%3A12765180..12775179;source=wormbase;width=800;version=100;label=CG-OP-ESTB-ESTO and dump the alignments. I am not sure how everything is stored in the database and how the alignments are regenerated. I assume both the hit and query sequences are in the database, plus the search results as features. Based on the locations associated with the features and the sequences, the alignments are regenerated by the plugin.
Of course all this runs in the framework of gbrowse and this is probably not what you need.
BioSQL would be a better option in case you only want to store the results and you don't need a gbrowse environment. But then, you need to write the API ;-)
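The "homology strings as GFF tags" approach could look roughly like the following. This is a plain-Perl sketch, not bp_search2gff.pl: the HSP is a hash ref with invented field names, and the source/type columns ('blastn', 'match_part') and the attribute names query_string, hit_string and homology_string are assumptions. The escaping covers only the characters GFF3 reserves in column 9 plus tab and newline. In a real script the hash would be filled from a parsed search result; that wiring is left out here.

```perl
use strict;
use warnings;

# Percent-escape the characters GFF3 reserves in column 9 (plus tab/newline).
sub gff3_escape {
    my ($v) = @_;
    $v =~ s/([;=,%\t\n])/sprintf('%%%02X', ord $1)/ge;
    return $v;
}

# Hypothetical: turn one HSP (plain hash ref, field names invented) into a
# GFF3 line, carrying the three alignment strings as extra attributes.
sub hsp_to_gff3 {
    my ($h) = @_;
    my @attr = (
        'Target=' . gff3_escape($h->{hit_name}) . " $h->{hit_start} $h->{hit_end}",
        'query_string='    . gff3_escape($h->{query_string}),
        'hit_string='      . gff3_escape($h->{hit_string}),
        'homology_string=' . gff3_escape($h->{homology_string}),
    );
    return join("\t",
        $h->{query_name}, 'blastn', 'match_part',
        $h->{query_start}, $h->{query_end}, $h->{score},
        $h->{strand}, '.', join(';', @attr));
}

my $hsp = {
    query_name => 'contig1', query_start => 101, query_end => 110,
    hit_name   => 'EST42',   hit_start   => 1,   hit_end   => 10,
    score      => 20,        strand      => '+',
    query_string    => 'ACGTACGTAC',
    hit_string      => 'ACGTTCGTAC',
    homology_string => '|||| |||||',
};
print "##gff-version 3\n", hsp_to_gff3($hsp), "\n";
```

A dumper plugin would then read those attributes back instead of realigning the sequences.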

Regards,
Marc

From nathanhaigh at ukonline.co.uk  Tue Jan 11 04:35:46 2005
From: nathanhaigh at ukonline.co.uk (Nathan Haigh)
Date: Tue Jan 11 04:32:20 2005
Subject: [Bioperl-l] developer cvs login
Message-ID: <!~!UENERkVCMDkAAQACAAAAAAAAAAAAAAAAABgAAAAAAAAA+EsXQZcrCEGeBpZF7/IE7sKAAAAQAAAAlZ4kiJierUGwqPD6bXHpKwEAAAAA@ukonline.co.uk>

I have recently received a developer cvs login account, but I'm unsure how to login. I will mainly use a Windows box but I also use
Linux. I have cvsnt installed under windows and have used it to checkout bioperl anonymously, but don't know how to login and commit
to cvs, could one of the existing developers help me out?

 
Thanks

Nathan



From Marc.Logghe at devgen.com  Tue Jan 11 04:47:36 2005
From: Marc.Logghe at devgen.com (Marc Logghe)
Date: Tue Jan 11 04:44:19 2005
Subject: [Bioperl-l] developer cvs login
Message-ID: <BEE28BF86078B6429D6C780635718E219050B1@morelia.be.devgen.com>

Hi Nathan,
by coincidence I had to find the very same out for myself, a split second ago.
The stuff I needed to know was here:
http://bioperl.org/UserInfo/CVShelp.shtml

HTH,
Marc

> -----Original Message-----
> From: bioperl-l-bounces@portal.open-bio.org
> [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of 
> Nathan Haigh
> Sent: Tuesday, January 11, 2005 10:36 AM
> To: 'Bioperl list'
> Subject: [Bioperl-l] developer cvs login
> 
> 
> I have recently received a developer cvs login account, but 
> I'm unsure how to login. I will mainly use a Windows box but 
> I also use
> Linux. I have cvsnt installed under windows and have used it 
> to checkout bioperl anonymously, but don't know how to login 
> and commit
> to cvs, could one of the existing developers help me out?
> 
>  
> 
> Thanks
> 
> Nathan
> 
> 
> 
> 
> 
> 

From nathanhaigh at ukonline.co.uk  Tue Jan 11 06:54:07 2005
From: nathanhaigh at ukonline.co.uk (Nathan Haigh)
Date: Tue Jan 11 06:50:49 2005
Subject: [Bioperl-l] developer cvs login
In-Reply-To: <BEE28BF86078B6429D6C780635718E219050B1@morelia.be.devgen.com>
Message-ID: <!~!UENERkVCMDkAAQACAAAAAAAAAAAAAAAAABgAAAAAAAAA+EsXQZcrCEGeBpZF7/IE7sKAAAAQAAAA94dqbECdMEqN5KNoCxg4oQEAAAAA@ukonline.co.uk>

Hmm, no luck getting it to work from a Windows box without Cygwin. Does anyone know if/how to set up ssh for Windows? Should
it be possible to use PuTTY (or something else) as the ssh client?

Thanks
Nathan

> -----Original Message-----
> From: Marc Logghe [mailto:Marc.Logghe@devgen.com]
> Sent: 11 January 2005 09:48
> To: nathanhaigh@ukonline.co.uk; Bioperl list
> Subject: RE: [Bioperl-l] developer cvs login
> 
> Hi Nathan,
> by coincidence I had to find the very same out for myself, a split second ago.
> The stuff I needed to know was here:
> http://bioperl.org/UserInfo/CVShelp.shtml
> 
> HTH,
> Marc
> 
> > -----Original Message-----
> > From: bioperl-l-bounces@portal.open-bio.org
> > [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of
> > Nathan Haigh
> > Sent: Tuesday, January 11, 2005 10:36 AM
> > To: 'Bioperl list'
> > Subject: [Bioperl-l] developer cvs login
> >
> >
> > I have recently received a developer cvs login account, but
> > I'm unsure how to login. I will mainly use a Windows box but
> > I also use
> > Linux. I have cvsnt installed under windows and have used it
> > to checkout bioperl anonymously, but don't know how to login
> > and commit
> > to cvs, could one of the existing developers help me out?
> >
> >
> >
> > Thanks
> >
> > Nathan
> >
> >
> >
> >
> >
> >
> 
> 



From sdavis2 at mail.nih.gov  Tue Jan 11 06:22:48 2005
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Tue Jan 11 07:30:26 2005
Subject: [Bioperl-l] Storing Blast object in a local database
In-Reply-To: <BEE28BF86078B6429D6C780635718E219050AF@morelia.be.devgen.com>
References: <BEE28BF86078B6429D6C780635718E219050AF@morelia.be.devgen.com>
Message-ID: <1FB5849E-63C3-11D9-A2FE-000D933565E8@mail.nih.gov>

I'm not positive how Wormbase does it, but in my version of Gbrowse,  
the sequences are stored and the aligner plugin aligns them (the ones  
in the current window), irrespective of the blat results, which are  
stored as any features are stored (something compatible with GFF).  So,  
the realignment doesn't rely on the blat results.

Sean

On Jan 11, 2005, at 4:18 AM, Marc Logghe wrote:

> Hi,
>
>> Is it possible to store a blast object
>> (Bio::Search::Result::BlastResult) in a mysql database?
>>
>> Any pointers would be appreciated.
>
> I only know that the biosql schema contains a SIMILARITY table that  
> should be suited to store similarity results. However:
> a) no fields are available to store the homology strings (query, hsp,  
> consensus) and b) no API code is available (yet) to load the objects.  
> I guess Hilmar can tell you more about this.
> It is possible however to store blast results in a GFF or Chado  
> database.
> Quite a while ago (before the gbrowse plugin Aligner.pm
> existed) we turned blast results into GFF format. We used tags to
> store the homology strings. Of course, this also required writing a
> customized plugin to dump the alignments afterwards. BTW,
> Bioperl contains a script to turn SearchIO results into GFF
> (bp_search2gff.pl), but it needs some adaptation if you also want to
> keep the homology strings.
> Like I already mentioned, gbrowse actually stores alignments (bla(s)t  
> results) in Chado and these can be dumped using the Aligner plugin.  
> See for yourself at
> http://www.wormbase.org/db/seq/gbrowse/wormbase?name=I%3A12765180..12775179;source=wormbase;width=800;version=100;label=CG-OP-ESTB-ESTO
> and dump the alignments. I am not sure how
> everything is stored in the database or how the alignments are
> regenerated. I assume both the hit and query sequences are in the
> database, plus the search results as features. Based on the locations
> associated with the features and the sequences, the alignments are
> regenerated by the plugin.
> Of course all this runs in the framework of gbrowse and this is  
> probably not what you need.
> BioSQL would be a better option in case you only want to store the  
> results and you don't need a gbrowse environment. But then, you need  
> to write the API ;-)
>
> Regards,
> Marc
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
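
Marc's quoted description of turning BLAST results into GFF, with tags holding the homology strings, can be sketched in plain Perl. This is a hypothetical illustration, not bp_search2gff.pl: the HSP hash keys, the `match_part` feature type and the lowercase attribute names (`query_string`, `hit_string`, `homology_string`) are assumptions, and it follows GFF3 escaping rules rather than whatever GFF flavour gbrowse expected at the time.

```perl
use strict;
use warnings;

# Percent-encode the characters GFF3 reserves inside attribute values
# (';', '=', '&', ',', '%' and control characters such as tab/newline).
sub gff3_escape {
    my ($value) = @_;
    $value =~ s/([%;=&,\x00-\x1f\x7f])/sprintf('%%%02X', ord $1)/ge;
    return $value;
}

# Build one GFF3 line for a single HSP. The hash keys are hypothetical
# placeholders; fill them from whatever your SearchIO loop provides.
sub hsp_to_gff3 {
    my (%hsp) = @_;
    my $strand = ($hsp{strand} // 1) < 0 ? '-' : '+';
    # Target IDs must also have spaces escaped.
    (my $target_id = gff3_escape($hsp{hit_name})) =~ s/ /%20/g;
    my @attrs = (
        "Target=$target_id $hsp{hit_start} $hsp{hit_end}",
        'query_string='    . gff3_escape($hsp{query_string}),
        'hit_string='      . gff3_escape($hsp{hit_string}),
        'homology_string=' . gff3_escape($hsp{homology_string}),
    );
    return join "\t",
        $hsp{seq_id}, $hsp{source} // 'blastn', 'match_part',
        $hsp{start}, $hsp{end}, $hsp{score}, $strand, '.',
        join(';', @attrs);
}

print hsp_to_gff3(
    seq_id => 'I', start => 1001, end => 1012, score => 24, strand => 1,
    hit_name => 'EST_0001', hit_start => 1, hit_end => 12,
    query_string    => 'ACGTACGTACGT',
    hit_string      => 'ACGTACCTACGT',
    homology_string => '|||||| |||||',
), "\n";
```

Storing the strings as attributes keeps them with the feature, but a dumping plugin still has to read them back out, which is the extra work Marc mentions.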

From Marc.Logghe at devgen.com  Tue Jan 11 08:23:55 2005
From: Marc.Logghe at devgen.com (Marc Logghe)
Date: Tue Jan 11 08:22:25 2005
Subject: [Bioperl-l] developer cvs login
Message-ID: <BEE28BF86078B6429D6C780635718E219050B8@morelia.be.devgen.com>


> -----Original Message-----
> From: Nathan Haigh [mailto:nathanhaigh@ukonline.co.uk]
> Sent: Tuesday, January 11, 2005 12:54 PM
> To: Marc Logghe; 'Bioperl list'
> Subject: RE: [Bioperl-l] developer cvs login
> 
> 
> Hmm, no luck getting it to work from a Windows box
> without Cygwin. Does anyone know if/how to set up ssh for
> Windows? Should it be possible to use PuTTY (or something
> else) as the ssh client?

Have you tried WinCvs? I have only tested it with a pserver connection, not with ssh, but I think ssh is supported.
http://www.wincvs.org/

HTH,
Marc

From barry.moore at genetics.utah.edu  Tue Jan 11 11:59:31 2005
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Tue Jan 11 11:55:55 2005
Subject: [Bioperl-l] developer cvs login
In-Reply-To: <BEE28BF86078B6429D6C780635718E219050B8@morelia.be.devgen.com>
References: <BEE28BF86078B6429D6C780635718E219050B8@morelia.be.devgen.com>
Message-ID: <41E405F3.7010007@genetics.utah.edu>

Nathan,

Not sure if you are asking just whether ssh works from Windows or whether ssh
works to connect to bioperl cvs from Windows.  If your question is the
first, then the answer is yes.  You should be able to use PuTTY,
OpenSSH, or others.  I use and like the free version from ssh.com.  You
can find it here:  http://ftp.ssh.com/pub/ssh/SSHSecureShellClient-3.2.9.exe

Barry

-- 
Barry Moore
Dept. of Human Genetics
University of Utah
Salt Lake City, UT

From nathanhaigh at ukonline.co.uk  Tue Jan 11 12:03:07 2005
From: nathanhaigh at ukonline.co.uk (Nathan Haigh)
Date: Tue Jan 11 12:00:15 2005
Subject: [Bioperl-l] developer cvs login
In-Reply-To: <A74F87D0-63EF-11D9-93F3-000393C44276@duke.edu>
Message-ID: <!~!UENERkVCMDkAAQACAAAAAAAAAAAAAAAAABgAAAAAAAAA+EsXQZcrCEGeBpZF7/IE7sKAAAAQAAAAfTxv/ZzruUGEhKSHRh68MwEAAAAA@ukonline.co.uk>

Thanks Jason

The problem I was having was the lack of info available for a Windows client. I have now managed to get things working by
installing:
ftp://ftp.ssh.com/pub/ssh/SSHSecureShellClient-3.2.9.exe
and setting the Windows environment variable CVS_RSH=ssh2

Executing:
cvs -d :ext:nathan@pub.open-bio.org:/home/repository/bioperl co bioperl-live
using cvsnt (v 2.0.51d) now works fine!
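
For anyone scripting the same setup, here is a small Perl sketch that sets CVS_RSH and builds the checkout command shown above. The `checkout_command` helper and the `--run` switch are hypothetical conveniences; by default the script only prints the command, so nothing is executed unless you ask for it.

```perl
use strict;
use warnings;

# CVSROOT for ssh ("ext") access, as in the working command above.
my $CVSROOT_TEMPLATE = ':ext:%s@pub.open-bio.org:/home/repository/bioperl';

sub checkout_command {
    my ($user, $module) = @_;
    return ('cvs', '-d', sprintf($CVSROOT_TEMPLATE, $user), 'co', $module);
}

# Tell cvs which ssh client to run; 'ssh2' matches the setup above.
$ENV{CVS_RSH} = 'ssh2';

my @cmd = checkout_command('nathan', 'bioperl-live');
print "CVS_RSH=$ENV{CVS_RSH}\n@cmd\n";

# Dry run by default; pass --run to actually execute the checkout.
if (@ARGV && $ARGV[0] eq '--run') {
    system(@cmd) == 0
        or die 'cvs checkout failed with exit code ', $? >> 8, "\n";
}
```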

Nathan

> -----Original Message-----
> From: Jason Stajich [mailto:jason.stajich@duke.edu]
> Sent: 11 January 2005 16:42
> To: nathanhaigh@ukonline.co.uk
> Cc: 'Marc Logghe'; Open-Bio Admins
> Subject: Re: [Bioperl-l] developer cvs login
> 
> You probably should have gotten a copy of the newuser info.  I send it
> out when I create new accounts - am attaching it now.



From barry.moore at genetics.utah.edu  Tue Jan 11 17:04:38 2005
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Tue Jan 11 17:01:05 2005
Subject: [Bioperl-l] Using a bioperl cvs checkout
In-Reply-To: <CE257821-6385-11D9-8B3A-000A95AE92B0@gnf.org>
References: <41E32784.6060409@genetics.utah.edu>
	<CE257821-6385-11D9-8B3A-000A95AE92B0@gnf.org>
Message-ID: <41E44D76.2090402@genetics.utah.edu>

Hilmar (or others)-

The bioperl cvs documentation was great, and I've managed to check out
bioperl-live, bioperl-db and bioperl-run from anonymous cvs into a
directory off of my home.  Now I've got a couple of questions about how
best to use this code from cvs.  I've got an existing installation
of bioperl 1.4 which I probably don't need to duplicate, but I'm unsure
of the best way to use the code from cvs.  I see that the
cvs checkout comes with Makefile.PL etc. Should I run the make process on
the cvs checkout and let it install everything into my standard perl
library location, or should I keep my cvs checkouts separate and tell
perl where they are?  I don't have a developer account on bioperl cvs, so I
won't be committing (or even changing my local copy) at this point, but I
might as well do things the right way, and from reading 'Open Source
Development with CVS' it seems like I ought to be using the cvs checkout
without 'installing' it or moving it anywhere.  If I keep them separate,
the bioperl cvs docs suggest export PERL5LIB="$HOME/src/bioperl".  If
I do that, I think perl will see two copies of the bioperl modules when I
run a script (the cvs copy and the installed 1.4 copy).  How do I know
which copy of the modules a script will be using?  I don't want to
completely do away with the system installation of bioperl 1.4 because
another user is using it.

Barry

Hilmar Lapp wrote:

> Did the test script that you ran come with bioperl? Bio::DB::BioDB 
> comes with bioperl-db, and is not needed for anything else. Also, 
> bioperl-db is not included in the bioperl 1.4 distribution. If you 
> want it, you do need to obtain it from CVS at this point. Let me know if
> you have problems with that.
>
> Also, if you do want to use bioperl-db I do recommend you obtain 
> bioperl 1.4 from the CVS branch as well, or otherwise wait for the 1.5 
> release. The 1.4.0 release has problems in the interpro and GO 
> ontology parsers, and 1.4.1 was never released in anticipation of 1.5.
>
>     -hilmar
>
> On Jan 10, 2005, at 5:10 PM, Barry Moore wrote:
>
>> I've just installed bioperl 1.4 (bioperl-core, bioperl-run and 
>> bioperl-db) on a new system (Debian woody). When I run a test script that 
>> works fine on my old system, I get an error that BioDB.pm can't be 
>> found.  Sure enough, BioDB.pm isn't on my new system, but it is on the 
>> old one (also bioperl 1.4, Debian woody).  I looked in cvs and BioDB.pm is 
>> there, but the distribution downloaded from bioperl.org 
>> seems to be missing BioDB.pm and several other files.  I can get 
>> the files from cvs, but is this an error in the distribution file?
>>
>> Barry
>>
>> -- 
>> Barry Moore
>> Dept. of Human Genetics
>> University of Utah
>> Salt Lake City, UT
>>
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l@portal.open-bio.org
>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>>

-- 
Barry Moore
Dept. of Human Genetics
University of Utah
Salt Lake City, UT

From allenday at ucla.edu  Tue Jan 11 18:28:43 2005
From: allenday at ucla.edu (Allen Day)
Date: Tue Jan 11 17:27:03 2005
Subject: [Bioperl-l] Using a bioperl cvs checkout
In-Reply-To: <41E44D76.2090402@genetics.utah.edu>
References: <41E32784.6060409@genetics.utah.edu>
	<CE257821-6385-11D9-8B3A-000A95AE92B0@gnf.org>
	<41E44D76.2090402@genetics.utah.edu>
Message-ID: <Pine.LNX.4.58.0501111526310.27511@sumo.ctrl.ucla.edu>

in a calling perl script you can add a line like:

use lib 'path/to/bioperl-live';

and it will take precedence over PERL5LIB / PERLLIB and the default @INC entries.

or you can do it like this (my preferred method for using an alternate lib 
for a one-off):

perl -Ipath/to/bioperl-live path/to/myscript.pl

this basically puts the '-I' argument into the 0th slot of @INC so it gets 
used first.  you can give multiple '-I' args if needed.
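
To answer "which copy will my script use?" directly, a script can ask perl. This sketch relies on documented behaviour: @INC is searched in order, and after a successful require the %INC hash maps the module's relative path to the file actually loaded. The `loaded_from` helper is illustrative, and Data::Dumper is used only because it ships with perl.

```perl
use strict;
use warnings;

# Return the file perl actually loaded for a module. After a successful
# require, %INC maps the relative path (e.g. "Bio/SeqIO.pm") to that file.
sub loaded_from {
    my ($module) = @_;
    (my $relpath = "$module.pm") =~ s{::}{/}g;
    require $relpath;    # dies if no copy exists anywhere in @INC
    return $INC{$relpath};
}

# @INC is searched in order; -I and "use lib" directories come first.
print "\@INC search order:\n";
print "  $_\n" for grep { !ref } @INC;
print 'Data::Dumper loaded from: ', loaded_from('Data::Dumper'), "\n";
```

With BioPerl, the equivalent one-off check is `perl -Ipath/to/bioperl-live -MBio::SeqIO -le 'print $INC{q{Bio/SeqIO.pm}}'`.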

-Allen


From brian_osborne at cognia.com  Tue Jan 11 21:41:47 2005
From: brian_osborne at cognia.com (Brian Osborne)
Date: Tue Jan 11 21:38:42 2005
Subject: [Bioperl-l] Using a bioperl cvs checkout
In-Reply-To: <41E44D76.2090402@genetics.utah.edu>
Message-ID: <GAEDKMGOKFBLJPKCLKCCMEGDEHAA.brian_osborne@cognia.com>

Barry,

By setting PERL5LIB to some directory you're telling Perl to search that
directory first when searching for the module or modules in question. So
yes, Perl will have at least 2 directories in its @INC variable but it will
use the modules it finds first, in PERL5LIB, and ignore the rest. This is
analogous to how the OS treats the PATH variable. I commend you on your
clever setup, you have the best of both worlds this way.
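
The PATH analogy can be made concrete: the sketch below lists every copy of a module reachable through @INC (or an explicit directory list), in search order, and marks the first as the one perl will load. The `all_copies` helper is a hypothetical name; it only inspects the filesystem and does not load anything.

```perl
use strict;
use warnings;
use File::Spec;

# List every copy of a module reachable through the given directories
# (default: @INC), in search order. perl loads the first and ignores the
# rest, the same way a shell runs the first match it finds on PATH.
sub all_copies {
    my ($module, @dirs) = @_;
    @dirs = @INC unless @dirs;
    my @parts = split /::/, $module;
    $parts[-1] .= '.pm';
    return grep { -f $_ }
           map  { File::Spec->catfile($_, @parts) }
           grep { !ref $_ } @dirs;
}

my @copies = all_copies('Data::Dumper');
for my $i (0 .. $#copies) {
    my $label = $i == 0 ? 'used:   ' : 'ignored:';
    print "$label $copies[$i]\n";
}
```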

Brian O.


105 ~>perl -e 'print @INC'
/usr/lib/perl5/5.8.2/cygwin-thread-multi-64int/usr/lib/perl5/5.8.2/usr/lib/perl5/site_perl/5.8.2/cygwin-thread-multi-64int/usr/lib/perl5/site_perl/5.8.2/usr/lib/perl5/site_perl

106 ~>setenv PERL5LIB /fake

107 ~>perl -e 'print @INC'
/fake/usr/lib/perl5/5.8.2/cygwin-thread-multi-64int/usr/lib/perl5/5.8.2/usr/lib/perl5/site_perl/5.8.2/cygwin-thread-multi-64int/usr/lib/perl5/site_perl/5.8.2/usr/lib/perl5/site_perl


From davidg at lsi.upc.edu  Wed Jan 12 06:35:52 2005
From: davidg at lsi.upc.edu (David García Cortés)
Date: Wed Jan 12 06:32:44 2005
Subject: [Bioperl-l] Getting entire descriptions from FASTA files
Message-ID: <006e01c4f89a$e25108b0$cf1e5393@Davidg>

Hello.

I want to get the entire description line in FASTA format (I mean: description, accession number, etc.). I've tried with display_id this way:

use Bio::SeqIO;

my $seq_inIO     = Bio::SeqIO->new(-file => "$proteasa",
                         -format => 'Fasta');

my $seq_in        = $seq_inIO->next_seq();

my $id_peptid = $seq_in->display_id;
 
but I only obtain the gi and gb numbers, not the description line.

Then I tried $seq_in->desc instead of $seq_in->display_id, but then I only obtain the description (or part of it). 

Is there a way to get the entire description line the same way you see it in the FASTA file?

Thanks.

--
David García Cortés
Instituto Nacional de Bioinformática (INB)
Nodo Computacional GNHC-2 UPC-CIRI
c/. Jordi Girona 1-3              
Modul C6-E201                   Tel.  : 934 011 650
E-08034 Barcelona               Fax   : 934 017 014
Catalunya (Spain)               e-mail: davidg@lsi.upc.edu


From Marc.Logghe at devgen.com  Wed Jan 12 07:53:21 2005
From: Marc.Logghe at devgen.com (Marc Logghe)
Date: Wed Jan 12 07:50:18 2005
Subject: [Bioperl-l] Getting entire descriptions from FASTA files
Message-ID: <BEE28BF86078B6429D6C780635718E219050C3@morelia.be.devgen.com>

Hi David,

> I want to get the entire description line in FASTA format (I 
> mean: description, accession number, etc.). I've tried with 
> display_id this way:
> 
> my $seq_inIO     = Bio::SeqIO->new(-file => "$proteasa",
>                          -format => 'Fasta');
> 
> my $seq_in        = $seq_inIO->next_seq();
> 
> my $id_peptid = $seq_in->display_id;
>  
> but I only obtain the gi and gb numbers, not the description line.
> 
> Then I tried $seq_in->desc instead of 
> $seq_in->display_id, but then I only obtain the description 
> (or part of it). 
> 
> Is there a way to get the entire description line the same 
> way you see it in the FASTA file?

You can reconstruct it by concatenating the id and description:
my $fasta_line = join ' ', $seq_in->display_id, $seq_in->desc;
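
Marc's join can be wrapped in small helpers. The sketch below is plain Perl with no BioPerl calls, so it runs anywhere. It assumes the FASTA parser splits the header at the first run of whitespace into display_id and desc (worth checking against Bio::SeqIO::fasta in your version), and the header string is made up. `full_header` skips the separator when there is no description, so id-only records do not gain a trailing space; whitespace between id and description is normalized to a single space.

```perl
use strict;
use warnings;

# Split a header the way display_id/desc appear to be derived: the first
# whitespace-free token is the id, everything after the following
# whitespace is the description (an assumption, not the parser's code).
sub split_header {
    my ($header) = @_;
    $header =~ s/^>//;
    my ($id, $desc) = $header =~ /^\s*(\S+)\s*(.*)$/;
    return ($id, $desc);
}

# Rebuild the header without a trailing space for id-only records.
sub full_header {
    my ($id, $desc) = @_;
    return defined $desc && length $desc ? "$id $desc" : $id;
}

my ($id, $desc) = split_header('>gi|12345|gb|AAA00000.1| hypothetical protease');
print full_header($id, $desc), "\n";
```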

Of course, I don't know what the purpose of your script is, but if it is only to fetch the > line, why not just use a plain ol' grep? Something like:
grep '^>' /your/fastafile | sed "s/^>//"
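
If the headers are needed inside a Perl script rather than on the command line, the same grep/sed pipeline takes only a few lines. This is a sketch; `fasta_headers` is an illustrative name, and it returns each header exactly as written, minus the leading '>'.

```perl
use strict;
use warnings;

# Perl counterpart of:  grep '^>' file | sed "s/^>//"
sub fasta_headers {
    my ($fh) = @_;
    my @headers;
    while (my $line = <$fh>) {
        next unless $line =~ s/^>//;
        $line =~ s/\r?\n\z//;    # strip Unix or DOS line endings
        push @headers, $line;
    }
    return @headers;
}

# Usage: perl fasta_headers.pl proteins.fasta
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!\n";
    print "$_\n" for fasta_headers($fh);
    close $fh;
}
```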

HTH,
Marc

From brian_osborne at cognia.com  Wed Jan 12 08:33:27 2005
From: brian_osborne at cognia.com (Brian Osborne)
Date: Wed Jan 12 08:30:20 2005
Subject: [Bioperl-l] Getting entire descriptions from FASTA files
In-Reply-To: <006e01c4f89a$e25108b0$cf1e5393@Davidg>
Message-ID: <GAEDKMGOKFBLJPKCLKCCKEGIEHAA.brian_osborne@cognia.com>

David,

$seq_in->display_id and $seq_in->desc together should constitute the entire
line - you're not seeing this?


Brian O.


From brian_osborne at cognia.com  Wed Jan 12 09:37:53 2005
From: brian_osborne at cognia.com (Brian Osborne)
Date: Wed Jan 12 09:34:23 2005
Subject: [Bioperl-l] POD note
Message-ID: <GAEDKMGOKFBLJPKCLKCCCEGKEHAA.brian_osborne@cognia.com>

bioperl-l,

There are various short tags that you can use in POD to italicize,
emphasize, etc. The POD utilities, like pod2html, will only interpret these
tags if the POD line containing them is not indented. So, this works:

The L<Bio::SeqIO> module...

But this doesn't:

   The L<Bio::SeqIO> module...

What happens in that last case is that the line is treated literally: the
"L<" and ">" end up in the resulting HTML if you've run pod2html. I only
mention this because it seems to me that I'm removing tags that I've removed
before - perhaps someone is putting these back? I could be wrong about
that...


Brian O.


From amackey at pcbi.upenn.edu  Wed Jan 12 11:31:42 2005
From: amackey at pcbi.upenn.edu (Aaron J. Mackey)
Date: Wed Jan 12 11:28:11 2005
Subject: [Bioperl-l] POD note
In-Reply-To: <GAEDKMGOKFBLJPKCLKCCCEGKEHAA.brian_osborne@cognia.com>
References: <GAEDKMGOKFBLJPKCLKCCCEGKEHAA.brian_osborne@cognia.com>
Message-ID: <70F13BBA-64B7-11D9-AC7B-000D93392082@pcbi.upenn.edu>


POD markup (L<>, B<>, etc.) is only valid in auto-formatted text (which, 
in POD, means only non-indented text).  It sounds like we have some 
indented text that shouldn't be indented, and the fix is to unindent it 
rather than remove otherwise valid markup.  POD interprets any indented 
text as literal, pre-formatted text (much like <pre> in HTML).  Is this why much of our 
documentation is so poorly line-wrapped !?!
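
The difference between ordinary and indented (verbatim) paragraphs is easy to check with the core Pod::Text formatter instead of pod2html. This sketch renders Brian's two example lines separately; the `render_pod` helper is illustrative, and the exact text Pod::Text produces for a link may vary between versions.

```perl
use strict;
use warnings;
use Pod::Text;

# Render a POD fragment to plain text with the core Pod::Text formatter.
sub render_pod {
    my ($pod) = @_;
    my $out = '';
    my $parser = Pod::Text->new;
    $parser->output_string(\$out);
    $parser->parse_string_document($pod);
    return $out;
}

# Ordinary paragraph: L<> is a formatting code and gets interpreted.
my $ordinary = render_pod("=pod\n\nThe L<Bio::SeqIO> module...\n\n=cut\n");

# Indented (verbatim) paragraph: copied literally, L<...> included.
my $verbatim = render_pod("=pod\n\n   The L<Bio::SeqIO> module...\n\n=cut\n");

print "ordinary:\n$ordinary\nverbatim:\n$verbatim";
```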

-Aaron

--
Aaron J. Mackey, Ph.D.
Dept. of Biology, Goddard 212
University of Pennsylvania       email:  amackey@pcbi.upenn.edu
415 S. University Avenue         office: 215-898-1205
Philadelphia, PA  19104-6017     fax:    215-746-6697

From amackey at pcbi.upenn.edu  Wed Jan 12 11:35:08 2005
From: amackey at pcbi.upenn.edu (Aaron J. Mackey)
Date: Wed Jan 12 11:31:29 2005
Subject: [Bioperl-l] POD note
In-Reply-To: <70F13BBA-64B7-11D9-AC7B-000D93392082@pcbi.upenn.edu>
References: <GAEDKMGOKFBLJPKCLKCCCEGKEHAA.brian_osborne@cognia.com>
	<70F13BBA-64B7-11D9-AC7B-000D93392082@pcbi.upenn.edu>
Message-ID: <EBD5430A-64B7-11D9-AC7B-000D93392082@pcbi.upenn.edu>


Ahh, I see now, these are in our pre-formatted API summaries ...

-Aaron

--
Aaron J. Mackey, Ph.D.
Dept. of Biology, Goddard 212
University of Pennsylvania       email:  amackey@pcbi.upenn.edu
415 S. University Avenue         office: 215-898-1205
Philadelphia, PA  19104-6017     fax:    215-746-6697

From brian_osborne at cognia.com  Wed Jan 12 15:07:10 2005
From: brian_osborne at cognia.com (Brian Osborne)
Date: Wed Jan 12 15:04:05 2005
Subject: [Bioperl-l] Storing Blast object in a local database
In-Reply-To: <1105400867.4274.4.camel@bacp4>
Message-ID: <GAEDKMGOKFBLJPKCLKCCGEHAEHAA.brian_osborne@cognia.com>

Tim,

One way is to "stringify" the object like so:

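use Data::Dumper;
$Data::Dumper::Purity = 1;   # also emit statements that rebuild self-references
my $str = Dumper($blast_object);

Then store the string in your database. To re-create the Blast object,
retrieve the string, load the Bio::Search classes, then something like:

my $VAR1;                    # Dumper() output assigns to $VAR1
eval $str;
die $@ if $@;
$blast_object = $VAR1;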


Brian O.
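
If evaluating stored Perl code is a concern, a swapped-in alternative (not the approach described above) is the core Storable module: nfreeze/thaw round-trip a blessed object, including self-references, without any eval. This is a sketch under assumptions: the stand-in class is made up, the frozen string would go in a binary (BLOB) column, and Storable refuses structures containing code references or filehandles, so a real BlastResult object may need checking first.

```perl
use strict;
use warnings;
use Storable qw(nfreeze thaw);

# nfreeze() writes a portable (network byte order) binary string;
# thaw() rebuilds the structure and re-blesses it into the same class.
sub serialize   { my ($obj)  = @_; return nfreeze($obj) }
sub deserialize { my ($blob) = @_; return thaw($blob) }

# Made-up stand-in for a search result object, with a self-reference.
my $result = bless { query_name => 'query1', hits => [] }, 'My::FakeResult';
$result->{self} = $result;

my $copy = deserialize(serialize($result));
print ref($copy), ' ', $copy->{query_name}, "\n";    # My::FakeResult query1
```

As with the Dumper approach, load the relevant classes before calling methods on the thawed object.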


-----Original Message-----
From: bioperl-l-bounces@portal.open-bio.org
[mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Tim Erwin
Sent: Monday, January 10, 2005 6:48 PM
To: Bioperl List
Subject: [Bioperl-l] Storing Blast object in a local database


Hi all,

Is it possible to store a blast object
(Bio::Search::Result::BlastResult) in a mysql database?

Any pointers would be appreciated.

Regards,

Tim

_______________________________________________
Bioperl-l mailing list
Bioperl-l@portal.open-bio.org
http://portal.open-bio.org/mailman/listinfo/bioperl-l


From jason.stajich at duke.edu  Wed Jan 12 15:14:17 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Wed Jan 12 15:10:53 2005
Subject: [Bioperl-l] bioperl-1.5.0 RC2
Message-ID: <891FA5D7-64D6-11D9-A0F3-000393C44276@duke.edu>

In preparation for Bioperl 1.5.0 developer release I have put up 
Release Candidate 2.

  http://bioperl.org/DIST/bioperl-1.5.0-RC2.tar.gz
  http://bioperl.org/DIST/bioperl-1.5.0-RC2.tar.bz2
  http://bioperl.org/DIST/bioperl-1.5.0-RC2.zip


We need people to test this.  Download it and run
  perl Makefile.PL
  make
  make test

Let us know what breaks.  I've tested on OS X and a few different Linux 
installs with different auxiliary modules installed.  It would be nice to 
have a few more combinations of OS, perl versions, and suites of modules 
installed before we make a release.

Thanks for your help.
-jason
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

From jason.stajich at duke.edu  Wed Jan 12 15:46:45 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Wed Jan 12 15:43:12 2005
Subject: [Bioperl-l] Storing Blast object in a local database
In-Reply-To: <1105400867.4274.4.camel@bacp4>
References: <1105400867.4274.4.camel@bacp4>
Message-ID: <128A603A-64DB-11D9-A0F3-000393C44276@duke.edu>

Ensembl has a strategy.  Their objects extend Bio::Search and store the 
full data, I believe.  Will could probably speak more to what the 
strategy is.

-jason

--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

From allenday at ucla.edu  Wed Jan 12 17:48:23 2005
From: allenday at ucla.edu (Allen Day)
Date: Wed Jan 12 17:44:48 2005
Subject: [Bioperl-l] bioperl-1.5.0 RC2
In-Reply-To: <891FA5D7-64D6-11D9-A0F3-000393C44276@duke.edu>
References: <891FA5D7-64D6-11D9-A0F3-000393C44276@duke.edu>
Message-ID: <Pine.LNX.4.58.0501121445350.11969@sumo.ctrl.ucla.edu>

.....
t/Ontology...................set_attribute: not a compat02 graph at 
/net/groove/lib/perl5/site_perl/5.8.0/Graph.pm line 2253, <GEN0> line 10.
t/Ontology...................dubious                                         
        Test returned status 255 (wstat 65280, 0xff00)
DIED. FAILED tests 1-50
        Failed 50/50 tests, 0.00% okay
t/OntologyEngine.............ok                                              
t/OntologyStore..............FAILED tests 3-6                                
        Failed 4/6 tests, 33.33% okay
.....
t/simpleGOparser.............set_attribute: not a compat02 graph at 
/net/groove/lib/perl5/site_perl/5.8.0/Graph.pm line 2253, <GEN1> line 14.
t/simpleGOparser.............dubious                                         
        Test returned status 255 (wstat 65280, 0xff00)
DIED. FAILED tests 1-101
        Failed 101/101 tests, 0.00% okay
.....
Failed Test        Stat Wstat Total Fail  Failed  List of Failed
-------------------------------------------------------------------------------
t/Ontology.t        255 65280    50  100 200.00%  1-50
t/OntologyStore.t                 6    4  66.67%  3-6
t/simpleGOparser.t  255 65280   101  202 200.00%  1-101
114 subtests skipped.
Failed 3/193 test scripts, 98.45% okay. 155/8964 subtests failed, 98.27% okay.
make: *** [test_dynamic] Error 29

~~~~~

This is perl, v5.8.0 built for i386-linux-thread-multi
(with 1 registered patch, see perl -V for more detail)

Copyright 1987-2002, Larry Wall

Perl may be copied only under the terms of either the Artistic License or the
GNU General Public License, which may be found in the Perl 5 source kit.

Complete documentation for Perl, including FAQ lists, should be found on
this system using `man perl' or `perldoc perl'.  If you have access to the
Internet, point your browser at http://www.perl.com/, the Perl Home Page.

~~~~

This looks like it is caused by Graph.pm.  I've seen other reports of "not
a compat02 graph" recently; maybe there is a Graph.pm versioning problem?
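To check which Graph.pm Perl is actually loading, asking for the module's version is a quick first step (the equivalent one-liner is perl -MGraph -le 'print $Graph::VERSION'). The sketch below is generic: the installed_version helper is hypothetical, and the default module name Graph is simply the one named in the error above.

```perl
use strict;
use warnings;

# Return the version of an installed module, or undef if it cannot be loaded.
sub installed_version {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;    # Graph::Directed -> Graph/Directed.pm
    return unless eval { require $file; 1 };   # load failure -> undef
    return $module->VERSION;                    # UNIVERSAL::VERSION reads $VERSION
}

# Modules to check: taken from the command line, or Graph by default.
my @modules = @ARGV ? @ARGV : ('Graph');
for my $module (@modules) {
    my $version = installed_version($module);
    printf "%-20s %s\n", $module, defined $version ? $version : 'not installed';
}
```

Comparing the reported version against what the failing tests expect is then a matter of reading the module's Changes file.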

-Allen


From nathanhaigh at ukonline.co.uk  Thu Jan 13 03:56:07 2005
From: nathanhaigh at ukonline.co.uk (Nathan Haigh)
Date: Thu Jan 13 03:52:35 2005
Subject: [Bioperl-l] bioperl-1.5.0 RC2
In-Reply-To: <891FA5D7-64D6-11D9-A0F3-000393C44276@duke.edu>
Message-ID: <!~!UENERkVCMDkAAQACAAAAAAAAAAAAAAAAABgAAAAAAAAA+EsXQZcrCEGeBpZF7/IE7sKAAAAQAAAA70C8CSz8HE+Um35+x7+wmwEAAAAA@ukonline.co.uk>

......
t/GOR4.......................ok 3/13Can't call method "start" on an undefined value at t/GOR4.t line 80, <GEN2> line 1.
t/GOR4.......................dubious
        Test returned status 76 (wstat 19456, 0x4c00)
DIED. FAILED test 7
        Failed 1/13 tests, 92.31% okay
t/GOterm.....................ok
.......
t/HNN........................FAILED tests 7, 12
        Failed 2/13 tests, 84.62% okay
.......
t/Sopma......................FAILED tests 7-8, 14
        Failed 3/15 tests, 80.00% okay
.......
Failed Test Stat Wstat Total Fail  Failed  List of Failed
-------------------------------------------------------------------------------
t/GOR4.t      76 19456    13    1   7.69%  7
t/HNN.t                   13    2  15.38%  7 12
t/Sopma.t                 15    3  20.00%  7-8 14
2 subtests skipped.

~~~~~~~
WinXP Pro v5.1.2600 Service Pack 1 Build 2600
~~~~~~~~
This is perl, v5.8.0 built for MSWin32-x86-multi-thread
(with 1 registered patch, see perl -V for more detail)

Copyright 1987-2002, Larry Wall

Binary build 804 provided by ActiveState Corp. http://www.ActiveState.com
Built 23:15:13 Dec  1 2002

If you need a hand working these problems out, give me a shout!
Nathan




From brian_osborne at cognia.com  Thu Jan 13 07:22:14 2005
From: brian_osborne at cognia.com (Brian Osborne)
Date: Thu Jan 13 07:19:24 2005
Subject: [Bioperl-l] bioperl-1.5.0 RC2
In-Reply-To: <891FA5D7-64D6-11D9-A0F3-000393C44276@duke.edu>
Message-ID: <GAEDKMGOKFBLJPKCLKCCGEHHEHAA.brian_osborne@cognia.com>

Jason,

All tests pass on CYGWIN_NT-5.0.

Brian O.



From razi at genet.sickkids.on.ca  Wed Jan 12 22:52:32 2005
From: razi at genet.sickkids.on.ca (Razi Khaja)
Date: Thu Jan 13 08:19:20 2005
Subject: [Bioperl-l] bioperl-1.5.0 RC2
In-Reply-To: <891FA5D7-64D6-11D9-A0F3-000393C44276@duke.edu>
Message-ID: <20050113035233.16569.qmail@web51602.mail.yahoo.com>

I've tested RC2 on FreeBSD 5.3 (i386) running perl 5.8.5, with all
prerequisite modules installed as reported by 'perl Makefile.PL'
(including Graph::Directed from J/JH/JHI/Graph-0.51.tar.gz).

Attached is the output of make test (make_test.out.gz).

Summary of make test included here:
Failed 3/193 test scripts, 98.45% okay. 155/8956 subtests failed, 98.27% okay.
Failed Test        Stat Wstat Total Fail  Failed  List of Failed
-------------------------------------------------------------------------------
t/Ontology.t        255 65280    50  100 200.00%  1-50
t/OntologyStore.t                 6    4  66.67%  3-6
t/simpleGOparser.t  255 65280   101  202 200.00%  1-101
2 subtests skipped.
*** Error code 25

Stop in /usr/home/bioperl/bioperl-1.5.0-RC2.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: make_test.out.gz
Type: application/x-gzip
Size: 14925 bytes
Desc: make_test.out.gz
Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050112/15f05e3f/make_test.out.bin
From danielucgbioinfo at yahoo.com.br  Thu Jan 13 09:05:50 2005
From: danielucgbioinfo at yahoo.com.br (Danielucg Sousa)
Date: Thu Jan 13 09:03:28 2005
Subject: [Bioperl-l] Clickable Graphics
Message-ID: <20050113140550.22237.qmail@web53504.mail.yahoo.com>

Hi, what I'm trying to do is render a sequence as a PNG file, but
clickable. I need to make each glyph clickable (online, with CGI), but
I haven't managed to make even one glyph clickable.
Can anyone send me an example of this kind of code? I have
used Bio::Graphics::Panel.


Thank you,
Daniel Xavier - 
BioinfoUCG - Brazil


From paulo.david at netvisao.pt  Thu Jan 13 09:43:48 2005
From: paulo.david at netvisao.pt (Paulo Almeida)
Date: Thu Jan 13 09:41:25 2005
Subject: [Bioperl-l] bioperl-1.5.0 RC2
In-Reply-To: <!~!UENERkVCMDkAAQACAAAAAAAAAAAAAAAAABgAAAAAAAAA+EsXQZcrCEGeBpZF7/IE7s
	KAAAAQAAAA70C8CSz8HE+Um35+x7+wmwEAAAAA@ukonline.co.uk>
References: <891FA5D7-64D6-11D9-A0F3-000393C44276@duke.edu>
	<!~!UENERkVCMDkAAQACAAAAAAAAAAAAAAAAABgAAAAAAAAA+EsXQZcrCEGeBpZF7/IE7sKAAAAQAAAA70C8CSz8HE+Um35+x7+wmwEAAAAA@ukonline.co.uk>
Message-ID: <59455.193.137.94.3.1105627428.squirrel@193.137.94.3>

This is my error output on Linux (freshly installed Debian testing,
including the bioperl-1.4 package; kernel 2.6.8; perl v5.8.4). Note that
it skipped many tests:

Failed Test           Stat Wstat Total Fail  Failed  List of Failed
-------------------------------------------------------------------------------
t/SeqFeatCollection.t              432    1   0.23%  423
109 subtests skipped.
Failed 1/193 test scripts, 99.48% okay. 1/8762 subtests failed, 99.99% okay.
make: *** [test_dynamic] Error 255



From e-just at northwestern.edu  Thu Jan 13 11:58:32 2005
From: e-just at northwestern.edu (Eric Just)
Date: Thu Jan 13 11:55:07 2005
Subject: [Bioperl-l] bioperl-1.5.0 RC2
In-Reply-To: <891FA5D7-64D6-11D9-A0F3-000393C44276@duke.edu>
References: <891FA5D7-64D6-11D9-A0F3-000393C44276@duke.edu>
Message-ID: <6.1.1.1.2.20050113105812.05ea36a8@hecky.it.northwestern.edu>


ActiveState Perl 5.8, Windows XP

Failed Test  Stat Wstat Total Fail  Failed  List of Failed
-------------------------------------------------------------------------------
t\Registry.t  255 65280    13   11  84.62%  8-13
t\flat.t        2   512    16   30 187.50%  2-16
4 subtests skipped.
Failed 2/193 test scripts, 98.96% okay. 21/8916 subtests failed, 99.76% okay.

D:\tmp\New Folder\bioperl-1.5.0-RC2>perl t/flat.t
1..16
ok 1

------------- EXCEPTION  -------------
MSG: flat file D:\tmp\New Folder\bioperl-1.5.0-RC2\tmpNew cannot be read: 
No such file or directory
STACK Bio::DB::Flat::add_flat_file D:/Perl/site/lib/Bio/DB/Flat.pm:358
STACK Bio::DB::Flat::_path2fileno D:/Perl/site/lib/Bio/DB/Flat.pm:519
STACK Bio::DB::Flat::BDB::_index_file D:/Perl/site/lib/Bio/DB/Flat/BDB.pm:228
STACK Bio::DB::Flat::BDB::build_index D:/Perl/site/lib/Bio/DB/Flat/BDB.pm:218
STACK toplevel t/flat.t:70

--------------------------------------

D:\tmp\New Folder\bioperl-1.5.0-RC2>perl t/Registry.t
1..13
ok 1
ok 2
ok 3 # DB_File or BerkeleyDB not found, skipping DB_File tests
ok 4 # DB_File or BerkeleyDB not found, skipping DB_File tests
This Perl doesn't implement function getpwuid(). Skipping...

-------------------- WARNING ---------------------
MSG: Couldn't call new_from_registry on [Bio::DB::Flat]

------------- EXCEPTION  -------------
MSG: No flatfile fileid files in config - check the index has been made 
correctly
STACK Bio::DB::Flat::BinarySearch::read_config_file 
D:/Perl/site/lib/Bio/DB/Flat/BinarySearch.pm:1297
STACK Bio::DB::Flat::BinarySearch::new 
D:/Perl/site/lib/Bio/DB/Flat/BinarySearch.pm:280
STACK Bio::DB::Flat::new D:/Perl/site/lib/Bio/DB/Flat.pm:181
STACK Bio::DB::Flat::new_from_registry D:/Perl/site/lib/Bio/DB/Flat.pm:256
STACK (eval) D:/Perl/site/lib/Bio/DB/Registry.pm:184
STACK Bio::DB::Registry::_load_registry D:/Perl/site/lib/Bio/DB/Registry.pm:183
STACK Bio::DB::Registry::new D:/Perl/site/lib/Bio/DB/Registry.pm:96
STACK toplevel t/Registry.t:69

--------------------------------------

---------------------------------------------------
ok 5
ok 6
ok 7
not ok 8
# Failed test 8 in t/Registry.t at line 77
Can't call method "seq" on an undefined value at t/Registry.t line 78.

D:\tmp\New Folder\bioperl-1.5.0-RC2>



============================================

Eric Just
e-just@northwestern.edu
dictyBase Programmer
Center for Genetic Medicine
Northwestern University
http://dictybase.org

============================================ 

From Jan.Aerts at wur.nl  Thu Jan 13 12:20:41 2005
From: Jan.Aerts at wur.nl (Aerts, Jan)
Date: Thu Jan 13 12:17:23 2005
Subject: [Bioperl-l] Clickable Graphics
Message-ID: <7D030487F1A3D143A76F2A1E91F570350186DB62@scomp0010>

Hi Daniel,

Do you mean you want a graphical representation of your sequence in, e.g., a web browser, with the features linking to additional information or other websites? If so, I'd strongly suggest GBrowse (the Generic Genome Browser; http://www.gmod.org/ggb/gbrowse.shtml). See the website for the (very easy) installation instructions and the (equally simple) tutorial.

GBrowse actually uses the Bio::Graphics objects to build its graphics.
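If you would rather stay with Bio::Graphics::Panel directly, one way to make glyphs clickable is an HTML image map laid over the PNG. A minimal sketch, assuming the panel's boxes() method returns one [feature, x1, y1, x2, y2, track] arrayref per glyph (check the Bio::Graphics::Panel documentation for your version). The image_map helper, the link callback, the display_name accessor used in the usage comment, and the feature.pl URL are all hypothetical.

```perl
use strict;
use warnings;

# Escape characters that are special inside an HTML attribute value.
sub html_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    return $s;
}

# Build an HTML <map> element from boxes laid out as [feature, x1, y1, x2, y2, ...].
# $link_for is a caller-supplied callback: given a feature it returns
# (href, title), or an empty list to leave that feature unlinked.
sub image_map {
    my ($map_name, $boxes, $link_for) = @_;
    my @areas;
    for my $box (@$boxes) {
        my ($feature, $x1, $y1, $x2, $y2) = @$box;
        my ($href, $title) = $link_for->($feature);
        next unless defined $href;             # no link for this glyph
        $title = '' unless defined $title;
        push @areas, sprintf '<area shape="rect" coords="%d,%d,%d,%d" href="%s" title="%s">',
            $x1, $y1, $x2, $y2, html_escape($href), html_escape($title);
    }
    return join "\n", '<map name="' . html_escape($map_name) . '">', @areas, '</map>';
}

# Hypothetical usage inside a CGI script, after building $panel:
#   open my $png, '>', 'panel.png' or die "panel.png: $!";
#   binmode $png;
#   print $png $panel->png;
#   close $png;
#   print qq{<img src="panel.png" usemap="#features">\n};
#   print image_map('features', [ $panel->boxes ], sub {
#       my $f    = shift;
#       my $name = $f->display_name;    # or whatever name accessor your features have
#       return ("/cgi-bin/feature.pl?name=$name", $name);
#   });
```

Keeping the link logic in a callback means the same helper works whatever URL scheme your CGI uses; GBrowse handles all of this for you.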

If you happen to be at the PAG meeting in San Diego, Scott Cain will be giving a demo of GBrowse.

Good luck,
Jan Aerts



From khh103 at york.ac.uk  Thu Jan 13 12:24:21 2005
From: khh103 at york.ac.uk (Kat Hull)
Date: Thu Jan 13 12:22:18 2005
Subject: [Bioperl-l] Getting started with Bio::Perl
Message-ID: <41E6AEC5.2050302@york.ac.uk>

Dear Users,
I have a newbie question!  I am interested in the module 'Bio::Tools::Run::PiseApplication::codonw' but really don't
know how to start using it.  I have looked at the documentation, but I am confused about how to pass my array of sequences to
the module and then how to call the individual functions that perform the calculations (e.g. gc, cai, fop...).

Does anyone have a simple script showing how to run this module with an array of FASTA-format sequences as input?
Many thanks,

Kat


From jason.stajich at duke.edu  Thu Jan 13 12:34:41 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Thu Jan 13 12:30:57 2005
Subject: [Bioperl-l] Getting started with Bio::Perl
In-Reply-To: <41E6AEC5.2050302@york.ac.uk>
References: <41E6AEC5.2050302@york.ac.uk>
Message-ID: <68529540-6589-11D9-9682-000393C44276@duke.edu>

See the documentation in the SYNOPSIS of
Bio::Tools::Run::PiseApplication


-jason
On Jan 13, 2005, at 12:24 PM, Kat Hull wrote:

> Bio::Tools::Run::PiseApplication
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

From nathanhaigh at ukonline.co.uk  Thu Jan 13 14:49:00 2005
From: nathanhaigh at ukonline.co.uk (Nathan Haigh)
Date: Thu Jan 13 14:45:54 2005
Subject: [Bioperl-l] bioperl-1.5.0 RC2
In-Reply-To: <6.1.1.1.2.20050113105812.05ea36a8@hecky.it.northwestern.edu>
Message-ID: <!~!UENERkVCMDkAAQACAAAAAAAAAAAAAAAAABgAAAAAAAAA+EsXQZcrCEGeBpZF7/IE7sKAAAAQAAAAScm6kQDrv0+IoocD+DHTQwEAAAAA@ukonline.co.uk>

I'm a little confused about where your results come from. The failed test table was obviously from an "nmake test".
However, it appears that you then ran "perl t\flat.t". Why? Was it to get the full details of that particular test? If so, you need
to run:
"perl -I. -w t\flat.t"
The -I. ensures you use the bioperl modules from bioperl-1.5.0-RC2, not your installed version of bioperl (which may be 1.4 or
from CVS). Also make sure that you are running from a path that contains no spaces: it appears that you unpacked the
contents of bioperl-1.5.0-RC2 into a folder called "New Folder". A path containing a space may (and probably will) cause
unexpected results.
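One common way a space breaks things is any step that treats a path as a whitespace-separated list. The sketch below only illustrates that mechanism with a hypothetical split; it is not a diagnosis of what t/flat.t does internally. It also shows a simple check (the has_whitespace helper is hypothetical) you could run in the build directory before "nmake test".

```perl
use strict;
use warnings;
use Cwd qw(getcwd);

# True if a path contains whitespace, e.g. 'D:\tmp\New Folder\bioperl-1.5.0-RC2'.
sub has_whitespace {
    my ($path) = @_;
    return $path =~ /\s/ ? 1 : 0;
}

# Illustration only: code that splits a path on whitespace (for example a
# command line built as one string) sees two unrelated fragments, not one path.
my $path      = 'D:\tmp\New Folder\bioperl-1.5.0-RC2';
my @fragments = split ' ', $path;
print scalar(@fragments), " fragments: ", join(' | ', @fragments), "\n";

# Check the current build directory before running the test suite.
my $cwd = getcwd();
warn "Build directory '$cwd' contains whitespace; consider moving it\n"
    if has_whitespace($cwd);
```

Unpacking into a directory such as D:\tmp\bioperl-1.5.0-RC2 avoids the question entirely.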

Nathan



From amackey at pcbi.upenn.edu  Thu Jan 13 15:05:25 2005
From: amackey at pcbi.upenn.edu (Aaron J. Mackey)
Date: Thu Jan 13 15:02:02 2005
Subject: [Bioperl-l] bioperl-1.5.0 RC2
In-Reply-To: <!~!UENERkVCMDkAAQACAAAAAAAAAAAAAAAAABgAAAAAAAAA+EsXQZcrCEGeBpZF7/IE7sKAAAAQAAAAScm6kQDrv0+IoocD+DHTQwEAAAAA@ukonline.co.uk>
References: <!~!UENERkVCMDkAAQACAAAAAAAAAAAAAAAAABgAAAAAAAAA+EsXQZcrCEGeBpZF7/IE7sKAAAAQAAAAScm6kQDrv0+IoocD+DHTQwEAAAAA@ukonline.co.uk>
Message-ID: <76EF80F6-659E-11D9-AFAA-000D93392082@pcbi.upenn.edu>


To be pedantic, you should use "perl -Mblib t\flat.t" to ensure that  
all the right "build lib" files are being used.  But it looks like part  
of the problem is a mismatch between the expected number of tests and  
the number of tests actually run ...
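That mismatch shows up directly in the numbers: the Failed column in these summaries is 100 * Fail / Total, so any value above 100% means more failures were counted than the script planned. A small sketch reproducing three rows from the summary tables in this thread (the failed_percent helper is hypothetical, not part of Test::Harness):

```perl
use strict;
use warnings;

# Failed% as printed in the summary table: 100 * Fail / Total, two decimals.
sub failed_percent {
    my ($fail, $total) = @_;
    return sprintf '%.2f', 100 * $fail / $total;
}

# [test, Total, Fail] rows copied from the summary tables in this thread.
my @rows = (
    [ 't/Ontology.t',      50, 100 ],
    [ 't/OntologyStore.t',  6,   4 ],
    [ 't\flat.t',          16,  30 ],
);

for my $row (@rows) {
    my ($test, $total, $fail) = @$row;
    my $note = $fail > $total ? '  (more failures than planned tests)' : '';
    printf "%-18s %3d/%-3d = %s%%%s\n", $test, $fail, $total, failed_percent($fail, $total), $note;
}
```

The Ontology.t and flat.t rows get flagged, matching the 200.00% and 187.50% figures in the reports.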

-Aaron

--
Aaron J. Mackey, Ph.D.
Dept. of Biology, Goddard 212
University of Pennsylvania       email:  amackey@pcbi.upenn.edu
415 S. University Avenue         office: 215-898-1205
Philadelphia, PA  19104-6017     fax:    215-746-6697

From davidg at lsi.upc.edu  Thu Jan 13 17:21:26 2005
From: davidg at lsi.upc.edu (David García Cortés)
Date: Thu Jan 13 17:19:02 2005
Subject: [Bioperl-l] Problems parsing PSI-BLAST results
Message-ID: <003d01c4f9be$56287150$30b01950@maxpower>

Hello.

I'm trying to parse the hits of all the iterations in a PSI-BLAST result
(which I have in a variable, not in a file), but I can't make it work.
It gives me the following errors:

Use of uninitialized value in index at 
/usr/opt/perl5/lib/site_perl/5.6.0/Bio/Tools/BPlite/Sbjct.pm line 271, 
<GEN2> line 54.
Use of uninitialized value in length at 
/usr/opt/perl5/lib/site_perl/5.6.0/Bio/Tools/BPlite/Sbjct.pm line 272, 
<GEN2> line 54.
Use of uninitialized value in join or string at 
/usr/opt/perl5/lib/site_perl/5.6.0/Bio/Tools/BPlite/Sbjct.pm line 283, 
<GEN2> line 54.
Use of uninitialized value in join or string at 
/usr/opt/perl5/lib/site_perl/5.6.0/Bio/Tools/BPlite/Sbjct.pm line 284, 
<GEN2> line 54.
Use of uninitialized value in numeric gt (>) at 
/usr/opt/perl5/lib/site_perl/5.6.0/Bio/Tools/BPlite/HSP.pm line 185, <GEN2> 
line 54.
Use of uninitialized value in numeric gt (>) at 
/usr/opt/perl5/lib/site_perl/5.6.0/Bio/Tools/BPlite/HSP.pm line 197, <GEN2> 
line 54.

-------------------- WARNING ---------------------
MSG: Possible error (2) while parsing BLAST report!
---------------------------------------------------
Use of uninitialized value in substitution (s///) at 
/usr/opt/perl5/lib/site_perl/5.6.0/Bio/Tools/BPlite/Iteration.pm line 207, 
<GEN2> line 54.
Use of uninitialized value in substitution (s///) at 
/usr/opt/perl5/lib/site_perl/5.6.0/Bio/Tools/BPlite/Iteration.pm line 208, 
<GEN2> line 54.
Use of uninitialized value in substitution (s///) at 
/usr/opt/perl5/lib/site_perl/5.6.0/Bio/Tools/BPlite/Iteration.pm line 209, 
<GEN2> line 54.
Use of uninitialized value in pattern match (m//) at 
/usr/opt/perl5/lib/site_perl/5.6.0/Bio/Tools/BPlite/Iteration.pm line 211, 
<GEN2> line 54.


I don't know why it doesn't work; I looked at the BioPerl API and there is
a nextSbjct() method in Bio::Tools::BPlite::Iteration.
This is the relevant part of the code:

 my $seqsfich = Bio::SeqIO->new(-file   => "$proteasa",
                                -format => 'Fasta');

 # blast parameters
 my @pars = (
     'database' => "nr",
     'j'        => '2',
 );

 my $factory = Bio::Tools::Run::StandAloneBlast->new(@pars);

 while (my $seq = $seqsfich->next_seq()) {
     my $report   = $factory->blastpgp($seq);
     my $max_iter = $report->number_of_iterations;
     my $iter     = $report->round($max_iter);

     while (my $sbjct = $iter->nextSbjct()) {
         while (my $hsp = $sbjct->nextHSP()) {
             printf("%-70s   %s\n",
                    substr($hsp->hit->seqname, 0, 70), $hsp->score);
         }
     }
 }


The error must be in this part:

 while (my $sbjct = $iter->nextSbjct()) {
     while (my $hsp = $sbjct->nextHSP()) {
         printf("%-70s   %s\n",
                substr($hsp->hit->seqname, 0, 70), $hsp->score);
     }
 }

because when I remove it from the code, the errors don't appear.
What am I doing wrong?
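Because the report is held in a scalar rather than a file, one option is to open an in-memory filehandle on it (Perl 5.8 and later support `open($fh, '<', \$text)`). BioPerl's Bio::SearchIO parsers accept a filehandle through their `-fh` argument and are generally preferred over the older BPlite modules, although PSI-BLAST round support depends on the BioPerl version, so check its documentation. The minimal sketch below does not use BioPerl at all: it assumes blastpgp's plain-text layout ("Results from round N" headers, each followed by a one-line-per-hit summary ending in bit score and E-value) and collects the hit IDs for every round. The function name `hits_per_round` and the sample report are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical sketch (not BioPerl's API): split a blastpgp text report held
# in a scalar into rounds and collect the hit IDs from each round's summary.
sub hits_per_round {
    my ($report) = @_;
    # Perl 5.8+ can open a filehandle on a scalar reference.
    open(my $fh, '<', \$report) or die "cannot read report: $!";
    my @rounds;          # one array ref of hit IDs per round
    my $in_summary = 0;
    while (my $line = <$fh>) {
        if ($line =~ /^Results from round\s+\d+/) {
            push @rounds, [];                 # start a new round
            $in_summary = 0;
        }
        elsif ($line =~ /^Sequences producing significant alignments/) {
            push @rounds, [] unless @rounds;  # report without round headers
            $in_summary = 1;
        }
        elsif ($line =~ /^>/) {
            $in_summary = 0;                  # pairwise alignments begin
        }
        elsif ($in_summary && $line =~ /^(\S+)\s.*\s\d+\s+\S+\s*$/) {
            # summary line: "<id> <description> <bits> <E-value>"
            push @{ $rounds[-1] }, $1;
        }
    }
    close $fh;
    return \@rounds;
}

# Made-up two-round report used only to exercise the sketch.
my $report = <<'END';
Results from round 1

Sequences producing significant alignments:                      (bits) Value

sp|P00001|FAKE1 hypothetical protein A                             250  1e-65
sp|P00002|FAKE2 hypothetical protein B                              90  2e-17

>sp|P00001|FAKE1 hypothetical protein A
Results from round 2

Sequences producing significant alignments:                      (bits) Value

Sequences used in model and found again:

sp|P00001|FAKE1 hypothetical protein A                             260  3e-68
sp|P00002|FAKE2 hypothetical protein B                             120  5e-26
sp|P00003|FAKE3 hypothetical protein C                              45  4e-04

>sp|P00001|FAKE1 hypothetical protein A
END

my $rounds = hits_per_round($report);
printf "round %d: %s\n", $_ + 1, join(',', @{ $rounds->[$_] }) for 0 .. $#$rounds;
```

Reading every round this way, instead of only `round($max_iter)`, matches the stated goal of looking at the hits from all iterations.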

Thank you very much.


From golharam at umdnj.edu  Thu Jan 13 17:59:15 2005
From: golharam at umdnj.edu (Ryan Golhar)
Date: Thu Jan 13 17:53:37 2005
Subject: [Bioperl-l] Losing STDOUT
Message-ID: <001c01c4f9c3$81546ff0$bb028a0a@GOLHARMOBILE1>

If I open a file using Bio::SearchIO, I'm unable to redirect STDOUT
anymore:

use Bio::SearchIO;
my $searchio = Bio::SearchIO->new(-format => 'blast', -file => "myfile.blast");
my $hit = $searchio->next_result->next_hit;
print "name", $hit->name, "\n";

This works from the shell.  If you put this in a script and redirect the
output, you don't get anything.  I'm wondering if SearchIO does
something with STDOUT?


-----
Ryan Golhar
Computational Biologist
The Informatics Institute at
The University of Medicine & Dentistry of NJ

Phone: 973-972-5034
Fax: 973-972-7412
Email: golharam@umdnj.edu

From rob at salmonella.org  Thu Jan 13 18:35:40 2005
From: rob at salmonella.org (Rob Edwards)
Date: Thu Jan 13 18:32:06 2005
Subject: [Bioperl-l] bioperl-1.5.0 RC2
In-Reply-To: <891FA5D7-64D6-11D9-A0F3-000393C44276@duke.edu>
References: <891FA5D7-64D6-11D9-A0F3-000393C44276@duke.edu>
Message-ID: <D6098403-65BB-11D9-9C9B-000A959E1622@salmonella.org>

Jason,

Thanks for herding us and getting release 1.5 together, and also (a little 
belatedly) thanks for the vision of Bioperl in 2005 you sent out a 
couple of weeks ago.

We would definitely all be lost without you.

On my Mac OSX machine I get

All tests successful, 114 subtests skipped.
Files=193, Tests=8956, 782 wallclock secs (352.60 cusr + 46.69 csys = 
399.29 CPU)

% uname -a
Darwin Robs-Computer.local 7.7.0 Darwin Kernel Version 7.7.0: Sun Nov  
7 16:06:51 PST 2004; root:xnu/xnu-517.9.5.obj~1/RELEASE_PPC  Power 
Macintosh powerpc

% perl -v

This is perl, v5.8.1-RC3 built for darwin-thread-multi-2level
(with 1 registered patch, see perl -V for more detail)


Rob

From e-just at northwestern.edu  Thu Jan 13 19:54:15 2005
From: e-just at northwestern.edu (Eric Just)
Date: Thu Jan 13 19:50:54 2005
Subject: [Bioperl-l] bioperl-1.5.0 RC2
In-Reply-To: <!~!UENERkVCMDkAAQACAAAAAAAAAAAAAAAAABgAAAAAAAAA+EsXQZcrCEG
	eBpZF7/IE7sKAAAAQAAAAScm6kQDrv0+IoocD+DHTQwEAAAAA@ukonline.co.uk>
References: <6.1.1.1.2.20050113105812.05ea36a8@hecky.it.northwestern.edu>
	<!~!UENERkVCMDkAAQACAAAAAAAAAAAAAAAAABgAAAAAAAAA+EsXQZcrCEGeBpZF7/IE7sKAAAAQAAAAScm6kQDrv0+IoocD+DHTQwEAAAAA@ukonline.co.uk>
Message-ID: <6.1.1.1.2.20050113185146.06145758@hecky.it.northwestern.edu>

hey
It was the directory name.  I moved it up one directory and the tests 
worked.  My apologies.
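As Nathan suggests in the quoted message, the test suite should be unpacked and run from a path without spaces. A small pre-flight check along those lines could be run before `make test` or `nmake test`; this is a hypothetical helper, not part of the BioPerl distribution:

```perl
use strict;
use warnings;
use Cwd qw(getcwd);

# Return 1 if a directory path contains whitespace (which, per the advice
# in this thread, can break some of the BioPerl test scripts), else 0.
sub path_has_whitespace {
    my ($path) = @_;
    return $path =~ /\s/ ? 1 : 0;
}

my $cwd = getcwd();
if (path_has_whitespace($cwd)) {
    warn "Current directory '$cwd' contains whitespace; "
       . "consider unpacking the distribution somewhere without spaces.\n";
}
```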

Thanks for your responses.
Eric
At 01:49 PM 1/13/2005, Nathan Haigh wrote:
>I'm a little confused about what your results are from. Obviously the 
>failed test table was from an "nmake test".
>However, it appears that you then did a "perl t\flat.t" why? Was it to get 
>the full details of that particular test? If so you need
>to run:
>"perl -I. -w t\flat.t"
>The -I. ensures you use the bioperl modules from the bioperl-1.5.0-RC2 not 
>your installed version of bioperl (which may be 1.4 or
>from the cvs). Also make sure that you are running from a path that 
>contains no spaces - it appears as though you unpacked the
>contents of bioperl-1.5.0-RC2 into a folder called "New Folder", this path 
>contains a space, so may (and probably will) cause
>unexpected effects/results.
>
>Nathan
>
> > -----Original Message-----
> > From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Eric Just
> > Sent: 13 January 2005 16:59
> > To: Jason Stajich; Bioperl list
> > Subject: Re: [Bioperl-l] bioperl-1.5.0 RC2
> >
> >
> >
> >
> > activestate 5.8, windows XP
> >
> > Failed Test  Stat Wstat Total Fail  Failed  List of Failed
> > -------------------------------------------------------------------------------
> > t\Registry.t  255 65280    13   11  84.62%  8-13
> > t\flat.t        2   512    16   30 187.50%  2-16
> > 4 subtests skipped.
> > Failed 2/193 test scripts, 98.96% okay. 21/8916 subtests failed, 99.76% okay.
> >
> > D:\tmp\New Folder\bioperl-1.5.0-RC2>perl t/flat.t
> > 1..16
> > ok 1
> >
> > ------------- EXCEPTION  -------------
> > MSG: flat file D:\tmp\New Folder\bioperl-1.5.0-RC2\tmpNew cannot be read:
> > No such file or directory
> > STACK Bio::DB::Flat::add_flat_file D:/Perl/site/lib/Bio/DB/Flat.pm:358
> > STACK Bio::DB::Flat::_path2fileno D:/Perl/site/lib/Bio/DB/Flat.pm:519
> > STACK Bio::DB::Flat::BDB::_index_file D:/Perl/site/lib/Bio/DB/Flat/BDB.pm:228
> > STACK Bio::DB::Flat::BDB::build_index D:/Perl/site/lib/Bio/DB/Flat/BDB.pm:218
> > STACK toplevel t/flat.t:70
> >
> > --------------------------------------
> >
> > D:\tmp\New Folder\bioperl-1.5.0-RC2>perl t/Registry.t
> > 1..13
> > ok 1
> > ok 2
> > ok 3 # DB_File or BerkeleyDB not found, skipping DB_File tests
> > ok 4 # DB_File or BerkeleyDB not found, skipping DB_File tests
> > This Perl doesn't implement function getpwuid(). Skipping...
> >
> > -------------------- WARNING ---------------------
> > MSG: Couldn't call new_from_registry on [Bio::DB::Flat]
> >
> > ------------- EXCEPTION  -------------
> > MSG: No flatfile fileid files in config - check the index has been made
> > correctly
> > STACK Bio::DB::Flat::BinarySearch::read_config_file
> > D:/Perl/site/lib/Bio/DB/Flat/BinarySearch.pm:1297
> > STACK Bio::DB::Flat::BinarySearch::new
> > D:/Perl/site/lib/Bio/DB/Flat/BinarySearch.pm:280
> > STACK Bio::DB::Flat::new D:/Perl/site/lib/Bio/DB/Flat.pm:181
> > STACK Bio::DB::Flat::new_from_registry D:/Perl/site/lib/Bio/DB/Flat.pm:256
> > STACK (eval) D:/Perl/site/lib/Bio/DB/Registry.pm:184
> > STACK Bio::DB::Registry::_load_registry D:/Perl/site/lib/Bio/DB/Registry.pm:183
> > STACK Bio::DB::Registry::new D:/Perl/site/lib/Bio/DB/Registry.pm:96
> > STACK toplevel t/Registry.t:69
> >
> > --------------------------------------
> >
> > ---------------------------------------------------
> > ok 5
> > ok 6
> > ok 7
> > not ok 8
> > # Failed test 8 in t/Registry.t at line 77
> > Can't call method "seq" on an undefined value at t/Registry.t line 78.
> >
> > D:\tmp\New Folder\bioperl-1.5.0-RC2>
> >
> >
> > At 02:14 PM 1/12/2005, Jason Stajich wrote:
> > >In preparation for Bioperl 1.5.0 developer release I have put up Release
> > >Candidate 2.
> > >
> > >  http://bioperl.org/DIST/bioperl-1.5.0-RC2.tar.gz
> > >  http://bioperl.org/DIST/bioperl-1.5.0-RC2.tar.bz2
> > >  http://bioperl.org/DIST/bioperl-1.5.0-RC2.zip
> > >
> > >
> > >We need people to test on this.  So download, run
> > >  perl Makefile.PL
> > >  make
> > >  make test
> > >
> > >Let us know what breaks.  I've tested on OS X and few different linux
> > >installs with different auxiliary modules installed.  Would be nice to
> > >have a few more combinations of OS, perl versions, and suite of modules
> > >installed before we make a release.
> > >
> > >Thanks for your help.
> > >-jason
> > >--
> > >Jason Stajich
> > >jason.stajich at duke.edu
> > >http://www.duke.edu/~jes12/
> > >
> > >_______________________________________________
> > >Bioperl-l mailing list
> > >Bioperl-l@portal.open-bio.org
> > >http://portal.open-bio.org/mailman/listinfo/bioperl-l
> >
> > ============================================
> >
> > Eric Just
> > e-just@northwestern.edu
> > dictyBase Programmer
> > Center for Genetic Medicine
> > Northwestern University
> > http://dictybase.org
> >
> > ============================================
> >

============================================

Eric Just
e-just@northwestern.edu
dictyBase Programmer
Center for Genetic Medicine
Northwestern University
http://dictybase.org

============================================ 

From jason.stajich at duke.edu  Thu Jan 13 23:12:12 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Thu Jan 13 23:09:17 2005
Subject: [Bioperl-l] bioperl-1.5.0 RC2
In-Reply-To: <20050113035233.16569.qmail@web51602.mail.yahoo.com>
References: <20050113035233.16569.qmail@web51602.mail.yahoo.com>
Message-ID: <77987E0F-65E2-11D9-A1F6-000393C44276@duke.edu>

Thanks Razi.

Tests were succeeding for me with Graph 0.20105-1 - when I upgraded to  
Graph 0.52 it also worked. I am running perl 5.8.3, though, so I don't know
what the compatibility problem is.  Do you have the same problems
with Graph 0.52?
Going to need someone who can reproduce the bug to debug and fix.
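When comparing results like these across machines, it helps to report exactly which Graph version Perl is actually loading. A minimal sketch (the helper name `installed_version` is hypothetical) that loads a module by name and reads its version through the standard `UNIVERSAL::VERSION` method:

```perl
use strict;
use warnings;

# Hypothetical helper: load a module by name and return its $VERSION,
# 'unknown' if it loads but declares no version, or undef if it cannot
# be loaded at all.
sub installed_version {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;   # Graph::Directed -> Graph/Directed.pm
    return unless eval { require $file; 1 };
    my $v = $module->VERSION;
    return defined $v ? $v : 'unknown';
}

for my $m (qw(Graph Graph::Directed)) {
    my $v = installed_version($m);
    printf "%-16s %s\n", $m, defined $v ? $v : 'not installed';
}
```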

Since this is a developer release I am not going to hold out on this  
part too much.  We'll try to get it closed out; otherwise the release
will ship with some tests turned off.

I would like to do the release on this coming Monday if possible.

-jason
On Jan 12, 2005, at 10:52 PM, Razi Khaja wrote:

> I've tested RC2 on FreeBSD 5.3 running perl5.8.5 on i386.  This has been
> tested with all prerequisite modules installed (including Graph::Directed
> (J/JH/JHI/Graph-0.51.tar.gz) as perl output of 'perl Makefile.PL'.
>
> Attached is the output of make test (make_test.out.gz).
>
> Summary of make test included here:
> Failed 3/193 test scripts, 98.45% okay. 155/8956 subtests failed, 98.27% okay.
> Failed Test        Stat Wstat Total Fail  Failed  List of Failed
> -------------------------------------------------------------------------------
> t/Ontology.t        255 65280    50  100 200.00%  1-50
> t/OntologyStore.t                 6    4  66.67%  3-6
> t/simpleGOparser.t  255 65280   101  202 200.00%  1-101
> 2 subtests skipped.
> *** Error code 25
>
> Stop in /usr/home/bioperl/bioperl-1.5.0-RC2.
> <make_test.out.gz>
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

From taerwin at tpg.com.au  Fri Jan 14 00:32:14 2005
From: taerwin at tpg.com.au (Tim Erwin)
Date: Fri Jan 14 00:31:18 2005
Subject: [Bioperl-l] bioperl-1.5.0 RC2
Message-ID: <1105680734.4274.59.camel@bacp4>

I am using Fedora Core 3 with perl, v5.8.5 built for i386-linux-thread-multi.


make test output:


t/DBCUTG.....................ok
        22/24 skipped: tests which require remote servers - set env
variable BIOPERLDEBUG to test
t/DBFasta....................ok
t/DNAMutation................ok
t/Domcut.....................ok
        22/25 skipped: tests which require remote servers - set env
variable BIOPERLDEBUG to test
t/ECnumber...................ok
t/ELM........................ok 9/14
-------------------- WARNING ---------------------
MSG: Bio::Tools::Analysis::Protein::ELM Request Error:
500 (Internal Server Error) Can't connect to elm.eu.org:80 (connect:
timeout)
Content-Type: text/plain
Client-Date: Fri, 14 Jan 2005 01:29:27 GMT
Client-Warning: Internal response

500 Can't connect to elm.eu.org:80 (connect: timeout)

---------------------------------------------------
t/ELM........................ok
t/ESEfinder..................error is 0
t/ESEfinder..................ok
        10/12 skipped: tests which require remote servers - set env
variable BIOPERLDEBUG to test
t/Genewise...................ok
        2/51 skipped:
t/GOR4.......................ok
        10/13 skipped: tests which require remote servers - set env
variable BIOPERLDEBUG to test
t/HNN........................ok
        10/13 skipped: tests which require remote servers - set env
variable BIOPERLDEBUG to test
t/MitoProt...................ok
        5/8 skipped: tests which require remote servers - set env
variable BIOPERLDEBUG to test
t/protgraph..................Class::AutoClass or Clone not installed.
This means that the module is not usable. Skipping tests at
t/protgraph.t line 23.
t/RefSeq.....................ok
        10/13 skipped: tests which require remote servers - set env
variable BIOPERLDEBUG to test
t/RemoteBlast................ok
        4/6 skipped: to avoid timeout
t/SeqIO......................XML::DOM::XPath not found - skipping
interpro tests
XML::SAX::Base or XML::SAX or XML::SAX::Writer not found - skipping
BSML_SAX tests
t/SeqIO......................ok
t/simpleGOparser.............ok 88/101Use of uninitialized value in hash
element at /home/te07/bioperl-1.5.0-
RC2/blib/lib/Bio/Ontology/OntologyStore.pm line 263, <GEN3> line 11.
t/Sopma......................ok
        12/15 skipped: tests which require remote servers - set env
variable BIOPERLDEBUG to test
t/Taxonomy...................ok
        7/8 skipped: to avoid blocking
t/tutorial...................ok 18/21Use of uninitialized value in print
at /home/te07/bioperl-1.5.0-RC2/blib/lib/bptutorial.pl line 4039,
<GEN21> line 934.

All tests successful, 114 subtests skipped.
Files=193, Tests=8942, 959 wallclock secs (108.35 cusr +  7.82 csys =
116.17 CPU)


From nathanhaigh at ukonline.co.uk  Fri Jan 14 02:40:25 2005
From: nathanhaigh at ukonline.co.uk (Nathan Haigh)
Date: Fri Jan 14 02:37:08 2005
Subject: [Bioperl-l] bioperl-1.5.0 RC2
In-Reply-To: <!~!UENERkVCMDkAAQACAAAAAAAAAAAAAAAAABgAAAAAAAAA+EsXQZcrCEGeBpZF7/IE7sKAAAAQAAAA70C8CSz8HE+Um35+x7+wmwEAAAAA@ukonline.co.uk>
Message-ID: <!~!UENERkVCMDkAAQACAAAAAAAAAAAAAAAAABgAAAAAAAAA+EsXQZcrCEGeBpZF7/IE7sKAAAAQAAAAr/e3IYbai02qWH6fTwFzqwEAAAAA@ukonline.co.uk>

I ran the tests again and they pretty much completed without error! I say "pretty much" because perl sometimes crashes on a test
(which obviously results in that test failing), but running this/these tests separately using "perl -Mblib t\<test.t>" resulted in
the test completing successfully.

Therefore, everything ok for me!

Nathan

> -----Original Message-----
> From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Nathan Haigh
> Sent: 13 January 2005 08:56
> To: 'Jason Stajich'; 'Bioperl list'
> Subject: RE: [Bioperl-l] bioperl-1.5.0 RC2
> 
> ......
> t/GOR4.......................ok 3/13Can't call method "start" on an undefined value at t/GOR4.t line 80, <GEN2> line 1.
> t/GOR4.......................dubious
>         Test returned status 76 (wstat 19456, 0x4c00)
> DIED. FAILED test 7
>         Failed 1/13 tests, 92.31% okay
> t/GOterm.....................ok
> .......
> t/HNN........................FAILED tests 7, 12
>         Failed 2/13 tests, 84.62% okay
> .......
> t/Sopma......................FAILED tests 7-8, 14
>         Failed 3/15 tests, 80.00% okay
> .......
> Failed Test Stat Wstat Total Fail  Failed  List of Failed
> -------------------------------------------------------------------------------
> t/GOR4.t      76 19456    13    1   7.69%  7
> t/HNN.t                   13    2  15.38%  7 12
> t/Sopma.t                 15    3  20.00%  7-8 14
> 2 subtests skipped.
> 
> ~~~~~~~
> WinXP Pro v5.1.2600 Service Pack 1 Build 2600
> ~~~~~~~~
> This is perl, v5.8.0 built for MSWin32-x86-multi-thread
> (with 1 registered patch, see perl -V for more detail)
> 
> Copyright 1987-2002, Larry Wall
> 
> Binary build 804 provided by ActiveState Corp. http://www.ActiveState.com
> Built 23:15:13 Dec  1 2002
> 
> If you need a hand working these problems out give me a shout!
> Nathan
> 
> 
> > -----Original Message-----
> > From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Jason Stajich
> > Sent: 12 January 2005 20:14
> > To: Bioperl list
> > Subject: [Bioperl-l] bioperl-1.5.0 RC2
> >
> > In preparation for Bioperl 1.5.0 developer release I have put up
> > Release Candidate 2.
> >
> >   http://bioperl.org/DIST/bioperl-1.5.0-RC2.tar.gz
> >   http://bioperl.org/DIST/bioperl-1.5.0-RC2.tar.bz2
> >   http://bioperl.org/DIST/bioperl-1.5.0-RC2.zip
> >
> >
> > We need people to test on this.  So download, run
> >   perl Makefile.PL
> >   make
> >   make test
> >
> > Let us know what breaks.  I've tested on OS X and few different linux
> > installs with different auxiliary modules installed.  Would be nice to
> > have a few more combinations of OS, perl versions, and suite of modules
> > installed before we make a release.
> >
> > Thanks for your help.
> > -jason
> > --
> > Jason Stajich
> > jason.stajich at duke.edu
> > http://www.duke.edu/~jes12/
> >


From razi at genet.sickkids.on.ca  Fri Jan 14 03:52:06 2005
From: razi at genet.sickkids.on.ca (Razi Khaja)
Date: Fri Jan 14 03:48:29 2005
Subject: [Bioperl-l] bioperl-1.5.0 RC2
In-Reply-To: <77987E0F-65E2-11D9-A1F6-000393C44276@duke.edu>
Message-ID: <20050114085206.67889.qmail@web51601.mail.yahoo.com>

 --- Jason Stajich <jason.stajich@duke.edu> wrote: 
> Thanks Razi.
> 
> Tests were succeeding for me with Graph 0.20105-1 - when I upgraded to  
> Graph 0.52 it also worked. I am running perl 5.8.3 though so don't know  
> what is the problem with compatibility.  Do you have the same problems  
> with Graph 0.52?

Yes, I upgraded to Graph 0.52, ... same errors (installed by perl -MCPAN
-eshell).
I downgraded to Graph 0.20105 and no problems (installed by making port
/usr/ports/math/p5-Graph on FreeBSD5.3)

This may only be a BSD/OSX problem ... might have to wait for Graph.pm 0.52
to get ported to BSD to run with the newest version of Graph.pm

Summary of 'make test' with BIOPERLDEBUG=1 (on FreeBSD5.3, perl5.8.5,
Graph.pm ver 0.20105):
...
t/Variation_IO...............ok
t/WABA.......................ok
t/XEMBL_DB...................ok
All tests successful, 2 subtests skipped.
Files=193, Tests=8956, 600 wallclock secs (310.09 cusr + 30.23 csys =
340.32 CPU)

> Going to need someone who can reproduce the bug to debug and fix.
> 
> Since this is a developer release I am not going to hold out on this  
> part too much.  We'll try and get it closed out, otherwise release  
> which ship with some tests turned off.
> 
> I would like to do the release on this coming Monday if possible.

Great!

> -jason


=====
/**
 * Razi Khaja, Bioinformatics Analyst
 * The Hospital for Sick Children, Toronto
 * The Centre for Applied Genomics, www.tcag.ca
 * Tel 416-813-7032, Fax 416-813-8319
 */
From letondal at pasteur.fr  Fri Jan 14 04:29:54 2005
From: letondal at pasteur.fr (Catherine Letondal)
Date: Fri Jan 14 04:23:47 2005
Subject: [Bioperl-l] Getting started with Bio::Perl
In-Reply-To: <41E6AEC5.2050302@york.ac.uk>
References: <41E6AEC5.2050302@york.ac.uk>
Message-ID: <D96A6BD9-660E-11D9-84F0-000D93B0BD32@pasteur.fr>


On Jan 13, 2005, at 6:24 PM, Kat Hull wrote:

> *Dear Users,
> I have a newbie question!  I am interested in the following module 
> 'Bio::Tools::Run::PiseApplication::codonw' but really don't
> know how to start to use it.  I have looked at the documentation etc 
> but am confused about how to pass my array of sequences to
> the module and then how to call the individual functions to perform 
> the calculations (e.g. gc, cai, fop...).
>
> Does anyone have a simple script showing how to run this module with 
> the input as an array of fasta format sequences?
> Many thanks,
>
> Kat
> *

Hi again,

As Jason already answered, first look at 
Bio::Tools::Run::PiseApplication where there is an example of use. You 
can also look at the examples/pise directory.
Regarding the parameters of a specific program (codonw), I suggest first
looking at the interactive service
(http://bioweb.pasteur.fr/seqanal/interfaces/codonw.html), since it is
exactly the same as the one that is run through BioPerl. You will
get a better understanding of the available values and the output files
of interest (which differ from one program to another).
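Regarding passing an array of sequences: many Pise-wrapped programs take a sequence file as input, and in BioPerl the usual way to write one is Bio::SeqIO with the 'fasta' format. As a dependency-free illustration of that format only, the hypothetical helper below turns [id, sequence] pairs into multi-FASTA text with 60-residue lines. Whether codonw expects exactly this input, and under which parameter name, is an assumption to check against the Pise interface.

```perl
use strict;
use warnings;

# Hypothetical helper: format [id, sequence] pairs as multi-FASTA text,
# wrapping each sequence at 60 residues per line.
sub to_fasta {
    my @records = @_;
    my $out = '';
    for my $rec (@records) {
        my ($id, $seq) = @$rec;
        $out .= ">$id\n";
        # Take successive chunks of up to 60 characters.
        $out .= "$1\n" while $seq =~ /(.{1,60})/g;
    }
    return $out;
}

my @seqs = (['gene1', 'ATGAAACCCGGGTTTTAA'], ['gene2', 'ATG' x 30]);
print to_fasta(@seqs);
```

Redirecting this output to a file gives a multi-FASTA file that can be handed to the program as its sequence input.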

--
Catherine Letondal -- Institut Pasteur

From jurgen.pletinckx at algonomics.com  Fri Jan 14 07:21:29 2005
From: jurgen.pletinckx at algonomics.com (Jurgen Pletinckx)
Date: Fri Jan 14 06:55:03 2005
Subject: [Bioperl-l] bioperl-1.5.0 RC2
In-Reply-To: <891FA5D7-64D6-11D9-A0F3-000393C44276@duke.edu>
Message-ID: <NDBBLIFFBJPKKEBCNAEHCEHMIAAA.jurgen.pletinckx@algonomics.com>

> perl -v

This is perl, v5.6.1 built for IP27-irix
...

> uname -a
IRIX64 deepskyblue 6.5 10181058 IP27

> make test

t/ESEfinder..................error is 0
t/ESEfinder..................ok
        10/12 skipped: tests which require remote servers - set env variable
BIOPERLDEBUG to test

t/Genewise...................ok
        2/51 skipped:

t/RestrictionIO..............FAILED test 10
        Failed 1/14 tests, 92.86% okay

t/SeqIO......................XML::DOM::XPath not found - skipping interpro tests
t/SeqIO......................ok

t/simpleGOparser.............ok 88/101Use of uninitialized value in hash element
at /xlv2/users/jpletinc/00Perl/bioperl-1.5-rc2/bioperl-1.5.0-RC2/blib/lib/Bio/Ontology/OntologyStore.pm line 263, <GEN3> line 11.
Use of uninitialized value in hash element at
/xlv2/users/jpletinc/00Perl/bioperl-1.5-rc2/bioperl-1.5.0-RC2/blib/lib/Bio/Ontology/OntologyStore.pm line 263, <GEN3> line 11.

t/simpleGOparser.............ok
t/tutorial...................ok 18/21Use of uninitialized value in print at
/xlv2/users/jpletinc/00Perl/bioperl-1.5-rc2/bioperl-1.5.0-RC2/blib/lib/bptutorial.pl line 4039, <GEN21> line 934.
t/tutorial...................ok

t/XEMBL_DB...................SOAP::Lite and/or XML::DOM not installed. This
means that Bio::DB::XEMBL module is not usable. Skipping tests.
t/XEMBL_DB...................ok

Failed Test       Stat Wstat Total Fail  Failed  List of Failed
-------------------------------------------------------------------------------
t/RestrictionIO.t               14    1   7.14%  10
114 subtests skipped.
Failed 1/193 test scripts, 99.48% okay. 1/8956 subtests failed, 99.99% okay.
*** Error code 11 (bu21)


More specific:
> perl -w -Iblib/lib t/RestrictionIO.t
1..14
ok 1
ok 2
ok 3
ok 4
ok 5
ok 6
ok 7
ok 8
ok 9
not ok 10
# Test 10 got: '9' (t/RestrictionIO.t at line 53)
#    Expected: '11'
ok 11
ok 12
ok 13
ok 14

The 'error is 0' message with ESEfinder seems to indicate correct test
completion.


(Thanks, Jason!)

--
Jurgen Pletinckx
AlgoNomics NV

From danielucgbioinfo at yahoo.com.br  Fri Jan 14 07:30:14 2005
From: danielucgbioinfo at yahoo.com.br (Danielucg Sousa)
Date: Fri Jan 14 07:26:49 2005
Subject: [Bioperl-l] Method image_and_map
Message-ID: <20050114123014.21358.qmail@web53505.mail.yahoo.com>

Hi,

I would like to use the image_and_map method of class
Bio::Graphics::Panel, but I get this message: Can't
locate object method "image_and_map" via package
"Bio::Graphics::Panel".

I'm using Bioperl 1.4, and I saw this method in
bioperl-live and Bioperl 1.5 but not in Bioperl 1.4. Where
can I get these versions? Or how can I use
image_and_map?

Thanks,
Daniel 

Part of my code:

my $panel = Bio::Graphics::Panel->new(-length       => $seq->length,
                                      -width        => 1000,
                                      -pad_left     => 10,
                                      -pad_right    => 10,
                                      -key_color    => 'white',
                                      -key_spacing  => 15,
                                      -key_style    => 'bottom',
                                      -spacing      => -0.25,
                                      -box_subparts => 'true',
                                     );

# add tracks before rendering, otherwise they are not in the image
$panel->add_track($wholeseq, -glyph  => 'arrow',
                             -bump   => +1,
                             -double => 1,
                             -tick   => 2);

my ($url, $map, $mapname) = $panel->image_and_map(-root => '/cgi-bin',
                                                  -url  => '/tmpimages');


From Marc.Logghe at devgen.com  Fri Jan 14 07:49:37 2005
From: Marc.Logghe at devgen.com (Marc Logghe)
Date: Fri Jan 14 07:46:08 2005
Subject: [Bioperl-l] Method image_and_map
Message-ID: <BEE28BF86078B6429D6C780635718E219050DC@morelia.be.devgen.com>

Hi,

> I would like to use method image_and_map of classe
> Bio::Graphics::Panel, but I have this menssage : Can't
> locate object method "image_and_map" via package
> "Bio::Graphics::Panel". 
> 
> I'm using Bioperl 1.4, and I saw this method in
> Biopel-live and Bioperl 1.5 e not on Bioperl 1.4, but

That is correct, it was introduced in Bio::Graphics::Panel revision 1.74. Meaning after bioperl-release-1-4-0 which contained 1.70.
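To check whether the installed Bio::Graphics::Panel already provides image_and_map before calling it, Perl's built-in `can` method is enough. The helper `class_supports` below is a hypothetical sketch, not part of BioPerl:

```perl
use strict;
use warnings;

# Hypothetical helper: true (1) if $class can be loaded and provides
# $method, directly or through inheritance; false (0) otherwise.
sub class_supports {
    my ($class, $method) = @_;
    (my $file = "$class.pm") =~ s{::}{/}g;   # Bio::Graphics::Panel -> Bio/Graphics/Panel.pm
    return 0 unless eval { require $file; 1 };
    return $class->can($method) ? 1 : 0;
}

if (class_supports('Bio::Graphics::Panel', 'image_and_map')) {
    print "image_and_map is available\n";
}
else {
    print "image_and_map is not available in this installation\n";
}
```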

> where I get these version? Or how I do for use
> image_and_map?

Bioperl 1.5 RC 2 can be downloaded from http://news.open-bio.org/archives/2005_01.html#000073

HTH,
Marc

From paulo.david at netvisao.pt  Fri Jan 14 08:00:54 2005
From: paulo.david at netvisao.pt (Paulo Almeida)
Date: Fri Jan 14 08:08:41 2005
Subject: [Bioperl-l] ProtDist with Phylip 3.6
Message-ID: <10517.193.137.94.3.1105707654.squirrel@193.137.94.3>

I can't get ProtDist running, with Bioperl-run 1.4 and Phylip 3.6. I tried
setting PHYLIPVERSION=3.6, because I saw that on the mailing list, but
it didn't work. The test (perl -I. -w t/ProtDist.t) returns "Protdist
program not found". The phylip executable is at /usr/bin . Should there be
a protdist executable too? I don't have that.

-Paulo Almeida

From senger at ebi.ac.uk  Fri Jan 14 08:42:58 2005
From: senger at ebi.ac.uk (Martin Senger)
Date: Fri Jan 14 08:39:25 2005
Subject: [Bioperl-l] Re: Bio::Biblio
In-Reply-To: <200501061710.j06H8RKu023694@portal.open-bio.org>
Message-ID: <Pine.LNX.4.44.0501141334520.8271-100000@bagheera.ebi.ac.uk>

> Since two weeks before Christmas, I have a problem to fetch Articles.
>
   I am sorry about the delayed reply: I was away, and just as I left,
the disk here crashed :-(.
   The service is now back up and running... but...
   ...for some citations there may be errors caused by a bug in the
underlying conversion between HTML and XML. In other words, some returned
XML may not be valid (because some characters are not properly
escaped). I will fix it (and let you know) as soon as I get a response from
our SRS team.  Again, sorry for the delay...

   With regards,
   Martin


From nathanhaigh at ukonline.co.uk  Fri Jan 14 10:24:22 2005
From: nathanhaigh at ukonline.co.uk (Nathan Haigh)
Date: Fri Jan 14 10:20:58 2005
Subject: [Bioperl-l] ProtDist with Phylip 3.6
In-Reply-To: <10517.193.137.94.3.1105707654.squirrel@193.137.94.3>
Message-ID: <!~!UENERkVCMDkAAQACAAAAAAAAAAAAAAAAABgAAAAAAAAA+EsXQZcrCEGeBpZF7/IE7sKAAAAQAAAAOOEv5LX0nE+V1pSYXIGiiAEAAAAA@ukonline.co.uk>

Phylip is a suite of programs which include around 35 executables, things such as:
Consense
Contml
Drawgram
Drawtree
Neighbor
Proml
Protdist
Protpars
Treedist

Maybe the downloaded file wasn't extracted? I can't think why else you would have a single file called phylip.
Nathan

> -----Original Message-----
> From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Paulo Almeida
> Sent: 14 January 2005 13:01
> To: bioperl-l@portal.open-bio.org
> Subject: [Bioperl-l] ProtDist with Phylip 3.6
> 
> I can't get ProtDist running, with Bioperl-run 1.4 and Phylip 3.6. I tried
> setting PHYLIPVERSION = 3.6 , because I saw that on the mailing list, but
> it didn't work. The test (perl -I. -w t/ProtDist.t) returns "Protdist
> program not found". The phylip executable is at /usr/bin . Should there be
> a protdist executable too? I don't have that.
> 
> -Paulo Almeida
> 


From paulo.david at netvisao.pt  Fri Jan 14 10:40:08 2005
From: paulo.david at netvisao.pt (Paulo Almeida)
Date: Fri Jan 14 12:21:32 2005
Subject: [Bioperl-l] ProtDist with Phylip 3.6
In-Reply-To: <!~!UENERkVCMDkAAQACAAAAAAAAAAAAAAAAABgAAAAAAAAA+EsXQZcrCEGeBpZF7/IE7s
	KAAAAQAAAAOOEv5LX0nE+V1pSYXIGiiAEAAAAA@ukonline.co.uk>
References: <10517.193.137.94.3.1105707654.squirrel@193.137.94.3>
	<!~!UENERkVCMDkAAQACAAAAAAAAAAAAAAAAABgAAAAAAAAA+EsXQZcrCEGeBpZF7/IE7sKAAAAQAAAAOOEv5LX0nE+V1pSYXIGiiAEAAAAA@ukonline.co.uk>
Message-ID: <32086.193.137.94.3.1105717208.squirrel@193.137.94.3>

Thanks, I just found the protdist executable, and the script works fine. I
installed phylip with a packaging program, and it put all the other
executables in /usr/lib/phylip ...
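Since the packaged install put protdist in /usr/lib/phylip rather than on the PATH, a small search over candidate directories can confirm where the executables ended up. This is a hypothetical sketch: it tries $PHYLIPDIR first (the environment variable BioPerl-run's PHYLIP wrappers are commonly pointed at; check the wrapper documentation for your version), then /usr/lib/phylip, then every directory on the PATH.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: return the full path of the first executable called
# $name found in @dirs, or undef if there is none.
sub find_executable {
    my ($name, @dirs) = @_;
    for my $dir (@dirs) {
        my $path = File::Spec->catfile($dir, $name);
        return $path if -f $path && -x _;   # reuse the stat from -f
    }
    return;
}

my @dirs = grep { defined $_ && length $_ }
           ($ENV{PHYLIPDIR}, '/usr/lib/phylip', File::Spec->path);
my $protdist = find_executable('protdist', @dirs);
print defined $protdist ? "found: $protdist\n" : "protdist not found\n";
```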

-Paulo


> Phylip is a suite of programs which include around 35 executables, things
> such as:
> Consense
> Contml
> Drawgram
> Drawtree
> Neighbor
> Proml
> Protdist
> Protpars
> Treedist
>
> Maybe the downloaded file wasn't extracted??? I can't think why else you
> might have a file called phylip?
> Nathan

From talcon at iastate.edu  Fri Jan 14 13:30:38 2005
From: talcon at iastate.edu (Tim Alcon)
Date: Fri Jan 14 13:27:04 2005
Subject: [Bioperl-l] t/data for bptutorial for windows
Message-ID: <41E80FCE.8000900@iastate.edu>

I downloaded ActivePerl 5.6.1 on Windows XP and used ppm to install 
bioperl and bundle-bioperl.  When I tried to run bptutorial.pl, it 
complained about not having IO::String, which I then installed.  My 
current problem is that now when I try to run bptutorial.pl, it 
complains about not finding stuff in folder t/data.  I checked the 
directory and tried reinstalling bioperl, but there's still no folder 
called t/data, which apparently contains data necessary for the examples 
in bptutorial.pl.  If anyone could help me with this, I'd appreciate 
it.  Thanks.

Tim
From gyang at plantbio.uga.edu  Fri Jan 14 14:01:29 2005
From: gyang at plantbio.uga.edu (Guojun Yang)
Date: Fri Jan 14 13:57:47 2005
Subject: [Bioperl-l] regular expression help?
In-Reply-To: <41E80FCE.8000900@iastate.edu>
Message-ID: <20050114140129.68047175@dogwood.plantbio.uga.edu>

Hi, Everybody,
I was trying to use a regex to recognize a pattern of inverted-repeat DNA sequence flanked by direct repeats (see below), but it returns errors saying "(?{...}) not terminated or {...} not balanced". Can anybody help me sort this out?
The regex I have is:
$regex =~ /\S+(\S+)(\S{10}).*(??{$rev=reverse(\2 =~ tr/ATCG/TAGC/i);})\1.*/i;
Thank you,
Yang
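A minimal sketch of one way to express this match. It assumes the target layout is: a direct repeat (4 nt here), a 10-nt arm, a spacer, the reverse complement of the arm, then the same direct repeat again. Inside a `(??{ ... })` block you are writing ordinary Perl, where `\2` means a reference to the number 2, so the captures have to be read as `$1` and `$2`. Also, `tr///` does not accept `/i`, it returns a count rather than the translated string, and applying it to `$2` would try to modify a read-only value. For those reasons the reverse complement is computed in a separate helper. The helper names `revcomp` and `find_flanked_ir`, the 4-nt repeat length and the test sequence are illustrative assumptions.

```perl
use strict;
use warnings;

# Reverse complement of a DNA string (upper- and lower-case ACGT).
sub revcomp {
    my ($s) = @_;
    my $rc = reverse $s;            # scalar context: reverses the characters
    $rc =~ tr/ACGTacgt/TGCAtgca/;   # complement in place
    return $rc;
}

# Hypothetical helper: 4-nt direct repeat, 10-nt arm, shortest spacer,
# reverse complement of the arm, then the same direct repeat again.
# (??{ ... }) runs the code during matching and matches its return value
# as a sub-pattern at that point; captures are read as $1/$2 inside it.
# Returns (direct_repeat, arm, match_start, match_end) or an empty list.
sub find_flanked_ir {
    my ($seq) = @_;
    return unless $seq =~ /([ACGT]{4})([ACGT]{10})[ACGT]*?(??{ revcomp($2) })\1/;
    return ($1, $2, $-[0], $+[0]);
}

my $seq = 'TTAG' . 'GATTACAGGC' . 'AAAAAAAA' . 'GCCTGTAATC' . 'TTAG';
if (my ($dr, $arm, $start, $end) = find_flanked_ir($seq)) {
    print "direct repeat $dr, arm $arm, match $start..$end\n";
}
```

For this test sequence the script prints `direct repeat TTAG, arm GATTACAGGC, match 0..36`. The lazy `[ACGT]*?` makes the engine try the shortest spacer first.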


From gyang at plantbio.uga.edu  Fri Jan 14 14:12:46 2005
From: gyang at plantbio.uga.edu (Guojun Yang)
Date: Fri Jan 14 14:09:28 2005
Subject: [Bioperl-l] regular expression help!
Message-ID: <20050114141246.94c7cb46@dogwood.plantbio.uga.edu>

Hi, Everybody,
I was trying to use a regex to recognize a pattern of inverted-repeat DNA sequence flanked by direct repeats (see below), but it returns errors saying "(?{...}) not terminated or {...} not balanced". Can anybody help me sort this out?
The regex I have is:
$regex =~ /\S+(\S+)(\S{10}).*(??{$rev=reverse(\2 =~ tr/ATCG/TAGC/i);})\1.*/i;
Thank you,
Yang


From brian_osborne at cognia.com  Fri Jan 14 14:33:51 2005
From: brian_osborne at cognia.com (Brian Osborne)
Date: Fri Jan 14 14:30:59 2005
Subject: [Bioperl-l] t/data for bptutorial for windows
In-Reply-To: <41E80FCE.8000900@iastate.edu>
Message-ID: <GAEDKMGOKFBLJPKCLKCCIEKBEHAA.brian_osborne@cognia.com>

Tim,

I'm not sure where that t/data directory ends up when you use ppm to install
Bioperl but it's in there somewhere. You'll need to find it and execute
bptutorial.pl from within the directory containing t/. A bit awkward, yes.

Brian O.

-----Original Message-----
From: bioperl-l-bounces@portal.open-bio.org
[mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Tim Alcon
Sent: Friday, January 14, 2005 1:31 PM
To: bioperl-l@portal.open-bio.org
Subject: [Bioperl-l] t/data for bptutorial for windows


I downloaded ActivePerl 5.6.1 on Windows XP and used ppm to install
bioperl and bundle-bioperl.  When I tried to run bptutorial.pl, it
complained about not having IO::String, which I then installed.  My
current problem is that now when I try to run bptutorial.pl, it
complains about not finding stuff in folder t/data.  I checked the
directory and tried reinstalling bioperl, but there's still no folder
called t/data, which apparently contains data necessary for the examples
in bptutorial.pl.  If anyone could help me with this, I'd appreciate
it.  Thanks.

Tim
_______________________________________________
Bioperl-l mailing list
Bioperl-l@portal.open-bio.org
http://portal.open-bio.org/mailman/listinfo/bioperl-l


From fernan at iib.unsam.edu.ar  Fri Jan 14 14:40:22 2005
From: fernan at iib.unsam.edu.ar (Fernan Aguero)
Date: Fri Jan 14 14:37:25 2005
Subject: [Bioperl-l] bioperl-1.5.0 RC2
In-Reply-To: <891FA5D7-64D6-11D9-A0F3-000393C44276@duke.edu>
References: <891FA5D7-64D6-11D9-A0F3-000393C44276@duke.edu>
Message-ID: <20050114194022.GC55770@iib.unsam.edu.ar>

+----[ Jason Stajich <jason.stajich@duke.edu> (12.Jan.2005 17:25):
|
| We need people to test on this.  So download, run
|  perl Makefile.PL
|  make
|  make test
| 
| Let us know what breaks. 
|
+----]

This is on FreeBSD-4.10p5 (RELENG_4_10), i386, perl v5.8.5.

A summary of the failed tests follows (does anyone know why I'm
getting percentages over 100%?)

A complete log is available at 
http://genoma.unsam.edu.ar/~fernan/freebsd/bioperl-1.5.0-RC2.tests.gz

And the list of perl modules installed in my box and their
versions is here:
http://genoma.unsam.edu.ar/~fernan/freebsd/p5-ports.txt

If some tests can be fixed just by updating perl modules,
let me know. Perl modules in FreeBSD are installed from the
ports system, so when updating the bioperl port to 1.5 we
should also make sure to update dependencies as needed.
The versions of my installed perl modules are the latest
available in the ports tree (of course there could be newer
versions in CPAN that are not yet in the FreeBSD ports tree).

Fernan


Failed Test           Stat Wstat Total Fail  Failed  List of Failed
-------------------------------------------------------------------------------
t/AlignIO.t            255 65280   152  230 151.32%  10-11 39-152
t/AlignUtil.t                       16   14  87.50%  2-15
t/AnnotationAdaptor.t  255 65280    19   36 189.47%  2-19
t/CodonTable.t         255 65280    44    6  13.64%  42-44
t/LocationFactory.t                179    1   0.56%  64
t/OntologyStore.t                    6    1  16.67%  6
t/PAML.t               255 65280   142    0   0.00%  ??
t/PopGen.t             255 65280    85  170 200.00%  1-85
t/ProtPsm.t            255 65280     5    6 120.00%  3-5
t/Registry.t           255 65280    13   11  84.62%  8-13
t/SearchIO.t           255 65280  1216   17   1.40%  170 211 264-265 469 580
                                                     582 584 598 600 643
                                                     685-686 713-714 1216
t/SeqFeature.t                     192    2   1.04%  76 81
t/SeqIO.t              255 65280   345  562 162.90%  65-345
t/Species.t            255 65280    21   22 104.76%  11-21
t/StandAloneBlast.t                 18    1   5.56%  3
t/Tree.t               255 65280    26   47 180.77%  3-26
t/TreeBuild.t          255 65280     7   14 200.00%  1-7
t/TreeIO.t             255 65280    50   43  86.00%  28-50
t/Unflattener2.t                    11    2  18.18%  7 10
t/UniGene.t                         63    1   1.59%  12
t/game.t                            23    1   4.35%  9
t/hmmer.t                          136   14  10.29%  8 13 125-136
t/primaryqual.t        255 65280    32    8  25.00%  29-32
t/psm.t                255 65280    48   78 162.50%  10-48
t/qual.t               255 65280    12    0   0.00%  ??
t/simpleGOparser.t                 101    5   4.95%  78-79 83-84 87
t/singlet.t                          3    2  66.67%  2-3
114 subtests skipped.
Failed 27/193 test scripts, 86.01% okay. 680/8956 subtests failed, 92.41% okay.
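The percentages can be reproduced from the table's own columns. Assuming the harness simply divides Fail by Total, the Failed column goes above 100% whenever a script reports more failures than it has planned tests. In this run, every such script also exited abnormally (Stat 255, Wstat 65280). The table alone does not show why those scripts report more failures than tests.

```latex
\begin{aligned}
\text{Failed}        &= 100 \times \text{Fail} / \text{Total}
                      && \text{e.g. t/AlignIO.t: } 100 \times 230/152 \approx 151.32\% \\
\text{subtests okay} &= 100 \times (1 - 680/8956) \approx 92.41\% \\
\text{scripts okay}  &= 100 \times (193 - 27)/193 \approx 86.01\%
\end{aligned}
```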
From danielucgbioinfo at yahoo.com.br  Fri Jan 14 15:22:24 2005
From: danielucgbioinfo at yahoo.com.br (Danielucg Sousa)
Date: Fri Jan 14 15:19:09 2005
Subject: [Bioperl-l] A error with Bio::Graphics::Pane, help me!!!!!
Message-ID: <20050114202224.63661.qmail@web53503.mail.yahoo.com>

HI,

I wrote code that calls the image_and_map method of the
Bio::Graphics::Panel class, and it produces this message:
> gd-png:  fatal libpng error: Image width or height is zero in IHDR
> gd-png error: setjmp returns error condition

My code up to the point of this error is:
#!/usr/bin/perl -wT

use strict;
use Bio::Graphics;
use Bio::SeqIO;
use Bio::SeqFeature::Generic;
#use CGI ':standard';
use CGI::Pretty;

my $file = '/var/www/cgi-bin/AL391145.gb';
my $io = Bio::SeqIO->new(-file=>$file);
my $seq = $io->next_seq;
#my $wholeseq = Bio::SeqFeature::Generic->new(-start=>1,-end=>$seq->length);
my @features = $seq->all_SeqFeatures;
my $q = new CGI;

# sort features by their primary tags
my %sorted_features;
for my $f (@features) {
  my $tag = $f->primary_tag;
  push @{$sorted_features{$tag}},$f;
}

print $q->header('text/html');
print $q->start_html('A Vector Rendering ');

my $panel = Bio::Graphics::Panel->new(
    -length       => $seq->length,
    -width        => 1000,
    -pad_left     => 10,
    -pad_right    => 10,
    -key_color    => 'white',
    -key_spacing  => 15,
    -key_style    => 'bottom',
    -spacing      => -0.25,
    -box_subparts => 'true',
);
my ($url,$map,$mapname) = $panel->image_and_map(
    -root => '/home/bioinfo/cgi-bin',
    -url  => '/tmpimages',
);

----
I don't know what to do. Could somebody please help me?

Bye,
Daniel


From danielucgbioinfo at yahoo.com.br  Fri Jan 14 15:25:58 2005
From: danielucgbioinfo at yahoo.com.br (Danielucg Sousa)
Date: Fri Jan 14 15:22:09 2005
Subject: [Bioperl-l] gd-png: fatal libpng error: Image width or height is
	zero in IHDR
Message-ID: <20050114202558.33598.qmail@web53509.mail.yahoo.com>

HI,

I wrote code that calls the image_and_map method of the
Bio::Graphics::Panel class, and it produces this message:
> gd-png:  fatal libpng error: Image width or height is zero in IHDR
> gd-png error: setjmp returns error condition

My code up to the point of this error is:
#!/usr/bin/perl -wT

use strict;
use Bio::Graphics;
use Bio::SeqIO;
use Bio::SeqFeature::Generic;
#use CGI ':standard';
use CGI::Pretty;

my $file = '/var/www/cgi-bin/AL391145.gb';
my $io = Bio::SeqIO->new(-file=>$file);
my $seq = $io->next_seq;
#my $wholeseq = Bio::SeqFeature::Generic->new(-start=>1,-end=>$seq->length);
my @features = $seq->all_SeqFeatures;
my $q = new CGI;

# sort features by their primary tags
my %sorted_features;
for my $f (@features) {
  my $tag = $f->primary_tag;
  push @{$sorted_features{$tag}},$f;
}

print $q->header('text/html');
print $q->start_html('A Vector Rendering ');

my $panel = Bio::Graphics::Panel->new(
    -length       => $seq->length,
    -width        => 1000,
    -pad_left     => 10,
    -pad_right    => 10,
    -key_color    => 'white',
    -key_spacing  => 15,
    -key_style    => 'bottom',
    -spacing      => -0.25,
    -box_subparts => 'true',
);
my ($url,$map,$mapname) = $panel->image_and_map(
    -root => '/home/bioinfo/cgi-bin',
    -url  => '/tmpimages',
);

----
I don't know what to do. Could somebody please help me?

Bye,
Daniel


From paulo.david at netvisao.pt  Fri Jan 14 14:53:59 2005
From: paulo.david at netvisao.pt (Paulo Almeida)
Date: Fri Jan 14 18:57:31 2005
Subject: [Bioperl-l] regular expression help?
In-Reply-To: <20050114140129.68047175@dogwood.plantbio.uga.edu>
References: <41E80FCE.8000900@iastate.edu>
	<20050114140129.68047175@dogwood.plantbio.uga.edu>
Message-ID: <15455.193.137.94.3.1105732439.squirrel@193.137.94.3>

Hi,

There is something I don't understand in your expression (actually, there
is a lot, but the rest I can't even comment on). If you have
\S+(\S+)(\S{10}) , isn't the first \S+ going to match through the whole
string?

-Paulo

> Hi, Everybody,
> I was trying to use a regex to recognize a pattern of inverted-repeat DNA
> sequence flanked by direct repeats (see below), but it returns errors saying
> "(?{...}) not terminated or {...} not balanced". Can anybody help me
> sort this out?
> The regex I have is:
> $regex =~ /\S+(\S+)(\S{10}).*(??{$rev=reverse(\2 =~
> tr/ATCG/TAGC/i);})\1.*/i;
> Thank you,
> Yang
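Paulo's question can be checked with a short test. The 16-letter string is an arbitrary assumption. The leading `\S+` does grab the whole string on its first attempt, but the engine then backtracks only as far as needed for the rest of the pattern to fit. That leaves `(\S+)` with a single character.

```perl
use strict;
use warnings;

my $s = 'ABCDEFGHIJKLMNOP';    # 16 non-space characters

# The greedy leading \S+ gives back characters only until (\S+) can take
# one and (\S{10}) can take the last ten.
if ($s =~ /\S+(\S+)(\S{10})/) {
    print "\$1 = '$1'\n";     # prints $1 = 'F'
    print "\$2 = '$2'\n";     # prints $2 = 'GHIJKLMNOP'
}
```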

From Marc.Logghe at devgen.com  Sat Jan 15 03:34:10 2005
From: Marc.Logghe at devgen.com (Marc Logghe)
Date: Sat Jan 15 03:35:43 2005
Subject: [Bioperl-l] regular expression help!
Message-ID: <BEE28BF86078B6429D6C780635718E21B3A0A8@morelia.be.devgen.com>

Hi,
In the part (??{$rev=reverse(\2 =~ tr/ATCG/TAGC/i);}) I have my doubts about the double question marks.
In case you don't want capturing braces and you want to execute code, I think it should look like:
(?:?{$rev=reverse(\2 =~ tr/ATCG/TAGC/i);})
In case you want to capture, then there is one ? too many because the syntax is '?{ code }' and not '??{ code }' for executing code in a regex.
HTH,
Marc 
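For reference, perlre documents two distinct code constructs. `(?{ code })` runs the code and succeeds without consuming any text; its value ends up in `$^R`. `(??{ code })` runs the code and then matches its return value as a sub-pattern at that position. Matching text computed from an earlier capture therefore calls for `(??{ })`. The sketch below uses an arbitrary test string, a palindrome around an X, to show the difference.

```perl
use strict;
use warnings;

my $s = 'abcXcba';

# (??{ ... }): the returned string is matched at this position,
# so "cba" must follow the X for the whole pattern to succeed.
my $postponed = ($s =~ /^(\w+)X(??{ scalar reverse $1 })$/) ? 1 : 0;

# (?{ ... }): the code runs but matches nothing, so $ fails right after X.
my $plain = ($s =~ /^(\w+)X(?{ scalar reverse $1 })$/) ? 1 : 0;

print "(??{ }) matched: $postponed\n";   # 1
print "(?{ })  matched: $plain\n";       # 0
```

`scalar` is used explicitly so that `reverse` reverses the characters of `$1` rather than a one-element list.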


-----Original Message-----
From: bioperl-l-bounces@portal.open-bio.org on behalf of Guojun Yang
Sent: Fri 1/14/2005 8:12 PM
To: bioperl-l@portal.open-bio.org
Subject: [Bioperl-l] regular expression help!
 
Hi, Everybody,
I was trying to use a regex to recognize a pattern of inverted-repeat DNA sequence flanked by direct repeats (see below), but it returns errors saying "(?{...}) not terminated or {...} not balanced". Can anybody help me sort this out?
The regex I have is:
$regex =~ /\S+(\S+)(\S{10}).*(??{$rev=reverse(\2 =~ tr/ATCG/TAGC/i);})\1.*/i;
Thank you,
Yang




From Jan.Aerts at wur.nl  Sat Jan 15 04:24:39 2005
From: Jan.Aerts at wur.nl (Aerts, Jan)
Date: Sat Jan 15 04:21:18 2005
Subject: [Bioperl-l] regular expression help!
Message-ID: <7D030487F1A3D143A76F2A1E91F570350186DB63@scomp0010>

Without taking the time to look at the actual expression (sorry): a nice aid for developing more complicated regexes is the Regex Coach (http://www.weitz.de/regex-coach/). It allows you to experiment with regex and shows you the result interactively.

Good luck,
Jan Aerts


-----Original Message-----
From:	bioperl-l-bounces@portal.open-bio.org on behalf of Marc Logghe
Sent:	Sat 15-Jan-05 09:34
To:	Guojun Yang; bioperl-l@portal.open-bio.org
Cc:	
Subject:	RE: [Bioperl-l] regular expression help!
Hi,
In the part (??{$rev=reverse(\2 =~ tr/ATCG/TAGC/i);}) I have my doubts about the double question marks.
In case you don't want capturing braces and you want to execute code, I think it should look like:
(?:?{$rev=reverse(\2 =~ tr/ATCG/TAGC/i);})
In case you want to capture, then there is one ? too many because the syntax is '?{ code }' and not '??{ code }' for executing code in a regex.
HTH,
Marc 


-----Original Message-----
From: bioperl-l-bounces@portal.open-bio.org on behalf of Guojun Yang
Sent: Fri 1/14/2005 8:12 PM
To: bioperl-l@portal.open-bio.org
Subject: [Bioperl-l] regular expression help!
 
Hi, Everybody,
I was trying to use a regex to recognize a pattern of inverted-repeat DNA sequence flanked by direct repeats (see below), but it returns errors saying "(?{...}) not terminated or {...} not balanced". Can anybody help me sort this out?
The regex I have is:
$regex =~ /\S+(\S+)(\S{10}).*(??{$rev=reverse(\2 =~ tr/ATCG/TAGC/i);})\1.*/i;
Thank you,
Yang






From Marc.Logghe at devgen.com  Sat Jan 15 06:31:59 2005
From: Marc.Logghe at devgen.com (Marc Logghe)
Date: Sat Jan 15 06:29:41 2005
Subject: [Bioperl-l] regular expression help!
Message-ID: <BEE28BF86078B6429D6C780635718E21B3A0A9@morelia.be.devgen.com>

Hi Jan !
Nice goody indeed.
But I am afraid that the extended regular expression feature ${code} is not supported by regex-coach.
Maybe this has to do with the fact that this regex feature seems to be highly experimental and might be changed or even deleted in future Perl versions.
Cheers,
Marc


-----Oorspronkelijk bericht-----
Van: bioperl-l-bounces@portal.open-bio.org namens Aerts, Jan
Verzonden: za 15-1-2005 10:24
Aan: Guojun Yang; bioperl-l@portal.open-bio.org
Onderwerp: RE: [Bioperl-l] regular expression help!
 
Without taking the time to look at the actual expression (sorry): a nice aid for developing more complicated regexes is the Regex Coach (http://www.weitz.de/regex-coach/). It allows you to experiment with regex and shows you the result interactively.

Good luck,
Jan Aerts


-----Original Message-----
From:	bioperl-l-bounces@portal.open-bio.org on behalf of Marc Logghe
Sent:	Sat 15-Jan-05 09:34
To:	Guojun Yang; bioperl-l@portal.open-bio.org
Cc:	
Subject:	RE: [Bioperl-l] regular expression help!
Hi,
In the part (??{$rev=reverse(\2 =~ tr/ATCG/TAGC/i);}) I have my doubts about the double question marks.
In case you don't want capturing braces and you want to execute code, I think it should look like:
(?:?{$rev=reverse(\2 =~ tr/ATCG/TAGC/i);})
In case you want to capture, then there is one ? too many because the syntax is '?{ code }' and not '??{ code }' for executing code in a regex.
HTH,
Marc 


-----Original Message-----
From: bioperl-l-bounces@portal.open-bio.org on behalf of Guojun Yang
Sent: Fri 1/14/2005 8:12 PM
To: bioperl-l@portal.open-bio.org
Subject: [Bioperl-l] regular expression help!
 
Hi, Everybody,
I was trying to use a regex to recognize a pattern of inverted-repeat DNA sequence flanked by direct repeats (see below), but it returns errors saying "(?{...}) not terminated or {...} not balanced". Can anybody help me sort this out?
The regex I have is:
$regex =~ /\S+(\S+)(\S{10}).*(??{$rev=reverse(\2 =~ tr/ATCG/TAGC/i);})\1.*/i;
Thank you,
Yang








From nathanhaigh at ukonline.co.uk  Sat Jan 15 09:03:13 2005
From: nathanhaigh at ukonline.co.uk (Nathan Haigh)
Date: Sat Jan 15 08:59:45 2005
Subject: [Bioperl-l] t/data for bptutorial for windows
In-Reply-To: <GAEDKMGOKFBLJPKCLKCCIEKBEHAA.brian_osborne@cognia.com>
Message-ID: <!~!UENERkVCMDkAAQACAAAAAAAAAAAAAAAAABgAAAAAAAAA+EsXQZcrCEGeBpZF7/IE7sKAAAAQAAAAiNGdi6AaOUiQrW/QMwrgnAEAAAAA@ukonline.co.uk>

Actually, t\data doesn't end up in the tar.gz file that is downloaded when using ppm to install bioperl. If you like I (or someone
else) could send you the data files and then you could proceed as per Brian's instructions.

Nathan  

> -----Original Message-----
> From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Brian Osborne
> Sent: 14 January 2005 19:34
> To: Tim Alcon; bioperl-l@portal.open-bio.org
> Subject: RE: [Bioperl-l] t/data for bptutorial for windows
> 
> Tim,
> 
> I'm not sure where that t/data directory ends up when you use ppm to install
> Bioperl but it's in there somewhere. You'll need to find it and execute
> bptutorial.pl from within the directory containing t/. A bit awkward, yes.
> 
> Brian O.
> 
> -----Original Message-----
> From: bioperl-l-bounces@portal.open-bio.org
> [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Tim Alcon
> Sent: Friday, January 14, 2005 1:31 PM
> To: bioperl-l@portal.open-bio.org
> Subject: [Bioperl-l] t/data for bptutorial for windows
> 
> 
> I downloaded ActivePerl 5.6.1 on Windows XP and used ppm to install
> bioperl and bundle-bioperl.  When I tried to run bptutorial.pl, it
> complained about not having IO::String, which I then installed.  My
> current problem is that now when I try to run bptutorial.pl, it
> complains about not finding stuff in folder t/data.  I checked the
> directory and tried reinstalling bioperl, but there's still no folder
> called t/data, which apparently contains data necessary for the examples
> in bptutorial.pl.  If anyone could help me with this, I'd appreciate
> it.  Thanks.
> 
> Tim
> 
> 
> 


From Jan.Aerts at wur.nl  Sat Jan 15 09:17:28 2005
From: Jan.Aerts at wur.nl (Aerts, Jan)
Date: Sat Jan 15 09:14:05 2005
Subject: [Bioperl-l] regular expression help!
Message-ID: <7D030487F1A3D143A76F2A1E91F570350186DB64@scomp0010>

You're right... Should have looked at the actual expression.
Idea: is it possible in this case to call subroutines from within a regex and evaluate them using the 'e' switch?

j.
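For comparison, in standard Perl the /e modifier belongs to substitution (s///) only. There it makes Perl evaluate the replacement side as code; it does not change how the pattern side matches. The sketch below uses a made-up input string and a hypothetical revcomp() helper.

```perl
use strict;
use warnings;

# Hypothetical helper: reverse complement of a DNA string.
sub revcomp {
    my ($s) = @_;
    my $rc = reverse $s;
    $rc =~ tr/ACGTacgt/TGCAtgca/;
    return $rc;
}

my $text = 'arm=GATTACAGGC';
# With /e the replacement side is Perl code: the first run of upper-case
# bases is replaced by whatever revcomp() returns for it.
(my $flipped = $text) =~ s/([ACGT]+)/revcomp($1)/e;
print "$flipped\n";    # arm=GCCTGTAATC
```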


-----Original Message-----
From:	Marc Logghe [mailto:Marc.Logghe@devgen.com]
Sent:	Sat 15-Jan-05 12:31
To:	Aerts, Jan; Guojun Yang; bioperl-l@portal.open-bio.org
Cc:	
Subject:	RE: [Bioperl-l] regular expression help!
Hi Jan !
Nice goody indeed.
But I am afraid that the extended regular expression feature ${code} is not supported by regex-coach.
Maybe this has to do with the fact that this regex feature seems to be highly experimental and might be changed or even deleted in future Perl versions.
Cheers,
Marc


-----Oorspronkelijk bericht-----
Van: bioperl-l-bounces@portal.open-bio.org namens Aerts, Jan
Verzonden: za 15-1-2005 10:24
Aan: Guojun Yang; bioperl-l@portal.open-bio.org
Onderwerp: RE: [Bioperl-l] regular expression help!
 
Without taking the time to look at the actual expression (sorry): a nice aid for developing more complicated regexes is the Regex Coach (http://www.weitz.de/regex-coach/). It allows you to experiment with regex and shows you the result interactively.

Good luck,
Jan Aerts


-----Original Message-----
From:	bioperl-l-bounces@portal.open-bio.org on behalf of Marc Logghe
Sent:	Sat 15-Jan-05 09:34
To:	Guojun Yang; bioperl-l@portal.open-bio.org
Cc:	
Subject:	RE: [Bioperl-l] regular expression help!
Hi,
In the part (??{$rev=reverse(\2 =~ tr/ATCG/TAGC/i);}) I have my doubts about the double question marks.
In case you don't want capturing braces and you want to execute code, I think it should look like:
(?:?{$rev=reverse(\2 =~ tr/ATCG/TAGC/i);})
In case you want to capture, then there is one ? too many because the syntax is '?{ code }' and not '??{ code }' for executing code in a regex.
HTH,
Marc 


-----Original Message-----
From: bioperl-l-bounces@portal.open-bio.org on behalf of Guojun Yang
Sent: Fri 1/14/2005 8:12 PM
To: bioperl-l@portal.open-bio.org
Subject: [Bioperl-l] regular expression help!
 
Hi, Everybody,
I was trying to use a regex to recognize a pattern of inverted-repeat DNA sequence flanked by direct repeats (see below), but it returns errors saying "(?{...}) not terminated or {...} not balanced". Can anybody help me sort this out?
The regex I have is:
$regex =~ /\S+(\S+)(\S{10}).*(??{$rev=reverse(\2 =~ tr/ATCG/TAGC/i);})\1.*/i;
Thank you,
Yang








From zayed.albertyn at gmail.com  Fri Jan 14 01:50:40 2005
From: zayed.albertyn at gmail.com (zayed albertyn)
Date: Sat Jan 15 10:11:22 2005
Subject: [Bioperl-l] Finding Alignment overlaps
Message-ID: <81da19f3050113225018d1c01a@mail.gmail.com>

Dear Bioperl Community

I have output from an alignment program that produces coordinates with
reference to the query sequence e.g.

3665384,3665702-1770163,1770480
3665130,3665474-3695657,3696000
3665115,3665357-1770508,1770749

Each line represents <querybegin>,<queryend>-<targetbegin>,<targetend>

I know how to add each line as a sequence feature using
Bio::SeqFeature::Generic. Is there a bioperl class or associated
method that can be used for determining possible overlaps in these
alignments?
Eventually I would like to find all overlaps and merge them if possible.

Thanks for the help,
Zayed


-- 
-----------------------------------------------------------
Zayed Albertyn
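The overlap-and-merge step can also be sketched in plain Perl without any BioPerl classes. This sketch makes three assumptions. Overlaps are judged on query coordinates only, and target coordinates are ignored. Two ranges that share at least one base count as overlapping. `parse_hits` and `merge_intervals` are hypothetical helper names, not BioPerl methods.

```perl
use strict;
use warnings;

# Hypothetical helper: turn "qbegin,qend-tbegin,tend" lines into
# [start, end] pairs on the query only (target coordinates are ignored).
sub parse_hits {
    my @hits;
    for my $line (@_) {
        next unless $line =~ /^\s*(\d+),(\d+)-(\d+),(\d+)\s*$/;
        my ($s, $e) = $1 <= $2 ? ($1, $2) : ($2, $1);   # normalise orientation
        push @hits, [$s, $e];
    }
    return @hits;
}

# Hypothetical helper: sort by start, then fold each interval into the
# previous one when they share at least one base.
sub merge_intervals {
    my @sorted = sort { $a->[0] <=> $b->[0] } @_;
    my @merged;
    for my $iv (@sorted) {
        if (@merged && $iv->[0] <= $merged[-1][1]) {
            $merged[-1][1] = $iv->[1] if $iv->[1] > $merged[-1][1];
        }
        else {
            push @merged, [@$iv];
        }
    }
    return @merged;
}

my @lines = (
    '3665384,3665702-1770163,1770480',
    '3665130,3665474-3695657,3696000',
    '3665115,3665357-1770508,1770749',
);
for my $iv (merge_intervals(parse_hits(@lines))) {
    print "$iv->[0]..$iv->[1]\n";
}
```

For the three example lines, every query range overlaps the next one, so this prints a single merged range, 3665115..3665702.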
From jason.stajich at duke.edu  Sat Jan 15 10:46:25 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Sat Jan 15 10:42:40 2005
Subject: [Bioperl-l] Finding Alignment overlaps
In-Reply-To: <81da19f3050113225018d1c01a@mail.gmail.com>
References: <81da19f3050113225018d1c01a@mail.gmail.com>
Message-ID: <9D4002D6-670C-11D9-83B1-000393C44276@duke.edu>

Bio::SeqFeature::Collection lets you efficiently extract the subsets of
features or locations that overlap, using Lincoln's binning algorithm
from Bio::DB::GFF.  It does this by storing the data in Berkeley DB
B-trees in a flat file, via the DB_File module.

-jason
On Jan 14, 2005, at 1:50 AM, zayed albertyn wrote:

> Dear Bioperl Community
>
> I have output from an alignment program that produces coordinates with
> reference to the query sequence e.g.
>
> 3665384,3665702-1770163,1770480
> 3665130,3665474-3695657,3696000
> 3665115,3665357-1770508,1770749
>
> Each line represents <querybegin>,<queryend>-<targetbegin>,<targetend>
>
> I know how to add each line as a sequence feature using
> Bio::SeqFeature::Generic. Is there a bioperl class or associated
> method that can be used for determining possible overlaps in these
> alignments?
> Eventually I would like to find all overlaps and merge them if 
> possible.
>
> Thanks for the help,
> Zayed
>
>
>
>
> -- 
> -----------------------------------------------------------
> Zayed Albertyn
>
>
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

From sac at portal.open-bio.org  Sat Jan 15 12:50:25 2005
From: sac at portal.open-bio.org (Steve Chervitz)
Date: Sat Jan 15 16:55:57 2005
Subject: [Bioperl-l] bioperl-1.5.0 RC2
In-Reply-To: <891FA5D7-64D6-11D9-A0F3-000393C44276@duke.edu>
Message-ID: <BE0E97E1.3FA2%sac@bioperl.org>

I just committed a small fix to Bio::DB::NCBIHelper, to deal with
downloading gbwithparts records.

It now allows you to deal with some GenBank nucleotide records that by
default don't contain all CDS features, such as L42023.

To force the Bio::DB object to get all the features, you can do the
following:

use Bio::DB::GenBank;
my $gb = Bio::DB::GenBank->new;
$gb->request_format('gbwithparts');
my $seq = $gb->get_Seq_by_acc('L42023');

Not sure if this is the best approach, but it seems reasonable.

Steve

> From: Jason Stajich <jason.stajich@duke.edu>
> Date: Wed, 12 Jan 2005 15:14:17 -0500
> To: Bioperl list <bioperl-l@portal.open-bio.org>
> Subject: [Bioperl-l] bioperl-1.5.0 RC2
> 
> In preparation for Bioperl 1.5.0 developer release I have put up
> Release Candidate 2.
> 
>   http://bioperl.org/DIST/bioperl-1.5.0-RC2.tar.gz
>   http://bioperl.org/DIST/bioperl-1.5.0-RC2.tar.bz2
>   http://bioperl.org/DIST/bioperl-1.5.0-RC2.zip
> 
> 
> We need people to test on this.  So download, run
>   perl Makefile.PL
>   make
>   make test
> 
> Let us know what breaks.  I've tested on OS X and a few different Linux
> installs with different auxiliary modules installed.  Would be nice to
> have a few more combinations of OS, perl versions, and suite of modules
> installed before we make a release.
> 
> Thanks for your help.
> -jason
> --
> Jason Stajich
> jason.stajich at duke.edu
> http://www.duke.edu/~jes12/
> 


From perlguy at hotmail.com  Sat Jan 15 17:50:44 2005
From: perlguy at hotmail.com (Philip Parker)
Date: Sat Jan 15 17:48:39 2005
Subject: [Bioperl-l] regular expression help!
Message-ID: <BAY13-F292EB48DAEC7D8EFBB012CAF8C0@phx.gbl>

If posting a question regarding a questionable regexp, it would be a good 
idea if you'd include a small sample of the text you're running it on. That 
way one can get a better idea of what you're after and possibly help you 
create a more efficient regexp. Those first \S+ have me worried.

Philip Parker -  perlguy@hotmail.com


From rob at salmonella.org  Sat Jan 15 18:22:23 2005
From: rob at salmonella.org (Rob Edwards)
Date: Sat Jan 15 18:18:43 2005
Subject: [Bioperl-l] GFF3
Message-ID: <4FC537A9-674C-11D9-9C9B-000A959E1622@salmonella.org>

Because I need it for some things that I am doing, I have worked quite 
a bit on the GFF3 parser Bio::FeatureIO::gff. Several people have 
written this module; I have just made some cosmetic changes:

I have improved the validation processes that are applied as a gff3 
file is parsed, and the module should now validate essentially 
everything in the file except alignments. Validation is optional and is 
based on the specification described at:
http://song.sourceforge.net/gff3.shtml

For clarification and edification I have created a couple of tables 
describing the module and the validation that is applied to GFF3 files, 
which you can see online: http://www.salmonella.org/bioperl/gff3.html

I also wrote a Bio::SeqIO::gff module. Since gff3 files can hold 
sequences, it seems that you'd want to be able to call the next_seq 
methods, and therefore SeqIO is more appropriate than FeatureIO for 
those aspects. Currently the SeqIO module uses the FeatureIO module for 
parsing the file, it just reorganizes things.

This provides two different interfaces for getting objects out of GFF3 
files:
	Bio::FeatureIO::gff will return Bio::SeqFeature::Annotated objects 
representing the annotations.
	Bio::SeqIO::gff will return Bio::Seq objects representing the 
sequences with all the annotations attached.

The other difference between the two is that the former passes out the 
objects as they are read, but the latter has to read the whole file to 
get the annotations and the sequences.

So far I have focused on reading GFF3 files.

I have not committed these to cvs yet, pending comments from others. I 
have some specific questions:
	Should I wait until after 1.5 is out?
	Is two separate modules really the right way to go about this?
	What about other GFF modules (like Bio::Tools::GFF)?
	Could someone give the modules a workout and let me know about bugs? I 
am sure there are many.

I have posted these modules online via anonymous ftp at 
ftp://ftp.salmonella.org/rob/bioperl/GFF_modules.tgz
Take a look and let me know what you do and don't like!

Rob

From george_titus6 at yahoo.com  Sun Jan 16 01:29:33 2005
From: george_titus6 at yahoo.com (george titus)
Date: Sun Jan 16 09:45:23 2005
Subject: [Bioperl-l] Drawing Chromosomes
Message-ID: <20050116062933.1054.qmail@web52210.mail.yahoo.com>

Hi,
Please help me draw ideograms. I got the data from NCBI; which module should I use? I am confused by the data. Please help.
 
 
From lstein at cshl.edu  Sat Jan 15 22:14:56 2005
From: lstein at cshl.edu (Lincoln Stein)
Date: Sun Jan 16 09:45:55 2005
Subject: [Bioperl-l] A error with Bio::Graphics::Pane, help me!!!!!
In-Reply-To: <20050114202224.63661.qmail@web53503.mail.yahoo.com>
References: <20050114202224.63661.qmail@web53503.mail.yahoo.com>
Message-ID: <200501151914.56616.lstein@cshl.edu>

This happens when you try to create an image with height or width of 
zero.  Check to make sure that your sequence has positive length.

Lincoln

On Friday 14 January 2005 12:22 pm, Danielucg Sousa wrote:
> HI,
>
> I wrote code that calls the image_and_map method of the
> Bio::Graphics::Panel class, and it produces this message:
> > gd-png:  fatal libpng error: Image width or height is zero in IHDR
> > gd-png error: setjmp returns error condition
>
> My code up to the point of this error is:
> #!/usr/bin/perl -wT
>
> use strict;
> use Bio::Graphics;
> use Bio::SeqIO;
> use Bio::SeqFeature::Generic;
> #use CGI ':standard';
> use CGI::Pretty;
>
> my $file = '/var/www/cgi-bin/AL391145.gb';
> my $io = Bio::SeqIO->new(-file=>$file);
> my $seq = $io->next_seq;
> #my $wholeseq = Bio::SeqFeature::Generic->new(-start=>1,-end=>$seq->length);
> my @features = $seq->all_SeqFeatures;
> my $q = new CGI;
>
> # sort features by their primary tags
> my %sorted_features;
> for my $f (@features) {
>   my $tag = $f->primary_tag;
>   push @{$sorted_features{$tag}},$f;
> }
>
> print $q->header('text/html');
> print $q->start_html('A Vector Rendering ');
>
> my $panel = Bio::Graphics::Panel->new(
>     -length       => $seq->length,
>     -width        => 1000,
>     -pad_left     => 10,
>     -pad_right    => 10,
>     -key_color    => 'white',
>     -key_spacing  => 15,
>     -key_style    => 'bottom',
>     -spacing      => -0.25,
>     -box_subparts => 'true',
> );
> my ($url,$map,$mapname) = $panel->image_and_map(
>     -root => '/home/bioinfo/cgi-bin',
>     -url  => '/tmpimages',
> );
>
> ----
> I don't know what to do. Could somebody please help me?
>
> Bye,
> Daniel
>
>
>
>
>
>

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724

NOTE: Please copy Sandra Michelsen <michelse@cshl.edu> on
all emails regarding scheduling and other time-critical topics.
From robinxml at yahoo.com  Sun Jan 16 01:20:05 2005
From: robinxml at yahoo.com (Robin XML)
Date: Sun Jan 16 09:45:57 2005
Subject: [Bioperl-l] bioperl
Message-ID: <20050116062005.37768.qmail@web30103.mail.mud.yahoo.com>

Dear Sir,
I am a beginner in bioinformatics, and I am excited by your fantastic
BioPerl functions. But some questions confuse me:
1. Is it possible to call BioPerl functions from Java under Windows? I
need a GUI, and I need Java to handle XML template modification.
2. Is it correct that with Bio::DB::GenBank and Bio::SeqIO I can get
the full GenBank data in XML format? Does that include the features
part?


Thank you!!!!!!

Best regards,
Robin


From corenth at gmail.com  Sun Jan 16 09:53:55 2005
From: corenth at gmail.com (Willy West)
Date: Sun Jan 16 09:50:03 2005
Subject: [Bioperl-l] regular expression help!
In-Reply-To: <7D030487F1A3D143A76F2A1E91F570350186DB65@scomp0010>
References: <7D030487F1A3D143A76F2A1E91F570350186DB65@scomp0010>
Message-ID: <4f10f19405011606531737d90@mail.gmail.com>

oops- i'd forgotten to "reply to all" with this... i apologize.


On Sun, 16 Jan 2005 11:13:45 +0100, Aerts, Jan <Jan.Aerts@wur.nl> wrote:
> The problem is (or I might be missing something here) that he wants to _test_ a regex. It's not possible to write something like
> $_ =~ /(.*)(.*)foo(\2)(.*)/e
> I think...
> 
> jan.

now i'm trying to do this with the test regex and am not successful :(
this is an interesting problem and i really would love to find a way..

one solution would be to explode the whole thing into another
subroutine... but if that's not what you want, i'm not yet sure how to
do it.

good challenge though.....

:)

> 
> 
> -----Original Message-----
> From:   Willy West [mailto:corenth@gmail.com]
> Sent:   Sun 16-Jan-05 00:09
> To:     Aerts, Jan
> Cc:
> Subject:        Re: [Bioperl-l] regular expression help!
> On Sat, 15 Jan 2005 15:17:28 +0100, Aerts, Jan <Jan.Aerts@wur.nl> wrote:
> > You're right... Should have looked at the actual expression.
> > Idea: is it possible in this case to call subroutines from within a regex and evaluate them using the 'e' switch?
> 
> if i recall::
> 
> sub foo {
>     return 'hello genome';
> }
>
> my $data = "ih ho hum bababa";
>
> $data =~ s/ih/foo()/e;   # one way to do it: /e evaluates the replacement as Perl code
>
> print "$data\n";         # prints "hello genome ho hum bababa"
> 
> seems to work..


-- 
Willy
http://www.hackswell.com/corenth
From senger at ebi.ac.uk  Sun Jan 16 11:29:37 2005
From: senger at ebi.ac.uk (Martin Senger)
Date: Sun Jan 16 11:25:54 2005
Subject: [Bioperl-l] Re: Bio::Biblio
In-Reply-To: <Pine.LNX.4.44.0501141334520.8271-100000@bagheera.ebi.ac.uk>
Message-ID: <Pine.OSF.4.21.0501161628200.746707-100000@ice.ebi.ac.uk>

Hi again,

>    ...for some citations there may be some errors caused by the bug in the
> underlying conversion between html and xml. In other words, some returned
> XML may not be valid (because some characters there are not properly
> escaped). I will fix it (and let you know) as soon as I get response from
> our SRS team.
> 
   This was fixed over this weekend. You should no longer get back
invalid XML entries.

   With regards,
   Martin

-- 
Martin Senger

EMBL Outstation - Hinxton                Senger@EBI.ac.uk     
European Bioinformatics Institute        Phone: (+44) 1223 494636      
Wellcome Trust Genome Campus             (Switchboard:     494444)
Hinxton                                  Fax  : (+44) 1223 494468
Cambridge CB10 1SD
United Kingdom                           http://industry.ebi.ac.uk/~senger

From mlemieux at bioinfo.ca  Sun Jan 16 14:00:45 2005
From: mlemieux at bioinfo.ca (Madeleine Lemieux)
Date: Sun Jan 16 13:57:00 2005
Subject: [Bioperl-l] regular expression help!
Message-ID: <ED08F16B-67F0-11D9-8237-000A95B139D2@bioinfo.ca>

I'm not sure if this is the sort of thing you mean:

#!/usr/bin/perl -w
use strict;

my @test_strings = ("acgttgcaacgt", "acgtacgt", "acgttgca", "ata");

foreach my $seq (@test_strings) {
     # force case change here, if necessary
     # skip strings without a match; otherwise $1/$2 would be stale or undef
     next unless $seq =~ /([acgt]+)(?=([acgt]+)\1)/;
     my $fwd = $1;
     (my $rev = $2) =~ tr/acgt/tgca/;
     if ($fwd eq $rev) {
         print $seq, ' ', $fwd, ' ', $2, "\n";
     }
}

HTH,
Madeleine

> Hi, Everybody,
> I was trying to use a regex recognizing a pattern of inverted repeat
> DNA seq flanked by direct repeats (see below), but it returns errors
> saying "(?{...}) not terminated or {...} not balanced". Can anybody
> help me sort this out?
> The regex I have is:
> $regex =~ /\S+(\S+)(\S{10}).*(??{$rev=reverse(\2 =~ tr/ATCG/TAGC/i);})\1.*/i;
> Thank you,
> Yang
>
>
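Yang's pattern most likely fails to compile because the regex is delimited by `/`, so the first `/` of `tr/ATCG/TAGC/` ends the pattern in the middle of the `(??{ ... })` block. Two further problems: `\2` is a backreference that only works in the pattern itself (inside the Perl code block the capture is `$2`), and `tr///` returns a count of characters changed, not the translated string. Below is a sketch, not a drop-in replacement (the original also looked for flanking direct repeats): it moves the reverse complement into a helper sub and uses `(??{ ... })` to require the reverse complement of a captured 10-base arm later in the string. The names `revcomp` and `find_inverted_repeat` are illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Reverse complement of a DNA string (case is preserved).
sub revcomp {
    my $s = reverse shift;
    $s =~ tr/ACGTacgt/TGCAtgca/;
    return $s;
}

# Find a 10-base arm whose reverse complement occurs later in the string.
# Inside (??{ ... }) the first capture is $1 (not \1), and the block returns
# the text that must match next; quotemeta keeps it literal.
# Returns (arm, start offset of the match) or an empty list.
sub find_inverted_repeat {
    my ($seq) = @_;
    return unless $seq =~ /(\S{10})\S*?(??{ quotemeta revcomp($1) })/;
    return ($1, $-[0]);
}
```

For example, `find_inverted_repeat('ACGTTAGCCA' . 'TTTTT' . 'TGGCTAACGT')` returns the arm `ACGTTAGCCA` at offset 0. The lazy `\S*?` makes the engine try the shortest spacer first for each candidate arm.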

From senger at ebi.ac.uk  Mon Jan 17 06:42:24 2005
From: senger at ebi.ac.uk (Martin Senger)
Date: Mon Jan 17 06:38:41 2005
Subject: [Bioperl-l] Re: Bio::Biblio
In-Reply-To: <Pine.OSF.4.21.0501161628200.746707-100000@ice.ebi.ac.uk>
Message-ID: <Pine.OSF.4.21.0501171137190.1151967-100000@mozart.ebi.ac.uk>

Hi once again,

   There is one thing that is different from the previous citations
returned in XML format: now the returned XML starts with the full XML
declaration. Something like this:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE MedlineCitationSet
PUBLIC "-//NLM//DTD NLM Medline, 1st November 2004//EN"
"http://www.nlm.nih.gov/databases/dtd/nlmmedline_041101.dtd">

   This probably does not break any of your code - but it may (e.g. when
your code adds the XML declarations on its own).
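If your code does add its own declaration, one option is to strip the new prolog from each returned citation before combining them. A minimal sketch with a hypothetical helper `strip_xml_prolog`, assuming the DOCTYPE has no internal subset (no `[...]`), as in the example above; a real XML parser would be more robust.

```perl
use strict;
use warnings;

# Hypothetical helper: remove a leading XML declaration and DOCTYPE from a
# citation string, e.g. before wrapping several citations in a document
# that already has its own declaration.
sub strip_xml_prolog {
    my ($xml) = @_;
    $xml =~ s/\A\s*<\?xml\b[^>]*\?>\s*//;   # <?xml version=... ?>
    $xml =~ s/\A<!DOCTYPE\b[^>]*>\s*//;     # <!DOCTYPE ...> (no internal subset)
    return $xml;
}
```

Because `[^>]*` also matches newlines, the multi-line DOCTYPE shown above is removed as a whole; input without a prolog is returned unchanged.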

   With regards,
   Martin

-- 
Martin Senger

EMBL Outstation - Hinxton                Senger@EBI.ac.uk     
European Bioinformatics Institute        Phone: (+44) 1223 494636      
Wellcome Trust Genome Campus             (Switchboard:     494444)
Hinxton                                  Fax  : (+44) 1223 494468
Cambridge CB10 1SD
United Kingdom                           http://industry.ebi.ac.uk/~senger

From Peter.Robinson at t-online.de  Mon Jan 17 06:06:02 2005
From: Peter.Robinson at t-online.de (Peter Robinson)
Date: Mon Jan 17 08:33:52 2005
Subject: [Bioperl-l] Entrez Gene and bioperl-db
In-Reply-To: <Pine.OSX.4.58.0501101732020.407@skerryvore.dhcp.lbl.gov>
References: <2ED9C47A-5898-11D9-AC01-000A959EB4C4@gmx.net>
	<1104792001.3186.17.camel@localhost.localdomain>
	<0F5A3AE4-5DDA-11D9-AA3C-000393C44276@duke.edu>
	<1104871954.3102.24.camel@localhost.localdomain>
	<1DA5FD5C-5E94-11D9-9C0C-000393C44276@duke.edu>
	<1105044266.3084.27.camel@localhost.localdomain>
	<Pine.OSX.4.58.0501101732020.407@skerryvore.dhcp.lbl.gov>
Message-ID: <1105959962.21808.15.camel@localhost.localdomain>

Hi list,

here's an update on Entrez Gene. 
1) NCBI apparently does not have plans to offer the files in XML format
for FTP download. It is possible to download the files in XML format
from the website, even including the files for the entire species with
corresponding queries (although I haven't tried this yet). It seems this
might be too complicated for many users and there could be issues of
stability for browsers downloading files of that size.


2) I have completed two reasonably simple modules for parsing gene_info
and gene2accession using the SeqIO interface. These are attached
together with simple demo programs. These modules can be used to do some
useful things. For instance, we often want to generate a list of
correspondences between NCBI accession numbers and MGI accession numbers
so as to be able to use MGI's Gene Ontology annotations for the mouse. I
have included a script (accession2mgi.pl) that uses the above modules to
parse gene_info and gene2accession to do this (you need to use both
files).

3) In the meantime I have also gotten a lex/yacc parser in C working on
the species-specific Gene files (by far the most interesting files in
the Entrez Gene system). In principle this approach could be
done in Perl -- straightforward but a lot of detail work. I will be
needing this kind of thing for my work, so I will continue to work on
this, and once it is bug-free in C I will think about ways of porting it
to Bioperl (this might take a while). As I mentioned before on this
list, if anybody else can do this more quickly please go ahead (but drop
me a line); on the other hand, collaborators who like the idea of
writing a grammar in the style of lex/yacc or ANTLR are also welcome.
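The attached modules were scrubbed from the archive, so as a rough illustration of the join described in item 2, here is a plain-Perl sketch that does not use the attached SeqIO modules. It is not Peter's code: the function name `accession_to_mgi` is hypothetical, and the column positions (GeneID in column 2 of both files, dbXrefs in column 6 of gene_info, RNA accession in column 4 of gene2accession) are assumptions to check against the README on the NCBI FTP site.

```perl
use strict;
use warnings;

# Sketch only: map RNA accessions to MGI IDs by joining NCBI's gene_info and
# gene2accession files on GeneID. ASSUMED tab-separated layout (0-based):
#   gene_info:      [1] GeneID, [5] dbXrefs, e.g. "MGI:1234|Ensembl:..."
#   gene2accession: [1] GeneID, [3] RNA nucleotide accession.version
sub accession_to_mgi {
    my ($gene_info_fh, $gene2acc_fh) = @_;

    # Pass 1: GeneID -> MGI ID, from the dbXrefs column of gene_info.
    my %mgi_for_gene;
    while (my $line = <$gene_info_fh>) {
        next if $line =~ /^#/;            # skip the header line
        chomp $line;
        my @f = split /\t/, $line;
        next unless defined $f[5];
        my ($mgi) = grep { /^MGI:/ } split /\|/, $f[5];
        $mgi_for_gene{ $f[1] } = $mgi if defined $mgi;
    }

    # Pass 2: accession -> MGI ID, via the GeneID shared by both files.
    my %mgi_for_acc;
    while (my $line = <$gene2acc_fh>) {
        next if $line =~ /^#/;
        chomp $line;
        my ($gene_id, $acc) = (split /\t/, $line)[1, 3];
        next unless defined $gene_id && defined $acc;
        next if $acc eq '-';              # '-' marks an empty field
        next unless exists $mgi_for_gene{$gene_id};
        $mgi_for_acc{$acc} = $mgi_for_gene{$gene_id};
    }
    return \%mgi_for_acc;
}
```

Reading gene_info into a hash first keeps memory proportional to the number of genes that have an MGI cross-reference, while the much larger gene2accession file is streamed line by line.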

--peter


On Tue, 2005-01-11 at 02:33, Chris Mungall wrote: 
> Hi Peter
> 
> Have you tried asking NCBI to make XML available as well as ASN? In
> general they seem keen to offer both for most of their datasets. If not, I
> believe the NCBI toolkit has an ASN->XML converter.
> 
> Cheers
> Chris
> 
> On Thu, 6 Jan 2005, Peter Robinson wrote:
> 
> > Dear Bioperlers,
> >
> > I have started looking at writing some modules to parse the new Entrez
> > gene, which is kind of an expanded LocusLink. The really interesting
> > files are species specific and are in the ASN.1 format, and I am still
> > experimenting around with the best way of parsing them. To get started,
> > I am looking at the tab-delimited flat files. It seems to me that it
> > would be interesting to be able to parse gene_info and gene2accession
> > using the Bio::SeqIO system; the other files, such as gene2unigene, seem
> > less suited for this (the latter has just two entries which could be
> > parsed ad hoc easily enough).
> >
> > In any case, I am sending a proposed module Bio::SeqIO::geneinfo.pm as
> > well as a test script (which contains a small excerpt of gene_info in
> > the data section) for comments and criticism to the list. I am presently
> > working on another module for Bio::SeqIO::gene2accession and plan to
> > write a demo script using both modules to convert NCBI accession numbers
> > to MGI accession numbers (which is something one might want to do in
> > order to use Gene Ontology for affymetrix data, although one needs
> > additional work for probesets which are only related to ESTs).
> >
> > For the moment it seemed better to just parse in the NCBI taxon id into
> > the Bio::Species object (only this info is supplied by gene_info), and
> > expect users who need the information to use the taxonomy support of
> > other Bioperl modules in their scripts.
> >
> > I will continue to work on parsing the species specific ASN.1 files, but
> > I will be trying a combination of lex/yacc/C to do this. If that works I
> > will look into trying perl support for lex/yacc for potential use in
> > Bioperl, but since I am not sure how long this will take me, I do not
> > want to scare off anyone else who would like to give this a shot.
> >
> > best,
> > peter
> >
> >
> > On Tue, 2005-01-04 at 22:03, Jason Stajich wrote:
> > > On Jan 4, 2005, at 3:52 PM, Peter Robinson wrote:
> > >
> > > > Hi Jason,
> > > >
> > > > thanks for the advice. It seems as if the documentation of
> > > > Bio::DB::Taxonomy is a bit out of sync.
> > > >  my $db = new Bio::DB::Taxonomy(-source => 'flatfile',
> > > >                                  -nodesfile => $nodesfile,
> > > >                                  -namesfile => $namefile);
> > > > What does 'flatfile' refer to here? It is not apparent upon looking at
> > > > the code for new.
> > > >
> > > See Bio::DB::Taxonomy::flatfile for more information.  As I mentioned
> > > in the mail I sent, flatfile is for using the taxonomy DB downloaded
> > > from NCBI.  This lets you run it locally using an indexed (BerkeleyDB
> > > via DB_File) version of the file.
> > >
> > > You may need the most up-to-date version of the modules - it works fine
> > > for me for both the entrez and flatfile code, but you may have to
> > > upgrade off of the 1.4.0 release. Code from CVS or the bioperl-1.5 RC1
> > > code should work fine.
> > >
> > >
> > >
> > > > I had somewhat better luck using the entrez version, but I got a
> > > > pretty amusing error message:
> > > >
> > > > MSG: can't create a species object for Homo sapiens (human) because it
> > > > isn't a species but is a '' instead
> > > >
> > > > ###
> > > > Full error and a dump of the script follow:
> > > >
> > > > my $db = new Bio::DB::Taxonomy(-source => 'entrez'); #
> > > > my $taxaid = $db->get_taxonid('Homo sapiens');
> > > > my $species = $db->get_Taxonomy_Node(-taxonid => '9606');
> > > > print Dumper($species);
> > > >
> > > > ###
> > > >
> > > > Use of uninitialized value in string eq at
> > > > /usr/local/share/perl/5.8.4/Bio/DB/Taxonomy/entrez.pm line 192.
> > > > Use of uninitialized value in sprintf at
> > > > /usr/local/share/perl/5.8.4/Bio/DB/Taxonomy/entrez.pm line 201.
> > > >
> > > > -------------------- WARNING ---------------------
> > > > MSG: can't create a species object for Homo sapiens (human) because it
> > > > isn't a species but is a '' instead
> > > > ---------------------------------------------------
> > > > Use of uninitialized value in string eq at
> > > > /usr/local/share/perl/5.8.4/Bio/DB/Taxonomy/entrez.pm line 192.
> > > > Use of uninitialized value in sprintf at
> > > > /usr/local/share/perl/5.8.4/Bio/DB/Taxonomy/entrez.pm line 201.
> > > >
> > > > -------------------- WARNING ---------------------
> > > > MSG: can't create a species object for Homo sapiens (human) because it
> > > > isn't a species but is a '' instead
> > > > ---------------------------------------------------
> > > > $VAR1 = {
> > > >           'TaxId' => '9606',
> > > >           'Division' => 'mammals',
> > > >           'GeneNumber' => '32775',
> > > >           'Rank' => 'species',
> > > >           'ProtNumber' => '247791',
> > > >           'ScientificName' => 'Homo sapiens',
> > > >           'CommonName' => 'human',
> > > >           'NucNumber' => '9025800',
> > > >           'GenNumber' => '25',
> > > >           'StructNumber' => '5638'
> > > >         };
> > > > peter@anna:~/programs/bioperlTest$
> > > >
> > > >
> > > > --best, peter
> > > >
> > > > On Mon, 2005-01-03 at 23:51, Jason Stajich wrote:
> > > >> Bio::DB::Taxonomy is the factory code - it is pretty easy to get a
> > > >> species object (or equivalent) using this code.  But you cannot (or
> > > >> could not when I wrote this, not sure of the current status) get the
> > > >> full classification from the NCBI taxonomy retrieval via cgi, i.e.
> > > >> you can only get genus and species for a taxon id, and I don't know
> > > >> how to walk up the hierarchy using the web API.  Earlier emails to
> > > >> NCBI seemed to indicate this is all they intended to provide, but I'm
> > > >> not sure what the current status is.
> > > >>
> > > >>   my $db = new Bio::DB::Taxonomy(-source => 'entrez'); # use NCBI Entrez over HTTP
> > > >>   my $taxaid = $db->get_taxonid('Homo sapiens');
> > > >>   my $taxonnode = $db->get_Taxonomy_Node(-taxonid => '9606');
> > > >>
> > > >> You can get the full classification if you use the
> > > >> Bio::DB::Taxonomy::flatfile factory, which requires you to have
> > > >> downloaded the taxonomy db flatfile from NCBI.  Since this is more
> > > >> reliable (and faster), it is what I have tended to use for grouping
> > > >> sets of seqDB search results, etc.
> > > >>
> > > >> -jason
> > > >> On Jan 3, 2005, at 5:40 PM, Peter Robinson wrote:
> > > >>
> > > >>> Hi Bioperlers, hi Hilmar,
> > > >>>
> > > >>> after some thinking I have embarked on a lex/yacc parser for the
> > > >>> Entrez Gene ASN.1 format as the path of least resistance, although
> > > >>> I am not sure how that would fit into BioPerl. If anyone is
> > > >>> interested in this (or has a better idea of how to go about it),
> > > >>> please drop me a line.
> > > >>>
> > > >>> In the meantime I have been looking at writing code to parse some
> > > >>> of the "easy" Entrez Gene documents, starting off with gene_info.
> > > >>> This file includes the NCBI taxon id for each entry. I would like to
> > > >>> convert this to a Bio::Species object to pass to the following:
> > > >>> 	my $seq = $self->sequence_factory->create(
> > > >>> 			     -verbose => $self->verbose(),
> > > >>> 			     -accession_number => $geneID,
> > > >>> 			     -desc => $description,
> > > >>> 			     -display_id => $symbol,
> > > >>> 			     -species =>  ???
> > > >>> 			     -annotation => $ann);
> > > >>>
> > > >>> and saw the Bio::Taxonomy::FactoryI code, which appears to want to do
> > > >>> this sort of thing. However, the code for that is pretty preliminary.
> > > >>> Is anyone working on this at the moment? Or is there a better way of
> > > >>> doing this? (It seems a shame not to provide the actual species name
> > > >>> if one has the taxid...)
> > > >>>
> > > >>> best
> > > >>>
> > > >>> Peter
> > > >>>
> > > >>>
> > > >>>
> > > >>> On Tue, 2004-12-28 at 07:17, Hilmar Lapp wrote:
> > > >>>> Great to hear that someone is giving this a shot. Yes, at this point
> > > >>>> it appears that NCBI is only offering the ASN.1, not a conversion to
> > > >>>> XML. Their asn2xml tool will not work with this ASN.1 format either;
> > > >>>> I just checked it to be sure. They do seem to be mulling the option
> > > >>>> of XML, though, on the Gene FAQ. Maybe if enough people get in their
> > > >>>> ears they will spend some effort towards that. After all, the Entrez
> > > >>>> Gene web interface can display XML on demand - even though it looks
> > > >>>> fairly hideous.
> > > >>>>
> > > >>>> There is no ASN.1 support in bioperl at all. Also, ASN.1 support in
> > > >>>> perl is actually thin - the only module I could find is
> > > >>>> Convert::ASN1, and its version 0.18 is two years old ... doesn't
> > > >>>> make me feel warm and fuzzy.
> > > >>>>
> > > >>>> In the absence of any XML available from NCBI, gene_info might be
> > > >>>> the best start. An option could be to check for the presence of the
> > > >>>> other tab-delimited files and use those that are present. Since
> > > >>>> these are tab-delimited, the format itself is trivial, so you can
> > > >>>> focus entirely on setting up a Bio::Seq plus annotation that's
> > > >>>> comparable/compatible to what the current SeqIO::locuslink does.
> > > >>>>
> > > >>>> My $0.02 (worth less and less almost every day).
> > > >>>>
> > > >>>> 	-hilmar
> > > >>>>
> > > >>>> On Thursday, December 23, 2004, at 10:51  AM, Peter Robinson wrote:
> > > >>>>
> > > >>>>> Hi,
> > > >>>>>
> > > >>>>> I have been thinking about giving a BioPerl Entrez Gene parser a try,
> > > >>>>> since I have been a heavy user of LocusLink to date. One issue is
> > > >>>>> that the files that correspond to LL_tmpl (which was a flat file)
> > > >>>>> are now in ASN.1 format:
> > > >>>>> http://www.ncbi.nlm.nih.gov/entrez/query/static/help/genehelp.html#query
> > > >>>>> Although I saw some mention of ASN support in Bioperl by googling,
> > > >>>>> I can't seem to find any module that does this in the present
> > > >>>>> distribution. What is the status on that? In any case, I will be
> > > >>>>> working on this in the next month or two, and if anything nice comes
> > > >>>>> of it I will send it to you / BioPerl.
> > > >>>>>
> > > >>>>> best wishes & happy holidays
> > > >>>>>
> > > >>>>> Peter
> > > >>>>>
> > > >>>>> On Tue, 2004-12-14 at 09:00, Hilmar Lapp wrote:
> > > >>>>>> Since load_seqdatabase.pl will use bioperl's SeqIO parsers for
> > > >>>>>> parsing any input file, what you're asking is whether or not there
> > > >>>>>> is a SeqIO parser for NCBI Gene.
> > > >>>>>>
> > > >>>>>> The answer to that question is no, not yet. Anybody who feels
> > > >>>>>> motivated is welcome to give it a try ... Since I'll need it, I'll
> > > >>>>>> write the parser if nobody else does within the next 3 months, but
> > > >>>>>> I'm not going to promise when exactly this will happen.
> > > >>>>>>
> > > >>>>>> 	-hilmar
> > > >>>>>>
> > > >>>>>> On Monday, December 13, 2004, at 08:03  AM, Law, Annie wrote:
> > > >>>>>>
> > > >>>>>>> Hi,
> > > >>>>>>>
> > > >>>>>>> I was wondering, with regard to the bioperl-db scripts, schema,
> > > >>>>>>> and load_seqdatabase.pl: has there been preparation for
> > > >>>>>>> integration of Entrez Gene information when LocusLink is phased
> > > >>>>>>> out? Or, if it has already been changed, could somebody point me
> > > >>>>>>> to the documentation or changed code?
> > > >>>>>>>
> > > >>>>>>> Thanks,
> > > >>>>>>> Annie.
> > > >>>>> --
> > > >>>>> Peter N. Robinson
> > > >>>>> peter.robinson@t-online.de
> > > >>>>> peter.robinson@charite.de
> > > >>>>> http://www.charite.de/ch/medgen/robinson/
> > > >>>>>
> > > >>>>>
> > > >>> --
> > > >>> Peter N. Robinson
> > > >>> peter.robinson@t-online.de
> > > >>> peter.robinson@charite.de
> > > >>> http://www.charite.de/ch/medgen/robinson/
> > > >>>
> > > >> --
> > > >> Jason Stajich
> > > >> jason.stajich at duke.edu
> > > >> http://www.duke.edu/~jes12/
> > > > --
> > > > Peter N. Robinson
> > > > peter.robinson@t-online.de
> > > > peter.robinson@charite.de
> > > > http://www.charite.de/ch/medgen/robinson/
> > > >
> > > >
> > > --
> > > Jason Stajich
> > > jason.stajich at duke.edu
> > > http://www.duke.edu/~jes12/
> > >
> >

-- 
Peter N. Robinson
peter.robinson@t-online.de
peter.robinson@charite.de
http://www.charite.de/ch/medgen/robinson/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: accession2mgi.pl
Type: application/x-perl
Size: 2507 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050117/2c5de30b/accession2mgi-0001.bin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: gene2accession.pm
Type: application/x-perl
Size: 8148 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050117/2c5de30b/gene2accession-0001.bin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: gene2accession_test.pl
Type: application/x-perl
Size: 5968 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050117/2c5de30b/gene2accession_test-0001.bin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: geneinfo.pm
Type: application/x-perl
Size: 10515 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050117/2c5de30b/geneinfo-0001.bin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: geneinfotest.pl
Type: application/x-perl
Size: 11225 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050117/2c5de30b/geneinfotest-0001.bin
From sdavis2 at mail.nih.gov  Mon Jan 17 09:09:37 2005
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Mon Jan 17 09:08:03 2005
Subject: [Bioperl-l] Entrez Gene and bioperl-db
References: <2ED9C47A-5898-11D9-AC01-000A959EB4C4@gmx.net><1104792001.3186.17.camel@localhost.localdomain><0F5A3AE4-5DDA-11D9-AA3C-000393C44276@duke.edu><1104871954.3102.24.camel@localhost.localdomain><1DA5FD5C-5E94-11D9-9C0C-000393C44276@duke.edu><1105044266.3084.27.camel@localhost.localdomain><Pine.OSX.4.58.0501101732020.407@skerryvore.dhcp.lbl.gov>
	<1105959962.21808.15.camel@localhost.localdomain>
Message-ID: <009101c4fc9e$2dcb7f80$7d75f345@WATSON>

Peter,

Thanks for doing all this!

Just a bit more on an update.  I checked with some folks in our (NHGRI) 
bioinformatics core.  It sounds like the closest thing to XML that NCBI 
might offer would be an ASN.1 to XML converter and NOT the xml files, as 
Peter already stated.  They have one (like for public consumption) that 
works for each ASN.1 file except for the gene files.  There is no definite 
date for completion as far as I know.  They have also mentioned a bulk ASN.1 
to XML web-based tool, but I agree with Peter that this will have 
significant limitations for "online" use for large datasets like 
human/mouse/rat (but might work well with a user agent).

Sean

----- Original Message ----- 
From: "Peter Robinson" <Peter.Robinson@t-online.de>
To: "Bioperl list" <bioperl-l@portal.open-bio.org>
Cc: "Peter Robinson" <Peter.Robinson@t-online.de>
Sent: Monday, January 17, 2005 6:06 AM
Subject: Re: [Bioperl-l] Entrez Gene and bioperl-db


> Hi list,
>
> here's an update on Entrez Gene.
> 1) NCBI apparently does not have plans to offer the files in XML format
> for FTP download. It is possible to download the files in XML format
> from the website, even including the files for the entire species with
> corresponding queries (although I haven't tried this yet). It seems this
> might be too complicated for many users and there could be issues of
> stability for browsers downloading files of that size.
>
>
> 2) I have completed two reasonably simple modules for parsing gene_info
> and gene2accession using the SeqIO interface. These are attached
> together with simple demo programs. These modules can be used to do some
> useful things. For instance, we often want to generate a list of
> correspondences between NCBI accession numbers and MGI accession numbers
> so as to be able to use MGI's Gene Ontology annotations for the mouse. I
> have included a script (accession2mgi.pl) that uses the above modules to
> parse gene_info and gene2accession to do this (you need to use both
> files).
>
> 3) In the meantime I have also gotten a lex/yacc parser in C working on
> the species-specific Gene files (by far the most interesting files in
> the Entrez Gene system). In principle this approach could be
> done in Perl -- straightforward but a lot of detail work. I will be
> needing this kind of thing for my work, so I will continue to work on
> this, and once it is bug-free in C I will think about ways of porting it
> to Bioperl (this might take a while). As I mentioned before on this
> list, if anybody else can do this more quickly please go ahead (but drop
> me a line); on the other hand, collaborators who like the idea of
> writing a grammar in the style of lex/yacc or ANTLR are also welcome.
>
> --peter
>
>
> On Tue, 2005-01-11 at 02:33, Chris Mungall wrote:
>> Hi Peter
>>
>> Have you tried asking NCBI to make XML available as well as ASN? In
>> general they seem keen to offer both for most of their datasets. If not, I
>> believe the NCBI toolkit has an ASN->XML converter.
>>
>> Cheers
>> Chris
>>
>> On Thu, 6 Jan 2005, Peter Robinson wrote:
>>
>> > Dear Bioperlers,
>> >
>> > I have started looking at writing some modules to parse the new Entrez
>> > gene, which is kind of an expanded LocusLink. The really interesting
>> > files are species specific and are in the ASN.1 format, and I am still
>> > experimenting around with the best way of parsing them. To get started,
>> > I am looking at the tab-delimited flat files. It seems to me that it
>> > would be interesting to be able to parse gene_info and gene2accession
>> > using the Bio::SeqIO system; the other files, such as gene2unigene, seem
>> > less suited for this (the latter has just two entries which could be
>> > parsed ad hoc easily enough).
>> >
>> > In any case, I am sending a proposed module Bio::SeqIO::geneinfo.pm as
>> > well as a test script (which contains a small excerpt of gene_info in
>> > the data section) for comments and criticism to the list. I am presently
>> > working on another module for Bio::SeqIO::gene2accession and plan to
>> > write a demo script using both modules to convert NCBI accession numbers
>> > to MGI accession numbers (which is something one might want to do in
>> > order to use Gene Ontology for affymetrix data, although one needs
>> > additional work for probesets which are only related to ESTs).
>> >
>> > For the moment it seemed better to just parse in the NCBI taxon id into
>> > the Bio::Species object (only this info is supplied by gene_info), and
>> > expect users who need the information to use the taxonomy support of
>> > other Bioperl modules in their scripts.
>> >
>> > I will continue to work on parsing the species specific ASN.1 files, but
>> > I will be trying a combination of lex/yacc/C to do this. If that works I
>> > will look into trying perl support for lex/yacc for potential use in
>> > Bioperl, but since I am not sure how long this will take me, I do not
>> > want to scare off anyone else who would like to give this a shot.
>> >
>> > best,
>> > peter
>> >
>> >
>> > On Tue, 2005-01-04 at 22:03, Jason Stajich wrote:
>> > > On Jan 4, 2005, at 3:52 PM, Peter Robinson wrote:
>> > >
>> > > > Hi Jason,
>> > > >
>> > > > thanks for the advice. It seems as if the documentation of
>> > > > Bio::DB::Taxonomy is a bit out of sync.
>> > > >  my $db = new Bio::DB::Taxonomy(-source => 'flatfile',
>> > > >                                  -nodesfile => $nodesfile,
>> > > >                                  -namesfile => $namefile);
>> > > > What does 'flatfile' refer to here? It is not apparent upon looking at
>> > > > the code for new.
>> > > >
>> > > See Bio::DB::Taxonomy::flatfile for more information.  As I mentioned
>> > > in the mail I sent, flatfile is for using the taxonomy DB downloaded
>> > > from NCBI.  This lets you run it locally using an indexed (BerkeleyDB
>> > > via DB_File) version of the file.
>> > >
>> > > You may need the most up-to-date version of the modules - it works fine
>> > > for me for both the entrez and flatfile code, but you may have to
>> > > upgrade off of the 1.4.0 release. Code from CVS or the bioperl-1.5 RC1
>> > > code should work fine.
>> > >
>> > >
>> > >
>> > > > I had somewhat better luck using the entrez version, but I got a
>> > > > pretty amusing error message:
>> > > >
>> > > > MSG: can't create a species object for Homo sapiens (human) because it
>> > > > isn't a species but is a '' instead
>> > > >
>> > > > ###
>> > > > Full error and a dump of the script follow:
>> > > >
>> > > > my $db = new Bio::DB::Taxonomy(-source => 'entrez'); #
>> > > > my $taxaid = $db->get_taxonid('Homo sapiens');
>> > > > my $species = $db->get_Taxonomy_Node(-taxonid => '9606');
>> > > > print Dumper($species);
>> > > >
>> > > > ###
>> > > >
>> > > > Use of uninitialized value in string eq at
>> > > > /usr/local/share/perl/5.8.4/Bio/DB/Taxonomy/entrez.pm line 192.
>> > > > Use of uninitialized value in sprintf at
>> > > > /usr/local/share/perl/5.8.4/Bio/DB/Taxonomy/entrez.pm line 201.
>> > > >
>> > > > -------------------- WARNING ---------------------
>> > > > MSG: can't create a species object for Homo sapiens (human) because it
>> > > > isn't a species but is a '' instead
>> > > > ---------------------------------------------------
>> > > > Use of uninitialized value in string eq at
>> > > > /usr/local/share/perl/5.8.4/Bio/DB/Taxonomy/entrez.pm line 192.
>> > > > Use of uninitialized value in sprintf at
>> > > > /usr/local/share/perl/5.8.4/Bio/DB/Taxonomy/entrez.pm line 201.
>> > > >
>> > > > -------------------- WARNING ---------------------
>> > > > MSG: can't create a species object for Homo sapiens (human) because
>> > > > it isn't a species but is a '' instead
>> > > > ---------------------------------------------------
>> > > > $VAR1 = {
>> > > >           'TaxId' => '9606',
>> > > >           'Division' => 'mammals',
>> > > >           'GeneNumber' => '32775',
>> > > >           'Rank' => 'species',
>> > > >           'ProtNumber' => '247791',
>> > > >           'ScientificName' => 'Homo sapiens',
>> > > >           'CommonName' => 'human',
>> > > >           'NucNumber' => '9025800',
>> > > >           'GenNumber' => '25',
>> > > >           'StructNumber' => '5638'
>> > > >         };
>> > > > peter@anna:~/programs/bioperlTest$
>> > > >
>> > > >
>> > > > --best, peter
>> > > >
>> > > > On Mon, 2005-01-03 at 23:51, Jason Stajich wrote:
>> > > >> Bio::DB::Taxonomy is the factory code - it is pretty easy to get a
>> > > >> species object (or equivalent) using this code.  But you cannot (or
>> > > >> could not when I wrote this, not sure of the current status) get the
>> > > >> full classification from the NCBI taxonomy retrieval via cgi, i.e.
>> > > >> you can only get genus and species for a taxon id and I don't know
>> > > >> how to walk up the hierarchy using the web API.  Earlier emails to
>> > > >> NCBI seemed to indicate this is all they intended to provide, but not
>> > > >> sure what the current status is.
>> > > >>
>> > > >>   my $db = new Bio::DB::Taxonomy(-source => 'entrez'); # use NCBI Entrez over HTTP
>> > > >>   my $taxaid = $db->get_taxonid('Homo sapiens');
>> > > >>   my $taxonnode = $db->get_Taxonomy_Node(-taxonid => '9606');
>> > > >>
>> > > >> You can get the full classification if you use the
>> > > >> Bio::DB::Taxonomy::flatfile factory which requires you to have
>> > > >> downloaded the taxonomy db flatfile from NCBI.  Since this is more
>> > > >> reliable (and faster) it is what I have tended to use for grouping
>> > > >> sets of seqDB search results, etc.
>> > > >>
>> > > >> -jason
>> > > >> On Jan 3, 2005, at 5:40 PM, Peter Robinson wrote:
>> > > >>
>> > > >>> Hi Bioperlers, hi Hilmar,
>> > > >>>
>> > > >>> after some thinking I have embarked on a lex/yacc parser for the
>> > > >>> Entrez Gene ASN.1 format as the way of least resistance, although I
>> > > >>> am not sure how that would fit into BioPerl. If anyone is interested
>> > > >>> in this (or has a better idea of how to go about it..), please drop
>> > > >>> me a line.
>> > > >>>
>> > > >>> In the meantime I have been looking at writing code to parse some of
>> > > >>> the "easy" Entrez gene documents, starting off with gene_info. This
>> > > >>> file includes the NCBI taxon id for each entry. I would like to
>> > > >>> convert this to a Bio::Species object to pass to the following
>> > > >>> my $seq = $self->sequence_factory->create(
>> > > >>>      -verbose => $self->verbose(),
>> > > >>>      -accession_number => $geneID,
>> > > >>>      -desc => $description,
>> > > >>>      -display_id => $symbol,
>> > > >>>      -species =>  ???
>> > > >>>      -annotation => $ann);
>> > > >>>
>> > > >>> and saw the Bio::Taxonomy::FactoryI code, which appears to want to
>> > > >>> do this sort of thing. However, the code for that is pretty
>> > > >>> preliminary. Is anyone working on this at the moment? Or is there a
>> > > >>> better way of doing this (it seems a shame not to provide the actual
>> > > >>> species name if one has the taxid...)
>> > > >>>
>> > > >>> best
>> > > >>>
>> > > >>> Peter
>> > > >>>
>> > > >>>
>> > > >>>
>> > > >>> On Tue, 2004-12-28 at 07:17, Hilmar Lapp wrote:
>> > > >>>> Great to hear that someone is giving this a shot. Yes, at this point
>> > > >>>> it appears that NCBI is only offering the ASN.1, not a conversion to
>> > > >>>> XML. Their asn2xml tool will not work with this ASN.1 format either;
>> > > >>>> I just checked it to be sure. They do seem to be mulling the option
>> > > >>>> of XML though on the Gene FAQ. Maybe if enough people get in their
>> > > >>>> ears they will spend some effort towards that. After all, the entrez
>> > > >>>> gene web interface can display XML on demand - even though it looks
>> > > >>>> fairly hideous.
>> > > >>>>
>> > > >>>> There is no ASN.1 support in bioperl at all. Also, ASN.1 support in
>> > > >>>> perl is actually thin - there is Convert::ASN1 at version 0.18 two
>> > > >>>> years ago that I could find ... doesn't make me feel warm and fuzzy.
>> > > >>>>
>> > > >>>> In the absence of any XML available from NCBI, gene_info might be
>> > > >>>> the best start. An option could be to check for the presence of the
>> > > >>>> other tab-delimited files and use those that are present. These are
>> > > >>>> tab-delimited and hence the format itself is trivial so you can
>> > > >>>> focus entirely on setting up a Bio::Seq plus annotation that's
>> > > >>>> comparable/compatible to what the current SeqIO::locuslink does.
>> > > >>>>
>> > > >>>> My $0.02 (worth less and less almost every day).
>> > > >>>>
>> > > >>>> -hilmar
>> > > >>>>
>> > > >>>> On Thursday, December 23, 2004, at 10:51  AM, Peter Robinson wrote:
>> > > >>>>
>> > > >>>>> Hi,
>> > > >>>>>
>> > > >>>>> I have been thinking about giving a BioPerl EntrezGene parser a try
>> > > >>>>> since I have been a heavy user of LocusLink to date. One issue is
>> > > >>>>> that the files that correspond to LL_tmpl (which was a flat file)
>> > > >>>>> are now in ASN.1 format:
>> > > >>>>> http://www.ncbi.nlm.nih.gov/entrez/query/static/help/genehelp.html#query
>> > > >>>>> Although I saw some mention of ASN support in Bioperl by googling,
>> > > >>>>> I can't seem to find any module that does this in the present
>> > > >>>>> distribution. What is the status on that? In any case, I will be
>> > > >>>>> working on this in the next month or two and if anything nice comes
>> > > >>>>> of it I will send it to you / BioPerl.
>> > > >>>>>
>> > > >>>>> best wishes & happy holidays
>> > > >>>>>
>> > > >>>>> Peter
>> > > >>>>>
>> > > >>>>> On Tue, 2004-12-14 at 09:00, Hilmar Lapp wrote:
>> > > >>>>>> Since load_seqdatabase.pl will use bioperl's SeqIO parsers for
>> > > >>>>>> parsing any input file, what you're asking is whether or not there
>> > > >>>>>> is a SeqIO parser for NCBI Gene.
>> > > >>>>>>
>> > > >>>>>> The answer to that question is no, not yet. Anybody who feels
>> > > >>>>>> motivated is welcome to give it a try ... Since I'll need it, I'll
>> > > >>>>>> write the parser if nobody else does within the next 3 months, but
>> > > >>>>>> I'm not going to promise when exactly this will happen.
>> > > >>>>>>
>> > > >>>>>> -hilmar
>> > > >>>>>>
>> > > >>>>>> On Monday, December 13, 2004, at 08:03  AM, Law, Annie wrote:
>> > > >>>>>>
>> > > >>>>>>> Hi,
>> > > >>>>>>>
>> > > >>>>>>> I was wondering, with regard to the bioperl-db scripts, schema and
>> > > >>>>>>> load_seqdatabase.pl, whether there has been preparation for
>> > > >>>>>>> integrating Entrez Gene information when LocusLink is phased out?
>> > > >>>>>>> Or if it has already been changed, could somebody point me to the
>> > > >>>>>>> documentation or changed code?
>> > > >>>>>>>
>> > > >>>>>>> Thanks,
>> > > >>>>>>> Annie.
>> > > >>>>>>> _______________________________________________
>> > > >>>>>>> Bioperl-l mailing list
>> > > >>>>>>> Bioperl-l@portal.open-bio.org
>> > > >>>>>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>> > > >>>>>>>
>> > > >>>>>>>
>> > > >>>>> --
>> > > >>>>> Peter N. Robinson
>> > > >>>>> peter.robinson@t-online.de
>> > > >>>>> peter.robinson@charite.de
>> > > >>>>> http://www.charite.de/ch/medgen/robinson/
>> > > >>>>>
>> > > >>>>>
>> > > >>> --
>> > > >>> Peter N. Robinson
>> > > >>> peter.robinson@t-online.de
>> > > >>> peter.robinson@charite.de
>> > > >>> http://www.charite.de/ch/medgen/robinson/
>> > > >>>
>> > > >>>
>> > > >>>
>> > > >> --
>> > > >> Jason Stajich
>> > > >> jason.stajich at duke.edu
>> > > >> http://www.duke.edu/~jes12/
>> > > > --
>> > > > Peter N. Robinson
>> > > > peter.robinson@t-online.de
>> > > > peter.robinson@charite.de
>> > > > http://www.charite.de/ch/medgen/robinson/
>> > > >
>> > > >
>> > > --
>> > > Jason Stajich
>> > > jason.stajich at duke.edu
>> > > http://www.duke.edu/~jes12/
>> > >
>> >
>
> -- 
> Peter N. Robinson
> peter.robinson@t-online.de
> peter.robinson@charite.de
> http://www.charite.de/ch/medgen/robinson/
>
>
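Jason's point in the thread above is that the flatfile route gives the full classification, and Peter wants a species name for each taxid in gene_info. As a rough illustration of the idea only (this is not the Bio::DB::Taxonomy::flatfile code, which indexes the files with DB_File), here is a minimal pure-Perl sketch. It assumes the NCBI taxdump layout: `nodes.dmp` (tax_id, parent tax_id, rank, ...) and `names.dmp` (tax_id, name, unique name, name class), with fields separated by tab-pipe-tab and each line ending in tab-pipe. The records are a toy, abridged excerpt in which intermediate nodes were dropped, so the parent links are not the real NCBI ones; `dmp_line`, `split_dmp` and `lineage` are hypothetical helper names.

```perl
use strict;
use warnings;

# Build one line in the assumed taxdump layout: fields joined by "\t|\t", ending "\t|".
sub dmp_line { return join("\t|\t", @_) . "\t|\n" }

# Toy, abridged records (NOT the real NCBI parent links; intermediate nodes removed).
my @nodes = map { dmp_line(@$_) }
    ([1, 1, 'no rank'], [9604, 1, 'family'], [9605, 9604, 'genus'], [9606, 9605, 'species']);
my @names = map { dmp_line(@$_) }
    ([1,    'root',         '', 'scientific name'],
     [9604, 'Hominidae',    '', 'scientific name'],
     [9605, 'Homo',         '', 'scientific name'],
     [9606, 'Homo sapiens', '', 'scientific name'],
     [9606, 'human',        '', 'genbank common name']);

# Split one taxdump line into its fields (empty fields are kept).
sub split_dmp {
    my ($line) = @_;
    chomp $line;
    $line =~ s/\t\|$//;                      # drop the trailing "\t|"
    return split /\t\|\t/, $line, -1;
}

my (%parent, %rank, %sci_name);
for my $line (@nodes) {
    my ($id, $parent_id, $rank) = split_dmp($line);
    $parent{$id} = $parent_id;
    $rank{$id}   = $rank;
}
for my $line (@names) {
    my ($id, $name, undef, $class) = split_dmp($line);
    $sci_name{$id} = $name if $class eq 'scientific name';
}

# Walk from a taxid up to the root (taxid 1 is its own parent); returns root-first names.
sub lineage {
    my ($id) = @_;
    my @path;
    while (defined $id) {
        unshift @path, defined $sci_name{$id} ? $sci_name{$id} : $id;
        last if !defined $parent{$id} || $parent{$id} == $id;
        $id = $parent{$id};
    }
    return @path;
}

print join(' > ', lineage(9606)), "\n";   # prints: root > Hominidae > Homo > Homo sapiens
```

With real dump files you would read `nodes.dmp` and `names.dmp` line by line instead of the in-memory arrays; the hash-based walk stays the same, and `$sci_name{$taxid}` is the name Peter would want for the `-species` slot.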




From gyang at plantbio.uga.edu  Mon Jan 17 11:17:31 2005
From: gyang at plantbio.uga.edu (Guojun Yang)
Date: Mon Jan 17 11:15:41 2005
Subject: [Bioperl-l] regular expression help!
In-Reply-To: <4f10f19405011606531737d90@mail.gmail.com>
Message-ID: <20050117111731.58739c14@dogwood.plantbio.uga.edu>

Thanks for everybody's comments. The only thing I am interested in is a regular expression to recognize the pattern (it should not be confined to certain sequences, as some have suggested). For example, in tttaatatcaaAGCATgggaaaggatat....atatcctttcccGCATacatataccata, the regex should recognize AGCATgggaaaggatat....atatcctttcccGCAT. The problem is not the direct repeat AGCAT, but how to match the atatcctttccc with the gggaaaggatat. I guess there must be a way to do it. I tried the following and obtained weird results:
/.*(\S+)(\S)(\S)(\S)(\S)(\S)(\S)(\S)(\S)(\S)(\S).*(??{convert(\11);})(??{convert(\10);})(??{convert(\9);})(??{convert(\8);})(??{convert(\7);})(??{convert(\6);})(??{convert(\5);})(??{convert(\4);})(??{convert(\3);})(??{convert(\2);})\1.*/i
...

sub convert{
my $return=$_[0];
$return =~ tr/ATCG/TAGC/;
$return =reverse($return);
return $return;
}

Can anybody give me a hint on the -e switch when using perl script inside a regex?

Yang


----- Original Message -----
From: Willy West <corenth@gmail.com>
To: Jan.Aerts@wur.nl, bioperl-l@portal.open-bio.org
Sent: Sun, 16 Jan 2005 09:53:55 -0500
Subject: Re: [Bioperl-l] regular expression help!


> oops- i'd forgotten to "reply to all" with this... i apologize.
> 
> 
> On Sun, 16 Jan 2005 11:13:45 +0100, Aerts, Jan <Jan.Aerts@wur.nl> wrote:
> > The problem is (or I might miss something here), that he wants to _test_ a
> > regex. It's not possible to write something like
> > $_ =~ /(.*)(.*)foo(\2)(.*)/e
> > I think...
> > 
> > jan.
> 
> now i'm trying to do this with the test regex and am not successful :(
> this is an interesting problem and i really would love to find a way..
> 
> one solution would be to explode the whole thing in another
> subroutine... but if it's not what you want, i'm not yet sure how to do it.
> 
> good challenge though.....
> 
> :)
> 
> > 
> > 
> > -----Original Message-----
> > From:   Willy West [mailto:corenth@gmail.com]
> > Sent:   Sun 16-Jan-05 00:09
> > To:     Aerts, Jan
> > Cc:
> > Subject:        Re: [Bioperl-l] regular expression help!
> > On Sat, 15 Jan 2005 15:17:28 +0100, Aerts, Jan <Jan.Aerts@wur.nl> wrote:
> > > You're right... Should have looked at the actual expression.
> > > Idea: is it possible in this case to call subroutines from within a regex
> > > and evaluating them using the 'e' switch?
> > 
> > if i recall::
> > 
> > sub foo {
> >            return 'hello genome';
> > }
> > 
> > $data = "ih ho hum bababa";
> > 
> > $data =~ s/ih/foo/e; #one way to do it.
> > 
> > print "$data\n";
> > 
> > seems to work..
> 
> 
> -- 
> Willy
> http://www.hackswell.com/corenth
> 
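On the -e question: the /e modifier exists only for s///, where it makes Perl evaluate the replacement side as code, which is why Willy's example works. A match has no /e; code inside a pattern is run with (?{ ... }) or (??{ ... }), and the (??{ ... }) form uses its return value as a sub-pattern. Inside that code, earlier captures must be read as $1, $2 and so on: a bare \11 there is ordinary Perl code that builds a reference to the number 11, which likely explains the weird results from the convert(\11) attempt. A minimal sketch (greeting is a hypothetical stand-in for Willy's foo, and the mirror pattern is only an illustration):

```perl
use strict;
use warnings;

# Hypothetical stand-in for Willy's foo().
sub greeting { return 'hello genome' }

# s///e: the replacement side is evaluated as Perl code.
my $data = 'ih ho hum bababa';
(my $replaced = $data) =~ s/ih/greeting()/e;
print "$replaced\n";    # hello genome ho hum bababa

# No /e for matches: (??{ code }) runs code and matches its result as a sub-pattern.
# Earlier captures are read as $1, $2, ... inside the code, never as \1, \2.
my $mirror_re  = qr/^([acgt]+)X(??{ scalar reverse $1 })$/;
my $is_mirror  = ('gattcXcttag' =~ $mirror_re) ? 1 : 0;   # 1: right side reverses the left
my $not_mirror = ('gattcXgattc' =~ $mirror_re) ? 1 : 0;   # 0
```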


From danielucgbioinfo at yahoo.com.br  Mon Jan 17 11:19:31 2005
From: danielucgbioinfo at yahoo.com.br (Danielucg Sousa)
Date: Mon Jan 17 11:16:23 2005
Subject: [Bioperl-l] method "link_pattern" and Bio::Graphics::Panel
Message-ID: <20050117161932.15787.qmail@web53502.mail.yahoo.com>

Hi,

I have had some difficulty with the Bio::Graphics::Panel class. I only
want to show a short sequence in the browser and make it clickable, as an
HTTP link. Please look at my little code and tell me what is wrong. I am
using Bioperl 1.5 RC2.

The output message is:
Can't locate object method "link_pattern" via package "Bio::Graphics::FeatureFile" at /usr/lib/perl5/site_perl/5.8.3/Bio/Graphics/Panel.pm line 981, <DATA> line 191,.

My little code:
#!/usr/bin/perl -wT

use strict;
use Bio::Graphics;
use Bio::SeqIO;
use Bio::SeqFeature::Generic;
use CGI  qw / :standard /;
use CGI::Pretty;

my $wholeseq =
Bio::SeqFeature::Generic->new(-start=>1,-end=>600);

my $q = new CGI;

print $q->header('text/html');
print $q->start_html('A Vector Rendering ');

my $panel = Bio::Graphics::Panel->new(-length=>600,
-width=>1000, -pad_left=> 10, -pad_right=>10, 
-key_style =>'none',
-spacing=>-0.25,-box_subparts=>'true');

$panel->add_track($wholeseq,-glyph=>'arrow',-bump=>
+1,  -double => 1,-tick=>2,-title=>'test 1',-link =>
'www.perl.org' );
 
$panel->add_track($wholeseq,-glyph=>'transcript2',
-bgcolor =>'orange', -bump=> 0,-height
=>12,-title=>'test 2', -link
=>'http://www.google.com.br', );
      
my ($url,$map,$mapname) = $panel->image_and_map(-root
=> '/home/bioinfo/cgi-bin',-url => '/tmpimages', );

print $q->img({-src=>$url,-usemap=>"#$mapname"});
print $map;    # image_and_map() returns the <map> HTML as a plain string;
               # the PNG itself has already been written under -root/-url

print $q->end_html;

exit;

Thank you very much,
Daniel Xavier 


From cldwalker at chwhat.com  Mon Jan 17 12:49:43 2005
From: cldwalker at chwhat.com (Gabriel Horner)
Date: Mon Jan 17 12:43:08 2005
Subject: [Bioperl-l] Announcing bioperl shell, Fry::Lib::BioPerl
Message-ID: <20050117174943.GA29769@bigmama.chwhat.com>

Hi All,
  I'm announcing that I put up a module, Fry::Lib::BioPerl,
for my shell framework, Fry::Shell, a few days ago. The result
is a set of commands for viewing and obtaining sequences and alignments.
It's definitely a step up from the shell in examples/bioperl.pl.
See http://search.cpan.org/perldoc?Fry::Shell for details.
It is fairly easy to write new libraries to use with Fry::Shell. 
Since the shell framework has no dependencies, a Fry::Shell bundle along with a script that loads
only Fry::Lib::BioPerl could be included in the examples directory if desired.

Gabriel
-- 
my looovely website -- http://www.chwhat.com
BTW, IF chwhat.com goes down email me at gabriel.horner@cern.ch
From Peter.Robinson at t-online.de  Mon Jan 17 14:21:44 2005
From: Peter.Robinson at t-online.de (Peter Robinson)
Date: Mon Jan 17 14:17:35 2005
Subject: [Bioperl-l] regular expression help!
In-Reply-To: <20050117111731.58739c14@dogwood.plantbio.uga.edu>
References: <20050117111731.58739c14@dogwood.plantbio.uga.edu>
Message-ID: <1105989704.8090.11.camel@localhost.localdomain>

Just a suggestion, but I don't think regular expressions are the best
way to do this. You might want to take a look at some of the programs
at www.emboss.org, which can find repeats, inverted repeats /
palindromes in DNA sequences. The EMBOSS programs are open-source, easy
to use and quite useful, although the EMBOSS group is unfortunately now
having difficulties with funding.

-peter

-- 
Peter N. Robinson
peter.robinson@t-online.de
peter.robinson@charite.de
http://www.charite.de/ch/medgen/robinson/

From babenko at ncbi.nlm.nih.gov  Mon Jan 17 14:23:12 2005
From: babenko at ncbi.nlm.nih.gov (Babenko, Vladimir (NIH/NLM/NCBI))
Date: Mon Jan 17 14:19:24 2005
Subject: [Bioperl-l] Problem with parsing ENSEMBL genbank flat file with
	genbank2gff3. pls
Message-ID: <69BA0F938FAC6A4CBEF49461720696F20796569C@nihexchange16.nih.gov>

    Greetings,
While parsing a genbank file taken from:
ftp://ftp.ensembl.org/pub/current_human/data/flatfiles/genbank/Homo_sapiens.0.dat
as of Jan 2005, I'm getting the following unflattening error:
--------------------------------------------------------
Processing file /ENSEMBL/Homo_sapiens.0.dat...
working on contig chromosome:NCBI35:1:1:994676:1...chromosome:NCBI35:1:1:994676:1 Unflattening error:
Details:
------------- EXCEPTION  -------------
MSG: PROBLEM, SEVERITY==2
no containers possible for SeqFeature of type: CDS; this SF is being placed at root level
SF [Bio::SeqFeature::Generic=HASH(0x86485d8)]: CDS; ENSG00000146556

STACK Bio::SeqFeature::Tools::Unflattener::problem /Bio/SeqFeature/Tools/Unflattener.pm:940
STACK Bio::SeqFeature::Tools::Unflattener::unflatten_group /Bio/SeqFeature/Tools/Unflattener.pm:1983
STACK Bio::SeqFeature::Tools::Unflattener::unflatten_groups /Bio/SeqFeature/Tools/Unflattener.pm:1744
STACK Bio::SeqFeature::Tools::Unflattener::unflatten_seq /Bio/SeqFeature/Tools/Unflattener.pm:1449
STACK (eval) genbank2gff3.PLS:345
STACK main::unflatten_seq genbank2gff3.PLS:344
STACK toplevel genbank2gff3.PLS:209

--------------------------------------

Possible gene unflattening error withchromosome:NCBI35:1:1:994676:1: consult STDERR

Using bioperl-1.5.0.RC2 under Linux.

    Would be grateful for the hint,
      Vladimir
From cjm at fruitfly.org  Mon Jan 17 14:51:37 2005
From: cjm at fruitfly.org (Chris Mungall)
Date: Mon Jan 17 14:47:48 2005
Subject: [Bioperl-l] Problem with parsing ENSEMBL genbank flat file with
	genbank2gff3. pls
In-Reply-To: <69BA0F938FAC6A4CBEF49461720696F20796569C@nihexchange16.nih.gov>
References: <69BA0F938FAC6A4CBEF49461720696F20796569C@nihexchange16.nih.gov>
Message-ID: <Pine.OSX.4.58.0501171131340.6764@adsl-68-126-147-89.dsl.pltn13.pacbell.net>


Hi Vladimir

The genbank2gff3 script, in scripts/Bio-DB-GFF, is attempting to recover
information which the genbank flat file format often loses; this is the
information about which mRNA relates to which CDS. You may or may not need
this information, it depends why you are doing the conversion. If you
don't need this, you may want just a straightforward genbank->gff
conversion. Let me know if this is what you want to do and I can help with
that.

If you _do_ wish to preserve the mRNA to CDS mappings, be aware that it
isn't always possible to recover these with 100% fidelity from the genbank
flat files. You may wish to pursue alternate approaches, such as
downloading ensembl as a mysql dump (any ensembl folks around.. any plans
to offer downloads in alternate formats such as gff3? This would be
fantastic)

If you'd prefer to carry on via the genbank flat file route, here's what
you should do:

* get the latest version of genbank2gff3.PLS I have just checked into cvs
(I can send you a copy if you are using a bioperl release and not cvs)

* run the script with the "--ethresh 3" option. This will raise the error
severity threshold at which problems with the genbank file become
showstoppers.

In addition, I will take a look at this particular file and see what it is
that is causing problems and get back to you.

Cheers
Chris

From osmany.guirola at cigb.edu.cu  Mon Jan 17 10:58:14 2005
From: osmany.guirola at cigb.edu.cu (Osmany Guirola Cruz)
Date: Mon Jan 17 14:54:15 2005
Subject: [Bioperl-l] buried surface
Message-ID: <1105977494.2482.11.camel@draco.cigb.edu.cu>

Hi 
I am new to the list and I need to know how I can calculate the buried
surface of the residues in my PDB file ... I want to select some residues
of my PDB file that have a specific buried surface.

Thanks

Osmany
 

From osmany.guirola at cigb.edu.cu  Mon Jan 17 11:02:20 2005
From: osmany.guirola at cigb.edu.cu (Osmany Guirola Cruz)
Date: Mon Jan 17 14:58:20 2005
Subject: [Bioperl-l] buried surface calculation
Message-ID: <1105977740.2482.15.camel@draco.cigb.edu.cu>

Hi 
I am new to the list and I need to know how I can calculate the buried
surface for each residue. How can I do that? I need to select some
residues from a PDB file with a specific value.

Thanks 
Osmany


From R.J.Minshall at pgr.salford.ac.uk  Mon Jan 17 08:39:45 2005
From: R.J.Minshall at pgr.salford.ac.uk (Robert Minshall)
Date: Mon Jan 17 19:59:57 2005
Subject: [Bioperl-l] Feature table comparison
Message-ID: <1105969185.41ebc021ee0a5@webmail.salford.ac.uk>


Hi, does anyone know of or have a script that can compare the feature tables of
genomes and show what appears in one and not the other? I.e., I want to find the
differences between the feature tables. Is this possible? I'm new to Perl and was
hoping that someone could point me in the right direction. My email is
r.j.minshall@pgr.salford.ac.uk
thanks in advance
Rob Minshall

--
Robert J Minshall
Postgraduate Researcher in Microbiology,
Biosciences Research Institute,
School of Environment and Life Sciences,
Lab 209 Cockcroft Building,
University of Salford,
Salford,
Greater Manchester.
M5 4WT
UK
0161 2952652
r.j.mishall@pgr.salford.ac.uk
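Rob's request boils down to a set difference between two feature tables. A minimal sketch, assuming each feature has already been reduced to a comparable key string; the type:start..end:strand format and the only_in helper are hypothetical, and in BioPerl such keys could be built from each feature's primary_tag, start, end and strand (for example while looping over $seq->get_SeqFeatures). Exact-coordinate keys only make sense when both tables annotate the same sequence (for example two releases of one genome); features on different genomes would first have to be matched by sequence similarity.

```perl
use strict;
use warnings;

# Hypothetical feature keys "type:start..end:strand" for two annotations of one sequence.
my @genome_a = ('gene:100..400:1', 'CDS:100..400:1', 'tRNA:900..975:-1');
my @genome_b = ('gene:100..400:1', 'CDS:100..400:1', 'rRNA:1200..2700:1');

# Return the keys present in the first list but absent from the second (order kept).
sub only_in {
    my ($first, $second) = @_;
    my %seen = map { $_ => 1 } @$second;
    return grep { !$seen{$_} } @$first;
}

my @only_a = only_in(\@genome_a, \@genome_b);
my @only_b = only_in(\@genome_b, \@genome_a);
print "only in A: $_\n" for @only_a;
print "only in B: $_\n" for @only_b;
```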


----------------------------------------------------------------
Concerns about content should be sent to abuse@salford.ac.uk
From cain at cshl.edu  Mon Jan 17 12:04:54 2005
From: cain at cshl.edu (Scott Cain)
Date: Mon Jan 17 20:00:00 2005
Subject: [Bioperl-l] Re: GFF3
In-Reply-To: <200501161451.j0GEpNKs028052@portal.open-bio.org>
Message-ID: <Pine.GSO.4.05.10501171203280.26102-100000@phage.cshl.edu>

Hi Rob,

Thanks for your work on this--I've put several comments in your
original message below.

Scott

---------Original Message--------
Date: Sat, 15 Jan 2005 15:22:23 -0800
From: Rob Edwards <rob@salmonella.org>
Subject: [Bioperl-l] GFF3
To: Bioperl list <bioperl-l@portal.open-bio.org>

Because I need it for some things that I am doing, I have worked quite 
a bit on the GFF3 parser Bio::FeatureIO::gff. Several people have
written this module; I have just made some cosmetic changes:

I have improved the validation processes that are applied as a gff3 
file is parsed, and the module should now validate essentially 
everything in the file except alignments. Validation is optional and is 
based on the specification described at : 
http://song.sourceforge.net/gff3.shtml

SC> Excellent--Did you happen to relax the requirement that ID be unique
SC> for each line of the GFF?  Allen and I put that in due to a misreading
SC> of the spec.  The ID has to be unique for a *feature*, which can be
SC> spread across several lines.

For clarification and edification I have created a couple of tables 
describing the module and the validation that is applied to GFF3 files, 
which you can see online: http://www.salmonella.org/bioperl/gff3.html

SC> Very nice and well done--do you happen to have a pod-ified version 
SC> of this page?  It would be nice to include in the pod for 
SC> Bio::FeatureIO::gff.

I also wrote a Bio::SeqIO::gff module. Since gff3 files can hold 
sequences, it seems that you'd want to be able to call the next_seq 
methods, and therefore SeqIO is more appropriate than FeatureIO for 
those aspects. Currently the SeqIO module uses the FeatureIO module for 
parsing the file, it just reorganizes things.

This provides two different interfaces for getting objects out of GFF3 
files:
	Bio::FeatureIO::gff will return Bio::SeqFeature::Annotated objects 
representing the annotations.
	Bio::SeqIO::gff will return Bio::Seq objects representing the 
sequences with all the annotations attached.

The other difference between the two is that the former passes out the 
objects as they are read, but the latter has to read the whole file to 
get the annotations and the sequences.

SC> I thought about doing something similar with SeqIO, but I am worried 
SC> about the case where somebody tries to use SeqIO on a well 
SC> annotated human Chr1 GFF3 file (if one were ever to exist :-) ,
SC> but I suppose the same machine killing thing could be done if
SC> someone tried to use SeqIO on a genbank file of Chr1.

So far I have focused on reading GFF3 files.

I have not committed these to cvs yet, pending comments from others. I 
have some specific questions:
	Should I wait until after 1.5 is out?

SC> I don't have the definitive answer, but I would say it doesn't
SC> matter much, as long as it passes tests.  Bio::FeatureIO::gff is
SC> hardly a fully functional module as it is, so if we could 
SC> squeeze a little more functionality into it before we
SC> release it, that would be fine with me.

	Is two separate modules really the right way to go about this?

SC> As long as it works for this case, I don't mind:  calling
SC> 'next_feature' on a FeatureIO object until I run out of features
SC> and then calling 'next_sequence' (and get a Bio::PrimarySeq) on
SC> the same FeatureIO object until I run out of sequences.

	What about other GFF modules (like Bio::Tools::GFF)?

SC> I am willing to let Bio::Tools::GFF die a terrible death.  While
SC> it will have to be kept around for apps that depend on it, I don't
SC> see adding any major functionality as time well spent.

	Could someone give the modules a workout and let me know about bugs? I 
am sure there are many.

SC> I will try to soon, but it won't be until next week at 
SC> the earliest.

I have posted these modules online via anonymous ftp at 
ftp://ftp.salmonella.org/rob/bioperl/GFF_modules.tgz
Take a look and let me know what you do and don't like!

Rob


----------------------------------------------------------------------
Scott Cain, Ph. D.				 	 cain@cshl.org
GMOD Coordinator, http://www.gmod.org/			 (216)392-3087
----------------------------------------------------------------------
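Scott's point is that an ID has to be unique per feature, and one feature may span several lines. A minimal pure-Perl sketch of a check that tolerates that, written independently of Bio::FeatureIO::gff: it parses column 9 into tag/value lists (splitting on ";" before decoding %XX escapes, and splitting comma-separated values) and, as an assumed heuristic rather than a rule quoted from the spec, reports an ID only when its lines disagree on seqid (column 1) or type (column 3). parse_attributes and conflicting_ids are hypothetical names.

```perl
use strict;
use warnings;

# Parse GFF3 column 9 ("tag=value;tag=value") into { tag => [values] }.
sub parse_attributes {
    my ($col9) = @_;
    my %attr;
    for my $pair (split /;/, $col9) {
        next unless length $pair;
        my ($tag, $value) = split /=/, $pair, 2;
        $value = '' unless defined $value;
        for my $v (split /,/, $value) {
            (my $decoded = $v) =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;   # e.g. %3B -> ;
            push @{ $attr{$tag} }, $decoded;
        }
    }
    return \%attr;
}

# Repeated IDs are allowed (multi-line features); flag IDs whose lines
# disagree on seqid or type (heuristic assumption, see lead-in).
sub conflicting_ids {
    my (%first_seen, %conflict);
    for my $line (@_) {
        next if $line =~ /^#/ || $line !~ /\S/;   # skip directives, comments, blanks
        my @col = split /\t/, $line;
        next unless @col == 9;
        my $ids = parse_attributes($col[8])->{ID};
        next unless $ids;
        my ($id, $signature) = ($ids->[0], "$col[0]\t$col[2]");
        if (!exists $first_seen{$id}) {
            $first_seen{$id} = $signature;
        } elsif ($first_seen{$id} ne $signature) {
            $conflict{$id} = 1;
        }
    }
    return sort keys %conflict;
}

my @gff = (
    '##gff-version 3',
    join("\t", 'chr1', '.', 'CDS',  100, 200, '.', '+', 0,   'ID=cds1;Parent=mrna1'),
    join("\t", 'chr1', '.', 'CDS',  300, 400, '.', '+', 2,   'ID=cds1;Parent=mrna1'),
    join("\t", 'chr1', '.', 'gene', 500, 900, '.', '-', '.', 'ID=gene2;Name=abc%3Bdef'),
    join("\t", 'chr1', '.', 'mRNA', 500, 900, '.', '-', '.', 'ID=gene2'),
);

print "conflicting ID: $_\n" for conflicting_ids(@gff);   # only gene2 is reported
```

The two CDS lines share ID=cds1 and pass, which is the relaxation Scott describes; gene2 is reported because the same ID is reused for a gene and an mRNA.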


From palmeida at igc.gulbenkian.pt  Mon Jan 17 12:56:07 2005
From: palmeida at igc.gulbenkian.pt (palmeida@igc.gulbenkian.pt)
Date: Mon Jan 17 20:00:02 2005
Subject: [Bioperl-l] regular expression help! (attached script)
Message-ID: <20050117175606.GB5318@bioinf.igc.gulbenkian.pt>


-- 
Paulo Almeida
Instituto Gulbenkian de Ciencia
Apartado 14, 2781-901, Oeiras, PORTUGAL
tel  +351 21 446 46 35
fax  +351 21 440 79 70
http://www.igc.gulbenkian.pt
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test.pl
Type: text/x-perl
Size: 235 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050117/0ff6cec2/test.bin
-------------- next part --------------
tttaatatcaaagcatgggaaaggatatatcgatcgatgctacgatcatatcctttcccagcatacatataccata

From smarkel at scitegic.com  Mon Jan 17 21:00:18 2005
From: smarkel at scitegic.com (Scott Markel)
Date: Mon Jan 17 20:56:58 2005
Subject: [Bioperl-l] possible to skip parsing features when calling
	Bio::SeqIO::new?
Message-ID: <41EC6DB2.10205@scitegic.com>

I'm using BioPerl 1.4 to read a GenBank sequence file.  Is there
an option or parameter I can set in the

my $seqIterator = Bio::SeqIO->new(-file   => $file,
                                  -format => 'genbank');

call that will cause the parser to skip the features?

I checked BioPerl 1.5RC2 and didn't see any changes there
that would address my question.

Scott

-- 
Scott Markel, Ph.D.
Principal Bioinformatics Architect  email:  smarkel@scitegic.com
SciTegic Inc.                       mobile: +1 858 205 3653
9665 Chesapeake Drive, Suite 401    voice:  +1 858 279 8800, ext. 253
San Diego, CA 92123                 fax:    +1 858 279 8804
USA                                 web:    http://www.scitegic.com


From cain at cshl.edu  Mon Jan 17 21:10:19 2005
From: cain at cshl.edu (Scott Cain)
Date: Mon Jan 17 21:06:42 2005
Subject: [Bioperl-l] Problem with parsing ENSEMBL genbank flat file with
	genbank2gff3. pls
In-Reply-To: <Pine.OSX.4.58.0501171131340.6764@adsl-68-126-147-89.dsl.pltn13.pacbell.net>
Message-ID: <Pine.GSO.4.05.10501172108280.20063-100000@phage.cshl.edu>

Hi Vladimir,

Not to ask a question on the level of "is it plugged in", but are you sure
it is a genbank formatted file?  I think you would get a different error
if it weren't, but I just wanted to make sure.

Scott

----------------------------------------------------------------------
Scott Cain, Ph. D.				 	 cain@cshl.org
GMOD Coordinator, http://www.gmod.org/			 (216)392-3087
----------------------------------------------------------------------


On Mon, 17 Jan 2005, Chris Mungall wrote:

> 
> Hi Vladimir
> 
> The genbank2gff3 script, in scripts/Bio-DB-GFF, attempts to recover
> information that the genbank flat file format often loses: which mRNA
> relates to which CDS. You may or may not need this information; it
> depends on why you are doing the conversion. If you don't need it, a
> straightforward genbank->gff conversion may be all you want. Let me
> know if this is what you want to do and I can help with that.
> 
> If you _do_ wish to preserve the mRNA to CDS mappings, be aware that it
> isn't always possible to recover these with 100% fidelity from the genbank
> flat files. You may wish to pursue alternate approaches, such as
> downloading ensembl as a mysql dump. (Any ensembl folks around? Any plans
> to offer downloads in alternate formats such as gff3? That would be
> fantastic.)
> 
> If you'd prefer to carry on via the genbank flat file route, here's what
> you should do:
> 
> * get the latest version of genbank2gff3.PLS I have just checked into cvs
> (I can send you a copy if you are using a bioperl release and not cvs)
> 
> * run the script with the "--ethresh 3" option. This raises the error
> severity threshold at which problems with the genbank file become
> showstoppers.
> 
> In addition, I will take a look at this particular file and see what it is
> that is causing problems and get back to you.
> 
> Cheers
> Chris
> 
> On Mon, 17 Jan 2005, Babenko, Vladimir (NIH/NLM/NCBI) wrote:
> 
> >     Greetings,
> > While parsing a genbank file taken from:
> > ftp://ftp.ensembl.org/pub/current_human/data/flatfiles/genbank/Homo_sapiens.0.dat
> > as of Jan 2005,
> > I'm getting the following unflattening error:
> > --------------------------------------------------------
> > Processing file /ENSEMBL/Homo_sapiens.0.dat...
> > working on contig
> > chromosome:NCBI35:1:1:994676:1...chromosome:NCBI35:1:1:994676:1 Unflattening error:
> > Details:
> > ------------- EXCEPTION  -------------
> > MSG: PROBLEM, SEVERITY==2
> > no containers possible for SeqFeature of type: CDS; this SF is being placed at root level
> > SF [Bio::SeqFeature::Generic=HASH(0x86485d8)]: CDS; ENSG00000146556
> >
> > STACK Bio::SeqFeature::Tools::Unflattener::problem
> > /Bio/SeqFeature/Tools/Unflattener.pm:940
> > STACK Bio::SeqFeature::Tools::Unflattener::unflatten_group
> > /Bio/SeqFeature/Tools/Unflattener.pm:1983
> > STACK Bio::SeqFeature::Tools::Unflattener::unflatten_groups
> > /Bio/SeqFeature/Tools/Unflattener.pm:1744
> > STACK Bio::SeqFeature::Tools::Unflattener::unflatten_seq
> > /Bio/SeqFeature/Tools/Unflattener.pm:1449
> > STACK (eval) genbank2gff3.PLS:345
> > STACK main::unflatten_seq genbank2gff3.PLS:344
> > STACK toplevel genbank2gff3.PLS:209
> >
> > --------------------------------------
> >
> > Possible gene unflattening error withchromosome:NCBI35:1:1:994676:1: consult STDERR
> >
> > Using bioperl-1.5.0.RC2 under Linux.
> >
> >     Would be grateful for the hint,
> >       Vladimir
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l@portal.open-bio.org
> > http://portal.open-bio.org/mailman/listinfo/bioperl-l
> >
> 
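A note on the "--ethresh 3" suggestion quoted above: the traceback shows the failure raised from Bio::SeqFeature::Tools::Unflattener::problem with SEVERITY==2, which suggests each problem is compared against a severity threshold. The pure-Perl sketch below illustrates only that idea. The sub name handle_problem, the low default threshold of 1 and the rule "fatal when severity exceeds the threshold" are assumptions for illustration, not the Unflattener's actual code.

```perl
use strict;
use warnings;

# Hypothetical stand-in for severity-threshold handling:
# a problem whose severity is above the threshold is fatal,
# anything at or below it is only reported as a warning.
sub handle_problem {
    my ($severity, $message, $threshold) = @_;
    if ($severity > $threshold) {
        die "PROBLEM, SEVERITY==$severity: $message\n";
    }
    warn "warning (severity $severity): $message\n";
    return 1;    # processing continues
}

my $msg = 'no containers possible for SeqFeature of type: CDS';

# Assumed low threshold: the severity-2 problem aborts the run.
my $ok_low = eval { handle_problem(2, $msg, 1) };
print defined $ok_low ? "continued\n" : "aborted: $@";

# Raised threshold (like --ethresh 3): the same problem only warns.
my $ok_high = eval { handle_problem(2, $msg, 3) };
print defined $ok_high ? "continued\n" : "aborted: $@";
```
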

From brian_osborne at cognia.com  Mon Jan 17 21:20:47 2005
From: brian_osborne at cognia.com (Brian Osborne)
Date: Mon Jan 17 21:19:47 2005
Subject: [Bioperl-l] possible to skip parsing features when
	calling Bio::SeqIO::new?
In-Reply-To: <41EC6DB2.10205@scitegic.com>
Message-ID: <GAEDKMGOKFBLJPKCLKCCMEMPEHAA.brian_osborne@cognia.com>

Scott,

Why would you want to do this? I can imagine one reason: some problem with a
feature is causing the script to exit. In that case, do something like:

my $seq;
eval { $seq = $seqIterator->next_seq; };
warn "Could not parse record: $@" if $@;


Brian O.

-----Original Message-----
From: bioperl-l-bounces@portal.open-bio.org
[mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Scott Markel
Sent: Monday, January 17, 2005 9:00 PM
To: Bioperl-l@portal.open-bio.org
Subject: [Bioperl-l] possible to skip parsing features when
calling Bio::SeqIO::new?


I'm using BioPerl 1.4 to read a GenBank sequence file.  Is there
an option or parameter I can set in the

my $seqIterator = Bio::SeqIO->new(-file   => $file,
                                  -format => 'genbank');

call that will cause the parser to skip the features?

I checked BioPerl 1.5RC2 and didn't see any changes there
that would address my question.

Scott

--
Scott Markel, Ph.D.
Principal Bioinformatics Architect  email:  smarkel@scitegic.com
SciTegic Inc.                       mobile: +1 858 205 3653
9665 Chesapeake Drive, Suite 401    voice:  +1 858 279 8800, ext. 253
San Diego, CA 92123                 fax:    +1 858 279 8804
USA                                 web:    http://www.scitegic.com




From jason.stajich at duke.edu  Mon Jan 17 21:21:57 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Mon Jan 17 21:19:54 2005
Subject: [Bioperl-l] Problem with parsing ENSEMBL genbank flat file with
	genbank2gff3. pls
In-Reply-To: <Pine.OSX.4.58.0501171131340.6764@adsl-68-126-147-89.dsl.pltn13.pacbell.net>
References: <69BA0F938FAC6A4CBEF49461720696F20796569C@nihexchange16.nih.gov>
	<Pine.OSX.4.58.0501171131340.6764@adsl-68-126-147-89.dsl.pltn13.pacbell.net>
Message-ID: <BA028CAF-68F7-11D9-83BC-000393C44276@duke.edu>

I have been using EnsMart to grab GFF2/GTF (or GFF-like) output and
reformat it for GFF3, with reasonable success.  You probably want to
select just the output columns you need, so that you can reformat things
to have the CDS start/end and the Gene, Exon->Transcript->Peptide
identifiers all in the same report.

This is a lot easier than parsing genbank flatfiles, and it is the whole
point of EnsMart.
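As a rough illustration of the reformatting step described above, here is a minimal Perl sketch that turns tab-delimited rows into GFF3 gene, mRNA and exon lines. The column order, the "EnsMart" source tag and the sub name rows_to_gff3 are assumptions for illustration; a real EnsMart export depends on which attributes you select.

```perl
use strict;
use warnings;

# Convert tab-delimited, EnsMart-style rows into GFF3 gene/mRNA/exon lines.
# Hypothetical column order (depends on the attributes you export):
#   chrom, gene_id, transcript_id, exon_id, exon_start, exon_end, strand (1/-1)
sub rows_to_gff3 {
    my @rows = @_;
    my (%gene, %tx, @exons);
    for my $row (@rows) {
        my $line = $row;
        chomp $line;
        my ($chr, $gid, $tid, $eid, $s, $e, $strand) = split /\t/, $line;
        $strand = $strand < 0 ? '-' : '+';
        # Grow each gene and transcript span so it covers all of its exons.
        for my $spec ([\%gene, $gid, undef], [\%tx, $tid, $gid]) {
            my ($h, $id, $parent) = @$spec;
            my $f = $h->{$id} ||= { chr => $chr, start => $s, end => $e,
                                    strand => $strand, parent => $parent };
            $f->{start} = $s if $s < $f->{start};
            $f->{end}   = $e if $e > $f->{end};
        }
        push @exons, [$chr, $s, $e, $strand, $eid, $tid];
    }
    my @out = ('##gff-version 3');
    for my $gid (sort keys %gene) {
        my $g = $gene{$gid};
        push @out, join "\t", $g->{chr}, 'EnsMart', 'gene', $g->{start},
            $g->{end}, '.', $g->{strand}, '.', "ID=$gid";
    }
    for my $tid (sort keys %tx) {
        my $t = $tx{$tid};
        push @out, join "\t", $t->{chr}, 'EnsMart', 'mRNA', $t->{start},
            $t->{end}, '.', $t->{strand}, '.', "ID=$tid;Parent=$t->{parent}";
    }
    # Shared exons would repeat an ID across transcripts, so use Name= here.
    for my $x (@exons) {
        my ($chr, $s, $e, $strand, $eid, $tid) = @$x;
        push @out, join "\t", $chr, 'EnsMart', 'exon', $s, $e, '.', $strand,
            '.', "Name=$eid;Parent=$tid";
    }
    return @out;
}

print "$_\n" for rows_to_gff3(
    "1\tGENE_A\tTX_A1\tEX_1\t100\t200\t-1",
    "1\tGENE_A\tTX_A1\tEX_2\t300\t450\t-1",
);
```
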

-jason
On Jan 17, 2005, at 2:51 PM, Chris Mungall wrote:

>
> Hi Vladimir
>
> The genbank2gff3 script, in scripts/Bio-DB-GFF, attempts to recover
> information that the genbank flat file format often loses: which mRNA
> relates to which CDS. You may or may not need this information; it
> depends on why you are doing the conversion. If you don't need it, a
> straightforward genbank->gff conversion may be all you want. Let me
> know if this is what you want to do and I can help with that.
>
> If you _do_ wish to preserve the mRNA to CDS mappings, be aware that it
> isn't always possible to recover these with 100% fidelity from the genbank
> flat files. You may wish to pursue alternate approaches, such as
> downloading ensembl as a mysql dump. (Any ensembl folks around? Any plans
> to offer downloads in alternate formats such as gff3? That would be
> fantastic.)
>
> If you'd prefer to carry on via the genbank flat file route, here's what
> you should do:
>
> * get the latest version of genbank2gff3.PLS I have just checked into cvs
> (I can send you a copy if you are using a bioperl release and not cvs)
>
> * run the script with the "--ethresh 3" option. This raises the error
> severity threshold at which problems with the genbank file become
> showstoppers.
>
> In addition, I will take a look at this particular file and see what it is
> that is causing problems and get back to you.
>
> Cheers
> Chris
>
> On Mon, 17 Jan 2005, Babenko, Vladimir (NIH/NLM/NCBI) wrote:
>
>>     Greetings,
>> While parsing a genbank file taken from:
>> ftp://ftp.ensembl.org/pub/current_human/data/flatfiles/genbank/Homo_sapiens.0.dat
>> as of Jan 2005,
>> I'm getting the following unflattening error:
>> --------------------------------------------------------
>> Processing file /ENSEMBL/Homo_sapiens.0.dat...
>> working on contig
>> chromosome:NCBI35:1:1:994676:1...chromosome:NCBI35:1:1:994676:1 Unflattening error:
>> Details:
>> ------------- EXCEPTION  -------------
>> MSG: PROBLEM, SEVERITY==2
>> no containers possible for SeqFeature of type: CDS; this SF is being placed at root level
>> SF [Bio::SeqFeature::Generic=HASH(0x86485d8)]: CDS; ENSG00000146556
>>
>> STACK Bio::SeqFeature::Tools::Unflattener::problem
>> /Bio/SeqFeature/Tools/Unflattener.pm:940
>> STACK Bio::SeqFeature::Tools::Unflattener::unflatten_group
>> /Bio/SeqFeature/Tools/Unflattener.pm:1983
>> STACK Bio::SeqFeature::Tools::Unflattener::unflatten_groups
>> /Bio/SeqFeature/Tools/Unflattener.pm:1744
>> STACK Bio::SeqFeature::Tools::Unflattener::unflatten_seq
>> /Bio/SeqFeature/Tools/Unflattener.pm:1449
>> STACK (eval) genbank2gff3.PLS:345
>> STACK main::unflatten_seq genbank2gff3.PLS:344
>> STACK toplevel genbank2gff3.PLS:209
>>
>> --------------------------------------
>>
>> Possible gene unflattening error withchromosome:NCBI35:1:1:994676:1: consult STDERR
>>
>> Using bioperl-1.5.0.RC2 under Linux.
>>
>>     Would be grateful for the hint,
>>       Vladimir
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

From jason.stajich at duke.edu  Mon Jan 17 21:32:06 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Mon Jan 17 21:28:41 2005
Subject: [Bioperl-l] possible to skip parsing features when calling
	Bio::SeqIO::new?
In-Reply-To: <41EC6DB2.10205@scitegic.com>
References: <41EC6DB2.10205@scitegic.com>
Message-ID: <2580819E-68F9-11D9-83BC-000393C44276@duke.edu>

See the docs for Bio::Seq::SeqBuilder.

I think this will work:

my $seqIterator = Bio::SeqIO->new(-file   => $file,
                                  -format => 'genbank');
$seqIterator->sequence_builder->add_unwanted_slot('features');

If you additionally don't want the annotations (references, etc.):

$seqIterator->sequence_builder->add_unwanted_slot('features', 'annotation');

[Don't ask why one is plural and the other singular... =)]

-jason
On Jan 17, 2005, at 9:00 PM, Scott Markel wrote:

> I'm using BioPerl 1.4 to read a GenBank sequence file.  Is there
> an option or parameter I can set in the
>
> my $seqIterator = Bio::SeqIO->new(-file   => $file,
>                                   -format => 'genbank');
>
> call that will cause the parser to skip the features?
>
> I checked BioPerl 1.5RC2 and didn't see any changes there
> that would address my question.
>
> Scott
>
> -- 
> Scott Markel, Ph.D.
> Principal Bioinformatics Architect  email:  smarkel@scitegic.com
> SciTegic Inc.                       mobile: +1 858 205 3653
> 9665 Chesapeake Drive, Suite 401    voice:  +1 858 279 8800, ext. 253
> San Diego, CA 92123                 fax:    +1 858 279 8804
> USA                                 web:    http://www.scitegic.com
>
>
>
>
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

From cjm at fruitfly.org  Mon Jan 17 21:33:31 2005
From: cjm at fruitfly.org (Chris Mungall)
Date: Mon Jan 17 21:29:39 2005
Subject: [Bioperl-l] Problem with parsing ENSEMBL genbank flat file with
	genbank2gff3. pls
In-Reply-To: <Pine.GSO.4.05.10501172108280.20063-100000@phage.cshl.edu>
References: <Pine.GSO.4.05.10501172108280.20063-100000@phage.cshl.edu>
Message-ID: <Pine.OSX.4.58.0501171811240.6764@adsl-68-126-147-89.dsl.pltn13.pacbell.net>


It is a genbank formatted file - you can download it from the url Vladimir
provides below.

There seem to be a few oddities to do with the ensembl-flavour genbank
format which may be causing problems for the unflattener:

* There don't appear to be any 'gene' features - a gene model is just
mRNAs and CDSs. This means the files don't even contain essential data
like the gene symbol!

* In the feature entry, for the reverse strand, ensembl nests the
complement function inside the join function, listing sublocations in a
3'->5' direction. This is unusual, but not problematic in itself.
However, I'm not 100% convinced that the bioperl genbank parser handles
these cases correctly - I will expand on this in another email. It's not
a problem for the vast majority of cases, but it will be problematic for
certain rare situations where the sublocations are of mixed strand (e.g.
trans-spliced genes).
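To make the strand issue concrete, here is a minimal pure-Perl sketch (not the BioPerl location parser) that reads both the ensembl-style join(complement(a..b),complement(c..d)) form and the more usual complement(join(c..d,a..b)) form and reduces each to the same list of [start, end, strand] sublocations sorted by start coordinate. Each range gets its strand from the number of enclosing complement() calls, so mixed-strand locations are kept per sublocation. It handles only plain start..end ranges; fuzzy ends, remote accessions and the semantics of order() are ignored. The sub name parse_location is hypothetical.

```perl
use strict;
use warnings;

# Reduce a simple GenBank/EMBL location string to a list of
# [start, end, strand] sublocations sorted by start coordinate.
sub parse_location {
    my ($loc) = @_;
    my @stack;    # enclosing function names, innermost last
    my @parts;
    while ($loc =~ /\G\s*(?:(complement|join|order)\(|(\))|(,)|(\d+)\.\.(\d+))/gc) {
        if (defined $1) {
            push @stack, $1;
        }
        elsif (defined $2) {
            die "unbalanced ')' in location\n" unless @stack;
            pop @stack;
        }
        elsif (defined $4) {
            # Each enclosing complement() flips the strand once.
            my $flips  = grep { $_ eq 'complement' } @stack;
            my $strand = $flips % 2 ? -1 : 1;
            push @parts, [$4, $5, $strand];
        }
        # a bare ',' (capture 3) only separates sublocations
    }
    my $pos = pos($loc) || 0;
    die "could not parse location near: ", substr($loc, $pos), "\n"
        if ($pos < length($loc) || @stack);
    return sort { $a->[0] <=> $b->[0] } @parts;
}

for my $loc ('join(complement(300..400),complement(100..200))',
             'complement(join(100..200,300..400))') {
    my @p = parse_location($loc);
    print "$loc\n  => ", join(', ', map { sprintf '%d..%d(%d)', @$_ } @p), "\n";
}
```

Both spellings reduce to 100..200(-1), 300..400(-1); a location such as join(100..200,complement(300..400)) keeps one sublocation on each strand.
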

I can implement a hack in the unflattener for the first problem. However,
the question is: is it worth it? Without the gene features, the
ensembl-flavoured genbank files don't seem particularly useful (granted, it
is possible to get the gene data by integrating with LocusLink/EntrezGene,
but is it worth it?). I know for a fact that the data structures
underlying ensembl are sound, so it seems counterproductive to use nothing
but genbank/embl as a flat file distribution format (and to drop the gene
features on top of that!). I know ensembl uses GTF a lot internally; it
would be great to see this format (or, even better, GFF3) used for
data distribution. Perhaps there's something I'm missing here. I'll wait
for comment from someone from ensembl before going further, to avoid
any pointless work...
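The "hack" mentioned above for the missing gene features could, in outline, group mRNA and CDS features by a shared gene identifier and synthesize one gene feature spanning each group. The CDS in Vladimir's error output is labelled ENSG00000146556, which suggests such an identifier is available, but where it lives in the file is not established here. The sketch below works on plain hashes with a hypothetical gene_id field and is not Unflattener code; strand is simply taken from the first feature seen.

```perl
use strict;
use warnings;

# Synthesize 'gene' features for flat mRNA/CDS lists that lack them.
# Each input is a hashref with type, start, end, strand and a
# hypothetical gene_id field; features without a gene_id are skipped.
sub add_gene_features {
    my @features = @_;
    my %gene;
    for my $f (@features) {
        next unless $f->{type} eq 'mRNA' || $f->{type} eq 'CDS';
        my $id = $f->{gene_id} or next;    # cannot group without an ID
        my $g = $gene{$id} ||= { type => 'gene', gene_id => $id,
                                 start => $f->{start}, end => $f->{end},
                                 strand => $f->{strand}, children => [] };
        # Widen the gene span to cover every child feature.
        $g->{start} = $f->{start} if $f->{start} < $g->{start};
        $g->{end}   = $f->{end}   if $f->{end}   > $g->{end};
        push @{ $g->{children} }, $f;
    }
    return map { $gene{$_} } sort keys %gene;
}

my @genes = add_gene_features(
    { type => 'mRNA', gene_id => 'GENE_A', start => 100, end => 900, strand => -1 },
    { type => 'CDS',  gene_id => 'GENE_A', start => 150, end => 850, strand => -1 },
);
printf "%s %d..%d children=%d\n", $_->{gene_id}, $_->{start}, $_->{end},
    scalar @{ $_->{children} } for @genes;
```
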

Cheers
Chris

On Mon, 17 Jan 2005, Scott Cain wrote:

> Hi Vladimir,
>
> Not to ask a question on the level of "is it plugged in", but are you sure
> it is a genbank formatted file?  I think you would get a different error
> if it weren't, but I just wanted to make sure.
>
> Scott
>
> ----------------------------------------------------------------------
> Scott Cain, Ph. D.				 	 cain@cshl.org
> GMOD Coordinator, http://www.gmod.org/			 (216)392-3087
> ----------------------------------------------------------------------
>
>
> On Mon, 17 Jan 2005, Chris Mungall wrote:
>
> >
> > Hi Vladimir
> >
> > The genbank2gff3 script, in scripts/Bio-DB-GFF, attempts to recover
> > information that the genbank flat file format often loses: which mRNA
> > relates to which CDS. You may or may not need this information; it
> > depends on why you are doing the conversion. If you don't need it, a
> > straightforward genbank->gff conversion may be all you want. Let me
> > know if this is what you want to do and I can help with that.
> >
> > If you _do_ wish to preserve the mRNA to CDS mappings, be aware that it
> > isn't always possible to recover these with 100% fidelity from the genbank
> > flat files. You may wish to pursue alternate approaches, such as
> > downloading ensembl as a mysql dump. (Any ensembl folks around? Any plans
> > to offer downloads in alternate formats such as gff3? That would be
> > fantastic.)
> >
> > If you'd prefer to carry on via the genbank flat file route, here's what
> > you should do:
> >
> > * get the latest version of genbank2gff3.PLS I have just checked into cvs
> > (I can send you a copy if you are using a bioperl release and not cvs)
> >
> > * run the script with the "--ethresh 3" option. This raises the error
> > severity threshold at which problems with the genbank file become
> > showstoppers.
> >
> > In addition, I will take a look at this particular file and see what it is
> > that is causing problems and get back to you.
> >
> > Cheers
> > Chris
> >
> > On Mon, 17 Jan 2005, Babenko, Vladimir (NIH/NLM/NCBI) wrote:
> >
> > >     Greetings,
> > > While parsing a genbank file taken from:
> > > ftp://ftp.ensembl.org/pub/current_human/data/flatfiles/genbank/Homo_sapiens.0.dat
> > > as of Jan 2005,
> > > I'm getting the following unflattening error:
> > > --------------------------------------------------------
> > > Processing file /ENSEMBL/Homo_sapiens.0.dat...
> > > working on contig
> > > chromosome:NCBI35:1:1:994676:1...chromosome:NCBI35:1:1:994676:1 Unflattening error:
> > > Details:
> > > ------------- EXCEPTION  -------------
> > > MSG: PROBLEM, SEVERITY==2
> > > no containers possible for SeqFeature of type: CDS; this SF is being placed at root level
> > > SF [Bio::SeqFeature::Generic=HASH(0x86485d8)]: CDS; ENSG00000146556
> > >
> > > STACK Bio::SeqFeature::Tools::Unflattener::problem
> > > /Bio/SeqFeature/Tools/Unflattener.pm:940
> > > STACK Bio::SeqFeature::Tools::Unflattener::unflatten_group
> > > /Bio/SeqFeature/Tools/Unflattener.pm:1983
> > > STACK Bio::SeqFeature::Tools::Unflattener::unflatten_groups
> > > /Bio/SeqFeature/Tools/Unflattener.pm:1744
> > > STACK Bio::SeqFeature::Tools::Unflattener::unflatten_seq
> > > /Bio/SeqFeature/Tools/Unflattener.pm:1449
> > > STACK (eval) genbank2gff3.PLS:345
> > > STACK main::unflatten_seq genbank2gff3.PLS:344
> > > STACK toplevel genbank2gff3.PLS:209
> > >
> > > --------------------------------------
> > >
> > > Possible gene unflattening error withchromosome:NCBI35:1:1:994676:1: consult STDERR
> > >
> > > Using bioperl-1.5.0.RC2 under Linux.
> > >
> > >     Would be grateful for the hint,
> > >       Vladimir
> > >
> >
>
>
From smarkel at scitegic.com  Mon Jan 17 21:37:10 2005
From: smarkel at scitegic.com (Scott Markel)
Date: Mon Jan 17 21:33:35 2005
Subject: [Bioperl-l] possible to skip parsing features when
	calling Bio::SeqIO::new?
In-Reply-To: <GAEDKMGOKFBLJPKCLKCCMEMPEHAA.brian_osborne@cognia.com>
References: <GAEDKMGOKFBLJPKCLKCCMEMPEHAA.brian_osborne@cognia.com>
Message-ID: <41EC7656.5040204@scitegic.com>

Brian,

The use case is when a user has many sequences to read and
is only interested in the sequence data for use in predicting
new features.  The user is likely to come back later and
look at some sequences in detail, so they only want to parse
the GenBank features then.  For the first pass, they would
like the reading sped up by omitting some of the parsing.
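For a first pass like the one described above, where only the residues matter, the cost being avoided is building feature objects; the add_unwanted_slot('features') call suggested in this thread is the BioPerl way to do that. Purely to show how little work a sequence-only pass needs, here is a standalone Perl sketch (not BioPerl code, and not a replacement for Bio::SeqIO) that keeps only the LOCUS name and ORIGIN residues of each record and steps over FEATURES and every other section without parsing them. It assumes well-formed records terminated by "//"; the sub name read_genbank_seqs is hypothetical.

```perl
use strict;
use warnings;

# Read GenBank records from a filehandle, returning [name, sequence]
# pairs. Lines between LOCUS and ORIGIN (annotations, FEATURES) are
# skipped without being parsed.
sub read_genbank_seqs {
    my ($fh) = @_;
    my (@records, $name, $seq, $in_origin);
    while (my $line = <$fh>) {
        if ($line =~ /^LOCUS\s+(\S+)/) {
            ($name, $seq, $in_origin) = ($1, '', 0);
        }
        elsif ($line =~ /^ORIGIN/) {
            $in_origin = 1;
        }
        elsif ($line =~ m{^//}) {
            push @records, [$name, $seq] if defined $name;
            ($name, $in_origin) = (undef, 0);
        }
        elsif ($in_origin) {
            # Sequence lines look like "   61 acgt acgt ..."; keep letters only.
            (my $chunk = $line) =~ s/[^A-Za-z]//g;
            $seq .= lc $chunk;
        }
    }
    return @records;
}

my $record = <<'GB';
LOCUS       TEST1       20 bp    DNA     linear   UNK
FEATURES             Location/Qualifiers
     gene            1..20
                     /gene="demo"
ORIGIN
        1 acgtacgtac gtacgtacgt
//
GB
open my $fh, '<', \$record or die "cannot open in-memory file: $!";
print "$_->[0]: $_->[1]\n" for read_genbank_seqs($fh);
```
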

Scott

Brian Osborne wrote:

> Scott,
> 
> Why would you want to do this? I can imagine one reason: some problem with a
> feature is causing the script to exit. In that case, do something like:
> 
> my $seq;
> eval { $seq = $seqIterator->next_seq; };
> warn "Could not parse record: $@" if $@;
> 
> 
> Brian O.
> 
> -----Original Message-----
> From: bioperl-l-bounces@portal.open-bio.org
> [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Scott Markel
> Sent: Monday, January 17, 2005 9:00 PM
> To: Bioperl-l@portal.open-bio.org
> Subject: [Bioperl-l] possible to skip parsing features when
> calling Bio::SeqIO::new?
> 
> 
> I'm using BioPerl 1.4 to read a GenBank sequence file.  Is there
> an option or parameter I can set in the
> 
> my $seqIterator = Bio::SeqIO->new(-file   => $file,
>                                   -format => 'genbank');
> 
> call that will cause the parser to skip the features?
> 
> I checked BioPerl 1.5RC2 and didn't see any changes there
> that would address my question.
> 
> Scott

-- 
Scott Markel, Ph.D.
Principal Bioinformatics Architect  email:  smarkel@scitegic.com
SciTegic Inc.                       mobile: +1 858 205 3653
9665 Chesapeake Drive, Suite 401    voice:  +1 858 279 8800, ext. 253
San Diego, CA 92123                 fax:    +1 858 279 8804
USA                                 web:    http://www.scitegic.com

From neil.saunders at unsw.edu.au  Mon Jan 17 21:40:01 2005
From: neil.saunders at unsw.edu.au (Neil Saunders)
Date: Mon Jan 17 21:35:32 2005
Subject: [Bioperl-l] re:  buried surface calculation
Message-ID: <20050118024001.GA2699@psychro>

hi Osmany,

There are a couple of solutions to your problem.  First, you will need 
to process your PDB file with something that calculates the required 
surface areas.  You may be able to use DSSP for that, which would be 
nice because Bioperl includes a module for parsing DSSP output.  It is 
documented here:

http://doc.bioperl.org/releases/bioperl-1.4/Bio/Structure/SecStr/DSSP/toc.html

The method:
$solv_acc = $dssp_obj->resSolvAcc( RESIDUE_ID );

returns the solvent-accessible area of a residue, so if you knew the 
total surface area, you could calculate what was buried.
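The arithmetic described above is a per-residue subtraction: buried area = total (reference) area minus solvent-accessible area. A minimal Perl sketch follows, assuming you already have the accessible areas (for example from resSolvAcc) and a table of total areas keyed by the same residue IDs. The sub name, the "10:A" style keys and the numbers in the usage lines are hypothetical placeholders; no real reference area values are supplied here. For an interface, the same subtraction could be applied using the areas of the isolated chain in place of the totals.

```perl
use strict;
use warnings;

# Buried area per residue = total (reference) area - accessible area.
# Both arguments are hashrefs keyed by residue ID (square angstroms).
sub buried_areas {
    my ($total, $accessible) = @_;
    my %buried;
    for my $res (keys %$accessible) {
        next unless exists $total->{$res};    # no reference value: skip
        my $area = $total->{$res} - $accessible->{$res};
        $buried{$res} = $area > 0 ? $area : 0;    # clamp negative values to 0
    }
    return \%buried;
}

my $buried = buried_areas(
    { '10:A' => 200.0, '11:A' => 150.0 },    # placeholder totals
    { '10:A' => 50.0,  '11:A' => 150.0 },    # placeholder accessible areas
);
my $sum = 0;
$sum += $_ for values %$buried;
printf "%s buried: %.1f\n", $_, $buried->{$_} for sort keys %$buried;
printf "total buried: %.1f\n", $sum;
```
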

Another option might be to use the program naccess:

http://wolf.bms.umist.ac.uk/naccess/

There is no Bioperl module for this output so far as I know (it's 
something I'd like to write one day).  But the output (a .rsa file) is 
quite easy to parse, as it is mostly space-delimited columns.
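Since the .rsa output is mostly columns, a short parser goes a long way. The sketch below assumes per-residue lines of the form "RES <name> <chain> <number> <all-atom ABS> <all-atom REL> ..." and skips REM header lines and any summary lines; treat that layout as an assumption and check it against your naccess version. Because the chain column can be blank, the sketch splits on whitespace and checks whether a lone chain letter is present; fixed-width columns can still run together for large residue numbers, where a column-position parser would be more robust. The sub name parse_rsa and the sample values are made up.

```perl
use strict;
use warnings;

# Parse per-residue lines from an naccess .rsa file (assumed layout):
#   RES SER A   1   126.29 108.9 ...
# Returns hashrefs with residue name, chain, number and the first two
# numeric columns (taken here as all-atom absolute and relative values).
sub parse_rsa {
    my ($fh) = @_;
    my @residues;
    while (my $line = <$fh>) {
        next unless $line =~ /^RES\s/;    # skip REM and summary lines
        my @f = split ' ', $line;
        shift @f;                         # drop the 'RES' tag
        my $name = shift @f;
        # The chain column may be blank; a lone letter means it is present.
        my $chain = $f[0] =~ /^[A-Za-z]$/ ? shift @f : '';
        my $num   = shift @f;
        my ($abs, $rel) = @f[0, 1];
        push @residues, { name => $name, chain => $chain, num => $num,
                          abs  => $abs,  rel   => $rel };
    }
    return @residues;
}

my $rsa = <<'RSA';
REM  made-up sample lines for illustration
RES SER A   1   126.29 108.9  81.10 104.5
RES LEU A   2    12.00   6.8  10.00   6.9
RSA
open my $fh, '<', \$rsa or die "cannot open in-memory file: $!";
for my $r (parse_rsa($fh)) {
    printf "%s %s%s abs=%.2f rel=%.1f%%\n",
        $r->{name}, $r->{chain}, $r->{num}, $r->{abs}, $r->{rel};
}
```
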

I wrote a few scripts to process naccess output some years ago.  You 
might get some ideas from them, see 'surface_charge.pl' and 
'parse_nacc_core2.pl' at my CVS server:

http://psychro.bioinformatics.unsw.edu.au/cgi-bin/viewcvs.cgi/GenRes2003/scripts/#dirlist

I knew very little Perl when I wrote these, so they are embarrassingly
awful, but they may give you an idea of how .rsa files can be parsed.


Neil
-- 
 School of Biotechnology and Biomolecular Sciences,
 The University of New South Wales,
 Sydney 2052,
 Australia

http://psychro.bioinformatics.unsw.edu.au/neil/index.php
From smarkel at scitegic.com  Mon Jan 17 21:53:05 2005
From: smarkel at scitegic.com (Scott Markel)
Date: Mon Jan 17 21:49:35 2005
Subject: [Bioperl-l] possible to skip parsing features when calling
	Bio::SeqIO::new?
In-Reply-To: <2580819E-68F9-11D9-83BC-000393C44276@duke.edu>
References: <41EC6DB2.10205@scitegic.com>
	<2580819E-68F9-11D9-83BC-000393C44276@duke.edu>
Message-ID: <41EC7A11.6050903@scitegic.com>

Jason,

Excellent!  Thank you.

Scott

Jason Stajich wrote:

> See the docs for Bio::Seq::SeqBuilder.
> 
> I think this will work:
> 
> my $seqIterator = Bio::SeqIO->new(-file   => $file,
>                                   -format => 'genbank');
> $seqIterator->sequence_builder->add_unwanted_slot('features');
> 
> If you additionally don't want the annotations (references, etc.):
> 
> $seqIterator->sequence_builder->add_unwanted_slot('features', 'annotation');
> 
> [Don't ask why one is plural and the other singular... =)]
> 
> -jason
> On Jan 17, 2005, at 9:00 PM, Scott Markel wrote:
> 
>> I'm using BioPerl 1.4 to read a GenBank sequence file.  Is there
>> an option or parameter I can set in the
>>
>> my $seqIterator = Bio::SeqIO->new(-file   => $file,
>>                                   -format => 'genbank');
>>
>> call that will cause the parser to skip the features?
>>
>> I checked BioPerl 1.5RC2 and didn't see any changes there
>> that would address my question.
>>
>> Scott
>>
> -- 
> Jason Stajich
> jason.stajich at duke.edu
> http://www.duke.edu/~jes12/
> 
> 

-- 
Scott Markel, Ph.D.
Principal Bioinformatics Architect  email:  smarkel@scitegic.com
SciTegic Inc.                       mobile: +1 858 205 3653
9665 Chesapeake Drive, Suite 401    voice:  +1 858 279 8800, ext. 253
San Diego, CA 92123                 fax:    +1 858 279 8804
USA                                 web:    http://www.scitegic.com

From allenday at ucla.edu  Tue Jan 18 00:27:21 2005
From: allenday at ucla.edu (Allen Day)
Date: Tue Jan 18 00:23:31 2005
Subject: [Bioperl-l] GFF3
In-Reply-To: <4FC537A9-674C-11D9-9C9B-000A959E1622@salmonella.org>
References: <4FC537A9-674C-11D9-9C9B-000A959E1622@salmonella.org>
Message-ID: <Pine.LNX.4.58.0501172123360.19385@sumo.ctrl.ucla.edu>

Hi Rob,

I looked at FeatureIO::gff and merged in your changes with some
modifications.

I also added a next_seq() method to FeatureIO::gff that is activated when
a /^##FASTA/ or /^>/ line is encountered.  It delegates the sequence
parsing to Bio::SeqIO's fasta parser.  I think this obviates the need for
Bio::SeqIO::gff.
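The switch described above can be pictured as a reader with two modes: feature lines are collected until a ##FASTA directive or a line starting with ">" appears, and the rest of the stream is read as FASTA. The standalone Perl sketch below illustrates only that control flow; it is not the Bio::FeatureIO::gff code (which hands the FASTA part to Bio::SeqIO), and the sub name split_gff3 is hypothetical.

```perl
use strict;
use warnings;

# Split a GFF3 stream into feature rows and FASTA sequences.
# Feature mode ends at "##FASTA" or at the first line starting with '>'.
sub split_gff3 {
    my ($fh) = @_;
    my (@features, %seqs, $id, $in_fasta);
    while (my $line = <$fh>) {
        chomp $line;
        if (!$in_fasta) {
            if ($line =~ /^##FASTA/) {
                $in_fasta = 1;
                next;
            }
            if ($line =~ /^>/) {
                $in_fasta = 1;    # implicit FASTA section; header handled below
            }
            else {
                next if $line =~ /^#/ || $line !~ /\S/;    # directives, blanks
                push @features, [ split /\t/, $line ];
                next;
            }
        }
        if ($line =~ /^>(\S+)/) {
            $id = $1;
            $seqs{$id} = '';
        }
        elsif (defined $id) {
            (my $chunk = $line) =~ s/\s+//g;
            $seqs{$id} .= $chunk;
        }
    }
    return (\@features, \%seqs);
}

my $gff = <<"GFF";
##gff-version 3
ctg1\tdemo\tgene\t10\t90\t.\t+\t.\tID=gene1
##FASTA
>ctg1
ACGTACGTAC
GTACGT
GFF
open my $fh, '<', \$gff or die "cannot open in-memory file: $!";
my ($features, $seqs) = split_gff3($fh);
printf "%d feature(s); ctg1 has %d residues\n",
    scalar @$features, length $seqs->{ctg1};
```
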

Please update your repository and have a look at t/FeatureIO.t (unit test
for FeatureIO, also added).

-Allen


On Sat, 15 Jan 2005, Rob Edwards wrote:

> Because I need it for some things that I am doing, I have worked quite 
> a bit on the GFF3 parser Bio::FeatureIO::gff. Several people have 
> written this module; I have just made some cosmetic changes:
> 
> I have improved the validation processes that are applied as a gff3 
> file is parsed, and the module should now validate essentially 
> everything in the file except alignments. Validation is optional and is 
> based on the specification described at : 
> http://song.sourceforge.net/gff3.shtml
> 
> For clarification and edification I have created a couple of tables 
> describing the module and the validation that is applied to GFF3 files, 
> which you can see online: http://www.salmonella.org/bioperl/gff3.html
> 
> I also wrote a Bio::SeqIO::gff module. Since gff3 files can hold 
> sequences, it seems that you'd want to be able to call the next_seq 
> methods, and therefore SeqIO is more appropriate than FeatureIO for 
> those aspects. Currently the SeqIO module uses the FeatureIO module for 
> parsing the file, it just reorganizes things.
> 
> This provides two different interfaces for getting objects out of GFF3 
> files:
> 	Bio::FeatureIO::gff will return Bio::SeqFeature::Annotated objects 
> representing the annotations.
> 	Bio::SeqIO::gff will return Bio::Seq objects representing the 
> sequences with all the annotations attached.
> 
> The other difference between the two is that the former passes out the 
> objects as they are read, but the latter has to read the whole file to 
> get the annotations and the sequences.
> 
> For now, I have focused on reading GFF3 files.
> 
> I have not committed these to cvs yet, pending comments from others. I 
> have some specific questions:
> 	Should I wait until after 1.5 is out?
> 	Is two separate modules really the right way to go about this?
> 	What about other GFF modules (like Bio::Tools::GFF)?
> 	Could someone give the modules a workout and let me know about bugs? I 
> am sure there are many.
> 
> I have posted these modules online via anonymous ftp at 
> ftp://ftp.salmonella.org/rob/bioperl/GFF_modules.tgz
> Take a look and let me know what you do and don't like!
> 
> Rob
> 
> 
From allenday at ucla.edu  Tue Jan 18 00:34:01 2005
From: allenday at ucla.edu (Allen Day)
Date: Tue Jan 18 00:30:16 2005
Subject: [Bioperl-l] Re: GFF3
In-Reply-To: <Pine.GSO.4.05.10501171203280.26102-100000@phage.cshl.edu>
References: <Pine.GSO.4.05.10501171203280.26102-100000@phage.cshl.edu>
Message-ID: <Pine.LNX.4.58.0501172129280.19385@sumo.ctrl.ucla.edu>

Hi,

On Mon, 17 Jan 2005, Scott Cain wrote:

> Hi Rob,
> 
> Thanks for your work on this--I've put several comments in your
> original message below.
> 
> Scott
> 
> ---------Original Message--------
> Date: Sat, 15 Jan 2005 15:22:23 -0800
> From: Rob Edwards <rob@salmonella.org>
> Subject: [Bioperl-l] GFF3
> To: Bioperl list <bioperl-l@portal.open-bio.org>
> 
> Because I need it for some things that I am doing, I have worked quite 
> a bit on the GFF3 parser Bio::FeatureIO::gff. Several people have 
> written this module; I have just made some cosmetic changes:
> 
> I have improved the validation processes that are applied as a gff3 
> file is parsed, and the module should now validate essentially 
> everything in the file except alignments. Validation is optional and is 
> based on the specification described at : 
> http://song.sourceforge.net/gff3.shtml
> 
> SC> Excellent--Did you happen to relax the requirement that ID be unique
> SC> for each line of the GFF?  Allen and I put that in due to a misreading
> SC> of the spec.  The ID has to be unique for a *feature*, which can be
> SC> spread across several lines.

I'm not sure if this is taken care of in the code... actually, I'm a bit 
foggy on exactly what the problem is.
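As Scott puts it, an ID has to be unique per *feature*, and a discontinuous feature (for example a CDS split across exons) repeats its ID on each of its lines. One way to check that, sketched below in plain Perl, is to group lines by ID and flag an ID only when its lines disagree about things a single feature should share. Comparing seqid, type and strand is an assumption of this sketch rather than a quotation of the GFF3 specification, and the sub name check_ids is hypothetical.

```perl
use strict;
use warnings;

# Check GFF3 ID usage: repeated IDs are allowed (multi-line features),
# but all lines sharing an ID must agree on seqid, type and strand.
# Returns a list of problem descriptions (empty if none found).
sub check_ids {
    my @lines = @_;
    my (%first, @problems);
    for my $raw (@lines) {
        my $line = $raw;
        chomp $line;
        next if $line =~ /^#/ || $line !~ /\S/;
        my @col = split /\t/, $line;
        next unless @col >= 9;
        my ($id) = $col[8] =~ /(?:^|;)ID=([^;]+)/ or next;
        my $key = join '|', @col[0, 2, 6];    # seqid, type, strand
        if (!exists $first{$id}) {
            $first{$id} = $key;
        }
        elsif ($first{$id} ne $key) {
            push @problems, "ID $id reused for a different feature ($key vs $first{$id})";
        }
    }
    return @problems;
}

my @gff = (
    join("\t", 'ctg1', 'demo', 'CDS',  100, 200, '.', '+', '0', 'ID=cds1'),
    join("\t", 'ctg1', 'demo', 'CDS',  300, 400, '.', '+', '2', 'ID=cds1'),
    join("\t", 'ctg1', 'demo', 'gene',  50, 450, '.', '+', '.', 'ID=cds1'),
);
print "$_\n" for check_ids(@gff);
```

Here the two CDS lines sharing ID=cds1 pass, and only the gene line reusing that ID is reported.
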

> For clarification and edification I have created a couple of tables
> describing the module and the validation that is applied to GFF3 files,
> which you can see online: http://www.salmonella.org/bioperl/gff3.html
> 
> SC> Very nice and well done--do you happen to have a pod-ified version
> SC> of this page?  It would be nice to include in the pod for
> SC> Bio::FeatureIO::gff.

That's nice, I'd like to see it folded into the gff.pm perldoc as well.

> I also wrote a Bio::SeqIO::gff module. Since gff3 files can hold 
> sequences, it seems that you'd want to be able to call the next_seq 
> methods, and therefore SeqIO is more appropriate than FeatureIO for 
> those aspects. Currently the SeqIO module uses the FeatureIO module for 
> parsing the file, it just reorganizes things.
> 
> This provides two different interfaces for getting objects out of GFF3 
> files:
> 	Bio::FeatureIO::gff will return Bio::SeqFeature::Annotated objects 
> representing the annotations.
> 	Bio::SeqIO::gff will return Bio::Seq objects representing the 
> sequences with all the annotations attached.
> 
> The other difference between the two is that the former passes out the 
> objects as they are read, but the latter has to read the whole file to 
> get the annotations and the sequences.
> 
> SC> I thought about doing something similar with SeqIO, but I am worried 
> SC> about the case where somebody tries to use SeqIO on a well 
> SC> annotated human Chr1 GFF3 file (if one were ever to exist :-) ,
> SC> but I suppose the same machine killing thing could be done if
> SC> someone tried to use SeqIO on a genbank file of Chr1.

See my previous email, I don't think we need the SeqIO module.

> For now, I have focused on reading GFF3 files.
> 
> I have not committed these to cvs yet, pending comments from others. I 
> have some specific questions:
> 	Should I wait until after 1.5 is out?
> 
> SC> I don't have the definitive answer, but I would say it doesn't
> SC> matter much, as long as it passes tests.  Bio::FeatureIO::gff is
> SC> hardly a fully functional module as it is, so if we could 
> SC> squeeze a little more functionality into it before we
> SC> release it, that would be fine with me.

Well, it's in now, and it passes tests.  There weren't any before, but I
wrote some; look in t/FeatureIO.t.

> 	Is two separate modules really the right way to go about this?
> 
> SC> As long as it works for this case, I don't mind:  calling
> SC> 'next_feature' on a FeatureIO object until I run out of features
> SC> and then calling 'next_sequence' (and get a Bio::PrimarySeq) on
> SC> the same FeatureIO object until I run out of sequences.
> 
> 	What about other GFF modules (like Bio::Tools::GFF)?
> 
> SC> I am willing to let Bio::Tools::GFF die a terrible death.  While
> SC> it will have to be kept around for apps that depend on it, I don't
> SC> see adding any major functionality as time well spent.
> 
> 	Could someone give the modules a workout and let me know about bugs? I 
> am sure there are many.
> 
> SC> I will try to soon, but it won't be until next week at 
> SC> the earliest.
> 
> I have posted these modules online via anonymous ftp at 
> ftp://ftp.salmonella.org/rob/bioperl/GFF_modules.tgz
> Take a look and let me know what you do and don't like!
> 
> Rob
> 
> 
> ----------------------------------------------------------------------
> Scott Cain, Ph. D.				 	 cain@cshl.org
> GMOD Coordinator, http://www.gmod.org/			 (216)392-3087
> ----------------------------------------------------------------------
> 
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
> 
From rob at salmonella.org  Tue Jan 18 03:46:05 2005
From: rob at salmonella.org (Rob Edwards)
Date: Tue Jan 18 03:42:17 2005
Subject: [Bioperl-l] Re: GFF3
In-Reply-To: <Pine.LNX.4.58.0501172129280.19385@sumo.ctrl.ucla.edu>
References: <Pine.GSO.4.05.10501171203280.26102-100000@phage.cshl.edu>
	<Pine.LNX.4.58.0501172129280.19385@sumo.ctrl.ucla.edu>
Message-ID: <64201ADE-692D-11D9-9265-000A959E1622@salmonella.org>

Thanks for your help and comments. Here are a couple of points, and I'll
work on filling in some of the other gaps.


>> SC> Excellent--Did you happen to relax the requirement that ID be unique
>> SC> for each line of the GFF?  Allen and I put that in due to a misreading
>> SC> of the spec.  The ID has to be unique for a *feature*, which can be
>> SC> spread across several lines.
>
> I'm not sure if this is taken care of in the code... actually, I'm a bit
> foggy on exactly what the problem is.

It is not corrected yet. The problem is this section, around line 669.
It's not true that each line can only have one ID, and this can be
removed.

   if($attr{ID}){
     if(scalar( @{ $attr{ID} } ) > 1){
       $self->throw("Error in line:\n$feature_string\nA feature may have at most one ID value");
     }
   }
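A minimal sketch of the relaxed rule Scott describes: an ID may repeat on
several lines of one feature, but two different features may not share it.
This is not the gff.pm code; the helper name check_ids, and the assumption
that the lines of one multi-line feature share seqid, source and type, are
illustrative only.

```perl
use strict;
use warnings;

# Hypothetical helper, not the gff.pm code.  Each record is a hashref with
# seqid, source, type and an arrayref of ID values from column 9.
# Returns a list of error strings rather than throwing.
sub check_ids {
    my @records = @_;
    my %owner;    # ID => "seqid|source|type" of the first line using it
    my @errors;
    for my $rec (@records) {
        # Assumption: lines of one feature share seqid, source and type.
        my $key = join '|',
            map { defined $_ ? $_ : '' } @{$rec}{qw(seqid source type)};
        for my $id (@{ $rec->{ID} || [] }) {
            if (!exists $owner{$id}) {
                $owner{$id} = $key;    # first line of this feature
            }
            elsif ($owner{$id} ne $key) {
                push @errors, "ID $id is used by more than one feature";
            }
            # otherwise: another line of the same multi-line feature; allowed
        }
    }
    return @errors;
}

# Two CDS lines sharing an ID pass; a gene and an mRNA sharing one do not.
my @errs = check_ids(
    { seqid => 'chr1', source => 'ex', type => 'CDS',  ID => ['cds1'] },
    { seqid => 'chr1', source => 'ex', type => 'CDS',  ID => ['cds1'] },
    { seqid => 'chr1', source => 'ex', type => 'gene', ID => ['g1']   },
    { seqid => 'chr1', source => 'ex', type => 'mRNA', ID => ['g1']   },
);
print "$_\n" for @errs;    # prints: ID g1 is used by more than one feature
```

Returning a list of problems instead of calling throw() leaves it to the
caller to decide which violations are fatal.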


>> For clarification and edification I have created a couple of tables
>> describing the module and the validation that is applied to GFF3 files,
>> which you can see online: http://www.salmonella.org/bioperl/gff3.html
>>
>> SC> Very nice and well done--do you happen to have a pod-ified version
>> SC> of this page?  It would be nice to include in the pod for
>> SC> Bio::FeatureIO::gff.
>
> That's nice, I'd like to see it folded into the gff.pm perldoc as well.

I'll take care of PODifying it over the next couple of days.


>> SC> I don't have the definitive answer, but I would say it doesn't
>> SC> matter much, as long as it passes tests.  Bio::FeatureIO::gff is
>> SC> hardly a fully functional module as it is, so if we could
>> SC> squeeze a little more functionality into it before we
>> SC> release it, that would be fine with me.
>
> well it's in now.  and it passes tests.  there weren't any before, but i
> wrote some.  look in t/FeatureIO.t

Thanks for those; however, at the moment the tests fail. See below.

The first series of errors occurs because the feature ID=AB000114 in
t/data/knownGene.gff3 has several Dbxrefs separated with ';' instead of
','.
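For reference, GFF3 column 9 separates tag=value pairs with ';' and the
multiple values of one tag with ','. A minimal sketch (a hypothetical
parse_attributes, not the gff.pm parser, and ignoring URL-escaping) shows
how a ';'-separated Dbxref list produces a piece with no '=' and therefore
an undefined value, which would be consistent with the "Use of
uninitialized value" warnings in the test output below. The accessions are
made up.

```perl
use strict;
use warnings;

# Hypothetical parser for GFF3 column 9: ';' separates tag=value pairs,
# ',' separates multiple values of one tag.  Escapes are not decoded.
sub parse_attributes {
    my ($col9) = @_;
    my (%attr, @problems);
    for my $pair (split /;/, $col9) {
        next unless length $pair;              # tolerate a trailing ';'
        my ($tag, $value) = split /=/, $pair, 2;
        unless (defined $value) {
            # e.g. "Dbxref=A;B": the "B" piece has no '=', so $value is undef
            push @problems, "no '=' in '$pair'";
            next;
        }
        push @{ $attr{$tag} }, split /,/, $value;
    }
    return (\%attr, \@problems);
}

my ($good) = parse_attributes('ID=AB000114;Dbxref=EMBL:AB000114,GI:1234');
print scalar @{ $good->{Dbxref} }, "\n";       # prints 2

my (undef, $bad) = parse_attributes('ID=AB000114;Dbxref=EMBL:AB000114;GI:1234');
print "$_\n" for @$bad;                        # prints: no '=' in 'GI:1234'
```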

The second failure is because hybrid1.gff3 isn't in CVS.

Rob


% perl -I. -w t/FeatureIO.t
1..19
ok 1
ok 2
ok 3
ok 4
ok 5
ok 6
Use of uninitialized value in substitution (s///) at 
Bio/FeatureIO/gff.pm line 590, <GEN5> line 10.
Use of uninitialized value in substitution (s///) at 
Bio/FeatureIO/gff.pm line 591, <GEN5> line 10.
Use of uninitialized value in split at Bio/FeatureIO/gff.pm line 593, 
<GEN5> line 10.
Use of uninitialized value in substitution (s///) at 
Bio/FeatureIO/gff.pm line 590, <GEN5> line 10.
Use of uninitialized value in substitution (s///) at 
Bio/FeatureIO/gff.pm line 591, <GEN5> line 10.
Use of uninitialized value in split at Bio/FeatureIO/gff.pm line 593, 
<GEN5> line 10.
Use of uninitialized value in substitution (s///) at 
Bio/FeatureIO/gff.pm line 590, <GEN5> line 10.
Use of uninitialized value in substitution (s///) at 
Bio/FeatureIO/gff.pm line 591, <GEN5> line 10.
Use of uninitialized value in split at Bio/FeatureIO/gff.pm line 593, 
<GEN5> line 10.
Use of uninitialized value in substitution (s///) at 
Bio/FeatureIO/gff.pm line 590, <GEN5> line 10.
Use of uninitialized value in substitution (s///) at 
Bio/FeatureIO/gff.pm line 591, <GEN5> line 10.
Use of uninitialized value in split at Bio/FeatureIO/gff.pm line 593, 
<GEN5> line 10.
Use of uninitialized value in substitution (s///) at 
Bio/FeatureIO/gff.pm line 590, <GEN5> line 10.
Use of uninitialized value in substitution (s///) at 
Bio/FeatureIO/gff.pm line 591, <GEN5> line 10.
Use of uninitialized value in split at Bio/FeatureIO/gff.pm line 593, 
<GEN5> line 10.
Use of uninitialized value in substitution (s///) at 
Bio/FeatureIO/gff.pm line 590, <GEN5> line 10.
Use of uninitialized value in substitution (s///) at 
Bio/FeatureIO/gff.pm line 591, <GEN5> line 10.
Use of uninitialized value in split at Bio/FeatureIO/gff.pm line 593, 
<GEN5> line 10.
ok 7
ok 8

------------- EXCEPTION  -------------
MSG: Could not open t/data/hybrid1.gff3: No such file or directory
STACK Bio::Root::IO::_initialize_io Bio/Root/IO.pm:314
STACK Bio::FeatureIO::_initialize Bio/FeatureIO.pm:345
STACK Bio::FeatureIO::gff::_initialize Bio/FeatureIO/gff.pm:92
STACK Bio::FeatureIO::new Bio/FeatureIO.pm:268
STACK Bio::FeatureIO::new Bio/FeatureIO.pm:288
STACK toplevel t/FeatureIO.t:83

--------------------------------------

From rob at salmonella.org  Tue Jan 18 03:46:11 2005
From: rob at salmonella.org (Rob Edwards)
Date: Tue Jan 18 03:42:28 2005
Subject: [Bioperl-l] GFF3
In-Reply-To: <Pine.LNX.4.58.0501172123360.19385@sumo.ctrl.ucla.edu>
References: <4FC537A9-674C-11D9-9C9B-000A959E1622@salmonella.org>
	<Pine.LNX.4.58.0501172123360.19385@sumo.ctrl.ucla.edu>
Message-ID: <67606A0B-692D-11D9-9265-000A959E1622@salmonella.org>

I don't really feel that strongly about this, but it seems that if I 
were downloading a gff3 file and wanted to read the sequence I would 
probably look in SeqIO for a reader. That was my primary rationale for 
Bio::SeqIO::gff.

Rob


On Jan 17, 2005, at 9:27 PM, Allen Day wrote:

> Hi Rob,
>
> I looked at FeatureIO::gff and merged in your changes with some
> modifications.
>
> I also added a next_seq() method to FeatureIO::gff that is activated when
> a /^##FASTA/ or /^>/ line is encountered.  Functionality delegates to
> Bio::SeqIO's fasta parser.  I think this obviates the need for
> Bio::SeqIO::gff.
>
> Please update your repository and have a look at t/FeatureIO.t (unit test
> for FeatureIO, also added).
>
> -Allen
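A rough sketch of the behaviour described above: one reader hands out
feature lines until a ##FASTA or '>' line appears, then hands out
sequences. It is a hand-rolled illustration over a plain filehandle; the
real FeatureIO::gff delegates the sequence part to Bio::SeqIO's fasta
parser, and the make_reader name and its return values are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical reader: $next_feature->() returns column arrayrefs until a
# "##FASTA" or ">" line is seen; after that $next_seq->() returns [id, seq].
sub make_reader {
    my ($fh) = @_;
    my $in_fasta = 0;
    my $pending;                                  # '>' header read ahead
    my $next_feature = sub {
        return if $in_fasta;
        while (my $line = <$fh>) {
            chomp $line;
            if ($line =~ /^##FASTA/ or $line =~ /^>/) {
                $in_fasta = 1;                    # switch to sequence mode
                $pending  = $line if $line =~ /^>/;
                return;
            }
            next if $line =~ /^#/ or $line !~ /\S/;   # directives, blanks
            return [ split /\t/, $line ];
        }
        return;
    };
    my $next_seq = sub {
        return unless $in_fasta;
        my ($header, $seq) = ($pending, '');
        undef $pending;
        while (my $line = <$fh>) {
            chomp $line;
            if ($line =~ /^>/) {
                if (defined $header) { $pending = $line; last }  # next record
                $header = $line;
                next;
            }
            $seq .= $line if defined $header;
        }
        return unless defined $header;
        my ($id) = $header =~ /^>(\S+)/;
        return [ $id, $seq ];
    };
    return ($next_feature, $next_seq);
}

my $gff = join "\n", "##gff-version 3",
    "ctg1\tex\tgene\t1\t8\t.\t+\t.\tID=g1",
    "##FASTA", ">ctg1", "ACGT", "ACGT", "";
open my $fh, '<', \$gff or die "cannot open in-memory file: $!";
my ($next_feature, $next_seq) = make_reader($fh);
while (my $f = $next_feature->()) { print "feature: $f->[2] $f->[8]\n" }
while (my $s = $next_seq->())     { print "seq: $s->[0] $s->[1]\n" }
# prints "feature: gene ID=g1" then "seq: ctg1 ACGTACGT"
```

This mirrors the usage Scott asked for: drain the features first, then
drain the sequences from the same object.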


From allenday at ucla.edu  Tue Jan 18 03:54:31 2005
From: allenday at ucla.edu (Allen Day)
Date: Tue Jan 18 03:50:41 2005
Subject: [Bioperl-l] Re: GFF3
In-Reply-To: <64201ADE-692D-11D9-9265-000A959E1622@salmonella.org>
References: <Pine.GSO.4.05.10501171203280.26102-100000@phage.cshl.edu>
	<Pine.LNX.4.58.0501172129280.19385@sumo.ctrl.ucla.edu>
	<64201ADE-692D-11D9-9265-000A959E1622@salmonella.org>
Message-ID: <Pine.LNX.4.58.0501180053380.21413@sumo.ctrl.ucla.edu>

> The first series of errors die because the feature ID=AB000114 in 
> t/data/knownGene.gff3 has several Dbxrefs separated with ';' instead of 
> ','

I'm not getting these errors; are you in sync with CVS HEAD?

> The second failure is because  hybrid1.gff3 isn't in cvs

The GFF files are in CVS now.

From birney at ebi.ac.uk  Tue Jan 18 04:05:49 2005
From: birney at ebi.ac.uk (Ewan Birney)
Date: Tue Jan 18 04:03:42 2005
Subject: [Bioperl-l] Problem with parsing ENSEMBL genbank flat file with
	genbank2gff3. pls
In-Reply-To: <Pine.OSX.4.58.0501171131340.6764@adsl-68-126-147-89.dsl.pltn13.pacbell.net>
Message-ID: <Pine.LNX.4.44.0501180903430.2722-100000@pigeon.ebi.ac.uk>

On Mon, 17 Jan 2005, Chris Mungall wrote:

> 
> Hi Vladimir
> 
> The genbank2gff3 script, in scripts/Bio-DB-GFF, is attempting to recover
> information which the genbank flat file format often loses; this is the
> information about which mRNA relates to which CDS. You may or may not need
> this information, it depends why you are doing the conversion. If you
> don't need this, you may want just a straightforward genbank->gff
> conversion. Let me know if this is what you want to do and I can help with
> that.
> 
> If you _do_ wish to preserve the mRNA to CDS mappings, be aware that it
> isn't always possible to recover these with 100% fidelity from the genbank
> flat files. You may wish to pursue alternate approaches, such as
> downloading ensembl as a mysql dump (any ensembl folks around.. any plans
> to offer downloads in alternate formats such as gff3? This would be
> fantastic)

This is on the road map for Ensembl due to Vectorbase, but don't forget we
offer GTF format, which is a different, well-established, GFF-derived
format and very clean to parse.

Go to Ensembl website --> Click on EnsMart, select your genome, in Filter,
unselect the filter by genomic region (to get the entire region) then in
Output select structure and select "GTF" format.

> 
> If you'd prefer to carry on via the genbank flat file route, here's what
> you should do:
> 
> * get the latest version of genbank2gff3.PLS I have just checked into cvs
> (I can send you a copy if you are using a bioperl release and not cvs)
> 
> * run the script with the "--ethresh 3" option. This will raise the error
> severity threshold at which problems with the genbank file become
> showstoppers.
> 
> In addition, I will take a look at this particular file and see what it is
> that is causing problems and get back to you.
> 
> Cheers
> Chris
> 
> On Mon, 17 Jan 2005, Babenko, Vladimir (NIH/NLM/NCBI) wrote:
> 
> >     Greetings,
> > While parsing a genbank file taken from:
> > ftp://ftp.ensembl.org/pub/current_human/data/flatfiles/genbank/Homo_sapiens.
> > 0.dat as of Jan 2005,
> > I'm getting the following unflattening error:
> > --------------------------------------------------------
> > Processing file /ENSEMBL/Homo_sapiens.0.dat...
> > working on contig
> > chromosome:NCBI35:1:1:994676:1...chromosome:NCBI35:1:1:994676:1 Unflattening
> > error:
> > Details:
> > ------------- EXCEPTION  -------------
> > MSG: PROBLEM, SEVERITY==2
> > no containers possible for SeqFeature of type: CDS; this SF is being placed
> > at root level
> > SF [Bio::SeqFeature::Generic=HASH(0x86485d8)]: CDS; ENSG00000146556
> >
> > STACK Bio::SeqFeature::Tools::Unflattener::problem
> > /Bio/SeqFeature/Tools/Unflattener.pm:940
> > STACK Bio::SeqFeature::Tools::Unflattener::unflatten_group
> > /Bio/SeqFeature/Tools/Unflattener.pm:1983
> > STACK Bio::SeqFeature::Tools::Unflattener::unflatten_groups
> > /Bio/SeqFeature/Tools/Unflattener.pm:1744
> > STACK Bio::SeqFeature::Tools::Unflattener::unflatten_seq
> > /Bio/SeqFeature/Tools/Unflattener.pm:1449
> > STACK (eval) genbank2gff3.PLS:345
> > STACK main::unflatten_seq genbank2gff3.PLS:344
> > STACK toplevel genbank2gff3.PLS:209
> >
> > --------------------------------------
> >
> > Possible gene unflattening error withchromosome:NCBI35:1:1:994676:1: consult
> > STDERR
> >
> > Using bioperl-1.5.0.RC2 under Linux.
> >
> >     Would be grateful for the hint,
> >       Vladimir
> 

-----------------------------------------------------------------
Ewan Birney.  Work:  +44 1223 494420
             Email:  birney "at" ebi.ac.uk 
Clerical Assistant:  shelley "at" ebi.ac.uk
Please cc shelley for urgent or diary-dependent requests
-----------------------------------------------------------------

From birney at ebi.ac.uk  Tue Jan 18 04:16:29 2005
From: birney at ebi.ac.uk (Ewan Birney)
Date: Tue Jan 18 04:16:06 2005
Subject: [Bioperl-l] Problem with parsing ENSEMBL genbank flat file with
	genbank2gff3. pls
In-Reply-To: <Pine.OSX.4.58.0501171811240.6764@adsl-68-126-147-89.dsl.pltn13.pacbell.net>
Message-ID: <Pine.LNX.4.44.0501180909560.2722-100000@pigeon.ebi.ac.uk>

On Mon, 17 Jan 2005, Chris Mungall wrote:

> 
> It is a genbank formatted file - you can download it from the url Vladimir
> provides below.
> 
> There seem to be a few oddities to do with the ensembl-flavour genbank
> format which may be causing problems for the unflattener:
> 
> * There don't appear to be any 'gene' features - a gene model is just
> mRNAs and CDSs. This means the files don't even contain essential stuff
> like the gene symbol!

The symbols are on the mRNA and CDS (in fact most identifiers map to the
mRNA and CDS). Each mRNA and CDS has the ENSG identifier in there. We
could of course put in a Gene line as well, and I can flag this up to the
guys. We should do this as it is easy enough to do.


However Chris, as you imply, we don't consider our EMBL or GenBank flat
files somehow definitive - the Mart tool allows highly flexible
downloading of gene structure (GTF) and other things and if we do
implement a GFF3 dumper it is likely to be via the Mart tool again.


Underneath this, the database and the Perl and Java APIs allow nearly any
sort of information to be yanked out, and the database is internet
accessible directly at ensembldb.ensembl.org.


   --> I'll ask the guys here to put in a gene line - Chris - what 
precisely do you need in the format to tickle your unflattener right?

   --> GFF3 direct dumping is on the 2005 to-do list, but not at the top at
the moment.


> 
> * In the feature entry, for the reverse strand, ensembl nests the
> complement function inside the join function, listing sublocations in a
> 3'->5' direction. This is unusual, but not problematic in itself.
> However, I'm not 100% convinced that the bioperl genbank parser handles
> these cases correctly - I will expand on this in another email. It's not
> a problem for the vast majority of cases, but it will be problematic for
> certain rare situations where the sublocations are of mixed strand (eg
> trans-spliced genes).
> 
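To illustrate the point about location strings: for an ordinary
reverse-strand gene, complement(join(...)) and Ensembl's
join(complement(...),...) with the parts listed 3'->5' describe the same
segments in the same order; only mixed-strand joins (e.g. trans-splicing)
genuinely need the nested form. The toy parser below (hypothetical
segments/split_top helpers that handle only N..M, join and complement) is
not BioPerl's parser, which lives in Bio::Factory::FTLocationFactory.

```perl
use strict;
use warnings;

# Split "a,b(c,d),e" on commas that are not inside parentheses.
sub split_top {
    my ($s) = @_;
    my ($depth, $cur, @parts) = (0, '');
    for my $ch (split //, $s) {
        if    ($ch eq '(') { $depth++ }
        elsif ($ch eq ')') { $depth-- }
        if ($ch eq ',' && $depth == 0) { push @parts, $cur; $cur = ''; next }
        $cur .= $ch;
    }
    push @parts, $cur;
    return @parts;
}

# Toy parser: returns [start, end, strand] triples in transcript order.
sub segments {
    my ($loc, $strand) = @_;
    $strand //= 1;
    $loc =~ s/\s+//g;
    if ($loc =~ /^complement\((.*)\)$/) {
        # complement flips the strand and reverses the order of the parts
        return reverse segments($1, -$strand);
    }
    if ($loc =~ /^join\((.*)\)$/) {
        return map { segments($_, $strand) } split_top($1);
    }
    return [ $1, $2, $strand ] if $loc =~ /^(\d+)\.\.(\d+)$/;
    die "unsupported location: $loc\n";
}

my @genbank = segments('complement(join(100..200,300..400))');
my @ensembl = segments('join(complement(300..400),complement(100..200))');
print join(' ', map { sprintf '%d..%d(%+d)', @$_ } @genbank), "\n";
print join(' ', map { sprintf '%d..%d(%+d)', @$_ } @ensembl), "\n";
# both lines print: 300..400(-1) 100..200(-1)
```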
> I can implement a hack in the unflattener for the first problem. However,
> the question is - is it worth it? Without the gene feature the
> ensembl-flavoured genbank files seem not particularly useful (granted it
> is possible to get the gene data by integrating with LocusLink/EntrezGene
> but is it worth it?). I know for a fact that the data structures
> underlying ensembl are sound, so it seems counterproductive to use nothing
> but genbank/embl as a flat file distribution format (and to drop the gene
> features on top of that!). I know ensembl use GTF a lot internally, it
> would be great to see use made of this format (or even better, GFF3) for
> data distribution. Perhaps there's something I'm missing here.. I'll wait
> for comment from someone from ensembl before progressing here, to avoid
> any pointless work...
> 
> Cheers
> Chris
> 
> On Mon, 17 Jan 2005, Scott Cain wrote:
> 
> > Hi Vladimir,
> >
> > Not to ask a question on the level of "is it plugged in", but are you sure
> > it is a genbank formatted file?  I think you would get a different error
> > if it weren't, but I just wanted to make sure.
> >
> > Scott
> >
> > ----------------------------------------------------------------------
> > Scott Cain, Ph. D.				 	 cain@cshl.org
> > GMOD Coordinator, http://www.gmod.org/			 (216)392-3087
> > ----------------------------------------------------------------------
> >
> >
> >
> >
> 

-----------------------------------------------------------------
Ewan Birney.  Work:  +44 1223 494420
             Email:  birney "at" ebi.ac.uk 
Clerical Assistant:  shelley "at" ebi.ac.uk
Please cc shelley for urgent or diary-dependent requests
-----------------------------------------------------------------

From danielucgbioinfo at yahoo.com.br  Tue Jan 18 05:34:50 2005
From: danielucgbioinfo at yahoo.com.br (Danielucg Sousa)
Date: Tue Jan 18 05:31:09 2005
Subject: [Bioperl-l] My last email about Bio::Graphics::Panel, please HELP
Message-ID: <20050118103450.97318.qmail@web53503.mail.yahoo.com>

Hi,

I'm displaying a sequence in a browser, but I can't get the HTTP links
to work.
When I use: print $q->$map;
the output message is:
Undefined subroutine CGI::<map name="bgmap00001"
id="bgmap00001">
<area shape="rect" coords="10,0,490,11"
href="http://www.google.com.br" title="test 2"
alt="test 2" />
<area shape="rect" coords="329,0,650,11"
href="http://www.google.com.br" title="test 2"
alt="test 2" />
</map>

Please, what should I do?
I am using BioPerl 1.5 RC2.
Thanks for everything.

My little code :
#!/usr/bin/perl -wT

use strict;
use Bio::Graphics;
use Bio::Graphics::FeatureFile;
use Bio::SeqIO;
use Bio::SeqFeature::Generic;
use CGI qw/:standard/;
use CGI::Pretty;

my $wholeseq =
  Bio::SeqFeature::Generic->new(-start => 1, -end => 600);

my $q = CGI->new;

print $q->header('text/html');
print $q->start_html('A Vector Rendering');
print $q->h1('teste');

my $panel = Bio::Graphics::Panel->new(
    -length       => 1000,
    -width        => 800,
    -pad_left     => 10,
    -pad_right    => 10,
    -key_style    => 'none',
    -spacing      => -0.25,
    -box_subparts => 'true',
    -link         => 'http://www.google.com',
);

my $track = $panel->add_track($wholeseq,
    -glyph   => 'transcript2',
    -bgcolor => 'orange',
    -bump    => 0,
    -height  => 12,
    -title   => 'test 2',
    -link    => 'http://www.google.com.br',
);

my $feature = Bio::SeqFeature::Generic->new(
    -display_name => 'teste',
    -score        => 20,
    -start        => 400,
    -end          => 800,
    -url          => 'http://www.google.com',
);
$track->add_feature($feature);

# image_and_map() writes the PNG under -root/-url and returns the image
# URL, the HTML <map> markup and the map name.
my ($url, $map, $mapname) = $panel->image_and_map(
    -root => '/var/www/html',
    -url  => '/tmpimages',
    -link => 'http://www.google.com',
);

print $q->img({-src => $url, -usemap => "#$mapname"});
print $map;    # $map is HTML text; $q->$map would call a method named by it
# The PNG is already on disk, so $panel->png is not printed into the page.
$panel->finished;
print $q->end_html;    # end_html, not exit_html

exit;
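Why the error message looks so odd: in Perl, $obj->$name calls the method
whose name is stored in $name, so $q->$map asks CGI for a method literally
named '<map name=...'. The sketch below reproduces this with a tiny
hypothetical Page class instead of CGI (the exact error text differs,
since CGI autoloads its methods); printing the string itself is what was
intended.

```perl
use strict;
use warnings;

package Page;                       # hypothetical stand-in for CGI
sub new   { return bless {}, shift }
sub title { return 'A Vector Rendering' }

package main;

my $page   = Page->new;
my $method = 'title';
print $page->$method, "\n";         # calls title(): the name comes from $method

my $map = '<map name="bgmap00001"></map>';
my $ok  = eval { $page->$map; 1 };  # looks for a method named '<map ...>'
print $ok ? "called\n" : "failed: $@";
print $map, "\n";                   # what was intended: print the HTML itself
```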

Thank you very much,
Daniel Xavier


From sdavis2 at mail.nih.gov  Tue Jan 18 06:17:56 2005
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Tue Jan 18 06:15:54 2005
Subject: [Bioperl-l] Feature table comparison
In-Reply-To: <1106041017.41ecd8b96aa2b@webmail.salford.ac.uk>
References: <1105969185.41ebc021ee0a5@webmail.salford.ac.uk>
	<000801c4fd01$6caa6f00$7d75f345@WATSON>
	<1106041017.41ecd8b96aa2b@webmail.salford.ac.uk>
Message-ID: <9A5CF428-6942-11D9-B052-000D933565E8@mail.nih.gov>

Rob,

If you have files in EMBL format, you can use Bio::SeqIO to read them.  
What is in the EMBL files--protein or DNA?  Are the features named in a 
systematic manner (are the same genes called the same thing in both 
strains if they are present)?  If they are, can you simply do an ID 
matching between the two strains?  Judging from your email below, 
probably not.
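If the features are named consistently, the ID matching Sean suggests is a
plain set difference. A minimal sketch with a hypothetical unique_to helper
and made-up gene names; in practice the names might be collected by
reading each EMBL file with Bio::SeqIO (-format => 'embl') and calling
get_tag_values('gene') on each feature from get_SeqFeatures that
has_tag('gene').

```perl
use strict;
use warnings;

# Hypothetical helper: names in the first list that are absent from the
# second, each reported once, in first-list order.
sub unique_to {
    my ($mine, $theirs) = @_;
    my %seen = map { $_ => 1 } @$theirs;
    my %done;
    return grep { !$seen{$_} && !$done{$_}++ } @$mine;
}

# Illustrative gene names, not real annotation.
my @strain_a = qw(dnaA gyrB hlyA recA);
my @strain_b = qw(dnaA gyrB recA tetM);
print "only in A: @{[ unique_to(\@strain_a, \@strain_b) ]}\n";   # hlyA
print "only in B: @{[ unique_to(\@strain_b, \@strain_a) ]}\n";   # tetM
```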

If the question you are asking is truly the opposite of an alignment, 
then you will need to do more work.  This is beyond my usual 
bioinformatics realm, but I would imagine that you would need to align 
the two genomes first (and how you do this will greatly affect your 
results, I would suppose) and then look for what didn't align in each 
strain.  I'm sure others on the list have done this kind of thing 
before.  I'm just not sure what the state-of-the-art is for 
whole-genome alignments these days.

Sean

On Jan 18, 2005, at 4:36 AM, Robert Minshall wrote:

> I am basically trying to find the differences between 2 strains of
> bacteria in EMBL format. What I really need is an inverted ACT (Artemis
> Comparison Tool); diffseq from EMBOSS won't do what I need. I just need to
> somehow get a list of proteins that are on one strain and not the other.
> This can be done by hand but will take months. I was wondering if there
> was a program out there where I can input the 2 EMBL files and get a list
> of feature differences, or the opposite of an alignment.
> Thanks
> Rob
> --
> Robert J Minshall
> Postgraduate Researcher in Microbiology,
> Biosciences Research Institute,
> School of Environment and Life Sciences,
> Lab 209 Cockcroft Building,
> University of Salford,
> Salford,
> Greater Manchester.
> M5 4WT
> UK
> 0161 2952652
> r.j.mishall@pgr.salford.ac.uk
>
>
>
> Quoting Sean Davis <sdavis2@mail.nih.gov>:
>
>> Rob,
>>
>> You will probably need to be a bit more specific.  What constitutes a
>> "genome" in your email below?  What are the features?  In what form 
>> are you
>> getting the data?  Do you have a specific question you are trying to 
>> answer?
>>
>> Sean
>>
>> ----- Original Message -----
>> From: "Robert Minshall" <R.J.Minshall@pgr.salford.ac.uk>
>> To: <bioperl-l@bioperl.org>
>> Sent: Monday, January 17, 2005 8:39 AM
>> Subject: [Bioperl-l] Feature table comparison
>>
>>
>>>
>>> Hi, does anyone know of or have a script that can compare the feature
>>> tables of genomes and show what appears on one and not the other? I.e. I
>>> want to find the differences in the feature tables. Is this possible? I'm
>>> new to Perl and was hoping that someone could point me in the right
>>> direction. My email is
>>> r.j.minshall@pgr.salford.ac.uk
>>> thanks in advance
>>> Rob Minshall
>>>
>>> --
>>> Robert J Minshall
>>> Postgraduate Researcher in Microbiology,
>>> Biosciences Research Institute,
>>> School of Environment and Life Sciences,
>>> Lab 209 Cockcroft Building,
>>> University of Salford,
>>> Salford,
>>> Greater Manchester.
>>> M5 4WT
>>> UK
>>> 0161 2952652
>>> r.j.mishall@pgr.salford.ac.uk
>>>
>>>
>>>
>>>
>>>
>>>
>>> ----------------------------------------------------------------
>>> Concerns about content should be sent to abuse@salford.ac.uk
>>>
>>
>>
>>
>
>

From sdavis2 at mail.nih.gov  Tue Jan 18 06:43:54 2005
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Tue Jan 18 06:41:37 2005
Subject: [Bioperl-l] Feature table comparison
In-Reply-To: <1106047454.41ecf1de788e5@webmail.salford.ac.uk>
References: <1105969185.41ebc021ee0a5@webmail.salford.ac.uk>
	<000801c4fd01$6caa6f00$7d75f345@WATSON>
	<1106041017.41ecd8b96aa2b@webmail.salford.ac.uk>
	<9A5CF428-6942-11D9-B052-000D933565E8@mail.nih.gov>
	<1106047454.41ecf1de788e5@webmail.salford.ac.uk>
Message-ID: <3B416620-6946-11D9-B052-000D933565E8@mail.nih.gov>

Rob,

Perhaps others have done something similar.  In general, it helps to 
post back to the list so we both benefit from others' knowledge and 
others benefit from your thoughts on what is not a straightforward 
problem.

As for my two-cents worth, can't you just go through the alignments for 
each strain, sort them in genomic order of one strain, and determine 
the segments not aligning based on the end of one alignment and the 
beginning of the next?  Do the sort for the other strain to get the
same unaligned blocks for the other strain.  Then, move on to the next
pairing of strains and repeat.  That will give you the unaligned blocks
for each strain with respect to each other strain.  Then you can go
back to your feature table for each strain and look for overlaps
between the unaligned segments and the annotated features--for this
there are tools in bioperl.  See Bio::DB::GFF and possibly others.
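A minimal sketch of the gap-finding and overlap steps, assuming the
aligned blocks for one strain have already been parsed into [start, end]
pairs (1-based, inclusive). The helper names and coordinates are
hypothetical, and the overlap test is a plain linear scan rather than the
Bio::DB::GFF queries mentioned above.

```perl
use strict;
use warnings;

# Sort aligned blocks by start and report the stretches no block covers.
sub unaligned_gaps {
    my ($genome_len, @blocks) = @_;
    my @gaps;
    my $covered_to = 0;                  # rightmost aligned position so far
    for my $blk (sort { $a->[0] <=> $b->[0] } @blocks) {
        my ($start, $end) = @$blk;
        push @gaps, [ $covered_to + 1, $start - 1 ] if $start > $covered_to + 1;
        $covered_to = $end if $end > $covered_to;   # blocks may overlap
    }
    push @gaps, [ $covered_to + 1, $genome_len ] if $covered_to < $genome_len;
    return @gaps;
}

# Names of features [name, start, end] that overlap at least one gap.
sub features_in_gaps {
    my ($gaps, $features) = @_;
    my @hits;
    for my $f (@$features) {
        my ($name, $fs, $fe) = @$f;
        push @hits, $name if grep { $fs <= $_->[1] && $fe >= $_->[0] } @$gaps;
    }
    return @hits;
}

my @gaps = unaligned_gaps(10_000, [1, 4_000], [3_500, 6_000], [7_001, 10_000]);
print join(' ', map { "$_->[0]-$_->[1]" } @gaps), "\n";        # 6001-7000
my @hits = features_in_gaps(\@gaps,
    [ [ 'orfX', 6_200, 6_900 ], [ 'dnaA', 100, 1_500 ] ]);
print "@hits\n";                                               # orfX
```

Running it once per strain of each pair gives the unaligned blocks for
both sides, as described above.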

Sean

On Jan 18, 2005, at 6:24 AM, Robert Minshall wrote:

> Thanks for your help. So far I can align the DNA EMBL files with no
> problem and work out which bits are not aligned by hand, but I have 6
> strains to compare against each other, and the first alignment has taken
> me months to work out so far and I'm only halfway through it. All I wanted
> to do was find sections of DNA not on one strain but on another and work
> out what the proteins were. The feature table on most of the strains I
> have is not well annotated, and therefore I think that a feature table
> comparison is now not the correct way forward. I just want to separate out
> the sections of DNA that are "unique" to one particular strain from the
> others, get the protein, and see if it appears on other strains or not. It
> seems simple in my head but in practice it's not.
> --
> Robert J Minshall
> Postgraduate Researcher in Microbiology,
> Biosciences Research Institute,
> School of Environment and Life Sciences,
> Lab 209 Cockcroft Building,
> University of Salford,
> Salford,
> Greater Manchester.
> M5 4WT
> UK
> 0161 2952652
> r.j.mishall@pgr.salford.ac.uk
>
>
>
> Quoting Sean Davis <sdavis2@mail.nih.gov>:
>
>> Rob,
>>
>> If you have files in EMBL format, you can use Bio::SeqIO to read them.
>> What is in the EMBL files--protein or DNA?  Are the features named in a
>> systematic manner (are the same genes called the same thing in both
>> strains if they are present)?  If they are, can you simply do an ID
>> matching between the two strains?  Judging from your email below,
>> probably not.
>>
>> If the question you are asking is truly the opposite of an alignment,
>> then you will need to do more work.  This is beyond my usual
>> bioinformatics realm, but I would imagine that you would need to align
>> the two genomes first (and how you do this will greatly affect your
>> results, I would suppose) and then look for what didn't align in each
>> strain.  I'm sure others on the list have done this kind of thing
>> before.  I'm just not sure what the state-of-the-art is for
>> whole-genome alignments these days.
>>
>> Sean
>>
>> On Jan 18, 2005, at 4:36 AM, Robert Minshall wrote:
>>
>>> I am basically trying to find the differences between 2 strains of
>>> bacteria in EMBL format. What I really need is an inverted ACT (Artemis
>>> Comparison Tool); diffseq from EMBOSS won't do what I need. I just need
>>> to somehow get a list of proteins that are on one strain and not the
>>> other. This can be done by hand but will take months. I was wondering
>>> if there was a program out there where I can input the 2 EMBL files and
>>> get a list of feature differences, or the opposite of an alignment.
>>> Thanks
>>> Rob
>>>
>>>
>>>
>>> Quoting Sean Davis <sdavis2@mail.nih.gov>:
>>>
>>>> Rob,
>>>>
>>>> You will probably need to be a bit more specific.  What constitutes a
>>>> "genome" in your email below?  What are the features?  In what form
>>>> are you getting the data?  Do you have a specific question you are
>>>> trying to answer?
>>>>
>>>> Sean
>>>>
>>>> ----- Original Message -----
>>>> From: "Robert Minshall" <R.J.Minshall@pgr.salford.ac.uk>
>>>> To: <bioperl-l@bioperl.org>
>>>> Sent: Monday, January 17, 2005 8:39 AM
>>>> Subject: [Bioperl-l] Feature table comparison
>>>>
>>>>
>>>>>
>>>>> Hi, does anyone know of or have a script that can compare the
>>>>> feature tables of genomes and show what appears on one and not the
>>>>> other? I.e. I want to find the differences in the feature tables. Is
>>>>> this possible? I'm new to Perl and was hoping that someone could
>>>>> point me in the right direction. My email is
>>>>> r.j.minshall@pgr.salford.ac.uk
>>>>> Thanks in advance
>>>>> Rob Minshall
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> ----------------------------------------------------------------
>>>>> Concerns about content should be sent to abuse@salford.ac.uk
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l@portal.open-bio.org
>>>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>
>
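Sean's ID-matching suggestion in the quoted exchange applies when genes are named consistently in both strains. In that case the comparison reduces to a set difference. This is a minimal core-Perl sketch, assuming the names have already been collected per strain (for example from /gene qualifiers while reading each EMBL file with Bio::SeqIO). The strain labels and gene names are made-up examples:

```perl
use strict;
use warnings;

# Hypothetical gene names per strain, already extracted from each EMBL file.
my %strain_ids = (
    strain1 => [qw(dnaA gyrB recA ompF)],
    strain2 => [qw(dnaA gyrB ompC)],
);

# Names present in strain $x but absent from strain $y (order kept from $x).
sub only_in {
    my ($ids, $x, $y) = @_;
    my %in_y = map { $_ => 1 } @{ $ids->{$y} };
    return grep { !$in_y{$_} } @{ $ids->{$x} };
}

# Compare every ordered pair of strains.
for my $x (sort keys %strain_ids) {
    for my $y (sort keys %strain_ids) {
        next if $x eq $y;
        print "in $x but not $y: ",
            join(', ', only_in(\%strain_ids, $x, $y)), "\n";
    }
}
```

This only works as well as the naming does. As Sean notes, poorly annotated feature tables are exactly where it breaks down.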

From palmeida at igc.gulbenkian.pt  Tue Jan 18 07:01:21 2005
From: palmeida at igc.gulbenkian.pt (palmeida@igc.gulbenkian.pt)
Date: Tue Jan 18 06:56:45 2005
Subject: [Bioperl-l] My last email about Bio::Graphics::Panel, please HELP
In-Reply-To: <20050118103450.97318.qmail@web53503.mail.yahoo.com>
References: <20050118103450.97318.qmail@web53503.mail.yahoo.com>
Message-ID: <20050118120121.GD5318@bioinf.igc.gulbenkian.pt>

Hi,

Have you tried: print $map;

You are using it as if $map were a subroutine of CGI, but you just want
to print whatever is in the variable $map.
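Paulo's point can be reproduced without CGI or Bio::Graphics. In Perl, `$obj->$name` calls the method whose name is the string stored in `$name`. The core-Perl sketch below uses a hypothetical stand-in class (`Page`) in place of CGI. With real CGI.pm the error text differs (the quoted output shows CGI reporting an undefined subroutine), but the cause is the same:

```perl
use strict;
use warnings;

# Hypothetical stand-in for CGI: an object with an HTML-generating method.
package Page;
sub new { my $class = shift; return bless {}, $class }
sub h1  { my ($self, $text) = @_; return "<h1>$text</h1>" }

package main;

my $q   = Page->new;
# An HTML string, like the map returned by image_and_map() in the question.
my $map = '<map name="bgmap00001"></map>';

# $q->h1(...) calls the method named h1.  $q->$map instead uses the *contents*
# of $map as the method name, so Perl looks for a method literally called
# '<map name="bgmap00001"></map>' and dies.
my $failed = !eval { $q->$map; 1 };
my $error  = $@;
print "\$q->\$map failed: $error" if $failed;

# The fix: the map is already HTML, so print the string itself.
print $q->h1('test'), "\n", $map, "\n";
```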

-Paulo

On Tue, Jan 18, 2005 at 07:34:50AM -0300, Danielucg Sousa wrote:
> Hi,
> 
> I'm showing a sequence in the browser, but I can't get the HTTP
> links to work.
> When I use: print $q->$map;
> The output message is: 
> Undefined subroutine CGI::<map name="bgmap00001"
> id="bgmap00001">
> <area shape="rect" coords="10,0,490,11"
> href="http://www.google.com.br" title="test 2"
> alt="test 2" />
> <area shape="rect" coords="329,0,650,11"
> href="http://www.google.com.br" title="test 2"
> alt="test 2" />
> </map>
> 
> Please, what should I do?
> I am using Bioperl 1.5 RC 2.
> Thanks for everything.
> 
> My little code :
> #!/usr/bin/perl -wT
> 
> use strict;
> use Bio::Graphics;
> use Bio::Graphics::FeatureFile;
> use Bio::SeqIO;
> use Bio::SeqFeature::Generic;
> use CGI  qw / :standard /;
> use CGI::Pretty;
> 
> my $wholeseq =
> Bio::SeqFeature::Generic->new(-start=>1,-end=>600);
> 
> my $q = new CGI;
> 
> print $q->header('text/html');
> print $q->start_html('A Vector Rendering ');
> 
> print $q->h1('teste');
> my $panel = Bio::Graphics::Panel->new(-length  => 
> 1000, -width  => 800, -pad_left     => 10,  -pad_right
>    => 10,  -key_style =>'none', -spacing => -0.25, 
> -box_subparts => 'true',-link =>
> "http://www.google.com");
> 
> my $track =  $panel->add_track($wholeseq,  -glyph  =>
> 'transcript2', -bgcolor =>'orange', -bump   => 0,
> -height =>12,-title=>'test 2', -link
> =>'http://www.google.com.br' );
> 
> my $feature =
> Bio::SeqFeature::Generic->new(-display_name=>'teste',
> -score=>20, -start=>400, -end=>800,
> -url=>'http://www.google.com' );
>  $track -> add_feature($feature);
>       
>  my ($url,$map,$mapname) = $panel->image_and_map(-root
> => '/var/www/html',-url => '/tmpimages', -link =>
> "http://www.google.com" );
>  
> print $q->img({-src=>$url,-usemap=>"#$mapname", -link
> => "http://www.google.com" });
> print $q->$map;
> print $q->($panel->png);
> $panel->finished;
> print $q->exit_html;
> 
> exit;
> 
> Thank you very much,
> Daniel Xavier

-- 
Paulo Almeida
Instituto Gulbenkian de Ciencia
Apartado 14, 2781-901, Oeiras, PORTUGAL
tel  +351 21 446 46 35
fax  +351 21 440 79 70
http://www.igc.gulbenkian.pt
From avilella at ebi.ac.uk  Tue Jan 18 10:57:44 2005
From: avilella at ebi.ac.uk (Albert Vilella)
Date: Tue Jan 18 10:53:56 2005
Subject: [Bioperl-l] negative and decimal values in Bio::Graphics xyplot
Message-ID: <1106063864.5345.33.camel@magneto>

Hi,

I was uploading an xyplot file for HapMap's GBrowse browser that
contains negative decimal numbers. From what I saw, there seems to be a
problem with the plotting of negative values.

I assume that decimal values are allowed, as I don't see any problem in
Bio::Graphics::Glyph::xyplot. This would make it feasible to plot things
like:

----------------
[expression]
glyph = xyplot
graph_type=boxes
fgcolor = black
bgcolor = darkslateblue
height=100
min_score = 0.000001
max_score = 0.001
label=1
key=variscan_MRA_plots_for_genotypes_chr1_YRI.w100000
reference=chr1

##mra_levels_1-9
expression      mra_levels_1-9_YRI      1750001..1850000        0.000109
expression      mra_levels_1-9_YRI      1850001..1950000        0.000003
expression      mra_levels_1-9_YRI      1950001..2050000        0.000022
expression      mra_levels_1-9_YRI      2050001..2150000        0.000053
[...]
----------------

But negative values seem to be a problem, either in xyplot or, less
likely, in some other step in the process of importing the data into GBrowse.
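To make the issue concrete: a plot that accepts negative scores has to draw its boxes from a baseline at zero, or at whichever end of the configured range is nearest zero, rather than from the bottom of the track. This is a minimal core-Perl sketch of that linear score-to-pixel mapping, assuming min_score/max_score define the vertical range. It is not the xyplot glyph's code, and both function names are hypothetical:

```perl
use strict;
use warnings;

# Map a score in [$min, $max] to a y pixel in a track $height pixels tall,
# with y = 0 at the top of the track (as in most image libraries).
sub score_to_y {
    my ($score, $min, $max, $height) = @_;
    $score = $min if $score < $min;    # clamp to the configured range
    $score = $max if $score > $max;
    my $frac = ($score - $min) / ($max - $min);
    return int($height - $frac * $height + 0.5);    # round to nearest pixel
}

# A box runs from the baseline to the score.  The baseline is zero if the
# range spans zero, otherwise the end of the range closest to zero.
# Returns (top_y, bottom_y).
sub box_span {
    my ($score, $min, $max, $height) = @_;
    my $base = ($min < 0 && $max > 0) ? 0 : ($min >= 0 ? $min : $max);
    my $y_base  = score_to_y($base,  $min, $max, $height);
    my $y_score = score_to_y($score, $min, $max, $height);
    return $y_base < $y_score ? ($y_base, $y_score) : ($y_score, $y_base);
}

for my $score (0.5, -0.5) {
    my ($top, $bottom) = box_span($score, -1, 1, 100);
    print "score $score: box from y=$top to y=$bottom\n";
}
```

With a range of -1 to 1 in a 100-pixel track, the positive score gets a box above the zero line (y=25 to 50) and the negative score a box below it (y=50 to 75). The sketch does not guard against min_score equal to max_score.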

Any hint?

Thanks,

    Albert.

-- 
Albert Vilella Bertran    avilella_at_ub_edu
--------------------------------------------
Departament de Genetica
Universitat de Barcelona
Diagonal 645 08028, Barcelona
Tel: +34 934035306 Fax: +34 934034420
--------------------------------------------
avilella_at_ebi_ac_uk
EMBL Outstation, European Bioinformatics Institute
Wellcome Trust Genome Campus, Hinxton
Cambs. CB10 1SD, United Kingdom
--------------------------------------------------
From cjm at fruitfly.org  Tue Jan 18 11:20:15 2005
From: cjm at fruitfly.org (Chris Mungall)
Date: Tue Jan 18 11:18:19 2005
Subject: [Bioperl-l] Problem with parsing ENSEMBL genbank flat file with
	genbank2gff3. pls
In-Reply-To: <Pine.LNX.4.44.0501180909560.2722-100000@pigeon.ebi.ac.uk>
References: <Pine.LNX.4.44.0501180909560.2722-100000@pigeon.ebi.ac.uk>
Message-ID: <Pine.OSX.4.58.0501180813530.6764@adsl-68-126-147-89.dsl.pltn13.pacbell.net>


OK, so it looks like EnsMart may solve Vladimir's problem by bypassing the
genbank-format files altogether.

Ewan - it'd be nice to see the GFF/GTFs appear in the main ftp download
area too at some point, as well as via dynamic EnsMart download. As far as
tweaking the ensembl genbank output goes, I think adding a feature of
type 'gene', with a single location covering the maximal extent of all
mRNAs, as is fairly standard in genbank-format files, should do
it.
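The suggested 'gene' feature is just the span covering all of a gene's mRNAs. This is a minimal core-Perl sketch, assuming each mRNA's exons are already available as start/end pairs and that all mRNAs of one gene share a strand. The input structure, coordinates and function name are hypothetical; this is neither Ensembl's dumper nor a BioPerl API:

```perl
use strict;
use warnings;

# Hypothetical input: each mRNA is a strand plus a list of [start, end] exons.
my @mrnas = (
    { strand => -1, exons => [ [ 5000, 5200 ], [ 4000, 4100 ] ] },
    { strand => -1, exons => [ [ 5300, 5400 ], [ 3900, 4100 ] ] },
);

# Return a GenBank-style location string spanning every exon of every mRNA.
sub gene_location {
    my @m = @_;
    my ($min, $max);
    for my $mrna (@m) {
        for my $exon (@{ $mrna->{exons} }) {
            my ($s, $e) = sort { $a <=> $b } @$exon;
            $min = $s if !defined $min || $s < $min;
            $max = $e if !defined $max || $e > $max;
        }
    }
    my $span = "$min..$max";
    # Assumes every mRNA of the gene is on the same strand as the first one.
    return $m[0]{strand} < 0 ? "complement($span)" : $span;
}

# Feature key from column 6, location from column 22 (feature-table layout).
print "     gene            ", gene_location(@mrnas), "\n";
```

For the made-up mRNAs above, this gives `complement(3900..5400)`.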

Cheers
Chris

On Tue, 18 Jan 2005, Ewan Birney wrote:

> On Mon, 17 Jan 2005, Chris Mungall wrote:
>
> >
> > It is a genbank formatted file - you can download it from the url Vladmir
> > provides below.
> >
> > There seem to be a few oddities to do with the ensembl-flavour genbank
> > format which may be causing problems for the unflattener:
> >
> > * There doesn't appear to be any 'gene' features - a gene model is just
> > mRNAs and CDSs. This means the files don't even contain essential stuff
> > like the gene symbol!
>
> The symbols are on the mRNA and CDS (in fact most identifiers map to the
> mRNA and CDS). Each mRNA and CDS has the ENSG identifier in there. We
> could of course put in a Gene line as well, and I can flag this up to the
> guys. We should do this as it is easy enough to do.
>
>
> However Chris, as you imply, we don't consider our EMBL or GenBank flat
> files somehow definitive - the Mart tool allows highly flexible
> downloading of gene structure (GTF) and other things and if we do
> implement a GFF3 dumper it is likely to be via the Mart tool again.
>
>
> Underneath this the database and Perl and Java API allows nearly any sort
> of information to be yanked out, and the database is internet accessible
> directly at ensembldb.ensembl.org.
>
>
>    --> I'll ask the guys here to put in a gene line - Chris - what
> precisely do you need in the format to tickle your unflattener right?
>
>    --> GFF3 direct dumping is in 2005 todo list, but not at the top at the
> moment.
>
>
>
>
> >
> > * In the feature entry, for the reverse strand, ensembl nests the
> > complement function inside the join function, listing sublocations in a
> > 3'->5' direction. This is unusual, but not problematic in itself.
> > However, I'm not 100% convinced that the bioperl genbank parser handles
> > these cases correctly - I will expand on this in another email. It's not
> > a problem for the vast majority of cases, but it will be problematic for
> > certain rare situations where the sublocations are of mixed strand (eg
> > trans-spliced genes).
> >
> > I can implement a hack in the unflattener for the first problem. However,
> > the question is - is it worth it? Without the gene feature the
> > ensembl-flavoured genbank files seem not particularly useful (granted it
> > is possible to get the gene data by integrating with LocusLink/EntrezGene
> > but is it worth it?). I know for a fact that the data structures
> > underlying ensembl are sound, so it seems counterproductive to use nothing
> > but genbank/embl as a flat file distribution format (and to drop the gene
> > features on top of that!). I know ensembl use GTF a lot internally, it
> > would be great to see use made of this format (or even better, GFF3) for
> > data distribution. Perhaps there's something I'm missing here.. I'll wait
> > for comment from someone from ensembl before progressing here, to avoid
> > any pointless work...
> >
> > Cheers
> > Chris
> >
> > On Mon, 17 Jan 2005, Scott Cain wrote:
> >
> > > Hi Vladimir,
> > >
> > > Not to ask a question on the level of "is it plugged in", but are you sure
> > > it is a genbank formatted file?  I think you would get a different error
> > > if it weren't, but I just wanted to make sure.
> > >
> > > Scott
> > >
> > > ----------------------------------------------------------------------
> > > Scott Cain, Ph. D.				 	 cain@cshl.org
> > > GMOD Coordinator, http://www.gmod.org/			 (216)392-3087
> > > ----------------------------------------------------------------------
> > >
> > >
> > > On Mon, 17 Jan 2005, Chris Mungall wrote:
> > >
> > > >
> > > > Hi Vladimir
> > > >
> > > > The genbank2gff3 script, in scripts/Bio-DB-GFF, is attempting to recover
> > > > information which the genbank flat file format often loses; this is the
> > > > information about which mRNA relates to which CDS. You may or may not need
> > > > this information, it depends why you are doing the conversion. If you
> > > > don't need this, you may want just a straightforward genbank->gff
> > > > conversion. Let me know if this is what you want to do and I can help with
> > > > that.
> > > >
> > > > If you _do_ wish to preserve the mRNA to CDS mappings, be aware that it
> > > > isn't always possible to recover these with 100% fidelity from the genbank
> > > > flat files. You may wish to pursue alternate approaches, such as
> > > > downloading ensembl as a mysql dump (any ensembl folks around.. any plans
> > > > to offer downloads in alternate formats such as gff3? This would be
> > > > fantastic)
> > > >
> > > > If you'd prefer to carry on via the genbank flat file route, here's what
> > > > you should do:
> > > >
> > > > * get the latest version of genbank2gff3.PLS I have just checked into cvs
> > > > (I can send you a copy if you are using a bioperl release and not cvs)
> > > >
> > > > * run the script with the "--ethresh 3" option. This will raise the error
> > > > severity threshold at which problems with genbank file become
> > > > showstoppers.
> > > >
> > > > In addition, I will take a look at this particular file and see what it is
> > > > that is causing problems and get back to you.
> > > >
> > > > Cheers
> > > > Chris
> > > >
> > > > On Mon, 17 Jan 2005, Babenko, Vladimir (NIH/NLM/NCBI) wrote:
> > > >
> > > > >     Greetings,
> > > > > While parsing a genbank file taken from:
> > > > > ftp://ftp.ensembl.org/pub/current_human/data/flatfiles/genbank/Homo_sapiens.0.dat
> > > > > as of Jan 2005,
> > > > > I'm getting the following unflattening error:
> > > > > --------------------------------------------------------
> > > > > Processing file /ENSEMBL/Homo_sapiens.0.dat...
> > > > > working on contig
> > > > > chromosome:NCBI35:1:1:994676:1...chromosome:NCBI35:1:1:994676:1 Unflattening
> > > > > error:
> > > > > Details:
> > > > > ------------- EXCEPTION  -------------
> > > > > MSG: PROBLEM, SEVERITY==2
> > > > > no containers possible for SeqFeature of type: CDS; this SF is being placed
> > > > > at root level
> > > > > SF [Bio::SeqFeature::Generic=HASH(0x86485d8)]: CDS; ENSG00000146556
> > > > >
> > > > > STACK Bio::SeqFeature::Tools::Unflattener::problem
> > > > > /Bio/SeqFeature/Tools/Unflattener.pm:940
> > > > > STACK Bio::SeqFeature::Tools::Unflattener::unflatten_group
> > > > > /Bio/SeqFeature/Tools/Unflattener.pm:1983
> > > > > STACK Bio::SeqFeature::Tools::Unflattener::unflatten_groups
> > > > > /Bio/SeqFeature/Tools/Unflattener.pm:1744
> > > > > STACK Bio::SeqFeature::Tools::Unflattener::unflatten_seq
> > > > > /Bio/SeqFeature/Tools/Unflattener.pm:1449
> > > > > STACK (eval) genbank2gff3.PLS:345
> > > > > STACK main::unflatten_seq genbank2gff3.PLS:344
> > > > > STACK toplevel genbank2gff3.PLS:209
> > > > >
> > > > > --------------------------------------
> > > > >
> > > > > Possible gene unflattening error withchromosome:NCBI35:1:1:994676:1: consult
> > > > > STDERR
> > > > >
> > > > > Using bioperl-1.5.0.RC2 under Linux.
> > > > >
> > > > >     Would be grateful for the hint,
> > > > >       Vladimir
> > > > >
> > > >
> > >
> > >
> >
>
> -----------------------------------------------------------------
> Ewan Birney.  Work:  +44 1223 494420
>              Email:  birney "at" ebi.ac.uk
> Clerical Assistant:  shelley "at" ebi.ac.uk
> Please cc shelley for urgent or diary-dependent requests
> -----------------------------------------------------------------
>
>
From birney at ebi.ac.uk  Tue Jan 18 11:27:40 2005
From: birney at ebi.ac.uk (Ewan Birney)
Date: Tue Jan 18 11:23:53 2005
Subject: [Bioperl-l] Problem with parsing ENSEMBL genbank flat file with
	genbank2gff3. pls
In-Reply-To: <Pine.OSX.4.58.0501180813530.6764@adsl-68-126-147-89.dsl.pltn13.pacbell.net>
Message-ID: <Pine.LNX.4.44.0501181623500.2722-100000@pigeon.ebi.ac.uk>

On Tue, 18 Jan 2005, Chris Mungall wrote:

> 
> OK, so it looks like EnsMart may solve Vladimir's problem by bypassing the
> genbank-format files altogether
> 
> Ewan - it'd be nice to see the GFF/GTFs appear in the main ftp download
> area too at some point, as well as via dynamic EnsMart download. As far as
> tweaking the ensembl genbank output, I think the addition of a feature of
> type 'gene', with a single location covering the maximal extent of all
> mRNAs, as is fairly-standard with genbank-format files - that should do
> it.

We can't put the GTF files on the ftp site as a matter of general
principle - too many people ask "why can't you just put XXXX on the
ftp site" - and then we will run out of disk space too fast. Mart is a far
more scalable solution. We can't just keep putting every possible format
combination on our ftp site - it won't scale. (Admittedly GTF files won't
dent our disk space much, but you get the idea.)

Mart should work well for people and we have command line tools to address 
mart as well as the web form. Vladimir - does this work for you?


I've cc'd in Arne (software lead) and Glenn (release coordinator for 
March) - guys - we need to add a "gene" line in our EMBL and GenBank 
dumper so it plays better with parsing scripts out there. Chris - just for 
the avoidance of us screwing up could you give a concrete example of the 
right sort of gene line?


From MEC at Stowers-Institute.org  Tue Jan 18 11:42:12 2005
From: MEC at Stowers-Institute.org (Cook, Malcolm)
Date: Tue Jan 18 11:38:55 2005
Subject: [Bioperl-l] Feature table comparison
Message-ID: <200501181638.j0IGcpKr027393@portal.open-bio.org>

You might find gffintersect.pl from
http://www.sanger.ac.uk/Software/formats/GFF/ relevant; it is
described as:

	gffintersect.pl - efficiently finds the intersection (or
	exclusion) of two GFF streams, reporting intersection information in
	the Group field. Definition of "intersection" allows for
	near-neighbours and minimum-overlap
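gffintersect.pl's own code is not reproduced here. As a rough illustration of what "exclusion" of two GFF streams means, here is a minimal core-Perl sketch. It keeps features from one stream that overlap no feature in the other by at least a minimum number of bases. The function names and GFF lines are hypothetical. The sketch uses a plain quadratic scan and ignores near-neighbours. It is only meaningful when both streams use the same sequence coordinates, for example strain-A features against aligned regions projected onto strain A:

```perl
use strict;
use warnings;

# Parse a GFF line into a hash; returns an empty list for comments/blank lines.
sub parse_gff {
    my ($line) = @_;
    return if $line =~ /^\s*(?:#|$)/;
    my @f = split /\t/, $line;
    return { seqid => $f[0], start => $f[3], end => $f[4], line => $line };
}

# Number of shared bases between two features (<= 0 means no overlap).
sub overlap_len {
    my ($x, $y) = @_;
    my $lo = $x->{start} > $y->{start} ? $x->{start} : $y->{start};
    my $hi = $x->{end}   < $y->{end}   ? $x->{end}   : $y->{end};
    return $hi - $lo + 1;
}

# Features in @$query that overlap no feature in @$subject (same seqid)
# by at least $min_overlap bases (default 1).
sub exclusion {
    my ($query, $subject, $min_overlap) = @_;
    $min_overlap ||= 1;
    my @kept;
    for my $q (@$query) {
        my $hits = grep {
            $_->{seqid} eq $q->{seqid} && overlap_len($q, $_) >= $min_overlap
        } @$subject;
        push @kept, $q unless $hits;
    }
    return @kept;
}

my @strain_a = map { parse_gff($_) } (
    "chr\tEMBL\tCDS\t100\t200\t.\t+\t.\tID=geneA",
    "chr\tEMBL\tCDS\t500\t600\t.\t+\t.\tID=geneB",
);
my @strain_b = map { parse_gff($_) } (
    "# comment line",
    "chr\tEMBL\tCDS\t150\t250\t.\t+\t.\tID=geneX",
);
print "only in A: $_->{line}\n" for exclusion(\@strain_a, \@strain_b, 1);
```

Raising the minimum overlap changes the answer. geneA shares only 51 bases with geneX, so with a threshold of 60 both strain-A features are reported.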


--Malcolm Cook


>-----Original Message-----
>From: bioperl-l-bounces@portal.open-bio.org 
>[mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of 
>Robert Minshall
>Sent: Monday, January 17, 2005 7:40 AM
>To: bioperl-l@bioperl.org
>Subject: [Bioperl-l] Feature table comparison
>
>
>
>Hi, does anyone know of or have a script that can compare the
>feature tables of genomes and show what appears on one and not the
>other? I.e. I want to find the differences in the feature tables. Is
>this possible? I'm new to Perl and was hoping that someone could
>point me in the right direction. My email is
>r.j.minshall@pgr.salford.ac.uk
>thanks in advance
>Rob Minshall
>
>

From rob at salmonella.org  Tue Jan 18 13:54:21 2005
From: rob at salmonella.org (Rob Edwards)
Date: Tue Jan 18 13:50:31 2005
Subject: [Bioperl-l] Re: GFF3
In-Reply-To: <Pine.LNX.4.58.0501180053380.21413@sumo.ctrl.ucla.edu>
References: <Pine.GSO.4.05.10501171203280.26102-100000@phage.cshl.edu>
	<Pine.LNX.4.58.0501172129280.19385@sumo.ctrl.ucla.edu>
	<64201ADE-692D-11D9-9265-000A959E1622@salmonella.org>
	<Pine.LNX.4.58.0501180053380.21413@sumo.ctrl.ucla.edu>
Message-ID: <5D5654B8-6982-11D9-9265-000A959E1622@salmonella.org>

I checked out bioperl-live again, and got the same errors. I just fixed 
the file and checked it back in. All tests pass for me.

Rob


On Jan 18, 2005, at 12:54 AM, Allen Day wrote:

>> The first series of errors die because the feature ID=AB000114 in
>> t/data/knownGene.gff3 has several Dbxrefs separated with ';' instead
>> of ','.
>
> I'm not getting these errors; are you in sync with CVS HEAD?
>
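For context on the quoted Dbxref problem: in GFF3's ninth column, ';' separates tag=value pairs and ',' separates multiple values of one tag. A ';' between Dbxrefs therefore splits the list, and the fragment after it is no longer a valid tag=value pair. This is a minimal core-Perl sketch of that rule. The function name is hypothetical, it does not decode percent-escapes, and it is not BioPerl's GFF3 parser. The Dbxref values are illustrative only:

```perl
use strict;
use warnings;

# Split a GFF3 ninth column into tag => [values].
# ';' separates tag=value pairs; ',' separates multiple values of one tag.
sub parse_attributes {
    my ($col9) = @_;
    my %attr;
    for my $pair (split /;/, $col9) {
        my ($tag, $value) = split /=/, $pair, 2;
        unless (defined $value) {
            warn "malformed attribute '$pair' (no '=')\n";
            next;
        }
        push @{ $attr{$tag} }, split /,/, $value;
    }
    return \%attr;
}

# Illustrative values only.
my $ok  = parse_attributes('ID=AB000114;Dbxref=EMBL:AB000114,GI:1234');
my $bad = parse_attributes('ID=AB000114;Dbxref=EMBL:AB000114;GI:1234');
print "comma-separated:     ", scalar @{ $ok->{Dbxref} },  " Dbxref values\n";
print "semicolon-separated: ", scalar @{ $bad->{Dbxref} }, " Dbxref value\n";
```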

From babenko at ncbi.nlm.nih.gov  Tue Jan 18 14:20:11 2005
From: babenko at ncbi.nlm.nih.gov (Babenko, Vladimir (NIH/NLM/NCBI))
Date: Tue Jan 18 14:17:23 2005
Subject: [Bioperl-l] Problem with parsing ENSEMBL genbank flat file wi
	th genbank2gff3. pls
Message-ID: <69BA0F938FAC6A4CBEF49461720696F2079659C2@nihexchange16.nih.gov>

    Sorry Ewan, 
I now understand that checking "multiple transcripts" means genes with
at least 2 transcripts.
     The Mart is amazing.
    Regards,
	Vladimir

>-----Original Message-----
>From: Ewan Birney [mailto:birney@ebi.ac.uk] 
>Sent: Tuesday, January 18, 2005 1:35 PM
>To: Babenko, Vladimir (NIH/NLM/NCBI)
>Cc: 'cjm@fruitfly.org'; 'cain@cshl.edu'; 'jason.stajich@duke.edu'
>Subject: RE: [Bioperl-l] Problem with parsing ENSEMBL genbank 
>flat file wi th genbank2gff3. pls
>
>On Tue, 18 Jan 2005, Babenko, Vladimir (NIH/NLM/NCBI) wrote:
>
>>        Hi all,
>> Thank you for your prompt response:
>> I believe that gff* is one of the ways to manage data, so this is the
>> reason I'm up to this.
>>      I played around for a while with all your propositions, so here
>> is a short response:
>> 1) Chris - I checked the script, it works fine, thank you. I'm
>> currently exploring this option.
>> The point is that I do need to have both mRNA and CDS linked to check
>> for UTRs and introns, and it looks like genbank2gff3 works fine here.
>> 2) EnsMart - this is a great proposition. I haven't come to the end of
>> the investigation of this sound product, but it is my sneaky suspicion
>> that I need some kind of mysql dump to manage the stuff by myself in
>> the same way bioperl does, but again it's a compromise between
>> complexity and simplicity that I cannot fully embrace for a while.
>> Still, the option of species comparison may annihilate my suspicions
>> momentarily if I am able to manage it.
>> BTW, Ewan, 'unchecking' for the entire genome and setting multiple
>> transcripts for human (all other options unchecked) yields, after the
>> Filters stage:
>> 7185 Entries pass Filters - that looks a bit few for human. Probably I
>> missed something, sorry.
>> miss out something, sorry.
>
>You want to uncheck the "multiple transcripts" (this means 
>genes with more than one transcript: you want all genes).
>
>
>
From barry.moore at genetics.utah.edu  Tue Jan 18 14:21:47 2005
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Tue Jan 18 14:18:21 2005
Subject: [Bioperl-l] bioperl
In-Reply-To: <20050116062005.37768.qmail@web30103.mail.mud.yahoo.com>
References: <20050116062005.37768.qmail@web30103.mail.mud.yahoo.com>
Message-ID: <41ED61CB.7010700@genetics.utah.edu>

Robin-

Have you checked out the BioJava project?  http://www.biojava.org/.  
Yes, the RichSeq objects used by bioperl contain the information from the 
GenBank features table.  Bio::SeqIO understands a variety of XML formats.

Barry

Robin XML wrote:

>Dear Sir,
>I am a beginner in bioinformatics. I am excited
>by your fantastic bioperl functions, but some questions
>confuse me:
>1. Is it possible to call bioperl functions from Java
>under Windows? I need a GUI and need Java to
>handle XML template modification.
>2. Is it correct that with Bio::DB::GenBank and
>Bio::SeqIO, I can get full GenBank data in XML format?
>Does that include the features part?
>
>
>Thank you!!!!!!
>
>Best regards,
>Robin
>
>

-- 
Barry Moore
Dept. of Human Genetics
University of Utah
Salt Lake City, UT


From talcon at iastate.edu  Tue Jan 18 17:20:05 2005
From: talcon at iastate.edu (Tim Alcon)
Date: Tue Jan 18 17:16:12 2005
Subject: [Bioperl-l] accessing GenBank
Message-ID: <41ED8B95.8020506@iastate.edu>

I seem unable to access GenBank.  When running bptutorial.exe, it seems 
like all the other examples run fine except that one.  Anyone know why 
that would be?  I'm using ActivePerl on Windows XP.  I have whichever 
version of bioperl is the current default using ppm (it's at least 
1.0).   When I run the exact same code from my campus Unix account, it 
works fine.

Tim

From barry.moore at genetics.utah.edu  Tue Jan 18 18:19:08 2005
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Tue Jan 18 18:15:38 2005
Subject: [Bioperl-l] accessing GenBank
In-Reply-To: <41ED8B95.8020506@iastate.edu>
References: <41ED8B95.8020506@iastate.edu>
Message-ID: <41ED996C.8000301@genetics.utah.edu>

Tim,

If you just typed install bioperl at the ppm prompt you may well have 
1.2.x.  That doesn't necessarily explain why your tutorial script 
doesn't work, but it might.  You probably want to install at least 
bioperl 1.4 (1.5 is on the way soon).  Try the following script as 
another way to check if you've got bioperl working on your windows machine.

Barry

#!/usr/bin/perl

# A short script to demonstrate how to download sequences from GenBank and
# access the sequence and some associated annotations using Bioperl.

use strict;
use warnings;
use Bio::SeqIO;
use Bio::DB::GenBank;   # use Bio::DB::GenPept or Bio::DB::RefSeq if needed

# Get some sequence IDs either like below, or read them in from a file.  Note
# that this sample script works with the accession numbers below (at least at
# the time it was written).  If you add different accession numbers and get
# errors, you may be asking for something that the sequence doesn't have.
# You'll have to add your own error trapping code to handle that.
my @ids = ('K03160', 'AB039327', 'BC035972');

# Create the GenBank database object to read from the database.
my $gb = Bio::DB::GenBank->new();

# Create a sequence stream to pass the sequences from the database to the
# program.
my $seqio = $gb->get_Stream_by_id(\@ids);

# Loop over all of the sequences that you requested.
while (my $seq = $seqio->next_seq) {

  # Here is how you get methods directly from the RichSeq object.  Replace
  # 'display_name' with any other method in Table 2. that can be called on
  # either the RichSeq object directly, or the PrimarySeq object which it has
  # inherited.
  print "Display Name:  ", $seq->display_name, "\n";
  print "Sequence Date:  ", join(', ', $seq->get_dates), "\n";

  # Here is how to access the classification data from the species object.
  my $species = $seq->species;
  print "Species:  ", ($species->common_name || ''), "\n";
  my @class = $species->classification;
  print "Classification:  @class\n";

  # Here is a general way to call things that are stored as a
  # Bio::SeqFeature::Generic object.  Replace 'source' with any other of the
  # "major" headings in the feature table (e.g. gene, CDS, etc.) and replace
  # 'mol_type' with any of the tag values found under that heading
  # (organism, locus_tag, gene, etc.)
  my ($source_feat) = grep { $_->primary_tag eq 'source' }
                      $seq->get_SeqFeatures();
  if ($source_feat && $source_feat->has_tag('mol_type')) {
    my @mol_type = $source_feat->get_tag_values('mol_type');
    print "Molecule Type:  @mol_type\n";
  }

  # Here is a general way to call things that are stored as some type of a
  # Bio::Annotation object.  This includes reference information and
  # comments.  Replace 'reference' with 'comment' to get the comment, and
  # replace $ref->authors with $ref->title (or location, medline, etc.) to get
  # other reference categories.
  my $ann = $seq->annotation();
  my ($ref) = $ann->get_Annotations('reference');
  if (defined $ref) {
    print "Authors:  ", $ref->authors, "\n";
  }
  print "Sequence:  \n", $seq->seq, "\n\n";
}

Tim Alcon wrote:

> I seem unable to access GenBank.  When running bptutorial.exe, it 
> seems like all the other examples run fine except that one.  Anyone 
> know why that would be?  I'm using ActivePerl on Windows XP.  I have 
> whichever version of bioperl is the current default using ppm (it's at 
> least 1.0).   When I run the exact same code from my campus Unix 
> account, it works fine.
>
> Tim
>




From palmeida at igc.gulbenkian.pt  Tue Jan 18 14:49:08 2005
From: palmeida at igc.gulbenkian.pt (Paulo Almeida)
Date: Tue Jan 18 21:36:46 2005
Subject: [Bioperl-l] NCBI BLink
Message-ID: <25361.192.168.50.3.1106077748.squirrel@192.168.50.3>

Is anyone working on a parser for BLink? I found a module by Rob Edwards
(http://www.salmonella.org/bioperl/Blink.pm), but I wanted to have the
Best Hits page, so I added an extra parameter (-besthits) to that module,
which you set to '1' to have the desired behavior.

I'm attaching the .diff file and the module itself.
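One detail of the attached patch: besthits() stores its argument only when it is true. As a result, $blink->besthits(0) leaves the flag unchanged, and the POD's "anything else for all hits" works only for true values such as 2. This is a minimal core-Perl sketch of a get/set accessor that stores any explicit argument, including 0. It uses a hypothetical stand-alone class, not the Blink.pm module itself:

```perl
use strict;
use warnings;

package BlinkSketch;    # hypothetical class, not Rob Edwards' Blink.pm

sub new {
    my ($class, %args) = @_;
    # Like the patch: default to 0 (all hits) when -besthits is absent.
    return bless { besthits => $args{-besthits} || 0 }, $class;
}

# Get/set: any explicit argument (including 0) is stored; no argument reads.
sub besthits {
    my $self = shift;
    $self->{besthits} = shift if @_;
    return $self->{besthits};
}

package main;

my $blink = BlinkSketch->new(-besthits => 1);
print $blink->besthits, "\n";
$blink->besthits(0);
print $blink->besthits, "\n";
```

Checking `@_` rather than the truth of the argument is what lets a caller switch the flag back off with 0.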

-------------- next part --------------
A non-text attachment was scrubbed...
Name: BlinkNew.pm
Type: text/x-perl
Size: 11386 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050118/b721ccdc/BlinkNew.bin
-------------- next part --------------
115c115
<     my ($gi, $cutoff) = $self->_rearrange([qw(GI CUTOFF)], @args);
---
>     my ($gi, $cutoff, $besthits) = $self->_rearrange([qw(GI CUTOFF BESTHITS)], @args);
117a118
>     $self->{besthits}=$besthits || 0;
154a156,171
> =head2 besthits
> 
> Title	: besthits
> Usage	: $blink->besthits($besthits)
> Function: Get/Set All Hits or Best Hits
> Returns	: Current status
> Args	: 1 for best hits, anything else for all hits
> 
> =cut
> 
> sub besthits {
>  my ($self, $val)=@_;
>  if ($val) {$self->{besthits}=$val}
>  return $self->{besthits}
> }
> 
208a226
>  $header{'org'}=$self->{besthits};
256,257c274,278
<  return ($self->{$r2r}->{bl2sqlurl}, $self->{$r2r}->{score}, $self->{$r2r}->{p}, $self->{$r2r}->{prot2url}, 
<          $self->{$r2r}->{p2acc}, $self->{$r2r}->{p2blinkurl}, $self->{$r2r}->{p2gi}, $self->{$r2r}->{p2desc});
---
>  
>   return ($self->{$r2r}->{bl2sqlurl}, $self->{$r2r}->{score}, $self->{$r2r}->{p}, $self->{$r2r}->{prot2url},
>           $self->{$r2r}->{p2acc}, $self->{$r2r}->{p2blinkurl}, $self->{$r2r}->{p2gi}, $self->{$r2r}->{p2desc}) if $self->{besthits} != 1;
>   return ($self->{$r2r}->{bl2sqlurl}, $self->{$r2r}->{score}, $self->{$r2r}->{p}, $self->{$r2r}->{prot2url},
>           $self->{$r2r}->{p2acc}, $self->{$r2r}->{p2blinkurl}, $self->{$r2r}->{p2gi}, $self->{$r2r}->{p2desc}, $self->{$r2r}->{hitsurl}, $self->{$r2r}->{hits}) if $self->{besthits} == 1;
259c280
< 
---
> 		
284c305,307
<          $self->{$r2r}->{p2acc}, $self->{$r2r}->{p2blinkurl}, $self->{$r2r}->{p2gi}, $self->{$r2r}->{p2desc});
---
>          $self->{$r2r}->{p2acc}, $self->{$r2r}->{p2blinkurl}, $self->{$r2r}->{p2gi}, $self->{$r2r}->{p2desc}) if $self->{besthits} != 1;
>   return ($self->{$r2r}->{bl2sqlurl}, $self->{$r2r}->{score}, $self->{$r2r}->{p}, $self->{$r2r}->{prot2url},
> 	           $self->{$r2r}->{p2acc}, $self->{$r2r}->{p2blinkurl}, $self->{$r2r}->{p2gi}, $self->{$r2r}->{p2desc}, $self->{$r2r}->{hitsurl}, $self->{$r2r}->{hits}) if $self->{besthits} == 1;
317a341
>    next if (m#SCORE.*P.*ACCESSION#);
326,328c350
<    (m#^.*?onclick.*?href=(\S+?)>(\d+)</a>\s+(\d+)<img src.*?href=(\S+).*?>(\S+)</a>.*?<a.*?onclick.*?href=(\S+)>(\d+)</a>(.*?)$#i);
<      
< # fix vi!
---
> #   print STDERR "\n", $self->{besthits}, "\n";
330c352,359
<    unless ($1 && $2 && $3 && $4 && $5 && $6 && $7 && $8) {print STDERR "Couldn't parse\n$_\n"; next}
---
>    (m#^.*?onclick.*?href=(\S+?)>(\d+)</a>\s+(\d+)<img src.*?href=(\S+).*?>(\S+)</a>.*?<a.*?onclick.*?href=(\S+)>(\d+)</a>.*?<a.*?href=(\S+)>(\d+)</a>.*?<i>(.*?)</i>$#i) if $self->{besthits} ==1;  
>    unless  (($1 && $2 && $3 && $4 && $5 && $6 && $7 && $8 && $9 && $10) || ($self->{besthits} !=1))
>    {print STDERR "Couldn't parse\n$_\n"; next}
> 	 
>    (m#^.*?onclick.*?href=(\S+?)>(\d+)</a>\s+(\d+)<img src.*?href=(\S+).*?>(\S+)</a>.*?<a.*?onclick.*?href=(\S+)>(\d+)</a>(.*?)$#i) if $self->{besthits} != 1;
>    unless (($1 && $2 && $3 && $4 && $5 && $6 && $7 && $8) || ($self->{besthits}==1))
>    {print STDERR "Couldn't parse\n$_\n"; next}
>    
341c370,375
<    $self->{$rcount}->{p2desc}=$8;
---
>    if ($self->besthits != 1 ) { $self->{$rcount}->{p2desc}=$8; }
>    else {
>    $self->{$rcount}->{hitsurl}=$8;
>    $self->{$rcount}->{hits}=$9;
>    $self->{$rcount}->{p2desc}=$10;
>    }
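
A side note on the `besthits` accessor in the patch: because it only stores the argument when `if ($val)` is true, calling `$blink->besthits(0)` leaves the flag unchanged, so best-hits mode cannot be switched off again once it is set. Below is a minimal sketch of a getter/setter that stores any argument that is passed, including 0. `BlinkSketch` is a hypothetical stand-in for the Blink module, not its actual code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package BlinkSketch;    # hypothetical stand-in for the Blink module

sub new {
    my ($class, %args) = @_;
    # default to "all hits" when -besthits is not given
    return bless { besthits => $args{-besthits} || 0 }, $class;
}

sub besthits {
    my $self = shift;
    # store the value whenever an argument is passed, even a false one like 0
    $self->{besthits} = shift if @_;
    return $self->{besthits};
}

package main;

my $blink = BlinkSketch->new(-besthits => 1);
print $blink->besthits, "\n";    # prints 1
$blink->besthits(0);             # switch back to all hits
print $blink->besthits, "\n";    # prints 0
```

Checking `@_` rather than the truth of the value is what lets the caller reset the flag.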
From rob at salmonella.org  Wed Jan 19 00:05:01 2005
From: rob at salmonella.org (Rob Edwards)
Date: Wed Jan 19 00:01:17 2005
Subject: [Bioperl-l] NCBI BLink
In-Reply-To: <25361.192.168.50.3.1106077748.squirrel@192.168.50.3>
References: <25361.192.168.50.3.1106077748.squirrel@192.168.50.3>
Message-ID: <AC8A0EF9-69D7-11D9-8306-000A959E1622@salmonella.org>

I wrote that when I needed some BLink functionality, and the module did  
exactly what I wanted. However, I never really rolled it into bioperl  
proper, never committed it, and never pursued it. If you'd like to add  
more functionality go ahead.

Rob

From nathanhaigh at ukonline.co.uk  Wed Jan 19 04:02:10 2005
From: nathanhaigh at ukonline.co.uk (Nathan Haigh)
Date: Wed Jan 19 03:58:41 2005
Subject: [Bioperl-l] Installing Bioperl using PPM
In-Reply-To: <41ED8B95.8020506@iastate.edu>
Message-ID: <!~!UENERkVCMDkAAQACAAAAAAAAAAAAAAAAABgAAAAAAAAA+EsXQZcrCEGeBpZF7/IE7sKAAAAQAAAAfnQTzm+INUebF8JQZiYh7wEAAAAA@ukonline.co.uk>

Please read this even if you think you know how to install modules via PPM!

This is just a note on what to do to install the latest version of Bioperl (or any other module) via PPM:
Because of inconsistencies in the way PPM determines module names/versions, etc. (see ActiveState's comment on this at the bottom),
it is NOT WISE to install modules by typing:
    "install bioperl"
OR
    "upgrade bioperl"

You are very likely NOT to get the most recent version of a particular module this way! Instead, you should do the
following:
    "search bioperl"
This gives a numbered list of the matching modules in the repositories searched by your PPM (you can add repositories in
addition to the defaults given during installation, and this is advised). Choose the number of the correct module from
the list and do:
    "install <number>"
where <number> is the number of the module you wish to install. This way you will ensure you install the module/version YOU
want, not the arbitrary module that PPM seems to want to install most of the time!

As soon as the official Bioperl 1.5 is released, I'll make the ppd and tar.gz files so it can be installed via PPM.

Nathan

ActiveState's comment on PPM's inconsistencies in determining module names/versions:
"Sorry for the confusion, ppm3 is kind of inconsistent in spots."


> -----Original Message-----
> From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Tim Alcon
> Sent: 18 January 2005 22:20
> To: bioperl-l@portal.open-bio.org
> Subject: [Bioperl-l] accessing GenBank
> 
> I seem unable to access GenBank.  When running bptutorial.exe, it seems
> like all the other examples run fine except that one.  Anyone know why
> that would be?  I'm using ActivePerl on Windows XP.  I have whichever
> version of bioperl is the current default using ppm (it's at least
> 1.0).   When I run the exact same code from my campus Unix account, it
> works fine.
> 
> Tim
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l

From nathanhaigh at ukonline.co.uk  Wed Jan 19 04:05:10 2005
From: nathanhaigh at ukonline.co.uk (Nathan Haigh)
Date: Wed Jan 19 04:01:52 2005
Subject: [Bioperl-l] accessing GenBank
In-Reply-To: <41ED8B95.8020506@iastate.edu>
Message-ID: <!~!UENERkVCMDkAAQACAAAAAAAAAAAAAAAAABgAAAAAAAAA+EsXQZcrCEGeBpZF7/IE7sKAAAAQAAAA/I1YTA6SREyZZhJZcK6V6QEAAAAA@ukonline.co.uk>

You should double-check the versions you have installed on both systems; it may well be that one is out of date with respect to
connecting to GenBank and the other is not. If you do indeed have a version of Bioperl older than 1.4 installed on your Windows machine,
follow the instructions in my previous post to install 1.4 (1.5 should be available via PPM shortly after its official release, some time soon!)

Nathan

From michael.watson at bbsrc.ac.uk  Wed Jan 19 04:17:30 2005
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Wed Jan 19 04:17:01 2005
Subject: [Bioperl-l] My last email about Bio::Graphics::Panel, please HELP
Message-ID: <8975119BCD0AC5419D61A9CF1A923E95E89BC4@iahce2knas1.iah.bbsrc.reserved>

Just for completeness, Dan and I looked at this off the list and
finally discovered that what he actually wanted was:

$q->print($map);
$q->print($panel->png);

Which makes a LOT more sense.... :-)

-----Original Message-----
From: bioperl-l-bounces@portal.open-bio.org
[mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of
palmeida@igc.gulbenkian.pt
Sent: 18 January 2005 12:01
To: Danielucg Sousa
Cc: bioperl-l@portal.open-bio.org
Subject: Re: [Bioperl-l] My last email about Bio::Graphics::Panel,
please HELP


Hi,

Have you tried: print $map;

You are using it as if $map were a subroutine of CGI, but you just want
to print whatever is in the variable $map.

-Paulo
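
Why `print $q->$map;` fails: in Perl, `$obj->$name` is a dynamic method call, so the *string* held in `$name` is used as the method name. Because `$map` holds the HTML image map, CGI is asked for a method literally named `<map name=...`, and CGI.pm reports it as an undefined subroutine. The self-contained demonstration below uses `Demo`, a hypothetical stand-in class rather than CGI. A plain class without CGI's AUTOLOAD reports the problem as "Can't locate object method" instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package Demo;    # hypothetical stand-in for CGI
sub new   { return bless {}, shift }
sub hello { return "hello called" }

package main;

my $q = Demo->new;

my $name = 'hello';
print $q->$name, "\n";    # dynamic call: same as $q->hello

my $map = '<map name="bgmap00001"></map>';
# The string in $map is treated as a method name, which does not exist
my $ok = eval { $q->$map; 1 };
print "error: $@" unless $ok;

# What was actually wanted: print the string itself
print $map, "\n";
```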

On Tue, Jan 18, 2005 at 07:34:50AM -0300, Danielucg Sousa wrote:
> Hi,
> 
> I'm showing a sequence in the browser, but I can't get
> the HTTP links to work.
> When I use: print $q->$map;
> The output message is:
> Undefined subroutine CGI::<map name="bgmap00001"
> id="bgmap00001">
> <area shape="rect" coords="10,0,490,11"
> href="http://www.google.com.br" title="test 2"
> alt="test 2" />
> <area shape="rect" coords="329,0,650,11"
> href="http://www.google.com.br" title="test 2"
> alt="test 2" />
> </map>
> 
> Please, what should I do?
> I am using Bioperl 1.5 RC 2.
> Thanks for everything.
> 
> My little code :
> #!/usr/bin/perl -wT
> 
> use strict;
> use Bio::Graphics;
> use Bio::Graphics::FeatureFile;
> use Bio::SeqIO;
> use Bio::SeqFeature::Generic;
> use CGI  qw / :standard /;
> use CGI::Pretty;
> 
> my $wholeseq = Bio::SeqFeature::Generic->new(-start=>1,-end=>600);
> 
> my $q = new CGI;
> 
> print $q->header('text/html');
> print $q->start_html('A Vector Rendering ');
> 
> print $q->h1('teste');
> my $panel = Bio::Graphics::Panel->new(-length  => 
> 1000, -width  => 800, -pad_left     => 10,  -pad_right
>    => 10,  -key_style =>'none', -spacing => -0.25,
> -box_subparts => 'true',-link =>
> "http://www.google.com");
> 
> my $track =  $panel->add_track($wholeseq,  -glyph  =>
> 'transcript2', -bgcolor =>'orange', -bump   => 0,
> -height =>12,-title=>'test 2', -link =>'http://www.google.com.br' );
> 
> my $feature = Bio::SeqFeature::Generic->new(-display_name=>'teste',
> -score=>20, -start=>400, -end=>800,
> -url=>'http://www.google.com' );
>  $track -> add_feature($feature);
>       
>  my ($url,$map,$mapname) = $panel->image_and_map(-root
> => '/var/www/html',-url => '/tmpimages', -link => 
> "http://www.google.com" );
>  
> print $q->img({-src=>$url,-usemap=>"#$mapname", -link
> => "http://www.google.com" });
> print $q->$map;
> print $q->($panel->png);
> $panel->finished;
> print $q->exit_html;
> 
> exit;
> 
> Thank you very much,
> Daniel Xavier

-- 
Paulo Almeida
Instituto Gulbenkian de Ciencia
Apartado 14, 2781-901, Oeiras, PORTUGAL
tel  +351 21 446 46 35
fax  +351 21 440 79 70
http://www.igc.gulbenkian.pt
_______________________________________________
Bioperl-l mailing list
Bioperl-l@portal.open-bio.org
http://portal.open-bio.org/mailman/listinfo/bioperl-l

From palmeida at igc.gulbenkian.pt  Wed Jan 19 06:48:16 2005
From: palmeida at igc.gulbenkian.pt (Paulo Almeida)
Date: Wed Jan 19 06:43:34 2005
Subject: [Bioperl-l] Sequence features - complete sequences
Message-ID: <20050119114816.GA2618@bioinf.igc.gulbenkian.pt>

Hi,

I want to retrieve only complete sequences from GenPept records. I'm not
sure this is possible, because the notation may not be consistent, but I
was thinking of checking the 'Protein' feature for something like
<1..>952 (protein here:
http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?val=34223805), which says
there is more sequence upstream. I have been looking at BioPerl feature
objects, but I couldn't find a way of doing this. What module would need
to be changed, or what code could be written, to accomplish this? (Maybe
using the 'tag' feature of SeqFeature::Generic.pm? But then there would
have to be some code attached to the tag to parse the required
information.)

Thank you,
Paulo

-- 
Paulo Almeida
Instituto Gulbenkian de Ciencia
Apartado 14, 2781-901, Oeiras, PORTUGAL
tel  +351 21 446 46 35
fax  +351 21 440 79 70
http://www.igc.gulbenkian.pt
From Marc.Logghe at devgen.com  Wed Jan 19 07:32:39 2005
From: Marc.Logghe at devgen.com (Marc Logghe)
Date: Wed Jan 19 07:31:34 2005
Subject: [Bioperl-l] Sequence features - complete sequences
Message-ID: <BEE28BF86078B6429D6C780635718E219050F3@morelia.be.devgen.com>

> I want to retrieve only complete sequences from GenPept 
> records. I'm not
> sure this is possible, because the notation may not be 
> consistent, but I
> was thinking of checking the 'Protein' feature for something like
> <1..>952 (protein here:
> http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?val=34223805), 
> which says there is more sequence upstream. I have

I think in a lot of cases this means that there _is_ more sequence in
_reality_ that is not available in the databases (e.g. the clone is not a
full-length cDNA), simply because that part was never isolated and thus
never sequenced. This means there is no way to fetch the missing sequence
information.
Regards,
Marc

From palmeida at igc.gulbenkian.pt  Wed Jan 19 07:52:26 2005
From: palmeida at igc.gulbenkian.pt (Paulo Almeida)
Date: Wed Jan 19 07:47:24 2005
Subject: [Bioperl-l] Sequence features - complete sequences
In-Reply-To: <BEE28BF86078B6429D6C780635718E219050F3@morelia.be.devgen.com>
References: <BEE28BF86078B6429D6C780635718E219050F3@morelia.be.devgen.com>
Message-ID: <20050119125226.GB2618@bioinf.igc.gulbenkian.pt>

That's what I thought, but I don't want the rest of the information; I
just want to skip those sequences, because I am using them to generate
ProtDist matrices, and they may distort the results.

-Paulo

-- 
Paulo Almeida
Instituto Gulbenkian de Ciencia
Apartado 14, 2781-901, Oeiras, PORTUGAL
tel  +351 21 446 46 35
fax  +351 21 440 79 70
http://www.igc.gulbenkian.pt
From jason.stajich at duke.edu  Wed Jan 19 08:17:46 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Wed Jan 19 08:15:00 2005
Subject: [Bioperl-l] Sequence features - complete sequences
In-Reply-To: <20050119125226.GB2618@bioinf.igc.gulbenkian.pt>
References: <BEE28BF86078B6429D6C780635718E219050F3@morelia.be.devgen.com>
	<20050119125226.GB2618@bioinf.igc.gulbenkian.pt>
Message-ID: <8297945F-6A1C-11D9-8F42-000393C44276@duke.edu>

This is encoded in the Location object. In fact, if the location is not
exact we create a "fuzzy" location, so you can just test whether it is-a
Bio::Location::FuzzyLocationI.

More properly (if you only care about proteins that are incomplete at the
C-terminus or N-terminus), you just need to check the start_pos_type
and end_pos_type of the location. If they are 'EXACT' then the
position is, well, exact.

if( $f->location->start_pos_type eq 'EXACT' &&
    $f->location->end_pos_type   eq 'EXACT' ) {
    # both ends are exact, so the feature is complete
}

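For readers without BioPerl at hand, the same idea can be shown on the raw GenBank/GenPept location string, where `<` and `>` mark a partial (fuzzy) end. This is a minimal pure-Perl sketch that assumes only simple `start..end` locations. The helper name `is_complete_location` is hypothetical. BioPerl's own location parser, which the start_pos_type/end_pos_type check above relies on, handles far more cases, such as `join(...)` and `complement(...)`.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Returns 1 if a simple "start..end" location has exact ends,
# 0 if either end is fuzzy ('<' before start or '>' before end),
# and undef if the string is not a simple range.
sub is_complete_location {
    my ($loc) = @_;
    my ($lt, $start, $gt, $end) = $loc =~ /^(<?)(\d+)\.\.(>?)(\d+)$/
        or return undef;
    return ($lt eq '' && $gt eq '') ? 1 : 0;
}

# Skip partial sequences, e.g. before building distance matrices
my @locations = ('1..952', '<1..>952', '<1..300', '1..>300', 'join(1..5,8..10)');
for my $loc (@locations) {
    my $complete = is_complete_location($loc);
    printf "%-18s %s\n", $loc,
        !defined $complete ? 'not a simple range'
        : $complete        ? 'complete'
        :                    'partial, skip';
}
```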
-jason
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

From g0404203 at nus.edu.sg  Wed Jan 19 01:57:34 2005
From: g0404203 at nus.edu.sg (Lee Ping Alison)
Date: Wed Jan 19 08:16:17 2005
Subject: [Bioperl-l] Attribute Tags in Bio::Tools::GFF &
	Bio::SeqFeature::Generic
Message-ID: <000201c4fdf4$4d8536c0$7347d90a@imcb.astar.edu.sg>

Hi,

Referring to the last column of the GFF file format which holds various user-specified attributes (tags), if I need to read in a GFF file and output the information to another GFF file, how do I retain the tags in the output?

e.g. 
I've used the following code:

my $gff = Bio::Tools::GFF->new(-file => $file, -gff_version => 2);
my $f = $gff->next_feature;
print $f->gff_string, "\n";


input is:
chr13   hg15.chr13      transcript      17950005        17951026        .       .       .       -name "chr13.0"

the output becomes:
chr13   hg15.chr13      transcript      17950005        17951026        .       .       .


Is there some way to retain the tag information? I figure this is related to the way the GFF line is parsed and the way the Generic feature object is created.

Thanks a lot in advance!

Best Regards,
Alison.
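
Independently of how Bio::Tools::GFF handles it, the ninth GFF2 column is just a `;`-separated list of `tag value` pairs. Preserving it therefore amounts to parsing those pairs and writing them back out. The minimal pure-Perl sketch below shows that round trip. The helper names are hypothetical, values are assumed not to contain `;` inside quotes, and the example uses the line from the question with tab separators assumed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Parse a GFF2 attribute column into an ordered list of [tag, value] pairs.
# Simplification: assumes no ';' appears inside quoted values.
sub parse_gff2_attributes {
    my ($column) = @_;
    my @pairs;
    for my $field (split /\s*;\s*/, $column) {
        next unless $field =~ /^\s*(\S+)\s*(.*?)\s*$/;
        my ($tag, $value) = ($1, $2);
        $value =~ s/^"(.*)"$/$1/;    # strip surrounding quotes
        push @pairs, [$tag, $value];
    }
    return @pairs;
}

# Write the pairs back out, quoting every value.
sub format_gff2_attributes {
    return join '; ', map { qq{$_->[0] "$_->[1]"} } @_;
}

my $line = join "\t", qw(chr13 hg15.chr13 transcript 17950005 17951026 . . .),
    '-name "chr13.0"';
my @cols  = split /\t/, $line;
my @attrs = parse_gff2_attributes($cols[8]);

# Rebuild the line with the attributes preserved
my $out = join "\t", @cols[0 .. 7], format_gff2_attributes(@attrs);
print "$out\n";
```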
From malatorr at genoma.ciencias.uchile.cl  Wed Jan 19 08:57:00 2005
From: malatorr at genoma.ciencias.uchile.cl (Mariano Latorre A)
Date: Wed Jan 19 08:55:04 2005
Subject: [Bioperl-l] help with PHRAP assembly & Bio::Graphics
Message-ID: <1106143020.8976.17.camel@peach4>

Hi!

I'm developing a BioPerl CGI to show a PHRAP assembly as a PNG using
Bio::Graphics + Perl CGI.

The problem is that the contig defines the zero position, and the aligned
ESTs are usually located before the contig. This means I need to
use negative positions... but Bio::Graphics doesn't allow negative
positions... it just cuts them off.

PS: I paste my source code below.
-- 
Mariano Latorre A <malatorr@genoma.ciencias.uchile.cl>
Universidad de Chile

######################################################################################
#THE CGI "render.pl"

#!/usr/bin/perl


use CGI;
use lib "$ENV{HOME}/projects/bioperl-live";
use Bio::Graphics;
use Bio::SeqFeature::Generic;

my $form = new CGI;
print "Content-type: image/png\n\n";

my $panel = Bio::Graphics::Panel->new(-length => 2000,-width  => 1800, -pad_left => 10, -pad_right => 10,);

my $full_length = Bio::SeqFeature::Generic->new(-start=>$form->param("start"),-end=>$form->param("end"));

$panel->add_track($full_length,
                  -glyph   => 'arrow',
                  -tick    => 2,
                  -fgcolor => 'black',
                  -double  => 1,
                 );

my $track = $panel->add_track(-glyph => 'graded_segments',
                              -label  => 1,
                              -bgcolor => 'blue',
                              -min_score => -200,
                              -max_score => 1000);


for($i=1;defined($form->param("est$i"));$i++){
  my($name,$score,$start,$end) = split /\@/,$form->param("est$i");
  my $feature = Bio::SeqFeature::Generic->new(-display_name=>$name,
            -score=>$score,
            -start=>$start,
            -end=>$end);
  $track->add_feature($feature);
}
print $panel->png;


######################################################################################
# The Url to call the CGI
render.pl?est1=hola@300@-200@367@&est2=chau@50@300@600@&est3=nada@50@310@25@&start=1&end=800

######################################################################################


From crabtree at tigr.org  Wed Jan 19 10:07:19 2005
From: crabtree at tigr.org (Crabtree, Jonathan)
Date: Wed Jan 19 10:05:06 2005
Subject: [Bioperl-l] help with PHRAP assembly & Bio::Graphics
Message-ID: <CAAF27359A31D44FA9A90AF7E299C36F8A03A3@EXCHANGE.TIGR.ORG>


Mariano-

You should be able to use negative coordinates by setting the -offset
parameter (to the absolute value of the smallest negative coordinate
that you want to use in your image) when you call Panel->new().  Someone
else asked about this a few months ago and reported that this solution
worked for them:

http://bioperl.org/pipermail/bioperl-l/2004-July/016538.html

Jonathan

From crabtree at tigr.org  Wed Jan 19 10:22:24 2005
From: crabtree at tigr.org (Crabtree, Jonathan)
Date: Wed Jan 19 10:20:44 2005
Subject: [Bioperl-l] help with PHRAP assembly & Bio::Graphics
Message-ID: <CAAF27359A31D44FA9A90AF7E299C36F9383BA@EXCHANGE.TIGR.ORG>


Mariano-

I just realized that I got the sign wrong (again!).  You'll want to set -offset to a *negative* number, not a positive number.  For example:

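#!/usr/bin/perl

use strict;
use warnings;
use Bio::Graphics::Panel;
use Bio::SeqFeature::Generic;

# A negative -offset lets the panel display coordinates below 1
my $panel = Bio::Graphics::Panel->new(-length=> 1000, -offset=> -100, -width=> 600);

# A feature that starts at a negative coordinate
my $scale = Bio::SeqFeature::Generic->new(-start => -75, -end => 100);
$panel->add_track($scale, 
		  -glyph => 'anchored_arrow',
		  -tick => 2,
		  -fontcolor => '#3d5315',
		  -fgcolor => '#3d5315',
		  -bgcolor => '#e3ffb7',
		  );

# Render the panel and write the PNG (binary data) to STDOUT
my $gd = $panel->gd();
binmode STDOUT;
print $gd->png();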

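Applied to the EST coordinates in Mariano's CGI, the offset can be derived from the data instead of being hard-coded. The minimal sketch below assumes that a panel created with `-offset => $o` and `-length => $l` covers positions `$o+1` through `$o+$l`, which matches the example above, where `-offset => -100` lets the panel show a feature starting at -75. The helper name `panel_range` is hypothetical, and the `name@score@start@end` strings follow the format of Mariano's URL.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(min max);

# Given "name@score@start@end" strings, return (offset, length) so that
# every start/end falls inside offset+1 .. offset+length (assumed panel range).
sub panel_range {
    my @coords;
    for my $est (@_) {
        my (undef, undef, $start, $end) = split /\@/, $est;
        push @coords, $start, $end;
    }
    my $lo = min(1, @coords);      # never start the panel after position 1
    my $hi = max(@coords);
    my $offset = $lo - 1;          # first displayed position is offset + 1
    my $length = $hi - $offset;    # last displayed position is offset + length
    return ($offset, $length);
}

my @ests = ('hola@300@-200@367@', 'chau@50@300@600@', 'nada@50@310@25@');
my ($offset, $length) = panel_range(@ests);
print "-offset => $offset, -length => $length\n";
# These values would then be passed to Bio::Graphics::Panel->new(...)
```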
From kmdaily at indiana.edu  Wed Jan 19 11:48:38 2005
From: kmdaily at indiana.edu (Daily, Kenneth Michael)
Date: Wed Jan 19 11:44:49 2005
Subject: [Bioperl-l] Reading all sequences using Bio::DB::Flat in SwissProt
	file
Message-ID: <BE45276F7BBEFE49B80193E22C5B1FED01B05FC0@iu-mssg-mbx04.exchange.iu.edu>

I want to work with a local copy of the SwissProt database and need to
search through all of the entries, but I only see methods that return
sequences by accession. I cannot use just the FASTA format of the
SwissProt records, as I need to use the feature fields. What I need to
learn is how to do a DB search on the feature fields of the SwissProt
records, if it's possible. Would there be any advantage to doing it with
the DB instead of just using SeqIO as an input stream? I think there
might be, since every time I want to do a search I must read in the
entire file again, which is very costly. Thank you.

Kenny Daily
Indiana University
School of Informatics
kmdaily [at] indiana [dot] edu
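
An index helps because it records where each record starts in the file, so a lookup can `seek` straight to that record instead of re-reading everything. Below is a minimal pure-Perl sketch of that idea for SwissProt flat files, where records start with an `ID` line and end with a `//` line. It is an illustration only, not how Bio::DB::Flat or the Bio::Index modules are implemented. It still needs one full pass to build the index. The two-record sample is abbreviated and made up for the example. With a real file you would open a path instead of the in-memory string.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Abbreviated, made-up SwissProt-style records (illustration only)
my $flat = <<'END';
ID   PROT1_TEST     STANDARD;      PRT;    50 AA.
AC   P00001;
FT   ACT_SITE     10     10
//
ID   PROT2_TEST     STANDARD;      PRT;    80 AA.
AC   P00002;
FT   DISULFID     12     40
//
END

open my $fh, '<', \$flat or die "cannot open in-memory file: $!";

# One pass: remember the byte offset of each record, keyed by ID and
# by every FT feature key it contains (key starts right after "FT   ").
my (%by_id, %by_feature);
my $record_start = tell $fh;
my $id;
while (my $line = <$fh>) {
    if    ($line =~ /^ID\s+(\S+)/)  { $id = $1; $by_id{$id} = $record_start }
    elsif ($line =~ /^FT {3}(\S+)/) { $by_feature{$1}{$id} = $record_start }
    elsif ($line =~ m{^//})         { $record_start = tell $fh }
}

# Later lookups jump straight to the record instead of re-reading the file.
sub fetch_record {
    my ($offset) = @_;
    seek $fh, $offset, 0 or die "seek failed: $!";
    my $record = '';
    while (my $line = <$fh>) {
        $record .= $line;
        last if $line =~ m{^//};
    }
    return $record;
}

for my $hit (sort keys %{ $by_feature{ACT_SITE} || {} }) {
    my $text = fetch_record($by_feature{ACT_SITE}{$hit});
    print "Record with ACT_SITE: $hit\n$text";
}
```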

From sdavis2 at mail.nih.gov  Wed Jan 19 13:01:21 2005
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Wed Jan 19 12:57:35 2005
Subject: [Bioperl-l] Reading all sequences using Bio::DB::Flat in
	SwissProt file
In-Reply-To: <BE45276F7BBEFE49B80193E22C5B1FED01B05FC0@iu-mssg-mbx04.exchange.iu.edu>
References: <BE45276F7BBEFE49B80193E22C5B1FED01B05FC0@iu-mssg-mbx04.exchange.iu.edu>
Message-ID: <2083AF7C-6A44-11D9-B052-000D933565E8@mail.nih.gov>

Kenny,

If this is something you are going to be doing often, you might want to 
look at bioperl-db.  Alternatively, UCSC maintains a fully-relational 
swissprot database 
(http://hgdownload.cse.ucsc.edu/goldenPath/swissProt/database/) that 
you could pretty easily load into a mysql server.  You can access their 
mysql server directly (let me know if you want to do this), also, but 
if you are running any kind of batch query, I would suggest you 
download the tables and load them yourself (really pretty easy to do).

Sean

From jason.stajich at duke.edu  Wed Jan 19 13:30:39 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Wed Jan 19 13:34:47 2005
Subject: [Bioperl-l] Fwd: [Bioperl-guts-l] [Bug 1734] New: a RefSeq entry
	not converted to SwissProt format
Message-ID: <37FB8937-6A48-11D9-88CC-000393C44276@duke.edu>


Do any Swissprot experts know what the feature table should look like 
for this type of feature table (genpept)?

      Site            join(243,538)
                      /note="involved in regulation of interaction with
                      glutamyl substrate"
                      /site_type="unclassified"
      Site            join(279,338,361)
                      /note="catalytic triad"
                      /site_type="active"
      Site            join(403,450,455)
                      /note="involved in Ca2+ complexation"
                      /site_type="unclassified"

-jason


Begin forwarded message:

> From: bugzilla-daemon@portal.open-bio.org
> Date: January 19, 2005 8:34:10 AM EST
> To: bioperl-guts-l@bioperl.org
> Cc: Subject: [Bioperl-guts-l] [Bug 1734] New: a RefSeq entry not 
> converted to SwissProt format
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=1734
>
>            Summary: a RefSeq entry not converted to SwissProt format
>            Product: Bioperl
>            Version: 1.4 branch
>           Platform: Sun
>         OS/Version: Solaris
>             Status: NEW
>           Severity: normal
>           Priority: P2
>          Component: Bio::SeqIO
>         AssignedTo: bioperl-guts-l@bioperl.org
>         ReportedBy: laurent.falquet@isb-sib.ch
>
>
> The RefSeq entry NP_443187 generates an error when I try to convert
> it to SwissProt format using Bio::SeqIO with GenBank format as input
> (all other RefSeq entries are converted normally using the same
> method).
>
> Here is the error message:
> len 1 is 56 len 2 is 34
> Error sequence not parsable
> Programming error - cannot called write_line_swissprot_regex with 
> different length
> pre1 (FT   Site     join(279,338,361) join(279,338,361)       ) and
> pre2 (FT                                ) tags! at 
> /usr/local/lib/perl5/site_perl/5.8.3/Bio/SeqIO/swiss.pm line
> 1124, <GEN0> line 98.
>
>
>
> ------- You are receiving this mail because: -------
> You are the assignee for the bug, or are watching the assignee.
> _______________________________________________
> Bioperl-guts-l mailing list
> Bioperl-guts-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-guts-l
>
>
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/
From cjm at fruitfly.org  Wed Jan 19 15:37:25 2005
From: cjm at fruitfly.org (Chris Mungall)
Date: Wed Jan 19 15:33:38 2005
Subject: [Bioperl-l] Reading all sequences using Bio::DB::Flat in	SwissProt
	file
In-Reply-To: <2083AF7C-6A44-11D9-B052-000D933565E8@mail.nih.gov>
References: <BE45276F7BBEFE49B80193E22C5B1FED01B05FC0@iu-mssg-mbx04.exchange.iu.edu>
	<2083AF7C-6A44-11D9-B052-000D933565E8@mail.nih.gov>
Message-ID: <Pine.OSX.4.58.0501191224250.15979@skerryvore.dhcp.lbl.gov>


This is a good solution. If you're after something a bit more lightweight
than a relational solution, which typically involves a lot of admin and
(often slow) database loading (although this isn't a problem here, as the
UCSC folks are nice enough to make their SP db available), then you may
want to look into an XML db solution.

For example, you can download the SwissProt XML from the EBI and stick it into
something like Apache Xindice, then grab the sequences you want using an
arbitrary XPath query, and transform the results with something like XSLT
or XML::Twig. There's more of an initial learning curve but the same
solution pattern is reusable in lots of other contexts.

XPath isn't as powerful as SQL, but on the other hand the admin & coding
overhead is lower. It's very similar to the Bio::Index solution, with the
additional advantage of more queries & indexing.

There's also SRS, which gives you fairly flexible querying
capabilities. YMMV.

Cheers
Chris

On Wed, 19 Jan 2005, Sean Davis wrote:

> Kenny,
>
> If this is something you are going to be doing often, you might want to
> look at bioperl-db.  Alternatively, UCSC maintains a fully-relational
> swissprot database
> (http://hgdownload.cse.ucsc.edu/goldenPath/swissProt/database/) that
> you could pretty easily load into a MySQL server.  You can also access their
> MySQL server directly (let me know if you want to do this), but
> if you are running any kind of batch query, I would suggest you
> download the tables and load them yourself (really pretty easy to do).
>
> Sean
>
> On Jan 19, 2005, at 11:48 AM, Daily, Kenneth Michael wrote:
>
> > I want to work with a local copy of the SwissProt database, and need
> > to search through all of the entries. I only see methods to return
> > sequences by accession. However, I cannot use just FASTA format of the
> > SwissProt records, as I need to use the feature fields. What I need to
> > learn is how to do a DB search on the features field of the SwissProt
> > records, if it's possible. Would there be any advantage to doing it
> > with the DB instead of just using SeqIO as an input stream? I think it
> > might, since every time I want to do a search I must read in the
> > entire file again, which is very costly. Thank you.
> >
> > Kenny Daily
> > Indiana University
> > School of Informatics
> > kmdaily [at] indiana [dot] edu
> >
>
From yguo at vbi.vt.edu  Wed Jan 19 16:16:02 2005
From: yguo at vbi.vt.edu (yguo@vbi.vt.edu)
Date: Wed Jan 19 16:12:41 2005
Subject: [Bioperl-l] Automatic retrieve pdf file from the publisher website.
In-Reply-To: <Pine.OSX.4.58.0501191224250.15979@skerryvore.dhcp.lbl.gov>
References: <BE45276F7BBEFE49B80193E22C5B1FED01B05FC0@iu-mssg-mbx04.exchange.iu.edu><2083AF7C-6A44-11D9-B052-000D933565E8@mail.nih.gov>
	<Pine.OSX.4.58.0501191224250.15979@skerryvore.dhcp.lbl.gov>
Message-ID: <1280.198.82.192.209.1106169362.squirrel@webmail.vbi.vt.edu>

Hi,

While working on the BRC project at VBI, I wrote a Perl module for retrieving
full-text PDF files from publisher websites using the information on the
PubMed abstract page. If anyone wants to use it, please contact me.
I will see if it is worthwhile to put it in Bioperl.

Yongjian
at
Virginia Bioinformatics Institute.


From Peter.Robinson at t-online.de  Wed Jan 19 16:53:04 2005
From: Peter.Robinson at t-online.de (Peter Robinson)
Date: Wed Jan 19 16:48:29 2005
Subject: [Bioperl-l] Automatic retrieve pdf file from the publisher
	website.
In-Reply-To: <1280.198.82.192.209.1106169362.squirrel@webmail.vbi.vt.edu>
References: <BE45276F7BBEFE49B80193E22C5B1FED01B05FC0@iu-mssg-mbx04.exchange.iu.edu>
	<2083AF7C-6A44-11D9-B052-000D933565E8@mail.nih.gov>
	<Pine.OSX.4.58.0501191224250.15979@skerryvore.dhcp.lbl.gov>
	<1280.198.82.192.209.1106169362.squirrel@webmail.vbi.vt.edu>
Message-ID: <1106171584.3667.16.camel@localhost.localdomain>

That sounds extremely interesting and I would appreciate getting a copy
for testing.
-peter


On Wed, 2005-01-19 at 22:16, yguo@vbi.vt.edu wrote:
> Hi,
> 
> While working on the BRC project at VBI, I wrote a Perl module for retrieving
> full-text PDF files from publisher websites using the information on the
> PubMed abstract page. If anyone wants to use it, please contact me.
> I will see if it is worthwhile to put it in Bioperl.
> 
> Yongjian
> at
> Virginia Bioinformatics Institute.
> 
> 
-- 
Peter N. Robinson
peter.robinson@t-online.de
peter.robinson@charite.de
http://www.charite.de/ch/medgen/robinson/

From cain at cshl.edu  Wed Jan 19 16:56:07 2005
From: cain at cshl.edu (Scott Cain)
Date: Wed Jan 19 16:52:19 2005
Subject: [Bioperl-l] Re: GFF3
In-Reply-To: <Pine.LNX.4.58.0501180053380.21413@sumo.ctrl.ucla.edu>
Message-ID: <Pine.GSO.4.05.10501191655300.8235-100000@phage.cshl.edu>

I just did a cvs update and the last few tests are failing on MacOSX 10.3.
I'll try to sort it out over the next couple of days.

Scott

----------------------------------------------------------------------
Scott Cain, Ph. D.				 	 cain@cshl.org
GMOD Coordinator, http://www.gmod.org/			 (216)392-3087
----------------------------------------------------------------------


On Tue, 18 Jan 2005, Allen Day wrote:

> > The first series of errors die because the feature ID=AB000114 in 
> > t/data/knownGene.gff3 has several Dbxrefs separated with ';' instead of 
> > ','
> 
> i'm not getting these errors, are you in sync with cvs HEAD?
> 
> > The second failure is because hybrid1.gff3 isn't in cvs
> 
> gff files are in cvs now.
> 
> > 
> > Rob
> > 
> > 
> > 
> > % perl -I. -w t/FeatureIO.t
> > 1..19
> > ok 1
> > ok 2
> > ok 3
> > ok 4
> > ok 5
> > ok 6
> > Use of uninitialized value in substitution (s///) at 
> > Bio/FeatureIO/gff.pm line 590, <GEN5> line 10.
> > Use of uninitialized value in substitution (s///) at 
> > Bio/FeatureIO/gff.pm line 591, <GEN5> line 10.
> > Use of uninitialized value in split at Bio/FeatureIO/gff.pm line 593, 
> > <GEN5> line 10.
> > Use of uninitialized value in substitution (s///) at 
> > Bio/FeatureIO/gff.pm line 590, <GEN5> line 10.
> > Use of uninitialized value in substitution (s///) at 
> > Bio/FeatureIO/gff.pm line 591, <GEN5> line 10.
> > Use of uninitialized value in split at Bio/FeatureIO/gff.pm line 593, 
> > <GEN5> line 10.
> > Use of uninitialized value in substitution (s///) at 
> > Bio/FeatureIO/gff.pm line 590, <GEN5> line 10.
> > Use of uninitialized value in substitution (s///) at 
> > Bio/FeatureIO/gff.pm line 591, <GEN5> line 10.
> > Use of uninitialized value in split at Bio/FeatureIO/gff.pm line 593, 
> > <GEN5> line 10.
> > Use of uninitialized value in substitution (s///) at 
> > Bio/FeatureIO/gff.pm line 590, <GEN5> line 10.
> > Use of uninitialized value in substitution (s///) at 
> > Bio/FeatureIO/gff.pm line 591, <GEN5> line 10.
> > Use of uninitialized value in split at Bio/FeatureIO/gff.pm line 593, 
> > <GEN5> line 10.
> > Use of uninitialized value in substitution (s///) at 
> > Bio/FeatureIO/gff.pm line 590, <GEN5> line 10.
> > Use of uninitialized value in substitution (s///) at 
> > Bio/FeatureIO/gff.pm line 591, <GEN5> line 10.
> > Use of uninitialized value in split at Bio/FeatureIO/gff.pm line 593, 
> > <GEN5> line 10.
> > Use of uninitialized value in substitution (s///) at 
> > Bio/FeatureIO/gff.pm line 590, <GEN5> line 10.
> > Use of uninitialized value in substitution (s///) at 
> > Bio/FeatureIO/gff.pm line 591, <GEN5> line 10.
> > Use of uninitialized value in split at Bio/FeatureIO/gff.pm line 593, 
> > <GEN5> line 10.
> > ok 7
> > ok 8
> > 
> > ------------- EXCEPTION  -------------
> > MSG: Could not open t/data/hybrid1.gff3: No such file or directory
> > STACK Bio::Root::IO::_initialize_io Bio/Root/IO.pm:314
> > STACK Bio::FeatureIO::_initialize Bio/FeatureIO.pm:345
> > STACK Bio::FeatureIO::gff::_initialize Bio/FeatureIO/gff.pm:92
> > STACK Bio::FeatureIO::new Bio/FeatureIO.pm:268
> > STACK Bio::FeatureIO::new Bio/FeatureIO.pm:288
> > STACK toplevel t/FeatureIO.t:83
> > 
> > --------------------------------------
> > 
> 
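
Rob's diagnosis of the first failure turns on how GFF3 column 9 is delimited: tag=value pairs are separated by ';', multiple values for one tag by ',', and reserved characters inside values are percent-escaped. A minimal pure-Perl sketch of that rule follows. `parse_gff3_attributes` is a hypothetical helper, not the code in Bio::FeatureIO::gff, and the sample strings are made up. With Dbxref values separated by ';', the second value becomes a fragment with no '=', which may be what produced the undefined-value warnings in the log above (an inference, not confirmed in the thread).

```perl
use strict;
use warnings;

# Hypothetical helper: parse a GFF3 column-9 string into a hash of array refs.
# tag=value pairs are split on ';', multiple values on ',', and %XX escapes
# (e.g. %3B for ';', %2C for ',') are decoded in each value.
sub parse_gff3_attributes {
    my ($col9) = @_;
    my %attr;
    for my $pair (split /;/, $col9) {
        next unless length $pair;                 # tolerate empty fields (";;")
        my ($tag, $value) = split /=/, $pair, 2;
        die "malformed attribute '$pair'\n" unless defined $value;
        my @values = map { my $v = $_; $v =~ s/%([0-9A-Fa-f]{2})/chr hex $1/ge; $v }
                     split /,/, $value;
        push @{ $attr{$tag} }, @values;
    }
    return \%attr;
}

my $attr = parse_gff3_attributes('ID=feat1;Dbxref=db1:A,db2:B');
print "$_ => @{ $attr->{$_} }\n" for sort keys %$attr;

# Separating the Dbxref values with ';' instead leaves "db2:B" with no '='.
eval { parse_gff3_attributes('ID=feat1;Dbxref=db1:A;db2:B') };
print "rejected: $@" if $@;
```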

From cain at cshl.edu  Wed Jan 19 16:57:54 2005
From: cain at cshl.edu (Scott Cain)
Date: Wed Jan 19 16:54:02 2005
Subject: [Bioperl-l] Re: GFF3
In-Reply-To: <Pine.LNX.4.58.0501172129280.19385@sumo.ctrl.ucla.edu>
Message-ID: <Pine.GSO.4.05.10501191656330.8235-100000@phage.cshl.edu>

Allen,

Sorry about the ID problem/question--FeatureIO is fine in that respect.  I
was misremembering a problem with a chado loader as a bioperl problem.

Thanks,
Scott

----------------------------------------------------------------------
Scott Cain, Ph. D.				 	 cain@cshl.org
GMOD Coordinator, http://www.gmod.org/			 (216)392-3087
----------------------------------------------------------------------


On Mon, 17 Jan 2005, Allen Day wrote:

> Hi,
> 
> On Mon, 17 Jan 2005, Scott Cain wrote:
> 
> > Hi Rob,
> > 
> > Thanks for your work on this--I've put several comments in your
> > original message below.
> > 
> > Scott
> > 
> > ---------Original Message--------
> > Date: Sat, 15 Jan 2005 15:22:23 -0800
> > From: Rob Edwards <rob@salmonella.org>
> > Subject: [Bioperl-l] GFF3
> > To: Bioperl list <bioperl-l@portal.open-bio.org>
> > 
> > Because I need it for some things that I am doing, I have worked quite 
> > a bit on the GFF3 parser Bio::FeatureIO::gff. Several people have 
> > written this module; I have just made some cosmetic changes:
> > 
> > I have improved the validation processes that are applied as a gff3 
> > file is parsed, and the module should now validate essentially 
> > everything in the file except alignments. Validation is optional and is 
> > based on the specification described at:
> > http://song.sourceforge.net/gff3.shtml
> > 
> > SC> Excellent--Did you happen to relax the requirement that ID be unique
> > SC> for each line of the GFF?  Allen and I put that in due to a misreading
> > SC> of the spec.  The ID has to be unique for a *feature*, which can be
> > SC> spread across several lines.
> 
> I'm not sure if this is taken care of in the code... actually, I'm a bit 
> foggy on exactly what the problem is.
> 
> > For clarification and edification I have created a couple of tables
> > describing the module and the validation that is applied to GFF3 files,
> > which you can see online: http://www.salmonella.org/bioperl/gff3.html
> > 
> > SC> Very nice and well done--do you happen to have a pod-ified version
> > SC> of this page?  It would be nice to include in the pod for
> > SC> Bio::FeatureIO::gff.
> 
> That's nice, I'd like to see it folded into the gff.pm perldoc as well.
> 
> > I also wrote a Bio::SeqIO::gff module. Since gff3 files can hold 
> > sequences, it seems that you'd want to be able to call the next_seq 
> > methods, and therefore SeqIO is more appropriate than FeatureIO for 
> > those aspects. Currently the SeqIO module uses the FeatureIO module for 
> > parsing the file, it just reorganizes things.
> > 
> > This provides two different interfaces for getting objects out of GFF3 
> > files:
> > 	Bio::FeatureIO::gff will return Bio::SeqFeature::Annotated objects 
> > representing the annotations.
> > 	Bio::SeqIO::gff will return Bio::Seq objects representing the 
> > sequences with all the annotations attached.
> > 
> > The other difference between the two is that the former passes out the 
> > objects as they are read, but the latter has to read the whole file to 
> > get the annotations and the sequences.
> > 
> > SC> I thought about doing something similar with SeqIO, but I am worried 
> > SC> about the case where somebody tries to use SeqIO on a well 
> > SC> annotated human Chr1 GFF3 file (if one were ever to exist :-) ,
> > SC> but I suppose the same machine killing thing could be done if
> > SC> someone tried to use SeqIO on a genbank file of Chr1.
> 
> See my previous email, I don't think we need the SeqIO module.
> 
> > So far I have focused on reading GFF3 files.
> > 
> > I have not committed these to cvs yet, pending comments from others. I 
> > have some specific questions:
> > 	Should I wait until after 1.5 is out?
> > 
> > SC> I don't have the definitive answer, but I would say it doesn't
> > SC> matter much, as long as it passes tests.  Bio::FeatureIO::gff is
> > SC> hardly a fully functional module as it is, so if we could 
> > SC> squeeze a little more functionality into it before we
> > SC> release it, that would be fine with me.
> 
> well it's in now.  and it passes tests.  there weren't any before, but i 
> wrote some.  look in t/FeatureIO.t
> 
> > 	Is two separate modules really the right way to go about this?
> > 
> > SC> As long as it works for this case, I don't mind:  calling
> > SC> 'next_feature' on a FeatureIO object until I run out of features
> > SC> and then calling 'next_sequence' (and get a Bio::PrimarySeq) on
> > SC> the same FeatureIO object until I run out of sequences.
> > 
> > 	What about other GFF modules (like Bio::Tools::GFF)?
> > 
> > SC> I am willing to let Bio::Tools::GFF die a terrible death.  While
> > SC> it will have to be kept around for apps that depend on it, I don't
> > SC> see adding any major functionality as time well spent.
> > 
> > 	Could someone give the modules a workout and let me know about bugs? I 
> > am sure there are many.
> > 
> > SC> I will try to soon, but it won't be until next week at 
> > SC> the earliest.
> > 
> > I have posted these modules online via anonymous ftp at 
> > ftp://ftp.salmonella.org/rob/bioperl/GFF_modules.tgz
> > Take a look and let me know what you do and don't like!
> > 
> > Rob
> > 
> > 
> > ----------------------------------------------------------------------
> > Scott Cain, Ph. D.				 	 cain@cshl.org
> > GMOD Coordinator, http://www.gmod.org/			 (216)392-3087
> > ----------------------------------------------------------------------
> > 
> > 
> > 
> > 
> 
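
Scott's point is that a GFF3 ID must be unique per feature, not per line, because one feature (for example a discontinuous CDS) can span several lines that repeat the same ID. A minimal sketch of validation along those lines follows, using a hypothetical `group_by_id` helper and made-up rows. Requiring lines that share an ID to agree on seqid and type is an extra assumption for illustration, not a rule quoted from the spec or taken from Bio::FeatureIO::gff.

```perl
use strict;
use warnings;

# Hypothetical helper: group GFF3 feature lines by their ID attribute.
# Repeating an ID is allowed, since one feature may span several lines.
# Extra illustrative check (an assumption, not quoted from the spec):
# lines sharing an ID must agree on seqid (column 1) and type (column 3).
sub group_by_id {
    my %by_id;
    for my $line (@_) {
        my $l = $line;
        chomp $l;
        next if $l =~ /^#/ || $l !~ /\S/;     # skip comments/directives and blanks
        my @col = split /\t/, $l;
        die "expected 9 tab-separated columns\n" unless @col == 9;
        my ($id) = $col[8] =~ /(?:^|;)ID=([^;]+)/;
        next unless defined $id;              # features without an ID are not grouped
        if (exists $by_id{$id}) {
            my $first = $by_id{$id}[0];
            die "ID $id reused with a different seqid or type\n"
                if $first->[0] ne $col[0] || $first->[2] ne $col[2];
        }
        push @{ $by_id{$id} }, \@col;
    }
    return \%by_id;
}

# Made-up example: one CDS split over two lines, plus a separate exon.
my @gff = (
    join("\t", qw(chr1 src CDS  100 200 . + 0), 'ID=cds1;Parent=mrna1'),
    join("\t", qw(chr1 src CDS  300 400 . + 2), 'ID=cds1;Parent=mrna1'),
    join("\t", qw(chr1 src exon 100 200 . + .), 'ID=exon1;Parent=mrna1'),
);
my $groups = group_by_id(@gff);
printf "%s spans %d line(s)\n", $_, scalar @{ $groups->{$_} } for sort keys %$groups;
```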

From allenday at ucla.edu  Wed Jan 19 17:28:40 2005
From: allenday at ucla.edu (Allen Day)
Date: Wed Jan 19 17:27:08 2005
Subject: [Bioperl-l] Automatic retrieve pdf file from the publisher
	website.
In-Reply-To: <1106171584.3667.16.camel@localhost.localdomain>
References: <BE45276F7BBEFE49B80193E22C5B1FED01B05FC0@iu-mssg-mbx04.exchange.iu.edu>
	<2083AF7C-6A44-11D9-B052-000D933565E8@mail.nih.gov>
	<Pine.OSX.4.58.0501191224250.15979@skerryvore.dhcp.lbl.gov>
	<1280.198.82.192.209.1106169362.squirrel@webmail.vbi.vt.edu>
	<1106171584.3667.16.camel@localhost.localdomain>
Message-ID: <Pine.LNX.4.58.0501191425250.11486@sumo.ctrl.ucla.edu>

please post the code here.  i've been meaning to add that functionality
into Bio::DB::Biblio::eutils.

do you have a list of which publishers are usable in this way?

-allen

On Wed, 19 Jan 2005, Peter Robinson wrote:

> That sounds extremely interesting and I would appreciate getting a copy
> for testing.
> -peter
> 
> 
> On Wed, 2005-01-19 at 22:16, yguo@vbi.vt.edu wrote:
> > Hi,
> > 
> > While working on the BRC project at VBI, I wrote a Perl module for retrieving
> > full-text PDF files from publisher websites using the information on the
> > PubMed abstract page. If anyone wants to use it, please contact me.
> > I will see if it is worthwhile to put it in Bioperl.
> > 
> > Yongjian
> > at
> > Virginia Bioinformatics Institute.
> > 
> > 
> 
From allenday at ucla.edu  Wed Jan 19 17:31:26 2005
From: allenday at ucla.edu (Allen Day)
Date: Wed Jan 19 17:28:07 2005
Subject: [Bioperl-l] Re: GFF3
In-Reply-To: <Pine.GSO.4.05.10501191655300.8235-100000@phage.cshl.edu>
References: <Pine.GSO.4.05.10501191655300.8235-100000@phage.cshl.edu>
Message-ID: <Pine.LNX.4.58.0501191429590.11486@sumo.ctrl.ucla.edu>

okay, let me know.  we should probably add some validation tests as well, 
right now i'm just making sure the lines can be processed but don't do any 
typechecking on the document.

Rob, would you mind writing some tests into FeatureIO.t for your 
validation code?

-allen


On Wed, 19 Jan 2005, Scott Cain wrote:

> I just did a cvs update and the last few tests are failing on MacOSX 10.3.
> I'll try to sort it out over the next couple of days.
> 
> Scott
> 
> ----------------------------------------------------------------------
> Scott Cain, Ph. D.				 	 cain@cshl.org
> GMOD Coordinator, http://www.gmod.org/			 (216)392-3087
> ----------------------------------------------------------------------
> 
From cain at cshl.edu  Wed Jan 19 18:30:35 2005
From: cain at cshl.edu (Scott Cain)
Date: Wed Jan 19 18:26:58 2005
Subject: [Bioperl-l] Re: GFF3
In-Reply-To: <Pine.LNX.4.58.0501191429590.11486@sumo.ctrl.ucla.edu>
Message-ID: <Pine.GSO.4.05.10501191828040.14290-100000@phage.cshl.edu>

Weird: when I run the FeatureIO test on the command line (via `perl
t/FeatureIO.t`), all tests pass.  When I run it as part of 'make test',
tests 20-22 fail.  Does anyone know why that sort of thing might happen?

thanks,
Scott

----------------------------------------------------------------------
Scott Cain, Ph. D.				 	 cain@cshl.org
GMOD Coordinator, http://www.gmod.org/			 (216)392-3087
----------------------------------------------------------------------


On Wed, 19 Jan 2005, Allen Day wrote:

> okay, let me know.  we should probably add some validation tests as well, 
> right now i'm just making sure the lines can be processed but don't do any 
> typechecking on the document.
> 
> Rob, would you mind writing some tests into FeatureIO.t for your 
> validation code?
> 
> -allen
> 
> 
> On Wed, 19 Jan 2005, Scott Cain wrote:
> 
> > I just did a cvs update and the last few tests are failing on MacOSX 10.3.
> > I'll try to sort it out over the next couple of days.
> > 
> > Scott
> > 
> > ----------------------------------------------------------------------
> > Scott Cain, Ph. D.				 	 cain@cshl.org
> > GMOD Coordinator, http://www.gmod.org/			 (216)392-3087
> > ----------------------------------------------------------------------
> 

From jason.stajich at duke.edu  Wed Jan 19 18:41:00 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Wed Jan 19 18:37:12 2005
Subject: [Bioperl-l] Re: GFF3
In-Reply-To: <Pine.GSO.4.05.10501191828040.14290-100000@phage.cshl.edu>
References: <Pine.GSO.4.05.10501191828040.14290-100000@phage.cshl.edu>
Message-ID: <92F4C972-6A73-11D9-B5B1-000393C44276@duke.edu>

test count was off.
1..19
[SNIP]
ok 20
ok 21
ok 22

You can also try
% make test_FeatureIO
to run just a specific test within the test framework to look at things.

Fixed.
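
The symptom Scott describes (clean when run by hand, failing under 'make test') fits Jason's explanation: the script printed a plan of 1..19 but went on to emit ok 20 through ok 22, and the test harness checks the reported test numbers against the plan. A minimal sketch of that kind of check on plain TAP text follows; `check_tap` is a hypothetical helper, not Test::Harness's implementation, and the sample output is made up.

```perl
use strict;
use warnings;

# Hypothetical helper: compare a TAP stream's "1..N" plan line with the
# test numbers it actually reports, and collect any failed tests.
sub check_tap {
    my ($tap) = @_;
    my ($planned) = $tap =~ /^1\.\.(\d+)\s*$/m;
    die "no plan line\n" unless defined $planned;
    my @seen = $tap =~ /^(?:not )?ok (\d+)/mg;
    my @problems;
    push @problems, "planned $planned tests but ran " . scalar(@seen)
        if @seen != $planned;
    push @problems, "test $_ is outside the plan" for grep { $_ > $planned } @seen;
    push @problems, "test $_ failed" for $tap =~ /^not ok (\d+)/mg;
    return @problems;
}

# A plan of 1..19 followed by 22 passing tests, as in the FeatureIO.t case.
my $tap = join "\n", '1..19', map { "ok $_" } 1 .. 22;
print "$_\n" for check_tap($tap);
```

Every individual test passes here, yet the run is still reported as broken, which is why updating the declared test count is enough to fix it.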
On Jan 19, 2005, at 6:30 PM, Scott Cain wrote:

> Weird: when I run the FeatureIO test on the command line (via `perl
> t/FeatureIO.t`), all tests pass.  When I run it as part of 'make test',
> tests 20-22 fail.  Does anyone know why that sort of thing might happen?
>
> thanks,
> Scott
>
> ----------------------------------------------------------------------
> Scott Cain, Ph. D.				 	 cain@cshl.org
> GMOD Coordinator, http://www.gmod.org/			 (216)392-3087
> ----------------------------------------------------------------------
>
>
> On Wed, 19 Jan 2005, Allen Day wrote:
>
>> okay, let me know.  we should probably add some validation tests as well,
>> right now i'm just making sure the lines can be processed but don't do any
>> typechecking on the document.
>>
>> Rob, would you mind writing some tests into FeatureIO.t for your
>> validation code?
>>
>> -allen
>>
>>
>> On Wed, 19 Jan 2005, Scott Cain wrote:
>>
>>> I just did a cvs update and the last few tests are failing on MacOSX 10.3.
>>> I'll try to sort it out over the next couple of days.
>>>
>>> Scott
>>>
>>> ----------------------------------------------------------------------
>>> Scott Cain, Ph. D.				 	 cain@cshl.org
>>> GMOD Coordinator, http://www.gmod.org/			 (216)392-3087
>>> ----------------------------------------------------------------------
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
>
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

From allenday at ucla.edu  Wed Jan 19 20:14:25 2005
From: allenday at ucla.edu (Allen Day)
Date: Wed Jan 19 20:10:58 2005
Subject: [Bioperl-l] Re: GFF3
In-Reply-To: <92F4C972-6A73-11D9-B5B1-000393C44276@duke.edu>
References: <Pine.GSO.4.05.10501191828040.14290-100000@phage.cshl.edu>
	<92F4C972-6A73-11D9-B5B1-000393C44276@duke.edu>
Message-ID: <Pine.LNX.4.58.0501191714200.11486@sumo.ctrl.ucla.edu>

doh!  my bad.
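
For anyone puzzled by the same symptom: when a test script is run by hand it is easy to miss that it printed more `ok` lines than its `1..N` plan announced, while the harness behind `make test` checks the two against each other and flags the surplus tests. The Perl below is a minimal, hypothetical sketch of that comparison on raw TAP text (`check_plan` is an illustrative name, not part of Test::Harness), using the numbers from Jason's reply.

```perl
use strict;
use warnings;

# Hypothetical helper: compare the "1..N" plan line of TAP output with the
# test numbers that were actually reported.
sub check_plan {
    my ($tap) = @_;
    my ($planned) = $tap =~ /^1\.\.(\d+)/m;
    die "no plan line found\n" unless defined $planned;

    # Collect the test numbers from "ok N" and "not ok N" lines
    my @ran   = $tap =~ /^(?:not )?ok (\d+)/mg;
    my @extra = grep { $_ > $planned } @ran;

    return { planned => $planned, ran => scalar(@ran), extra => \@extra };
}

# Simulate the situation above: a plan of 19 but 22 "ok" lines
my @lines = ('1..19');
push @lines, "ok $_" for 1 .. 22;
my $tap = join("\n", @lines) . "\n";

my $r = check_plan($tap);
printf "planned %d, ran %d, beyond plan: %s\n",
    $r->{planned}, $r->{ran}, join(',', @{ $r->{extra} });
```

A real harness does more than this (for example it also notices missing or out-of-order test numbers); the sketch only shows why tests 20-22 can be reported as failures even though each one printed `ok`.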

On Wed, 19 Jan 2005, Jason Stajich wrote:

> test count was off.
> 1..19
> [SNIP]
> ok 20
> ok 21
> ok 22
> 
> You can also try
> % make test_FeatureIO
> to run just a specific test within the test framework to look at things.
> 
> Fixed.
> On Jan 19, 2005, at 6:30 PM, Scott Cain wrote:
> 
> > Weird: when I run the FeatureIO test on the command line (via `perl
> > t/FeatureIO.t`), all tests pass.  When I run it as part of 'make test',
> > tests 20-22 fail.  Does anyone know why that sort of thing might happen?
> >
> > thanks,
> > Scott
> >
> > ----------------------------------------------------------------------
> > Scott Cain, Ph. D.				 	 cain@cshl.org
> > GMOD Coordinator, http://www.gmod.org/			 (216)392-3087
> > ----------------------------------------------------------------------
> >
> >
> > On Wed, 19 Jan 2005, Allen Day wrote:
> >
> >> okay, let me know.  we should probably add some validation tests as well;
> >> right now i'm just making sure the lines can be processed but don't do
> >> any typechecking on the document.
> >>
> >> Rob, would you mind writing some tests into FeatureIO.t for your
> >> validation code?
> >>
> >> -allen
> >>
> >>
> >> On Wed, 19 Jan 2005, Scott Cain wrote:
> >>
> >>> I just did a cvs update and the last few tests are failing on MacOSX 10.3.
> >>> I'll try to sort it out over the next couple of days.
> >>>
> >>> Scott
> >>>
> >>> ----------------------------------------------------------------------
> >>> Scott Cain, Ph. D.				 	 cain@cshl.org
> >>> GMOD Coordinator, http://www.gmod.org/			 (216)392-3087
> >>> ----------------------------------------------------------------------
> >>>
> >>>
> >>> On Tue, 18 Jan 2005, Allen Day wrote:
> >>>
> >>>>> The first series of errors die because the feature ID=AB000114 in
> >>>>> t/data/knownGene.gff3 has several Dbxrefs separated with ';' instead of ','
> >>>>
> >>>> i'm not getting these errors, are you in sync with cvs HEAD?
> >>>>
> >>>>> The second failure is because hybrid1.gff3 isn't in cvs
> >>>>
> >>>> gff files are in cvs now.
> >>>>
> >>>>>
> >>>>> Rob
> >>>>>
> >>>>>
> >>>>>
> >>>>> % perl -I. -w t/FeatureIO.t
> >>>>> 1..19
> >>>>> ok 1
> >>>>> ok 2
> >>>>> ok 3
> >>>>> ok 4
> >>>>> ok 5
> >>>>> ok 6
> >>>>> Use of uninitialized value in substitution (s///) at Bio/FeatureIO/gff.pm line 590, <GEN5> line 10.
> >>>>> Use of uninitialized value in substitution (s///) at Bio/FeatureIO/gff.pm line 591, <GEN5> line 10.
> >>>>> Use of uninitialized value in split at Bio/FeatureIO/gff.pm line 593, <GEN5> line 10.
> >>>>> Use of uninitialized value in substitution (s///) at Bio/FeatureIO/gff.pm line 590, <GEN5> line 10.
> >>>>> Use of uninitialized value in substitution (s///) at Bio/FeatureIO/gff.pm line 591, <GEN5> line 10.
> >>>>> Use of uninitialized value in split at Bio/FeatureIO/gff.pm line 593, <GEN5> line 10.
> >>>>> Use of uninitialized value in substitution (s///) at Bio/FeatureIO/gff.pm line 590, <GEN5> line 10.
> >>>>> Use of uninitialized value in substitution (s///) at Bio/FeatureIO/gff.pm line 591, <GEN5> line 10.
> >>>>> Use of uninitialized value in split at Bio/FeatureIO/gff.pm line 593, <GEN5> line 10.
> >>>>> Use of uninitialized value in substitution (s///) at Bio/FeatureIO/gff.pm line 590, <GEN5> line 10.
> >>>>> Use of uninitialized value in substitution (s///) at Bio/FeatureIO/gff.pm line 591, <GEN5> line 10.
> >>>>> Use of uninitialized value in split at Bio/FeatureIO/gff.pm line 593, <GEN5> line 10.
> >>>>> Use of uninitialized value in substitution (s///) at Bio/FeatureIO/gff.pm line 590, <GEN5> line 10.
> >>>>> Use of uninitialized value in substitution (s///) at Bio/FeatureIO/gff.pm line 591, <GEN5> line 10.
> >>>>> Use of uninitialized value in split at Bio/FeatureIO/gff.pm line 593, <GEN5> line 10.
> >>>>> Use of uninitialized value in substitution (s///) at Bio/FeatureIO/gff.pm line 590, <GEN5> line 10.
> >>>>> Use of uninitialized value in substitution (s///) at Bio/FeatureIO/gff.pm line 591, <GEN5> line 10.
> >>>>> Use of uninitialized value in split at Bio/FeatureIO/gff.pm line 593, <GEN5> line 10.
> >>>>> ok 7
> >>>>> ok 8
> >>>>>
> >>>>> ------------- EXCEPTION  -------------
> >>>>> MSG: Could not open t/data/hybrid1.gff3: No such file or directory
> >>>>> STACK Bio::Root::IO::_initialize_io Bio/Root/IO.pm:314
> >>>>> STACK Bio::FeatureIO::_initialize Bio/FeatureIO.pm:345
> >>>>> STACK Bio::FeatureIO::gff::_initialize Bio/FeatureIO/gff.pm:92
> >>>>> STACK Bio::FeatureIO::new Bio/FeatureIO.pm:268
> >>>>> STACK Bio::FeatureIO::new Bio/FeatureIO.pm:288
> >>>>> STACK toplevel t/FeatureIO.t:83
> >>>>>
> >>>>> --------------------------------------
> >>>>>
> >>>>
> >>>
> >>
> >
> >
> >
> --
> Jason Stajich
> jason.stajich at duke.edu
> http://www.duke.edu/~jes12/
> 
From allenday at ucla.edu  Wed Jan 19 20:20:12 2005
From: allenday at ucla.edu (Allen Day)
Date: Wed Jan 19 20:16:17 2005
Subject: [Bioperl-l] Finding Alignment overlaps
In-Reply-To: <81da19f3050113225018d1c01a@mail.gmail.com>
References: <81da19f3050113225018d1c01a@mail.gmail.com>
Message-ID: <Pine.LNX.4.58.0501191717040.11486@sumo.ctrl.ucla.edu>

yeah, look in Bio::Range.  specifically:

Geometrical methods
       These methods do things to the geometry of ranges, and return triplets
       (start, end, strand) from which new ranges could be built.

       intersection

         Title    : intersection
         Usage    : ($start, $stop, $strand) = $r1->intersection($r2)
         Function : gives the range that is contained by both ranges
         Args     : a range to compare this one to
         Returns  : nothing if they do not overlap, or the range that they do overlap
         Inherited: Bio::RangeI::intersection


       union

         Title    : union
         Usage    : ($start, $stop, $strand) = $r1->union($r2);
                  : ($start, $stop, $strand) = Bio::Range->union(@ranges);
         Function : finds the minimal range that contains all of the ranges
         Args     : a range or list of ranges
         Returns  : the range containing all of the ranges
         Inherited: Bio::RangeI::union
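
Since the coordinates in the question below are plain integer ranges, the overlap test behind `intersection` and the merging Zayed asks about can also be sketched without BioPerl. This is a hypothetical, strand-unaware illustration that only looks at the query coordinates; `parse_hit`, `intersection` and `merge_ranges` are illustrative names, not Bio::Range methods. With real objects you would call `intersection`/`union` on Bio::Range instances as documented above.

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Parse one "<qbegin>,<qend>-<tbegin>,<tend>" line into query/target ranges.
sub parse_hit {
    my ($line) = @_;
    my ($qb, $qe, $tb, $te) = $line =~ /^\s*(\d+),(\d+)-(\d+),(\d+)\s*$/
        or die "unparseable line: $line\n";
    return { query => [$qb, $qe], target => [$tb, $te] };
}

# Overlap of two [start, end] ranges, or an empty list if they do not overlap.
sub intersection {
    my ($r1, $r2) = @_;
    my $start = max($r1->[0], $r2->[0]);
    my $end   = min($r1->[1], $r2->[1]);
    return $start <= $end ? ($start, $end) : ();
}

# Merge every group of overlapping ranges into one covering range.
sub merge_ranges {
    my @sorted = sort { $a->[0] <=> $b->[0] } @_;
    my @merged;
    for my $r (@sorted) {
        if (@merged && $r->[0] <= $merged[-1][1]) {
            # Overlaps the previous merged range: extend it if needed
            $merged[-1][1] = max($merged[-1][1], $r->[1]);
        }
        else {
            push @merged, [@$r];    # copy so the input is not modified
        }
    }
    return @merged;
}

my @hits = map { parse_hit($_) } (
    '3665384,3665702-1770163,1770480',
    '3665130,3665474-3695657,3696000',
    '3665115,3665357-1770508,1770749',
);

my @q = map { $_->{query} } @hits;
print "overlap of hits 1 and 2: ", join('-', intersection($q[0], $q[1])), "\n";
print "merged query ranges: ",
    join(', ', map { "$_->[0]-$_->[1]" } merge_ranges(@q)), "\n";
```

For these three lines the query ranges chain together, so they collapse into a single range, 3665115-3665702. Merging the target side would need the same treatment, and probably a check that the targets lie on the same sequence.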

-allen


On Fri, 14 Jan 2005, zayed albertyn wrote:

> Dear Bioperl Community
> 
> I have output from an alignment program that produces coordinates with
> reference to the query sequence e.g.
> 
> 3665384,3665702-1770163,1770480
> 3665130,3665474-3695657,3696000
> 3665115,3665357-1770508,1770749
> 
> Each line represents <querybegin>,<queryend>-<targetbegin>,<targetend>
>
> I know how to add each line as a sequence feature using
> Bio::SeqFeature::Generic. Is there a bioperl class or associated
> method that can be used for determining possible overlaps in these
> alignments?
> Eventually I would like to find all overlaps and merge them if possible.
> 
> Thanks for the help,
> Zayed
> 
From yguo at vbi.vt.edu  Wed Jan 19 21:12:47 2005
From: yguo at vbi.vt.edu (yguo@vbi.vt.edu)
Date: Wed Jan 19 21:10:29 2005
Subject: [Bioperl-l] Automatic retrieve pdf file from the 
	publisherwebsite.
Message-ID: <1109.151.199.12.38.1106187167.squirrel@webmail.vbi.vt.edu>

Ok, I will first add more comments and instructions to the module and post
the code to this list. I can get this done before the weekend.

I do not have a list of publishers for which retrieval succeeds, but it is a
good idea to make one.


Yongjian
at
Virginia Bioinformatics Institute

> please post the code here.  i've been meaning to add that functionality
> into Bio::DB::Biblio::eutils.
>
> do you have a list of which publishers are usable in this way?
>
> -allen
>
> On Wed, 19 Jan 2005, Peter Robinson wrote:
>
>> That sounds extremely interesting and I would appreciate getting a copy
>> for testing.
>> -peter
>>
>>
>> On Wed, 2005-01-19 at 22:16, yguo@vbi.vt.edu wrote:
>> > Hi,
>> >
>> > While working on the BRC project at VBI, I wrote a perl module for
>> > retrieving full text pdf files from the publisher website using the
>> > information on the pubmed abstract page. If anyone wants to use it,
>> > you can contact me. I will see if it is worthwhile to put it in Bioperl.
>> >
>> > Yongjian
>> > at
>> > Virginia Bioinformatics Institute.
>> >
>> >
>>
>

From sdavis2 at mail.nih.gov  Wed Jan 19 22:06:28 2005
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Wed Jan 19 22:02:56 2005
Subject: [Bioperl-l] Automatic retrieve pdf file from the publisherwebsite.
References: <1109.151.199.12.38.1106187167.squirrel@webmail.vbi.vt.edu>
Message-ID: <001c01c4fe9d$08c64a70$7d75f345@WATSON>

----- Original Message ----- 
From: <yguo@vbi.vt.edu>
To: <bioperl-l@portal.open-bio.org>
Sent: Wednesday, January 19, 2005 9:12 PM
Subject: Re: [Bioperl-l] Automatic retrieve pdf file from the 
publisherwebsite.


> Ok, I will first add more comments and instructions to the module and post
> the code to this list. I can make this done before the weekend.
>
> I donot have the publisher list for successful retrieval. But it is a good
> idea to make one.

Wouldn't such a publisher list depend somewhat on institutional 
subscriptions--just curious?

Sean

> Yongjian
> at
> Virginia Bioinformatics Institute


From allenday at ucla.edu  Thu Jan 20 00:23:46 2005
From: allenday at ucla.edu (Allen Day)
Date: Thu Jan 20 00:20:07 2005
Subject: [Bioperl-l] Automatic retrieve pdf file from the publisherwebsite.
In-Reply-To: <001c01c4fe9d$08c64a70$7d75f345@WATSON>
References: <1109.151.199.12.38.1106187167.squirrel@webmail.vbi.vt.edu>
	<001c01c4fe9d$08c64a70$7d75f345@WATSON>
Message-ID: <Pine.LNX.4.58.0501192119520.17383@sumo.ctrl.ucla.edu>

On Wed, 19 Jan 2005, Sean Davis wrote:

> ----- Original Message ----- 
> From: <yguo@vbi.vt.edu>
> To: <bioperl-l@portal.open-bio.org>
> Sent: Wednesday, January 19, 2005 9:12 PM
> Subject: Re: [Bioperl-l] Automatic retrieve pdf file from the 
> publisherwebsite.
> 
> 
> > Ok, I will first add more comments and instructions to the module and post
> > the code to this list. I can make this done before the weekend.
> >
> > I donot have the publisher list for successful retrieval. But it is a good
> > idea to make one.
> 
> Wouldn't such a publisher list depend somewhat on institutional 
> subscriptions--just curious?

sure, there are two separate issues:

#1 is the ip/gateway/proxy allowed to access the host with the resource.
#2 if so, is the module able to find the resource on the host.

i was asking about #2.  of course #1 depends on where you are.  this could
make it difficult to do extensive unit tests.

-allen

> Sean
> 
> > Yongjian
> > at
> > Virginia Bioinformatics Institute
> 
> 
> 
From dlondon at ebi.ac.uk  Thu Jan 20 12:58:59 2005
From: dlondon at ebi.ac.uk (Darin London)
Date: Thu Jan 20 13:11:56 2005
Subject: [Bioperl-l] BOSC 2005
Message-ID: <20050120175859.GA7254@parrot.ebi.ac.uk>

 {Please pass the word!}
 
 MEETING ANNOUNCEMENT & CALL FOR SPEAKERS

 The 6th annual Bioinformatics Open Source Conference (BOSC 2005) is organized by the
 not-for-profit Open Bioinformatics Foundation. The meeting will take place
 June 23-24, 2005 in Detroit, Michigan, USA, and is one of several Special Interest
 Group (SIG) meetings occurring in conjunction with the 13th International Conference
 on Intelligent Systems for Molecular Biology.

 see http://www.iscb.org/ismb2005 for more information.

 Because of the power of many Open Source bioinformatics packages in
 use by the Research Community today, it is not too presumptuous to say 
 that the work of the Open Source Bioinformatics Community represents 
 the cutting edge of Bioinformatics in general. This has been repeatedly 
 demonstrated by the quality of presentations at previous BOSC conferences.
 This year, at BOSC 2005, we want to continue this tradition of excellence,
 while presenting this message to a wider part of the Research Community.  
 Please, pass this message on to anyone you know that is interested in
 Bioinformatics software. 


 BOSC PROGRAM & CONTACT INFO
 
 * Web: http://www.open-bio.org/bosc2005/
 * Email: bosc@open-bio.org
 
 FEES

  TO BE ANNOUNCED. Watch the BOSC website for more information.
 
 
 SPEAKERS & ABSTRACTS WANTED
 
 The program committee is currently seeking abstracts for talks at BOSC 
 2005. BOSC is a great opportunity for you to tell the community about 
 your use, development, or philosophy of open source software development 
 in bioinformatics. The committee will select several submitted abstracts 
 for 25-minute talks and others for shorter "lightning" talks. Accepted 
 abstracts will be published on the BOSC web site.
 
 If you are interested in speaking at BOSC 2005, 
 please send us before April 26, 2005:
 
 * an abstract (no more than a few paragraphs)
 * a URL for the project page, if applicable
 * information about the open source license used for your software or 
   your release plans.

 Abstracts will be accepted for submission until April 26, 2005.
 Abstracts chosen for presentation will be announced May 12, 2005 
 (before the ISMB Early Registration Deadline).

 LIGHTNING-TALK SPEAKERS WANTED!
 
 The program committee is currently seeking speakers for the lightning 
 talks at BOSC 2005. Lightning talks are quick - only five minutes 
 long - and a great opportunity for you to give people a quick 
 summary of your open source project, code, idea, or vision of the future.

 If you are interested in giving a lightning talk at BOSC 2005, 
 please send us:

 * a brief title and summary (one or two lines)
 * a URL for the project page, if applicable
 * information about the open source license used for your software or 
   your release plans.

 We will accept entries on-line until BOSC starts, but
 space for demos and lightning talks is limited.
    
 SOFTWARE DEMONSTRATIONS WANTED!
 If you are involved in the development of Open Source Bioinformatics Software, 
 you are invited to provide a short demonstration to attendees of BOSC 2005.

 If you are interested in giving a software demonstration at BOSC 2005,
 please send us:

 * a brief title and summary (one or two lines)
 * a URL for the project page, if applicable
 * Internet connectivity requirements (e.g. a web application served on the
   World Wide Web, or a web-based client application).

 We will accept entries on-line until BOSC starts, but
 space for demos and lightning talks is limited.

** Because the mission of the OBF is to promote Open Source software, we will favor submissions for
   projects that apply a recognized Open Source License, or adhere to the general Open Source Philosophy.
   See the following websites for further details:
   http://www.opensource.org/licenses/
   http://www.opensource.org/docs/definition.php


  SESSION CHAIRS WANTED
  If you would like to be involved in BOSC 2005, we invite you to chair a session.  This will
  not require much of your time.  You will be given a schedule of presenters during your session. 
  You simply introduce each speaker, and manage the time of their presentation (25 minutes for full 
  presentations, 5-10 minutes for lightning talks/demos, depending on the number of entries).

  If you are interested in chairing a session, please send us your name and affiliation (if applicable).

-- 
cheers,

Darin London dlondon@ebi.ac.uk    European Bioinformatics Institute, 
+44 (0)1223 49 2566               Wellcome Trust Genome Campus, Hinxton 
+44 (0)1223 49 4468 (fax)         Cambridgeshire CB10 1SD, UK
From gyang at plantbio.uga.edu  Thu Jan 20 14:30:29 2005
From: gyang at plantbio.uga.edu (Guojun Yang)
Date: Thu Jan 20 14:27:12 2005
Subject: [Bioperl-l] standaloneblast large seq retrieving
In-Reply-To: <Pine.LNX.4.58.0501192119520.17383@sumo.ctrl.ucla.edu>
Message-ID: <20050120143029.2ce67836@dogwood.plantbio.uga.edu>

Hi all,
I am trying to use the following sub to retrieve sequences after a StandAloneBlast run. It works with a DB with short entries (~200 kb), but fails with a DB with much longer entries (up to ~30 Mb per entry). Can anybody give me a hint?

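use Bio::Index::Fasta;

sub getseq {
    my ($name, $file_name) = @_;
    # Open (or create) a writable index stored next to the FASTA file
    my $inx = Bio::Index::Fasta->new(-filename   => $file_name . ".idx",
                                     -write_flag => 1);
    # get_id() is my own ID parser (not shown)
    $inx->id_parser(\&get_id);
    $inx->make_index($file_name);
    # fetch() returns a Bio::Seq object for the given ID
    my $seq = $inx->fetch($name);
    return $seq;
}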
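
For very large entries it may help to recall what a FASTA index does in general: it records where each entry starts in the file, so that a fetch can seek straight to that entry instead of scanning the whole file. The Perl below is a plain sketch of that idea under simplifying assumptions (a single FASTA file, IDs taken as the first word after `>`). `build_offset_index` and `fetch_record` are hypothetical helpers, not Bio::Index::Fasta's implementation or API, and the sketch does not diagnose the failure described above.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: scan a FASTA file once and record the byte offset of
# each header line, keyed by the first word after '>'.
sub build_offset_index {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!\n";
    my %offset;
    while (1) {
        my $pos  = tell $fh;    # offset of the line about to be read
        my $line = <$fh>;
        last unless defined $line;
        $offset{$1} = $pos if $line =~ /^>(\S+)/;
    }
    close $fh;
    return \%offset;
}

# Seek straight to one record and read only that record's sequence.
sub fetch_record {
    my ($path, $index, $id) = @_;
    my $pos = $index->{$id};
    return unless defined $pos;
    open my $fh, '<', $path or die "Cannot open $path: $!\n";
    seek $fh, $pos, 0 or die "seek failed: $!\n";
    my $header = <$fh>;         # skip the header line itself
    my $seq = '';
    while (my $line = <$fh>) {
        last if $line =~ /^>/;  # start of the next record
        chomp $line;
        $seq .= $line;
    }
    close $fh;
    return $seq;
}

# Small throwaway FASTA file for illustration
my ($tmp, $fasta) = tempfile(UNLINK => 1);
print {$tmp} ">chr1 first\nACGT\nACGT\n>chr2\nTTTT\n";
close $tmp;

my $idx = build_offset_index($fasta);
print fetch_record($fasta, $idx, 'chr2'), "\n";
```

The index only has to be built once per FASTA file; after that, each fetch touches just the bytes of the requested entry. Note that the whole sequence string is still held in memory once fetched.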

Thanks,
Yang


From talcon at iastate.edu  Thu Jan 20 18:34:06 2005
From: talcon at iastate.edu (Tim Alcon)
Date: Thu Jan 20 18:30:55 2005
Subject: [Bioperl-l] accessing GenBank
In-Reply-To: <!~!UENERkVCMDkAAQACAAAAAAAAAAAAAAAAABgAAAAAAAAA+EsXQZcrCEGeBpZF7/IE7sKAAAAQAAAA/I1YTA6SREyZZhJZcK6V6QEAAAAA@ukonline.co.uk>
References: <!~!UENERkVCMDkAAQACAAAAAAAAAAAAAAAAABgAAAAAAAAA+EsXQZcrCEGeBpZF7/IE7sKAAAAQAAAA/I1YTA6SREyZZhJZcK6V6QEAAAAA@ukonline.co.uk>
Message-ID: <41F03FEE.6080907@iastate.edu>

Thanks Barry and Nathan.  I installed version 1.4, and the remote 
GenBank access now works.

Tim


Nathan Haigh wrote:

>You should double-check the versions you have installed on both systems; it may well be that one is out of date with respect to
>connecting to GenBank and the other is not. If you do indeed have a version of bioperl <1.4 installed on your Windows machine,
>follow my instructions to install 1.4 (1.5 should be available via PPM shortly after its official release, some time soon!)
>
>Nathan
>
>  
>
>>-----Original Message-----
>>From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Tim Alcon
>>Sent: 18 January 2005 22:20
>>To: bioperl-l@portal.open-bio.org
>>Subject: [Bioperl-l] accessing GenBank
>>
>>I seem unable to access GenBank.  When running bptutorial.exe, it seems
>>like all the other examples run fine except that one.  Anyone know why
>>that would be?  I'm using ActivePerl on Windows XP.  I have whichever
>>version of bioperl is the current default using ppm (it's at least
>>1.0).   When I run the exact same code from my campus Unix account, it
>>works fine.
>>
>>Tim
>>
>>---
>>avast! Antivirus: Inbound message clean.
>>Virus Database (VPS): 0503-0, 18/01/2005
>>Tested on: 19/01/2005 08:41:49
>>avast! is copyright (c) 2000-2003 ALWIL Software.
>>http://www.avast.com
>>
>>
>>    
>>
>
>
>
>
>
>
>
>  
>

From talcon at iastate.edu  Thu Jan 20 18:35:55 2005
From: talcon at iastate.edu (Tim Alcon)
Date: Thu Jan 20 18:33:53 2005
Subject: [Bioperl-l] Installing Bioperl using PPM
In-Reply-To: <!~!UENERkVCMDkAAQACAAAAAAAAAAAAAAAAABgAAAAAAAAA+EsXQZcrCEGeBpZF7/IE7sKAAAAQAAAAfnQTzm+INUebF8JQZiYh7wEAAAAA@ukonline.co.uk>
References: <!~!UENERkVCMDkAAQACAAAAAAAAAAAAAAAAABgAAAAAAAAA+EsXQZcrCEGeBpZF7/IE7sKAAAAQAAAAfnQTzm+INUebF8JQZiYh7wEAAAAA@ukonline.co.uk>
Message-ID: <41F0405B.2050301@iastate.edu>

Typing "install 1.4" didn't work, but typing "install Bioperl-1.4" did.  
Thanks.

Tim


Nathan Haigh wrote:

>Please read this even if you think you know how to install modules via PPM!
>
>This is just a note on what to do to install the latest version of Bioperl (or any other module) via PPM:
>Because of inconsistencies (see ActiveState's comment on this at the bottom) in the way PPM determines module names/versions etc.,
>it is NOT WISE to install modules by typing:
>    "install bioperl"
>OR
>    "upgrade bioperl"
>
>You are very likely NOT to install the most recent version of a particular module by doing this! Instead you should do the
>following:
>    "search bioperl"
>This gives a numbered list of the available modules in the repositories searched by your PPM (you can add repositories in
>addition to the defaults given during installation, and this is advised). Choose the number of the correct module to install from
>the list and do:
>    "install <number>"
>Where <number> is the number of the module you wish to install. This way you will ensure you install the correct module/version YOU
>want, not the arbitrary module that PPM seems to want to install most of the time!
>
>As soon as the official Bioperl 1.5 is released, I'll make the ppd and tar.gz files so it can be installed via PPM.
>
>Nathan
>
>ActiveState's comment on PPM's inconsistencies in determining module names/versions:
>"Sorry for the confusion, ppm3 is kind of inconsistent in spots."
>
>
>  
>
>>-----Original Message-----
>>From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Tim Alcon
>>Sent: 18 January 2005 22:20
>>To: bioperl-l@portal.open-bio.org
>>Subject: [Bioperl-l] accessing GenBank
>>
>>I seem unable to access GenBank.  When running bptutorial.exe, it seems
>>like all the other examples run fine except that one.  Anyone know why
>>that would be?  I'm using ActivePerl on Windows XP.  I have whichever
>>version of bioperl is the current default using ppm (it's at least
>>1.0).   When I run the exact same code from my campus Unix account, it
>>works fine.
>>
>>Tim
>>
>>
>>
>>    
>>
>
>
>
>
>
>
>
>
>  
>

From peter.robinson at charite.de  Fri Jan 21 08:30:14 2005
From: peter.robinson at charite.de (Robinson, Peter)
Date: Fri Jan 21 08:26:53 2005
Subject: [Bioperl-l] dssp script
Message-ID: <5F7CE35370B6CF429AA3CA960ECC278001638D65@EXCHANGE2.charite.de>

Dear BioPerlers,

I am writing a script to use the BioPerl DSSP module to print out a list of phi and psi angles for all applicable residues  of all chains. Although the results are correct, I get the following error message at the end of each chain:

Argument "" isn't numeric in numeric eq (==) at /usr/local/share/perl/5.8.4/Bio/Structure/SecStr/DSSP/Res.pm line 1168.

and I am not quite sure where it is coming from. Perhaps I am using the wrong part of the API, but I am trying to get a list of all residues for each chain as follows:

foreach my $ch (@chains) {
  my $ss_elements_pts = $dssp->secBounds($ch);
  print "Chain $ch:\n";
  my $pos = 0;
  my $max = 0;
  foreach my $stretch (@{$ss_elements_pts}) {
    my $start = $stretch->[0];
    my $end = $stretch->[1]; 
    if ($end =~ m/(\d+)/) { $end = $1; }
   
    if ($end  > $max) { $max = $end; }
  }
  ## END is now the last residue in this chain
  for my $res (1..$max) {
    my $residueID = $res . ":" . $ch;
    my ($phi,$psi,$SS,$SSsum,$AA);
    eval { $phi = $dssp->resPhi($residueID);};
	etc.

The full script is appended to the bottom of this mail.


I also noticed what might be a minor bug in the module DSSP/Res.pm: when I use dsspcmbi to analyze a PDB file, it produces a results file with an empty last line. This triggers the following warning:

Use of uninitialized value in chomp at /usr/local/share/perl/5.8.4/Bio/Structure/SecStr/DSSP/Res.pm line 1284, <GEN1> line 955.


 If I manually remove this empty last line, there is no error. Adding the following line at Res.pm line 1284 fixes the problem:


 while ( chomp( $cur = <$file> ) ) {
      next if ($cur =~ m/^\s*$/);   # <-- added line: skip blank lines
	$res_num = substr( $cur, 0, 5 );
	$res_num =~ s/\s//g;
	$self->{ 'Res' }->[ $res_num ] = &_parseResLine( $cur );
    }
}
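
A slightly more defensive form of the loop above tests `defined` on the line that was read instead of relying on the return value of `chomp`. `chomp` returns the number of characters it removed, so the original condition also ends the loop early on a final line that lacks a trailing newline, and at end of file it calls `chomp` on an undefined value. The sketch below applies that idea to a hypothetical stand-alone reader (`read_residue_lines` is an illustrative name, not a Bio::Structure::SecStr::DSSP::Res method). It is fed two made-up, truncated residue lines plus a blank line from an in-memory string.

```perl
use strict;
use warnings;

# Hypothetical reader for the per-residue section of a DSSP file.
# Testing defined() means the loop stops only at end of file, and blank
# lines (such as an empty last line) are skipped instead of being parsed.
sub read_residue_lines {
    my ($fh) = @_;
    my %res;
    while (defined(my $cur = <$fh>)) {
        chomp $cur;
        next if $cur =~ /^\s*$/;    # skip blank lines
        # Residue number is in the first five columns, as in Res.pm
        (my $res_num = substr($cur, 0, 5)) =~ s/\s//g;
        $res{$res_num} = $cur;
    }
    return \%res;
}

# Two made-up, truncated residue lines followed by a blank line
my $dssp_text = "    1    1 A M              0   0  200\n"
              . "    2    2 A K              0   0  120\n"
              . "\n";
open my $fh, '<', \$dssp_text or die "Cannot open in-memory file: $!\n";
my $res = read_residue_lines($fh);
close $fh;
print join(',', sort { $a <=> $b } keys %$res), "\n";
```

Whether this also removes the `Argument "" isn't numeric` warning at line 1168 is not something the sketch shows; it only addresses the end-of-file handling.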


Thanks in advance for any tips! Peter
Peter N. Robinson, M.D.
Institute of Medical Genetics
Charité University Hospital
Augustenburger Platz 1
13353 Berlin
Germany
++49-30-450 569124
peter.robinson@charite.de
http://www.charite.de/ch/medgen/robinson
Beware of bugs in the above code; I have only proved it correct, not tried it. -Donald Knuth, computer scientist (1938- )

########################

#!/usr/bin/perl -w
use strict;
use IO::File;
use Bio::Structure::SecStr::DSSP::Res;
use Data::Dumper;


=pod
parseDSSP.pl
Script to parse the output of DSSP using the BioPerl module
Bio::Structure::SecStr::DSSP::Res. To use it, process a PDB
file with dssp or dsspcmbi, and pass the resulting file to 
this script. For more information on dssp and BioPerl see the
module documentation at http://bioperl.org

@email peter.robinson@charite.de
21 January, 2005

=cut


my $file = "pdb43ca.dssp";
my $dssp = Bio::Structure::SecStr::DSSP::Res->new('-file' => $file);

my $pdbID = $dssp->pdbID();
my $auth  = $dssp->pdbAuthor();
my $cmpd = $dssp->pdbCompound();
my $pdb_date = $dssp->pdbDate();
my $header = $dssp->pdbHeader();
my $pdbSource = $dssp->pdbSource();

print "PDB entry $pdbID \n\tauthor:\t$auth",
  "\n\tCompound:\t$cmpd",
  "\n\tDate:\t$pdb_date",
  "\n\tHeader:\t$header",
  "\n\tsource:\t$pdbSource\n\n";

my $totalRes = $dssp->numResidues();
print "Total residue count (all chains):$totalRes\n";


my $surArea= $dssp->totSurfArea();
print "Total accessible surface area:\t$surArea  (square Ang)\n";


my $chainRef = $dssp->chains();
my @chains = sort  @{$chainRef};
print "Chain[s]:\n";
foreach my $ch (@chains) {
  print "\t$ch";
}
print "\n";

my $hb = $dssp->hBonds();
print "H BONDS.\n";
print "TYPE O(I)-->H-N(J): $hb->[0]\n",
   "IN PARALLEL BRIDGES: $hb->[1]\n",
   "IN ANTIPARALLEL BRIDGES $hb->[2]\n",
   "TYPE O(I)-->H-N(I-5) $hb->[3]\n",
   "TYPE O(I)-->H-N(I-4) $hb->[4]\n",
   "TYPE O(I)-->H-N(I-3) $hb->[5]\n",
   "TYPE O(I)-->H-N(I-2) $hb->[6]\n",
   "TYPE O(I)-->H-N(I-1) $hb->[7]\n",
   "TYPE O(I)-->H-N(I+0) $hb->[8]\n",
   "TYPE O(I)-->H-N(I+1) $hb->[9]\n",
   "TYPE O(I)-->H-N(I+2) $hb->[10]\n",
   "TYPE O(I)-->H-N(I+3) $hb->[11]\n",
   "TYPE O(I)-->H-N(I+4) $hb->[12]\n",
   "TYPE O(I)-->H-N(I+5) $hb->[13]\n",
  "\n";

   
foreach my $ch (@chains) {
  my $ss_elements_pts = $dssp->secBounds($ch);
  print "Chain $ch:\n";
  my $pos = 0;
  my $max = 0;
  foreach my $stretch (@{$ss_elements_pts}) {
    my $start = $stretch->[0];
    my $end = $stretch->[1]; 
    if ($end =~ m/(\d+)/) { $end = $1; }
   
    if ($end  > $max) { $max = $end; }
  }
  ## END is now the last residue in this chain
  for my $res (1..$max) {
    my $residueID = $res . ":" . $ch;
    my ($phi,$psi,$SS,$SSsum,$AA);
    eval { $phi = $dssp->resPhi($residueID);};
    eval { $psi = $dssp->resPsi($residueID);};
    eval { $SS = $dssp->resSecStr($residueID);};
    eval { $SSsum = $dssp->resSecStrSum($residueID);};
    $AA = $dssp->resAA($residueID);
    $phi = $phi || "n/a";
    $psi = $psi || "n/a";
    $SS = $SS || "-";
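    # $SSsum is undefined if resSecStrSum() died inside its eval above;
    # default it so the eq comparisons below do not warn
    $SSsum = "-" unless defined $SSsum;
    my $SSclass;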
    if ($SSsum eq "H") { $SSclass = "helix"; }
    elsif ($SSsum eq "T") { $SSclass = "turn"; }
    elsif ($SSsum eq "B") { $SSclass = "beta"; }
    else { $SSclass = $SSsum; }
    print "$residueID) [$AA] phi:$phi psi:$psi SecStruct: $SS ($SSclass) \n";
  }
}


From cjfields at uiuc.edu  Fri Jan 21 09:44:39 2005
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri Jan 21 09:43:02 2005
Subject: [Bioperl-l] Installing Bioperl using PPM
In-Reply-To: <41F0405B.2050301@iastate.edu>
References: <!~!UENERkVCMDkAAQACAAAAAAAAAAAAAAAAABgAAAAAAAAA+EsXQZcrCEGeBpZF7/IE7sKAAAAQAAAAfnQTzm+INUebF8JQZiYh7wEAAAAA@ukonline.co.uk>
	<41F0405B.2050301@iastate.edu>
Message-ID: <6.1.1.1.2.20050121084352.01a67ec8@express.cites.uiuc.edu>

I think he means that you should do the following:

1) use "search bioperl"
2) pick the number of the correct bioperl from the list (NOT the version 
number) and type "install #"

Here's what it looks like when I use PPM3:

C:\Documents and Settings\Chris Fields>ppm
PPM - Programmer's Package Manager version 3.1.
Copyright (c) 2001 ActiveState SRL. All Rights Reserved.

Entering interactive shell. Using Term::ReadLine::Stub as readline library.

Type 'help' to get started.

ppm> rep
Repositories:
[1] bioperl
[ ] ActiveState Package Repository
[ ] ActiveState PPM2 Repository
[ ] gmod
[ ] kobes
[ ] local
ppm> search bioperl
Searching in Active Repositories
   1. Bioperl-1.2     [1.2] Bioperl 1.2 PPM3 Archive
   2. Bioperl-1.2.1 [1.2.1] Bioperl 1.2.1 PPM3 Archive
   3. Bioperl-1.2.3 [1.2.3] Bioperl 1.2.3 PPM3 Archive
   4. Bioperl-1.4     [1.4] Bioperl 1.4 PPM3 Archive
ppm> install 4

<Should install bioperl 1.4 unless already installed>
....


Chris


At 05:35 PM 1/20/2005, Tim Alcon wrote:
>Typing "install 1.4" didn't work, but typing "install Bioperl-1.4" did.
>Thanks.
>
>Tim
>
>
>
>Nathan Haigh wrote:
>
>>Please read this even if you think you know how to install modules via PPM!
>>
>>This is just a note on what to do to install the latest version of 
>>Bioperl (or any other module) via PPM:
>>Because of inconsistencies (see ActiveStates comments on this at the 
>>bottom) with the way PPM determines modules names/versions etc
>>it is NOT WISE to install modules by going:
>>    "install bioperl"
>>OR
>>    "upgrade bioperl"
>>
>>You are very likely NOT to install the most recent version of a 
>>particular module by doing this! Instead you should do the
>>following:
>>    "search bioperl"
>>This gives a numbered list of the available modules in the repository's 
>>searched by your PPM (you can add additional repositories in
>>addition to the defaults given during installation - and this is 
>>advised). Chose the number of the correct module to install from
>>the list and do:
>>    "install <number>"
>>Where <number> is the number of the module you wish to install. This way 
>>you will ensure you install the correct module/version YOU
>>want not the arbitrary module that PPM seems to want to install most of 
>>the time!
>>
>>As soon as the official Bioperl 1.5 is released, I'll make the ppd and 
>>tar.gz files so it can be installed via PPM.
>>
>>Nathan
>>
>>ActiveState's comment on PPM's inconsistencies in determining module 
>>names/versions:
>>"Sorry for the confusion, ppm3 is kind of inconsistent in spots."
>>
>>
>>
>>
>>>-----Original Message-----
>>>From: bioperl-l-bounces@portal.open-bio.org 
>>>[mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Tim Alcon
>>>Sent: 18 January 2005 22:20
>>>To: bioperl-l@portal.open-bio.org
>>>Subject: [Bioperl-l] accessing GenBank
>>>
>>>I seem unable to access GenBank.  When running bptutorial.exe, it seems
>>>like all the other examples run fine except that one.  Anyone know why
>>>that would be?  I'm using ActivePerl on Windows XP.  I have whichever
>>>version of bioperl is the current default using ppm (it's at least
>>>1.0).   When I run the exact same code from my campus Unix account, it
>>>works fine.
>>>
>>>Tim
>>>
>>>_______________________________________________
>>>Bioperl-l mailing list
>>>Bioperl-l@portal.open-bio.org
>>>http://portal.open-bio.org/mailman/listinfo/bioperl-l
>>
>

__________________________________

Chris Fields - Postdoctoral Researcher
Lab of Dr. Robert Switzer

Address:

University of Illinois at Urbana-Champaign
Dept. of Biochemistry - 323 RAL
600 S. Mathews Ave.
Urbana, IL 61801

Phone : (217) 333-7098
Fax : (217) 244-5858 

From raoul.bonnal at itb.cnr.it  Fri Jan 21 10:11:11 2005
From: raoul.bonnal at itb.cnr.it (Raoul Jean Pierre Bonnal)
Date: Fri Jan 21 10:08:01 2005
Subject: [Bioperl-l] bioperl-1.5.0 RC2
In-Reply-To: <891FA5D7-64D6-11D9-A0F3-000393C44276@duke.edu>
References: <891FA5D7-64D6-11D9-A0F3-000393C44276@duke.edu>
Message-ID: <1106320271.7583.10.camel@localhost>

This is perl, v5.8.4 built for i386-linux-thread-multi

Debian Unstable/Amd Athlon XP

PERL_DL_NONLAZY=1 /usr/bin/perl "-MExtUtils::Command::MM" "-e" "test_harness(0, 'blib/lib', 'blib/arch')" t/*.t
t/AAChange...................ok
t/AAReverseMutate............ok
t/AlignIO....................ok
t/AlignStats.................ok
t/AlignUtil..................ok
t/Allele.....................ok
t/Alphabet...................ok
t/Annotation.................ok
t/AnnotationAdaptor..........ok
t/Assembly...................ok
t/Biblio.....................ok
t/Biblio_biofetch............ok
t/Biblio_eutils..............ok
t/BiblioReferences...........ok
t/BioDBGFF...................ok
t/BioFetch_DB................ok
t/BioGraphics................ok
t/BlastIndex.................ok
t/BPbl2seq...................ok
t/BPlite.....................ok
t/BPpsilite..................ok
t/Chain......................ok
t/cigarstring................ok
t/ClusterIO..................ok
t/Coalescent.................ok
t/CodonTable.................ok
t/consed.....................ok
t/CoordinateGraph............ok
t/CoordinateMapper...........ok
t/Correlate..................ok
t/CytoMap....................ok
t/DB.........................ok
t/DBCUTG.....................ok
        22/24 skipped: tests which require remote servers - set env variable BIOPERLDEBUG to test
t/DBFasta....................ok
t/DNAMutation................ok
t/Domcut.....................ok
        22/25 skipped: tests which require remote servers - set env variable BIOPERLDEBUG to test
t/ECnumber...................ok
t/ELM........................ok
t/EMBL_DB....................ok
t/EMBOSS_Tools...............ok
t/EncodedSeq.................ok
t/ePCR.......................ok
t/ESEfinder..................error is 0
t/ESEfinder..................ok
        10/12 skipped: tests which require remote servers - set env variable BIOPERLDEBUG to test
t/est2genome.................ok
t/Exception..................ok
t/Exonerate..................ok
t/flat.......................ok
t/FootPrinter................ok
t/game.......................ok
t/GDB........................ok
t/GeneCoordinateMapper.......ok
t/Geneid.....................ok
t/Genewise...................ok
        2/51 skipped:
t/Genomewise.................ok
t/Genpred....................ok
t/GFF........................ok
t/GOR4.......................ok
        10/13 skipped: tests which require remote servers - set env variable BIOPERLDEBUG to test
t/GOterm.....................ok
t/GuessSeqFormat.............ok
t/hmmer......................ok
t/HNN........................ok
        10/13 skipped: tests which require remote servers - set env variable BIOPERLDEBUG to test
t/HtSNP......................ok
t/Index......................ok
t/InstanceSite...............ok
t/InterProParser.............ok
t/IUPAC......................ok
t/largefasta.................ok
t/LargeLocatableSeq..........ok
t/largepseq..................ok
t/LinkageMap.................ok
t/LiveSeq....................ok
t/LocatableSeq...............ok
t/Location...................ok
t/LocationFactory............ok
t/LocusLink..................ok
t/lucy.......................ok
t/Map........................ok
t/MapIO......................ok
t/Matrix.....................ok
t/Measure....................ok
t/MeSH.......................ok
t/MetaSeq....................ok
t/MicrosatelliteMarker.......ok
t/MiniMIMentry...............ok
t/MitoProt...................ok
        5/8 skipped: tests which require remote servers - set env variable BIOPERLDEBUG to test
t/Molphy.....................ok
t/multiple_fasta.............ok
t/Mutation...................ok
t/Mutator....................ok
t/NetPhos....................ok
t/Node.......................ok
t/OddCodes...................ok
t/OMIMentry..................ok
t/OMIMentryAllelicVariant....ok
t/OMIMparser.................ok
t/Ontology...................ok
t/OntologyEngine.............ok
t/OntologyStore..............ok
t/PAML.......................ok
t/Perl.......................ok
t/phd........................ok
t/Phenotype..................ok
t/PhylipDist.................ok
t/pICalculator...............ok
t/Pictogram..................SVG not installed, skipping tests at t/Pictogram.t line 29.
t/Pictogram..................ok
t/PopGen.....................ok
t/PopGenSims.................ok
t/primaryqual................ok
t/PrimarySeq.................ok
t/primedseq..................ok
t/Primer.....................ok
t/primer3....................ok
t/Promoterwise...............ok
t/ProtDist...................ok
t/protgraph..................Class::AutoClass or Clone not installed.
This means that the module is not usable. Skipping tests at t/protgraph.t line 23.
t/protgraph..................ok
t/ProtMatrix.................ok
t/ProtPsm....................ok
t/psm........................ok
t/QRNA.......................ok
t/qual.......................ok
t/RandDistFunctions..........ok
t/RandomTreeFactory..........ok
t/Range......................ok
t/RangeI.....................ok
t/RefSeq.....................ok
        10/13 skipped: tests which require remote servers - set env variable BIOPERLDEBUG to test
t/Registry...................ok
t/Relationship...............ok
t/RelationshipType...........ok
t/RemoteBlast................ok
        4/6 skipped: to avoid timeout
t/RepeatMasker...............ok
t/RestrictionAnalysis........ok
t/RestrictionEnzyme..........ok
t/RestrictionIO..............ok
t/RNAChange..................ok
t/RootI......................ok
t/RootIO.....................ok
t/RootStorable...............ok
t/Scansite...................ok
t/scf........................ok
t/SearchDist.................ok
t/SearchIO...................ok
t/Seq........................ok
t/SeqAnalysisParser..........ok
t/SeqBuilder.................ok
t/SeqDiff....................ok
t/SeqFeatCollection..........ok
t/SeqFeature.................ok
t/seqfeaturePrimer...........ok
t/SeqIO......................XML::DOM::XPath not found - skipping interpro tests
XML::SAX::Base or XML::SAX or XML::SAX::Writer not found - skipping BSML_SAX tests
t/SeqIO......................ok
t/SeqPattern.................ok
t/seqread_fail...............ok
t/SeqStats...................ok
t/SequenceFamily.............ok
t/sequencetrace..............ok
t/SeqUtils...................ok
t/seqwithquality.............ok
t/SeqWords...................ok
t/Sigcleave..................ok
t/Sim4.......................ok
t/SimilarityPair.............ok
t/SimpleAlign................ok
t/simpleGOparser.............ok 88/101Use of uninitialized value in hash element at /home/febo/DownLoad/bioperl-1.5.0-RC2/blib/lib/Bio/Ontology/OntologyStore.pm line 263, <GEN3> line 11.
t/simpleGOparser.............ok
t/singlet....................ok
t/sirna......................ok
t/SiteMatrix.................ok
t/SNP........................ok
t/Sopma......................ok
        12/15 skipped: tests which require remote servers - set env variable BIOPERLDEBUG to test
t/Species....................ok
t/splicedseq.................ok
t/StandAloneBlast............ok
t/StructIO...................ok
t/Structure..................ok
t/Swiss......................ok
t/Symbol.....................ok
t/TagHaplotype...............ok
t/Taxonomy...................ok
        7/8 skipped: to avoid blocking
t/Tempfile...................ok
t/Term.......................ok
t/tinyseq....................ok
t/Tools......................ok
t/Tree.......................ok
t/TreeBuild..................ok
t/TreeIO.....................ok
        2/50 skipped: SVG::Graph output, SVG::Graph not installed
t/trim.......................ok
t/tRNAscanSE.................ok
t/tutorial...................ok 18/21Use of uninitialized value in print at /home/febo/DownLoad/bioperl-1.5.0-RC2/blib/lib/bptutorial.pl line 4039, <GEN21> line 934.
t/tutorial...................ok
t/UCSCParsers................ok
t/Unflattener................ok
t/Unflattener2...............ok
t/UniGene....................ok
t/Variation_IO...............ok
t/WABA.......................ok
t/XEMBL_DB...................ok
All tests successful, 116 subtests skipped.
Files=193, Tests=8942, 298 wallclock secs (114.66 cusr +  6.80 csys = 121.46 CPU)


by RJP

From raoul.bonnal at itb.cnr.it  Fri Jan 21 10:45:14 2005
From: raoul.bonnal at itb.cnr.it (Raoul Jean Pierre Bonnal)
Date: Fri Jan 21 10:41:33 2005
Subject: [Bioperl-l] gff -> match/hsp in gbrowse
Message-ID: <1106322314.7583.27.camel@localhost>

Dear Community,
today I upgraded my bioperl installation to 1.5.0-rc2.
How can I configure my gbrowse db.conf to display match/hsp features from
myfile.gff (the default bioperl 1.5.0-rc2 format)?
GBrowse's tutorial describes the configuration for the previous format, and
it doesn't work for GFF3.

Is it possible to filter the HSPs of each match by rank or score from
the gbrowse db.conf file? Can you post a working example, please?


Thanks in advance.

by RJP

From jdw at ou.edu  Fri Jan 21 11:47:22 2005
From: jdw at ou.edu (James D. White)
Date: Fri Jan 21 11:43:24 2005
Subject: [Bioperl-l] Re: Bioperl-l Digest, Vol 21, Issue 12
References: <200501161451.j0GEpNKr028052@portal.open-bio.org>
Message-ID: <41F1321A.72FB2289@ou.edu>

Starting with:

$regex =~ /\S+(\S+)(\S{10}).*(??{$rev=reverse(\2 =~ tr/ATCG/TAGC/i);})\1.*/i;

The slashes in tr/// confused the Perl parser.  You need to use
different delimiters for the m// operator (the m is implied by //)
and the tr/// operator.  Also the tr/// operator does not use the
i flag, so lower case needs to be handled explicitly.  So let's
try the following:

$regex =~ m:\S+(\S+)(\S{10}).*(??{$rev=reverse(\2 =~ tr/ATCGatcg/TAGCtagc/);})\1.*:i;

This gives the error:
Can't modify constant item in transliteration (tr///) at (re_eval 1)
line 1, near "tr/ATCGatcg/TAGCtagc/)"

Inside the (??{ CODE }) sequence, use $1, $2, ..., instead of
\1, \2, ... (see Programming Perl, 3rd Edition, "Match-time pattern
interpolation", p. 213).  Inside the evaluated CODE, \2 is Perl code
for a reference to the constant 2, not the value of the second
captured substring.  Also, $2 is read-only, so tr/// cannot modify it
in place; copy it into another variable first:

$regex =~ m:\S+(\S+)(\S{10}).*(??{$rev = $2; $rev =~ tr/ATCGatcg/TAGCtagc/; reverse($rev);})\1.*:i;

This works, but I would get rid of the leading "\S+" and trailing
".*".  The ".*" adds nothing useful, so just drop it.  You
probably don't need the leading "\S+", because the pattern is not
anchored to the beginning of the string with "^".  The leading
"\S+" gobbles up the entire string, forcing the match to backtrack
character by character from the end.  It also forces the substring
match saved in $1 to occur after the first character.  Unless you
never want $1 to consider the first character, just drop the
leading "\S+".  If you don't want to search the first character,
then just use "\S".  This results in:

$regex =~ m:(\S+)(\S{10}).*(??{$rev = $2; $rev =~ tr/ATCGatcg/TAGCtagc/; reverse($rev);})\1:i;

Finally I would probably change the remaining ".*" to ".*?".  If
you search with ".*" on a long sequence which could contain
multiple sequences of interest, the ".*" pattern will match the rest
of the sequence and force backtracking to match the first occurrence
of "$1$2" with the last occurrence of "revcomp($2)$1".  If you use
".*?", you match the first occurrence of "$1$2" with the nearest
occurrence of "revcomp($2)$1".  This results in the final regular
expression:

$regex =~ m:(\S+)(\S{10}).*?(??{$rev = $2; $rev =~ tr/ATCGatcg/TAGCtagc/; reverse($rev);})\1:i;
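
A minimal runnable sketch of the final pattern, for anyone who wants to try it. It wraps the expression in a hypothetical helper, find_ir_with_dr (not part of BioPerl), and changes one thing so it runs under "use strict": the reverse complement is built in a lexical variable declared inside the (??{ }) block instead of the global $rev. The sequence is made up for illustration: a 6-base direct repeat, a 10-base stem, a 5-base loop, the reverse complement of the stem, and the direct repeat again.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): returns the direct repeat ($1)
# and the 10-base stem ($2) of the first match, or an empty list if none.
sub find_ir_with_dr {
    my ($seq) = @_;
    # Same pattern as above; the reverse complement of $2 is computed in a
    # lexical copy, because $2 itself is read-only.
    if ($seq =~ m:(\S+)(\S{10}).*?(??{ my $rc = reverse $2; $rc =~ tr/ATCGatcg/TAGCtagc/; $rc })\1:i) {
        return ($1, $2);
    }
    return;
}

# Made-up test sequence: repeat . stem . loop . revcomp(stem) . repeat
my $seq = 'TTGACC' . 'ACGTTGCAAG' . 'NNNNN' . 'CTTGCAACGT' . 'TTGACC';
my ($dr, $stem) = find_ir_with_dr($seq);
print "direct repeat: $dr\nstem: $stem\n";
```

The scalar assignment puts reverse in scalar context, so it reverses the string rather than returning a one-element list unchanged.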


> Date: Fri, 14 Jan 2005 14:12:46 -0500
> From: Guojun Yang <gyang@plantbio.uga.edu>
> Subject: [Bioperl-l] regular expression help!
> To: bioperl-l@portal.open-bio.org
> Message-ID: <20050114141246.94c7cb46@dogwood.plantbio.uga.edu>
> Content-Type: text/plain;       charset="us-ascii"
>
> Hi, Everybody,
> I was trying to use a regex recognizing a pattern of inverted repeat DNA seq flanked by direct repeats (see below), it returns errors saying "(?{...}) not terminated or {...} not balanced". Can anybody help me sort this out?
> The regex I have is:
> $regex =~ /\S+(\S+)(\S{10}).*(??{$rev=reverse(\2 =~ tr/ATCG/TAGC/i);})\1.*/i;
> Thank you,
> Yang
>

--
James D. White   (jdw@ou.edu)
Director of Bioinformatics
Department of Chemistry and Biochemistry/ACGT
University of Oklahoma
101 David L. Boren Blvd., SRTC 2100
Norman, OK 73019
Phone: (405) 325-4912, FAX: (405) 325-7762


From brian_osborne at cognia.com  Fri Jan 21 11:48:52 2005
From: brian_osborne at cognia.com (Brian Osborne)
Date: Fri Jan 21 11:45:18 2005
Subject: [Bioperl-l] Reading all sequences using Bio::DB::Flat in
	SwissProtfile
In-Reply-To: <BE45276F7BBEFE49B80193E22C5B1FED01B05FC0@iu-mssg-mbx04.exchange.iu.edu>
Message-ID: <GAEDKMGOKFBLJPKCLKCCGEBMEIAA.brian_osborne@cognia.com>

Kenny,

Did you take a look at Bio/Index/Swissprot.pm? What's important for you will
be building the index using the keys you're interested in as opposed to the
default key, using the id_parser method. See the Bio::Index section in the
bptutorial for an example.

Brian O.

-----Original Message-----
From: bioperl-l-bounces@portal.open-bio.org
[mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Daily,
Kenneth Michael
Sent: Wednesday, January 19, 2005 11:49 AM
To: bioperl-l@portal.open-bio.org
Subject: [Bioperl-l] Reading all sequences using Bio::DB::Flat in
SwissProtfile


I want to work with a local copy of the SwissProt database, and need to
search through all of the entries. I only see methods to return sequences by
accession. However, I cannot use just FASTA format of the SwissProt records,
as I need to use the feature fields. What I need to learn is how to do a DB
search on the features field of the SwissProt records, if it's possible.
Would there be any advantage to doing it with the DB instead of just using
SeqIO as an input stream? I think it might, since every time I want to do a
search I must read in the entire file again, which is very costly. Thank
you.

Kenny Daily
Indiana University
School of Informatics
kmdaily [at] indiana [dot] edu



From jdw at ou.edu  Fri Jan 21 11:54:37 2005
From: jdw at ou.edu (James D. White)
Date: Fri Jan 21 11:50:38 2005
Subject: [Bioperl-l] regular expression help!
References: <200501161451.j0GEpNKr028052@portal.open-bio.org>
	<41F1321A.72FB2289@ou.edu>
Message-ID: <41F133CD.3BDCA957@ou.edu>

Sorry about double posting, but I forgot to change the subject before
sending the first message.

--
James D. White   (jdw@ou.edu)
Director of Bioinformatics
Department of Chemistry and Biochemistry/ACGT
University of Oklahoma
101 David L. Boren Blvd., SRTC 2100
Norman, OK 73019
Phone: (405) 325-4912, FAX: (405) 325-7762


From ed at compbio.berkeley.edu  Fri Jan 21 12:09:04 2005
From: ed at compbio.berkeley.edu (Ed Green)
Date: Fri Jan 21 12:06:05 2005
Subject: [Bioperl-l] dssp script
In-Reply-To: <5F7CE35370B6CF429AA3CA960ECC278001638D65@EXCHANGE2.charite.de>
References: <5F7CE35370B6CF429AA3CA960ECC278001638D65@EXCHANGE2.charite.de>
Message-ID: <41F13730.1080203@compbio.berkeley.edu>

Dear Peter,

These two are in fact bugs that I will fix. The first is caused by the
presence of 'termination residues' that don't have residue 
numbers. Their residue numbers, then, can't be compared numerically. 
Fortunately, this bug won't result in wrong results as we want this 
comparison to always be false anyway. The solution to this is to first 
check if either of the termination residue signals are set and if so, 
don't do this numerical comparison.
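
The guard described above can be illustrated in plain Perl. This is a sketch, not the actual change to Res.pm: same_residue_number is a hypothetical helper that returns false, without an "isn't numeric" warning, when either residue number is undefined or empty (as for the termination residues), and only otherwise compares with ==. It relies on looks_like_number from the core Scalar::Util module.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical helper (not the Res.pm code): true only if both residue
# numbers are present, numeric and equal. Empty or undefined values never
# reach the == operator, so no "isn't numeric" warning is raised.
sub same_residue_number {
    my ($x, $y) = @_;
    return 0 unless defined $x && defined $y;
    return 0 unless looks_like_number($x) && looks_like_number($y);
    return $x == $y ? 1 : 0;
}

print same_residue_number(12, '12') ? "same\n" : "different\n";
print same_residue_number('', 12)   ? "same\n" : "different\n";
```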

The second, blank line(s) at end of file will also be fixed.
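
In the loop quoted in Peter's message, while (chomp($cur = <$file>)), the read returns undef at end of file, which is what triggers the "uninitialized value in chomp" warning; a final line with no trailing newline would also end the loop early, because chomp then returns 0. A common Perl idiom avoids both by testing the read with defined() and skipping blank lines explicitly. The sketch below is a hypothetical stand-alone reader (read_residue_lines and the sample input are made up), not the patched Res.pm.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-alone reader: returns the non-blank lines of $fh with
# their line endings removed.
sub read_residue_lines {
    my ($fh) = @_;
    my @lines;
    # defined() stops cleanly at end of file (no "uninitialized value in
    # chomp") and still keeps a last line that has no trailing newline.
    while (defined(my $cur = <$fh>)) {
        chomp $cur;
        next if $cur =~ /^\s*$/;    # skip empty or whitespace-only lines
        push @lines, $cur;
    }
    return @lines;
}

# Made-up input ending in a blank line, read from an in-memory filehandle
my $text = "line one\nline two\n\n";
open my $fh, '<', \$text or die "cannot open in-memory file: $!";
my @kept = read_residue_lines($fh);
close $fh;
print scalar(@kept), " non-blank lines\n";
```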

Beware that there is, I think, a bug in your script. It appears that you 
are attempting to iterate over all residues. However, iterating A:1 .. 
A:max doesn't get it done because of the crazy way residues can be 
numbered in PDB files: you'll miss all the residues with insertion codes 
(A:27A, A:27B, A:27C, e.g.).

To make this easy an iterator is called for. It will just return all 
'real' residues for the pdb file or for a specified chain - I'll try to 
get that done this weekend.

Regards,
Ed Green

Robinson, Peter wrote:
> Dear BioPerlers,
> 
> I am writing a script to use the BioPerl DSSP module to print out a list of phi and psi angles for all applicable residues  of all chains. Although the results are correct, I get the following error message at the end of each chain:
> 
> Argument "" isn't numeric in numeric eq (==) at /usr/local/share/perl/5.8.4/Bio/Structure/SecStr/DSSP/Res.pm line 1168.
> 
> and I am not quite sure where it is coming from. Perhaps I am using the wrong part of the API, but I am trying to get a list of all residues for each chain as follows:
> 
> foreach my $ch (@chains) {
>   my $ss_elements_pts = $dssp->secBounds($ch);
>   print "Chain $ch:\n";
>   my $pos = 0;
>   my $max = 0;
>   foreach my $stretch (@{$ss_elements_pts}) {
>     my $start = $stretch->[0];
>     my $end = $stretch->[1]; 
>     if ($end =~ m/(\d+)/) { $end = $1; }
>    
>     if ($end  > $max) { $max = $end; }
>   }
>   ## END is now the last residue in this chain
>   for my $res (1..$max) {
>     my $residueID = $res . ":" . $ch;
>     my ($phi,$psi,$SS,$SSsum,$AA);
>     eval { $phi = $dssp->resPhi($residueID);};
> 	etc.
> 
> The full script is appended to the bottom of this mail.
> 
> 
> I also noticed what might be a minor bug in the module DSSP/Res.pm; when I use dsspcmbi to analyze a PDB file, it produces a results file with an empty last line. This causes a crash:
> 
> Use of uninitialized value in chomp at /usr/local/share/perl/5.8.4/Bio/Structure/SecStr/DSSP/Res.pm line 1284, <GEN1> line 955.
> 
> 
>  If I manually remove this last empty line, there is no error. Adding the following line at Res.pm l.1284 fixes the problem:
> 
> 
>  while ( chomp( $cur = <$file> ) ) {
>       next if ($cur =~ m/^\s*$/);  *********************************************
> 	$res_num = substr( $cur, 0, 5 );
> 	$res_num =~ s/\s//g;
> 	$self->{ 'Res' }->[ $res_num ] = &_parseResLine( $cur );
>     }
> }
> 
> 
> 
> 
> Thanks in advance for any tips! Peter
> Peter N. Robinson, M.D.
> Institute of Medical Genetics
> Charité University Hospital
> Augustenburger Platz 1
> 13353 Berlin
> Germany
> ++49-30-450 569124
> peter.robinson@charite.de
> http://www.charite.de/ch/medgen/robinson
> Beware of bugs in the above code; I have only proved it correct, not tried it. -Donald Knuth, computer scientist (1938- )
> 
> ########################
> 
> #!/usr/bin/perl -w
> use strict;
> use IO::File;
> use Bio::Structure::SecStr::DSSP::Res;
> use Data::Dumper;
> 
> 
> =pod
> parseDSSP.pl
> Script to parse the output of DSSP using the BioPerl module
> Bio::Structure::SecStr::DSSP::Res. To use it, process a PDB
> file with dssp or dsspcmbi, and pass the resulting file to 
> this script. For more information on dssp and BioPerl see the
> module documentation at http://bioperl.org
> 
> @email peter.robinson@charite.de
> 21 January, 2005
> 
> =cut
> 
> 
> 
> my $file = "pdb43ca.dssp";
> my $dssp = Bio::Structure::SecStr::DSSP::Res->new(-file => $file);
> 
> my $pdbID = $dssp->pdbID();
> my $auth  = $dssp->pdbAuthor();
> my $cmpd = $dssp->pdbCompound();
> my $pdb_date = $dssp->pdbDate();
> my $header = $dssp->pdbHeader();
> my $pdbSource = $dssp->pdbSource();
> 
> print "PDB entry $pdbID \n\tauthor:\t$auth",
>   "\n\tCompound:\t$cmpd",
>   "\n\tDate:\t$pdb_date",
>   "\n\tHeader:\t$header",
>   "\n\tsource:\t$pdbSource\n\n";
> 
> my $totalRes = $dssp->numResidues();
> print "Total residue count (all chains):$totalRes\n";
> 
> 
> my $surArea= $dssp->totSurfArea();
> print "Total accessible surface area:\t$surArea  (square Ang)\n";
> 
> 
> my $chainRef = $dssp->chains();
> my @chains = sort  @{$chainRef};
> print "Chain[s]:\n";
> foreach my $ch (@chains) {
>   print "\t$ch";
> }
> print "\n";
> 
> my $hb = $dssp->hBonds();
> print "H BONDS.\n";
> print "TYPE O(I)-->H-N(J): $hb->[0]\n",
>    "IN PARALLEL BRIDGES: $hb->[1]\n",
>    "IN ANTIPARALLEL BRIDGES $hb->[2]\n",
>    "TYPE O(I)-->H-N(I-5) $hb->[3]\n",
>    "TYPE O(I)-->H-N(I-4) $hb->[4]\n",
>    "TYPE O(I)-->H-N(I-3) $hb->[5]\n",
>    "TYPE O(I)-->H-N(I-2) $hb->[6]\n",
>    "TYPE O(I)-->H-N(I-1) $hb->[7]\n",
>    "TYPE O(I)-->H-N(I+0) $hb->[8]\n",
>    "TYPE O(I)-->H-N(I+1) $hb->[9]\n",
>    "TYPE O(I)-->H-N(I+2) $hb->[10]\n",
>    "TYPE O(I)-->H-N(I+3) $hb->[11]\n",
>    "TYPE O(I)-->H-N(I+4) $hb->[12]\n",
>    "TYPE O(I)-->H-N(I+5) $hb->[13]\n",
>   "\n";
> 
>    
>  
> foreach my $ch (@chains) {
>   my $ss_elements_pts = $dssp->secBounds($ch);
>   print "Chain $ch:\n";
>   my $pos = 0;
>   my $max = 0;
>   foreach my $stretch (@{$ss_elements_pts}) {
>     my $start = $stretch->[0];
>     my $end = $stretch->[1]; 
>     if ($end =~ m/(\d+)/) { $end = $1; }
>    
>     if ($end  > $max) { $max = $end; }
>   }
>   ## END is now the last residue in this chain
>   for my $res (1..$max) {
>     my $residueID = $res . ":" . $ch;
>     my ($phi,$psi,$SS,$SSsum,$AA);
>     eval { $phi = $dssp->resPhi($residueID);};
>     eval { $psi = $dssp->resPsi($residueID);};
>     eval { $SS = $dssp->resSecStr($residueID);};
>     eval { $SSsum = $dssp->resSecStrSum($residueID);};
>     $AA = $dssp->resAA($residueID);
>     $phi = $phi || "n/a";
>     $psi = $psi || "n/a";
>     $SS = $SS || "-";
>     $SSsum = "" unless defined $SSsum;  # avoid undef warnings in the eq tests below
>     my $SSclass;
>     if ($SSsum eq "H") { $SSclass = "helix"; }
>     elsif ($SSsum eq "T") { $SSclass = "turn"; }
>     elsif ($SSsum eq "B") { $SSclass = "beta"; }
>     else { $SSclass = $SSsum; }
>     print "$residueID) [$AA] phi:$phi psi:$psi SecStruct: $SS ($SSclass) \n";
>   }
> }
From MAG at Stowers-Institute.org  Fri Jan 21 12:14:47 2005
From: MAG at Stowers-Institute.org (Goel, Manisha)
Date: Fri Jan 21 12:11:28 2005
Subject: [Bioperl-l] Amino acid frequency counter
Message-ID: <200501211711.j0LHBCKr023115@portal.open-bio.org>

Hi All,
I have recently started using Bio-perl to analyse and manipulate my
protein sequence alignments.
I need to calculate aminoacid frequencies at each column of the
alignment. Which module could be of help ?
Thanks for guiding,
-Manisha
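
One way to approach this is to take the aligned sequence strings out of the alignment object and tally residues column by column. The counting below is plain Perl so it runs on its own; column_frequencies is a hypothetical helper, and it treats every character (gap symbols included, letters upper-cased) as a symbol to count.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: given aligned sequence strings of equal length,
# return an array ref with one hash ref per column, mapping each symbol
# (upper-cased; gap characters included) to its relative frequency.
sub column_frequencies {
    my @seqs = @_;
    return [] unless @seqs;
    my $len = length $seqs[0];
    for my $s (@seqs) {
        die "aligned sequences must all have the same length\n"
            unless length($s) == $len;
    }
    my @freqs;
    for my $col (0 .. $len - 1) {
        my %count;
        $count{ uc substr($_, $col, 1) }++ for @seqs;
        $freqs[$col] = { map { $_ => $count{$_} / @seqs } keys %count };
    }
    return \@freqs;
}

# Made-up three-sequence alignment
my $freqs = column_frequencies(qw(MKV MRV MK-));
for my $col (0 .. $#$freqs) {
    my $h = $freqs->[$col];
    print "column ", $col + 1, ": ",
        join(", ", map { sprintf "%s=%.2f", $_, $h->{$_} } sort keys %$h), "\n";
}
```

With BioPerl, the strings could come from an alignment file, e.g. my $aln = Bio::AlignIO->new(-file => 'my.aln', -format => 'clustalw')->next_aln; followed by column_frequencies(map { $_->seq } $aln->each_seq). The file name and format there are placeholders.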

From cjm at fruitfly.org  Fri Jan 21 12:32:33 2005
From: cjm at fruitfly.org (Chris Mungall)
Date: Fri Jan 21 12:29:32 2005
Subject: [Bioperl-l] Reading all sequences using Bio::DB::Flat in
	SwissProtfile
In-Reply-To: <GAEDKMGOKFBLJPKCLKCCGEBMEIAA.brian_osborne@cognia.com>
References: <GAEDKMGOKFBLJPKCLKCCGEBMEIAA.brian_osborne@cognia.com>
Message-ID: <Pine.OSX.4.58.0501210916130.17957@adsl-68-126-147-89.dsl.pltn13.pacbell.net>


Brian,

Unfortunately the id_parser method isn't supported in
Bio::Index::Swissprot.

Even if it was I don't think it would be sufficient here - Kenny needs to
index using the feature fields. This implies that the search key wouldn't
be unique. Bio::Index::Abstract requires a unique key for the index.

Flexible indexing and retrieval such as this is best handled using some
generic non-bioperl specific solution - RDB, XMLDB, SRS, Lucene, LuceGene
etc
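
To illustrate the non-unique key point in plain Perl: an index from a feature key to all matching accessions is a hash of arrays, which can be built once and saved with the core Storable module so later searches do not re-read the whole flat file. This is a sketch under stated assumptions, not a Bio::Index replacement: the helper names are hypothetical, the records are made up, and the input is assumed to be pairs of (accession, list of feature keys) that you would extract yourself, for example with Bio::SeqIO.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Storable qw(nstore retrieve);

# Hypothetical helper: build a feature-key => [accessions] index from
# records of the form [ $accession, [ @feature_keys ] ]. One key maps to
# many entries, which a unique-key index cannot represent.
sub build_feature_index {
    my ($records) = @_;
    my %index;
    for my $rec (@$records) {
        my ($acc, $keys) = @$rec;
        my %seen;    # list each accession once per feature key
        for my $key (@$keys) {
            push @{ $index{$key} }, $acc unless $seen{$key}++;
        }
    }
    return \%index;
}

# Save the index once so later searches can load it instead of re-parsing.
sub save_index {
    my ($index, $file) = @_;
    nstore($index, $file) or die "could not store index in $file\n";
    return;
}

sub load_index {
    my ($file) = @_;
    my $index = retrieve($file) or die "could not read index from $file\n";
    return $index;
}

# Made-up records
my $index = build_feature_index([
    [ 'P00001', [qw(DOMAIN BINDING DOMAIN)] ],
    [ 'P00002', [qw(DOMAIN SIGNAL)] ],
]);
print "DOMAIN: @{ $index->{DOMAIN} }\n";
```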

I forgot to mention Don Gilbert's LuceGene in my original reply - it's a
fairly sane open-source alternative to SRS. It handles lots of
bioinformatics file formats (not sure about swissprot but I'm sure it
could be added)

See:
http://www.gmod.org/lucegene/index.shtml

Cheers
Chris

On Fri, 21 Jan 2005, Brian Osborne wrote:

> Kenny,
>
> Did you take a look at Bio/Index/Swissprot.pm? What's important for you will
> be building the index using the keys you're interested in as opposed to the
> default key, using the id_parser method. See the Bio::Index section in the
> bptutorial for an example.
>
> Brian O.
>
> -----Original Message-----
> From: bioperl-l-bounces@portal.open-bio.org
> [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Daily,
> Kenneth Michael
> Sent: Wednesday, January 19, 2005 11:49 AM
> To: bioperl-l@portal.open-bio.org
> Subject: [Bioperl-l] Reading all sequences using Bio::DB::Flat in
> SwissProtfile
>
>
> I want to work with a local copy of the SwissProt database, and need to
> search through all of the entries. I only see methods to return sequences by
> accession. However, I cannot use just FASTA format of the SwissProt records,
> as I need to use the feature fields. What I need to learn is how to do a DB
> search on the features field of the SwissProt records, if it's possible.
> Would there be any advantage to doing it with the DB instead of just using
> SeqIO as an input stream? I think it might, since every time I want to do a
> search I must read in the entire file again, which is very costly. Thank
> you.
>
> Kenny Daily
> Indiana University
> School of Informatics
> kmdaily [at] indiana [dot] edu
>
From yguo at vbi.vt.edu  Fri Jan 21 12:39:27 2005
From: yguo at vbi.vt.edu (yguo@vbi.vt.edu)
Date: Fri Jan 21 12:35:31 2005
Subject: [Bioperl-l] Code for retrieving PDF file using a Pubmed link.
In-Reply-To: <200501211711.j0LHBCKr023115@portal.open-bio.org>
References: <200501211711.j0LHBCKr023115@portal.open-bio.org>
Message-ID: <34464.128.173.99.81.1106329167.squirrel@webmail.vbi.vt.edu>

[It seems attachments are not supported, so I am re-sending it...]

Hi,

Here is the code I mentioned earlier. I do not know whether the mailing
list system supports attachments, so I have also pasted the code at the
end of this email.

I have put detailed instructions in the comments. If you have any
problems using it, please contact me.

The module does its best to find the PDF link, but it can fail at some
publisher sites. You can have the module write its processing results to a
log file. The flag "NOT_FOUND_OR_ALLOWED" means that it failed to
download the PDF file: either the PDF location is too complicated for the
parser, or your institution does not have the right to view the full
text.

For around 360 publication (with full text link) required in our project,
the module can got the PDF for around 330 of them. While our project going
on, I will update this module to make it more robust.

I hope the module can be a part of Bioperl ultimately. But before that,
you guys can help me to test.


Good weekend,

Yongjian Guo
at
Virginia Bioinformatics Institute


-----------------------------------------------------------------------

# $Id: PDFDownloader.pm   2005/1/20$
# Version 0.1
#
# Cared for by Yongjian Guo <yguo@vbi.vt.edu>
# For copyright and disclaimer see below.

# POD documentation - main docs before the code

=head1 NAME

PDFDownloader - Download full text PDF file using a Pubmed entry.

=head1 SYNOPSIS

              use PDFDownloader;
              #build the object,
              my $worker = PDFDownloader->new({logFile=>$logFile,
                                               link=>$link,
                                               dir=>$dirName,
                                               fileName=>$fileName});
              #start to download.
              $worker->start();

              The log information is written to the log file, or shown on
              screen if no log file is given. Each log line starts with one
              of these flags, followed by the corresponding link:

              DONE : The download finished successfully.
              NOT_OPEN_MED : Cannot open the MEDLINE (PubMed) page.
              NOT_OPEN_PUB : Cannot open the publisher site.
              NO_LINK : The given link has no full-text link-out.
              NOT_FOUND_OR_ALLOWED : The PDF cannot be found, or the user
                                     does not have the right to view the
                                     full text.


=head1 DESCRIPTION

This module downloads the full-text PDF of an article from the publisher's
website, starting from its PubMed entry, when full text is available.

=head1 Attributes

              link:  The PubMed link for an article (required).
              logFile: The log file name. If empty, the information is
                       shown on screen.
              dir:   The directory where the PDF file is saved
                     (default: the current directory).
              fileName: The name prefix of the target PDF file. The
                        downloaded file is named fileName.pdf
                        (default: a random number).


=head1 FEEDBACK

=head2 Reporting Bugs

Report bugs to yguo@vbi.vt.edu.


=head1 AUTHORS

Yongjian Guo @ Virginia Bioinformatics Institute.

=head1 COPYRIGHT

Copyright (c) 2004 Virginia Bioinformatics Institute. All Rights Reserved.

This module is free software; you can redistribute it and/or modify
it under the same terms as Perl itself.

=head1 DISCLAIMER

This software is provided "as is" without warranty of any kind.

=cut


package PDFDownloader;


use strict;
use LWP::UserAgent;
use HTTP::Cookies;

#Create a PDFDownloader object. The parameter is a hash reference;
#its required entry is "link", a PubMed article URL such as:
#http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=8931319


sub new{
    my $class=shift;
    my $para=shift;
    die "PDFDownloader->new: the 'link' entry is required\n"
	unless defined $para && $para->{link};
    my %op=();
    $op{keep_alive}=1;
    $op{agent}="Mozilla/5.0";
    $op{timeout}=20;
    $op{cookie_jar}=HTTP::Cookies->new(file => "cookies.txt");
    my $self=bless{
	logFile=>$para->{logFile} || "",
	link=>$para->{link}, #the PubMed link to start from.
	dir=>$para->{dir} || ".",
	fileName=>$para->{fileName} || rand(),
	base=>"", #base URL of the last page fetched, used for relative links.
	fp=>LWP::UserAgent->new(%op),
    }, $class;
    return $self;
}


#function to start the search-and-download process.

sub start{
    my $self=shift;
    my $data=$self->_getLinkContent($self->{link});
    if(length($data)==0){
	$self->_log("NOT_OPEN_MED\t".$self->{link});
	return;
    }
    #we got the PubMed page; look for a full-text link-out.
    if($data=~/href=\".*?db=pubmed\&url=(.*?)\"\s+/){
	#a link-out URL pointing to the publisher site.
	$self->_parsePubSite($1);

    }elsif($data=~/href=\"(.*?articlerender.*?)\"\s+/){
	#full text deposited directly (an articlerender link).
	$self->_parsePubSite($1);
    }else{
	$self->_log("NO_LINK\t".$self->{link});
    }

}


#function to parse the first page on the publisher site.

sub _parsePubSite{
    my($self, $link)=@_;
    my $data=$self->_getLinkContent($link);
    my ($pos, $pos2, $tmpString, $result, @array);
    $result=-1; #initial is negative.
    if(length($data)==0){
	$self->_log("NOT_OPEN_PUB\t".$link);
	return;
    }
    #find the PDF string,
    $data=~s/\n//g;
    $data=~s/&nbsp;/ /g;
    #first, check whether the page links to a .pdf file directly.
    $tmpString=$self->_getPDFLink("", $data);
    if(length($tmpString)!=0){
	#found the link,
	$result=$self->_tryGetPDF($tmpString);
	if($result==1){
	    return;
	}
    }

    if($data=~/([\s(>]pdf[)\s<])/ ||
       $data=~/([\s(>]PDF[)\s<"])/ ){
       #two possibilities:
       # 1. a link,
       # 2. a javascript call.
       $pos=index($data, $1);
       #find the nearest "href=" before the PDF label.
       $pos2=rindex(substr($data, 0, $pos), "href=");
       $tmpString=substr($data, $pos2, $pos-$pos2);
       if($tmpString=~/\"(.*?)\"/){
           $tmpString=$1;
       }
       #further extraction (e.g. a URL quoted inside javascript),
       if($tmpString=~/\'(.*?)\'/){
           $tmpString=$1;
       }
       #here $tmpString is the next link to try.
       $result=$self->_tryGetPDF($tmpString);
       if($result==-1){
           #no success; look for PDF links on that page's sub-links.
           @array=$self->_getSubLinks($tmpString);
           foreach my $entry (@array){
               if($self->_tryGetPDF($self->_getPDFLink($entry))==1){
                   $result=1; #success,
                   last;
               }
           }
       }
    }
    #further try,
    if($result!=1){
       #it is possible that the direct link is a frame,
       @array=$self->_getSubLinks($link);
       foreach my $entry (@array){
	 if($self->_tryGetPDF($self->_getPDFLink($entry))==1){
	   $result=1; #success,
	   last;
	 }
       }
       if($result!=1){
	 $self->_log("NOT_FOUND_OR_ALLOWED\t".$self->{link});
       }
     }
}


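#download a candidate link; log DONE on success.

sub _tryGetPDF{
    my($self, $link)=@_;
    #_getPDFLink returns "" when it finds nothing; skip the download then.
    return -1 unless defined $link && length($link);
    my $result=$self->_getPDFFile($link);
    if($result==1){
	$self->_log("DONE\t".$self->{link});
    }
    return $result;
}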

#given a web page, return the links on it (quoted strings that start with
#"http" or contain a "/").

sub _getSubLinks{
    my ($self, $link)=@_;
    my $data=$self->_getLinkContent($link);
    my @array=();
    my $pos=0;
    my $pos2=0;
    my $tmp="";
    my $count=0;
    while(1){
	$pos=index($data, "\"", $pos);
	#the page may have no links; the counter bounds the loop.
	if($pos==-1 || $count>50){
	    last;
	}
	$pos2=index($data, "\"", $pos+1);
	last if $pos2==-1; #unmatched quote at the end of the page.
	$tmp=substr($data, $pos+1, $pos2-$pos-1);
	if($tmp=~/^http|\//){ #starts with http, or contains a slash
	    push(@array, $tmp);
	}
	$pos=$pos2+1;
	$count++;
    }
    if($count>=50){
	@array=();
    }
    return @array;
}


#function to return a pdf file link from a webpage.

sub _getPDFLink{
    my($self, $link, $data)=@_;
    if(length($link)!=0){
	$data=$self->_getLinkContent($link);
	$data=~s/\n//g;
    }
    if($data=~/.*["'](.{5,}\.pdf)["']/ ||
       $data=~/.*["'](.{5,}\.PDF)["']/ ){
	#ok, there is pdf file.
	return $1;
    }
    return "";  #not found.
}


#function to fetch a page's content. Refresh-header redirects are followed.

sub _getLinkContent{
    my ($self, $link)=@_;
    if($link!~/http:\/\//){  #some links have the malformed form http:/www..
	my $pos=index($link, "/");
	$link=substr($link, $pos);
    }
    if($link!~/^http/){
	$link=$self->_buildURL($link);
    }

    $link=~s/&amp\;/&/g;
    my $response=$self->{fp}->get($link);
    my $rHeader="";
    #check the header for a Refresh redirect;
    #if there is one, follow it.
    if($response->is_success()){
	$rHeader=$response->header("Refresh");
	if(defined $rHeader && length($rHeader)>0){
	    if($rHeader=~/URL\=(.*)/){
		return $self->_getLinkContent($1);
	    }
	}else{
	    #update the base url.
	    $self->{base}=$response->base();
	    return $response->content;
	}
    }
    return "";

}

#fetch the real PDF file. Refresh-header redirects are followed.
#returns 1 if a PDF was saved, -1 otherwise.

sub _getPDFFile{
    my($self, $link)=@_;
    if($link!~/^http/){
	$link=$self->_buildURL($link);
    }
    $link=~s/&amp\;/&/g;
    my $fileName=$self->{dir}."/".$self->{fileName}.".pdf";
    #download once and inspect the response in memory.
    my $response=$self->{fp}->get($link);
    return -1 unless $response->is_success();
    #follow a Refresh redirect, if there is one.
    my $rHeader=$response->header("Refresh");
    if(defined $rHeader && $rHeader=~/URL\=(.*)/){
	return $self->_getPDFFile($1);
    }
    #a PDF file has a line starting with "%PDF"; anything else
    #(e.g. an HTML login page) is rejected and nothing is saved.
    my $content=$response->content;
    return -1 unless $content=~/^%PDF/m;
    open(my $pdfOut, ">", $fileName) or return -1;
    binmode $pdfOut;
    print $pdfOut $content;
    close $pdfOut;
    return 1; #everything ok,

}

#function to record the log

sub _log{
    my($self, $data)=@_;
    if(length($self->{logFile})==0){
	print $data,"\n";
	return;
    }
    open(my $logOut, ">>", $self->{logFile})
	or die "cannot open the log file $self->{logFile} for writing: $!\n";
    print $logOut $data, "\n";
    close $logOut;
    return;
}

#function to build the full url.

sub _buildURL{
    my($self, $target)=@_;

    if($target=~/^\//){
	if($self->{base}=~/(http:\/\/.*?)\//){
	    return $1.$target;
	}
    }else{
	if($self->{base}=~/(http:\/\/.*)\//){
	    return $1."/".$target;
	}
    }
    return $target;
}


1;
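
For files that were downloaded earlier or by other means, the same sanity check the module applies (does the content contain a line starting with %PDF?) can be run on disk. A minimal sketch; `looks_like_pdf` is a hypothetical helper, and unlike the module it looks only at the first 1024 bytes, on the assumption that a genuine PDF header appears near the start of the file:

```perl
use strict;
use warnings;

# Hypothetical helper: return 1 if some line within the first 1024 bytes
# of the file starts with "%PDF", 0 otherwise (including unreadable files).
sub looks_like_pdf {
    my ($path) = @_;
    open my $in, '<', $path or return 0;
    binmode $in;                      # PDFs are binary
    my $head = '';
    read($in, $head, 1024);
    close $in;
    return $head =~ /^%PDF/m ? 1 : 0;
}

# Usage: flag saved "PDFs" that are really HTML error or login pages.
#   for my $f (glob "$dir/*.pdf") { print "$f\n" unless looks_like_pdf($f) }
```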


From barry.moore at genetics.utah.edu  Fri Jan 21 13:51:38 2005
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Fri Jan 21 13:47:44 2005
Subject: [Bioperl-l] regular expression help!
In-Reply-To: <41F133CD.3BDCA957@ou.edu>
References: <200501161451.j0GEpNKr028052@portal.open-bio.org>	<41F1321A.72FB2289@ou.edu>
	<41F133CD.3BDCA957@ou.edu>
Message-ID: <41F14F3A.2010604@genetics.utah.edu>

Excellent reply.  I think we all learned something from that one.

Barry

James D. White wrote:

>Sorry about double posting, but I forgot to change the subject before
>sending the first message.
>
>  
>
>>Starting with:
>>
>>$regex =~ /\S+(\S+)(\S{10}).*(??{$rev=reverse(\2 =~ tr/ATCG/TAGC/i);})\1.*/i;
>>
>>The slashes in tr/// confused the Perl parser.  You need to use
>>different delimiters for the m// operator (the m is implied by //)
>>and the tr/// operator.  Also the tr/// operator does not use the
>>i flag, so lower case needs to be handled explicitly.  So let's
>>try the following:
>>
>>$regex =~ m:\S+(\S+)(\S{10}).*(??{$rev=reverse(\2 =~ tr/ATCGatcg/TAGCtagc/);})\1.*:i;
>>
>>This gives the error:
>>Can't modify constant item in transliteration (tr///) at (re_eval 1)
>>line 1, near "tr/ATCGatcg/TAGCtagc/)"
>>
>>Inside the (??{ CODE }) sequence, use $1, $2, ..., instead of
>>\1, \2, ... (See Programming Perl, 3rd Edition, "Match-time pattern
>>interpolation", p. 213) Inside the evaluated CODE, \2 is a
>>constant, not the value of the second captured substring.  Also I'm
>>not sure what modifying $2 would do, so let's try:
>>
>>$regex =~ m:\S+(\S+)(\S{10}).*(??{$rev = $2; $rev =~ tr/ATCGatcg/TAGCtagc/; reverse($rev);})\1.*:i;
>>
>>This works, but I would get rid of the leading "\S+" and trailing
>>".*".  The ".*" adds nothing useful, so just drop it.  You
>>probably don't need the leading "\S+", because the pattern is not
>>anchored to the beginning of the string with "^".  The leading
>>"\S+" gobbles up the entire string, forcing the match to backtrack
>>character by character from the end.  It also forces the substring
>>match saved in $1 to occur after the first character.  Unless you
>>never want $1 to consider the first character, just drop the
>>leading "\S+".  If you don't want to search the first character,
>>then just use "\S".  This results in:
>>
>>$regex =~ m:(\S+)(\S{10}).*(??{$rev = $2; $rev =~ tr/ATCGatcg/TAGCtagc/; reverse($rev);})\1:i;
>>
>>Finally I would probably change the remaining ".*" to ".*?".  If
>>you search with ".*" on a long sequence which could contain
>>multiple sequences of interest, the ".*" pattern will match the rest
>>of the sequence and force backtracking to match the first occurrence
>>of "$1$2" with the last occurrence of "revcomp($2)$1".  If you use
>>".*?", you match the first occurrence of "$1$2" with the nearest
>>occurrence of "revcomp($2)$1".  This results in the final regular
>>expression:
>>
>>$regex =~ m:(\S+)(\S{10}).*?(??{$rev = $2; $rev =~ tr/ATCGatcg/TAGCtagc/; reverse($rev);})\1:i;
>>
>>    
>>
>>>Date: Fri, 14 Jan 2005 14:12:46 -0500
>>>From: Guojun Yang <gyang@plantbio.uga.edu>
>>>Subject: [Bioperl-l] regular expression help!
>>>To: bioperl-l@portal.open-bio.org
>>>Message-ID: <20050114141246.94c7cb46@dogwood.plantbio.uga.edu>
>>>Content-Type: text/plain;       charset="us-ascii"
>>>
>>>Hi, Everybody,
>>>I was trying to use a regex to recognize a pattern of inverted repeat DNA seq flanked by direct repeats (see below), but it returns errors saying "(?{...}) not terminated or {...} not balanced". Can anybody help me sort this out?
>>>The regex I have is:
>>>$regex =~ /\S+(\S+)(\S{10}).*(??{$rev=reverse(\2 =~ tr/ATCG/TAGC/i);})\1.*/i;
>>>Thank you,
>>>Yang
>>>
>>>      
>>>
>>--
>>James D. White   (jdw@ou.edu)
>>Director of Bioinformatics
>>Department of Chemistry and Biochemistry/ACGT
>>University of Oklahoma
>>101 David L. Boren Blvd., SRTC 2100
>>Norman, OK 73019
>>Phone: (405) 325-4912, FAX: (405) 325-7762
>>    
>>
>
>--
>James D. White   (jdw@ou.edu)
>Director of Bioinformatics
>Department of Chemistry and Biochemistry/ACGT
>University of Oklahoma
>101 David L. Boren Blvd., SRTC 2100
>Norman, OK 73019
>Phone: (405) 325-4912, FAX: (405) 325-7762
>
>
>
>  
>
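To check the final expression from the reply above, here is a minimal sketch that wraps it in a function runnable under `use strict`. The changes are that the reverse complement is built in a lexical variable inside the `(??{ ... })` block, with `reverse` called in scalar context, instead of in the undeclared global `$rev`. The function name `find_dr_ir` and the test sequence (direct repeat TTAA, 10-bp stem, 5-bp spacer) are made up for illustration.

```perl
use strict;
use warnings;

# Look for: a direct repeat ($1), a 10-bp stem ($2), any spacer, the
# reverse complement of the stem, then the direct repeat again.
# Returns ($direct_repeat, $stem), or an empty list if there is no match.
sub find_dr_ir {
    my ($seq) = @_;
    if ($seq =~ m:(\S+)(\S{10}).*?(??{ my $rc = reverse $2; $rc =~ tr/ATCGatcg/TAGCtagc/; $rc })\1:i) {
        return ($1, $2);
    }
    return;
}

my ($dr, $stem) = find_dr_ir('TTAAACGTTGCAGGCCCCCCCTGCAACGTTTAA');
print "direct repeat $dr, stem $stem\n" if defined $dr;
```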

-- 
Barry Moore
Dept. of Human Genetics
University of Utah
Salt Lake City, UT

From akozik at atgc.org  Fri Jan 21 19:21:15 2005
From: akozik at atgc.org (Alexander Kozik)
Date: Fri Jan 21 18:17:07 2005
Subject: [Bioperl-l] GenBank gene field
Message-ID: <41F19C7B.3000101@atgc.org>

Please take a look at two sample records from GenBank files (Arabidopsis
and C. elegans).
The C. elegans file has "/gene" entries for both the "gene" and "CDS" fields.
The Arabidopsis file has no "/gene" entries at all, although the previous
version of the Arabidopsis GenBank file did have them.
Could you help me understand why this happens, and which entry you would
suggest extracting if the user wants the corresponding gene names?
Am I using the terms "entry" and "field" properly?

Thanks a lot in advance,

Alexander Kozik
Bioinformatics Specialist
Genome and Biomedical Sciences Facility
451 East Health Sciences Drive
University of California
Davis, CA 95616-8816
Phone: (530) 754-9127
email: akozik@atgc.org
web: http://www.atgc.org/

----

Arabidopsis GenBank file NC_003070.gbk:

     gene            complement(38753..40944)
                     /locus_tag="At1g01070"
                     /note="synonym: T25K16.7; nodulin MtN21 family protein"
                     /db_xref="GeneID:839550"
...
     CDS             complement(join(38898..39054,39136..39287,39409..39814,
                     40213..40329,40473..40535,40675..40877))
                     /locus_tag="At1g01070"
                     /note="similar to MtN21 GI:2598575 (root nodule
                     development) from [Medicago truncatula]"
                     /codon_start=1
                     /protein_id="NP_563617.1"
                     /db_xref="GI:18378792"
                     /db_xref="GeneID:839550"
                     /translation="MAG...
----

C.elegans GenBank file NC_003279.gbk:

     gene            43733..44677
                     /gene="1A519"
                     /locus_tag="1A519"
                     /synonym="Y74C9A.1"
                     /note="Title: Caenorhabditis elegans expressed gene
                     1A519."
...
     CDS             join(43733..43961,44030..44234,44281..44328,44521..44677)
                     /gene="1A519"
                     /locus_tag="1A519"
                     /codon_start=1
                     /product="putative protein (1A519)"
                     /protein_id="17510627"
                     /db_xref="GI:17510627"
...


From jason.stajich at duke.edu  Fri Jan 21 22:22:52 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Fri Jan 21 22:19:06 2005
Subject: [Bioperl-l] Amino acid frequency counter
In-Reply-To: <200501211711.j0LHBCKr023115@portal.open-bio.org>
References: <200501211711.j0LHBCKr023115@portal.open-bio.org>
Message-ID: <E63E2218-6C24-11D9-A728-000393C44276@duke.edu>

Bio::AlignIO for reading in sequence alignments produces 
Bio::SimpleAlign objects.
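
Once the alignment is read, the per-column counting itself is short. The sketch below works on plain aligned strings so it runs without BioPerl; with BioPerl you would presumably get those strings from the Bio::SimpleAlign object, e.g. `map { $_->seq } $aln->each_seq` after `Bio::AlignIO->new(-file => $file, -format => 'clustalw')->next_aln`. `column_frequencies` is a hypothetical helper, and gaps are counted as the character `-` like any residue.

```perl
use strict;
use warnings;

# Hypothetical helper: given equal-length aligned sequences, return an
# array ref with one hash ref per column, mapping residue => fraction of
# sequences that have that residue in the column.
sub column_frequencies {
    my @rows = @_;
    my $len  = length $rows[0];
    my @freqs;
    for my $col (0 .. $len - 1) {
        my %count;
        $count{ uc substr($_, $col, 1) }++ for @rows;
        my %f = map { ($_ => $count{$_} / @rows) } keys %count;
        push @freqs, \%f;
    }
    return \@freqs;
}

my $freqs = column_frequencies('MK-L', 'MRAL', 'MKAL');
for my $col (0 .. $#$freqs) {
    my $h = $freqs->[$col];
    printf "column %d: %s\n", $col + 1,
        join(' ', map { sprintf '%s=%.2f', $_, $h->{$_} } sort keys %$h);
}
```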

On Jan 21, 2005, at 12:14 PM, Goel, Manisha wrote:

> Hi All,
> I have recently started using Bio-perl to analyse and manipulate my
> protein sequence alignments.
> I need to calculate amino acid frequencies at each column of the
> alignment. Which module could help?
> Thanks for any guidance,
> -Manisha
>
>
>
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

From jason.stajich at duke.edu  Fri Jan 21 22:23:51 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Fri Jan 21 22:19:55 2005
Subject: [Bioperl-l] GenBank gene field
In-Reply-To: <41F19C7B.3000101@atgc.org>
References: <41F19C7B.3000101@atgc.org>
Message-ID: <09DD5061-6C25-11D9-A728-000393C44276@duke.edu>

You should probably ask the data providers....
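
Whatever the reason for the change, a script can avoid depending on /gene by falling back to /locus_tag, which both sample records carry on their gene and CDS features. A minimal sketch, assuming BioPerl feature objects (anything with `has_tag` and `get_tag_values`, such as Bio::SeqFeature::Generic); `gene_label` is a hypothetical helper name:

```perl
use strict;
use warnings;

# Hypothetical helper: prefer the /gene qualifier, fall back to
# /locus_tag, and return undef if the feature has neither.
sub gene_label {
    my ($feat) = @_;
    for my $tag (qw(gene locus_tag)) {
        if ($feat->has_tag($tag)) {
            my ($value) = $feat->get_tag_values($tag);
            return $value;
        }
    }
    return;
}

# Usage with BioPerl (assumes Bio::SeqIO is installed):
#   my $in = Bio::SeqIO->new(-file => 'NC_003070.gbk', -format => 'genbank');
#   while (my $seq = $in->next_seq) {
#       for my $feat (grep { $_->primary_tag eq 'CDS' } $seq->get_SeqFeatures) {
#           print gene_label($feat) // 'unnamed', "\n";
#       }
#   }
```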

On Jan 21, 2005, at 7:21 PM, Alexander Kozik wrote:

> Please take a look at two sample records from GenBank files (Arabidopsis
> and C. elegans).
> The C. elegans file has "/gene" entries for both the "gene" and "CDS" fields.
> The Arabidopsis file has no "/gene" entries at all, although the previous
> version of the Arabidopsis GenBank file did have them.
> Could you help me understand why this happens, and which entry you would
> suggest extracting if the user wants the corresponding gene names?
> Am I using the terms "entry" and "field" properly?
>
> Thanks a lot in advance,
>
> Alexander Kozik
> Bioinformatics Specialist
> Genome and Biomedical Sciences Facility
> 451 East Health Sciences Drive
> University of California
> Davis, CA 95616-8816
> Phone: (530) 754-9127
> email: akozik@atgc.org
> web: http://www.atgc.org/
>
> ----
>
> Arabidopsis GenBank file NC_003070.gbk:
>
>     gene            complement(38753..40944)
>                     /locus_tag="At1g01070"
>                     /note="synonym: T25K16.7; nodulin MtN21 family protein"
>                     /db_xref="GeneID:839550"
> ...
>     CDS             complement(join(38898..39054,39136..39287,39409..39814,
>                     40213..40329,40473..40535,40675..40877))
>                     /locus_tag="At1g01070"
>                     /note="similar to MtN21 GI:2598575 (root nodule
>                     development) from [Medicago truncatula]"
>                     /codon_start=1
>                     /protein_id="NP_563617.1"
>                     /db_xref="GI:18378792"
>                     /db_xref="GeneID:839550"
>                     /translation="MAG...
> ----
>
> C.elegans GenBank file NC_003279.gbk:
>
>     gene            43733..44677
>                     /gene="1A519"
>                     /locus_tag="1A519"
>                     /synonym="Y74C9A.1"
>                     /note="Title: Caenorhabditis elegans expressed gene
>                     1A519."
> ...
>     CDS             join(43733..43961,44030..44234,44281..44328,44521..44677)
>                     /gene="1A519"
>                     /locus_tag="1A519"
>                     /codon_start=1
>                     /product="putative protein (1A519)"
>                     /protein_id="17510627"
>                     /db_xref="GI:17510627"
> ...
>
>
>
>
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

From gyang at plantbio.uga.edu  Fri Jan 21 22:31:55 2005
From: gyang at plantbio.uga.edu (Guojun Yang)
Date: Fri Jan 21 22:28:08 2005
Subject: what about the speed on longer seq? Re: [Bioperl-l] regular
	expression help!
In-Reply-To: <41F133CD.3BDCA957@ou.edu>
Message-ID: <20050121223155.bd16abb4@dogwood.plantbio.uga.edu>

Thank you, James, for the detailed information. An earlier suggestion was
to use =~ /(\S{4,})(\S{10,}).+(??{sub($2)})\1/i; where the sub does the
transliteration and reversal of $2. It works well on a ~80 bp sequence, but
on a ~500 bp sequence it takes forever. Is your regex any faster, or would
it take a similar amount of time? I will definitely try it.
Have a great one,
Yang
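
On the speed question: with `(\S{4,})(\S{10,}).+`, the engine can try every combination of repeat length, stem length and spacer length at every start position, calling the sub each time, so the work grows with a high power of the sequence length. One way around this, sketched here under stated assumptions, is to scan explicitly instead: fix the stem length (as the `\S{10}` version does) and the number of flanking bases to compare, use `index()` to look for the reverse complement of each candidate stem downstream, then compare the flanking bases. `find_ir_with_dr` and `revcomp` are hypothetical helpers, and unlike the regex this reports only the first hit with exactly these lengths.

```perl
use strict;
use warnings;

# Reverse complement, using the same transliteration as the regex.
sub revcomp {
    my ($s) = @_;
    my $rc = reverse $s;
    $rc =~ tr/ATCGatcg/TAGCtagc/;
    return $rc;
}

# Hypothetical helper: for each stem of $stem_len bases, find its reverse
# complement downstream with index(), then require the $dr_len bases before
# the stem to equal the $dr_len bases after the reverse complement.
# Returns (start, end) of the first hit (end exclusive), or an empty list.
sub find_ir_with_dr {
    my ($seq, $stem_len, $dr_len) = @_;
    $seq = uc $seq;
    my $n = length $seq;
    for my $i ($dr_len .. $n - $stem_len) {
        my $target = revcomp(substr($seq, $i, $stem_len));
        my $j = index($seq, $target, $i + $stem_len);
        while ($j >= 0) {
            my $after = $j + $stem_len;
            last if $after + $dr_len > $n;   # later hits cannot fit either
            if (substr($seq, $i - $dr_len, $dr_len) eq substr($seq, $after, $dr_len)) {
                return ($i - $dr_len, $after + $dr_len);
            }
            $j = index($seq, $target, $j + 1);
        }
    }
    return;
}

my @hit = find_ir_with_dr('TTAAACGTTGCAGGCCCCCCCTGCAACGTTTAA', 10, 4);
print "element spans $hit[0]..$hit[1]\n" if @hit;
```

Each stem needs only a few linear `index()` scans, so the cost stays roughly quadratic in sequence length; longer repeats around a hit can then be found by extending outward.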
----- Original Message -----
From: James D. White <jdw@ou.edu>
To: bioperl-l@portal.open-bio.org
Sent: Fri, 21 Jan 2005 11:54:37 -0500
Subject: Re: [Bioperl-l] regular expression help!


> Sorry about double posting, but I forgot to change the subject before
> sending the first message.
> 
> > Starting with:
> >
> > $regex =~ /\S+(\S+)(\S{10}).*(??{$rev=reverse(\2 =~ tr/ATCG/TAGC/i);})\1.*/i;
> >
> > The slashes in tr/// confused the Perl parser.  You need to use
> > different delimiters for the m// operator (the m is implied by //)
> > and the tr/// operator.  Also the tr/// operator does not use the
> > i flag, so lower case needs to be handled explicitly.  So let's
> > try the following:
> >
> > $regex =~ m:\S+(\S+)(\S{10}).*(??{$rev=reverse(\2 =~ tr/ATCGatcg/TAGCtagc/);})\1.*:i;
> >
> > This gives the error:
> > Can't modify constant item in transliteration (tr///) at (re_eval 1)
> > line 1, near "tr/ATCGatcg/TAGCtagc/)"
> >
> > Inside the (??{ CODE }) sequence, use $1, $2, ..., instead of
> > \1, \2, ... (See Programming Perl, 3rd Edition, "Match-time pattern
> > interpolation", p. 213) Inside the evaluated CODE, \2 is a
> > constant, not the value of the second captured substring.  Also I'm
> > not sure what modifying $2 would do, so let's try:
> >
> > $regex =~ m:\S+(\S+)(\S{10}).*(??{$rev = $2; $rev =~ tr/ATCGatcg/TAGCtagc/; reverse($rev);})\1.*:i;
> >
> > This works, but I would get rid of the leading "\S+" and trailing
> > ".*".  The ".*" adds nothing useful, so just drop it.  You
> > probably don't need the leading "\S+", because the pattern is not
> > anchored to the beginning of the string with "^".  The leading
> > "\S+" gobbles up the entire string, forcing the match to backtrack
> > character by character from the end.  It also forces the substring
> > match saved in $1 to occur after the first character.  Unless you
> > never want $1 to consider the first character, just drop the
> > leading "\S+".  If you don't want to search the first character,
> > then just use "\S".  This results in:
> >
> > $regex =~ m:(\S+)(\S{10}).*(??{$rev = $2; $rev =~ tr/ATCGatcg/TAGCtagc/; reverse($rev);})\1:i;
> >
> > Finally I would probably change the remaining ".*" to ".*?".  If
> > you search with ".*" on a long sequence which could contain
> > multiple sequences of interest, the ".*" pattern will match the rest
> > of the sequence and force backtracking to match the first occurrence
> > of "$1$2" with the last occurrence of "revcomp($2)$1".  If you use
> > ".*?", you match the first occurrence of "$1$2" with the nearest
> > occurrence of "revcomp($2)$1".  This results in the final regular
> > expression:
> >
> > $regex =~ m:(\S+)(\S{10}).*?(??{$rev = $2; $rev =~ tr/ATCGatcg/TAGCtagc/; reverse($rev);})\1:i;
> >
> > > Date: Fri, 14 Jan 2005 14:12:46 -0500
> > > From: Guojun Yang <gyang@plantbio.uga.edu>
> > > Subject: [Bioperl-l] regular expression help!
> > > To: bioperl-l@portal.open-bio.org
> > > Message-ID: <20050114141246.94c7cb46@dogwood.plantbio.uga.edu>
> > > Content-Type: text/plain;       charset="us-ascii"
> > >
> > > Hi, Everybody,
> > > I was trying to use a regex to recognize a pattern of inverted repeat DNA seq flanked by direct repeats (see below), but it returns errors saying "(?{...}) not terminated or {...} not balanced". Can anybody help me sort this out?
> > > The regex I have is:
> > > $regex =~ /\S+(\S+)(\S{10}).*(??{$rev=reverse(\2 =~ tr/ATCG/TAGC/i);})\1.*/i;
> > > Thank you,
> > > Yang
> > >
> >
> > --
> > James D. White   (jdw@ou.edu)
> > Director of Bioinformatics
> > Department of Chemistry and Biochemistry/ACGT
> > University of Oklahoma
> > 101 David L. Boren Blvd., SRTC 2100
> > Norman, OK 73019
> > Phone: (405) 325-4912, FAX: (405) 325-7762
> 
> --
> James D. White   (jdw@ou.edu)
> Director of Bioinformatics
> Department of Chemistry and Biochemistry/ACGT
> University of Oklahoma
> 101 David L. Boren Blvd., SRTC 2100
> Norman, OK 73019
> Phone: (405) 325-4912, FAX: (405) 325-7762
> 
> 
> 
> 


From davhum at garvan.unsw.edu.au  Thu Jan 20 21:11:26 2005
From: davhum at garvan.unsw.edu.au (davhum@garvan.unsw.edu.au)
Date: Sat Jan 22 11:17:29 2005
Subject: [Bioperl-l] WebDBSeqI Request error (Bad protocol 'tcp')????
Message-ID: <4248.129.94.225.7.1106273486.squirrel@gimr.garvan.unsw.edu.au>

Hi bioperl-groovers,

Has anyone ever had to deal with the following error message?


MSG: WebDBSeqI Request error:
500 Can't connect to eutils.ncbi.nlm.nih.gov:80 (Bad protocol 'tcp')

I have looked through previous threads in the archives, but they seem to
describe slightly different problems. What confuses me most is that the
script that produces this error works perfectly on my machine but not on my
colleague's new machine (running Windows XP). I don't understand why the
"Bad protocol 'tcp'" message appears; the error propagates up from the
BioPerl modules. Is it possible that I did not install Perl or BioPerl
correctly? Any ideas or suggestions would be most appreciated.


thanks in advance

David Humphreys
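
One likely cause, stated here as an assumption since this part of the thread has no answer: the message seems to come from the socket layer LWP uses (IO::Socket::INET), which reports "Bad protocol 'tcp'" when the system cannot map the protocol name "tcp" to a number. On Windows, that lookup depends on the `protocol` file under `%SystemRoot%\system32\drivers\etc`, so a missing or damaged file on the new machine could produce this error. A quick check to run on the failing machine (`tcp_protocol_status` is a hypothetical helper):

```perl
use strict;
use warnings;

# Report whether this Perl can map the protocol name 'tcp' to a number.
# In scalar context, getprotobyname returns the protocol number.
sub tcp_protocol_status {
    my $num = getprotobyname('tcp');
    return defined $num
        ? "getprotobyname('tcp') = $num"
        : "getprotobyname('tcp') failed; check the system protocols file";
}

print tcp_protocol_status(), "\n";
```

If the lookup succeeds (it normally returns 6), the cause is probably elsewhere.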

From yguo at vbi.vt.edu  Fri Jan 21 12:35:47 2005
From: yguo at vbi.vt.edu (yguo@vbi.vt.edu)
Date: Sat Jan 22 11:17:41 2005
Subject: [Bioperl-l] Code for automatic retrieving pdf file from the
	publisherwebsite.
In-Reply-To: <001c01c4fe9d$08c64a70$7d75f345@WATSON>
References: <1109.151.199.12.38.1106187167.squirrel@webmail.vbi.vt.edu>
	<001c01c4fe9d$08c64a70$7d75f345@WATSON>
Message-ID: <34461.128.173.99.81.1106328947.squirrel@webmail.vbi.vt.edu>

Hi,

I have attached the code I mentioned earlier. I don't know whether the
mailing list supports attachments, so I have also pasted the code at the
end of this email.

Detailed usage instructions are in the comments. If you have any problems
using it, please contact me.

The module does its best to find the PDF link, but it can fail at some
publisher sites. You can have the module write its processing results to a
log file. The flag "NOT_FOUND_OR_ALLOWED" means that it failed to
download the PDF file. Possibly the PDF location is too complicated for
the parser, or your institution does not have the right to view the full
text.

Of the roughly 360 publications (with full-text links) needed for our
project, the module retrieved the PDF for about 330. As our project goes
on, I will update the module to make it more robust.

I hope the module can ultimately become part of Bioperl. Before that,
you can help me test it.


Have a good weekend,

Yongjian Guo
at
Virginia Bioinformatics Institute


-----------------------------------------------------------------------

# $Id: PDFDownloader.pm   2005/1/20$
# Version 0.1
#
# Cared for by Yongjian Guo <yguo@vbi.vt.edu>
# For copyright and disclaimer see below.

# POD documentation - main docs before the code

=head1 NAME

PDFDownloader - Download full text PDF file using a Pubmed entry.

=head1 SYNOPSIS

              use PDFDownloader;
              #build the object,
              my $worker = PDFDownloader->new({logFile=>$logFile,
                                               link=>$link,
                                               dir=>$dirName,
                                               fileName=>$fileName});
              #start to download.
              $worker->start();

              The log information is written to the log file, or shown on
              screen if no log file is given. Each log line starts with one
              of these flags, followed by the corresponding link:

              DONE : The download finished successfully.
              NOT_OPEN_MED : Cannot open the MEDLINE (PubMed) page.
              NOT_OPEN_PUB : Cannot open the publisher site.
              NO_LINK : The given link has no full-text link-out.
              NOT_FOUND_OR_ALLOWED : The PDF cannot be found, or the user
                                     does not have the right to view the
                                     full text.


=head1 DESCRIPTION

This module downloads the full-text PDF of an article from the publisher's
website, starting from its PubMed entry, when full text is available.

=head1 Attributes

              link:  The PubMed link for an article (required).
              logFile: The log file name. If empty, the information is
                       shown on screen.
              dir:   The directory where the PDF file is saved
                     (default: the current directory).
              fileName: The name prefix of the target PDF file. The
                        downloaded file is named fileName.pdf
                        (default: a random number).


=head1 FEEDBACK

=head2 Reporting Bugs

Report bugs to yguo@vbi.vt.edu.


=head1 AUTHORS

Yongjian Guo @ Virginia Bioinformatics Institute.

=head1 COPYRIGHT

Copyright (c) 2004 Virginia Bioinformatics Institute. All Rights Reserved.

This module is free software; you can redistribute it and/or modify
it under the same terms as Perl itself.

=head1 DISCLAIMER

This software is provided "as is" without warranty of any kind.

=cut


package PDFDownloader;


use strict;
use LWP::UserAgent;
use HTTP::Cookies;

#Create a PDFDownloader object. The parameter is a hash reference;
#its required entry is "link", a PubMed article URL such as:
#http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=8931319


sub new{
    my $class=shift;
    my $para=shift;
    die "PDFDownloader->new: the 'link' entry is required\n"
	unless defined $para && $para->{link};
    my %op=();
    $op{keep_alive}=1;
    $op{agent}="Mozilla/5.0";
    $op{timeout}=20;
    $op{cookie_jar}=HTTP::Cookies->new(file => "cookies.txt");
    my $self=bless{
	logFile=>$para->{logFile} || "",
	link=>$para->{link}, #the PubMed link to start from.
	dir=>$para->{dir} || ".",
	fileName=>$para->{fileName} || rand(),
	base=>"", #base URL of the last page fetched, used for relative links.
	fp=>LWP::UserAgent->new(%op),
    }, $class;
    return $self;
}


#function to start the search-and-download process.

sub start{
    my $self=shift;
    my $data=$self->_getLinkContent($self->{link});
    if(length($data)==0){
	$self->_log("NOT_OPEN_MED\t".$self->{link});
	return;
    }
    #we got the PubMed page; look for a full-text link-out.
    if($data=~/href=\".*?db=pubmed\&url=(.*?)\"\s+/){
	#a link-out URL pointing to the publisher site.
	$self->_parsePubSite($1);

    }elsif($data=~/href=\"(.*?articlerender.*?)\"\s+/){
	#full text deposited directly (an articlerender link).
	$self->_parsePubSite($1);
    }else{
	$self->_log("NO_LINK\t".$self->{link});
    }

}


#function to parse the first page on the publisher site.

sub _parsePubSite{
    my($self, $link)=@_;
    my $data=$self->_getLinkContent($link);
    my ($pos, $pos2, $tmpString, $result, @array);
    $result=-1; #initial is negative.
    if(length($data)==0){
	$self->_log("NOT_OPEN_PUB\t".$link);
	return;
    }
    #find the PDF string,
    $data=~s/\n//g;
    $data=~s/&nbsp;/ /g;
    #first, check whether the page links to a .pdf file directly.
    $tmpString=$self->_getPDFLink("", $data);
    if(length($tmpString)!=0){
	#found the link,
	$result=$self->_tryGetPDF($tmpString);
	if($result==1){
	    return;
	}
    }

    if($data=~/([\s(>]pdf[)\s<])/ ||
       $data=~/([\s(>]PDF[)\s<"])/ ){
       #two possibilities:
       # 1. a link,
       # 2. a javascript call.
       $pos=index($data, $1);
       #find the nearest "href=" before the PDF label.
       $pos2=rindex(substr($data, 0, $pos), "href=");
       $tmpString=substr($data, $pos2, $pos-$pos2);
       if($tmpString=~/\"(.*?)\"/){
           $tmpString=$1;
       }
       #further extraction (e.g. a URL quoted inside javascript),
       if($tmpString=~/\'(.*?)\'/){
           $tmpString=$1;
       }
       #here $tmpString is the next link to try.
       $result=$self->_tryGetPDF($tmpString);
       if($result==-1){
           #no success; look for PDF links on that page's sub-links.
           @array=$self->_getSubLinks($tmpString);
           foreach my $entry (@array){
               if($self->_tryGetPDF($self->_getPDFLink($entry))==1){
                   $result=1; #success,
                   last;
               }
           }
       }
    }
    #further try,
    if($result!=1){
       #it is possible that the direct link is a frame,
       @array=$self->_getSubLinks($link);
       foreach my $entry (@array){
	 if($self->_tryGetPDF($self->_getPDFLink($entry))==1){
	   $result=1; #success,
	   last;
	 }
       }
       if($result!=1){
	 $self->_log("NOT_FOUND_OR_ALLOWED\t".$self->{link});
       }
     }
}


sub _tryGetPDF{
    my($self, $link)=@_;
    my $result=$self->_getPDFFile($link);
    if($result==1){
	$self->_log("DONE\t".$self->{link});
    }
    return $result;
}

#given a web page, use this one to get all of the links in that page.

sub _getSubLinks{
    my ($self, $link)=@_;
    my $data=$self->_getLinkContent($link);
    my @array=();
    my $pos=0;
    my $pos2=0;
    my $tmp="";
    my $count=0;
    while(1){
	$pos=index($data, "\"", $pos);
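	#the counter bounds the scan (a page may have no real links);
	#pages with more than 50 quoted strings are given up on below.
	if($pos==-1 || $count>50){
	    last;
	}
	$pos2=index($data, "\"", $pos+1);
	last if $pos2==-1; #unmatched quote, nothing more to extract
	$tmp=substr($data, $pos+1, $pos2-$pos-1);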
	if($tmp=~/^(http)|\//){
	    push(@array, $tmp);
	}
	$pos=$pos2+1;
	$count++;
    }
    if($count>=50){
	@array=();
    }
    return @array;
}


#function to return a pdf file link from a webpage.

sub _getPDFLink{
    my($self, $link, $data)=@_;
    if(length($link)!=0){
	$data=$self->_getLinkContent($link);
	$data=~s/\n//g;
    }
    if($data=~/.*["'](.{5,}\.pdf)["']/ ||
       $data=~/.*["'](.{5,}\.PDF)["']/ ){
	#ok, there is pdf file.
	return $1;
    }
    return "";  #not found.
}


#function to fetch a page's content. Refresh-header redirects are followed.

sub _getLinkContent{
    my ($self, $link)=@_;
    if($link!~/http:\/\//){  #some link has the format of http:/www..
	my $pos=index($link, "/");
	$link=substr($link, $pos);
    }
    if($link!~/^http/){
	$link=$self->_buildURL($link);
    }

    $link=~s/&amp;/&/g; #decode &amp; entities copied out of the HTML
    my $response=$self->{fp}->get($link);
    my $rHeader="";
    #ok, we need to analysis the header. to see if there is a refresh,
    #if yes, we will refresh the link,
    if($response->is_success()){
	$rHeader=$response->header("Refresh");
	if(defined $rHeader && length($rHeader)>0){
	    if($rHeader=~/URL\=(.*)/i){
		return $self->_getLinkContent($1);
	    }
	}else{
	    #update the base url.
	    $self->{base}=$response->base();
	    return $response->content;
	}
    }
    return "";

}

#download the actual PDF file. Refresh-header redirects are followed.

sub _getPDFFile{
    my($self, $link)=@_;
    if($link!~/^http/){
	$link=$self->_buildURL($link);
    }
    $link=~s/&amp;/&/g; #decode &amp; entities copied out of the HTML
    my $done=0;
    my $fileName=$self->{dir}."/".$self->{fileName}.".pdf";
    #try to see if there is a refresh,
    my $response=$self->{fp}->get($link);
    if($response->is_success()){
	my $rHeader=$response->header("Refresh");
	if(defined $rHeader && $rHeader=~/URL\=(.*)/i){
	    return $self->_getPDFFile($1);
	}
    }

    $self->{fp}->get($link, ":content_file"=>$fileName);
    #now check whether the downloaded file really is a PDF; if it is,
    #we are done, if not, delete it and report failure.
    open(my $pdfIn, '<', $fileName) or return -1; #nothing was saved
    binmode $pdfIn;
    while(<$pdfIn>){
	if($_=~/^%PDF/){
	    $done=1;
	    last;
	}
    }
    close $pdfIn;
    if($done==0){
	unlink $fileName;
	return -1;
    }
    return 1; #everything ok,

}

#function to record the log

sub _log{
    my($self, $data)=@_;
    if(length($self->{logFile})==0){
	print $data,"\n";
	return;
    }
    open(my $logOut, '>>', $self->{logFile})
	or die "can not open the log file $self->{logFile} for writing: $!\n";
    print $logOut $data, "\n";
    close $logOut;
    return;
}

#function to build the full url.

sub _buildURL{
    my($self, $target)=@_;

    if($target=~/^\//){
	if($self->{base}=~/(http:\/\/.*?)\//){
	    return $1.$target;
	}
    }else{
	if($self->{base}=~/(http:\/\/.*)\//){
	    return $1."/".$target;
	}
    }
    return $target;
}


1;
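The `%PDF` test in `_getPDFFile` scans every line of the download. A minimal alternative sketch, assuming you only care whether the file *starts* with the PDF signature `%PDF-`, reads just the first kilobyte in binary mode. The helper name `looks_like_pdf` is made up for this example and is not part of the posted module; note it is stricter than the original, so a file with junk before the header would be rejected.

```perl
use strict;
use warnings;

# Hypothetical helper: return 1 if the file at $path begins with the PDF
# signature "%PDF-", 0 otherwise (including when it cannot be opened).
sub looks_like_pdf {
    my ($path) = @_;
    open(my $fh, '<:raw', $path) or return 0;
    # the signature sits in the first bytes, so one small read is enough
    my $n = read($fh, my $head, 1024);
    close $fh;
    return 0 unless $n;
    return $head =~ /\A%PDF-/ ? 1 : 0;
}
```

In `_getPDFFile` this would replace the `while(<$pdfIn>)` loop with `$done=looks_like_pdf($fileName);`.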

-------------- next part --------------
A non-text attachment was scrubbed...
Name: PDFDownloader.pm
Type: application/octet-stream
Size: 8962 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050121/a5eb3594/PDFDownloader.obj
From lstein at cshl.edu  Fri Jan 21 18:17:26 2005
From: lstein at cshl.edu (Lincoln Stein)
Date: Sat Jan 22 11:17:44 2005
Subject: [Bioperl-l] gff -> match/hsp in gbrowse
In-Reply-To: <1106322314.7583.27.camel@localhost>
References: <1106322314.7583.27.camel@localhost>
Message-ID: <200501211817.27073.lstein@cshl.edu>

You can continue to work in gff2, even when using bioperl 1.5.  
Alternatively, producing the GFF3 version of HSP alignments is a simple matter 
of replacing the target coordinates with the Target=XXXXXXX attribute, 
using the format described in the GFF3 spec.
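A minimal Perl sketch of that conversion. It assumes the GFF3 rule that `Target` holds `target_id start end [strand]` separated by spaces, so spaces in the target id (and the column-9 reserved characters `;`, `=`, `&`, `,` plus `%` itself) must be percent-encoded. The function name, its argument names and the `match_part` feature type are illustrative choices only; check the GFF3 spec for the authoritative syntax.

```perl
use strict;
use warnings;

# Hypothetical helper: format one HSP as a GFF3 line, putting the target
# coordinates into the Target attribute instead of extra columns.
sub gff3_hsp_line {
    my (%arg) = @_;
    # percent-encode reserved characters and whitespace in the target id,
    # because spaces separate the fields of the Target value
    (my $tid = $arg{target}) =~ s/([;=%&,\s])/sprintf('%%%02X', ord $1)/ge;
    my @target = ($tid, $arg{tstart}, $arg{tend});
    push @target, $arg{tstrand} if defined $arg{tstrand};
    my @attr = ("ID=$arg{id}");
    push @attr, "Parent=$arg{parent}" if defined $arg{parent};
    push @attr, 'Target=' . join(' ', @target);
    return join "\t", $arg{seqid}, $arg{source}, $arg{type},
        $arg{start}, $arg{end}, (defined $arg{score} ? $arg{score} : '.'),
        $arg{strand}, '.', join(';', @attr);
}
```

For example, `gff3_hsp_line(seqid => 'ctg123', source => 'blastn', type => 'match_part', start => 1050, end => 1500, score => '5.8e-42', strand => '+', id => 'hsp1', parent => 'match1', target => 'EST23', tstart => 1, tend => 451, tstrand => '+')` ends in `ID=hsp1;Parent=match1;Target=EST23 1 451 +`.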

Lincoln

On Friday 21 January 2005 10:45 am, Raoul Jean Pierre Bonnal wrote:
> Dear Community,
> today I have upgraded my bioperl installation to 1.5.0-rc2.
> How can I configure my GBrowse db.conf to display match/hsp from
> myfile.gff ( default bioperl 1.5.0-rc2 format ) ?
> GBrowse's tutorial describes the configuration of the previous
> format and it doesn't work for gff3.
>
> Is it possible to filter hsp for every match by rank or score from
> GBrowse's db.conf file ? Can you post a working example, plez?
>
>
> tnx in advance.
>
> by RJP
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724

NOTE: Please copy Sandra Michelsen <michelse@cshl.edu> on
all emails regarding scheduling and other time-critical topics.
From colemanm at MIT.EDU  Sat Jan 22 16:07:12 2005
From: colemanm at MIT.EDU (Maureen L Coleman)
Date: Sat Jan 22 16:03:40 2005
Subject: [Bioperl-l] protal2dna and Bio::SimpleAlign
Message-ID: <9628A3F4-6CB9-11D9-8EF9-000A95E515DC@mit.edu>

Hi.
I'm trying to use the protal2dna script (downloaded from Pasteur site) 
to convert protein alignments back to DNA alignments. It works in some 
cases but not in others.  In the cases where it doesn't work, it pulls 
out the same sequence twice instead of pulling out seq1 and seq2 from 
my protein alignment.  Then when it tries to match it up with the 
corresponding DNA sequence, it doesn't work - it matches prot1 with 
dna1 (correctly) and prot1 with dna2 (incorrectly).

I suspect this might be related to the name,start,end (nse) method in 
Bio::SimpleAlign.  Any suggestions?

Thanks,
Maureen

From talcon at iastate.edu  Sat Jan 22 19:57:43 2005
From: talcon at iastate.edu (Tim Alcon)
Date: Sat Jan 22 19:53:54 2005
Subject: [Bioperl-l] bioperl-run for windows?
Message-ID: <41F2F687.4040303@iastate.edu>

Does a Windows version of bioperl-run exist?  If so, how do I get it?

Tim


From mlemieux at bioinfo.ca  Sat Jan 22 23:04:11 2005
From: mlemieux at bioinfo.ca (Madeleine Lemieux)
Date: Sat Jan 22 23:00:23 2005
Subject: what about the speed on longer seq? Re: [Bioperl-l] regular 
Message-ID: <D6A9C704-6CF3-11D9-BE31-000A95B139D2@bioinfo.ca>

Below is a test string seeded with 3 instances of inverted repeats  
flanked by direct repeats and some code to find all such patterns. It's  
not as flexible as the EMBOSS palindrome finder nor is it a one-liner  
but it finds perfect inverted repeats fast.

HTH,
Madeleine

---------------
#!/usr/bin/perl -w

my $test_string =  
"GAAAATGGTTTAATCGGAAATTGAGTAGGAGGATAAAAGTCGCATGCTATTATAAATGAGATGCACTTTC
GACACCTCGCGGAAGTATATAAATGAAAGAAGCCCTCAGAAAACTTTAAATTGGAAATAGAGGGAAAATT
ACTGATGGTTGAAATCAGACCAAAATGGGATTGAAAGAGCCTTTCAGCCCTAGTGTGAGTGTCAGGTTTA
acgtgggtttatctcaaacccacgtCTCTTGTTGAAATCAGACCAAAATGGGATTGAAAGGTTTGTTAAGGG 
CTTTGATTTGCTCCTCGGTGGCT
CTGGTTGAAATCAGACCAAAATGGGATTGAAAGTAAAGCAGTTCACCCCTGTTACTGGTTTAACTGCCTT
GTTGAAATCAGACCAAAATGGGATTGAAAGGTATTTGAATCAATGAAAAGAAATCTTACCTCGTCGTTGA
AATCAGACCAAAATGGGATTGAAAGAGTCTTCTGGATGGGTCACAAGGGAGACATCGAGGCGTTGAAATC
AGACCAAAATGGGATTGAAAGTCAGCAAGGTTACGTCGGAGATCCTCGAAGAGGGTATCAGTTGAAATCA
GACCAAAATGGGATTGAAAGCGAGGATTGCTGCCAAAGAGAGCGCCTCGTTCTTCGGTTGAAATCAGACC
AAAATGGGATTGAAAGAAAGTGAACATGCTTAAAGAAATGCTGACAGAAATTGAGTTGAAATCAGACCAA
AATGGGATTGAAAGAGCGAGGAAGAGCTTGACGAATTCTTCAAAAGCGGAGTTGAAATCAGACCAAAATG
GGATTGAAAGTTGCATTTACATCGGCAGAATTGGTCTCGTCGGAAGGCATGTTGAAATCAGACCAAAATG
tttaatatcaaAGCATgggaaaggatattCCAAaatatcctttcccGCATacatataccataGGATTGAAAG 
CGGTTCTCTTACGTACTCATGCGAGAAGTGAGACTCGCGTTGGTTGAAATCAGACCAAAA
TGGGATTGAAAGAGCAAGTCGTGAAACTGAGCAGTCAAAACAGATCGTTAGTTGAAATCAGACCAAAATG
GGATTGAAAGTTTTCCCATACAATTACGACTTCGCCGGAAAAAAAGTTGAAATCAGACCAAAATGGGATT
GAAAGAGCGAGTTCGACCACGTCGTAGGTCTGCTGTCGGCAAGTTGAAATCAGACCAAAATGGGATTGAA
AGTGTTTGAAGTAGTTGAATACACCGTTGTGCTGTTTGTTGTTGAAATCAGACCAAAATGGGATTGAAAG
AGAGGGAGTATTAGGGCCATACTGGCCGGAGTTGTGGTTGTTGAAATCAGACCAAAATGGGATTGAAAGA
TTCCAAATTGCGGAAAAAGATTCGAGGGCAGTTACTTCCCGTTGAAATCAGACCAAAATGGGATTGAAAG
ccttgtgtacacccttACGTCGTTTATTGCCGTAACGCTAACACCATACTCAAGAGTTGAAATCAGACCAAA 
ATGGGATTGAAAGA
AAGCCGTCCAGCGATTGTTTTCATCCGCACCGATAATAGGTTGAAATCAGACCAAAATGGGATTGAAAGG
GTTTAGACTTCCAGCAGGTAAGACATTCAAGGTTCGTTGAAATCAGACCAAAATGGGATTGAAAGGAGGT
AATAGCTGCGAGGGTCAAGCAGGTTTACGAGAAGTTGAAATCAGACCAAAATGGGATTGAAAGGAGCAAT";

# arbitrarily insist on direct and inverted repeats at least 4 bases long
while ( (length $test_string) > 15 ) {
     $seq = lc $test_string;
     # find direct repeats and work on the sequence between them
     $seq =~ m/([acgt]{4,})(?=([acgtn]+)\1)/;
     my $direct = $1;
     my $middle_stuff = my $reverse_complement = $2;
     if ($direct && $middle_stuff) {
         $reverse_complement = reverse $reverse_complement;
         $reverse_complement =~ tr/acgtn/tgcan/;
         my $inverted = "";
         my $char = "";
         # starting from the position next to the direct repeat, build up a string
         # from the matching characters of the original sequence and its rev_compl
         # don't bother looking past mid_point of string
         my $mid_point = (length $middle_stuff) / 2;
         while ( ((length $middle_stuff) > $mid_point) &&
                 (($char = chop $middle_stuff) eq (chop $reverse_complement)) ) {
             $inverted = $inverted . $char;
         }
         if ( (length $inverted) > 3) {
             if ($inverted =~ m/n/) {
                 print "possible inverted repeat found: $inverted\nbetween $direct\n";
             } else {
                 print "inverted repeat found: $inverted\nbetween $direct\n";
             }
             print "substring length = ", length $test_string, "\n\n";
#            last;
         }
         # step through the original string from the 2nd position of the
         # current direct repeat
         $seq =~ m/$direct/g;
         my $newstart = pos($seq) - (length $direct) + 1;
         $test_string = substr $test_string, $newstart;
     } else {
         last;
     }
}

From ch01ph14 at uohyd.ernet.in  Sat Jan 22 23:11:51 2005
From: ch01ph14 at uohyd.ernet.in (Sunil Kumar Panigahi)
Date: Sat Jan 22 23:07:57 2005
Subject: [Bioperl-l] Perl Script for Hydrogen Bonding
Message-ID: <1049.202.41.85.161.1106453511.squirrel@uohmail.uohyd.ernet.in>


Hi,

Can anybody provide me with a script for hydrogen bonding? I want to
calculate the hydrogen bonds in a PDB (Protein Data Bank) file.

Thanks in advance

Sunil
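A crude Perl sketch of the usual first step: list N/O atom pairs from different residues that lie within a distance cutoff (3.5 Angstrom is assumed here) in the ATOM/HETATM records, using the fixed PDB columns (atom name 13-16, residue name 18-20, chain 22, residue number 23-26, x/y/z 31-54). This is only a geometric proxy: it guesses N/O from the atom name, ignores angles and hydrogen positions, and will also report pairs such as backbone N and O of neighbouring residues. The name `candidate_hbonds` is hypothetical; a dedicated program is the better tool for real hydrogen-bond assignment.

```perl
use strict;
use warnings;

# Hypothetical helper: given PDB lines, return [name1, res1, name2, res2, dist]
# for every N/O pair from different residues closer than $cutoff Angstrom.
sub candidate_hbonds {
    my ($lines, $cutoff) = @_;
    $cutoff = 3.5 unless defined $cutoff;
    my @atoms;
    for my $line (@$lines) {
        next unless $line =~ /^(?:ATOM  |HETATM)/;
        (my $name = substr($line, 12, 4)) =~ s/\s+//g;
        next unless $name =~ /^[NO]/;    # crude: element guessed from atom name
        push @atoms, {
            name => $name,
            # residue key = residue name + chain + residue number
            res  => substr($line, 17, 3) . substr($line, 21, 1) . substr($line, 22, 4),
            xyz  => [ map { substr($line, $_, 8) + 0 } 30, 38, 46 ],
        };
    }
    my @pairs;
    for my $i (0 .. $#atoms - 1) {
        for my $j ($i + 1 .. $#atoms) {
            next if $atoms[$i]{res} eq $atoms[$j]{res};
            my $d2 = 0;
            $d2 += ($atoms[$i]{xyz}[$_] - $atoms[$j]{xyz}[$_])**2 for 0 .. 2;
            push @pairs, [ $atoms[$i]{name}, $atoms[$i]{res},
                           $atoms[$j]{name}, $atoms[$j]{res}, sqrt $d2 ]
                if $d2 <= $cutoff**2;
        }
    }
    return @pairs;
}
```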


From rob at salmonella.org  Sat Jan 22 23:22:12 2005
From: rob at salmonella.org (Rob Edwards)
Date: Sat Jan 22 23:19:40 2005
Subject: what about the speed on longer seq? Re: [Bioperl-l] regular 
In-Reply-To: <D6A9C704-6CF3-11D9-BE31-000A95B139D2@bioinfo.ca>
References: <D6A9C704-6CF3-11D9-BE31-000A95B139D2@bioinfo.ca>
Message-ID: <5AFC029A-6CF6-11D9-A47D-000A959E1622@salmonella.org>

There is also a module I wrote  a while back to go into Bio::Tools that 
will find direct and indirect, exact and some imperfect repeats. I have 
not benchmarked this against other sequences, it does what I need and 
slow is not always bad (it gives you time for coffee...)

You can pass in a sequence object and get back a sequence object with 
the repeats annotated in (so that you can just write them out or get 
their sequences or other bioperly kinds of things).

There is one dependency on Tie::RefHash.

YMMV, but take a look: http://salmonella.org/bioperl/RepeatFinder.pm

Rob

From talcon at iastate.edu  Sun Jan 23 15:02:18 2005
From: talcon at iastate.edu (Tim Alcon)
Date: Sun Jan 23 15:44:40 2005
Subject: [Bioperl-l] bioperl-run for windows?
In-Reply-To: <F75705B9-6D49-11D9-9F52-000393C44276@duke.edu>
References: <41F2F687.4040303@iastate.edu>
	<F75705B9-6D49-11D9-9F52-000393C44276@duke.edu>
Message-ID: <41F402CA.3020905@iastate.edu>

If I just grab it off CPAN, will it work on Windows, or does it use Unix 
system calls?

Tim


Jason Stajich wrote:

>  Is there a PPM on the bioperl site?
>   No
>
>  Can you install bioperl-run on windows?
>   Yes - but you'll have to do it manually, or learn how to build PPMs 
> (quite simple really), or encourage someone to produce a PPM for 
> bioperl-run.
>
> -jason
> On Jan 22, 2005, at 7:57 PM, Tim Alcon wrote:
>
>> Does a Windows version of bioperl-run exist?  If so, how do I get it?
>>
>> Tim
>>
>>
>>
>>
> -- 
> Jason Stajich
> jason.stajich at duke.edu
> http://www.duke.edu/~jes12/
>
>

From jason.stajich at duke.edu  Sun Jan 23 09:20:43 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Sun Jan 23 16:13:19 2005
Subject: [Bioperl-l] bioperl-run for windows?
In-Reply-To: <41F2F687.4040303@iastate.edu>
References: <41F2F687.4040303@iastate.edu>
Message-ID: <F75705B9-6D49-11D9-9F52-000393C44276@duke.edu>

  Is there a PPM on the bioperl site?
   No

  Can you install bioperl-run on windows?
   Yes - but you'll have to do it manually, or learn how to build PPMs 
(quite simple really), or encourage someone to produce a PPM for 
bioperl-run.

-jason
On Jan 22, 2005, at 7:57 PM, Tim Alcon wrote:

> Does a Windows version of bioperl-run exist?  If so, how do I get it?
>
> Tim
>
>
>
>
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

From jason.stajich at duke.edu  Sun Jan 23 09:19:13 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Sun Jan 23 16:13:26 2005
Subject: [Bioperl-l] protal2dna and Bio::SimpleAlign
In-Reply-To: <9628A3F4-6CB9-11D9-8EF9-000A95E515DC@mit.edu>
References: <9628A3F4-6CB9-11D9-8EF9-000A95E515DC@mit.edu>
Message-ID: <C17FE5DD-6D49-11D9-9F52-000393C44276@duke.edu>

I'm not familiar with the script.

Bio::Align::Utilities does protein to DNA mapping for an alignment with 
the aa_to_dna_aln function.
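For reference, the Bio::Align::Utilities route is, as far as I know, `aa_to_dna_aln($protein_aln, \%dna_seqs)` with the DNA `Bio::Seq` objects keyed by sequence id (check the module POD for your version). The idea behind it can be shown in plain Perl: walk each gapped protein row and copy the next codon of the ungapped coding sequence for every residue, writing `---` for every gap. This sketch is not the Bioperl implementation; `thread_codons` is a hypothetical name, and it assumes the CDS has at least 3 nt per residue with no frameshifts.

```perl
use strict;
use warnings;

# Hypothetical helper: map one gapped protein row back onto its coding DNA.
sub thread_codons {
    my ($gapped_protein, $cds) = @_;
    my ($out, $pos) = ('', 0);
    for my $aa (split //, $gapped_protein) {
        if ($aa eq '-' || $aa eq '.') {
            $out .= '---';                    # alignment gap -> codon gap
        } else {
            die "coding sequence too short\n" if $pos + 3 > length $cds;
            $out .= substr($cds, $pos, 3);    # next codon for this residue
            $pos += 3;
        }
    }
    return $out;    # any leftover nucleotides (e.g. a stop codon) are dropped
}
```

For example, `thread_codons('M-K', 'ATGAAA')` gives `ATG---AAA`. Because each row is matched to its DNA by name, duplicate sequence names in the protein alignment are one thing worth ruling out.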

-jason
On Jan 22, 2005, at 4:07 PM, Maureen L Coleman wrote:

> Hi.
> I'm trying to use the protal2dna script (downloaded from Pasteur site) 
> to convert protein alignments back to DNA alignments. It works in some 
> cases but not in others.  In the cases where it doesn't work, it pulls 
> out the same sequence twice instead of pulling out seq1 and seq2 from 
> my protein alignment.  Then when it tries to match it up with the 
> corresponding DNA sequence, it doesn't work - it matches prot1 with 
> dna1 (correctly) and prot1 with dna2 (incorrectly).
>
> I suspect this might be related to the name,start,end (nse) method in 
> Bio::SimpleAlign.  Any suggestions?
>
> Thanks,
> Maureen
>
>
>
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

From jason.stajich at duke.edu  Sun Jan 23 15:11:07 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Sun Jan 23 16:14:21 2005
Subject: [Bioperl-l] bioperl-run for windows?
In-Reply-To: <41F402CA.3020905@iastate.edu>
References: <41F2F687.4040303@iastate.edu>
	<F75705B9-6D49-11D9-9F52-000393C44276@duke.edu>
	<41F402CA.3020905@iastate.edu>
Message-ID: <EAB5A722-6D7A-11D9-9F52-000393C44276@duke.edu>

It uses perl system calls to execute a program - usually just the 
backticks approach.

Honestly I have no idea how much success you will have.  If you use 
Cygwin it will probably do okay for most programs, but I have never 
made any attempt to run any of it myself under windows. You'll have to 
see if any other list members have experiences.  We tried to make the 
code flexible (adding .exe to the names of executables, etc) when the 
program is being run on windows.
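A minimal sketch of that pattern, assuming a native Windows Perl reports `$^O` as `MSWin32` (Cygwin's Perl reports `cygwin`). `exe_name` and `run_and_capture` are hypothetical helpers for illustration, not functions from bioperl-run.

```perl
use strict;
use warnings;

# Hypothetical helper: append .exe to a program name on native Windows Perl.
sub exe_name {
    my ($prog) = @_;
    return ($^O eq 'MSWin32' && $prog !~ /\.exe\z/i) ? "$prog.exe" : $prog;
}

# Hypothetical helper: run a command with backticks and return its output,
# dying if it could not be started or exited with a non-zero status.
sub run_and_capture {
    my ($cmd) = @_;
    my $output = `$cmd`;
    die "could not run '$cmd': $!\n" if $? == -1;
    die "'$cmd' exited with status ", ($? >> 8), "\n" if $?;
    return $output;
}
```

Usage would look like `run_and_capture(exe_name('clustalw') . ' -infile=seqs.fa')`; the program name and options here are placeholders.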

You'll just have to give it a try and report in with problems.

-jason

On Jan 23, 2005, at 3:02 PM, Tim Alcon wrote:

> If I just grab it off CPAN, will it work on Windows, or does it use 
> Unix system calls?
>
> Tim
>
>
>
> Jason Stajich wrote:
>
>>  Is there a PPM on the bioperl site?
>>   No
>>
>>  Can you install bioperl-run on windows?
>>   Yes - but you'll have to do it manually, or learn how to build PPMs 
>> (quite simple really), or encourage someone to produce a PPM for 
>> bioperl-run.
>>
>> -jason
>> On Jan 22, 2005, at 7:57 PM, Tim Alcon wrote:
>>
>>> Does a Windows version of bioperl-run exist?  If so, how do I get it?
>>>
>>> Tim
>>>
>>>
>>>
>>>
>> -- 
>> Jason Stajich
>> jason.stajich at duke.edu
>> http://www.duke.edu/~jes12/
>>
>>
>
>
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

From nathanhaigh at ukonline.co.uk  Sun Jan 23 16:43:55 2005
From: nathanhaigh at ukonline.co.uk (Nathan Haigh)
Date: Sun Jan 23 16:40:43 2005
Subject: [Bioperl-l] bioperl-run for windows?
In-Reply-To: <41F2F687.4040303@iastate.edu>
Message-ID: <!~!UENERkVCMDkAAQACAAAAAAAAAAAAAAAAABgAAAAAAAAA+EsXQZcrCEGeBpZF7/IE7sKAAAAQAAAAcA0Zs/q6ikCwvbbULbghKAEAAAAA@ukonline.co.uk>

If you mean does a ppm version of bioperl-run exist, I don't think it does. When bioperl 1.5 is released I plan to make a ppd file
for both bioperl-1.5 and bioperl-run available so people can install them easily under windows.

You can however, get the bioperl-run 1.4 file from the bioperl website: http://www.bioperl.org/Core/Latest/index.shtml OR get the
latest CVS version from the bioperl website: http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-run/?cvsroot=bioperl

To install it, you will need nmake 1.5: http://download.microsoft.com/download/vc15/Patch/1.52/W95/EN-US/Nmake15.exe

Unpack the downloaded bioperl file, and run from within that directory:
"perl Makefile.PL"
"nmake test"
"nmake install"

If all this sounds too difficult, wait a couple of weeks for the ppm version of v1.5 to become available.

Nathan

> -----Original Message-----
> From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Tim Alcon
> Sent: 23 January 2005 00:58
> To: bioperl-l@portal.open-bio.org
> Subject: [Bioperl-l] bioperl-run for windows?
> 
> Does a Windows version of bioperl-run exist?  If so, how do I get it?
> 
> Tim
> 
> 



From jason.stajich at duke.edu  Sun Jan 23 21:55:47 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Sun Jan 23 21:51:50 2005
Subject: [Bioperl-l] 1.5.0 release
Message-ID: <72D911D2-6DB3-11D9-9F52-000393C44276@duke.edu>

If there are some more outstanding commits before we roll 1.5.0 out, 
please let me know.  Otherwise I'll tag and release the 1.5.0 tarball 
Monday.  This is a developer's release so it does not have to be 
completely perfect.

-jason
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

From Marc.Logghe at devgen.com  Mon Jan 24 05:05:12 2005
From: Marc.Logghe at devgen.com (Marc Logghe)
Date: Mon Jan 24 05:01:31 2005
Subject: [Bioperl-l] struggling with Bio::FeatureIO and
	Bio::SeqFeature::Annotated
Message-ID: <BEE28BF86078B6429D6C780635718E21905114@morelia.be.devgen.com>

Hi all,
I have some problems with Bio::FeatureIO and Bio::SeqFeature::Annotated. But maybe these modules are not designed for the things I had in mind.
My initial goal seemed pretty straightforward. It turned out differently.
I have a gff file containing features of a bunch of bioentries sitting in BioSQL.
I wanted to turn the gff into feature objects, add them to the bioentries, and save them back into the database.
As a test I fetch a genbank record, strip the features and convert them to gff. The gff is again converted to features and added to the stripped seq object.
The test script looks like this:
========================================================
#!/usr/bin/perl
use strict;
use Bio::SeqIO;
use Bio::Tools::GFF;
use Bio::FeatureIO;
use IO::String;
use Bio::DB::GenBank;

use Data::Dumper;

*Bio::SeqFeature::Annotated::all_tags = \*Bio::SeqFeature::Annotated::get_all_tags;

my $gff;
my $gffio = IO::String->new($gff);

my $db = Bio::DB::GenBank->new;
my $sout = Bio::SeqIO->new(-fh => \*STDOUT, -format => 'genbank');
my $seq = $db->get_Seq_by_acc('Z50755');

my @feat = $seq->remove_SeqFeatures;

# writing option 1
my $fout = Bio::Tools::GFF->new(-fh => $gffio, -gff_version => 3);
# writing option 2
my $fout = Bio::FeatureIO->new(-fh => $gffio, -format => 'gff', -version => 3);

$fout->write_feature(@feat);

$gffio = IO::String->new($gff);

my $fin = Bio::FeatureIO->new(-fh => $gffio, -format => 'gff', -version => 3);

while (my $feat = $fin->next_feature)
{
 $seq->add_SeqFeature($feat);
}
print Data::Dumper->Dump([$seq],['seq']);

$sout->write_seq($seq);
========================================================

First, I had an issue when writing the features to gff using Bio::FeatureIO (writing option 2):

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: only Bio::SeqFeature::Annotated objects are writeable
STACK: Error::throw
STACK: Bio::Root::Root::throw /home/marcl/src/bioperl/bioperl-live/Bio/Root/Root.pm:328
STACK: Bio::FeatureIO::gff::write_feature /home/marcl/src/bioperl/bioperl-live/Bio/FeatureIO/gff.pm:259
STACK: ./test.pl:25
-----------------------------------------------------------

Therefore, I used Bio::Tools::GFF to write (writing option 1). But then I ran into trouble when it came to dumping the sequence into genbank format:
Can't locate object method "all_tags" via package "Bio::SeqFeature::Annotated" at /home/marcl/src/bioperl/bioperl-live/Bio/SeqIO/FTHelper.pm line 212, <GEN1> line 52.

I tried to fix this by adding the line
*Bio::SeqFeature::Annotated::all_tags = \*Bio::SeqFeature::Annotated::get_all_tags;
 
But in vain:
Can't locate object method "get_all_tags" via package "Bio::Annotation::Collection" at /home/marcl/src/bioperl/bioperl-live/Bio/SeqFeature/Annotated.pm line 547, <GEN1> line 52.

Regards,
Marc


From grassi.e at virgilio.it  Mon Jan 24 06:55:15 2005
From: grassi.e at virgilio.it (grassi.e@virgilio.it)
Date: Mon Jan 24 06:51:20 2005
Subject: [Bioperl-l] Nearly OT question(s) - across databases
Message-ID: <415382EC00123E04@ims5c.cp.tin.it>

Hello everybody,

first of all I'd like to apologize for my poor English and the not very
"bioperlic" question.
I've got a list of ESTs from the Stanford database and I need to obtain
their UniGene cluster and possibly Gene ID (the Stanford database doesn't
supply this information for all the ESTs).
My question is: is there a quick way to do this using bioperl?
I'd prefer to download the databases that are needed rather than connecting
to them remotely, because that would be too time-consuming.
Since I usually use plain perl, I'm looking around the Entrez Gene databases
to understand the best way to get the data that I need; but I was wondering
if using bioperl would help me.

Thank you,
Elena Grassi


From sdavis2 at mail.nih.gov  Mon Jan 24 08:57:02 2005
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Mon Jan 24 08:53:07 2005
Subject: [Bioperl-l] Nearly OT question(s) - across databases
References: <415382EC00123E04@ims5c.cp.tin.it>
Message-ID: <001001c5021c$96ef4450$7d75f345@WATSON>

Elena,

By Stanford database, I assume you mean the Stanford SOURCE batch query web 
page?  If that is so, then you have already used the data available via 
Entrez.  SOURCE uses the unigene build from NCBI to map clones or genbank 
accessions to unigene and entrez gene.  In other words, using the entrez 
database will not help you get more information.  Unfortunately, it is not 
at all uncommon to have ESTs that do not map to a gene_id or unigene 
cluster, so those have to remain orphans.  This sounds like a 
microarray-type project, and if it is, what I tend to do is to find the ESTs 
that are "interesting" for followup and that are not annotated via other 
means and blast those against transcript libraries like refseq and ensembl 
transcripts to find the "best" match.  In some cases, this "best" match will 
not be very good, but in others it will be perfectly adequate to tell you 
what you are looking at.  So, in short, there are no other databases at 
NCBI that are likely to be helpful, and your best bet is to blast the 
remaining ESTs against refseq for your genome of interest.
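The last step, picking the "best" RefSeq match per EST, can be sketched in plain Perl. This assumes the ESTs were searched with NCBI blastall using tabular output (`-m 8`: twelve tab-separated columns, query id first and bit score last); `best_hits` is a hypothetical name. Ranking by bit score is one reasonable choice; e-value or percent identity would also work.

```perl
use strict;
use warnings;

# Hypothetical helper: read BLAST tabular output from $fh and keep the
# highest-scoring (by bit score) hit for each query.
sub best_hits {
    my ($fh) = @_;
    my %best;
    while (my $line = <$fh>) {
        next if $line =~ /^\s*(?:#|$)/;    # skip comments (-m 9) and blank lines
        chomp $line;
        my @f = split /\t/, $line;
        next unless @f >= 12;
        my ($query, $subject, $pid, $evalue, $bits) = @f[0, 1, 2, 10, 11];
        if (!$best{$query} || $bits > $best{$query}{bits}) {
            $best{$query} = { subject => $subject, pid => $pid,
                              evalue => $evalue, bits => $bits };
        }
    }
    return \%best;
}
```

ESTs missing from the returned hash had no hit at all and stay "orphans".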

Sean

----- Original Message ----- 
From: <grassi.e@virgilio.it>
To: "Bioperl (E-mail)" <bioperl-l@bioperl.org>
Sent: Monday, January 24, 2005 6:55 AM
Subject: [Bioperl-l] Nearly OT question(s) - across databases


> Hello everybody,
>
> first of all I'd like to apologize for my poor English and the not very
> "bioperlic" question.
> I've got a list of ESTs from the Stanford database and I need to obtain
> their UniGene cluster and possibly Gene ID (the Stanford database doesn't
> supply this information for all the ESTs).
> My question is: is there a quick way to do this using bioperl?
> I'd prefer to download the databases that are needed rather than connecting
> to them remotely, because that would be too time-consuming.
> Since I usually use plain perl, I'm looking around the Entrez Gene databases
> to understand the best way to get the data that I need; but I was wondering
> if using bioperl would help me.
>
> Thank you,
> Elena Grassi
>
>
> 


From jason.stajich at duke.edu  Mon Jan 24 09:54:35 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Mon Jan 24 09:50:41 2005
Subject: [Bioperl-l] 1.5.0 release
In-Reply-To: <00d701c5021f$23f57de0$7d75f345@WATSON>
References: <72D911D2-6DB3-11D9-9F52-000393C44276@duke.edu>
	<00d701c5021f$23f57de0$7d75f345@WATSON>
Message-ID: <DD083054-6E17-11D9-ABB8-000393C44276@duke.edu>

I'll make this on-list to hopefully clear some of this up.

Aside, I don't have a lot of time to debate the philosophy here.   I 
really never intended to be release weenie for this release so if folks 
have a strong opinion about how it should be done, please also consider 
being the next release-master.


1.5.0 (and all the release-candidates 1.5.0-RC1 and 1.5.0-RC2) are made 
off the HEAD of the CVS tree. Remember HEAD is the default branch you 
get when you do a CVS checkout.  You can type 'cvs log SOME-FILE' to 
see all the tags that have been applied to a file.  The webcvs - 
http://cvs.open-bio.org/ - also shows this graphically if you prefer.

So 1.5.0 will have all the changes made on the HEAD - this means all 
the changes since we branched for 1.4 sometime last year.  If we 
release another developer release before doing 1.6 I would think we'd 
still name it 1.5.1 (scenario B below) just organize things, but it 
might have a separate tag.  Scenario A  is what we do with stable 
releases where we make a branch tag and then all releases on that 
branch are derived from the 1.4 timepoint.

A)
--------------------------HEAD---->
            \
             1.4 tag (branch)
               \
                1.4.1 tag

B)
--------------------------HEAD---->
            \        \
             1.5   1.5.1


WRT specifics of the next releases. The idea is that 1.5.0 goes out and 
people play with it, but we haven't christened it as a 'stable' 
release, meaning not all the functionality in releases after 1.5.0 will 
behave the same.  We do pledge that when 1.6.0 is ready, it will be 
stable and around for a while, and IDEALLY not break any code you 
wrote against the 1.4.0 library.  Some recent commits to the HEAD of 
bioperl negate this so we will probably have them backed out of the 
stable release.

The developer release is just one step easier than CVS checking out so 
if you already run your code from a CVS checked out directory this 
isn't going to make your life better.  If you are unable to run things 
in-house unless they have an official "RELEASE" stamp, or if you can't 
use CVS, then 1.5.0 is for you.  If you want the latest functionality 
in order to use the latest GBrowse release, 1.5.0 will be for you.  If 
you are happy with your current Gbrowse setup, don't run this on your 
production server until you have tested things.    If you are very 
conservative, run an important pipeline of analysis in-house that 
relies on Bioperl, and it is currently working great - you DON'T need 
to update to 1.5.0.  In fact, don't do it in a production environment, 
but give the developer release a spin in a, well, development 
environment.

Most importantly, we want people to try out the release and really use 
it like they would in a production environment, then tell us what breaks. 
Doing this lets the 1.6.0 release really be what you hope and yearn 
for.... =)

A policy.
What I would like to do in the future when preparing for a stable 
release is start to branch early - say 1 to 2 months before - and 
require changes on the branch to be only bugfixing or Well-Thought Out, 
and in general no API changes that remove functionality, GainOfFunction 
(GOF) probably allowed.  Also remember the even numbered releases are 
stable releases, odd numbered are developer releases and don't go out 
to CPAN.  We don't make API changes on the branch so that 1.4.1 is 
completely compatible with 1.4.0.   I care less about GOF on the branch 
as long as it doesn't break anything.


Instructions and Mechanics of how to do all of this with CVS.

See this page:  http://bioperl.org/UserInfo/CVShelp.shtml

How do you get things on the different branches.
Let's see what branches are around:
[jason@lugano core]$ cvs log README

RCS file: /home/repository/bioperl/bioperl-live/README,v
Working file: README
head: 1.36
branch:
locks: strict
access list:
symbolic names:
         bioperl-release-1-5-0-rc2: 1.36
         bioperl-release-1-5-0-rc1: 1.36
         branch-1-4: 1.34.0.2
         bioperl-release-1-4-0: 1.34
         bioperl-devel-1-3-04: 1.34
         bioperl-devel-1-3-03: 1.34
         bioperl-devel-1-3-02: 1.33
         bioperl-devel-1-3-01: 1.33
         bioperl-release-1-2-3: 1.30.2.4
         bioperl-release-1-2-2: 1.30.2.3
         bioperl-run-release-1-2-0: 1.32
         bioperl-release-1-2-1: 1.30
         bioperl-1-2-1-rc1: 1.30
         branch-1-2-collection: 1.30.0.6
         bioperl-release-1-2-0: 1.30
         branch-1-2: 1.30.0.2
         bioperl-devel-1-1-1: 1.27
         bioperl-release-1-1-0: 1.23
         bioperl-release-1-0-2: 1.20.2.7
         bioperl-release-1-0-1: 1.20.2.7
         bioperl-release-1-0-0: 1.20.2.6
         bioperl-1-0-alpha2-rc: 1.20.2.1

We name branches with a 'branch-' prefix.  The releases have the word 
'release' in them.  Hopefully that is clear!
So if you check out from a branch tag, you get the most up-to-date 
code on that branch (any commits made on the branch after it was 
created are included), whereas a release tag gets you the code at a 
particular, fixed point in time.
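You can also tell branch tags from fixed tags in the `cvs log` output 
without relying on the naming convention: CVS records a branch tag as 
a "magic" revision number with 0 in the next-to-last position 
(1.34.0.2 for branch-1-4 above), while a fixed tag points at an 
ordinary revision such as 1.34 or 1.30.2.4.  The sketch below parses 
the "symbolic names:" section that way; classify_tags is a 
hypothetical helper, and it ignores vendor branches (e.g. 1.1.1).

```perl
use strict;
use warnings;

# Hypothetical helper: split the "symbolic names:" section of `cvs log`
# output into branch tags (magic revision with 0 as next-to-last field)
# and fixed tags (ordinary revisions).
sub classify_tags {
    my ($log_text) = @_;
    my (%branches, %fixed);
    my $in_names = 0;
    for my $line (split /\n/, $log_text) {
        if ($line =~ /^symbolic names:/) { $in_names = 1; next }
        next unless $in_names;
        # Tag lines are indented "name: revision"; anything else ends the list.
        my ($tag, $rev) = $line =~ /^\s+(\S+):\s+([\d.]+)\s*$/ or last;
        my @fields = split /\./, $rev;
        if (@fields >= 4 && $fields[-2] == 0) {
            $branches{$tag} = $rev;
        } else {
            $fixed{$tag} = $rev;
        }
    }
    return (\%branches, \%fixed);
}

my $sample = <<'END';
symbolic names:
         bioperl-release-1-5-0-rc2: 1.36
         branch-1-4: 1.34.0.2
         bioperl-release-1-2-3: 1.30.2.4
         branch-1-2: 1.30.0.2
keyword substitution: kv
END

my ($branches, $fixed) = classify_tags($sample);
print "branches: ", join(', ', sort keys %$branches), "\n";
print "fixed:    ", join(', ', sort keys %$fixed), "\n";
```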

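To get the 1.4 release branch (with a CVS account):
The command-line options we are using after 'checkout' are
-r BRANCH-NAME (or TAG-NAME)
-d DIRECTORY-NAME (the local directory to check out into)
REPOSITORY-NAME (the module to check out)
Note that the first -d, before 'checkout', is a global CVS option that 
names the repository (CVSROOT), not the checkout directory.
% cvs -d:ext:YOURNAME@pub.open-bio.org:/home/repository/bioperl \
    checkout -r branch-1-4 -d bioperl-1.4 bioperl-live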

If you want to do this via anonymous CVS (read-only access):
% cvs -d:pserver:cvs@cvs.open-bio.org:/home/repository/bioperl \
    checkout -r branch-1-4 -d bioperl-1.4 bioperl-live


Now, if you want to make a change on the branch, you HAVE to make it 
in the directory we checked out: "bioperl-1.4".  The checkout sets a 
sticky branch tag there, so when you check your changes in with a 
normal CVS commit, they go onto the branch.

If you want to merge your changes back onto the main trunk after you've 
made changes on the branch (or vice-versa; just flip-flop the directory 
names):
1. Check in your changes on the branch.
2. Go to the OTHER directory where the HEAD code is checked out (called 
'bioperl-live' in this example).  If you don't have one yet, check it 
out:
% cvs -d:ext:YOURNAME@pub.open-bio.org:/home/repository/bioperl \
    checkout bioperl-live
3. Do an update to merge the changes from the branch.  For example, to 
merge the branch changes in Bio/SeqIO/swiss.pm onto the HEAD:
% cd bioperl-live
% cvs update -j branch-1-4 Bio/SeqIO/swiss.pm
RUN THE TESTS
% perl -I. -w t/SeqIO.t
.... all tests pass ...
% cvs commit -m "merged changes from 1.4 branch regarding this-and-that" \
    Bio/SeqIO/swiss.pm


Done.  Reverse the directory and branch names to merge from the HEAD to 
the BRANCH:
% cd bioperl-1.4
% cvs update -j HEAD Bio/SeqIO/swiss.pm
% cvs commit -m "merged changes from HEAD to branch regarding this-and-that" \
    Bio/SeqIO/swiss.pm
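If you do this merge-test-commit cycle often, it can be scripted.  The 
sketch below is a hypothetical helper (merge_and_commit is not part of 
Bioperl or CVS) that runs the same three commands and stops at the 
first non-zero exit status.  It assumes exit status is a reliable 
signal, which may not hold: a merge conflict may not make 
`cvs update` fail, and older Test.pm-style test scripts may exit 0 
even when tests fail, so still read the output.

```perl
use strict;
use warnings;

# Hypothetical helper: merge one file from another line of development
# (a branch tag such as branch-1-4, or HEAD), run a test script, then
# commit. Stops at the first command with a non-zero exit status.
# With dry_run => 1 it only returns the commands it would run.
sub merge_and_commit {
    my (%args) = @_;
    for my $key (qw(from file test message)) {
        die "missing argument '$key'\n" unless defined $args{$key};
    }
    my @steps = (
        [ 'cvs',  'update', '-j', $args{from}, $args{file} ],
        [ 'perl', '-I.',    '-w', $args{test} ],
        [ 'cvs',  'commit', '-m', $args{message}, $args{file} ],
    );
    return @steps if $args{dry_run};
    for my $cmd (@steps) {
        # List-form system() bypasses the shell, so the commit message
        # needs no extra quoting.
        system(@$cmd) == 0
            or die "'@$cmd' failed; not continuing\n";
    }
    return @steps;
}

# Dry run of the HEAD-from-branch merge (run from bioperl-live for real).
my @cmds = merge_and_commit(
    from    => 'branch-1-4',
    file    => 'Bio/SeqIO/swiss.pm',
    test    => 't/SeqIO.t',
    message => 'merged changes from 1.4 branch regarding this-and-that',
    dry_run => 1,
);
print join(' ', @$_), "\n" for @cmds;
```

Swapping `from => 'HEAD'` and running it from the bioperl-1.4 
directory gives the reverse merge.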


Hope that helps some.
-jason


On Jan 24, 2005, at 9:15 AM, Sean Davis wrote:

> Jason,
>
> I'm sorry to bother, but what is the current CVS tag system with 
> regard to bioperl?  For example, if I do a CO on Monday of 
> bioperl-live, what do I get?  I have always worked from bioperl CVS 
> code, so just wanted to make sure that the tags weren't going to 
> change and, if so, what is going to be what.  I wasn't sure if this 
> should go to the list....
>
> Thanks,
> Sean
>
> ----- Original Message ----- From: "Jason Stajich" 
> <jason.stajich@duke.edu>
> To: <bioperl-l@bioperl.org>
> Sent: Sunday, January 23, 2005 9:55 PM
> Subject: [Bioperl-l] 1.5.0 release
>
>
>> If there are some more outstanding commits before we roll 1.5.0 out, 
>> please let me know.  Otherwise I'll tag and release the 1.5.0 tarball 
>> Monday.  This is a developer's release so it does not have to be 
>> completely perfect.
>>
>> -jason
>> --
>> Jason Stajich
>> jason.stajich at duke.edu
>> http://www.duke.edu/~jes12/
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l@portal.open-bio.org
>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
>
>
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

From glim at mycybernet.net  Mon Jan 24 00:08:16 2005
From: glim at mycybernet.net (Gerard Lim)
Date: Mon Jan 24 10:02:14 2005
Subject: [Bioperl-l] Yet Another Perl Conference North America 2005
	announces call-for-papers
Message-ID: <200501240008.16906.glim@mycybernet.net>

   YAPC::NA 2005 (Yet Another Perl Conference, North America) has just
   released its call-for-papers; potential and aspiring speakers can
   submit a presentation proposal via:

         http://yapc.org/America/cfp-2005.shtml

   The dates of the conference are Monday - Wednesday 27-29 June 2005.
   The location will be in downtown Toronto, Ontario, Canada. (Note that
   a different date block was previously announced, but has been moved to
   accommodate venue availability.)

   The close of the call-for-papers is April 18, 2005 at 11:59 pm.

   If you have any questions regarding the call-for-papers or speaking at
   YAPC::NA 2005 please email na-author@yapc.org

   We would love to hear from potential sponsors. Please contact the
   organizers at na-sponsor@yapc.org to learn about the benefits of
   sponsorship.

   Other information regarding the conference (e.g. venue, registration
   specifics) will be announced soon.

   We look forward to your submissions and a great conference!
From letondal at pasteur.fr  Mon Jan 24 10:28:27 2005
From: letondal at pasteur.fr (Catherine Letondal)
Date: Mon Jan 24 10:21:39 2005
Subject: [Bioperl-l] protal2dna and Bio::SimpleAlign
In-Reply-To: <C17FE5DD-6D49-11D9-9F52-000393C44276@duke.edu>
References: <9628A3F4-6CB9-11D9-8EF9-000A95E515DC@mit.edu>
	<C17FE5DD-6D49-11D9-9F52-000393C44276@duke.edu>
Message-ID: <98709246-6E1C-11D9-894E-000D93B0BD32@pasteur.fr>


On Jan 23, 2005, at 3:19 PM, Jason Stajich wrote:

> I'm not familiar with the script.

Web:
http://bioweb.pasteur.fr/seqanal/interfaces/protal2dna.html
Man:
http://bioweb.pasteur.fr/docs/man/man/protal2dna.1.html
Ftp:
ftp://ftp.pasteur.fr/pub/GenSoft/unix/alignment/protal2dna

>
> Bio::Align::Utilities does protein to DNA mapping for an alignment 
> with the aa_to_dna_aln function.

The problem with this function aa_to_dna_aln is that it is restricted 
to frame 1 and to the standard genetic code, right?
        aa_to_dna_aln

         Title   : aa_to_dna_aln
         Usage   : my $dnaaln = aa_to_dna_aln($aa_aln, \%seqs);
         Function: Will convert an AA alignment to DNA space given the
                   corresponding DNA sequences.  Note that this method expects
                   the DNA sequences to be in frame +1 (GFF frame 0) as it will
                   start to project into coordinates starting at the first base of
                   the DNA sequence, if this alignment represents a different
                   frame for the cDNA you will need to edit the DNA sequences
                   to remove the 1st or 2nd bases (and revcom if things should be).
         Returns : Bio::Align::AlignI object
         Args    : 2 arguments, the alignment and a hashref.
                   Alignment is a Bio::Align::AlignI of amino acid sequences.
                   The hash reference should have keys which are
                   the display_ids for the aa
                   sequences in the alignment and the values are a
                   Bio::PrimarySeqI object for the corresponding
                   spliced cDNA sequence.
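The frame restriction in the documentation above means the DNA must be 
prepared first.  A minimal sketch of that preparation, assuming the 
offset is counted on the strand that will be translated (i.e. after 
any reverse complement); to_frame_zero is a hypothetical helper, not a 
Bioperl method.

```perl
use strict;
use warnings;

# Hypothetical helper: bring a coding sequence into frame +1 (GFF frame 0)
# before giving it to aa_to_dna_aln. If $reverse is true the sequence is
# reverse-complemented first; then $offset (0, 1 or 2) leading bases are
# dropped. Only A/C/G/T/N are complemented; other IUPAC codes are left
# uncomplemented (a simplification).
sub to_frame_zero {
    my ($dna, $offset, $reverse) = @_;
    die "offset must be 0, 1 or 2\n"
        unless defined $offset && $offset =~ /^[012]$/;
    if ($reverse) {
        $dna = reverse $dna;
        $dna =~ tr/ACGTNacgtn/TGCANtgcan/;
    }
    return substr($dna, $offset);
}

print to_frame_zero('GATGAAA', 1, 0), "\n";   # ATGAAA (skip one base)
print to_frame_zero('TTTCATC', 1, 1), "\n";   # ATGAAA (revcom, then skip one)
```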


The other problem when using tools that offer several genetic codes 
(these sequences need the bacterial genetic code) is that the start 
codon is not translated correctly.  These sequences need GTG=M (and 
not V).
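The point is that an initiator codon needs special handling: GTG codes 
for V inside a gene, but should be read as M when it is the start 
codon.  Below is a minimal pure-Perl sketch (not Bioperl's translate(); 
the names codon_to_aa and translate_cds are made up) that uses the 
standard amino-acid table and treats the first codon as M if it is one 
of the initiators listed for NCBI table 11 (TTG, CTG, ATT, ATC, ATA, 
ATG, GTG).

```perl
use strict;
use warnings;

# Standard genetic code in TCAG order (NCBI table 1); table 11 (bacterial)
# has the same amino acids and differs only in which codons may start.
my $AAS   = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
my %BASE  = (T => 0, C => 1, A => 2, G => 3);
# Initiation codons listed for NCBI table 11.
my %START = map { $_ => 1 } qw(TTG CTG ATT ATC ATA ATG GTG);

sub codon_to_aa {
    my ($codon) = @_;
    my @b = map { $BASE{$_} } split //, uc $codon;
    return 'X' if @b != 3 || grep { !defined } @b;   # short or ambiguous
    return substr($AAS, 16 * $b[0] + 4 * $b[1] + $b[2], 1);
}

# Translate a complete CDS; the first codon becomes M if it is a valid
# table 11 initiator (the GTG=M behaviour wanted here).
sub translate_cds {
    my ($cds) = @_;
    my @codons  = unpack '(A3)*', uc $cds;
    my $protein = join '', map { codon_to_aa($_) } @codons;
    substr($protein, 0, 1) = 'M' if @codons && $START{ $codons[0] };
    return $protein;
}

print translate_cds('GTGAAATAA'), "\n";   # MK*
```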

>
> -jason
> On Jan 22, 2005, at 4:07 PM, Maureen L Coleman wrote:
>
>> Hi.
>> I'm trying to use the protal2dna script (downloaded from Pasteur 
>> site) to convert protein alignments back to DNA alignments. It works 
>> in some cases but not in others.  In the cases where it doesn't work, 
>> it pulls out the same sequence twice instead of pulling out seq1 and 
>> seq2 from my protein alignment.  Then when it tries to match it up 
>> with the corresponding DNA sequence, it doesn't work - it matches 
>> prot1 with dna1 (correctly) and prot1 with dna2 (incorrectly).
>>
>> I suspect this might be related to the name,start,end (nse) method in 
>> Bio::SimpleAlign.  Any suggestions?
>>
>> Thanks,
>> Maureen
>>
> --
> Jason Stajich
> jason.stajich at duke.edu
> http://www.duke.edu/~jes12/

From jason.stajich at duke.edu  Mon Jan 24 10:41:44 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Mon Jan 24 10:38:03 2005
Subject: [Bioperl-l] protal2dna and Bio::SimpleAlign
In-Reply-To: <98709246-6E1C-11D9-894E-000D93B0BD32@pasteur.fr>
References: <9628A3F4-6CB9-11D9-8EF9-000A95E515DC@mit.edu>
	<C17FE5DD-6D49-11D9-9F52-000393C44276@duke.edu>
	<98709246-6E1C-11D9-894E-000D93B0BD32@pasteur.fr>
Message-ID: <72EB0F5C-6E1E-11D9-ABB8-000393C44276@duke.edu>


On Jan 24, 2005, at 10:28 AM, Catherine Letondal wrote:

>
> On Jan 23, 2005, at 3:19 PM, Jason Stajich wrote:
>
>> I'm not familiar with the script.
>
> Web:
> http://bioweb.pasteur.fr/seqanal/interfaces/protal2dna.html
> Man:
> http://bioweb.pasteur.fr/docs/man/man/protal2dna.1.html
> Ftp:
> ftp://ftp.pasteur.fr/pub/GenSoft/unix/alignment/protal2dna
>
>>
>> Bio::Align::Utilities does protein to DNA mapping for an alignment 
>> with the aa_to_dna_aln function.
>
> The problem with this function aa_to_dna_aln is that  is restricted to 
> frame 1 and to the standard genetic code, right?
>        aa_to_dna_aln
>
This is an alignment mapper routine, not an alignment routine itself.  
So I think I was just being stupid and not looking at what protal2dna 
was really doing.

You provide it the protein multiple sequence alignment and the coding 
sequences which gave rise to it.  It maps the gaps back in so you have 
a CDS alignment.  It is a very basic iteration through the alignment.

So everything has to be in-frame and already spliced; it should have 
been called aa_to_cds_aln.

The method is intended for getting ready to do Ka/Ks-type analyses, so 
that the sequences are aligned on codon boundaries and with knowledge 
about conservative aa replacements.
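The gap-mapping idea described above can be sketched in a few lines 
of pure Perl.  This is an illustration of the technique, not Bioperl's 
actual aa_to_dna_aln code; aa_row_to_cds_row is a hypothetical name, 
and gap characters are assumed to be '-' or '.'.

```perl
use strict;
use warnings;

# Sketch of the mapping idea: walk one aligned protein row; for each
# residue copy the next codon of the in-frame, spliced CDS, and for each
# gap character emit a three-base gap.
sub aa_row_to_cds_row {
    my ($aligned_aa, $cds) = @_;
    my ($pos, $out) = (0, '');
    for my $ch (split //, $aligned_aa) {
        if ($ch eq '-' || $ch eq '.') {
            $out .= '---';
            next;
        }
        die "CDS too short for aligned protein\n" if $pos + 3 > length $cds;
        $out .= substr($cds, $pos, 3);
        $pos += 3;
    }
    return $out;
}

# Rows are keyed by sequence name, much as aa_to_dna_aln keys its hash
# of CDS sequences by display_id.
my %aa_aln = (seq1 => 'MKP', seq2 => 'M-P');
my %cds    = (seq1 => 'ATGAAACCC', seq2 => 'ATGCCC');
for my $id (sort keys %aa_aln) {
    print "$id ", aa_row_to_cds_row($aa_aln{$id}, $cds{$id}), "\n";
}
# seq1 ATGAAACCC
# seq2 ATG---CCC
```

Because each residue consumes exactly one codon, the result stays on 
codon boundaries, which is what a later Ka/Ks calculation needs.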

apologies for inciting confusion...
-j

--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

From Marc.Logghe at devgen.com  Mon Jan 24 10:46:44 2005
From: Marc.Logghe at devgen.com (Marc Logghe)
Date: Mon Jan 24 10:43:00 2005
Subject: [Bioperl-l] protal2dna and Bio::SimpleAlign
Message-ID: <BEE28BF86078B6429D6C780635718E2190511E@morelia.be.devgen.com>

I guess this is the bioperl implementation of EMBOSS tranalign?
http://www.rfcgr.mrc.ac.uk/Software/EMBOSS/Apps/tranalign.html

ML

> -----Original Message-----
> From: bioperl-l-bounces@portal.open-bio.org
> [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of 
> Jason Stajich
> Sent: Monday, January 24, 2005 4:42 PM
> To: Catherine Letondal
> Cc: bioperl-l@portal.open-bio.org; Maureen L Coleman
> Subject: Re: [Bioperl-l] protal2dna and Bio::SimpleAlign

From sdavis2 at mail.nih.gov  Mon Jan 24 10:59:59 2005
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Mon Jan 24 10:55:54 2005
Subject: [Bioperl-l] 1.5.0 release
References: <72D911D2-6DB3-11D9-9F52-000393C44276@duke.edu>
	<00d701c5021f$23f57de0$7d75f345@WATSON>
	<DD083054-6E17-11D9-ABB8-000393C44276@duke.edu>
Message-ID: <001a01c5022d$c205c8b0$7d75f345@WATSON>

Jason,

As usual, thanks for the fantastic, extensive reply and the brief philosophy 
lesson.  It was more than I imagined could be said on the subject; it 
helps immensely to be reminded of some of these details and the roadmap.

Sean

>> Jason,
>>
>> I'm sorry to bother, but what is the current CVS tag system with regard 
>> to bioperl?  For example, if I do a CO on Monday of bioperl-live, what do 
>> I get?  I have always worked from bioperl CVS code, so just wanted to 
>> make sure that the tags weren't going to change and, if so, what is going 
>> to be what.  I wasn't sure if this should go to the list....
> 


From colemanm at MIT.EDU  Mon Jan 24 11:01:59 2005
From: colemanm at MIT.EDU (Maureen L Coleman)
Date: Mon Jan 24 10:57:43 2005
Subject: [Bioperl-l] protal2dna and Bio::SimpleAlign
In-Reply-To: <72EB0F5C-6E1E-11D9-ABB8-000393C44276@duke.edu>
Message-ID: <4766FF3C-6E21-11D9-B1D1-000A95E515DC@mit.edu>

Thanks for the responses.  The problem (with both protal2dna and 
tranalign), as Catherine recognized, is that even when I specify 
bacterial translation, they don't recognize my alternative start 
codons (gtg, ctg, ttg can all be Met).

As the quickest route, I went through and changed all my alternative 
start codons in the alignments to their "normal" translation.  Then 
protal2dna and tranalign seem to work fine.  aa_to_dna_aln should work 
for me too, since I already have the coding DNA sequences pulled out.

thanks again,
maureen

On Monday, January 24, 2005, at 10:41  AM, Jason Stajich wrote:

> This is an alignment mapper routine not an alignment routine itsself.  
> So I think I was just being stupid and not looking at what protal2dna 
> really was doing.
>
> You provide it the protein multiple sequence alignment alignment and 
> the coding sequence which gave rise to it.  It maps the gaps back in 
> so you have a CDS alignment.  Very basic iterating through the 
> alignment.
>
> So it has to all be in-frame and already spliced, it should have been 
> called aa_to_cds_aln.
>
> The method is intended for getting ready to do Ka/Ks type stuff so 
> that you have aligned  the sequences on codon boundaries and with 
> knowledge about conservative aa replacements.
>
> apologies for inciting confusion...
> -j

From jason.stajich at duke.edu  Mon Jan 24 11:08:50 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Mon Jan 24 11:05:35 2005
Subject: [Bioperl-l] protal2dna and Bio::SimpleAlign
In-Reply-To: <4766FF3C-6E21-11D9-B1D1-000A95E515DC@mit.edu>
References: <4766FF3C-6E21-11D9-B1D1-000A95E515DC@mit.edu>
Message-ID: <3C3349C2-6E22-11D9-ABB8-000393C44276@duke.edu>

Cool. I assume you know you can change the translation table used 
when you call the 'translate' method in bioperl.

So if you start the whole thing from a set of CDS sequences, you 
shouldn't have to do much messing around.  aa_to_dna_aln doesn't do 
any fancy checking to ensure that your codons actually translate into 
the protein you specified.  That might be a good sanity check to put 
in.
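A sketch of what such a sanity check could look like, written as 
stand-alone Perl rather than as a patch to Bioperl (check_codons is a 
hypothetical name).  It walks the aligned protein row and the CDS 
together, compares each codon's standard-code translation with the 
aligned residue, and, as an assumption, lets a codon from a supplied 
start list stand for M at the first residue (here NCBI table 11's 
initiators).

```perl
use strict;
use warnings;

# Standard genetic code in TCAG order (NCBI table 1); the bacterial table
# (11) has the same amino acids and differs only in its start codons.
my $AAS  = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
my %BASE = (T => 0, C => 1, A => 2, G => 3);

sub codon_to_aa {
    my ($codon) = @_;
    my @b = map { $BASE{$_} } split //, uc $codon;
    return 'X' if @b != 3 || grep { !defined } @b;
    return substr($AAS, 16 * $b[0] + 4 * $b[1] + $b[2], 1);
}

# Hypothetical check: report every residue whose codon does not encode it.
# Gap characters in the protein row are skipped. A codon in %$starts may
# stand for M at the first residue.
sub check_codons {
    my ($aligned_aa, $cds, $starts) = @_;
    my ($pos, $res, @problems) = (0, 0);
    for my $aa (split //, uc $aligned_aa) {
        next if $aa eq '-' || $aa eq '.';
        $res++;
        if ($pos + 3 > length $cds) {
            push @problems, "residue $res: CDS ran out of codons";
            last;
        }
        my $codon = uc substr($cds, $pos, 3);
        $pos += 3;
        my $got = codon_to_aa($codon);
        next if $got eq $aa;
        next if $res == 1 && $aa eq 'M' && $starts->{$codon};
        push @problems, "residue $res: $codon codes for $got, alignment has $aa";
    }
    return @problems;
}

my %table11_starts = map { $_ => 1 } qw(TTG CTG ATT ATC ATA ATG GTG);
print "$_\n" for check_codons('M-KR', 'GTGAAAAAA', \%table11_starts);
# residue 3: AAA codes for K, alignment has R
```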


  Title   : translate
  Usage   : $protein_seq_obj = $dna_seq_obj->translate
            #if full CDS expected:
            $protein_seq_obj = $cds_seq_obj->translate(undef,undef,undef,undef,1);
  Function:

            Provides the translation of the DNA sequence using full
            IUPAC ambiguities in DNA/RNA and amino acid codes.

            The full CDS translation is identical to EMBL/TREMBL
            database translation. Note that the trailing terminator
            character is removed before returning the translation
            object.

            Note: if you set $dna_seq_obj->verbose(1) you will get a
            warning if the first codon is not a valid initiator.

            Added way of translating using a custom codon table.  This
            has to be the final addition to this overloaded interface!

  Returns : A Bio::PrimarySeqI implementing object
  Args    : character for terminator (optional) defaults to '*'
            character for unknown amino acid (optional) defaults to 'X'
            frame (optional) valid values 0, 1, 2, defaults to 0
            codon table id (optional) defaults to 1
            complete coding sequence expected, defaults to 0 (false)
            boolean, throw exception if not complete CDS (true) or defaults to warning (false)
            codontable, a custom Bio::Tools::CodonTable object, optional
-jason

On Jan 24, 2005, at 11:01 AM, Maureen L Coleman wrote:

> Thanks for the responses.  The problem (with both protal2dna and 
> tranalign), as Catherine recognized, is that even when I specify 
> Bacterial translation, it doesn't recognize my alternative start 
> codons (gtg,ctg,ttg can all be Met).
>
> As the quickest route, I went through and changed all my alternative 
> start codons in the alignments to their "normal" translation.  Then 
> protal2dna and tranalign seem to work fine.  aa_to_dna_aln should work 
> for me too, since I already have the coding DNA sequences pulled out.
>
> thanks again,
> maureen
>
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

From letondal at pasteur.fr  Mon Jan 24 11:13:00 2005
From: letondal at pasteur.fr (Catherine Letondal)
Date: Mon Jan 24 11:05:56 2005
Subject: [Bioperl-l] protal2dna and Bio::SimpleAlign
In-Reply-To: <BEE28BF86078B6429D6C780635718E2190511E@morelia.be.devgen.com>
References: <BEE28BF86078B6429D6C780635718E2190511E@morelia.be.devgen.com>
Message-ID: <D1767E2D-6E22-11D9-894E-000D93B0BD32@pasteur.fr>


On Jan 24, 2005, at 4:46 PM, Marc Logghe wrote:

> Guess, this is the bioperl implementation of EMBOSS tranalign ?
> http://www.rfcgr.mrc.ac.uk/Software/EMBOSS/Apps/tranalign.html

This old script is indeed very similar to tranalign, except that it 
offers some quite useful features:
  - you can specify a different genetic code for each DNA sequence
    (-G option)
  - you can ask for a mapping of prot/dna sequences by their names
    instead of their position in the file (-i option)

What is now missing is a feature to specify alternate start codons.
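The missing feature (reading an initial GTG as M for bacterial sequences) can be sketched in plain Perl without BioPerl. This is an illustration under stated assumptions, not the protal2dna or BioPerl translate code: `translate_cds` and `%table11_starts` are hypothetical names, the start-codon list is taken from NCBI translation table 11, and the rule that only the first codon of the CDS may be read as M is an assumption about how alternative starts should be handled.

```perl
use strict;
use warnings;

# Standard genetic code (NCBI table 1): amino acids for all 64 codons,
# listed with bases in the order T, C, A, G (TTT, TTC, TTA, TTG, TCT, ...).
my $AA = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
my %CODE;
my $n = 0;
for my $b1 (qw(T C A G)) {
    for my $b2 (qw(T C A G)) {
        for my $b3 (qw(T C A G)) {
            $CODE{"$b1$b2$b3"} = substr($AA, $n++, 1);
        }
    }
}

# Hypothetical helper: translate a CDS in frame +1.  If the FIRST codon is
# listed in %$starts it is read as M (initiator); otherwise the normal code
# applies.  Codons containing ambiguity characters become X.
sub translate_cds {
    my ($dna, $starts) = @_;
    $dna = uc $dna;
    my $prot = '';
    for (my $i = 0; $i + 3 <= length($dna); $i += 3) {
        my $codon = substr($dna, $i, 3);
        if ($i == 0 && $starts->{$codon}) {
            $prot .= 'M';
        } else {
            $prot .= exists $CODE{$codon} ? $CODE{$codon} : 'X';
        }
    }
    return $prot;
}

# Initiation codons of NCBI table 11 (bacterial); its other codon
# assignments are the same as table 1.
my %table11_starts = map { ($_ => 1) } qw(TTG CTG ATT ATC ATA ATG GTG);

print translate_cds('GTGAAATAA', \%table11_starts), "\n";   # MK*
print translate_cds('GTGAAATAA', { ATG => 1 }), "\n";        # VK*
```

Passing a different start-codon hash for each sequence would be one way to combine this with a per-sequence genetic code.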

BTW, I forgot to mention that the script uses the bioperl translate 
method, to which the code is being passed:
	my $trans = $dna->translate(undef, undef, $frame, $code);

and of course, $dna is a bioperl sequence loaded with the standard 
SeqIO methods:

	$in_dna_seqs = Bio::SeqIO->newFh(-file   => $dna_file,
	                                 -format => $dna_file_format);


>
> ML
>
>> -----Original Message-----
>> From: bioperl-l-bounces@portal.open-bio.org
>> [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of
>> Jason Stajich
>> Sent: Monday, January 24, 2005 4:42 PM
>> To: Catherine Letondal
>> Cc: bioperl-l@portal.open-bio.org; Maureen L Coleman
>> Subject: Re: [Bioperl-l] protal2dna and Bio::SimpleAlign
>>
>>
>>
>> On Jan 24, 2005, at 10:28 AM, Catherine Letondal wrote:
>>
>>>
>>> On Jan 23, 2005, at 3:19 PM, Jason Stajich wrote:
>>>
>>>> I'm not familiar with the script.
>>>
>>> Web:
>>> http://bioweb.pasteur.fr/seqanal/interfaces/protal2dna.html
>>> Man:
>>> http://bioweb.pasteur.fr/docs/man/man/protal2dna.1.html
>>> Ftp:
>>> ftp://ftp.pasteur.fr/pub/GenSoft/unix/alignment/protal2dna
>>>
>>>>
>>>> Bio::Align::Utilities does protein to DNA mapping for an alignment
>>>> with the aa_to_dna_aln function.
>>>
>>> The problem with this function aa_to_dna_aln is that it is restricted to
>>> frame 1 and to the standard genetic code, right?
>>>        aa_to_dna_aln
>>>
>> This is an alignment mapper routine, not an alignment routine
>> itself.  So I think I was just being stupid and not looking at what
>> protal2dna was really doing.
>>
>> You provide it the protein multiple sequence alignment and the coding
>> sequence which gave rise to it.  It maps the gaps back in so you have
>> a CDS alignment.  Very basic iterating through the alignment.
>>
>> So it all has to be in-frame and already spliced; it should have been
>> called aa_to_cds_aln.
>>
>> The method is intended for getting ready to do Ka/Ks type stuff, so
>> that you have aligned the sequences on codon boundaries and with
>> knowledge about conservative aa replacements.
>>
>> apologies for inciting confusion...
>> -j
>>
>>>         Title   : aa_to_dna_aln
>>>         Usage   : my $dnaaln = aa_to_dna_aln($aa_aln, \%seqs);
>>>         Function: Will convert an AA alignment to DNA space given the
>>>                   corresponding DNA sequences.  Note that this method
>>>                   expects the DNA sequences to be in frame +1 (GFF
>>>                   frame 0) as it will start to project into coordinates
>>>                   starting at the first base of the DNA sequence, if
>>>                   this alignment represents a different frame for the
>>>                   cDNA you will need to edit the DNA sequences to
>>>                   remove the 1st or 2nd bases (and revcom if things
>>>                   should be).
>>>         Returns : Bio::Align::AlignI object
>>>         Args    : 2 arguments, the alignment and a hashref.
>>>                   Alignment is a Bio::Align::AlignI of amino acid
>>>                   sequences.
>>>                   The hash reference should have keys which are
>>>                   the display_ids for the aa sequences in the
>>>                   alignment and the values are a Bio::PrimarySeqI
>>>                   object for the corresponding spliced cDNA sequence.
>>>
>>>
>>> The other problem when using tools offering several genetic codes
>>> (these sequences need a bacterial genetic code) is that the start
>>> codon of this code is not the right one. These sequences need GTG=M
>>> (and not V).
>>>
>>>>
>>>> -jason
>>>> On Jan 22, 2005, at 4:07 PM, Maureen L Coleman wrote:
>>>>
>>>>> Hi.
>>>>> I'm trying to use the protal2dna script (downloaded from Pasteur
>>>>> site) to convert protein alignments back to DNA alignments. It works
>>>>> in some cases but not in others.  In the cases where it doesn't
>>>>> work, it pulls out the same sequence twice instead of pulling out
>>>>> seq1 and seq2 from my protein alignment.  Then when it tries to
>>>>> match it up with the corresponding DNA sequence, it doesn't work -
>>>>> it matches prot1 with dna1 (correctly) and prot1 with dna2
>>>>> (incorrectly).
>>>>>
>>>>> I suspect this might be related to the name,start,end (nse) method
>>>>> in Bio::SimpleAlign.  Any suggestions?
>>>>>
>>>>> Thanks,
>>>>> Maureen
>>>>>
>>>>>
>>>>>
>>>> --
>>>> Jason Stajich
>>>> jason.stajich at duke.edu
>>>> http://www.duke.edu/~jes12/
>>>>
>>>
>>>
>> --
>> Jason Stajich
>> jason.stajich at duke.edu
>> http://www.duke.edu/~jes12/
>>
>>
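Jason's description of aa_to_dna_aln above (each residue of the aligned protein consumes the next codon of the in-frame, spliced CDS, and each gap becomes a codon-sized gap) can be sketched for one row at a time. `aa_to_cds_row` is a hypothetical helper, not the Bio::Align::Utilities code. It assumes the CDS is already spliced and starts in frame +1, as the aa_to_dna_aln documentation requires; any trailing bases (such as a stop codon) are simply not emitted. Rows are paired by name through a hash, as in the \%seqs argument, so the order of sequences in the files does not matter.

```perl
use strict;
use warnings;

# Hypothetical helper: thread one gapped, aligned protein string back onto
# its ungapped CDS.  Each residue takes the next codon; each gap is '---'.
sub aa_to_cds_row {
    my ($aligned_aa, $cds) = @_;
    my ($out, $pos) = ('', 0);
    for my $aa (split //, $aligned_aa) {
        if ($aa eq '-' || $aa eq '.') {
            $out .= '---';
        } else {
            $out .= substr($cds, $pos, 3);
            $pos += 3;
        }
    }
    return $out;
}

# Rows are matched by name, not by their position in the input files.
my %aln = (seq1 => 'MK-L', seq2 => 'M-RL');
my %cds = (seq2 => 'ATGCGTCTG', seq1 => 'ATGAAACTG');

my %dna_aln;
$dna_aln{$_} = aa_to_cds_row($aln{$_}, $cds{$_}) for keys %aln;

print "$_ $dna_aln{$_}\n" for sort keys %dna_aln;
# seq1 ATGAAA---CTG
# seq2 ATG---CGTCTG
```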

From Marc.Logghe at devgen.com  Mon Jan 24 11:14:03 2005
From: Marc.Logghe at devgen.com (Marc Logghe)
Date: Mon Jan 24 11:10:14 2005
Subject: [Bioperl-l] protal2dna and Bio::SimpleAlign
Message-ID: <BEE28BF86078B6429D6C780635718E2190511F@morelia.be.devgen.com>

 
> yep - except you aren't required to have sequences in the same order -
> but require the sequence names to be the same in both (or you do the
> mapping of names up-front in the hash you give to the routine).
Hope people don't mind going a little off topic here ;-)
The order used to be a problem because most multiple alignment applications, like clustalw, don't preserve the order of the aligned sequences. However, this is now possible with more recent versions of clustalw, where you can pass the option -outorder=input.
Peter Rice taught me how to make EMBOSS's emma do the same:
setenv EMBOSS_CLUSTALW "clustalw -outorder=input"

Marc

From Marc.Logghe at devgen.com  Mon Jan 24 11:20:07 2005
From: Marc.Logghe at devgen.com (Marc Logghe)
Date: Mon Jan 24 11:19:47 2005
Subject: [Bioperl-l] protal2dna and Bio::SimpleAlign
Message-ID: <BEE28BF86078B6429D6C780635718E21905120@morelia.be.devgen.com>

> This old script is indeed very similar to tranalign, except that it 
> offers some quite useful features:
>   - you can specify a different genetic code for each DNA 
You can also do that in tranalign with the -table option.
BTW, table 11 is the standard code but with alternative initiation codons.
Maureen, does it help when you use that option ?
HTH,
Marc

From gyang at plantbio.uga.edu  Mon Jan 24 14:02:33 2005
From: gyang at plantbio.uga.edu (Guojun Yang)
Date: Mon Jan 24 13:58:47 2005
Subject: [Bioperl-l] help on large sequence with Bio::Index::Fasta!
In-Reply-To: <5AFC029A-6CF6-11D9-A47D-000A959E1622@salmonella.org>
Message-ID: <20050124140233.49ac48b6@dogwood.plantbio.uga.edu>

Hi, everybody,
I have another difficult situation:
I am running a local BLAST and then retrieving sequences. The following sub works for one of my local databases (DB1) but not for another (DB2). DB1 contains PAC and BAC sequences (I believe the average size is ~100-200 kb), but DB2 contains contigs as large as 30 Mb. The error says the $seq object is undefined. I believe the problem is the size of the large entries in DB2. Can we use LargeSeq for retrieval? Can anybody tell me how to use it with Bio::Index::Fasta? Thank you in advance for your comments!
Yang


use Bio::Index::Fasta;

sub getseq {
    my ($id, $file_name) = @_;
    my $inx = Bio::Index::Fasta->new(-filename   => $file_name . ".idx",
                                     -write_flag => 1);
    $inx->id_parser(\&get_id);    # get_id() is my own ID parser (not shown)
    $inx->make_index($file_name);
    my $seq = $inx->fetch($id);   # this is where $seq comes back undefined
    return $seq;
}


From jdw at ou.edu  Mon Jan 24 14:48:40 2005
From: jdw at ou.edu (James D. White)
Date: Mon Jan 24 14:44:41 2005
Subject: what about the speed on longer seq? Re: [Bioperl-l] regular
	expression help!
In-Reply-To: <20050121223155.bd16abb4@dogwood.plantbio.uga.edu>
References: <20050121223155.bd16abb4@dogwood.plantbio.uga.edu>
Message-ID: <41F55118.4010800@ou.edu>

Your original regex would have required O(n**4) attempts to match, 
because the unanchored starting position, "\S+", "(\S+)", and ".*" each 
involve O(n) possibilities.

The unanchored starting position and each occurrence of ".*", "\S+", or 
a repeat range with no upper limit (e.g., "\S{4,}") can multiply the 
number of possible matches to be tested by O(n).

My regex is O(n**3) for the unanchored starting position, "(\S+)", and 
".*".  Adding the minimum 4 bases for the first repeat, my regex becomes:

$string =~ m:(\S{4,})(\S{10}).*?(??{$rev = $2; $rev =~ tr/ATCGatcg/TAGCtagc/; reverse($rev);})\1:i;

Your new regex is also O(n**4) for the unanchored starting position, 
"(\S{4,})", "(\S{10,})", and ".+", but the new regex does recognize 
longer inverted repeats.  The initial uncaptured \S+ is no longer there, 
but "(\S{10,})" which matches longer inverted repeats has replaced it in 
bringing the number of matches to be tested back up.  I might change
".*" to ".*?" to reduce the need for backtracking, resulting in:

=~ /(\S{4,})(\S{10,}).+?(??{sub($2)})\1/i;

but this is still O(n**4).

If my version is modified to find longer inverted repeats by using 
(\S{10,}), then it would also become O(n**4), but I suspect that not 
calling sub() avoids some extra overhead that is present in your 
version.  I have not examined any generated code or run any tests to 
be sure, though.

Knowledge of upper limits for each repeat length can greatly reduce the 
number of unproductive matches by changing "*" to "{0,max}", "+" to 
"{1,max}", and "{min,}" to "{min,max}".  But I do not know if any 
reasonable limits are known for your data.

In order to bring the order back down to O(n**3) and still find the 
longer inverted repeats, let's break the problem up into finding the 
original 10 base inverted repeat and then extending it.  Using \1 and \2 
to represent the repeated substrings and revcomp() as the reverse 
complement of its argument, the matched sequence is "\1\2.*revcomp(\2)\1".

If the full \2 is longer than the minimum 10 bases, let's call the first 
10 bases \2a and the rest \2b.  The matched sequence is now 
"\1\2a\2b.*revcomp(\2b)revcomp(\2a)\1".  Now searching for only a 10 
base inverted repeat simplifies to "\1\2a.*revcomp(\2a)\1", which is an 
O(n**3) operation using my regex.  Extending the inverted repeat is 
O(n), but the combined process is not O(n**3)*O(n), but O(n**3)+O(n) 
which is still O(n**3).

Unfortunately the same process does not work for \1, because 
"\1a\1b\2.*revcomp(\2)\1a\1b" becomes either "\1a.*\2.*revcomp(\2)\1a" 
or "\1b\2.*revcomp(\2).*\1b", which is still O(n**4) in either case.

So the best I can come up with is (not tested):

# add parens to capture .*?
$string =~ m:(\S{4,})(\S{10})(.*?)(??{$rev = $2; $rev =~ tr/ATCGatcg/TAGCtagc/; reverse($rev);})\1:i;
# Save repeats and middle.  You can also look at @- and @+
# to find the positions of the matching substrings.
$direct = $1;
$inverted = $2;
$middle = $3;
# find the length of the extension for the inverted repeat
$low = 0;
$high = length($middle) - 1;
while ($low < $high) {
	$rc = lc(substr($middle, $high, 1));
	$rc =~ tr/atcg/tagc/;
	last if lc(substr($middle, $low, 1)) ne $rc;
	$low++;
	$high--;
	}
# extend the repeat, if necessary
if ($low) {
	$inverted .= substr($middle, 0, $low, '');
	substr($middle, -$low) = '';
	}

I hope this is helpful.  If the process is still too slow, then you can 
extend these ideas by finding fixed length direct and inverted repeats 
separately using O(n**2) regexes:

$string =~ m/(\S{4})(.*?)\1/i;		# to find direct repeats

$string =~ m:(\S{10})(.*?)(??{$rev = $1; $rev =~ tr/ATCGatcg/TAGCtagc/; reverse($rev);}):i;		# to find inverted repeats

Then extend them using techniques similar to the above.  Once you have 
the separate direct and inverted repeat locations, then you have to 
match neighboring repeats to find what you want.  This technique is more 
general and allows you, for example, to find sequences which have a few 
mismatched bases between the repeat pairs without exploding the 
complexity of the search.  There is however some increase in programming 
effort.
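As an alternative to the fixed-length regexes above, fixed-length inverted repeats can also be found by storing every k-mer position in a hash and looking up each k-mer's reverse complement. This swaps in a hash-based k-mer lookup for the regex approach (it is not what the code above does), and `revcomp` and `inverted_kmer_pairs` are hypothetical names. The pairs it returns would still need to be extended and matched with neighboring direct repeats as described above; highly repetitive input can still produce a large number of pairs.

```perl
use strict;
use warnings;

# Hypothetical helper: reverse complement (upper or lower case).
sub revcomp {
    (my $rc = reverse $_[0]) =~ tr/ATCGatcg/TAGCtagc/;
    return $rc;
}

# Hypothetical helper: return [i, j] pairs (0-based starts) where the k-mer
# at j is the reverse complement of the k-mer at i, j is downstream, and
# the two k-mers do not overlap.  Lookups are hash accesses, not regex scans.
sub inverted_kmer_pairs {
    my ($seq, $k) = @_;
    $seq = uc $seq;
    my %pos;    # k-mer => list of start positions
    for my $i (0 .. length($seq) - $k) {
        push @{ $pos{ substr($seq, $i, $k) } }, $i;
    }
    my @pairs;
    for my $i (0 .. length($seq) - $k) {
        my $rc = revcomp(substr($seq, $i, $k));
        for my $j (@{ $pos{$rc} || [] }) {
            push @pairs, [$i, $j] if $j >= $i + $k;
        }
    }
    return @pairs;
}

my $k = 4;    # the thread uses 10; 4 keeps the demo sequence short
for my $p (inverted_kmer_pairs('AAGCTTCCCCAAGCTT', $k)) {
    printf "%d..%d is inverted at %d..%d\n",
        $p->[0] + 1, $p->[0] + $k, $p->[1] + 1, $p->[1] + $k;
}
```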

Good luck,

Jim White

Guojun Yang wrote:
> Thank you James for your detailed info. An earlier solution given is to use 
> =~ /(\S{4,})(\S{10,}).+(??{sub($2)})\1/i; the sub is to do the transliteration and reversion of $2. It works greatly on ~80 bp seq. However, on a seq ~500 bp, it takes forever to do. Is there any similarity in processing time for the regex? I will definitely try it.
> Have a great one,
> Yang
> ----- Original Message -----
> From: James D. White <jdw@ou.edu>
> To: bioperl-l@portal.open-bio.org
> Sent: Fri, 21 Jan 2005 11:54:37 -0500
> Subject: Re: [Bioperl-l] regular expression help!
> 
> 
> 
>>Sorry about double posting, but I forgot to change the subject before
>>sending the first message.
>>
>>
>>>Starting with:
>>>
>>>$regex =~ /\S+(\S+)(\S{10}).*(??{$rev=reverse(\2 =~ tr/ATCG/TAGC/i);})\1.*/i;
>>>
>>>The slashes in tr/// confused the Perl parser.  You need to use
>>>different delimiters for the m// operator (the m is implied by //)
>>>and the tr/// operator.  Also the tr/// operator does not use the
>>>i flag, so lower case needs to be handled explicitly.  So let's
>>>try the following:
>>>
>>>$regex =~ m:\S+(\S+)(\S{10}).*(??{$rev=reverse(\2 =~ tr/ATCGatcg/TAGCtagc/);})\1.*:i;
>>
>>>This gives the error:
>>>Can't modify constant item in transliteration (tr///) at (re_eval 1)
>>>line 1, near "tr/ATCGatcg/TAGCtagc/)"
>>>
>>>Inside the (??{ CODE }) sequence, use $1, $2, ..., instead of
>>>\1, \2, ... (See Programming Perl, 3rd Edition, "Match-time pattern
>>>interpolation", p. 213) Inside the evaluated CODE, \2 is a
>>>constant, not the value of the second captured substring.  Also I'm
>>>not sure what modifying $2 would do, so let's try:
>>>
>>>$regex =~ m:\S+(\S+)(\S{10}).*(??{$rev = $2; $rev =~ tr/ATCGatcg/TAGCtagc/; reverse($rev);})\1.*:i;
>>
>>>This works, but I would get rid of the leading "\S+" and trailing
>>>".*".  The ".*" adds nothing useful, so just drop it.  You
>>>probably don't need the leading "\S+", because the pattern is not
>>>anchored to the beginning of the string with "^".  The leading
>>>"\S+" gobbles up the entire string, forcing the match to backtrack
>>>character by character from the end.  It also forces the substring
>>>match saved in $1 to occur after the first character.  Unless you
>>>never want $1 to consider the first character, just drop the
>>>leading "\S+".  If you don't want to search the first character,
>>>then just use "\S".  This results in:
>>>
>>>$regex =~ m:(\S+)(\S{10}).*(??{$rev = $2; $rev =~ tr/ATCGatcg/TAGCtagc/; reverse($rev);})\1:i;
>>
>>>Finally I would probably change the remaining ".*" to ".*?".  If
>>>you search with ".*" on a long sequence which could contain
>>>multiple sequences of interest, the ".*" pattern will match the rest
>>>of the sequence and force backtracking to match the first occurrence
>>>of "$1$2" with the last occurrence of "revcomp($2)$1".  If you use
>>>".*?", you match the first occurrence of "$1$2" with the nearest
>>>occurrence of "revcomp($2)$1".  This results in the final regular
>>>expression:
>>>
>>>$regex =~ m:(\S+)(\S{10}).*?(??{$rev = $2; $rev =~ tr/ATCGatcg/TAGCtagc/; reverse($rev);})\1:i;
>>
>>>>Date: Fri, 14 Jan 2005 14:12:46 -0500
>>>>From: Guojun Yang <gyang@plantbio.uga.edu>
>>>>Subject: [Bioperl-l] regular expression help!
>>>>To: bioperl-l@portal.open-bio.org
>>>>Message-ID: <20050114141246.94c7cb46@dogwood.plantbio.uga.edu>
>>>>Content-Type: text/plain;       charset="us-ascii"
>>>>
>>>>Hi, Everybody,
>>>>I was trying to use a regex recognizing a pattern of inverted repeat DNA seq flanked by direct repeats (see below), but it returns errors saying "(?{...}) not terminated or {...} not balanced". Can anybody help me sort this out?
>>
>>>>The regex I have is:
>>>>$regex =~ /\S+(\S+)(\S{10}).*(??{$rev=reverse(\2 =~ tr/ATCG/TAGC/i);})\1.*/i;
>>
>>>>Thank you,
>>>>Yang
>>>>
>>>
>>>--
>>>James D. White   (jdw@ou.edu)
>>>Director of Bioinformatics
>>>Department of Chemistry and Biochemistry/ACGT
>>>University of Oklahoma
>>>101 David L. Boren Blvd., SRTC 2100
>>>Norman, OK 73019
>>>Phone: (405) 325-4912, FAX: (405) 325-7762
>>
>>--
>>James D. White   (jdw@ou.edu)
>>Director of Bioinformatics
>>Department of Chemistry and Biochemistry/ACGT
>>University of Oklahoma
>>101 David L. Boren Blvd., SRTC 2100
>>Norman, OK 73019
>>Phone: (405) 325-4912, FAX: (405) 325-7762
>>
>>
>>
>>
> 
> 
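The advice in the quoted message, to use $1 rather than \1 inside (??{ CODE }), can be checked with a minimal example that matches a 4-base stem followed later by its reverse complement. The sequence and the 4-base stem length are illustrative only, and `revcomp` is a hypothetical helper; @- and @+ are used to recover the matched span.

```perl
use strict;
use warnings;

# Hypothetical helper: reverse complement (upper or lower case).
sub revcomp {
    (my $rc = reverse $_[0]) =~ tr/ATCGatcg/TAGCtagc/;
    return $rc;
}

my $seq = 'GGATCCAAAAGGATCC';
my ($stem, $m);
# Inside (??{ ... }) the code is ordinary Perl run at match time, so the
# first capture is $1; the string it returns is matched at that point.
if ($seq =~ /([ACGT]{4})[ACGT]*?(??{ revcomp($1) })/) {
    $stem = $1;
    $m    = substr($seq, $-[0], $+[0] - $-[0]);
}
print "stem=$stem match=$m\n";   # stem=GGAT match=GGATCCAAAAGGATCC
```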


From barry.moore at genetics.utah.edu  Mon Jan 24 17:18:39 2005
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Mon Jan 24 17:15:15 2005
Subject: [Fwd: Re: [Bioperl-l] bioperl-run for windows?]
Message-ID: <41F5743F.10201@genetics.utah.edu>

Some will work, and some won't. I've installed it on Windows, and used it
a bit there. One problem you'll run into is that you can't use bioperl
to run a program that can't be installed on Windows (EMBOSS, for example),
so you'll be limited that way, but check out the Pise interface for any of
that software. You should be able to get access to a lot of non-Windows
software via bioperl by using the Pise interface
(Bio::Tools::Run::AnalysisFactory::Pise):

http://www.pasteur.fr/recherche/unites/sis/Pise/


Barry

Tim Alcon wrote:

 > If I just grab it off CPAN, will it work on Windows, or does it use
 > Unix system calls?
 >
 > Tim
 >
 >
 >
 > Jason Stajich wrote:
 >
 >> Is there a PPM on the bioperl site?
 >> No
 >>
 >> Can you install bioperl-run on windows?
 >> Yes - but you'll have to do it manually, or learn how to build PPMs
 >> (quite simple really), or encourage someone to produce a PPM for
 >> bioperl-run.
 >>
 >> -jason
 >> On Jan 22, 2005, at 7:57 PM, Tim Alcon wrote:
 >>
 >>> Does a Windows version of bioperl-run exist? If so, how do I get it?
 >>>
 >>> Tim
 >>>
 >>>
 >>>
 >>>
 >> --
 >> Jason Stajich
 >> jason.stajich at duke.edu
 >> http://www.duke.edu/~jes12/
 >>
 >>
 >


-- 
Barry Moore
Dept. of Human Genetics
University of Utah
Salt Lake City, UT




From allenday at ucla.edu  Mon Jan 24 18:54:42 2005
From: allenday at ucla.edu (Allen Day)
Date: Mon Jan 24 18:50:38 2005
Subject: [Bioperl-l] struggling with Bio::FeatureIO and
	Bio::SeqFeature::Annotated
In-Reply-To: <BEE28BF86078B6429D6C780635718E21905114@morelia.be.devgen.com>
References: <BEE28BF86078B6429D6C780635718E21905114@morelia.be.devgen.com>
Message-ID: <Pine.LNX.4.58.0501241545250.17342@sumo.ctrl.ucla.edu>

Marc,

The problem was that Bio::SeqIO::FTHelper was making calls assuming it had 
a Bio::SeqFeature::Generic instance.  I've updated it to make calls 
compliant with the Bio::SeqFeatureI interface, and the script below now 
at least runs using "option 1".

"option 2" will not work, at least for now, because Bio::DB::GenBank is
creating a SeqIO that holds Bio::SeqFeature::Generic objects, and these are
difficult to deal with because their internal data structures are different
from those of a Bio::SeqFeature::Annotated.  I like the technique used below
to bridge to Bio::FeatureIO via a Bio::Tools::GFF intermediary -- very
clever.

You'll also notice that the GenBank-formatted file output by the script 
doesn't look quite right; the FEATURES section looks something like:

FEATURES             Location/Qualifiers
     Bio::Annotation::OntologyTerm=HASH(0xa3d93f8)1..20975
                     /source="Bio::Annotation::SimpleValue=HASH(0x9bcdbe0)"
                     /mol_type="Bio::Annotation::SimpleValue=HASH(0xa3dab1c)"
                     /seq_id="Bio::Annotation::SimpleValue=HASH(0xa214de0)"
                     /score="Bio::Annotation::SimpleValue=HASH(0xa3d92cc)"
                     /frame="Bio::Annotation::SimpleValue=HASH(0xa439b04)"
                     /chad="Bio::Annotation::Comment=HASH(0xa3da9b4)"
                     /note="score=Bio::Annotation::SimpleValue=HASH(0xa3d92cc)"
                     /note="frame=Bio::Annotation::SimpleValue=HASH(0xa439b04)"
                     /db_xref="Bio::Annotation::SimpleValue=HASH(0xa3daaf8)"
                     /clone="Bio::Annotation::SimpleValue=HASH(0xa3dab28)"
                     /strain="Bio::Annotation::SimpleValue=HASH(0xa3dabb8)"
                     /phase="Bio::Annotation::SimpleValue=HASH(0xa3d935c)"
                     /chromosome="Bio::Annotation::SimpleValue=HASH(0xa3dac00)"
                     /type="Bio::Annotation::OntologyTerm=HASH(0xa3d93f8)"
                     /organism="Bio::Annotation::SimpleValue=HASH(0xa3dac48)"

because Bio::SeqFeature::Annotated holds annotations as object references
rather than strings.  We can fix this with a stringification overload, but
I noticed that the code to do this exists in the Bio::Annotation::*
classes but is commented out, and I'm not sure why.  Maybe Hilmar can shed
some light on this.
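A stringification overload of the kind mentioned above can be sketched with Perl's core overload pragma. `My::SimpleValue` is a hypothetical stand-in, not the actual Bio::Annotation::SimpleValue code; with '""' overloaded, interpolating the object into a qualifier yields its value rather than a "ClassName=HASH(0x...)" string.

```perl
package My::SimpleValue;   # hypothetical stand-in, not Bio::Annotation::SimpleValue
use strict;
use warnings;
use overload
    '""'     => sub { $_[0]->{value} },   # "$obj" returns the stored value
    fallback => 1;                        # let eq, ., etc. fall back to ""

sub new {
    my ($class, %args) = @_;
    return bless { value => $args{-value} }, $class;
}

package main;

my $v = My::SimpleValue->new(-value => 'R12H7.1');
print qq{/gene="$v"\n};   # /gene="R12H7.1"
```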

-Allen


On Mon, 24 Jan 2005, Marc Logghe wrote:

> Hi all,
> I have some problems with Bio::FeatureIO and Bio::SeqFeature::Annotated. But maybe these modules are not designed for the things I had in mind.
> My initial goal seemed pretty straightforward. It turned out differently.
> I have a gff file containing features of bunch of bioentries sitting in BioSQL.
> I wanted to turn the gff into feature objects, add them to the bioentries, and save them back into the database.
> As a test I fetch a genbank record, strip the features and convert them to gff. The gff is again converted to features and added to the stripped seq object.
> The test script looks like this:
> ========================================================
> #!/usr/bin/perl
> use strict;
> use Bio::SeqIO;
> use Bio::Tools::GFF;
> use Bio::FeatureIO;
> use IO::String;
> use Bio::DB::GenBank;
> 
> use Data::Dumper;
> 
> *Bio::SeqFeature::Annotated::all_tags = \*Bio::SeqFeature::Annotated::get_all_tags;
> 
> my $gff;
> my $gffio = IO::String->new($gff);
> 
> my $db = Bio::DB::GenBank->new;
> my $sout = Bio::SeqIO->new(-fh => \*STDOUT, -format => 'genbank');
> my $seq = $db->get_Seq_by_acc('Z50755');
> 
> my @feat = $seq->remove_SeqFeatures;
> 
> # writing option 1
> my $fout = Bio::Tools::GFF->new(-fh => $gffio, -gff_version => 3);
> # writing option 2
> my $fout = Bio::FeatureIO->new(-fh => $gffio, -format => 'gff', -version => 3);
> 
> $fout->write_feature(@feat);
> 
> $gffio = IO::String->new($gff);
> 
> my $fin = Bio::FeatureIO->new(-fh => $gffio, -format => 'gff', -version => 3);
> 
> while (my $feat = $fin->next_feature)
> {
>  $seq->add_SeqFeature($feat);
> }
> print Data::Dumper->Dump([$seq],['seq']);
> 
> $sout->write_seq($seq);
> ========================================================
> 
> First, I had an issue when writing the features to gff using Bio::FeatureIO (writing option 2):
> 
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: only Bio::SeqFeature::Annotated objects are writeable
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /home/marcl/src/bioperl/bioperl-live/Bio/Root/Root.pm:328
> STACK: Bio::FeatureIO::gff::write_feature /home/marcl/src/bioperl/bioperl-live/Bio/FeatureIO/gff.pm:259
> STACK: ./test.pl:25
> -----------------------------------------------------------
> 
> Therefore, I used Bio::Tools::GFF to write (writing option 1). But then, I run into troubles when it comes to dumping the sequence into genbank format:
> Can't locate object method "all_tags" via package "Bio::SeqFeature::Annotated" at /home/marcl/src/bioperl/bioperl-live/Bio/SeqIO/FTHelper.pm line 212, <GEN1> line 52.
> 
> I tried to fix this by adding the line
> *Bio::SeqFeature::Annotated::all_tags = \*Bio::SeqFeature::Annotated::get_all_tags;
>  
> But in vain:
> Can't locate object method "get_all_tags" via package "Bio::Annotation::Collection" at /home/marcl/src/bioperl/bioperl-live/Bio/SeqFeature/Annotated.pm line 547, <GEN1> line 52.
> 
> Regards,
> Marc
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
> 
From jason.stajich at duke.edu  Mon Jan 24 21:36:39 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Mon Jan 24 21:33:36 2005
Subject: [Bioperl-l] bioperl-1.5.0 released
Message-ID: <F0EFADC3-6E79-11D9-9143-000393C44276@duke.edu>

Bioperl 1.5.0 Developer's release is available for download.
===============================================

  http://bioperl.org/DIST/bioperl-1.5.0.tar.bz2   425ac55ecbb4339b7b532ba6d429bb40
  http://bioperl.org/DIST/bioperl-1.5.0.tar.gz    172472f0675de9a583432e21c9b1b5fc
  http://bioperl.org/DIST/bioperl-1.5.0.zip       3febcd2445a7393c65981a6f9f13a9ed

We'll update the website to reflect this new release.

The odd-numbered releases are called developer releases and are not 
deposited on CPAN.  Please note that the API in 1.5.0 may change before 
the 1.6.0 release, which will be considered a stable API.  We may do 
another developer release before 1.6.0 goes out.

Lots of people have contributed to this release; I apologize for not 
naming them all.  I'll try to cover some: thanks to Aaron Mackey for 
getting this release started, Brian Osborne for extensive documentation 
improvements, Nathan Haigh for volunteering to make a PPM of the 
release, Barry Moore and Nathan for answering many of the Windows-related 
questions, Allen Day & Scott Cain & Steffen Grossmann for the 
work on FeatureIO, GFF3, and SeqFeature::Annotated, and Chris Mungall for 
the work with Unflattener to merge GenBank annotations into GFF3 
objects.

Please see the AUTHORS file for a complete list of contributors.

Jason Stajich on behalf of the Bioperl developers.


Here is the info from the Changes file.
1.5 Developer release

     o Bio::Align::DNAStatistics and Bio::Align::ProteinStatistics
       provide Jukes-Cantor and Kimura pairwise distance methods,
       respectively.

     o Bio::AlignIO support for "po" format of POA, and "maf";
       Bio::AlignIO::largemultifasta is a new alternative to
       Bio::AlignIO::fasta for temporary file-based manipulation of
       particularly large multiple sequence alignments.

     o Bio::Assembly::Singlet allows orphan, unassembled sequences to
       be treated like an assembled contig.

     o Bio::CodonUsage provides new rare_codon() and probable_codons()
       methods for identifying particular codons that encode a given
       amino acid.

     o Bio::Coordinate::Utils provides new from_align() method to build
       a Bio::Coordinate pair directly from a
       Bio::Align::AlignI-conforming object.

     o Bio::DB::Biblio::eutils is a class for querying NCBI's Eutils.
       Send a Pubmed, Pubmed Central, Entrez, or other query to NCBI's
       web service using standard Pubmed query syntax, and retrieve
       results as XML.

     o Bio::DB::GFF has various sundry bug fixes.

     o Bio::FeatureIO is a new SeqIO-style subsystem for
       writing/reading genomic features to/from files.  I/O classes
       exist for BED, GTF (aka GFF v2.5), and GFF v3.  Bio::FeatureIO
       classes only read/write Bio::SeqFeature::Annotated objects.
       Notably, the GFF v3 class requires features to be typed into the
       Sequence Ontology.

     o Bio::Graph namespace contains new modules for manipulation and
       analysis of protein interaction graphs.

     o Bio::Graphics has many bug fixes and shiny new glyphs.

     o Bio::Index::Hmmer and Bio::Index::Qual provide multiple-file
       indexing for HMMER reports and FASTA qual files, respectively.

     o Bio::Map::Clone, Bio::Map::Contig, and Bio::Map::FPCMarker are
       new objects that can be placed within a Bio::Map::MapI-compliant
       genetic/physical map; Bio::Map::Physical provides a new physical
       map type; Bio::MapIO::fpc provides finger-printed clone mapping
       import.

     o Bio::Matrix::PSM provides new support for position-specific
       (scoring) matrices (e.g. profiles, or "possums").

     o Bio::Ontology::Ontology and Bio::Ontology::Term objects can now
       be instantiated without explicitly using Bio::OntologyIO.  This
       is possible through changes to Bio::Ontology::OntologyStore to
       download ontology files from the web as necessary.  Locations of
       ontology files are hard-coded into
       Bio::Ontology::DocumentRegistry.

     o Bio::PopGen includes many new methods and data types for
       population genetics analyses.

     o New constructor for Bio::Range, unions().  Given a list of
       ranges, returns another list of "flattened" ranges --
       overlapping ranges are merged into a single range with the
       minimum and maximum coordinates of the entire overlapping group.

     o Bio::Root::IO now supports -url, in addition to -file and -fh.
       The new -url argument allows one to specify the network address
       of a file for input.  -url currently only works for GET
       requests, and thus is read-only.

     o Bio::SearchIO::hmmer now returns individual Hit objects for each
       domain alignment (thus containing only one HSP); previously
       separate alignments would be merged into one hit if the domain
       involved in the alignments was the same, but this only worked
       when the repeated domain occurred without interruption by any
       other domain, leading to a confusing mixture of Hit and HSP
       objects.

     o Bio::Search::Result::ResultI-compliant report objects now
       implement the "get_statistics" method to access
       Bio::Search::StatisticsI objects that encapsulate any
       statistical parameters associated with the search (e.g. Karlin's
       lambda for BLAST/FASTA).

     o Bio::Seq::LargeLocatableSeq combines the functionality already
       found in Bio::Seq::LargeSeq and Bio::LocatableSeq.

     o Bio::SeqFeature::Annotated is a replacement for
       Bio::SeqFeature::Generic.  It breaks compliance with the
       Bio::SeqFeatureI interface because the author was sick of
       dealing with untyped annotation tags.  All
       Bio::SeqFeature::Annotated annotations are Bio::AnnotationI
       compliant, and accessible through Bio::Annotation::Collection.

     o Bio::SeqFeature::Primer implements a Tm() method for primer
       melting point predictions.

     o Bio::SeqIO now supports AGAVE, BSML (via SAX), CHAOS-XML,
       InterProScan-XML, TIGR-XML, and NCBI TinySeq formats.

     o Bio::Taxonomy::Node now implements the methods necessary for
       Bio::Species interoperability.

     o Bio::Tools::CodonTable has new reverse_translate_all() and
       make_iupac_string() methods.

     o Bio::Tools::dpAlign now provides sequence profile alignments.

     o Bio::Tools::GFF now parses GFF version 2.5 (a.k.a. GTF).

     o Bio::Tools::Fgenesh, Bio::Tools::tRNAscanSE are new report
       parsers.

     o Bio::Tools::SiRNA includes two new rulesets (Saigo and Tuschl)
       for designing small inhibitory RNA.

     o Bio::Tree::DistanceFactory provides NJ and UPGMA tree-building
       methods based on a distance matrix.

     o Bio::Tree::Statistics provides an assess_bootstrap() method to
       calculate bootstrap support values on a guide tree topology,
       based on provided bootstrap tree topologies.

     o Bio::TreeIO now supports the Pagel (PAG) tree format.
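The Bio::Tree::DistanceFactory entry above mentions UPGMA tree building from a distance matrix. The sketch below is a rough illustration of what that method computes. It is plain UPGMA in core Perl and is not DistanceFactory's code or API, neither of which is shown here. It assumes a complete, symmetric matrix stored as a hash of hashes. Ties go to the first pair found.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Plain UPGMA on a symmetric distance matrix ($d->{$x}{$y}).  Returns a
# Newick string whose branch lengths are differences in cluster height.
sub upgma {
    my ($names, $d) = @_;
    my %size   = map { $_ => 1 } @$names;    # leaves per cluster
    my %height = map { $_ => 0 } @$names;    # cluster height (half distance)
    my %newick = map { $_ => $_ } @$names;   # Newick text per cluster
    my %dist;
    for my $x (@$names) {
        for my $y (@$names) {
            $dist{$x}{$y} = $d->{$x}{$y} if $x ne $y;
        }
    }
    my @active  = @$names;
    my $next_id = 0;
    while (@active > 1) {
        # Find the closest pair of active clusters.
        my ($p, $q, $best);
        for my $i (0 .. $#active - 1) {
            for my $j ($i + 1 .. $#active) {
                my $v = $dist{ $active[$i] }{ $active[$j] };
                ($p, $q, $best) = ($active[$i], $active[$j], $v)
                    if !defined $best || $v < $best;
            }
        }
        # Join the pair at height best/2.
        my $h   = $best / 2;
        my $new = 'cluster' . $next_id++;
        $newick{$new} = sprintf '(%s:%g,%s:%g)',
            $newick{$p}, $h - $height{$p}, $newick{$q}, $h - $height{$q};
        $height{$new} = $h;
        $size{$new}   = $size{$p} + $size{$q};
        # Size-weighted average distance from the new cluster to the rest.
        my @rest = grep { $_ ne $p && $_ ne $q } @active;
        for my $k (@rest) {
            $dist{$new}{$k} = $dist{$k}{$new} =
                ($dist{$p}{$k} * $size{$p} + $dist{$q}{$k} * $size{$q})
                / $size{$new};
        }
        @active = (@rest, $new);
    }
    return $newick{ $active[0] } . ';';
}

my %example = (
    A => { B => 2, C => 6 },
    B => { A => 2, C => 6 },
    C => { A => 6, B => 6 },
);
print upgma([qw(A B C)], \%example), "\n";   # prints (C:3,(A:1,B:1):2);
```

In this example, A and B join first at height 1. C then joins that cluster at height 3, which gives the internal branch a length of 2.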

--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/
From Marc.Logghe at devgen.com  Tue Jan 25 04:17:29 2005
From: Marc.Logghe at devgen.com (Marc Logghe)
Date: Tue Jan 25 04:15:24 2005
Subject: [Bioperl-l] struggling with Bio::FeatureIO and
	Bio::SeqFeature::Annotated
Message-ID: <BEE28BF86078B6429D6C780635718E21905123@morelia.be.devgen.com>

Hi Allen,
Thanks for the fixes !
As you suggested, I got the tag values by using a stringification overload, so that is solved (I don't want to commit that myself though; it seems too tricky to me ;-).
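The stringification overload discussed in this thread relies on Perl's core overload pragma. The minimal sketch below uses a hypothetical My::SimpleValue class as a stand-in for Bio::Annotation::SimpleValue. It is not BioPerl's commented-out code. It shows how such an object can print as its value instead of as My::SimpleValue=HASH(0x...):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for an annotation value class; the package name
# and methods are illustrative only, not BioPerl's actual code.
package My::SimpleValue;

# Overloading "" makes the object print as its value instead of
# "My::SimpleValue=HASH(0x...)"; fallback => 1 lets Perl derive eq, ne
# and concatenation from the string conversion.
use overload
    '""'     => sub { $_[0]->value },
    fallback => 1;

sub new {
    my ($class, %args) = @_;
    return bless { value => $args{-value} }, $class;
}

sub value { $_[0]{value} }

package main;

my $v = My::SimpleValue->new(-value => 'R12H7.1');
print qq{/gene="$v"\n};    # prints /gene="R12H7.1"
```

The object is still a blessed reference, so code that calls value() keeps working. Only string contexts change.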
What is not so nice is that I lose my split features:
     gene            join(8311..8422,8852..8887,8940..9090,9142..9233,
                     9721..9848,10296..10714,10835..10934,11584..11706)
                     /gene="R12H7.1"
     CDS             join(8311..8422,8852..8887,8940..9090,9142..9233,
                     9721..9848,10296..10714,10835..10934,11584..11706)


becomes now:

     gene            8311..8422
                     /note="frame=."
                     /gene="R12H7.1"
     CDS             8311..8422

I tried to solve this issue by using the unflattener, but that did not work out very well either :-(
My actual question is now: is there a way, using whatever system, to preserve the split feature structure? That was actually what I was trying to do in the first place: reconstruct the original feature object starting from gff. Any ideas on that?


Also, do you think it will be possible to convert the Bio::SeqFeature::Annotated features into persistent ones so that these can be stored in BioSQL ? I'll try to test that out today.
Cheers,
Marc


> -----Original Message-----
> From: Allen Day [mailto:allenday@ucla.edu]
> Sent: Tuesday, January 25, 2005 12:55 AM
> To: Marc Logghe
> Cc: Bioperl (E-mail)
> Subject: Re: [Bioperl-l] struggling with Bio::FeatureIO and
> Bio::SeqFeature::Annotated
> 
> 
> Marc,
> 
> The problem was that Bio::SeqIO::FTHelper was making calls assuming it
> had a Bio::SeqFeature::Generic instance.  I've updated it to make calls
> compliant with the Bio::SeqFeatureI interface, and the script below now
> at least runs using "option 1".
> 
> "option 2" will not work, at least for now, because Bio::DB::GenBank is
> creating a SeqIO that holds Bio::SeqFeature::Generic objects, and these
> are difficult to deal with because the internal data structures are
> different from those of a Bio::SeqFeature::Annotated.  I like the
> technique used below to bridge to Bio::FeatureIO via a Bio::Tools::GFF
> intermediary -- very clever.
> 
> You'll also notice that the GenBank-formatted file output by the script
> doesn't look quite right; the FEATURES section looks kind of like:
> 
> FEATURES             Location/Qualifiers
>      Bio::Annotation::OntologyTerm=HASH(0xa3d93f8)1..20975
>                      /source="Bio::Annotation::SimpleValue=HASH(0x9bcdbe0)"
>                      /mol_type="Bio::Annotation::SimpleValue=HASH(0xa3dab1c)"
>                      /seq_id="Bio::Annotation::SimpleValue=HASH(0xa214de0)"
>                      /score="Bio::Annotation::SimpleValue=HASH(0xa3d92cc)"
>                      /frame="Bio::Annotation::SimpleValue=HASH(0xa439b04)"
>                      /chad="Bio::Annotation::Comment=HASH(0xa3da9b4)"
>                      /note="score=Bio::Annotation::SimpleValue=HASH(0xa3d92cc)"
>                      /note="frame=Bio::Annotation::SimpleValue=HASH(0xa439b04)"
>                      /db_xref="Bio::Annotation::SimpleValue=HASH(0xa3daaf8)"
>                      /clone="Bio::Annotation::SimpleValue=HASH(0xa3dab28)"
>                      /strain="Bio::Annotation::SimpleValue=HASH(0xa3dabb8)"
>                      /phase="Bio::Annotation::SimpleValue=HASH(0xa3d935c)"
>                      /chromosome="Bio::Annotation::SimpleValue=HASH(0xa3dac00)"
>                      /type="Bio::Annotation::OntologyTerm=HASH(0xa3d93f8)"
>                      /organism="Bio::Annotation::SimpleValue=HASH(0xa3dac48)"
> 
> because Bio::SeqFeature::Annotated holds annotations as object pointers
> rather than strings.  We can fix this with a stringification overload,
> but I noticed that the code to do this exists in the Bio::Annotation::*
> classes but is commented out, and I'm not sure why.  Maybe Hilmar can
> shed some light on this.
> 
> -Allen
> 
> 
> 
> On Mon, 24 Jan 2005, Marc Logghe wrote:
> 
> > Hi all,
> > I have some problems with Bio::FeatureIO and Bio::SeqFeature::Annotated. But maybe these modules are not designed for the things I had in mind.
> > My initial goal seemed pretty straightforward. It turned out differently.
> > I have a gff file containing features of a bunch of bioentries sitting in BioSQL.
> > I wanted to turn the gff into feature objects, add them to the bioentries, and save them back into the database.
> > As a test I fetch a genbank record, strip the features and convert them to gff. The gff is again converted to features and added to the stripped seq object.
> > The test script looks like this:
> > ========================================================
> > #!/usr/bin/perl
> > use strict;
> > use Bio::SeqIO;
> > use Bio::Tools::GFF;
> > use Bio::FeatureIO;
> > use IO::String;
> > use Bio::DB::GenBank;
> > 
> > use Data::Dumper;
> > 
> > *Bio::SeqFeature::Annotated::all_tags = \*Bio::SeqFeature::Annotated::get_all_tags;
> > 
> > my $gff;
> > my $gffio = IO::String->new($gff);
> > 
> > my $db = Bio::DB::GenBank->new;
> > my $sout = Bio::SeqIO->new(-fh => \*STDOUT, -format => 'genbank');
> > my $seq = $db->get_Seq_by_acc('Z50755');
> > 
> > my @feat = $seq->remove_SeqFeatures;
> > 
> > # writing option 1
> > my $fout = Bio::Tools::GFF->new(-fh => $gffio, -gff_version => 3);
> > # writing option 2 (comment out option 1 and uncomment this line to try it)
> > # my $fout = Bio::FeatureIO->new(-fh => $gffio, -format => 'gff', -version => 3);
> > 
> > $fout->write_feature(@feat);
> > 
> > $gffio = IO::String->new($gff);
> > 
> > my $fin = Bio::FeatureIO->new(-fh => $gffio, -format => 'gff', -version => 3);
> > 
> > while (my $feat = $fin->next_feature)
> > {
> >  $seq->add_SeqFeature($feat);
> > }
> > print Data::Dumper->Dump([$seq],['seq']);
> > 
> > $sout->write_seq($seq);
> > ========================================================
> > 
> > First, I had an issue when writing the features to gff using Bio::FeatureIO (writing option 2):
> > 
> > ------------- EXCEPTION: Bio::Root::Exception -------------
> > MSG: only Bio::SeqFeature::Annotated objects are writeable
> > STACK: Error::throw
> > STACK: Bio::Root::Root::throw /home/marcl/src/bioperl/bioperl-live/Bio/Root/Root.pm:328
> > STACK: Bio::FeatureIO::gff::write_feature /home/marcl/src/bioperl/bioperl-live/Bio/FeatureIO/gff.pm:259
> > STACK: ./test.pl:25
> > -----------------------------------------------------------
> > 
> > Therefore, I used Bio::Tools::GFF to write (writing option 1). But then, I ran into trouble when it came to dumping the sequence into genbank format:
> > Can't locate object method "all_tags" via package "Bio::SeqFeature::Annotated" at /home/marcl/src/bioperl/bioperl-live/Bio/SeqIO/FTHelper.pm line 212, <GEN1> line 52.
> > 
> > I tried to fix this by adding the line
> > *Bio::SeqFeature::Annotated::all_tags = \*Bio::SeqFeature::Annotated::get_all_tags;
> >  
> > But in vain:
> > Can't locate object method "get_all_tags" via package "Bio::Annotation::Collection" at /home/marcl/src/bioperl/bioperl-live/Bio/SeqFeature/Annotated.pm line 547, <GEN1> line 52.
> > 
> > Regards,
> > Marc
> > 
> > 
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l@portal.open-bio.org
> > http://portal.open-bio.org/mailman/listinfo/bioperl-l
> > 
> 

From allenday at ucla.edu  Tue Jan 25 04:45:28 2005
From: allenday at ucla.edu (Allen Day)
Date: Tue Jan 25 04:41:30 2005
Subject: [Bioperl-l] struggling with Bio::FeatureIO and
	Bio::SeqFeature::Annotated
In-Reply-To: <BEE28BF86078B6429D6C780635718E21905123@morelia.be.devgen.com>
References: <BEE28BF86078B6429D6C780635718E21905123@morelia.be.devgen.com>
Message-ID: <Pine.LNX.4.58.0501250141240.22458@sumo.ctrl.ucla.edu>

On Tue, 25 Jan 2005, Marc Logghe wrote:

> Hi Allen,
> Thanks for the fixes !

no problem.  let me know if you find more stuff like this, i'm trying to
clean up all the calls to SeqFeatureI inheritors to use the interface
methods rather than subclass-specific methods.

> Like you suggested, I got the tag values when using stringification overload, so that is solved (I don't want to commit that myself though, seems too tricky to me ;-).
> What is not so nice is that I lose my split features:
>      gene            join(8311..8422,8852..8887,8940..9090,9142..9233,
>                      9721..9848,10296..10714,10835..10934,11584..11706)
>                      /gene="R12H7.1"
>      CDS             join(8311..8422,8852..8887,8940..9090,9142..9233,
>                      9721..9848,10296..10714,10835..10934,11584..11706)
> 
> 
> becomes now:
> 
>      gene            8311..8422
>                      /note="frame=."
>                      /gene="R12H7.1"
>      CDS             8311..8422
> 
> I tried to solve this issue by using the unflattener, but that did not work out very well either :-(
> My actual question is now: is there a way, using whatever system, to preserve the split feature structure? That was actually what I was trying to do in the first place: reconstruct the original feature object starting from gff. Any ideas on that?

oh.  i don't know anything about this.  never had to deal with split
locations before.  is this concept equivalent to a GFF3 Target attribute?  
maybe Scott Cain or Chris Mungall have something to say here.  i think
Scott is back from vacation tomorrow.
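On the question above: in GFF3, Target is meant for alignment features. A single discontinuous feature such as a spliced CDS is normally written as several lines that share one ID attribute. A gene would usually be a single line spanning 8311..11706, with transcript and CDS parts linked to it through Parent.

The core Perl below is a sketch only. It converts Marc's join location into such CDS lines, under three assumptions: the join is on the plus strand, it has no complement(), and the CDS starts on a full codon. The seqid Z50755 comes from the test script, while the ID cds.R12H7.1 is made up for illustration. The sketch does not use Bio::FeatureIO or the Unflattener.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Turn a plain plus-strand GenBank join(a..b,c..d,...) location into GFF3
# lines for one discontinuous CDS: every part shares the same ID, and
# column 8 holds the phase of each part.
sub join_to_gff3 {
    my ($seqid, $id, $location) = @_;
    my ($inner) = $location =~ /^join\((.+)\)$/
        or die "only plain join(...) locations are handled: $location\n";
    my @lines;
    my $done = 0;    # coding bases already emitted in earlier parts
    for my $part (split /,/, $inner) {
        my ($start, $end) = $part =~ /^(\d+)\.\.(\d+)$/
            or die "unsupported location part: $part\n";
        # Bases to skip before the next full codon starts in this part.
        my $phase = (3 - $done % 3) % 3;
        push @lines, join("\t", $seqid, '.', 'CDS', $start, $end, '.', '+',
                          $phase, "ID=$id");
        $done += $end - $start + 1;
    }
    return @lines;
}

my $loc = 'join(8311..8422,8852..8887,8940..9090,9142..9233,'
        . '9721..9848,10296..10714,10835..10934,11584..11706)';
print "$_\n" for join_to_gff3('Z50755', 'cds.R12H7.1', $loc);
```

The eight parts add up to 1161 bases, which is a whole number of codons. Their phases come out as 0, 2, 2, 1, 2, 0, 1, 0.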

> 
> Also, do you think it will be possible to convert the Bio::SeqFeature::Annotated features into persistent ones so that these can be stored in BioSQL ? I'll try to test that out today.

no idea.  my guess is not without substantial effort.

-allen

> Cheers,
> Marc
From jrm at compbio.dundee.ac.uk  Tue Jan 25 06:52:58 2005
From: jrm at compbio.dundee.ac.uk (Jon manning)
Date: Tue Jan 25 06:52:55 2005
Subject: [Bioperl-l] Bio::Seq objects from BLAST hits
Message-ID: <1106653979.3777.8.camel@tick.compbio.dundee.ac.uk>

Hi all,

I have previously used Bio::DB::SwissProt to retrieve sequence objects
using accession numbers derived from a BLAST search, which I
subsequently aligned. Now I'm using BLAST to search the PDB (though I'm
only interested in sequences), so I don't have a Bio::DB module for that.
What would be the best way to derive a Bio::Seq object from the
Bio::Search::Hit::BlastHit objects?

Thanks,

Jon 

From nathanhaigh at ukonline.co.uk  Tue Jan 25 07:42:19 2005
From: nathanhaigh at ukonline.co.uk (Nathan Spencer Haigh)
Date: Tue Jan 25 07:38:24 2005
Subject: [Bioperl-l] Bioperl CVS release differences
In-Reply-To: <000d01c4b76c$5bb3fd90$3cf4cdd9@Desktop>
References: <000d01c4b76c$5bb3fd90$3cf4cdd9@Desktop>
Message-ID: <1106656939.41f63eab97590@webmail.ukonline.net>

I was wondering if it is at all possible to do the following with cvs:
I would like to obtain a copy of all the files that are new/changed in
bioperl-1.5 compared to the 1.4 release. The reason I want to do this is that
I'd like to package up a perl program with (some of) these files so I only need
to request that bioperl-1.4 be installed on the client's computer.

Thanks
Nathan

From cjfields at uiuc.edu  Tue Jan 25 09:40:17 2005
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue Jan 25 09:38:17 2005
Subject: [Fwd: Re: [Bioperl-l] bioperl-run for windows?]
In-Reply-To: <41F5743F.10201@genetics.utah.edu>
References: <41F5743F.10201@genetics.utah.edu>
Message-ID: <6.1.1.1.2.20050125083723.01a6d358@express.cites.uiuc.edu>

There is an EMBOSS release for Windows, believe it or not.  It is currently 
at v. 2.7.1 and can be found at:

http://perso.wanadoo.fr/ablavier/embosswin/embosswin.html

I have no idea if it will work with Bioperl, though.  Might be interesting 
to try at some point.

Chris

At 04:18 PM 1/24/2005, you wrote:
>Some will work, and some won't. I've installed it on Windows, and used it
>a bit there. One problem you'll run into is that you can't use bioperl
>to run a program that can't be installed on Windows (EMBOSS, for example),
>so you'll be limited that way, but check out the Pise interface for any of
>that software. You should be able to get access to a lot of non-Windows
>software via bioperl by using the Pise interface:
>
>Bio::Tools::Run::AnalysisFactory::Pise
>
>http://www.pasteur.fr/recherche/unites/sis/Pise/
>
>
>Barry
>
>Tim Alcon wrote:
>
> > If I just grab it off CPAN, will it work on Windows, or does it use
> > Unix system calls?
> >
> > Tim
> >
> >
> >
> > Jason Stajich wrote:
> >
> >> Is there a PPM on the bioperl site?
> >> No
> >>
> >> Can you install bioperl-run on windows?
> >> Yes - but you'll have to do it manually, or learn how to build PPMs
> >> (quite simple really), or encourage someone to produce a PPM for
> >> bioperl-run.
> >>
> >> -jason
> >> On Jan 22, 2005, at 7:57 PM, Tim Alcon wrote:
> >>
> >>> Does a Windows version of bioperl-run exist? If so, how do I get it?
> >>>
> >>> Tim
> >>>
> >> --
> >> Jason Stajich
> >> jason.stajich at duke.edu
> >> http://www.duke.edu/~jes12/
> >>
> >>
>
>
>--
>Barry Moore
>Dept. of Human Genetics
>University of Utah
>Salt Lake City, UT
>

__________________________________

Chris Fields - Postdoctoral Researcher
Lab of Dr. Robert Switzer

Address:

University of Illinois at Urbana-Champaign
Dept. of Biochemistry - 323 RAL
600 S. Mathews Ave.
Urbana, IL 61801

Phone : (217) 333-7098
Fax : (217) 244-5858 

From palmeida at igc.gulbenkian.pt  Tue Jan 25 12:20:14 2005
From: palmeida at igc.gulbenkian.pt (Paulo Almeida)
Date: Tue Jan 25 12:17:30 2005
Subject: [Bioperl-l] Bioperl CVS release differences
In-Reply-To: <1106656939.41f63eab97590@webmail.ukonline.net>
References: <000d01c4b76c$5bb3fd90$3cf4cdd9@Desktop>
	<1106656939.41f63eab97590@webmail.ukonline.net>
Message-ID: <20050125172014.GA6071@bioinf.igc.gulbenkian.pt>

I don't know if it's possible with CVS, but you could do something like:

diff -rq ~/Test/bioperl-1.5.0-RC2/Bio /usr/share/perl5/Bio

(where those directories are the locations of BioPerl 1.5 and 1.4) and
feed the output to a script that copies the files that are new, or
different, to a new directory.
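Below is a sketch of such a script in core Perl. It makes several assumptions:

- diff -rq prints lines of the form "Only in DIR: NAME" and "Files A and B differ".
- The 1.5 tree is given first, as in the command above.
- Directories are passed without a trailing slash.

The file name copy_changed.pl and the argument order are illustrative only. New directories are reported rather than copied. A path that itself contains " and " would confuse the simple pattern.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Copy qw(copy);
use File::Path qw(mkpath);

# Parse `diff -rq NEW OLD` output and return the paths, relative to NEW,
# of entries that exist only in NEW or that differ between the trees.
sub files_to_copy {
    my ($new_root, @lines) = @_;
    my @rel;
    for my $line (@lines) {
        chomp $line;
        if ($line =~ /^Only in \Q$new_root\E(\/.*)?: (.+)$/) {
            my ($subdir, $name) = ($1, $2);
            push @rel, defined $subdir ? substr($subdir, 1) . "/$name" : $name;
        }
        elsif ($line =~ /^Files \Q$new_root\E\/(.+?) and .+ differ$/) {
            push @rel, $1;
        }
    }
    return @rel;
}

# Copy each relative path from NEW into DEST, creating directories as
# needed.  Entries that are whole new directories are only reported.
sub copy_changed {
    my ($new_root, $dest_root, @rel) = @_;
    for my $rel (@rel) {
        my $src = "$new_root/$rel";
        unless (-f $src) {
            warn "not a plain file, copy it by hand: $src\n";
            next;
        }
        my $dst = "$dest_root/$rel";
        mkpath(dirname($dst));
        copy($src, $dst) or die "copy $src -> $dst failed: $!\n";
    }
}

# Usage: perl copy_changed.pl NEW_BIO_DIR OLD_BIO_DIR DEST_DIR
if (@ARGV == 3) {
    my ($new, $old, $dest) = @ARGV;
    open my $diff, '-|', 'diff', '-rq', $new, $old
        or die "cannot run diff: $!\n";
    copy_changed($new, $dest, files_to_copy($new, <$diff>));
    close $diff;    # diff exits 1 when the trees differ, so this is not checked
}
```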

-Paulo

On Tue, Jan 25, 2005 at 12:42:19PM +0000, Nathan Spencer Haigh wrote:
> I was wondering if it is at all possible to do the following with cvs:
> I would like to obtain a copy of all the files that are new/changed in
> bioperl-1.5 compared to the 1.4 release. The reason I want to do this is that
> I'd like to package up a perl program with (some of) these files so I only need
> to request that bioperl-1.4 be installed on the client's computer.
> 
> Thanks
> Nathan

-- 
Paulo Almeida
Instituto Gulbenkian de Ciencia
Apartado 14, 2781-901, Oeiras, PORTUGAL
tel  +351 21 446 46 35
fax  +351 21 440 79 70
http://www.igc.gulbenkian.pt
From garrettsorensen at gmail.com  Tue Jan 25 12:45:35 2005
From: garrettsorensen at gmail.com (Garrett Sorensen)
Date: Tue Jan 25 12:41:29 2005
Subject: [Bioperl-l] Restriction::Analysis strange error - please help
Message-ID: <d8fb9af905012509451df2406e@mail.gmail.com>

Hello,  I'm new to the mailing list.  Thanks in advance for any help.

I'm really stumped by the following error when running restriction
analysis on large numbers of seq objects.  This only occurs sometimes
when dealing with large numbers of sequences.


------------- EXCEPTION  -------------
MSG: Bad start,end parameters. Start [2002] has to be less than end [2001]
STACK Bio::PrimarySeq::subseq /home/garrett/Perl/lib/perl5/site_perl/5.8.5/Bio/PrimarySeq.pm:362
STACK Bio::Seq::subseq /home/garrett/Perl/lib/perl5/site_perl/5.8.5/Bio/Seq.pm:636
STACK Bio::Restriction::Analysis::fragment_maps /home/garrett/Perl/lib/perl5/site_perl/5.8.5/Bio/Restriction/Analysis.pm:552
STACK toplevel Restriction_analyser_multi_CpG_a.pl:182


Incase it helps here is what my program is doing:
-Reads in multiple fasta sequences (~7kb average size) on at a time
and creates a SeqIO object for each.
-Restriction sites for a particular enzyme are determined for each
SeqIO object and then a fragment 2kb in size is created around that
site.
-A new Seq object is created using the above fragment using
"$upStreamSeqobj = Bio::Seq->new (-seq => $upStreamSeq);"
-This new Seq object is fed into Restriction Analysis to generate
fragments for another enzyme.
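The exception shows subseq() being asked for start 2002 and end 2001. That is an empty range just past the end of what is presumably a 2001 bp fragment. Without the Bio::Restriction::Analysis source at hand, one plausible cause is a cut position on (or computed past) the last base of the sequence.

The core Perl sketch below is not BioPerl code. It only illustrates splitting a plain sequence string at 1-based cut positions while skipping cuts that would yield such an empty piece.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split a sequence string at 1-based cut positions, where a cut at $c falls
# between residues $c and $c + 1.  Cuts at or beyond the last residue, and
# repeated cuts, would produce an empty range (start > end), so they are
# skipped instead of being passed on to a substring call.
sub fragments {
    my ($seq, @cuts) = @_;
    my $len   = length $seq;
    my $start = 1;
    my @frags;
    for my $c (sort { $a <=> $b } @cuts) {
        next if $c < $start || $c >= $len;
        push @frags, substr($seq, $start - 1, $c - $start + 1);
        $start = $c + 1;
    }
    push @frags, substr($seq, $start - 1) if $start <= $len;    # tail piece
    return @frags;
}

print join(',', fragments('AAACCCGGG', 9, 3, 6)), "\n";   # prints AAA,CCC,GGG
```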

Thanks so much for any help, best regards,
Garrett
From jason.stajich at duke.edu  Tue Jan 25 15:21:01 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Tue Jan 25 15:17:11 2005
Subject: [Bioperl-l] bioperl-1.5.0 released
In-Reply-To: <!~!UENERkVCMDkAAQACAAAAAAAAAAAAAAAAABgAAAAAAAAA+EsXQZcrCEGeBpZF7/IE7sKAAAAQAAAAC5SfTUrEgkK7rzGlpAdpOgEAAAAA@ukonline.co.uk>
References: <!~!UENERkVCMDkAAQACAAAAAAAAAAAAAAAAABgAAAAAAAAA+EsXQZcrCEGeBpZF7/IE7sKAAAAQAAAAC5SfTUrEgkK7rzGlpAdpOgEAAAAA@ukonline.co.uk>
Message-ID: <A1D74A2E-6F0E-11D9-90EA-000393C44276@duke.edu>

I just don't have the time to do this right now. It has not really been 
tested for all tests passing.

if someone else wants to volunteer to work on validating it for release 
that would be great.

-jason
On Jan 25, 2005, at 12:59 PM, Nathan Haigh wrote:

> Will there also be 1.5 releases for bioperl-run etc?
>
> Nathan
>
>> -----Original Message-----
>> From: bioperl-l-bounces@portal.open-bio.org 
>> [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Jason 
>> Stajich
>> Sent: 25 January 2005 02:37
>> To: 'bioperl-l@bioperl.org' List; bioperl-announce-l@bioperl.org
>> Subject: [Bioperl-l] bioperl-1.5.0 released
>>
>> Bioperl 1.5.0 Developer's release is available for download.
>> ===============================================
>>
>>   http://bioperl.org/DIST/bioperl-1.5.0.tar.bz2
>> 425ac55ecbb4339b7b532ba6d429bb40
>>   http://bioperl.org/DIST/bioperl-1.5.0.tar.gz
>> 172472f0675de9a583432e21c9b1b5fc
>>   http://bioperl.org/DIST/bioperl-1.5.0.zip
>> 3febcd2445a7393c65981a6f9f13a9ed
>>
>> We'll update the website to reflect this new release.
>>
>> The odd-numbered releases are called developer releases and are not
>> deposited on CPAN.  Please note that the API in 1.5.0 may change before
>> the 1.6.0 release, which will be considered a stable API.  We may do
>> another developer release before 1.6.0 goes out.
>>
>> Lots of people have contributed to this release; I apologize for not
>> naming them all.  I'll try to cover some: thanks to Aaron Mackey for
>> getting this release started, Brian Osborne for extensive documentation
>> improvements, Nathan Haigh for volunteering to make a PPM of the
>> release and Barry Moore and Nathan answering many of the windows
>> related questions, Allen Day & Scott Cain & Steffen Grossmann for the
>> work on FeatureIO, GFF3, and SeqFeature::Annotated, Chris Mungall for
>> the work with Unflattener to merge GenBank annotations into GFF3
>> objects.
>>
>> Please see the AUTHORS file for a complete list of contributors.
>>
>> Jason Stajich on behalf of the Bioperl developers.
>>
>>
>> Here is the info from the Changes file.
>> 1.5 Developer release
>>
>>      o Bio::Align::DNAStatistics and Bio::Align::ProteinStatistics
>>        provide Jukes-Cantor and Kimura pairwise distance methods,
>>        respectively.
>>
>>      o Bio::AlignIO support for "po" format of POA, and "maf";
>>        Bio::AlignIO::largemultifasta is a new alternative to
>>        Bio::AlignIO::fasta for temporary file-based manipulation of
>>        particularly large multiple sequence alignments.
>>
>>      o Bio::Assembly::Singlet allows orphan, unassembled sequences to
>>        be treated similarly to an assembled contig.
>>
>>      o Bio::CodonUsage provides new rare_codon() and probable_codons()
>>        methods for identifying particular codons that encode a given
>>        amino acid.
>>
>>      o Bio::Coordinate::Utils provides new from_align() method to build
>>        a Bio::Coordinate pair directly from a
>>        Bio::Align::AlignI-conforming object.
>>
>>      o Bio::DB::Biblio::eutils is a class for querying NCBI's Eutils.
>>        Send a Pubmed, Pubmed Central, Entrez, or other query to NCBI's
>>        web service using standard Pubmed query syntax, and retrieve
>>        results as XML.
>>
>>      o Bio::DB::GFF has various sundry bug fixes.
>>
>>      o Bio::FeatureIO is a new SeqIO-style subsystem for
>>        writing/reading genomic features to/from files.  I/O classes
>>        exist for BED, GTF (aka GFF v2.5), and GFF v3.  Bio::FeatureIO
>>        classes only read/write Bio::SeqFeature::Annotated objects.
>>        Notably, the GFF v3 class requires features to be typed into the
>>        Sequence Ontology.
>>
>>      o Bio::Graph namespace contains new modules for manipulation and
>>        analysis of protein interaction graphs.
>>
>>      o Bio::Graphics has many bug fixes and shiny new glyphs.
>>
>>      o Bio::Index::Hmmer and Bio::Index::Qual provide multiple-file
>>        indexing for HMMER reports and FASTA qual files, respectively.
>>
>>      o Bio::Map::Clone, Bio::Map::Contig, and Bio::Map::FPCMarker are
>>        new objects that can be placed within a Bio::Map::MapI-compliant
>>        genetic/physical map; Bio::Map::Physical provides a new physical
>>        map type; Bio::MapIO::fpc provides finger-printed clone mapping
>>        import.
>>
>>      o Bio::Matrix::PSM provides new support for position-specific
>>        (scoring) matrices (e.g. profiles, or "possums").
>>
>>      o Bio::Ontology::Ontology and Bio::Ontology::Term objects can now
>>        be instantiated without explicitly using Bio::OntologyIO.  This
>>        is possible through changes to Bio::Ontology::OntologyStore to
>>        download ontology files from the web as necessary.  Locations of
>>        ontology files are hard-coded into
>>        Bio::Ontology::DocumentRegistry.
>>
>>      o Bio::PopGen includes many new methods and data types for
>>        population genetics analyses.
>>
>>      o New constructor to Bio::Range, unions().  Given a list of
>>        ranges, returns another list of "flattened" ranges --
>>        overlapping ranges are merged into a single range with the
>>        minimum and maximum coordinates of the entire overlapping group.
>>
>>      o Bio::Root::IO now supports -url, in addition to -file and -fh.
>>        The new -url argument allows one to specify the network address
>>        of a file for input.  -url currently only works for GET
>>        requests, and thus is read-only.
>>
>>      o Bio::SearchIO::hmmer now returns an individual Hit object for
>>        each domain alignment (thus containing only one HSP); previously
>>        separate alignments would be merged into one hit if the domain
>>        involved in the alignments was the same, but this only worked
>>        when the repeated domain occurred without interruption by any
>>        other domain, leading to a confusing mixture of Hit and HSP
>>        objects.
>>
>>      o Bio::Search::Result::ResultI-compliant report objects now
>>        implement the "get_statistics" method to access
>>        Bio::Search::StatisticsI objects that encapsulate any
>>        statistical parameters associated with the search (e.g.
>>        Karlin's lambda for BLAST/FASTA).
>>
>>      o Bio::Seq::LargeLocatableSeq combines the functionality already
>>        found in Bio::Seq::LargeSeq and Bio::LocatableSeq.
>>
>>      o Bio::SeqFeature::Annotated is a replacement for
>>        Bio::SeqFeature::Generic.  It breaks compliance with the
>>        Bio::SeqFeatureI interface because the author was sick of
>>        dealing with untyped annotation tags.  All
>>        Bio::SeqFeature::Annotated annotations are Bio::AnnotationI
>>        compliant, and accessible through Bio::Annotation::Collection.
>>
>>      o Bio::SeqFeature::Primer implements a Tm() method for primer
>>        melting point predictions.
>>
>>      o Bio::SeqIO now supports AGAVE, BSML (via SAX), CHAOS-XML,
>>        InterProScan-XML, TIGR-XML, and NCBI TinySeq formats.
>>
>>      o Bio::Taxonomy::Node now implements the methods necessary for
>>        Bio::Species interoperability.
>>
>>      o Bio::Tools::CodonTable has new reverse_translate_all() and
>>        make_iupac_string() methods.
>>
>>      o Bio::Tools::dpAlign now provides sequence profile alignments.
>>
>>      o Bio::Tools::GFF now parses GFF version 2.5 (a.k.a. GTF).
>>
>>      o Bio::Tools::Fgenesh, Bio::Tools::tRNAscanSE are new report
>>        parsers.
>>
>>      o Bio::Tools::SiRNA includes two new rulesets (Saigo and Tuschl)
>>        for designing small inhibitory RNA.
>>
>>      o Bio::Tree::DistanceFactory provides NJ and UPGMA tree-building
>>        methods based on a distance matrix.
>>
>>      o Bio::Tree::Statistics provides an assess_bootstrap() method to
>>        calculate bootstrap support values on a guide tree topology,
>>        based on provided bootstrap tree topologies.
>>
>>      o Bio::TreeIO now supports the Pagel (PAG) tree format.
>>
>> --
>> Jason Stajich
>> jason.stajich at duke.edu
>> http://www.duke.edu/~jes12/
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/
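The Bio::Range unions() item in the release notes above describes flattening a list of ranges by merging overlapping ones. A minimal plain-Perl sketch of that idea follows. It assumes ranges are given as hypothetical [start, end] array refs rather than Bio::Range objects, treats ranges that share an endpoint as overlapping, and is not BioPerl's own implementation.

```perl
use strict;
use warnings;

# Hypothetical helper (not BioPerl's code): merge overlapping [start, end]
# ranges, keeping the minimum start and maximum end of each overlapping group.
sub flatten_ranges {
    my @ranges = sort { $a->[0] <=> $b->[0] } @_;   # order by start coordinate
    my @merged;
    for my $r (@ranges) {
        if (@merged && $r->[0] <= $merged[-1][1]) {
            # overlaps the current group: extend its end if needed
            $merged[-1][1] = $r->[1] if $r->[1] > $merged[-1][1];
        }
        else {
            push @merged, [ @$r ];   # start a new group (copy, so input is untouched)
        }
    }
    return @merged;
}

my @flat = flatten_ranges([10, 20], [15, 30], [40, 50], [45, 48]);
print join(' ', map { "$_->[0]-$_->[1]" } @flat), "\n";   # prints "10-30 40-50"
```

Sorting by start first means each range only has to be compared with the last merged group, so the whole pass is dominated by the sort.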

From jason.stajich at duke.edu  Tue Jan 25 16:09:50 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Tue Jan 25 16:06:55 2005
Subject: [Bioperl-l] Bioperl CVS release differences
In-Reply-To: <1106656939.41f63eab97590@webmail.ukonline.net>
References: <000d01c4b76c$5bb3fd90$3cf4cdd9@Desktop>
	<1106656939.41f63eab97590@webmail.ukonline.net>
Message-ID: <7379A361-6F15-11D9-90EA-000393C44276@duke.edu>

$ cvs -dYADDAYADDA co -r bioperl-release-1-5-0 -d bioperl-1.5.0 bioperl-live
$ cd bioperl-1.5.0

  -- see what is different from the 1.4 branch (this includes bug fixes
that were made on that branch when we thought we were going to release
a 1.4.1)
$ cvs diff -r branch-1.4

  -- see what is different since the 1.4.0 release
$ cvs diff -r bioperl-release-1-4-0

FeatureIO and SeqFeature::Annotated are a BIG difference from 1.4
and may not necessarily be part of the stable 1.6.0 release, depending on
backwards compatibility and differing views on how to develop.

-jason

On Jan 25, 2005, at 7:42 AM, Nathan Spencer Haigh wrote:

> I was wondering if it is at all possible to do the following with cvs:
> I would like to obtain a copy of all the files that are new/changed in
> bioperl-1.5 compared to the 1.4 release. The reason I want to do this
> is that I'd like to package up a perl program with (some of) these
> files, so I only need to request that bioperl-1.4 be installed on the
> client's computer.
>
> Thanks
> Nathan
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
>
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

From e.mugerwa at mtc.com.bh  Tue Jan 25 09:37:23 2005
From: e.mugerwa at mtc.com.bh (Edward Mugerwa Buyondo)
Date: Tue Jan 25 17:34:41 2005
Subject: [Bioperl-l] Volunteers needed !!
Message-ID: <41F659A3.5060308@mtc.com.bh>

Dear bioperl list,

I am interested in volunteering for testing and other tasks.

Edward Mugerwa
Manama,
Bahrain


From florian.iragne at labri.fr  Tue Jan 25 13:17:04 2005
From: florian.iragne at labri.fr (Florian)
Date: Tue Jan 25 17:35:17 2005
Subject: [Bioperl-l] bug in bl2seq parser?
Message-ID: <41F68D20.8010708@labri.fr>

Hello everybody,

I've searched the archives for a solution to my problem and couldn't
find one, so I'm posting here.

OK, here is the part of my code that doesn't work:
######################################################################
use Bio::Tools::Run::StandAloneBlast;
use Bio::AlignIO;

my $bl2temp = "/tmp/bl2seq.$$.out";
my $factory = Bio::Tools::Run::StandAloneBlast->new( 'outfile'     => $bl2temp,
                                                     'program'     => 'blastp',
                                                     'REPORT_TYPE' => 'BLASTP' );

# run bl2seq on the two sequences; the report is written to $bl2temp
my $bl2 = $factory->bl2seq( $seqaa1, $seqaa2 );

my $str = Bio::AlignIO->new( '-file'        => $bl2temp,
                             '-format'      => 'bl2seq',
                             '-report_type' => 'blastp' );
my $aln = $str->next_aln();
#######################################################################

The program crashes on the line "my $aln = $str->next_aln();" with the
following message:
Can't call method "querySeq" on an undefined value at 
/usr/lib/perl5/site_perl/5.8.0/Bio/AlignIO/bl2seq.pm line 137

This error happens every time there is no alignment between my two
sequences. I can't figure out how to test for this case, since the script
crashes in the very method that is supposed to return the alignment.

I expected that this method (next_aln()) would return an empty
object if there is no alignment, but that does not seem to be the case.

Does anybody have a solution for this kind of problem?

thanks

Florian
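
Until the bl2seq parser handles reports without an alignment, one possible workaround is to wrap next_aln() in eval, so that the exception becomes an undefined return value the script can test. This is a sketch: safe_next_aln is a hypothetical helper name, not a BioPerl method, and it catches any exception raised by the parser, not only this one.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl method): call next_aln() on an
# alignment stream inside eval, so that a parser exception such as
#   Can't call method "querySeq" on an undefined value
# becomes an undefined return value instead of ending the script.
sub safe_next_aln {
    my ($stream) = @_;
    my $aln;
    my $ok = eval { $aln = $stream->next_aln(); 1 };
    unless ($ok) {
        warn "next_aln() failed: ", ($@ || "unknown error\n");
        return;    # undef in scalar context
    }
    return $aln;
}

# Possible use with the Bio::AlignIO stream $str from the code above:
# my $aln = safe_next_aln($str);
# if (defined $aln) { ... } else { ... }   # no alignment for this pair
```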
From lstein at cshl.edu  Tue Jan 25 16:18:11 2005
From: lstein at cshl.edu (Lincoln Stein)
Date: Tue Jan 25 17:35:32 2005
Subject: [Bioperl-l] help on large sequence with Bio::Index::Fasta!
In-Reply-To: <20050124140233.49ac48b6@dogwood.plantbio.uga.edu>
References: <20050124140233.49ac48b6@dogwood.plantbio.uga.edu>
Message-ID: <200501251618.12231.lstein@cshl.edu>

As far as I know Bio::Index::Fasta works fine with large sequences.  
I've used it with worm chromosomes up to 20 MB.  You might try 
Bio::DB::Fasta in a pinch, since it stores the data differently.

Lincoln

On Monday 24 January 2005 02:02 pm, Guojun Yang wrote:
> Hi, everybody,
> I got another difficult situation:
> I am running a local blast and sequence retrieval. The following
> sub works OK for one of my local DB1, but not for my local DB2. DB1
> contains sequences of PACs and BACs (I believe the average size is
> ~100 or 200 kb), but DB2 contains entries of contigs as large as
> 30Mb. The error says the $seq object is undefined! I believe the
> problem is the size of the large entries in DB2. Can we use
> LargeSeq when we do retrieval? Can anybody help me with how to
> use it with Bio::Index::Fasta? Thank you for your comments in
> advance! Yang
>
>
>
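> use Bio::Index::Fasta;
>
> sub getseq {
>     my ($id, $file_name) = @_;
>     # open the index file with write access so it can be (re)built
>     my $inx = Bio::Index::Fasta->new(-filename   => $file_name . ".idx",
>                                      -write_flag => 1);
>     $inx->id_parser(\&get_id);    # get_id: custom ID parser, not shown
>     $inx->make_index($file_name);
>     my $seq = $inx->fetch($id);   # look up the sequence by its ID
>     return $seq;
> }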
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724

NOTE: Please copy Sandra Michelsen <michelse@cshl.edu> on
all emails regarding scheduling and other time-critical topics.
From barry.moore at genetics.utah.edu  Tue Jan 25 18:15:52 2005
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Tue Jan 25 18:13:15 2005
Subject: [Bioperl-l] load_seqdatabase.pl running SLOW!
Message-ID: <41F6D328.7090402@genetics.utah.edu>

Hilmar (or others)-

I've set up a BioSQL-based database using PostgreSQL 7.2 on a PC with an
Intel Pentium 4 3.0 GHz processor, 800 MHz system bus, 1 GB of RAM, and
Linux (2.2 kernel, Debian woody distro).  Onto that I am loading
~352,000 sequences from the RefSeq complete RNA collection using
load_seqdatabase.pl.  It's running kind of slow, loading on average
about 1 sequence every 2-5 seconds.  In the archives I've read your
comments on a previous question like this, suggesting two fast
processors, a couple of gigs of memory and 2-3 drives to really make things
fly, and while my system isn't that good, it seems like I should be doing
better.  I got to experimenting on another (slower) system while waiting
for things to load, and found that running the same script to load the
same file goes about 3X faster on a 266 MHz Intel processor with 192 MB
RAM.  Same installation of PostgreSQL (both installed from the deb package
with defaults), and same installation of Debian Linux (except that the
kernel on the older, slower machine has been updated to 2.4).  Another
difference I noticed between the two is that the old 266 MHz machine is
using about 75% of its CPU for perl and about 25% for postmaster,
whereas the faster 3 GHz machine (but slower running
load_seqdatabase.pl) is using 95% of its CPU for postmaster
and about 3% for perl.  Both systems are using up most of their memory,
but little to no swap.  Could the kernel upgrade really be making the
difference?  Any thoughts?  As it's going now I can wait over a week for
all these sequences to load, or build the database on our dinosaur
server in a couple of days and dump it across to our sexy new 3 GHz
server.  Talk about bass ackwards!

Barry

-- 
Barry Moore
Dept. of Human Genetics
University of Utah
Salt Lake City, UT


From smarkel at scitegic.com  Tue Jan 25 18:25:44 2005
From: smarkel at scitegic.com (Scott Markel)
Date: Tue Jan 25 18:22:27 2005
Subject: [Fwd: Re: [Bioperl-l] bioperl-run for windows?]
In-Reply-To: <6.1.1.1.2.20050125083723.01a6d358@express.cites.uiuc.edu>
References: <41F5743F.10201@genetics.utah.edu>
	<6.1.1.1.2.20050125083723.01a6d358@express.cites.uiuc.edu>
Message-ID: <41F6D578.4010206@scitegic.com>

Chris,

I use EMBOSSwin with BioPerl.  Mostly it runs fine.  Two
things to watch out for.  The first is the use of /dev/null
for stderr.  The second is that Bio::Factory::EMBOSS::_program_list
specifically fails if the OS is MSWin or Mac.
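
For the first pitfall, Perl's core File::Spec module reports the platform's null device ("/dev/null" on Unix-like systems, "nul" on Windows), so a redirection does not have to hard-code "/dev/null". A minimal sketch follows; quiet_command is a hypothetical helper and the EMBOSS program name is only an example.

```perl
use strict;
use warnings;
use File::Spec;

# File::Spec->devnull returns the platform's null device:
# "/dev/null" on Unix-like systems and "nul" on Windows.
my $null = File::Spec->devnull;

# Hypothetical helper: append a stderr redirection to the null device
# instead of hard-coding "/dev/null".
sub quiet_command {
    my ($cmd) = @_;
    return "$cmd 2>$null";
}

print quiet_command('embossversion'), "\n";   # program name is only an example
```

If the OS check itself needs a workaround, Perl's $^O variable (e.g. "MSWin32", "linux", "darwin") is the usual way to see which platform the script is running on.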

Scott

Chris Fields wrote:

> There is an EMBOSS release for Windows, believe it or not.  It is 
> currently at v. 2.7.1 and can be found at:
> 
> http://perso.wanadoo.fr/ablavier/embosswin/embosswin.html
> 
> I have no idea if it will work with Bioperl, though.  Might be 
> interesting to try at some point.
> 
> Chris
> 
> At 04:18 PM 1/24/2005, you wrote:
> 
>> Some will work, and some won't. I've installed it on Windows, and used it
>> a bit there. One problem you'll run into is that you can't use bioperl
>> to run a program that can't be installed on Windows (EMBOSS for
>> example), so you'll be limited that way, but check out the Pise interface
>> for any of that software. You should be able to get access to a lot of
>> non-Windows software via bioperl by using the Pise interface:
>>
>> Bio::Tools::Run::AnalysisFactory::Pise
>>
>> http://www.pasteur.fr/recherche/unites/sis/Pise/
>>
>>
>> Barry
>>
>> Tim Alcon wrote:
>>
>> > If I just grab it off CPAN, will it work on Windows, or does it use
>> > Unix system calls?
>> >
>> > Tim
>> >
>> >
>> >
>> > Jason Stajich wrote:
>> >
>> >> Is there a PPM on the bioperl site?
>> >> No
>> >>
>> >> Can you install bioperl-run on windows?
>> >> Yes - but you'll have to do it manually, or learn how to build PPMs
>> >> (quite simple really), or encourage someone to produce a PPM for
>> >> bioperl-run.
>> >>
>> >> -jason
>> >> On Jan 22, 2005, at 7:57 PM, Tim Alcon wrote:
>> >>
>> >>> Does a Windows version of bioperl-run exist? If so, how do I get it?
>> >>>
>> >>> Tim
>> >>>
>> >> --
>> >> Jason Stajich
>> >> jason.stajich at duke.edu
>> >> http://www.duke.edu/~jes12/
>> >>
>> >>
>> >
>>
>>
>> -- 
>> Barry Moore
>> Dept. of Human Genetics
>> University of Utah
>> Salt Lake City, UT
>>
> 
> 
> __________________________________
> 
> Chris Fields - Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> 
> Address:
> 
> University of Illinois at Urbana-Champaign
> Dept. of Biochemistry - 323 RAL
> 600 S. Mathews Ave.
> Urbana, IL 61801
> 
> Phone : (217) 333-7098
> Fax : (217) 244-5858
> 

-- 
Scott Markel, Ph.D.
Principal Bioinformatics Architect  email:  smarkel@scitegic.com
SciTegic Inc.                       mobile: +1 858 205 3653
9665 Chesapeake Drive, Suite 401    voice:  +1 858 279 8800, ext. 253
San Diego, CA 92123                 fax:    +1 858 279 8804
USA                                 web:    http://www.scitegic.com

From hlapp at gnf.org  Tue Jan 25 20:04:49 2005
From: hlapp at gnf.org (Hilmar Lapp)
Date: Tue Jan 25 20:00:54 2005
Subject: [Bioperl-l] RE: load_seqdatabase.pl running SLOW!
Message-ID: <BB3BF516F698804298FE224B1F7A39814942C3@EXCHCLUSTER01.lj.gnf.org>

To be honest I've never loaded a large file into a Pg installation. The problem I'd expect you to run into is that, if you started with a fresh database, the lookup queries will become slower and slower unless the statistics are recomputed frequently through vacuum (which the load script won't do).

I believe in more recent releases you can actually vacuum the database concurrently with write access; I'm not sure whether 7.2.x already allows this. You should strongly consider upgrading to at least 7.3, if not 7.4 or even 8.x. The Pg developers may not even answer questions about 7.2 anymore ...

Your observation that the slower machine with the later kernel is faster leaves me puzzled. If blind-tested, I would have suggested that the machine appearing faster had had its database vacuumed.

Not sure this is very helpful ...

 -hilmar
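
One way to act on this is to refresh the statistics periodically during a long load instead of only at the end. The sketch below is hypothetical: load_seqdatabase.pl offers no such hook, and load_with_maintenance is an invented name. It simply calls a maintenance callback every N records.

```perl
use strict;
use warnings;

# Hypothetical sketch: load records one by one and call a maintenance
# callback (for example, a statistics refresh) every $every records.
sub load_with_maintenance {
    my (%args) = @_;
    my ($records, $load, $maintain, $every) = @args{qw(records load maintain every)};
    die "every must be a positive integer\n" unless defined $every && $every > 0;
    my $count = 0;
    for my $rec (@$records) {
        $load->($rec);
        $count++;
        $maintain->($count) if $count % $every == 0;   # periodic maintenance
    }
    return $count;
}
```

With DBI and DBD::Pg, the callback might be sub { $dbh->do('VACUUM ANALYZE') } on a handle with AutoCommit enabled, since PostgreSQL does not allow VACUUM inside a transaction block. Whether that can safely overlap other writers on 7.2.x is the open question raised above.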

From rob at salmonella.org  Tue Jan 25 20:20:47 2005
From: rob at salmonella.org (Rob Edwards)
Date: Tue Jan 25 20:17:53 2005
Subject: [Bioperl-l] Restriction::Analysis strange error - please help
In-Reply-To: <d8fb9af905012509451df2406e@mail.gmail.com>
References: <d8fb9af905012509451df2406e@mail.gmail.com>
Message-ID: <820FADCE-6F38-11D9-A47D-000A959E1622@salmonella.org>

It is hard to locate the exact error without more information. The
error is caused because at some point you are trying to get a sequence
that starts at position 2002, but the sequence is only 2001 nt long
(hence the error that 2002 must be < 2001). I would suggest that the
error is at some point where you are taking the 2kb fragment around the
site. The most obvious thing to start with is to check which start/end
coordinates are used immediately before the error, and whether they
make sense given the length of the sequence.
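
Following this suggestion, the window around each site can be clamped to the sequence bounds before subseq() is called. A minimal sketch, assuming 1-based inclusive coordinates as used by subseq(start, end); window_around is a hypothetical helper, not part of Bio::Restriction::Analysis.

```perl
use strict;
use warnings;

# Hypothetical helper: a window of up to $size bases around $site, clamped
# to [1, $len] so that subseq($start, $end) never gets a start beyond the
# end. Returns an empty list when the site lies outside the sequence.
sub window_around {
    my ($site, $size, $len) = @_;
    return if $len < 1 || $site < 1 || $site > $len;   # no valid window
    my $start = $site - int($size / 2);
    my $end   = $start + $size - 1;
    $start = 1    if $start < 1;      # clamp to sequence start
    $end   = $len if $end > $len;     # clamp to sequence end
    return ($start, $end);
}

# Possible use (not run here): skip sites that have no valid window
# if (my ($s, $e) = window_around($site, 2000, $seq->length)) {
#     my $fragment = $seq->subseq($s, $e);
# }
```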

Rob


On Jan 25, 2005, at 9:45 AM, Garrett Sorensen wrote:

> Hello,  I'm new to the mailing list.  Thanks in advance for any help.
>
> I'm really stumped by the following error when running restriction
> analysis on large numbers of seq objects.  This only occurs sometimes
> when dealing with large numbers of sequences.
>
>
> ------------- EXCEPTION  -------------
> MSG: Bad start,end parameters. Start [2002] has to be less than end  
> [2001]
> STACK Bio::PrimarySeq::subseq
> /home/garrett/Perl/lib/perl5/site_perl/5.8.5/Bio/PrimarySeq.pm:362
> STACK Bio::Seq::subseq
> /home/garrett/Perl/lib/perl5/site_perl/5.8.5/Bio/Seq.pm:636
> STACK Bio::Restriction::Analysis::fragment_maps
> /home/garrett/Perl/lib/perl5/site_perl/5.8.5/Bio/Restriction/ 
> Analysis.pm:552
> STACK toplevel Restriction_analyser_multi_CpG_a.pl:182
>
>
> In case it helps, here is what my program is doing:
> -Reads in multiple fasta sequences (~7kb average size) one at a time
> and creates a SeqIO object for each.
> -Restriction sites for a particular enzyme are determined for each
> SeqIO object and then a fragment 2kb in size is created around that
> site.
> -A new Seq object is created using the above fragment using
> "$upStreamSeqobj = Bio::Seq->new (-seq => $upStreamSeq);"
> -This new Seq object is fed into Restriction Analysis to generate
> fragments for another enzyme.
>
> Thanks so much for any help, best regards,
> Garrett

From billk at iinet.net.au  Tue Jan 25 20:40:50 2005
From: billk at iinet.net.au (William Kenworthy)
Date: Tue Jan 25 20:36:47 2005
Subject: [Bioperl-l] Something I am confused about and have not seen
	explained in the docs:
Message-ID: <1106703650.19499.2.camel@rattus.Localdomain>

Something I am confused about and have not seen explained in the docs:

Is bioperl-run complementary to, a subset of, or a self-contained package of
separate functions compared to bioperl?  And what is bioperl-live, and how does
it fit into this picture?

BillK

-- 
William Kenworthy <billk@iinet.net.au>
Home!

From jiangs at mail.nih.gov  Tue Jan 25 20:57:09 2005
From: jiangs at mail.nih.gov (Jiang, Shan (NIH/NCI))
Date: Tue Jan 25 20:53:02 2005
Subject: [Bioperl-l] BIOperl release plan for 2005?
Message-ID: <16A0583FB1644E4DB8C0A0265028B6FDFC9196@nihexchange13.nih.gov>


Hi! Is there a release schedule for BioPerl for 2005? Can someone also give
me an overview of how far ahead I should have the code completed in order
for it to make a certain release?

I am planning on contributing code to BioPerl to integrate with the Perl
version of caBIO, a cancer bioinformatics application here at the National
Cancer Institute Center for Bioinformatics (NCICB) in NIH
(http://ncicb.nci.nih.gov/core/caBIO).

Thanks a lot for your help!
Shan Jiang
(Contractor)

From garrettsorensen at gmail.com  Tue Jan 25 21:12:37 2005
From: garrettsorensen at gmail.com (Garrett Sorensen)
Date: Tue Jan 25 21:08:49 2005
Subject: [Bioperl-l] Restriction::Analysis strange error - please help
In-Reply-To: <820FADCE-6F38-11D9-A47D-000A959E1622@salmonella.org>
References: <d8fb9af905012509451df2406e@mail.gmail.com>
	<820FADCE-6F38-11D9-A47D-000A959E1622@salmonella.org>
Message-ID: <d8fb9af90501251812672aba54@mail.gmail.com>

Thanks for the suggestion, Rob, but I'm still having issues.  I've boiled
the code down to just reading in fasta sequences and digesting the
whole sequences, as opposed to digesting a subsequence as done
initially.  It digests a few hundred sequences without issue and then
runs into the same error, but reports different coordinates of course.

If it is only reading in a fasta sequence and digesting it, how is it
calculating wrong start/end coordinates for itself?  And the fact that
it works great on many sequences but then calculates wrong
coordinates for one seems strange...  Any ideas?

I've tried feeding Restriction::Analysis both a Bio::Seq object (from SeqIO)
and a PrimarySeq, with the same result.

Here is the code:

use strict;
use warnings;
use Bio::SeqIO;
use Bio::PrimarySeq;
use Bio::Restriction::Analysis;
use Data::Dumper;

# FASTA file to digest, taken from the command line
my $file = shift @ARGV or die "usage: $0 file.fasta\n";

my $in  = Bio::SeqIO->new(-file => $file,  -format => 'fasta');
while ( my $seq = $in->next_seq ) {
    # digest each whole sequence with NlaIII and print the fragments
    my $ra = Bio::Restriction::Analysis->new(-seq => $seq);
    my @fragments = $ra->fragments('NlaIII');
    print join("\n\n", @fragments), "\n";
}
exit;


Thanks for any help or suggestions, best regards,
Garrett


From allenday at ucla.edu  Wed Jan 26 01:27:00 2005
From: allenday at ucla.edu (Allen Day)
Date: Wed Jan 26 01:22:56 2005
Subject: [Bioperl-l] RPMs for bioperl
Message-ID: <Pine.LNX.4.58.0501252214470.3069@sumo.ctrl.ucla.edu>

Hi,

I've put together a set of RPMs for Bioperl, Bioperl-DB, Bioperl-Run, and
GBrowse.  It's still a work in progress, but you can see the current state
here:
  http://sumo.genetics.ucla.edu/~allenday/flute/bioperl-1.5/i686/
There are some related directories rooted here: 
  http://sumo.genetics.ucla.edu/~allenday/flute/

The RPMs don't install cleanly.  This is because I'm using an automated tool
to build the RPMs, and it looks through each tarball downloaded from CPAN
to see what that tarball depends on.  Sometimes there are dependencies on
libraries that don't exist on CPAN, or that might not exist at all.
These are the problem libraries and binaries:

% rpm -Uvh --test *.rpm
error: Failed dependencies:
        perl(Ace::Browser::LocalSiteDefs) is needed by perl-AcePerl-1.87-allenday
        perl(Bio::Das::ProServer::SourceHydra) is needed by perl-Bio-Das-0.99-allenday
        perl(IndexSupport) is needed by perl-Bio-Das-0.99-allenday
        perl(srsperl) is needed by perl-bioperl-1.5.0-allenday
        perl(Bio::DB::BioDB) is needed by perl-Generic-Genome-Browser-1.62-allenday
        perl(Bio::DB::Query::BioQuery) is needed by perl-Generic-Genome-Browser-1.62-allenday
        perl(GuessDirectories) is needed by perl-Generic-Genome-Browser-1.62-allenday
        perl(MOBY::Client::Central) is needed by perl-Generic-Genome-Browser-1.62-allenday
        perl(MOBY::Client::Service) is needed by perl-Generic-Genome-Browser-1.62-allenday
        perl(MOBY::CommonSubs) is needed by perl-Generic-Genome-Browser-1.62-allenday
        perl(MOBY::MobyXMLConstants) is needed by perl-Generic-Genome-Browser-1.62-allenday
        perl(PPM::Archive) is needed by perl-Generic-Genome-Browser-1.62-allenday
        perl(MQClient::MQSeries) is needed by perl-SOAP-Lite-0.60-allenday
        perl(MQSeries) is needed by perl-SOAP-Lite-0.60-allenday
        perl(MQSeries::Message) is needed by perl-SOAP-Lite-0.60-allenday
        perl(MQSeries::Queue) is needed by perl-SOAP-Lite-0.60-allenday
        perl(MQSeries::QueueManager) is needed by perl-SOAP-Lite-0.60-allenday
        /bin/perl is needed by perl-Tk-804.027-allenday
        /usr/local/bin/perl is needed by perl-Tk-804.027-allenday
        perl(Tk::LabRadio) is needed by perl-Tk-804.027-allenday
        perl(Tk::TextReindex) is needed by perl-Tk-804.027-allenday
        perl(XML::LibXML) >= 1.57 is needed by perl-XML-LibXSLT-1.57-allenday
        perl(XML::SAX::PurePerl::DTDDecls) is needed by perl-XML-SAX-0.12-allenday
        perl(XML::SAX::PurePerl::DocType) is needed by perl-XML-SAX-0.12-allenday
        perl(XML::SAX::PurePerl::EncodingDetect) is needed by perl-XML-SAX-0.12-allenday
        perl(XML::SAX::PurePerl::XMLDecl) is needed by perl-XML-SAX-0.12-allenday

Lincoln, I'm guessing you can help me with:

* Ace::Browser::LocalSiteDefs
* Bio::Das::ProServer::SourceHydra
* GuessDirectories
* IndexSupport
* MOBY::*

Hilmar, do you know about:

* Bio::DB::BioDB
* Bio::DB::Query::BioQuery

I'm sure someone on this list knows where to get

* srsperl
* PPM::Archive

If anyone can shed light on where any of these libraries can be found, I'd
appreciate it.  Thanks.
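
To see which of the modules listed above are already loadable on a given machine, a small check like the following may help. It is a sketch: missing_modules is a hypothetical helper, and a module that is installed but fails to compile is also reported, since require fails in both cases.

```perl
use strict;
use warnings;

# Hypothetical helper: return the module names that this perl cannot load.
# The module name is turned into a file path, so no code is eval'd from strings.
sub missing_modules {
    my @missing;
    for my $mod (@_) {
        (my $file = "$mod.pm") =~ s{::}{/}g;
        push @missing, $mod unless eval { require $file; 1 };
    }
    return @missing;
}

# A few names taken from the failed-dependency list above.
my @check = qw(Bio::DB::BioDB Bio::DB::Query::BioQuery PPM::Archive srsperl);
print "$_ cannot be loaded\n" for missing_modules(@check);
```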

-Allen
From nathanhaigh at ukonline.co.uk  Wed Jan 26 03:58:06 2005
From: nathanhaigh at ukonline.co.uk (Nathan Haigh)
Date: Wed Jan 26 03:54:05 2005
Subject: [Bioperl-l] Something I am confused about and have not
	seenexplained in the docs:
In-Reply-To: <1106703650.19499.2.camel@rattus.Localdomain>
Message-ID: <!~!UENERkVCMDkAAQACAAAAAAAAAAAAAAAAABgAAAAAAAAA+EsXQZcrCEGeBpZF7/IE7sKAAAAQAAAACbFToIPj4UmxQElAja2XmgEAAAAA@ukonline.co.uk>

The bioperl-core modules distributed as bioperl-1.4, bioperl-1.5, etc. consist of the core modules. If additional functionality is
required, you can install one or more of the following bioperl packages: the run package (bioperl-run), Ext (bioperl-ext), microarray
(bioperl-microarray), etc. They all depend on the core bioperl package being installed, but add additional functionality. They can be
seen at:
http://www.bioperl.org/Core/Latest/index.shtml

bioperl-live is the name given to the cutting-edge versions of all the bioperl files, available via CVS. Bioperl is open source: many
different people contribute to its development, from just reporting bugs/errors to writing entirely new modules that extend bioperl's
functionality. As a result, the Concurrent Versions System (CVS) is used to track all the modifications to all the bioperl files, so
a developer can make a bugfix etc. to a file and commit it to CVS. This results in the continual evolution of bioperl even after an
official release (for example, v1.4 and 1.5 do not change - EVER - once released), but updates to files are recorded using
CVS and would be included in future releases, e.g. 1.5.1 or 1.6. Access to this cutting-edge code is available for those who want it,
using CVS or by navigating the links at http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/?cvsroot=bioperl and selecting
Download Tarball on the appropriate page.

Hope this helps
Nathan


> -----Original Message-----
> From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of William Kenworthy
> Sent: 26 January 2005 01:41
> To: BioPerl List
> Subject: [Bioperl-l] Something I am confused about and have not seenexplained in the docs:
> 
> Something I am confused about and have not seen explained in the docs:
> 
> Is bioperl-run complimentary, a subset of or a self-contained package of
> separate functions compared to bioperl?  And what is, and how does
> bioperl-live fit into this picture?
> 
> BillK
> 
> --
> William Kenworthy <billk@iinet.net.au>
> Home!
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
---
avast! Antivirus: Outbound message clean.
Virus Database (VPS): 0504-0, 25/01/2005
Tested on: 26/01/2005 08:55:51
avast! is copyright (c) 2000-2003 ALWIL Software.
http://www.avast.com


From nathanhaigh at ukonline.co.uk  Wed Jan 26 03:59:54 2005
From: nathanhaigh at ukonline.co.uk (Nathan Haigh)
Date: Wed Jan 26 03:55:53 2005
Subject: [Bioperl-l] bioperl-1.5.0 released
In-Reply-To: <A1D74A2E-6F0E-11D9-90EA-000393C44276@duke.edu>
Message-ID: <!~!UENERkVCMDkAAQACAAAAAAAAAAAAAAAAABgAAAAAAAAA+EsXQZcrCEGeBpZF7/IE7sKAAAAQAAAA+9vcyuMYjka5dkwLTZEVZAEAAAAA@ukonline.co.uk>

Would it be helpful for me to make the ppd file for bioperl-run 1.4 so people can install it easily using PPM?


Nathan


> -----Original Message-----
> From: Jason Stajich [mailto:jason.stajich@duke.edu]
> Sent: 25 January 2005 20:21
> To: nathanhaigh@ukonline.co.uk
> Cc: Bioperl list
> Subject: Re: [Bioperl-l] bioperl-1.5.0 released
> 
> I just don't have the time to do this right now. It has not really been
> tested for all tests passing.
> 
> if someone else wants to volunteer to work on validating it for release
> that would be great.
> 
> -jason
> On Jan 25, 2005, at 12:59 PM, Nathan Haigh wrote:
> 
> > Will there also be 1.5 releases for bioperl-run etc?
> >
> > Nathan
> >
> >> -----Original Message-----
> >> From: bioperl-l-bounces@portal.open-bio.org
> >> [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Jason
> >> Stajich
> >> Sent: 25 January 2005 02:37
> >> To: 'bioperl-l@bioperl.org' List; bioperl-announce-l@bioperl.org
> >> Subject: [Bioperl-l] bioperl-1.5.0 released
> >>
> >> Bioperl 1.5.0 Developer's release is available for download.
> >> ===============================================
> >>
> >>   http://bioperl.org/DIST/bioperl-1.5.0.tar.bz2
> >> 425ac55ecbb4339b7b532ba6d429bb40
> >>   http://bioperl.org/DIST/bioperl-1.5.0.tar.gz
> >> 172472f0675de9a583432e21c9b1b5fc
> >>   http://bioperl.org/DIST/bioperl-1.5.0.zip
> >> 3febcd2445a7393c65981a6f9f13a9ed
> >>
> >> We'll update the website to reflect this new release.
> >>
> >> The odd-numbered releases are called developer releases and are not
> >> deposited on CPAN.  Please note that the API in 1.5.0 may change
> >> before the 1.6.0 release, which will be considered a stable API.  We
> >> may do another developer release before 1.6.0 goes out.
> >>
> >> Lots of people have contributed to this release; I apologize for not
> >> naming them all.  I'll try to cover some: thanks to Aaron Mackey for
> >> getting this release started, Brian Osborne for extensive
> >> documentation improvements, Nathan Haigh for volunteering to make a
> >> PPM of the release, Barry Moore and Nathan for answering many of the
> >> Windows-related questions, Allen Day & Scott Cain & Steffen Grossmann
> >> for the work on FeatureIO, GFF3, and SeqFeature::Annotated, and Chris
> >> Mungall for the work with Unflattener to merge GenBank annotations
> >> into GFF3 objects.
> >>
> >> Please see the AUTHORS file for a complete list of contributors.
> >>
> >> Jason Stajich on behalf of the Bioperl developers.
> >>
> >>
> >> Here is the info from the Changes file.
> >> 1.5 Developer release
> >>
> >>      o Bio::Align::DNAStatistics and Bio::Align::ProteinStatistics
> >>        provide Jukes-Cantor and Kimura pairwise distance methods,
> >>        respectively.
> >>
> >>      o Bio::AlignIO support for "po" format of POA, and "maf";
> >>        Bio::AlignIO::largemultifasta is a new alternative to
> >>        Bio::AlignIO::fasta for temporary file-based manipulation of
> >>        particularly large multiple sequence alignments.
> >>
> >>      o Bio::Assembly::Singlet allows orphan, unassembled sequences to
> >>        be treated in the same way as an assembled contig.
> >>
> >>      o Bio::CodonUsage provides new rare_codon() and probable_codons()
> >>        methods for identifying particular codons that encode a given
> >>        amino acid.
> >>
> >>      o Bio::Coordinate::Utils provides a new from_align() method to
> >>        build a Bio::Coordinate pair directly from a
> >>        Bio::Align::AlignI-conforming object.
> >>
> >>      o Bio::DB::Biblio::eutils is a class for querying NCBI's Eutils.
> >>        Send a Pubmed, Pubmed Central, Entrez, or other query to NCBI's
> >>        web service using standard Pubmed query syntax, and retrieve
> >>        results as XML.
> >>
> >>      o Bio::DB::GFF has various sundry bug fixes.
> >>
> >>      o Bio::FeatureIO is a new SeqIO-style subsystem for
> >>        writing/reading genomic features to/from files.  I/O classes
> >>        exist for BED, GTF (aka GFF v2.5), and GFF v3.  Bio::FeatureIO
> >>        classes only read/write Bio::SeqFeature::Annotated objects.
> >>        Notably, the GFF v3 class requires features to be typed into
> >>        the Sequence Ontology.
> >>
> >>      o Bio::Graph namespace contains new modules for manipulation and
> >>        analysis of protein interaction graphs.
> >>
> >>      o Bio::Graphics has many bug fixes and shiny new glyphs.
> >>
> >>      o Bio::Index::Hmmer and Bio::Index::Qual provide multiple-file
> >>        indexing for HMMER reports and FASTA qual files, respectively.
> >>
> >>      o Bio::Map::Clone, Bio::Map::Contig, and Bio::Map::FPCMarker are
> >>        new objects that can be placed within a Bio::Map::MapI-compliant
> >>        genetic/physical map; Bio::Map::Physical provides a new physical
> >>        map type; Bio::MapIO::fpc provides finger-printed clone mapping
> >>        import.
> >>
> >>      o Bio::Matrix::PSM provides new support for position-specific
> >>        (scoring) matrices (e.g. profiles, or "possums").
> >>
> >>      o Bio::Ontology::Ontology and Bio::Ontology::Term objects can now
> >>        be instantiated without explicitly using Bio::OntologyIO.  This
> >>        is possible through changes to Bio::Ontology::OntologyStore to
> >>        download ontology files from the web as necessary.  Locations of
> >>        ontology files are hard-coded into
> >>        Bio::Ontology::DocumentRegistry.
> >>
> >>      o Bio::PopGen includes many new methods and data types for
> >>        population genetics analyses.
> >>
> >>      o New constructor to Bio::Range, unions().  Given a list of
> >>        ranges, returns another list of "flattened" ranges:
> >>        overlapping ranges are merged into a single range with the
> >>        minimum and maximum coordinates of the entire overlapping group.
> >>
> >>      o Bio::Root::IO now supports -url, in addition to -file and -fh.
> >>        The new -url argument allows one to specify the network address
> >>        of a file for input.  -url currently only works for GET
> >>        requests, and thus is read-only.
> >>
> >>      o Bio::SearchIO::hmmer now returns individual Hit objects for each
> >>        domain alignment (thus containing only one HSP); previously
> >>        separate alignments would be merged into one hit if the domain
> >>        involved in the alignments was the same, but this only worked
> >>        when the repeated domain occurred without interruption by any
> >>        other domain, leading to a confusing mixture of Hit and HSP
> >>        objects.
> >>
> >>      o Bio::Search::Result::ResultI-compliant report objects now
> >>        implement the "get_statistics" method to access
> >>        Bio::Search::StatisticsI objects that encapsulate any
> >>        statistical parameters associated with the search (e.g. Karlin's
> >>        lambda for BLAST/FASTA).
> >>
> >>      o Bio::Seq::LargeLocatableSeq combines the functionality already
> >>        found in Bio::Seq::LargeSeq and Bio::LocatableSeq.
> >>
> >>      o Bio::SeqFeature::Annotated is a replacement for
> >>        Bio::SeqFeature::Generic.  It breaks compliance with the
> >>        Bio::SeqFeatureI interface because the author was sick of
> >>        dealing with untyped annotation tags.  All
> >>        Bio::SeqFeature::Annotated annotations are Bio::AnnotationI
> >>        compliant, and accessible through Bio::Annotation::Collection.
> >>
> >>      o Bio::SeqFeature::Primer implements a Tm() method for primer
> >>        melting point predictions.
> >>
> >>      o Bio::SeqIO now supports AGAVE, BSML (via SAX), CHAOS-XML,
> >>        InterProScan-XML, TIGR-XML, and NCBI TinySeq formats.
> >>
> >>      o Bio::Taxonomy::Node now implements the methods necessary for
> >>        Bio::Species interoperability.
> >>
> >>      o Bio::Tools::CodonTable has new reverse_translate_all() and
> >>        make_iupac_string() methods.
> >>
> >>      o Bio::Tools::dpAlign now provides sequence profile alignments.
> >>
> >>      o Bio::Tools::GFF now parses GFF version 2.5 (a.k.a. GTF).
> >>
> >>      o Bio::Tools::Fgenesh, Bio::Tools::tRNAscanSE are new report
> >>        parsers.
> >>
> >>      o Bio::Tools::SiRNA includes two new rulesets (Saigo and Tuschl)
> >>        for designing small inhibitory RNA.
> >>
> >>      o Bio::Tree::DistanceFactory provides NJ and UPGMA tree-building
> >>        methods based on a distance matrix.
> >>
> >>      o Bio::Tree::Statistics provides an assess_bootstrap() method to
> >>        calculate bootstrap support values on a guide tree topology,
> >>        based on provided bootstrap tree topologies.
> >>
> >>      o Bio::TreeIO now supports the Pagel (PAG) tree format.
> >>
> >> --
> >> Jason Stajich
> >> jason.stajich at duke.edu
> >> http://www.duke.edu/~jes12/
> --
> Jason Stajich
> jason.stajich at duke.edu
> http://www.duke.edu/~jes12/
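
The unions() entry in the 1.5 Changes list above merges overlapping ranges into one range spanning the minimum start and maximum
end of each overlapping group. As an illustration of that idea only, here is a minimal plain-Perl sketch working on [start, end]
array references; it is not Bio::Range's own implementation, and merge_ranges is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper (not Bio::Range code): merge overlapping, inclusive
# [start, end] ranges into non-overlapping "flattened" ranges, keeping the
# minimum start and maximum end of each overlapping group.
sub merge_ranges {
    # Sort by start, then by end, so each group is contiguous in the list.
    my @sorted = sort { $a->[0] <=> $b->[0] || $a->[1] <=> $b->[1] } @_;
    my @merged;
    for my $r (@sorted) {
        if (@merged && $r->[0] <= $merged[-1][1]) {
            # Overlaps the current group: extend its end if this one is longer.
            $merged[-1][1] = $r->[1] if $r->[1] > $merged[-1][1];
        }
        else {
            push @merged, [ @$r ];    # copy so the caller's arrays are untouched
        }
    }
    return @merged;
}

my @flat = merge_ranges([10, 20], [15, 30], [40, 50], [1, 5]);
print join(' ', map { "[$_->[0],$_->[1]]" } @flat), "\n";   # [1,5] [10,30] [40,50]
```

With inclusive coordinates, ranges that share an endpoint (e.g. [1, 5] and [5, 9]) count as overlapping and are merged, while
merely adjacent ranges such as [1, 5] and [6, 10] are kept separate.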


From billk at iinet.net.au  Wed Jan 26 04:34:15 2005
From: billk at iinet.net.au (William Kenworthy)
Date: Wed Jan 26 04:32:47 2005
Subject: [Bioperl-l] Something I am confused about and have not
	seenexplained in the docs:
In-Reply-To: <!~!UENERkVCMDkAAQACAAAAAAAAAAAAAAAAABgAAAAAAAAA+EsXQZcrCEGeBpZF7/IE7sKAAAAQAAAACbFToIPj4UmxQElAja2XmgEAAAAA@ukonline.co.uk>
References: <!~!UENERkVCMDkAAQACAAAAAAAAAAAAAAAAABgAAAAAAAAA+EsXQZcrCEGeBpZF7/IE7sKAAAAQAAAACbFToIPj4UmxQElAja2XmgEAAAAA@ukonline.co.uk>
Message-ID: <1106732055.19499.56.camel@rattus.Localdomain>

Thanks. None of this seems to be documented anywhere, and perhaps it
should be.

Perhaps the best way to get a relatively bug-free version (as I am
suffering from some 1.4 bugs that have apparently long been fixed) is
to check out all the packages from CVS at the same time and at least be
at the working edge, rather than have a mismatched hodgepodge of
"stable, but buggy and way too old" versions.

BillK



Home!

From palmeida at igc.gulbenkian.pt  Wed Jan 26 06:25:06 2005
From: palmeida at igc.gulbenkian.pt (Paulo Almeida)
Date: Wed Jan 26 06:24:20 2005
Subject: [Bioperl-l] Get tag value into a variable
Message-ID: <20050126112506.GD6071@bioinf.igc.gulbenkian.pt>

Hi,

This is probably a perl problem, rather than a bioperl problem, but I'm
having trouble storing a tag from a feature in a variable. I do this:

print $feat->get_tag_values($tag) , "\n" if $tag eq 'coded_by';
my $coded = $feat->get_tag_values($tag);
print $coded , "\n" if $tag eq 'coded_by';

and the output is this:

AK021294.1:<1..381
1

The first line is correct, and I suppose the second takes the value '1'
because $feat->get_tag_values($tag) was successful, but how can I put
the actual tag in a variable, for later use? (my current solution is to
print the tag to a file and then read it from there, which is less than
elegant, to say the least). I'm attaching the full code, in case someone
wants to test it.

Thanks,
Paulo

-- 
Paulo Almeida
Instituto Gulbenkian de Ciencia
Apartado 14, 2781-901, Oeiras, PORTUGAL
tel  +351 21 446 46 35
fax  +351 21 440 79 70
http://www.igc.gulbenkian.pt
From Marc.Logghe at devgen.com  Wed Jan 26 06:30:12 2005
From: Marc.Logghe at devgen.com (Marc Logghe)
Date: Wed Jan 26 06:26:22 2005
Subject: [Bioperl-l] Get tag value into a variable
Message-ID: <BEE28BF86078B6429D6C780635718E21905148@morelia.be.devgen.com>


> This is probably a perl problem, rather than a bioperl 
> problem, but I'm
> having trouble storing a tag from a feature in a variable. I do this:
> 
> print $feat->get_tag_values($tag) , "\n" if $tag eq 'coded_by';
> my $coded = $feat->get_tag_values($tag);
> print $coded , "\n" if $tag eq 'coded_by';
> 
You have to do that in list context, because there might be multiple values for the same key (e.g. multiple note tags),
so it should read (if you are only interested in the first one, or when there is only one):

my ($coded) = $feat->get_tag_values($tag);
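
The difference between the two assignments can be reproduced without BioPerl. In the minimal sketch below, tag_values() and
%tag_store are hypothetical stand-ins for $feat->get_tag_values() and a feature's tags; the sketch assumes only that the real
method returns an array, which would explain the "1" in Paulo's output.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a feature's tags: each tag maps to an array of
# values, because a tag such as 'note' can occur more than once.
my %tag_store = (
    coded_by => ['AK021294.1:<1..381'],
    note     => ['first note', 'second note'],
);

# Hypothetical stand-in for $feat->get_tag_values($tag): it returns an
# array, so the caller's context decides what that array evaluates to.
sub tag_values {
    my ($tag) = @_;
    return @{ $tag_store{$tag} || [] };
}

my $coded_scalar = tag_values('coded_by');   # scalar context: element count
my ($coded)      = tag_values('coded_by');   # list context: first value
my @notes        = tag_values('note');       # list context: all values

print "$coded_scalar\n";        # prints 1
print "$coded\n";               # prints AK021294.1:<1..381
print scalar(@notes), "\n";     # prints 2
```
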
HTH,
Marc

From sanges at biogem.it  Wed Jan 26 06:34:20 2005
From: sanges at biogem.it (Remo Sanges)
Date: Wed Jan 26 06:30:37 2005
Subject: [Bioperl-l] Restriction::Analysis strange error - please help
In-Reply-To: <d8fb9af90501251812672aba54@mail.gmail.com>
References: <d8fb9af905012509451df2406e@mail.gmail.com>
	<820FADCE-6F38-11D9-A47D-000A959E1622@salmonella.org>
	<d8fb9af90501251812672aba54@mail.gmail.com>
Message-ID: <8f683cdabefdee9bdf6630d37c008f79@biogem.it>

On Jan 26, 2005, at 3:12 AM, Garrett Sorensen wrote:

> Thanks for the suggestion Rob but still having issues.  I've boiled
> the code down to just reading in fasta sequences and digesting the
> whole sequences, as opposed to digesting a subsequence as done
> initially.  It digests a few hundred sequences without issue and then
> runs into the same error, but reports different coordinates of course.
>
> If it is only reading in a fasta sequence and digesting it how is it
> calculating wrong start/end coordinates for itself?  And the fact that
> it will work great on many sequences but then calculates wrong
> coordinates for one seems strange...  Any ideas?
>
> I've tried feeding Restriction::Analysis both a SeqIO object and a
> PrimarySeq with the same result.
>
> Here is the code:
>
> use strict;
> use warnings;
> use Bio::SeqIO;
> use Bio::PrimarySeq;
> use Bio::Restriction::Analysis;
>
> # FASTA file to digest, taken from the command line
> my $file = shift @ARGV or die "usage: $0 file.fasta\n";
>
> my $in  = Bio::SeqIO->new(-file => $file,  -format => 'fasta');
> while ( my $seq = $in->next_seq ) {
>     my $ra = Bio::Restriction::Analysis->new(-seq => $seq);
>     # fragments produced by cutting the whole sequence with NlaIII
>     my @fragments = $ra->fragments('NlaIII');
>     print join("\n\n", @fragments), "\n";
> }
> exit;
>

Garrett,

this code that you posted isn't really very helpful...
There should be a point in your code where you ask
for a ->subseq with a start bigger than the end...
The problem should come from your calculations, not
from errors in the module....

HTH

Remo
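
If the out-of-range request does come from your own windowing code, as Rob and Remo suggest, one way to guard against it is to
clamp the window to the sequence before calling subseq(). This is a minimal plain-Perl sketch: window_around and its $flank
parameter are hypothetical names, not part of BioPerl, and coordinates are assumed to be 1-based and inclusive, as in BioPerl's
subseq($start, $end).

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): return the 1-based, inclusive
# (start, end) of a window reaching $flank bases either side of $site,
# clamped to a sequence of length $len. Returns an empty list when no valid
# window remains, i.e. when start would end up greater than end.
sub window_around {
    my ($site, $flank, $len) = @_;
    my $start = $site - $flank;
    my $end   = $site + $flank;
    $start = 1    if $start < 1;      # do not run off the start of the sequence
    $end   = $len if $end > $len;     # do not run off the end of the sequence
    return if $start > $end;          # site lies entirely outside the sequence
    return ($start, $end);
}

my @near_start = window_around(500,  1000, 7000);   # (1, 1500)
my @near_end   = window_around(6800, 1000, 7000);   # (5800, 7000)
my @outside    = window_around(9000, 1000, 7000);   # () nothing to extract

# With a BioPerl sequence object one would then only call
#   $seq->subseq(@window) if @window;
```
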

> On Tue, 25 Jan 2005 17:20:47 -0800, Rob Edwards <rob@salmonella.org> 
> wrote:
>> It is hard to locate the exact error without more information. The
>> error is caused because at some point you are trying to get a sequence
>> that starts at position 2002, but the sequence is only 2001 nt long
>> (hence the error that 2002 must be < 2001). I would suggest that the
>> error is at some point where you are taking the 2kb fragment around 
>> the
>> site. The most obvious thing to start with is what are the start/end
>> coordinates that are called immediately before the error, and do they
>> make sense given the length of the sequence?
>>
>> Rob
>>
>> On Jan 25, 2005, at 9:45 AM, Garrett Sorensen wrote:
>>
>>> Hello,  I'm new to the mailing list.  Thanks in advance for any help.
>>>
>>> I'm really stumped by the following error when running restriction
>>> analysis on large numbers of seq objects.  This only occurs sometimes
>>> when dealing with large numbers of sequences.
>>>
>>>
>>> ------------- EXCEPTION  -------------
>>> MSG: Bad start,end parameters. Start [2002] has to be less than end
>>> [2001]
>>> STACK Bio::PrimarySeq::subseq
>>> /home/garrett/Perl/lib/perl5/site_perl/5.8.5/Bio/PrimarySeq.pm:362
>>> STACK Bio::Seq::subseq
>>> /home/garrett/Perl/lib/perl5/site_perl/5.8.5/Bio/Seq.pm:636
>>> STACK Bio::Restriction::Analysis::fragment_maps
>>> /home/garrett/Perl/lib/perl5/site_perl/5.8.5/Bio/Restriction/
>>> Analysis.pm:552
>>> STACK toplevel Restriction_analyser_multi_CpG_a.pl:182
>>>
>>>
> >>> In case it helps, here is what my program is doing:
> >>> -Reads in multiple fasta sequences (~7kb average size) one at a time
>>> and creates a SeqIO object for each.
>>> -Restriction sites for a particular enzyme are determined for each
>>> SeqIO object and then a fragment 2kb in size is created around that
>>> site.
>>> -A new Seq object is created using the above fragment using
>>> "$upStreamSeqobj = Bio::Seq->new (-seq => $upStreamSeq);"
>>> -This new Seq object is fed into Restriction Analysis to generate
>>> fragments for another enzyme.
>>>
>>> Thanks so much for any help, best regards,
>>> Garrett

From nathanhaigh at ukonline.co.uk  Wed Jan 26 06:47:38 2005
From: nathanhaigh at ukonline.co.uk (Nathan Haigh)
Date: Wed Jan 26 06:43:34 2005
Subject: [Bioperl-l] Bioperl CVS release differences
In-Reply-To: <7379A361-6F15-11D9-90EA-000393C44276@duke.edu>
Message-ID: <!~!UENERkVCMDkAAQACAAAAAAAAAAAAAAAAABgAAAAAAAAA+EsXQZcrCEGeBpZF7/IE7sKAAAAQAAAAG2g9xB8WKkWVudfjRpcvVQEAAAAA@ukonline.co.uk>

Thanks Jason

With your help and some internet searching, I found that the following works brilliantly, and I thought I'd share it in case it
is of some use to others!

Login to CVS:
$ cvs -d :pserver:cvs@cvs.open-bio.org:/home/repository/bioperl login
PASSWD: cvs

Checkout Bioperl-1.5 into the local directory bioperl-1.5.0:
$ cvs -d :pserver:cvs@cvs.open-bio.org:/home/repository/bioperl co -r bioperl-release-1-5-0 -d bioperl-1.5.0 bioperl-live
$ cd bioperl-1.5.0

Get the list of changed/new files in bioperl-1.5.0 compared with release 1.4:
$ cvs -q diff --brief -N -r bioperl-release-1-4-0 | grep "^RCS"

However, this doesn't say whether a file was modified, added or removed. Something like the following works on WinXP:
$ cvs -q diff --brief -r bioperl-release-1-4-0 2>&1 | grep "^\(RCS\|cvs diff\)" | sort > files.log

    Example line from files.log of a modified file between 1.4 and 1.5 releases:
    RCS file: /home/repository/bioperl/bioperl-live/Bio/Species.pm,v

    Example line from files.log of a new file added since the 1.4 release:
    cvs diff: tag bioperl-release-1-4-0 is not in file Bio/FeatureIO.pm

    Example line from files.log of a file that was deleted since 1.4 release:
    cvs diff: doc/howto/html/e-novative.css no longer exists, no comparison available
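
Since files.log mixes the three kinds of line shown above, a short Perl filter can sort them into modified, added and removed
files. This is a minimal sketch that assumes the cvs messages look exactly like the example lines (the wording may differ between
cvs versions); classify_cvs_line and the script name classify_files.pl are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: classify one line of files.log by matching the three
# message shapes shown in the examples. Anything else is reported as 'other'.
sub classify_cvs_line {
    my ($line) = @_;
    # A diff was produced: the file exists in both releases but changed.
    return ('modified', $1) if $line =~ /^RCS file: (\S+),v$/;
    # The 1.4 tag is missing from the file: it was added after 1.4.
    return ('added', $1)
        if $line =~ /^cvs diff: tag \S+ is not in file (.+)$/;
    # The file no longer exists: it was removed after 1.4.
    return ('removed', $1)
        if $line =~ /^cvs diff: (.+) no longer exists, no comparison available$/;
    return ('other', $line);
}

# Usage (hypothetical script name): perl classify_files.pl files.log
if (@ARGV) {
    my %by_kind;
    while (my $line = <>) {
        $line =~ s/\r?\n\z//;    # strip Unix or Windows line endings
        my ($kind, $path) = classify_cvs_line($line);
        push @{ $by_kind{$kind} }, $path;
    }
    for my $kind (sort keys %by_kind) {
        print "$kind\t$_\n" for @{ $by_kind{$kind} };
    }
}
```
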


Nathan


> -----Original Message-----
> From: Jason Stajich [mailto:jason.stajich@duke.edu]
> Sent: 25 January 2005 21:10
> To: Nathan Spencer Haigh
> Cc: 'Bioperl'
> Subject: Re: [Bioperl-l] Bioperl CVS release differences
> 
> $ cvs -dYADDAYADDA co -r bioperl-release-1-5-0 -d bioperl-1.5.0
> bioperl-live
> $ cd bioperl-1.5.0
> 
>   -- see what is different from the 1.4 branch (this includes bug fixes
> that were made on that branch when we thought we were going to release
> a 1.4.1)
> $ cvs diff -r branch-1.4
> 
>   -- see what is different since the 1.4.0 release
> $ cvs diff -r bioperl-release-1-4-0
> 
> The FeatureIO and SeqFeature::Annotated modules are a BIG difference from
> 1.4 and may not necessarily be part of the stable 1.6.0, depending on the
> backwards compatibility and different views on how to develop.
> 
> -jason
> 
> On Jan 25, 2005, at 7:42 AM, Nathan Spencer Haigh wrote:
> 
> > I was wondering if it is at all possible to do the following with cvs:
> > I would like to obtain a copy of all the files that are new/changed in
> > bioperl-1.5 compared to the 1.4 release. The reason I want to do this
> > is that I'd like to package up a perl program with (some of) these
> > files so I only need to request that bioperl-1.4 be installed on the
> > client's computer.
> >
> > Thanks
> > Nathan
> >
> --
> Jason Stajich
> jason.stajich at duke.edu
> http://www.duke.edu/~jes12/


From palmeida at igc.gulbenkian.pt  Wed Jan 26 07:51:56 2005
From: palmeida at igc.gulbenkian.pt (Paulo Almeida)
Date: Wed Jan 26 07:46:53 2005
Subject: [Bioperl-l] Get tag value into a variable
In-Reply-To: <BEE28BF86078B6429D6C780635718E21905148@morelia.be.devgen.com>
References: <BEE28BF86078B6429D6C780635718E21905148@morelia.be.devgen.com>
Message-ID: <20050126125156.GF6071@bioinf.igc.gulbenkian.pt>

Thanks! That did it.

-Paulo
From garrettsorensen at gmail.com  Wed Jan 26 09:57:51 2005
From: garrettsorensen at gmail.com (Garrett Sorensen)
Date: Wed Jan 26 09:53:56 2005
Subject: [Bioperl-l] Restriction::Analysis strange error - please help
In-Reply-To: <8f683cdabefdee9bdf6630d37c008f79@biogem.it>
References: <d8fb9af905012509451df2406e@mail.gmail.com>
	<820FADCE-6F38-11D9-A47D-000A959E1622@salmonella.org>
	<d8fb9af90501251812672aba54@mail.gmail.com>
	<8f683cdabefdee9bdf6630d37c008f79@biogem.it>
Message-ID: <d8fb9af905012606575771aafc@mail.gmail.com>

Thanks Remo. To test this module, that is the only code I'm using
right now. I'm no longer grabbing a subsequence, so it can't be a
calculation error. All I'm trying to do is read in sequences from a
fasta file and digest them. It runs fine for a few hundred sequences,
generating fragments as it should, then out of nowhere it runs into
the same error, but with different coordinates.

Is it possible that this module isn't working properly for me?

------------- EXCEPTION  -------------
MSG: Bad start,end parameters. Start [2002] has to be less than end
[2001]
STACK Bio::PrimarySeq::subseq
/home/garrett/Perl/lib/perl5/site_perl/5.8.5/Bio/PrimarySeq.pm:362
STACK Bio::Seq::subseq
/home/garrett/Perl/lib/perl5/site_perl/5.8.5/Bio/Seq.pm:636
STACK Bio::Restriction::Analysis::fragment_maps
/home/garrett/Perl/lib/perl5/site_perl/5.8.5/Bio/Restriction/Analysis.pm:552
STACK toplevel Restriction_analyser_multi_CpG_a.pl:182


From nathanhaigh at ukonline.co.uk  Wed Jan 26 10:51:43 2005
From: nathanhaigh at ukonline.co.uk (Nathan Haigh)
Date: Wed Jan 26 10:47:45 2005
Subject: [Bioperl-l] bioperl development
Message-ID: <!~!UENERkVCMDkAAQACAAAAAAAAAAAAAAAAABgAAAAAAAAA+EsXQZcrCEGeBpZF7/IE7sKAAAAQAAAAlg9Ku2LQc0CDIrtCu3/5LQEAAAAA@ukonline.co.uk>

I was wondering who plans the development of bioperl, and how it is organised.

The reason I ask is that I've recently opened a project at sourceforge.net and was surprised by the number of tools that are
available for organising project development. For example, you can organise project tasks, and there is CVS support, mailing lists,
discussion forums (public and private), a tracker system for bugs, support requests, patches and feature requests, and web space. It
seems to me that some of these features could benefit bioperl. I'm not sure about the setup of the bioperl servers and websites, but
would it be possible to implement some of these development tools for bioperl?

Nathan



From hollandr at gis.a-star.edu.sg  Wed Jan 26 02:54:34 2005
From: hollandr at gis.a-star.edu.sg (Richard HOLLAND)
Date: Wed Jan 26 11:18:56 2005
Subject: [Bioperl-l] PatternHunter parsing
Message-ID: <6D9E9B9DF347EF4385F6271C64FB8D56015B2E28@BIONIC.biopolis.one-north.com>

Are there any BioJava or BioPerl modules for parsing PatternHunter output? It's very similar to Blast output, so if there isn't one already, would other people be interested in using one if I wrote one?

cheers,
Richard

Richard Holland
Bioinformatics Specialist
GIS extension 8199   
 
---------------------------------------------
This email is confidential and may be privileged. If you are not the intended recipient, please delete it and notify us immediately. Please do not copy or use it for any purpose, or disclose its content to any other person. Thank you.
---------------------------------------------


From palmeida at igc.gulbenkian.pt  Wed Jan 26 06:26:15 2005
From: palmeida at igc.gulbenkian.pt (Paulo Almeida)
Date: Wed Jan 26 11:19:10 2005
Subject: [Bioperl-l] Get feature tag - forgotten attachment
Message-ID: <20050126112615.GE6071@bioinf.igc.gulbenkian.pt>


-- 
Paulo Almeida
Instituto Gulbenkian de Ciencia
Apartado 14, 2781-901, Oeiras, PORTUGAL
tel  +351 21 446 46 35
fax  +351 21 440 79 70
http://www.igc.gulbenkian.pt
-------------- next part --------------
A non-text attachment was scrubbed...
Name: testget.pl
Type: text/x-perl
Size: 395 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050126/2335d3d6/testget.bin
From ak at ebi.ac.uk  Wed Jan 26 11:32:57 2005
From: ak at ebi.ac.uk (Andreas Kahari)
Date: Wed Jan 26 11:30:24 2005
Subject: [Bioperl-l] bioperl development
In-Reply-To: <!~!UENERkVCMDkAAQACAAAAAAAAAAAAAAAAABgAAAAAAAAA+EsXQZcrCEGeBpZF7/IE7sKAAAAQAAAAlg9Ku2LQc0CDIrtCu3/5LQEAAAAA@ukonline.co.uk>
References: <!~!UENERkVCMDkAAQACAAAAAAAAAAAAAAAAABgAAAAAAAAA+EsXQZcrCEGeBpZF7/IE7sKAAAAQAAAAlg9Ku2LQc0CDIrtCu3/5LQEAAAAA@ukonline.co.uk>
Message-ID: <20050126163257.GB2193@ebi.ac.uk>

Looking at the bioperl.org site I can see references to CVS,
Bugzilla, mailing lists, FAQ & HOWTOs and a lot of other things.
Do you think a move into sourceforge, away from the open-bio
foundation resources (which also host biopython and biojava
etc.), would be worth it and be beneficial to the development of
the project?

Sorry, but I don't think so.


Andreas

On Wed, Jan 26, 2005 at 03:51:43PM -0000, Nathan Haigh wrote:
> I was wondering who plans the development of bioperl and how is this organised?
> 
>  
> 
> The reason I ask, is that I've recently opened a project at sourceforge.net and was surprised by the amount of tools that are
> available for organising project development. For example you are able to organise project tasks, has CVS support, mailing lists,
> discussion forum (public and private), tracker system for bugs, support requests, patches and feature requests and web space. It
> seems to me that some of these features could benefit bioperl. I'm not sure about the setup of bioperl servers and websites, but
> would it be possible to implement some of these development tools for bioperl?

-- 
Andreas Kähäri
EMBL-EBI/ensembl

1024D/C2E163CB
From nathanhaigh at ukonline.co.uk  Wed Jan 26 11:54:47 2005
From: nathanhaigh at ukonline.co.uk (Nathan Haigh)
Date: Wed Jan 26 11:50:49 2005
Subject: [Bioperl-l] bioperl development
In-Reply-To: <20050126163257.GB2193@ebi.ac.uk>
Message-ID: <!~!UENERkVCMDkAAQACAAAAAAAAAAAAAAAAABgAAAAAAAAA+EsXQZcrCEGeBpZF7/IE7sKAAAAQAAAAnReVWRFSMk6W+1SLQWMNKgEAAAAA@ukonline.co.uk>

Sorry, I wasn't suggesting any sort of move, I was thinking more of just implementing similar tools that would allow developers to
better organise and delegate tasks to other developers/users. This way, people who would like to help could view a list of tasks
that they would like to take part in.

I just thought that SourceForge's way of supporting open-source development, by organising and coordinating effort
among project admins, developers and users, was a good example, and that certain aspects could be useful for coordinating efforts for bioperl.

Nathan


> -----Original Message-----
> From: Andreas Kahari [mailto:ak@ebi.ac.uk]
> Sent: 26 January 2005 16:33
> To: Nathan Haigh
> Cc: 'Bioperl'
> Subject: Re: [Bioperl-l] bioperl development
> 
> Looking at the bioperl.org site I can see references to CVS,
> Bugzilla, mailing lists, FAQ & HOWTOs and a lot of other things.
> Do you think a move into sourceforge, away from the open-bio
> foundation resources (which also hosts biopython and biojava
> etc.), would be worth it and be beneficial to the development of
> the project?
> 
> Sorry, but I don't think so.
> 
> 
> Andreas
> 
> On Wed, Jan 26, 2005 at 03:51:43PM -0000, Nathan Haigh wrote:
> > I was wondering who plans the development of bioperl and how is this organised?
> >
> >
> >
> > The reason I ask, is that I've recently opened a project at sourceforge.net and was surprised by the amount of tools that are
> > available for organising project development. For example you are able to organise project tasks, has CVS support, mailing lists,
> > discussion forum (public and private), tracker system for bugs, support requests, patches and feature requests and web space. It
> > seems to me that some of these features could benefit bioperl. I'm not sure about the setup of bioperl servers and websites, but
> > would it be possible to implement some of these development tools for bioperl?
> 
> --
> Andreas Kähäri
> EMBL-EBI/ensembl
> 
> 1024D/C2E163CB


From jason.stajich at duke.edu  Wed Jan 26 12:56:11 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Wed Jan 26 12:52:14 2005
Subject: [Bioperl-l] bioperl development
In-Reply-To: <!~!UENERkVCMDkAAQACAAAAAAAAAAAAAAAAABgAAAAAAAAA+EsXQZcrCEGeBpZF7/IE7sKAAAAQAAAAnReVWRFSMk6W+1SLQWMNKgEAAAAA@ukonline.co.uk>
References: <!~!UENERkVCMDkAAQACAAAAAAAAAAAAAAAAABgAAAAAAAAA+EsXQZcrCEGeBpZF7/IE7sKAAAAQAAAAnReVWRFSMk6W+1SLQWMNKgEAAAAA@ukonline.co.uk>
Message-ID: <9053D49D-6FC3-11D9-BF87-000393C44276@duke.edu>


On Jan 26, 2005, at 11:54 AM, Nathan Haigh wrote:

> Sorry, I wasn't suggesting any sort of move, I was thinking more of 
> just implementing similar tools that would allow developers to
> better organise and delegate tasks to other developers/users. This 
> way, people who would like to help could view a list of tasks
> that they would like to take part in.
>
Sure - even doing something similar to the Mozilla site, with their 
first-bugs page http://www.mozilla.org/contribute/hacking/first-bugs/ and 
http://www.mozilla.org/developer/, would be good.  If we can use a 
content management system that makes updating these pages really 
easy, then it will actually be used (and therefore be useful).

If we could make content management and RSS feeds easy to update 
and edit, that might make sense, for instance by moving the majority of 
the web site over to something like Movable Type.  That is in fact what 
the biopython.org site now uses, and how the news.open-bio.org site 
is run.

We used to have a wikiweb setup for bioperl, but it was really buggy and 
just didn't get used that much.  A new wiki would be nice, I expect.  
The hard part is always having TOO many places for documentation and 
keeping it all organized.  We already have a hard enough time keeping 
the modules organized and documented.

> I just thought that the example of sourceforge towards the development 
> of open-source software by organising and coordinating effort
> from project admin/developers/users was a good one, and that certain 
> aspects could be useful for coordinating efforts for bioperl.
>

These are good thoughts - as always, it takes some energy and time to put 
a new system in place.  We really welcome anyone trying to make this 
a better system.  At some level it is hard for the core developers to 
be project managers, developers, and system administrators, so any 
help is much appreciated.

> Nathan
>
>
>> -----Original Message-----
>> From: Andreas Kahari [mailto:ak@ebi.ac.uk]
>> Sent: 26 January 2005 16:33
>> To: Nathan Haigh
>> Cc: 'Bioperl'
>> Subject: Re: [Bioperl-l] bioperl development
>>
>> Looking at the bioperl.org site I can see references to CVS,
>> Bugzilla, mailing lists, FAQ & HOWTOs and a lot of other things.
>> Do you think a move into sourceforge, away from the open-bio
>> foundation resources (which also hosts biopython and biojava
>> etc.), would be worth it and be beneficial to the development of
>> the project?
>>
>> Sorry, but I don't think so.
>>
>>
>> Andreas
>>
>> On Wed, Jan 26, 2005 at 03:51:43PM -0000, Nathan Haigh wrote:
>>> I was wondering who plans the development of bioperl and how is this 
>>> organised?
>>>
>>>
>>>
>>> The reason I ask, is that I've recently opened a project at 
>>> sourceforge.net and was surprised by the amount of tools that are
>>> available for organising project development. For example you are 
>>> able to organise project tasks, has CVS support, mailing lists,
>>> discussion forum (public and private), tracker system for bugs, 
>>> support requests, patches and feature requests and web space. It
>>> seems to me that some of these features could benefit bioperl. I'm 
>>> not sure about the setup of bioperl servers and websites, but
>>> would it be possible to implement some of these development tools 
>>> for bioperl?
>>
>> --
>> Andreas Kähäri
>> EMBL-EBI/ensembl
>>
>> 1024D/C2E163CB
>
>
>
>
>
>
>
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/


From sanges at biogem.it  Wed Jan 26 13:05:52 2005
From: sanges at biogem.it (Remo Sanges)
Date: Wed Jan 26 13:02:04 2005
Subject: [Bioperl-l] Restriction::Analysis strange error - please help
In-Reply-To: <d8fb9af905012606575771aafc@mail.gmail.com>
References: <d8fb9af905012509451df2406e@mail.gmail.com>
	<820FADCE-6F38-11D9-A47D-000A959E1622@salmonella.org>
	<d8fb9af90501251812672aba54@mail.gmail.com>
	<8f683cdabefdee9bdf6630d37c008f79@biogem.it>
	<d8fb9af905012606575771aafc@mail.gmail.com>
Message-ID: <145446a61b72364ba730f2f89b075d99@biogem.it>


On Jan 26, 2005, at 3:57 PM, Garrett Sorensen wrote:

> Thanks Remo..   To test this module that is the only code I'm using
> right now...  I'm no longer grabbing a subsequence so it can't be
> calculation error.  To test all I'm trying to do is read in sequences
> from a fasta file and digest them.  It runs fine for a few hundred
> sequences generating fragments as it should, then out of nowhere it
> will run into the same error, but with different coordinates.
>
> Possibly this module isn't working properly for me?
>
>  ------------- EXCEPTION  -------------
>  MSG: Bad start,end parameters. Start [2002] has to be less than end [2001]
>  STACK Bio::PrimarySeq::subseq /home/garrett/Perl/lib/perl5/site_perl/5.8.5/Bio/PrimarySeq.pm:362
>  STACK Bio::Seq::subseq /home/garrett/Perl/lib/perl5/site_perl/5.8.5/Bio/Seq.pm:636
>  STACK Bio::Restriction::Analysis::fragment_maps /home/garrett/Perl/lib/perl5/site_perl/5.8.5/Bio/Restriction/Analysis.pm:552
>  STACK toplevel Restriction_analyser_multi_CpG_a.pl:182

OK,

it seems to be a bug; you should submit it to
http://bugzilla.bioperl.org/enter_bug.cgi?product=Bioperl

Basically, it happens when you have a site for a blunt-end cutter
at the very end of your sequence.

I think you are not interested in that site, because
it is a cut at the end of a non-circular sequence that
doesn't produce any fragments.

If so, you can simply change line 552 in your Analysis.pm
module from this:

$seq{$start}=$self->{'_seq'}->subseq($start, $stop);

to this:

$seq{$start}=$self->{'_seq'}->subseq($start, $stop) unless $start > $stop;

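To make the failure mode concrete, here is a minimal, self-contained Perl sketch. It does not reproduce the real fragment_maps() code in Bio::Restriction::Analysis; fragment_ranges() is a hypothetical helper that assumes 1-based cut positions where each cut falls after the given base. It shows how a cut after the last base of a linear 2001 bp sequence produces the empty range (start 2002, end 2001) seen in the exception above, and how an "unless $start > $stop" guard skips it.

```perl
use strict;
use warnings;

# Hypothetical helper, not the actual Bio::Restriction::Analysis code.
# Given a linear sequence length and sorted 1-based cut positions (each
# cut falls after the given base), return [start, end] pairs for the
# fragments, skipping the empty range left by a cut after the last base.
sub fragment_ranges {
    my ($seq_len, @cuts) = @_;
    my @ranges;
    my $start = 1;
    # Treat the end of the sequence as a final boundary.
    for my $stop (@cuts, $seq_len) {
        # A blunt cut after the final base yields start = seq_len + 1
        # and stop = seq_len; subseq() would throw on that range, so
        # the same guard as in the patch skips it.
        push @ranges, [ $start, $stop ] unless $start > $stop;
        $start = $stop + 1;
    }
    return @ranges;
}

# A 2001 bp sequence cut after base 1000 and after base 2001.
for my $r ( fragment_ranges( 2001, 1000, 2001 ) ) {
    print "$r->[0]-$r->[1]\n";    # prints 1-1000, then 1001-2001
}
```

The guard only drops the empty trailing range; every non-empty fragment is still returned.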

HTH

Remo

From smarkel at scitegic.com  Wed Jan 26 13:50:35 2005
From: smarkel at scitegic.com (Scott Markel)
Date: Wed Jan 26 13:48:24 2005
Subject: [Bioperl-l] Bio::SearchIO::blast parsing problem with long hit
	scores
Message-ID: <41F7E67B.9090804@scitegic.com>

I have a blastn result with a hit score that's in exponential
notation.  Bio::SearchIO::blast truncates "2.741e+004" to "004".
I get the same result in 1.4 and 1.5.

Sequences producing significant alignments:                      (bits) Value

emb|AJ010957.1|HAAJ10957 Hippopotamus amphibius complete mitocho...  2.741e+004   0.0
gb|U31048.1|PRU31048 Pronolagus rupestris, Donkerpoort, South Af...   305   1e-080
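
A minimal sketch of one way to read these summary lines so that a bit score in exponential notation survives intact. It is hypothetical: parse_summary_line() and its regex are illustrations only, not the fix mentioned later in this thread and not the actual Bio::SearchIO::blast code. It assumes the bit score and E-value are always the last two whitespace-separated fields of the line.

```perl
use strict;
use warnings;

# Hypothetical sketch, not the Bio::SearchIO::blast implementation.
# A score token: integer or decimal, with an optional exponent such
# as e+004 (the form that reportedly got truncated to "004").
my $score_re = qr/\d+(?:\.\d+)?(?:[eE][+-]?\d+)?/;

# Split a "Sequences producing significant alignments" line into
# (hit id, bit score, E-value): the id is the first field, and the
# score and E-value are the last two whitespace-separated fields.
sub parse_summary_line {
    my ($line) = @_;
    return unless $line =~ /^(\S+)\s+(?:.*\s)?($score_re)\s+(\S+)\s*$/;
    return ( $1, $2, $3 );
}

my @lines = (
    'emb|AJ010957.1|HAAJ10957 Hippopotamus amphibius complete mitocho...  2.741e+004   0.0',
    'gb|U31048.1|PRU31048 Pronolagus rupestris, Donkerpoort, South Af...   305   1e-080',
);
for my $line (@lines) {
    my ( $id, $bits, $evalue ) = parse_summary_line($line) or next;
    # Adding 0 turns the exponential string into a plain number (27410).
    printf "%s\t%g\t%s\n", $id, $bits + 0, $evalue;
}
```

Anchoring on the last two fields avoids guessing at the description's width, which varies from report to report.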

Scott

-- 
Scott Markel, Ph.D.
Principal Bioinformatics Architect  email:  smarkel@scitegic.com
SciTegic Inc.                       mobile: +1 858 205 3653
9665 Chesapeake Drive, Suite 401    voice:  +1 858 279 8800, ext. 253
San Diego, CA 92123                 fax:    +1 858 279 8804
USA                                 web:    http://www.scitegic.com


From palmeida at igc.gulbenkian.pt  Wed Jan 26 14:23:11 2005
From: palmeida at igc.gulbenkian.pt (Paulo Almeida)
Date: Wed Jan 26 14:20:59 2005
Subject: [Bioperl-l] bioperl development
In-Reply-To: <9053D49D-6FC3-11D9-BF87-000393C44276@duke.edu>
References: <!~!UENERkVCMDkAAQACAAAAAAAAAAAAAAAAABgAAAAAAAAA+EsXQZcrCEGeBpZF7/IE7sKAAAAQAAAAnReVWRFSMk6W+1SLQWMNKgEAAAAA@ukonline.co.uk>
	<9053D49D-6FC3-11D9-BF87-000393C44276@duke.edu>
Message-ID: <20050126192311.GA13534@bioinf.igc.gulbenkian.pt>

You might want to consider GForge (http://gforge.org), which was created
as a fork of the SourceForge code when that code ceased to be Open Source.
That could bring SourceForge's features to the existing infrastructure,
giving you the best of both worlds. I have never used it, but I wouldn't mind
finding out how it works, if you would be interested.

-Paulo

On Wed, Jan 26, 2005 at 12:56:11PM -0500, Jason Stajich wrote:
> 
> If we could get content-management and RSS feeds to be easy to update 
> and edit that might make sense.  If we moved a majority of the web site 
> over to something like moveable-type.  This is what in fact the 
> biopython.org site is now done with and how the news.open-bio.org site 
> is run.
> 
> These are good thoughts - as always it take some energy and time to put 
> into place a new system.  We really welcome anyone trying to make this 
> a better system.  At some level it is hard for the core developers to 
> be project managers, developers, and system administrators.  So any 
> help is really much appreciated.
From allenday at ucla.edu  Wed Jan 26 14:51:38 2005
From: allenday at ucla.edu (Allen Day)
Date: Wed Jan 26 14:47:29 2005
Subject: [Bioperl-l] bioperl development
In-Reply-To: <20050126192311.GA13534@bioinf.igc.gulbenkian.pt>
References: <!~!UENERkVCMDkAAQACAAAAAAAAAAAAAAAAABgAAAAAAAAA+EsXQZcrCEGeBpZF7/IE7sKAAAAQAAAAnReVWRFSMk6W+1SLQWMNKgEAAAAA@ukonline.co.uk>
	<9053D49D-6FC3-11D9-BF87-000393C44276@duke.edu>
	<20050126192311.GA13534@bioinf.igc.gulbenkian.pt>
Message-ID: <Pine.LNX.4.58.0501261150450.6354@sumo.ctrl.ucla.edu>

I really like having projects on SourceForge for all the reasons mentioned 
in this thread.  I'd try to use a SourceForge or GForge site if one were 
available.

-allen


On Wed, 26 Jan 2005, Paulo Almeida wrote:

> You might want to consider gforge (http://gforge.org), which was created
> as a branch of the Sourceforge code, when that ceased to be Open Source.
> That could bring SourceForge's features to the existing infrastructure,
> giving you the best of both worlds. I never used it, but I wouldn't mind
> finding out how it works, if you would be interested.
> 
> -Paulo
> 
> On Wed, Jan 26, 2005 at 12:56:11PM -0500, Jason Stajich wrote:
> > 
> > If we could get content-management and RSS feeds to be easy to update 
> > and edit that might make sense.  If we moved a majority of the web site 
> > over to something like moveable-type.  This is what in fact the 
> > biopython.org site is now done with and how the news.open-bio.org site 
> > is run.
> > 
> > These are good thoughts - as always it take some energy and time to put 
> > into place a new system.  We really welcome anyone trying to make this 
> > a better system.  At some level it is hard for the core developers to 
> > be project managers, developers, and system administrators.  So any 
> > help is really much appreciated.
> 
From hlapp at gnf.org  Wed Jan 26 15:10:34 2005
From: hlapp at gnf.org (Hilmar Lapp)
Date: Wed Jan 26 15:09:28 2005
Subject: [Bioperl-l] Re: RPMs for bioperl
In-Reply-To: <Pine.LNX.4.58.0501252214470.3069@sumo.ctrl.ucla.edu>
References: <Pine.LNX.4.58.0501252214470.3069@sumo.ctrl.ucla.edu>
Message-ID: <56220C54-6FD6-11D9-9E4D-000A95AE92B0@gnf.org>


On Jan 25, 2005, at 10:27 PM, Allen Day wrote:

> Hilmar, do you know about:
>
> * Bio::DB::BioDB
> * Bio::DB::Query::BioQuery

These come with (are modules in) bioperl-db. If you have bioperl-db the 
dependency should be satisfied.

	-hilmar
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From jason.stajich at duke.edu  Wed Jan 26 15:54:19 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Wed Jan 26 15:50:14 2005
Subject: [Bioperl-l] Re: [GMOD-devel] RPMs for bioperl
In-Reply-To: <Pine.LNX.4.58.0501252214470.3069@sumo.ctrl.ucla.edu>
References: <Pine.LNX.4.58.0501252214470.3069@sumo.ctrl.ucla.edu>
Message-ID: <84383400f40db30e628067a6867d4c75@duke.edu>

srsperl is only for people with SRS; you should have it ignored.

If you are using cpan2rpm, I would tell it to ignore certain 
dependencies. When I built RPMs for our internal machines I just had it 
ignore the non-essential ones like Ace, etc.

-jason
On Jan 26, 2005, at 1:27 AM, Allen Day wrote:

> Hi,
>
> I've put together a set of RPMs for Bioperl, Bioperl-DB, Bioperl-Run, and
> GBrowse.  It's still a work in progress, but you can see the current state
> here:
>   http://sumo.genetics.ucla.edu/~allenday/flute/bioperl-1.5/i686/
> There are some related directories rooted here:
>   http://sumo.genetics.ucla.edu/~allenday/flute/
>
> The RPMs don't install clean.  This is because I'm using an automated tool
> to build the RPMs, and it looks through each downloaded tarball from CPAN
> to see what that tarball depends on.  Sometimes there are dependencies on
> libraries that don't exist on CPAN, or might be altogether non-existent.
> These are the problem libraries and binaries:
>
> % rpm -Uvh --test *.rpm
> error: Failed dependencies:
>         perl(Ace::Browser::LocalSiteDefs) is needed by perl-AcePerl-1.87-allenday
>         perl(Bio::Das::ProServer::SourceHydra) is needed by perl-Bio-Das-0.99-allenday
>         perl(IndexSupport) is needed by perl-Bio-Das-0.99-allenday
>         perl(srsperl) is needed by perl-bioperl-1.5.0-allenday
>         perl(Bio::DB::BioDB) is needed by perl-Generic-Genome-Browser-1.62-allenday
>         perl(Bio::DB::Query::BioQuery) is needed by perl-Generic-Genome-Browser-1.62-allenday
>         perl(GuessDirectories) is needed by perl-Generic-Genome-Browser-1.62-allenday
>         perl(MOBY::Client::Central) is needed by perl-Generic-Genome-Browser-1.62-allenday
>         perl(MOBY::Client::Service) is needed by perl-Generic-Genome-Browser-1.62-allenday
>         perl(MOBY::CommonSubs) is needed by perl-Generic-Genome-Browser-1.62-allenday
>         perl(MOBY::MobyXMLConstants) is needed by perl-Generic-Genome-Browser-1.62-allenday
>         perl(PPM::Archive) is needed by perl-Generic-Genome-Browser-1.62-allenday
>         perl(MQClient::MQSeries) is needed by perl-SOAP-Lite-0.60-allenday
>         perl(MQSeries) is needed by perl-SOAP-Lite-0.60-allenday
>         perl(MQSeries::Message) is needed by perl-SOAP-Lite-0.60-allenday
>         perl(MQSeries::Queue) is needed by perl-SOAP-Lite-0.60-allenday
>         perl(MQSeries::QueueManager) is needed by perl-SOAP-Lite-0.60-allenday
>         /bin/perl is needed by perl-Tk-804.027-allenday
>         /usr/local/bin/perl is needed by perl-Tk-804.027-allenday
>         perl(Tk::LabRadio) is needed by perl-Tk-804.027-allenday
>         perl(Tk::TextReindex) is needed by perl-Tk-804.027-allenday
>         perl(XML::LibXML) >= 1.57 is needed by perl-XML-LibXSLT-1.57-allenday
>         perl(XML::SAX::PurePerl::DTDDecls) is needed by perl-XML-SAX-0.12-allenday
>         perl(XML::SAX::PurePerl::DocType) is needed by perl-XML-SAX-0.12-allenday
>         perl(XML::SAX::PurePerl::EncodingDetect) is needed by perl-XML-SAX-0.12-allenday
>         perl(XML::SAX::PurePerl::XMLDecl) is needed by perl-XML-SAX-0.12-allenday
>
> Lincoln, I'm guessing you can help me with:
>
> * Ace::Browser::LocalSiteDefs
> * Bio::Das::ProServer::SourceHydra
> * GuessDirectories
> * IndexSupport
> * MOBY::*
>
> Hilmar, do you know about:
>
> * Bio::DB::BioDB
> * Bio::DB::Query::BioQuery
>
> I'm sure someone on this list knows where to get
>
> * srsperl
> * PPM::Archive
>
> If anyone can shed light on where any of these libraries can be found, I'd
> appreciate it.  Thanks.
>
> -Allen
>
>
> _______________________________________________
> Gmod-devel mailing list
> Gmod-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/gmod-devel
>
>
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

From jason.stajich at duke.edu  Wed Jan 26 15:55:06 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Wed Jan 26 15:50:58 2005
Subject: [Bioperl-l] Bio::SearchIO::blast parsing problem with long hit
	scores
In-Reply-To: <41F7E67B.9090804@scitegic.com>
References: <41F7E67B.9090804@scitegic.com>
Message-ID: <28d24e7f297bd7a1c6aaf2a14fa7ce8c@duke.edu>

Can you put an example report with the bug report on bugzilla?  I think 
I have a fix but want to test it out on the real data.

-jason
On Jan 26, 2005, at 1:50 PM, Scott Markel wrote:

> I have a blastn result with a hit score that's in exponential
> notation.  Bio::SearchIO::blast truncates "2.741e+004" to "004".
> I get the same result in 1.4 and 1.5.
>
> Sequences producing significant alignments:                      (bits) Value
>
> emb|AJ010957.1|HAAJ10957 Hippopotamus amphibius complete mitocho...  2.741e+004   0.0
> gb|U31048.1|PRU31048 Pronolagus rupestris, Donkerpoort, South Af...   305   1e-080
>
> Scott
>
> -- 
> Scott Markel, Ph.D.
> Principal Bioinformatics Architect  email:  smarkel@scitegic.com
> SciTegic Inc.                       mobile: +1 858 205 3653
> 9665 Chesapeake Drive, Suite 401    voice:  +1 858 279 8800, ext. 253
> San Diego, CA 92123                 fax:    +1 858 279 8804
> USA                                 web:    http://www.scitegic.com
>
>
>
>
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

From garrettsorensen at gmail.com  Wed Jan 26 16:12:36 2005
From: garrettsorensen at gmail.com (Garrett Sorensen)
Date: Wed Jan 26 16:09:36 2005
Subject: [Bioperl-l] Restriction::Analysis strange error - please help
In-Reply-To: <145446a61b72364ba730f2f89b075d99@biogem.it>
References: <d8fb9af905012509451df2406e@mail.gmail.com>
	<820FADCE-6F38-11D9-A47D-000A959E1622@salmonella.org>
	<d8fb9af90501251812672aba54@mail.gmail.com>
	<8f683cdabefdee9bdf6630d37c008f79@biogem.it>
	<d8fb9af905012606575771aafc@mail.gmail.com>
	<145446a61b72364ba730f2f89b075d99@biogem.it>
Message-ID: <d8fb9af90501261312e228b90@mail.gmail.com>

Thanks so much Remo, the modification to Analysis.pm worked
beautifully.  I will submit the bug report as you suggested.

Many thanks,
Garrett


On Wed, 26 Jan 2005 19:05:52 +0100, Remo Sanges <sanges@biogem.it> wrote:
> 
> On Jan 26, 2005, at 3:57 PM, Garrett Sorensen wrote:
> 
> > Thanks Remo..   To test this module that is the only code I'm using
> > right now...  I'm no longer grabbing a subsequence so it can't be
> > calculation error.  To test all I'm trying to do is read in sequences
> > from a fasta file and digest them.  It runs fine for a few hundred
> > sequences generating fragments as it should, then out of nowhere it
> > will run into the same error, but with different coordinates.
> >
> > Possibly this module isn't working properly for me?
> >
> >  ------------- EXCEPTION  -------------
> >  MSG: Bad start,end parameters. Start [2002] has to be less than end [2001]
> >  STACK Bio::PrimarySeq::subseq /home/garrett/Perl/lib/perl5/site_perl/5.8.5/Bio/PrimarySeq.pm:362
> >  STACK Bio::Seq::subseq /home/garrett/Perl/lib/perl5/site_perl/5.8.5/Bio/Seq.pm:636
> >  STACK Bio::Restriction::Analysis::fragment_maps /home/garrett/Perl/lib/perl5/site_perl/5.8.5/Bio/Restriction/Analysis.pm:552
> >  STACK toplevel Restriction_analyser_multi_CpG_a.pl:182
> 
> OK,
> 
> it seems to be a bug, you should submit it to
> http://bugzilla.bioperl.org/enter_bug.cgi?product=Bioperl
> 
> Basically it happens when you have a site for a blunt-end cutter
> at the end of your sequence.
> 
> I think you are not interested in that site because
> is a cut at the end of a non-circular sequence that
> don' t produce fragments....
> 
> If so you can simply change line 552 in your Analysis.pm
> module from this:
> 
> $seq{$start}=$self->{'_seq'}->subseq($start, $stop);
> 
> to this:
> 
> $seq{$start}=$self->{'_seq'}->subseq($start, $stop) unless $start > $stop;
> 
> HTH
> 
> Remo
> 
>
From allenday at ucla.edu  Wed Jan 26 18:01:28 2005
From: allenday at ucla.edu (Allen Day)
Date: Wed Jan 26 17:57:22 2005
Subject: [Bioperl-l] Re: RPMs for bioperl
In-Reply-To: <56220C54-6FD6-11D9-9E4D-000A95AE92B0@gnf.org>
References: <Pine.LNX.4.58.0501252214470.3069@sumo.ctrl.ucla.edu>
	<56220C54-6FD6-11D9-9E4D-000A95AE92B0@gnf.org>
Message-ID: <Pine.LNX.4.58.0501261457360.6354@sumo.ctrl.ucla.edu>

Hilmar,

I see that these are only in the bioperl-db cvs, not in the 0.1 tarball 
here: http://www.bioperl.org/Core/Latest/index.shtml .

Can we increment the version in the bioperl-db repository to 0.2 so I can 
RPM this?

-Allen


and it did not contain either of these packages

On Wed, 26 Jan 2005, Hilmar Lapp wrote:

> 
> On Jan 25, 2005, at 10:27 PM, Allen Day wrote:
> 
> > Hilmar, do you know about:
> >
> > * Bio::DB::BioDB
> > * Bio::DB::Query::BioQuery
> 
> These come with (are modules in) bioperl-db. If you have bioperl-db the 
> dependency should be satisfied.
> 
> 	-hilmar
> 
From allenday at ucla.edu  Wed Jan 26 18:03:25 2005
From: allenday at ucla.edu (Allen Day)
Date: Wed Jan 26 17:59:18 2005
Subject: [Bioperl-l] Re: [GMOD-devel] RPMs for bioperl
In-Reply-To: <84383400f40db30e628067a6867d4c75@duke.edu>
References: <Pine.LNX.4.58.0501252214470.3069@sumo.ctrl.ucla.edu>
	<84383400f40db30e628067a6867d4c75@duke.edu>
Message-ID: <Pine.LNX.4.58.0501261501530.6354@sumo.ctrl.ucla.edu>

I can do that, but I'd rather just include all optional modules as 
prerequisites than make a custom specfile.  Any idea where I can 
find srsperl?  A Google search didn't turn anything up.

-allen

On Wed, 26 Jan 2005, Jason Stajich wrote:

> srsperl is only for people with srs you should have it be ignored.
> 
> if you are using cpan2rpm I would tell it to ignore certain 
> dependancies. When I built rpms for our internal machines I just had it 
> ignore the non-essential ones like Ace, etc.
> 
> -jason
> On Jan 26, 2005, at 1:27 AM, Allen Day wrote:
> 
> > Hi,
> >
> > I've put together a set of RPMs for Bioperl, Bioperl-DB, Bioperl-Run, and
> > GBrowse.  It's still a work in progress, but you can see the current state
> > here:
> >   http://sumo.genetics.ucla.edu/~allenday/flute/bioperl-1.5/i686/
> > There are some related directories rooted here:
> >   http://sumo.genetics.ucla.edu/~allenday/flute/
> >
> > The RPMs don't install clean.  This is because I'm using an automated tool
> > to build the RPMs, and it looks through each downloaded tarball from CPAN
> > to see what that tarball depends on.  Sometimes there are dependencies on
> > libraries that don't exist on CPAN, or might be altogether non-existent.
> > These are the problem libraries and binaries:
> >
> > % rpm -Uvh --test *.rpm
> > error: Failed dependencies:
> >         perl(Ace::Browser::LocalSiteDefs) is needed by perl-AcePerl-1.87-allenday
> >         perl(Bio::Das::ProServer::SourceHydra) is needed by perl-Bio-Das-0.99-allenday
> >         perl(IndexSupport) is needed by perl-Bio-Das-0.99-allenday
> >         perl(srsperl) is needed by perl-bioperl-1.5.0-allenday
> >         perl(Bio::DB::BioDB) is needed by perl-Generic-Genome-Browser-1.62-allenday
> >         perl(Bio::DB::Query::BioQuery) is needed by perl-Generic-Genome-Browser-1.62-allenday
> >         perl(GuessDirectories) is needed by perl-Generic-Genome-Browser-1.62-allenday
> >         perl(MOBY::Client::Central) is needed by perl-Generic-Genome-Browser-1.62-allenday
> >         perl(MOBY::Client::Service) is needed by perl-Generic-Genome-Browser-1.62-allenday
> >         perl(MOBY::CommonSubs) is needed by perl-Generic-Genome-Browser-1.62-allenday
> >         perl(MOBY::MobyXMLConstants) is needed by perl-Generic-Genome-Browser-1.62-allenday
> >         perl(PPM::Archive) is needed by perl-Generic-Genome-Browser-1.62-allenday
> >         perl(MQClient::MQSeries) is needed by perl-SOAP-Lite-0.60-allenday
> >         perl(MQSeries) is needed by perl-SOAP-Lite-0.60-allenday
> >         perl(MQSeries::Message) is needed by perl-SOAP-Lite-0.60-allenday
> >         perl(MQSeries::Queue) is needed by perl-SOAP-Lite-0.60-allenday
> >         perl(MQSeries::QueueManager) is needed by perl-SOAP-Lite-0.60-allenday
> >         /bin/perl is needed by perl-Tk-804.027-allenday
> >         /usr/local/bin/perl is needed by perl-Tk-804.027-allenday
> >         perl(Tk::LabRadio) is needed by perl-Tk-804.027-allenday
> >         perl(Tk::TextReindex) is needed by perl-Tk-804.027-allenday
> >         perl(XML::LibXML) >= 1.57 is needed by perl-XML-LibXSLT-1.57-allenday
> >         perl(XML::SAX::PurePerl::DTDDecls) is needed by perl-XML-SAX-0.12-allenday
> >         perl(XML::SAX::PurePerl::DocType) is needed by perl-XML-SAX-0.12-allenday
> >         perl(XML::SAX::PurePerl::EncodingDetect) is needed by perl-XML-SAX-0.12-allenday
> >         perl(XML::SAX::PurePerl::XMLDecl) is needed by perl-XML-SAX-0.12-allenday
> >
> > Lincoln, I'm guessing you can help me with:
> >
> > * Ace::Browser::LocalSiteDefs
> > * Bio::Das::ProServer::SourceHydra
> > * GuessDirectories
> > * IndexSupport
> > * MOBY::*
> >
> > Hilmar, do you know about:
> >
> > * Bio::DB::BioDB
> > * Bio::DB::Query::BioQuery
> >
> > I'm sure someone on this list knows where to get
> >
> > * srsperl
> > * PPM::Archive
> >
> > If anyone can shed light on where any of these libraries can be found,
> > I'd appreciate it.  Thanks.
> >
> > -Allen
> >
> >
> > _______________________________________________
> > Gmod-devel mailing list
> > Gmod-devel@lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/gmod-devel
> >
> >
> --
> Jason Stajich
> jason.stajich at duke.edu
> http://www.duke.edu/~jes12/
> 
From hlapp at gnf.org  Wed Jan 26 18:16:54 2005
From: hlapp at gnf.org (Hilmar Lapp)
Date: Wed Jan 26 18:13:01 2005
Subject: [Bioperl-l] Re: RPMs for bioperl
In-Reply-To: <Pine.LNX.4.58.0501261457360.6354@sumo.ctrl.ucla.edu>
Message-ID: <5DE987EC-6FF0-11D9-834C-000A959EB4C4@gnf.org>

The 0.1 tarball has been out of date for a long time.

Do you want me to introduce a specific tag?

	-hilmar

On Wednesday, January 26, 2005, at 03:01  PM, Allen Day wrote:

> Hilmar,
>
> I see that these are only in the bioperl-db cvs, not in the 0.1 tarball
> here: http://www.bioperl.org/Core/Latest/index.shtml .
>
> Can we increment the version in the bioperl-db repository to 0.2 so I
> can RPM this?
>
> -Allen
>
>
> and it did not contain either of these packages
>
> On Wed, 26 Jan 2005, Hilmar Lapp wrote:
>
>>
>> On Jan 25, 2005, at 10:27 PM, Allen Day wrote:
>>
>>> Hilmar, do you know about:
>>>
>>> * Bio::DB::BioDB
>>> * Bio::DB::Query::BioQuery
>>
>> These come with (are modules in) bioperl-db. If you have bioperl-db
>> the dependency should be satisfied.
>>
>> 	-hilmar
>>
>>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


From allenday at ucla.edu  Wed Jan 26 18:53:10 2005
From: allenday at ucla.edu (Allen Day)
Date: Wed Jan 26 18:49:06 2005
Subject: [Bioperl-l] RPMs for bioperl (... and GBrowse, and lsid-perl,
	and biomoby)
In-Reply-To: <200501261643.10657.lstein@cshl.edu>
References: <Pine.LNX.4.58.0501252214470.3069@sumo.ctrl.ucla.edu>
	<200501261643.10657.lstein@cshl.edu>
Message-ID: <Pine.LNX.4.58.0501261503540.6354@sumo.ctrl.ucla.edu>

On Wed, 26 Jan 2005, Lincoln Stein wrote:

> I'm glad you're doing this.
> 
> Can you simply turn off the warning messages about 
> Ace::Browser::LocalSiteDefs, Bio::Das::ProServer::SourceHydra, and 
> MOBY::*?  They are all optional and won't do anything useful without 
> a lot of extra configuration.

I could do that by creating the specfiles by hand, but I'd rather not.
Would it be difficult to just add the missing Bio::* and Ace::* modules
to their respective CPAN distributions?
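Jason suggested telling the build to ignore certain dependencies. One way to do that without hand-written specfiles is to filter the generated requirement list. The sketch below assumes the RPM build runs a find-requires step that prints one requirement per line (e.g. `perl(Foo::Bar)`), and that you can pipe that output through a filter. Older rpm 4 releases let a spec file override the `__find_requires` macro for this. The exact hook depends on the rpm version and on the packaging tool. The module list and the script name `filter-requires.pl` are hypothetical, taken from the failed-dependency report above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Optional modules that should not become hard RPM requirements.
# Illustrative list; MOBY::* is handled by the prefix test below.
my %OPTIONAL = map { $_ => 1 } qw(
    Ace::Browser::LocalSiteDefs
    Bio::Das::ProServer::SourceHydra
    IndexSupport
    srsperl
    PPM::Archive
    MQClient::MQSeries
    MQSeries
    MQSeries::Message
    MQSeries::Queue
    MQSeries::QueueManager
);

# Return true if a line such as "perl(Foo::Bar)" or "perl(Foo) >= 1.2"
# should be kept; non-perl requirements (e.g. "/bin/sh") are always kept.
sub keep_requirement {
    my ($line) = @_;
    if ($line =~ /^perl\(([\w:]+)\)/) {
        my $mod = $1;
        return 0 if $OPTIONAL{$mod} || $mod =~ /^MOBY::/;
    }
    return 1;
}

# Filter mode: read requirement lines on STDIN, print the ones to keep.
if (@ARGV && $ARGV[0] eq '--filter') {
    while (my $line = <STDIN>) {
        print $line if keep_requirement($line);
    }
}
```

You would pipe the normal find-requires output into `perl filter-requires.pl --filter`. This keeps the automated build and drops only the listed optional modules.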

==========
% grep -r 'Hydra' ./Bio-Das-0.99/*
./Bio-Das-0.99/Das/ProServer/Config.pm:use Bio::Das::ProServer::SourceHydra;
./Bio-Das-0.99/Das/ProServer/Config.pm:# build all known SourceAdaptors (including those Hydra-based)
./Bio-Das-0.99/Das/ProServer/Config.pm:# build SourceHydra for a given dsn/hydraname
./Bio-Das-0.99/Das/ProServer/Config.pm:    my $hydraimpl = "Bio::Das::ProServer::SourceHydra::".$self->{'adaptors'}->{$hydraname}->{'hydra'};
==========

Missing SourceHydra.pm

==========
% grep -r 'Ace::Browser::LocalSiteDefs' ./AcePerl-1.87/*
./AcePerl-1.87/Ace/Browser/SiteDefs.pm:use Ace::Browser::LocalSiteDefs '$SITE_DEFS';
./AcePerl-1.87/acebrowser/conf/moviedb.pm:use Ace::Browser::LocalSiteDefs '$HTML_PATH';
./AcePerl-1.87/acebrowser/conf/default.pm:use Ace::Browser::LocalSiteDefs '$HTML_PATH';
./AcePerl-1.87/acebrowser/conf/simple.pm:use Ace::Browser::LocalSiteDefs '$HTML_PATH';
./AcePerl-1.87/Makefile.PL:  eval 'use Ace::Browser::LocalSiteDefs qw($SITE_DEFS $CGI_PATH $HTML_PATH)';
./AcePerl-1.87/Makefile.PL:package Ace::Browser::LocalSiteDefs;
./AcePerl-1.87/Makefile.PL:Ace::Browser::LocalSiteDefs - Master Configuration file for AceBrowser
./AcePerl-1.87/Makefile.PL: use Ace::Browser::LocalSiteDefs qw($SITE_DEFS $HTML_PATH $CGI_PATH);
./AcePerl-1.87/README.ACEBROWSER:Ace::Browser::LocalSiteDefs, typically somewhere inside the
./AcePerl-1.87/README.ACEBROWSER:  perl -MAce::Browser::LocalSiteDefs \
./AcePerl-1.87/README.ACEBROWSER:       -e 'print $Ace::Browser::LocalSiteDefs::SITE_DEFS,"\n"'
==========

LocalSiteDefs.pm doesn't exist, because package Ace::Browser::LocalSiteDefs
is, interestingly, defined in Makefile.PL and not installed.  Can we make
this a separate file, or at least define it in a file that is installed?

MOBY::*

If I fetch the biomoby and lsid-perl tarballs from
http://biomoby.org/releases/ and
http://www-124.ibm.com/developerworks/oss/lsid/reference/tutorials/100/

I can resolve most of the MOBY::* requirements with them.  These are
what's left:

  perl(MOBY::lsid::authority::dbConfigure) is needed by perl-biomoby-0.8.1-allenday
  perl(MOBY::MobyXMLConstants) is needed by perl-Generic-Genome-Browser-1.62-allenday

Is it possible to get both biomoby and lsid-perl released on CPAN,
and to add these two missing modules to the MOBY distribution?

Win32::Registry

This was introduced by lsid-perl by way of a Net::DNS requirement.  I'm
not sure what to do here.  Looks like I may need to handcode a specfile
here...

PPM::Archive

Where can I get this?  Looks like I may need to handcode a specfile
here...


> GuessDirectories.pm is a GBrowse install utility that is part of the 
> package.  I don't know why the RPM tool is complaining about it.

i think it has to do with the way the rpm build process identifies what
modules are "use"d.  from what i surmise, two lists are included in the
RPM: (1) a list of modules the package depends on, and (2) a list of
modules the package provides.  So in the case of GBrowse's dependency on
GuessDirectories, my guess is that the build isn't correctly detecting
that GuessDirectories is provided by the package.  A quick look in the
Generic-Genome-Browser checkout shows the module is referenced in a few
places, but not actually included in the distribution:

==========
% grep -r 'GuessDirectories' Generic-Genome-Browser/*
Generic-Genome-Browser/install_util/CVS/Entries:/GuessDirectories.pm/1.1/Sun Jun  8 23:29:48 2003//
Generic-Genome-Browser/Makefile.PL:use GuessDirectories;
Generic-Genome-Browser/Makefile.PL:  $OPTIONS{CONF} = GuessDirectories->conf || "$OPTIONS{APACHE}/conf";
Generic-Genome-Browser/Makefile.PL:  $OPTIONS{HTDOCS} = GuessDirectories->htdocs || "$OPTIONS{APACHE}/htdocs";
Generic-Genome-Browser/Makefile.PL:  $OPTIONS{CGIBIN} = GuessDirectories->cgibin || "$OPTIONS{APACHE}/cgi-bin";
Generic-Genome-Browser/MANIFEST:install_util/GuessDirectories.pm
==========

notice that there isn't a file containing 'package GuessDirectories;'.  
can you please add install_util/GuessDirectories.pm to the repository?
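The grep-based check above can be automated. The following sketch only roughly approximates what the RPM dependency scan does. It is not rpm's actual `perl.req`/`perl.prov` logic. It collects the modules a source tree `use`s or `require`s, plus the packages the tree declares. It then reports the used modules that are neither declared in the tree nor installed in `@INC`. One caveat: it counts a package declared inside `Makefile.PL` as provided, as with `Ace::Browser::LocalSiteDefs`, even though such a file is never installed.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;

# Collect the modules a source tree "use"s or "require"s and the packages
# it declares. Only .pm, .pl and .PL files are scanned.
sub scan_tree {
    my ($root) = @_;
    my (%used, %provided);
    find(sub {
        return unless -f $_ && /\.(?:pm|pl|PL)$/;
        open(my $fh, '<', $_) or return;
        while (my $line = <$fh>) {
            $provided{$1} = 1 if $line =~ /^\s*package\s+([\w:]+)\s*;/;
            # Capitalised names only, which skips pragmas such as strict/vars
            $used{$1} = 1 if $line =~ /^\s*(?:use|require)\s+([A-Z][\w:]*)/;
        }
        close($fh);
    }, $root);
    return (\%used, \%provided);
}

# True if Foo::Bar can be found as Foo/Bar.pm in some @INC directory.
sub installed {
    my ($mod) = @_;
    (my $file = "$mod.pm") =~ s{::}{/}g;
    return scalar grep { !ref($_) && -f "$_/$file" } @INC;
}

# Modules that are used but neither declared in the tree nor installed.
sub unresolved {
    my ($used, $provided) = @_;
    my @missing = grep { !$provided->{$_} && !installed($_) } keys %$used;
    return sort @missing;
}

if (@ARGV) {
    my ($used, $provided) = scan_tree($ARGV[0]);
    print "$_\n" for unresolved($used, $provided);
}
```

Run it as `perl scan.pl Generic-Genome-Browser` to list candidates such as `GuessDirectories`. The file name is hypothetical.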

> I don't know about  IndexSupport.  What is it?

A download of Bio::Das 0.99 from: http://search.cpan.org/~lds/Bio-Das-0.99/
reveals:

% grep -r IndexSupport Bio-Das-0.99
Bio-Das-0.99/Das/ProServer/SourceAdaptor/haplotype.pm:use IndexSupport;
Bio-Das-0.99/Das/ProServer/SourceAdaptor/haplotype.pm:     my $conf = IndexSupport->new("$root/conf",'','Homo_sapiens');
Bio-Das-0.99/Das/ProServer/SourceAdaptor/snp.pm:use IndexSupport;
Bio-Das-0.99/Das/ProServer/SourceAdaptor/snp.pm:     my $conf = IndexSupport->new("$root/conf",'','Homo_sapiens');
Bio-Das-0.99/Das/ProServer/SourceAdaptor/sts.pm:use IndexSupport;
Bio-Das-0.99/Das/ProServer/SourceAdaptor/sts.pm:     my $conf = IndexSupport->new("$root/conf",'','Homo_sapiens');
Bio-Das-0.99/Das/ProServer/SourceAdaptor/trace.pm:use IndexSupport;
Bio-Das-0.99/Das/ProServer/SourceAdaptor/trace.pm:     my $conf = IndexSupport->new("$root/conf",'','Homo_sapiens');

notice that there isn't a file containing 'package IndexSupport;'.  can
you please add IndexSupport.pm to the CPAN release?  
interestingly, there is no reference to IndexSupport in the open-bio das 
repository.

-allen

> 
> Lincoln
> 
> On Wednesday 26 January 2005 01:27 am, Allen Day wrote:
> > Hi,
> >
> > I've put together a set of RPMs for Bioperl, Bioperl-DB,
> > Bioperl-Run, and GBrowse.  It's still a work in progress, but you
> > can see the current state here:
> >   http://sumo.genetics.ucla.edu/~allenday/flute/bioperl-1.5/i686/
> > There are some related directories rooted here:
> >   http://sumo.genetics.ucla.edu/~allenday/flute/
> >
> > The RPMs don't install cleanly.  This is because I'm using an
> > automated tool to build the RPMs, and it looks through each
> > downloaded tarball from CPAN to see what that tarball depends on. 
> > Sometimes there are dependencies on libraries that don't exist on
> > CPAN, or might be altogether non-existent. These are the problem
> > libraries and binaries:
> >
> > % rpm -Uvh --test *.rpm
> > error: Failed dependencies:
> >         perl(Ace::Browser::LocalSiteDefs) is needed by perl-AcePerl-1.87-allenday
> >         perl(Bio::Das::ProServer::SourceHydra) is needed by perl-Bio-Das-0.99-allenday
> >         perl(IndexSupport) is needed by perl-Bio-Das-0.99-allenday
> >         perl(srsperl) is needed by perl-bioperl-1.5.0-allenday
> >         perl(Bio::DB::BioDB) is needed by perl-Generic-Genome-Browser-1.62-allenday
> >         perl(Bio::DB::Query::BioQuery) is needed by perl-Generic-Genome-Browser-1.62-allenday
> >         perl(GuessDirectories) is needed by perl-Generic-Genome-Browser-1.62-allenday
> >         perl(MOBY::Client::Central) is needed by perl-Generic-Genome-Browser-1.62-allenday
> >         perl(MOBY::Client::Service) is needed by perl-Generic-Genome-Browser-1.62-allenday
> >         perl(MOBY::CommonSubs) is needed by perl-Generic-Genome-Browser-1.62-allenday
> >         perl(MOBY::MobyXMLConstants) is needed by perl-Generic-Genome-Browser-1.62-allenday
> >         perl(PPM::Archive) is needed by perl-Generic-Genome-Browser-1.62-allenday
> >         perl(MQClient::MQSeries) is needed by perl-SOAP-Lite-0.60-allenday
> >         perl(MQSeries) is needed by perl-SOAP-Lite-0.60-allenday
> >         perl(MQSeries::Message) is needed by perl-SOAP-Lite-0.60-allenday
> >         perl(MQSeries::Queue) is needed by perl-SOAP-Lite-0.60-allenday
> >         perl(MQSeries::QueueManager) is needed by perl-SOAP-Lite-0.60-allenday
> >         /bin/perl is needed by perl-Tk-804.027-allenday
> >         /usr/local/bin/perl is needed by perl-Tk-804.027-allenday
> >         perl(Tk::LabRadio) is needed by perl-Tk-804.027-allenday
> >         perl(Tk::TextReindex) is needed by perl-Tk-804.027-allenday
> >         perl(XML::LibXML) >= 1.57 is needed by perl-XML-LibXSLT-1.57-allenday
> >         perl(XML::SAX::PurePerl::DTDDecls) is needed by perl-XML-SAX-0.12-allenday
> >         perl(XML::SAX::PurePerl::DocType) is needed by perl-XML-SAX-0.12-allenday
> >         perl(XML::SAX::PurePerl::EncodingDetect) is needed by perl-XML-SAX-0.12-allenday
> >         perl(XML::SAX::PurePerl::XMLDecl) is needed by perl-XML-SAX-0.12-allenday
> >
> > Lincoln, I'm guessing you can help me with:
> >
> > * Ace::Browser::LocalSiteDefs
> > * Bio::Das::ProServer::SourceHydra
> > * GuessDirectories
> > * IndexSupport
> > * MOBY::*
> >
> > Hilmar, do you know about:
> >
> > * Bio::DB::BioDB
> > * Bio::DB::Query::BioQuery
> >
> > I'm sure someone on this list knows where to get
> >
> > * srsperl
> > * PPM::Archive
> >
> > If anyone can shed light on where any of these libraries can be
> > found, I'd appreciate it.  Thanks.
> >
> > -Allen
> >
> >
> 
> 
From allenday at ucla.edu  Wed Jan 26 18:57:25 2005
From: allenday at ucla.edu (Allen Day)
Date: Wed Jan 26 18:53:15 2005
Subject: [Bioperl-l] Re: RPMs for bioperl
In-Reply-To: <5DE987EC-6FF0-11D9-834C-000A959EB4C4@gnf.org>
References: <5DE987EC-6FF0-11D9-834C-000A959EB4C4@gnf.org>
Message-ID: <Pine.LNX.4.58.0501261553310.6354@sumo.ctrl.ucla.edu>

Whatever works for you.  My main aim is to have a tarball that contains
the missing packages.  Ideally this will be available in the form of a
bioperl-db release on CPAN to make the dependency automatically
resolvable, but I'd settle for a tarball at http://www.bioperl.org

The version number and/or tag don't really matter to me, as long as I have
something downloadable.

Thanks.

-Allen


On Wed, 26 Jan 2005, Hilmar Lapp wrote:

> The 0.1 tarball has been out of date for a long time.
> 
> Do you want me to introduce a specific tag?
> 
> 	-hilmar
> 
> On Wednesday, January 26, 2005, at 03:01  PM, Allen Day wrote:
> 
> > Hilmar,
> >
> > I see that these are only in the bioperl-db cvs, not in the 0.1 tarball
> > here: http://www.bioperl.org/Core/Latest/index.shtml .
> >
> > Can we increment the version in the bioperl-db repository to 0.2 so I
> > can RPM this?
> >
> > -Allen
> >
> >
> > and it did not contain either of these packages
> >
> > On Wed, 26 Jan 2005, Hilmar Lapp wrote:
> >
> >>
> >> On Jan 25, 2005, at 10:27 PM, Allen Day wrote:
> >>
> >>> Hilmar, do you know about:
> >>>
> >>> * Bio::DB::BioDB
> >>> * Bio::DB::Query::BioQuery
> >>
> >> These come with (are modules in) bioperl-db. If you have bioperl-db
> >> the dependency should be satisfied.
> >>
> >> 	-hilmar
> >>
> >>
> 
From allenday at ucla.edu  Wed Jan 26 18:58:42 2005
From: allenday at ucla.edu (Allen Day)
Date: Wed Jan 26 18:54:34 2005
Subject: [Bioperl-l] bioperl development
In-Reply-To: <Pine.LNX.4.58.0501261150450.6354@sumo.ctrl.ucla.edu>
References: <!~!UENERkVCMDkAAQACAAAAAAAAAAAAAAAAABgAAAAAAAAA+EsXQZcrCEGeBpZF7/IE7sKAAAAQAAAAnReVWRFSMk6W+1SLQWMNKgEAAAAA@ukonline.co.uk>
	<9053D49D-6FC3-11D9-BF87-000393C44276@duke.edu>
	<20050126192311.GA13534@bioinf.igc.gulbenkian.pt>
	<Pine.LNX.4.58.0501261150450.6354@sumo.ctrl.ucla.edu>
Message-ID: <Pine.LNX.4.58.0501261558130.6354@sumo.ctrl.ucla.edu>

maybe this is worth considering for the bioperl-(ng/nouveau/2.0) project, 
which doesn't yet have a home?

On Wed, 26 Jan 2005, Allen Day wrote:

> i really like having projects on sourceforge for all the reasons mentioned 
> in this thread.  i'd try to use a sourceforge or gforge site, if it was 
> available.
> 
> -allen
> 
> 
> On Wed, 26 Jan 2005, Paulo Almeida wrote:
> 
> > You might want to consider gforge (http://gforge.org), which was created
> > as a branch of the Sourceforge code, when that ceased to be Open Source.
> > That could bring SourceForge's features to the existing infrastructure,
> > giving you the best of both worlds. I never used it, but I wouldn't mind
> > finding out how it works, if you would be interested.
> > 
> > -Paulo
> > 
> > On Wed, Jan 26, 2005 at 12:56:11PM -0500, Jason Stajich wrote:
> > > 
> > > If we could make content management and RSS feeds easy to update 
> > > and edit, that might make sense, for example by moving the majority 
> > > of the web site over to something like Movable Type.  That is in fact 
> > > what the biopython.org site now runs on, and how the news.open-bio.org 
> > > site is run.
> > > 
> > > These are good thoughts; as always, it takes some energy and time to put 
> > > a new system into place.  We really welcome anyone trying to make this 
> > > a better system.  At some level it is hard for the core developers to 
> > > be project managers, developers, and system administrators.  So any 
> > > help is really much appreciated.
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l@portal.open-bio.org
> > http://portal.open-bio.org/mailman/listinfo/bioperl-l
> > 
> 
From perlguy at hotmail.com  Wed Jan 26 19:47:11 2005
From: perlguy at hotmail.com (Philip Parker)
Date: Wed Jan 26 19:43:59 2005
Subject: [Bioperl-l] Interested in helping...
Message-ID: <BAY13-F26EDCF6F85CAC7C1E88974AF780@phx.gbl>

My name is Philip and I'm interested in doing work with BioPerl but I
am not a bioinformatician or biologist. I *am* interested in helping
and learning in the process. I do have past professional experience with
Perl.

Philip Parker -  perlguy ~at- hotmail.com


From farid at vt.edu  Wed Jan 26 20:03:28 2005
From: farid at vt.edu (Merchant Farid)
Date: Wed Jan 26 19:59:26 2005
Subject: [Bioperl-l] Automate Fasta34.exe
In-Reply-To: <Pine.LNX.4.58.0501261558130.6354@sumo.ctrl.ucla.edu>
Message-ID: <000001c5040c$02c71f90$23af52c6@Merchant>

Hi guys.

I am trying to find exhaustive homologous matches for a given sequence
against a given library.

I run fasta34.exe from my Perl script and enter the sequence and library
file names (both stored in the Perl folder) and the other input values.
I get output in FASTA format, which I parse to extract each match above a
given threshold, storing the matches one at a time in a file.  I then use
this extracted sequence file to run FASTA again against the same library
with the same inputs, and repeat the procedure until no new homologous
matches are found.  Any new sequence found, compared to the original one,
is appended to the original FASTA file.

Using my code I am able to extract the sequences from the original file
that fit the criteria above a threshold, but calling fasta34.exe means that
I have to enter the sequence file name manually each time it is called.

Can anybody please help me solve this problem?

Following is my code:

#!/usr/bin/perl -w
use strict;

print "\n********Running Fasta34**********\n\n";
print "\nPlease enter the sequence file and library present in your path\n\n";
# fasta34.exe prompts interactively for the query, library and options
my $result = system("c:/perl/sam/fasta34.exe");

print "\n\n*****Enter the output file name you have given*****\n";
chomp(my $output = <STDIN>);
open(FASTA, "<", "c:/perl/sam/$output") or die "can't open the output file: $!\n";

my ($seqname, $laterhalf, $per);
print "Enter your cut-off percentage: ";
chomp(my $cut = <STDIN>);

while (<FASTA>)
{
	# header line of a library hit: ">>" followed by the sequence name
	if (m/>>(.{4,6})(.*)/)
	{
		$seqname   = $1;
		$laterhalf = $2;
	}
	# skip the "initn:" score line
	next if /^ini/;
	if (m/(\d+\.\d+)% identity/)
	{
		$per = $1;
		# keep only hits above the cut-off percentage
		if ($per > $cut)
		{
			# append each hit to orginal.fasta; output1.aa is overwritten
			# so that it holds only the current hit
			open(ORG,    ">>", "c:/perl/sam/orginal.fasta") or die "can't open orginal.fasta: $!\n";
			open(OUT2IN, ">",  "c:/perl/sam/output1.aa")    or die "can't open output1.aa: $!\n";
			print OUT2IN "$seqname$laterhalf\n";
			print ORG    "$seqname$laterhalf\n";
			while (<FASTA>)
			{
				# alignment rows for this hit start with its name
				if (m/^(\Q$seqname\E)(.*)/)
				{
					print OUT2IN "$2\n";
					print ORG    "$2\n";
				}
				# next hit reached: step back 100 bytes (a rough heuristic)
				# so the outer loop can read this header again
				if (m/>>/)
				{
					seek(FASTA, -100, 1);
					last;
				}
			}
			close(OUT2IN);
			close(ORG);
		}
	}
}
close(FASTA);
print $result;
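You can avoid typing file names each round by passing them to fasta34 on the command line and driving the iteration from Perl. The sketch below assumes the FASTA3 options `-Q` (quiet, no prompts) and `-O` (write output to a file). Check these against the documentation for your fasta34 build. The search step and the query-writing step are passed in as code references, so the loop can be tested without FASTA installed. `parse_hits` and `library.fa` in the comments are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Path taken from the script above; adjust as needed.
my $FASTA = 'c:/perl/sam/fasta34.exe';

# Argument list for one non-interactive fasta34 run. Assumes the FASTA3
# options -Q (quiet: do not prompt) and -O (write output to a file).
sub fasta_cmd {
    my ($query, $library, $outfile) = @_;
    return ($FASTA, '-Q', '-O', $outfile, $query, $library);
}

# Repeat searches until a round finds no new hits or $max_rounds is reached.
#   $search->($query_file, $round)  runs one search, returns hit IDs
#   $write_query->(\@ids, $round)   writes those hits to a new query
#                                   file and returns its name
# Returns every hit ID seen, sorted.
sub iterate_search {
    my ($first_query, $search, $write_query, $max_rounds) = @_;
    my %seen;
    my @queue = ($first_query);
    my $round = 0;
    while (@queue && $round < $max_rounds) {
        my $query = shift @queue;
        $round++;
        # keep only IDs not found in any earlier round
        my @new = grep { !$seen{$_}++ } $search->($query, $round);
        push @queue, $write_query->(\@new, $round) if @new;
    }
    return sort keys %seen;
}

# Example wiring (hypothetical names):
#   my $search = sub {
#       my ($query, $round) = @_;
#       my $out = "round$round.out";
#       system(fasta_cmd($query, 'library.fa', $out)) == 0
#           or die "fasta34 failed: $?\n";
#       return parse_hits($out);   # your existing parsing loop, wrapped in a sub
#   };
```

The `$max_rounds` cap guards against a search that keeps pulling in new hits indefinitely.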


From brian_osborne at cognia.com  Wed Jan 26 21:20:53 2005
From: brian_osborne at cognia.com (Brian Osborne)
Date: Wed Jan 26 21:20:24 2005
Subject: [Bioperl-l] Reading all sequences using Bio::DB::Flat
	inSwissProtfile
In-Reply-To: <Pine.OSX.4.58.0501210916130.17957@adsl-68-126-147-89.dsl.pltn13.pacbell.net>
Message-ID: <GAEDKMGOKFBLJPKCLKCCIEJHEIAA.brian_osborne@cognia.com>

Chris and Kenny,

Bio::Index::Swissprot has an id_parser() method now but the uniqueness of
the key will be a concern, yes.

Brian O.

-----Original Message-----
From: bioperl-l-bounces@portal.open-bio.org
[mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Chris Mungall
Sent: Friday, January 21, 2005 12:33 PM
To: Brian Osborne
Cc: Daily, Kenneth Michael; bioperl-l@portal.open-bio.org
Subject: RE: [Bioperl-l] Reading all sequences using Bio::DB::Flat
inSwissProtfile


Brian,

Unfortunately the id_parser method isn't supported in
Bio::Index::Swissprot

Even if it was I don't think it would be sufficient here - Kenny needs to
index using the feature fields. This implies that the search key wouldn't
be unique. Bio::Index::Abstract requires a unique key for the index.
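Here is a small illustration of the non-unique key problem. An index from feature key to entries maps one key to many accessions. A hash of arrays can hold that, but, as Chris notes, `Bio::Index::Abstract` requires a unique key. The sketch below builds such an index in one pass over a SwissProt flat file and saves it with Storable, so later searches need not re-read the whole file. As a simplifying assumption, it parses only the `AC`, `FT` and `//` lines by hand. For real work, `Bio::SeqIO` with the `swiss` format gives full feature objects.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Storable qw(nstore);

# Map each feature key (DOMAIN, SIGNAL, ...) to the primary accessions of
# the entries that contain it; one key maps to many entries.
sub index_features {
    my ($fh) = @_;
    my (%index, %keys, $acc);
    while (my $line = <$fh>) {
        if (!defined $acc && $line =~ /^AC\s+(\w+);/) {
            $acc = $1;                 # first accession on the first AC line
        }
        elsif ($line =~ /^FT {3}(\S+)/) {
            $keys{$1} = 1;             # feature key; continuation lines are indented further
        }
        elsif ($line =~ m{^//}) {      # end of entry
            if (defined $acc) {
                push @{ $index{$_} }, $acc for sort keys %keys;
            }
            ($acc, %keys) = ();
        }
    }
    return \%index;
}

if (@ARGV) {
    my $file = $ARGV[0];
    open(my $fh, '<', $file) or die "can't open $file: $!\n";
    my $index = index_features($fh);
    close($fh);
    nstore($index, "$file.features") or die "can't store index\n";
    printf "%-12s %d entries\n", $_, scalar @{ $index->{$_} } for sort keys %$index;
}
```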

Flexible indexing and retrieval such as this is best handled using some
generic non-bioperl specific solution - RDB, XMLDB, SRS, Lucene, LuceGene
etc

I forgot to mention Don Gilbert's LuceGene in my original reply - it's a
fairly sane open-source alternative to SRS. It handles lots of
bioinformatics file formats (not sure about swissprot but I'm sure it
could be added)

See:
http://www.gmod.org/lucegene/index.shtml

Cheers
Chris

On Fri, 21 Jan 2005, Brian Osborne wrote:

> Kenny,
>
> Did you take a look at Bio/Index/Swissprot.pm? What's important for you will
> be building the index using the keys you're interested in as opposed to the
> default key, using the id_parser method. See the Bio::Index section in the
> bptutorial for an example.
> Brian O.
>
> -----Original Message-----
> From: bioperl-l-bounces@portal.open-bio.org
> [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Daily,
> Kenneth Michael
> Sent: Wednesday, January 19, 2005 11:49 AM
> To: bioperl-l@portal.open-bio.org
> Subject: [Bioperl-l] Reading all sequences using Bio::DB::Flat in
> SwissProtfile
>
>
> I want to work with a local copy of the SwissProt database, and need to
> search through all of the entries. I only see methods to return sequences by
> accession. However, I cannot use just the FASTA format of the SwissProt
> records, as I need to use the feature fields. What I need to learn is how to
> do a DB search on the features field of the SwissProt records, if it's
> possible. Would there be any advantage to doing it with the DB instead of
> just using SeqIO as an input stream? I think it might, since every time I
> want to do a search I must read in the entire file again, which is very
> costly. Thank you.
> Kenny Daily
> Indiana University
> School of Informatics
> kmdaily [at] indiana [dot] edu
>
_______________________________________________
Bioperl-l mailing list
Bioperl-l@portal.open-bio.org
http://portal.open-bio.org/mailman/listinfo/bioperl-l


From d.humphreys at victorchang.unsw.edu.au  Wed Jan 26 20:55:12 2005
From: d.humphreys at victorchang.unsw.edu.au (David Humphreys)
Date: Wed Jan 26 23:58:05 2005
Subject: [Bioperl-l] Help can't solve an internal 500 error
Message-ID: <a05100300be1df9f50339@[129.94.224.210]>

Hi bioperl-groovers,

Has anyone ever seen the following error message?


MSG: WebDBSeqI Request error:
500 Can't connect to eutils.ncbi.nlm.nih.gov:80 (Bad protocol 'tcp')

I have traced previous threads in the archives, but they appear to 
describe a slightly different internal 500 error. What is confusing me 
the most is that the script that produces this error worked perfectly on 
my older machine (running win XP) but not on my colleague's machine (also 
running XP). The errors cascade from deep within bioperl and I have a 
feeling it has something to do with the way the machine is set up 
rather than the scripts themselves. Any ideas?

thanks in advance

Dave
From nathanhaigh at ukonline.co.uk  Thu Jan 27 03:36:22 2005
From: nathanhaigh at ukonline.co.uk (Nathan Haigh)
Date: Thu Jan 27 03:32:44 2005
Subject: [Bioperl-l] Help can't solve an internal 500 error
In-Reply-To: <a05100300be1df9f50339@[129.94.224.210]>
Message-ID: <!~!UENERkVCMDkAAQACAAAAAAAAAAAAAAAAABgAAAAAAAAA+EsXQZcrCEGeBpZF7/IE7sKAAAAQAAAAVzW11oBPD0m0XHxqA1q5HwEAAAAA@ukonline.co.uk>

Would you be able to supply the script that produces this error, so that we can try to reproduce it?

Nathan

> -----Original Message-----
> From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of David Humphreys
> Sent: 27 January 2005 01:55
> To: bioperl-l@portal.open-bio.org
> Subject: [Bioperl-l] Help can't solve an internal 500 error
> 
> Hi bioperl-groovers,
> 
> Has anyone ever seen the following error message?
> 
> 
> MSG: WebDBSeqI Request error:
> 500 Can't connect to eutils.ncbi.nlm.nih.gov:80 (Bad protocol 'tcp')
> 
> I have traced previous threads in the archives, but they appear to
> describe a slightly different internal 500 error. What is confusing me
> the most is that the script that produces this error worked perfectly on
> my older machine (running win XP) but not on my colleague's machine (also
> running XP). The errors cascade from deep within bioperl and I have a
> feeling it has something to do with the way the machine is set up
> rather than the scripts themselves. Any ideas?
> 
> thanks in advance
> 
> Dave


From palmeida at igc.gulbenkian.pt  Thu Jan 27 05:12:45 2005
From: palmeida at igc.gulbenkian.pt (Paulo Almeida)
Date: Thu Jan 27 05:07:44 2005
Subject: [Bioperl-l] Help can't solve an internal 500 error
In-Reply-To: <a05100300be1df9f50339@[129.94.224.210]>
References: <a05100300be1df9f50339@[129.94.224.210]>
Message-ID: <20050127101245.GB13534@bioinf.igc.gulbenkian.pt>

It might be a problem with the TCP/IP stack on that computer. If that is the case, you
can try reinstalling it as this page explains:

http://www.petri.co.il/reinstall_tcp_ip_on_windows_xp.htm
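Before reinstalling anything, it may be worth testing this hypothesis directly. As far as we know, `Bad protocol 'tcp'` is the error Perl's IO::Socket::INET (which LWP uses) reports when it cannot look up the protocol by name. A failing `getprotobyname('tcp')` would therefore point at the machine's protocols database rather than at BioPerl. On Windows, that database is normally the file `protocol` under `%SystemRoot%\system32\drivers\etc`. A minimal check:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Look up the protocol number for "tcp" by name, as socket code does.
# Returns the number (normally 6) or undef if the lookup fails.
sub tcp_protocol_number {
    my $number = getprotobyname('tcp');
    return $number;
}

my $n = tcp_protocol_number();
if (defined $n) {
    print "tcp resolves to protocol number $n\n";
}
else {
    print "getprotobyname('tcp') failed; the protocols database may be missing or damaged\n";
}
```

If this check fails on your colleague's machine but passes on yours, the difference is in the machine setup, as you suspected.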

-Paulo

On Thu, Jan 27, 2005 at 12:55:12PM +1100, David Humphreys wrote:
> Hi bioperl-groovers,
> 
> Has anyone ever seen the following error message?
> 
> 
> MSG: WebDBSeqI Request error:
> 500 Can't connect to eutils.ncbi.nlm.nih.gov:80 (Bad protocol 'tcp')
> 
> I have traced previous threads in the archives, but they appear to 
> describe a slightly different internal 500 error. What is confusing me 
> the most is that the script that produces this error worked perfectly on 
> my older machine (running win XP) but not on my colleague's machine (also 
> running XP). The errors cascade from deep within bioperl and I have a 
> feeling it has something to do with the way the machine is set up 
> rather than the scripts themselves. Any ideas?
> 
> thanks in advance
> 
> Dave

-- 
Paulo Almeida
Instituto Gulbenkian de Ciencia
Apartado 14, 2781-901, Oeiras, PORTUGAL
tel  +351 21 446 46 35
fax  +351 21 440 79 70
http://www.igc.gulbenkian.pt
From davidg at lsi.upc.edu  Thu Jan 27 09:21:27 2005
From: davidg at lsi.upc.edu (David García Cortés)
Date: Thu Jan 27 09:18:14 2005
Subject: [Bioperl-l] BPpsilite possible bug?
Message-ID: <00c401c5047b$8058f810$fb1e5393@Davidg>

Hello.

I'm using BPpsilite to parse a PsiBlast results file, and I've noticed something strange that seems to be a bug: it doesn't get the HSP length right in some specific cases, while it works correctly in others.

I obtain the HSP length this way:

# $last_iteration is the last round of the BPpsilite report
while ( my $sbjct = $last_iteration->nextSbjct ) {
    while ( my $hsp = $sbjct->nextHSP ) {
        my $hlength = $hsp->length;
        print "$hlength\n";
    }
}

And it works fine in many cases, but not in others. When parsing result files where more than one sequence produces significant alignments against the query sequence, everything works OK. But when only one sequence produces significant alignments, $hsp->length doesn't return the correct HSP length.

For example, when parsing the results file I include at the end of this mail, the HSP lengths are wrong. Is it a bug, or am I doing something wrong?

Thanks in advance.

--
David García Cortés
Instituto Nacional de Bioinformática (INB)
Nodo Computacional GNHC-2 UPC-CIRI
c/. Jordi Girona 1-3              
Modul C6-E201                   Tel.  : 934 011 650
E-08034 Barcelona               Fax   : 934 017 014
Catalunya (Spain)               e-mail: davidg@lsi.upc.edu


RESULTS FILE: 

***********************

BLASTP 2.2.6 [Apr-09-2003]


Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, 
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), 
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs",  Nucleic Acids Res. 25:3389-3402.

Query= gi|18676612|dbj|BAB84958.1|
         (359 letters)

Database: nr-0.fa 
           175 sequences; 100,812 total letters

Searching.........done


Results from round 1


                                                                 Score    E
Sequences producing significant alignments:                      (bits) Value

dbj|BAB84958.1| FLJ00205 protein [Homo sapiens]                       710   0.0  

>dbj|BAB84958.1| FLJ00205 protein [Homo sapiens]
          Length = 359

 Score =  710 bits (1832), Expect = 0.0
 Identities = 359/359 (100%), Positives = 359/359 (100%)

Query: 1   LLQAVALVLAALVLLPNVGLWALYRERQPDGTPGGSGAAVAPAAGQGSHSRQKKTFFLGD 60
           LLQAVALVLAALVLLPNVGLWALYRERQPDGTPGGSGAAVAPAAGQGSHSRQKKTFFLGD
Sbjct: 1   LLQAVALVLAALVLLPNVGLWALYRERQPDGTPGGSGAAVAPAAGQGSHSRQKKTFFLGD 60

Query: 61  GQKLKDWHDKEAIRRDAQRVGNGEQGRPYPMTDAERVDQAYRENGFNIYVSDKISLNRSL 120
           GQKLKDWHDKEAIRRDAQRVGNGEQGRPYPMTDAERVDQAYRENGFNIYVSDKISLNRSL
Sbjct: 61  GQKLKDWHDKEAIRRDAQRVGNGEQGRPYPMTDAERVDQAYRENGFNIYVSDKISLNRSL 120

Query: 121 PDIRHPNCNSKRYLETLPNTSIIIPFHNEGWSSLLRTVHSVLNRSPPELVAEIVLVDDFS 180
           PDIRHPNCNSKRYLETLPNTSIIIPFHNEGWSSLLRTVHSVLNRSPPELVAEIVLVDDFS
Sbjct: 121 PDIRHPNCNSKRYLETLPNTSIIIPFHNEGWSSLLRTVHSVLNRSPPELVAEIVLVDDFS 180

Query: 181 DREHLKKPLEDYMALFPSVRILRTKKREGLIRTRMLGASVATGDVITFLDSHCEANVNWL 240
           DREHLKKPLEDYMALFPSVRILRTKKREGLIRTRMLGASVATGDVITFLDSHCEANVNWL
Sbjct: 181 DREHLKKPLEDYMALFPSVRILRTKKREGLIRTRMLGASVATGDVITFLDSHCEANVNWL 240

Query: 241 PPLLDRIARNRKTIVCPMIDVIDHDDFRYETQAGDAMRGAFDWEMYYKRIPIPPELQKAD 300
           PPLLDRIARNRKTIVCPMIDVIDHDDFRYETQAGDAMRGAFDWEMYYKRIPIPPELQKAD
Sbjct: 241 PPLLDRIARNRKTIVCPMIDVIDHDDFRYETQAGDAMRGAFDWEMYYKRIPIPPELQKAD 300

Query: 301 PSDPFESPVMAGGLFAVDRKWFWELGGYDPGLEIWGGEQYEISFKVSQLSRRPVLGTAS 359
           PSDPFESPVMAGGLFAVDRKWFWELGGYDPGLEIWGGEQYEISFKVSQLSRRPVLGTAS
Sbjct: 301 PSDPFESPVMAGGLFAVDRKWFWELGGYDPGLEIWGGEQYEISFKVSQLSRRPVLGTAS 359


Searching.........done


Results from round 2


                                                                 Score    E
Sequences producing significant alignments:                      (bits) Value
Sequences used in model and found again:

dbj|BAB84958.1| FLJ00205 protein [Homo sapiens]                       758   0.0  

Sequences not found previously or not previously below threshold:


CONVERGED!
>dbj|BAB84958.1| FLJ00205 protein [Homo sapiens]
          Length = 359

 Score =  758 bits (1956), Expect = 0.0
 Identities = 359/359 (100%), Positives = 359/359 (100%)

Query: 1   LLQAVALVLAALVLLPNVGLWALYRERQPDGTPGGSGAAVAPAAGQGSHSRQKKTFFLGD 60
           LLQAVALVLAALVLLPNVGLWALYRERQPDGTPGGSGAAVAPAAGQGSHSRQKKTFFLGD
Sbjct: 1   LLQAVALVLAALVLLPNVGLWALYRERQPDGTPGGSGAAVAPAAGQGSHSRQKKTFFLGD 60

Query: 61  GQKLKDWHDKEAIRRDAQRVGNGEQGRPYPMTDAERVDQAYRENGFNIYVSDKISLNRSL 120
           GQKLKDWHDKEAIRRDAQRVGNGEQGRPYPMTDAERVDQAYRENGFNIYVSDKISLNRSL
Sbjct: 61  GQKLKDWHDKEAIRRDAQRVGNGEQGRPYPMTDAERVDQAYRENGFNIYVSDKISLNRSL 120

Query: 121 PDIRHPNCNSKRYLETLPNTSIIIPFHNEGWSSLLRTVHSVLNRSPPELVAEIVLVDDFS 180
           PDIRHPNCNSKRYLETLPNTSIIIPFHNEGWSSLLRTVHSVLNRSPPELVAEIVLVDDFS
Sbjct: 121 PDIRHPNCNSKRYLETLPNTSIIIPFHNEGWSSLLRTVHSVLNRSPPELVAEIVLVDDFS 180

Query: 181 DREHLKKPLEDYMALFPSVRILRTKKREGLIRTRMLGASVATGDVITFLDSHCEANVNWL 240
           DREHLKKPLEDYMALFPSVRILRTKKREGLIRTRMLGASVATGDVITFLDSHCEANVNWL
Sbjct: 181 DREHLKKPLEDYMALFPSVRILRTKKREGLIRTRMLGASVATGDVITFLDSHCEANVNWL 240

Query: 241 PPLLDRIARNRKTIVCPMIDVIDHDDFRYETQAGDAMRGAFDWEMYYKRIPIPPELQKAD 300
           PPLLDRIARNRKTIVCPMIDVIDHDDFRYETQAGDAMRGAFDWEMYYKRIPIPPELQKAD
Sbjct: 241 PPLLDRIARNRKTIVCPMIDVIDHDDFRYETQAGDAMRGAFDWEMYYKRIPIPPELQKAD 300

Query: 301 PSDPFESPVMAGGLFAVDRKWFWELGGYDPGLEIWGGEQYEISFKVSQLSRRPVLGTAS 359
           PSDPFESPVMAGGLFAVDRKWFWELGGYDPGLEIWGGEQYEISFKVSQLSRRPVLGTAS
Sbjct: 301 PSDPFESPVMAGGLFAVDRKWFWELGGYDPGLEIWGGEQYEISFKVSQLSRRPVLGTAS 359


  Database: nr-0.fa
    Posted date:  Jan 13, 2005  6:32 PM
  Number of letters in database: 100,812
  Number of sequences in database:  175
  
Lambda     K      H
   0.320    0.139    0.427 

Lambda     K      H
   0.267   0.0424    0.140 


Matrix: BLOSUM62
Gap Penalties: Existence: 11, Extension: 1
Number of Hits to DB: 174,609
Number of Sequences: 175
Number of extensions: 8789
Number of successful extensions: 19
Number of sequences better than  1.0: 1
Number of HSP's better than  1.0 without gapping: 2
Number of HSP's successfully gapped in prelim test: 0
Number of HSP's that attempted gapping in prelim test: 17
Number of HSP's gapped (non-prelim): 2
length of query: 359
length of database: 100,812
effective HSP length: 69
effective length of query: 290
effective length of database: 88,737
effective search space: 25733730
effective search space used: 25733730
T: 11
A: 40
X1: 16 ( 7.4 bits)
X2: 38 (14.6 bits)
X3: 64 (24.7 bits)
S1: 41 (21.8 bits)
S2: 53 (25.0 bits)

*********************************************
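One way to check what the report itself states, independently of BPpsilite, is to read the alignment length straight from the "Identities = x/y" line: in BLAST text output the denominator y is the length of the HSP alignment (359 in both rounds of the report above). The helper below is a hypothetical, pure-Perl sketch for that comparison only; it does not fix or explain the suspected parser bug.

```perl
use strict;
use warnings;

# Hypothetical helper: extract the alignment length from a BLAST
# "Identities = x/y (z%)" line. The denominator y is the HSP alignment
# length, which can be compared with what $hsp->length returns.
sub alignment_length_from_identities {
    my ($line) = @_;
    return $line =~ /Identities\s*=\s*\d+\/(\d+)/ ? $1 : undef;
}

my $line = ' Identities = 359/359 (100%), Positives = 359/359 (100%)';
print alignment_length_from_identities($line), "\n";   # prints 359
```

If BPpsilite reports something other than this value for the single-hit case, that mismatch would be a concrete detail to include in a bug report.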
From jason.stajich at duke.edu  Thu Jan 27 18:13:51 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Thu Jan 27 18:11:26 2005
Subject: [Bioperl-l] Extraction of Intergenic region
In-Reply-To: <6FCF17FF93748647A202BE44B651321A1297C6@EDENEVS1.asp.ad.uit.no>
References: <6FCF17FF93748647A202BE44B651321A1297C6@EDENEVS1.asp.ad.uit.no>
Message-ID: <45698441630d0e250855f9fc96df7ffc@duke.edu>

[bioperl-l is really the right list to post to]

There isn't anything that does exactly this, but you can write a
script to do it by parsing the sequence file with Bio::SeqIO and
parsing the coordinate file yourself.
Have you tried writing the simple Perl to do this yet?  You can do it
quite simply with the substr function.  I have also done it by masking
the coding sequences with 'N's first and then using split to go back
and extract all the non-N regions.
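The mask-then-extract idea can be sketched in plain Perl as follows. It uses a regex match instead of split so that each intergenic piece keeps its coordinates. The function name, the [start, end] input format and the assumption that the genome contains no 'N' characters are illustrative choices, not part of BioPerl.

```perl
use strict;
use warnings;

# Mask-then-extract sketch: overwrite each gene with 'N', then report the
# remaining non-N runs with 1-based, inclusive coordinates.
# Assumptions: gene coordinates are 1-based inclusive [start, end] pairs
# with start <= end, and the genome itself contains no 'N' characters
# (otherwise choose a mask character that cannot occur in the sequence).
sub intergenic_regions {
    my ($seq, @genes) = @_;
    my $masked = $seq;
    for my $gene (@genes) {
        my ($start, $end) = @$gene;
        my $len = $end - $start + 1;
        substr($masked, $start - 1, $len, 'N' x $len);
    }
    my @regions;
    while ($masked =~ /([^N]+)/g) {
        my $end   = pos($masked);    # offset just past the match = 1-based end
        my $start = $end - length($1) + 1;
        push @regions, [ $start, $end, $1 ];
    }
    return @regions;
}

my @regions = intergenic_regions('AAAACCCCGGGGTTTT', [5, 8], [13, 16]);
print "$_->[0]-$_->[1] $_->[2]\n" for @regions;
# prints:
# 1-4 AAAA
# 9-12 GGGG
```

With BioPerl installed, the genome string could come from something like Bio::SeqIO->new(-file => 'genome.fa', -format => 'fasta')->next_seq->seq (the file name is hypothetical), with the gene pairs read from the coordinate file.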

-jason
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/
On Jan 27, 2005, at 4:32 AM, Rafi Ahmad wrote:

> Hi everyone,
>
> I am new to BioPerl. I would like to know whether there is BioPerl code
> that helps extract intergenic sequences from a genome, given a
> coordinate file listing the start and stop positions of all the
> genes.
>
> Thanks for the help.
>
> Regards
>
> Rafi
>
> _______________________________________________
> Bioperl-guts-l mailing list
> Bioperl-guts-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-guts-l
>

From barry.moore at genetics.utah.edu  Thu Jan 27 18:22:00 2005
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Thu Jan 27 18:18:03 2005
Subject: [Bioperl-l] Re: load_seqdatabase.pl running SLOW!
In-Reply-To: <BB3BF516F698804298FE224B1F7A39814942C3@EXCHCLUSTER01.lj.gnf.org>
References: <BB3BF516F698804298FE224B1F7A39814942C3@EXCHCLUSTER01.lj.gnf.org>
Message-ID: <41F97798.6060203@genetics.utah.edu>

Hilmar-

Thanks for the suggestions.  Things are working smoothly now, but I'm 
not entirely sure why.  I stopped the slow running load_seqdatabase.pl 
process on the fast machine, built an identical biosql database under a 
different name, and began loading the same file into it.  This screamed 
along at 8-10 seq/sec.  I re-ran the load script into the old db - still 
slow.  I vacuumed the old db - still slow.  I dropped the old db and 
rebuilt it - now both load very fast.  I dropped both dbs and rebuilt 
just one, and it is now loading fine.  Go figure.  I send this to the 
list simply for the record in case it provides a clue to someone in the 
future with similar trouble.  I haven't got a clue.

Barry

Hilmar Lapp wrote:

>To be honest I've never loaded a large file into a Pg installation. The problem that I'd expect you to run into is that if you started with a fresh database the lookup queries will become slower and slower in the absence of the stats being recomputed on a frequent basis through vacuum (which the load script won't do).
> 
>I believe in more recent releases you can actually vacuum the database concurrently with write access; not sure whether 7.2.x allows this already. You should strongly consider upgrading to at least 7.3 if not 7.4 or even 8.x. The Pg developers may not even answer questions about 7.2 anymore ...
> 
>Your observation that the slower machine with the later kernel is faster leaves me puzzled. If blind-tested, I would have suggested that the machine appearing faster had had its database vacuumed.
> 
>Not sure this is very helpful ...
> 
> -hilmar
>
>	-----Original Message----- 
>	From: Barry Moore [mailto:barry.moore@genetics.utah.edu] 
>	Sent: Tue 1/25/2005 3:15 PM 
>	To: Bioperl list; Hilmar Lapp 
>	Cc: 
>	Subject: load_seqdatabase.pl running SLOW!
>	
>	
>
>	Hilmar (or others)-
>	
>	I've set up a biosql based database using PostgreSQL 7.2 on a PC with an
>	Intel Pentium 4 3.0 GHz processor, 800 MHz system Bus.  1 GB of RAM, and
>	Linux (2.2 kernel - Debian woody distro).  Onto that I am loading
>	~352,000 sequences from RefSeq complete rna collection using
>	load_seqdatabase.pl.  It's running kind of slow - loading on average
>	about 1 sequence every 2-5 seconds.  In the archives I've read your
>	comments to a previous question like this suggesting two fast
>	processors, a couple gigs of memory and 2-3 drives to really make things
>	fly and while my system isn't that good, it seems like I should be doing
>	better.  I got to experimenting on another (slower) system while waiting
>	for things to load, and found that running the same script to load the
>	same file goes about 3X faster on a 266MHz Intel processor with 192 Mb
>	RAM.  Same installation of PostgreSQL (both installed from deb package
>	with defaults), and same installation of Debian Linux (except that the
>	kernel on the older slow machine has been updated to 2.4)  Another
>	difference I noticed between the two is that the old 266 MHz machine is
>	using about 75% CPU resources for perl and about 25% for postmaster
>	whereas the faster 3 GHz machine (but slower running
>	load_seqdatabase.pl) is using 95% of its CPU resources for postmaster
>	and about 3% for perl.  Both systems are using up most of their memory,
>	but little to no swap.  Could the kernel upgrade really be making the
>	difference?  Any thoughts?  As it's going now I can wait over a week for
>	all these sequences to load, or build the database on our dinosaur
>	server in a couple of days and dump it across to our sexy new 3 GHz
>	server.  Talk about bass ackwards!
>	
>	Barry
>	
>	--
>	Barry Moore
>	Dept. of Human Genetics
>	University of Utah
>	Salt Lake City, UT
>	
>	
>	
>	
>
>  
>

-- 
Barry Moore
Dept. of Human Genetics
University of Utah
Salt Lake City, UT


From hlapp at gnf.org  Thu Jan 27 20:10:43 2005
From: hlapp at gnf.org (Hilmar Lapp)
Date: Thu Jan 27 20:07:44 2005
Subject: [Bioperl-l] RE: load_seqdatabase.pl running SLOW!
Message-ID: <BB3BF516F698804298FE224B1F7A39814942DE@EXCHCLUSTER01.lj.gnf.org>

Thanks for the update, Barry. Great to hear it's working for you now. What you're seeing may really be a Postgres version issue: 7.2 has known problems, and once 7.3 came out everybody was strongly urged to migrate to it.
 
That's all I can say. BTW, if there's no package for a higher version, don't be scared of compiling from source. I've built Pg on different platforms, from Linux to Mac OS X, and it compiled like a charm on all of them.
 
-hilmar

From allenday at ucla.edu  Thu Jan 27 23:48:51 2005
From: allenday at ucla.edu (Allen Day)
Date: Thu Jan 27 23:45:02 2005
Subject: [Bioperl-l] RPMs for Bioperl and GMOD
In-Reply-To: <Pine.LNX.4.58.0501252214470.3069@sumo.ctrl.ucla.edu>
References: <Pine.LNX.4.58.0501252214470.3069@sumo.ctrl.ucla.edu>
Message-ID: <Pine.LNX.4.58.0501272032380.1190@sumo.ctrl.ucla.edu>

SUMMARY:
========

I've successfully built RPMs for the GBrowse, Bioperl, and Bioperl-DB
dependency trees.  These are for CVS HEAD, not the most recent releases.  
I've only tested on Fedora Core 2 and RedHat 9.  Fedora Core 2 packages
are available here:

http://sumo.genetics.ucla.edu/~allenday/flute-fc2/i386/
http://sumo.genetics.ucla.edu/~allenday/flute-fc2/noarch/

The source RPMs are available as well:

http://sumo.genetics.ucla.edu/~allenday/flute-fc2/SRPMS

A few notes on the non-obvious steps that were needed to make this
work:

  * pruned Bioperl-DB to remove Oracle dependencies
  * pruned Gbrowse to remove AcePerl dependencies
  * pruned Bioperl to remove AcePerl dependencies
  * pruned SOAP::Lite to remove MQSeries dependencies
  * pruned Gbrowse to remove Mac OS X dependencies
  * pruned a few modules (Net::DNS, Mail::Sender, etc) to remove
    Win32::* dependencies.
  * piggybacked on existing RPMs for rrdtool, Template Toolkit, 
    Module-Build, and CPANPLUS.  Thanks to Dag Wieers [1] for these.


TODO:
=====

The Bioperl install is fully functional as far as I can tell.

I'd appreciate it if someone with Oracle and DBD::Oracle installed could 
give Bioperl-DB a spin and verify that it works.

I'd also like someone with Oracle to help me make a DBD::Oracle rpm.  
Having a DBD::Oracle RPM will allow me to leave the Oracle code in
Bioperl-DB.

The Gbrowse install is slightly broken, but this is mainly due to a major 
rewrite that's taking place right now.  I'll make another announcement to 
the GMOD list when the Gbrowse RPM works out of the box.

-Allen

[1] http://dag.wieers.com
From nathanhaigh at ukonline.co.uk  Fri Jan 28 02:40:41 2005
From: nathanhaigh at ukonline.co.uk (Nathan Haigh)
Date: Fri Jan 28 02:36:50 2005
Subject: [Bioperl-l] bioperl-1.5.0 PPM files
In-Reply-To: <F0EFADC3-6E79-11D9-9143-000393C44276@duke.edu>
Message-ID: <!~!UENERkVCMDkAAQACAAAAAAAAAAAAAAAAABgAAAAAAAAA+EsXQZcrCEGeBpZF7/IE7sKAAAAQAAAAV9+DvunsAUGFUc3QjtTHwwEAAAAA@ukonline.co.uk>

I've uploaded the PPD files for bioperl-1.5. Unfortunately directory listing isn't enabled, so here are the direct links:
http://web.ukonline.co.uk/nathanhaigh/bioperl/bioperl.ppd
http://web.ukonline.co.uk/nathanhaigh/bioperl/bioperl-1.5-ppm.tar.gz

Could someone copy these over to the http://bioperl.org/DIST/ directory, renaming the existing bioperl.ppd file at http://bioperl.org/DIST/
to bioperl-1.2.ppd?

I've also uploaded the GD-SVG v0.25 files, which aren't available from any of the repositories listed in point 1.3.2 of the
INSTALL.WIN file; they should be included in the http://bioperl.org/DIST/ directory so that bioperl-1.5 installs successfully.
If, however, you think we shouldn't keep non-bioperl modules on the bioperl server, we should see about getting GD-SVG added to one of
the repositories mentioned in point 1.3.2 of the INSTALL.WIN file. Again, the direct links are:
http://web.ukonline.co.uk/nathanhaigh/bioperl/GD-SVG.ppd
http://web.ukonline.co.uk/nathanhaigh/bioperl/GD-SVG-0.25-ppm.tar.gz

The MD5 checksums can be found at:
http://web.ukonline.co.uk/nathanhaigh/bioperl/md5.txt
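For anyone fetching these files, a checksum can be verified from Perl with the core Digest::MD5 module. This is a minimal sketch; the helper names file_md5_hex and md5_matches are hypothetical, and the expected digest has to come from the published md5.txt.

```perl
use strict;
use warnings;
use Digest::MD5;

# Compute the hex MD5 digest of a file, reading it in binary mode so
# line-ending translation on Windows cannot change the result.
sub file_md5_hex {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    my $digest = Digest::MD5->new->addfile($fh)->hexdigest;
    close $fh;
    return $digest;
}

# Compare a file against a published checksum (case-insensitive).
sub md5_matches {
    my ($path, $expected) = @_;
    return lc(file_md5_hex($path)) eq lc($expected);
}
```

The same helper should work for the release archives: for example, md5_matches('bioperl-1.5.0.tar.gz', '172472f0675de9a583432e21c9b1b5fc') checks the tarball against the 32-hex-digit value listed next to it in the 1.5.0 announcement quoted below, which appears to be its MD5 checksum.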

If I can be of further assistance, please don't hesitate to ask.

Nathan


> -----Original Message-----
> From: bioperl-l-bounces@portal.open-bio.org [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of Jason Stajich
> Sent: 25 January 2005 02:37
> To: 'bioperl-l@bioperl.org' List; bioperl-announce-l@bioperl.org
> Subject: [Bioperl-l] bioperl-1.5.0 released
> 
> Bioperl 1.5.0 Developer's release is available for download.
> ===============================================
> 
>   http://bioperl.org/DIST/bioperl-1.5.0.tar.bz2
> 425ac55ecbb4339b7b532ba6d429bb40
>   http://bioperl.org/DIST/bioperl-1.5.0.tar.gz
> 172472f0675de9a583432e21c9b1b5fc
>   http://bioperl.org/DIST/bioperl-1.5.0.zip
> 3febcd2445a7393c65981a6f9f13a9ed
> 
> We'll update the website to reflect this new release.
> 
> The odd-numbered releases are called developer releases and are not
> deposited on CPAN.  Please note that the API in 1.5.0 may change before
> the 1.6.0 release, which will be considered a stable API.  We may do
> another developer release before 1.6.0 goes out.
> 
> Lots of people have contributed to this release, I apologize for not
> naming them all.  I'll try to cover some: thanks to Aaron Mackey for
> getting this release started, Brian Osborne for extensive documentation
> improvements, Nathan Haigh for volunteering to make a PPM of the
> release and Barry Moore and Nathan answering many of the windows
> related questions, Allen Day & Scott Cain & Steffen Grossmann for the
> work on FeatureIO, GFF3, and SeqFeature::Annotated, Chris Mungall for
> the work with Unflattener to merge GenBank annotations into GFF3
> objects.
> 
> Please see the AUTHORS file for a complete list of contributors.
> 
> Jason Stajich on behalf of the Bioperl developers.
> 
> 
> Here is the info from the Changes file.
> 1.5 Developer release
> 
>      o Bio::Align::DNAStatistics and Bio::Align::ProteinStatistics
>        provide Jukes-Cantor and Kimura pairwise distance methods,
>        respectively.
> 
>      o Bio::AlignIO support for "po" format of POA, and "maf";
>        Bio::AlignIO::largemultifasta is a new alternative to
>        Bio::AlignIO::fasta for temporary file-based manipulation of
>        particularly large multiple sequence alignments.
> 
>      o Bio::Assembly::Singlet allows orphan, unassembled sequences to
>        be treated similarly to an assembled contig.
> 
>      o Bio::CodonUsage provides new rare_codon() and probable_codons()
>        methods for identifying particular codons that encode a given
>        amino acid.
> 
>      o Bio::Coordinate::Utils provides new from_align() method to build
>        a Bio::Coordinate pair directly from a
>        Bio::Align::AlignI-conforming object.
> 
>      o Bio::DB::Biblio::eutils is a class for querying NCBI's Eutils.
>        Send a Pubmed, Pubmed Central, Entrez, or other query to NCBI's
>        web service using standard Pubmed query syntax, and retrieve
>        results as XML.
> 
>      o Bio::DB::GFF has various sundry bug fixes.
> 
>      o Bio::FeatureIO is a new SeqIO-style subsystem for
>        writing/reading genomic features to/from files.  I/O classes
>        exist for BED, GTF (aka GFF v2.5), and GFF v3.  Bio::FeatureIO
>        classes only read/write Bio::SeqFeature::Annotated objects.
>        Notably, the GFF v3 class requires features to be typed into the
>        Sequence Ontology.
> 
>      o Bio::Graph namespace contains new modules for manipulation and
>        analysis of protein interaction graphs.
> 
>      o Bio::Graphics has many bug fixes and shiny new glyphs.
> 
>      o Bio::Index::Hmmer and Bio::Index::Qual provide multiple-file
>        indexing for HMMER reports and FASTA qual files, respectively.
> 
>      o Bio::Map::Clone, Bio::Map::Contig, and Bio::Map::FPCMarker are
>        new objects that can be placed within a Bio::Map::MapI-compliant
>        genetic/physical map; Bio::Map::Physical provides a new physical
>        map type; Bio::MapIO::fpc provides finger-printed clone mapping
>        import.
> 
>      o Bio::Matrix::PSM provides new support for position-specific
>        (scoring) matrices (e.g. profiles, or "possums").
> 
>      o Bio::Ontology::Ontology and Bio::Ontology::Term objects can now
>        be instantiated without explicitly using Bio::OntologyIO.  This
>        is possible through changes to Bio::Ontology::OntologyStore to
>        download ontology files from the web as necessary.  Locations of
>        ontology files are hard-coded into
>        Bio::Ontology::DocumentRegistry.
> 
>      o Bio::PopGen includes many new methods and data types for
>        population genetics analyses.
> 
>      o New constructor to Bio::Range, unions().  Given a list of
>        ranges, returns another list of "flattened" ranges --
>        overlapping ranges are merged into a single range with the
>        minimum and maximum coordinates of the entire overlapping group.
> 
>      o Bio::Root::IO now supports -url, in addition to -file and -fh.
>        The new -url argument allows one to specify the network address
>        of a file for input.  -url currently only works for GET
>        requests, and thus is read-only.
> 
>      o Bio::SearchIO::hmmer now returns individual Hit objects for each
>        domain alignment (thus containing only one HSP); previously
>        separate alignments would be merged into one hit if the domain
>        involved in the alignments was the same, but this only worked
>        when the repeated domain occurred without interruption by any
>        other domain, leading to a confusing mixture of Hit and HSP
>        objects.
> 
>      o Bio::Search::Result::ResultI-compliant report objects now
>        implement the "get_statistics" method to access
>        Bio::Search::StatisticsI objects that encapsulate any
>        statistical parameters associated with the search (e.g. Karlin's
>        lambda for BLAST/FASTA).
> 
>      o Bio::Seq::LargeLocatableSeq combines the functionality already
>        found in Bio::Seq::LargeSeq and Bio::LocatableSeq.
> 
>      o Bio::SeqFeature::Annotated is a replacement for
>        Bio::SeqFeature::Generic.  It breaks compliance with the
>        Bio::SeqFeatureI interface because the author was sick of
>        dealing with untyped annotation tags.  All
>        Bio::SeqFeature::Annotated annotations are Bio::AnnotationI
>        compliant, and accessible through Bio::Annotation::Collection.
> 
>      o Bio::SeqFeature::Primer implements a Tm() method for primer
>        melting point predictions.
> 
>      o Bio::SeqIO now supports AGAVE, BSML (via SAX), CHAOS-XML,
>        InterProScan-XML, TIGR-XML, and NCBI TinySeq formats.
> 
>      o Bio::Taxonomy::Node now implements the methods necessary for
>        Bio::Species interoperability.
> 
>      o Bio::Tools::CodonTable has new reverse_translate_all() and
>        make_iupac_string() methods.
> 
>      o Bio::Tools::dpAlign now provides sequence profile alignments.
> 
>      o Bio::Tools::GFF now parses GFF version 2.5 (a.k.a. GTF).
> 
>      o Bio::Tools::Fgenesh, Bio::Tools::tRNAscanSE are new report
>        parsers.
> 
>      o Bio::Tools::SiRNA includes two new rulesets (Saigo and Tuschl)
>        for designing small inhibitory RNA.
> 
>      o Bio::Tree::DistanceFactory provides NJ and UPGMA tree-building
>        methods based on a distance matrix.
> 
>      o Bio::Tree::Statistics provides an assess_bootstrap() method to
>        calculate bootstrap support values on a guide tree topology,
>        based on provided bootstrap tree topologies.
> 
>      o Bio::TreeIO now supports the Pagel (PAG) tree format.
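The first changelog entry mentions Jukes-Cantor (DNA) and Kimura (protein) pairwise distances. As a rough illustration of the textbook formulas, assuming p is the proportion of differing aligned sites, a sketch follows. It is not the Bio::Align::DNAStatistics or Bio::Align::ProteinStatistics code and ignores details such as gap handling; the function names are hypothetical.

```perl
use strict;
use warnings;

# Jukes-Cantor distance for nucleotides: d = -(3/4) ln(1 - (4/3) p).
# Returns undef when the log argument is not positive (p >= 0.75).
sub jukes_cantor {
    my ($p) = @_;
    my $x = 1 - (4 / 3) * $p;
    return $x > 0 ? -0.75 * log($x) : undef;
}

# Kimura's empirical approximation for protein distances:
# d = -ln(1 - p - 0.2 p^2). Returns undef when the argument is not positive.
sub kimura_protein {
    my ($p) = @_;
    my $x = 1 - $p - 0.2 * $p ** 2;
    return $x > 0 ? -log($x) : undef;
}

printf "%.4f\n", jukes_cantor(0.25);     # prints 0.3041
printf "%.4f\n", kimura_protein(0.10);   # prints 0.1076
```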
> 
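The Bio::Range unions() entry describes flattening overlapping ranges. A minimal pure-Perl sketch of that idea, using plain [start, end] array references instead of Bio::Range objects, might look like this. It assumes inclusive coordinates and treats ranges that share an endpoint as overlapping, which may differ from what unions() does at the boundaries; merge_ranges is a hypothetical name.

```perl
use strict;
use warnings;

# Merge overlapping [start, end] ranges into "flattened" ranges, each
# spanning the minimum start and maximum end of an overlapping group.
sub merge_ranges {
    my @sorted = sort { $a->[0] <=> $b->[0] || $a->[1] <=> $b->[1] } @_;
    my @merged;
    for my $r (@sorted) {
        if (@merged && $r->[0] <= $merged[-1][1]) {
            # Overlaps the current group: extend its end if needed.
            $merged[-1][1] = $r->[1] if $r->[1] > $merged[-1][1];
        }
        else {
            push @merged, [@$r];    # copy so the caller's data is untouched
        }
    }
    return @merged;
}

my @flat = merge_ranges([10, 12], [1, 5], [3, 8], [11, 20], [30, 31]);
print join(' ', map { "$_->[0]-$_->[1]" } @flat), "\n";   # prints 1-8 10-20 30-31
```

Sorting first keeps the merge to a single linear pass over the ranges.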
> --
> Jason Stajich
> jason.stajich at duke.edu
> http://www.duke.edu/~jes12/


From hlapp at gnf.org  Fri Jan 28 12:58:58 2005
From: hlapp at gnf.org (Hilmar Lapp)
Date: Fri Jan 28 12:54:59 2005
Subject: [Bioperl-l] Re: RPMs for Bioperl and GMOD
In-Reply-To: <Pine.LNX.4.58.0501272032380.1190@sumo.ctrl.ucla.edu>
References: <Pine.LNX.4.58.0501252214470.3069@sumo.ctrl.ucla.edu>
	<Pine.LNX.4.58.0501272032380.1190@sumo.ctrl.ucla.edu>
Message-ID: <48F109E2-7156-11D9-8A2B-000A95AE92B0@gnf.org>


On Jan 27, 2005, at 8:48 PM, Allen Day wrote:

> I'd appreciate it if someone with Oracle and DBD::Oracle installed 
> could
> give Bioperl-DB a spin and verify that it works.
>

Do you mean your RPM or bioperl-db on Oracle? I'm running the latter 
all the time.

> I'd also like someone with Oracle to help me make a DBD::Oracle rpm.
> Having a DBD::Oracle RPM will allow me to leave the Oracle code in
> Bioperl-DB.

If installing the supposed DBD::Oracle is then a prerequisite for being 
able to install the rest, then you are taking the wrong path. 
DBD::Oracle itself will depend on the Oracle client libraries being 
installed which aren't even available on all platforms, aside from the 
fact that installing those is beyond your control and involves 
downloading about 350MB from OTN.

Frankly, I can't believe that there is no way to specify dependencies 
that are optional. Why would you require all of DBD::mysql, DBD::Pg, 
and DBD::Oracle if all a person wants is mysql?? All of these will 
link to compiled runtime libraries and why should a failure to install 
DBD::Pg be of any concern to someone who wants to use mysql?
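Within a Perl script or module, as opposed to the RPM metadata, optional database drivers are often handled by probing for them at run time instead of requiring all of them up front. The sketch below is a generic illustration of that idiom, not how bioperl-db or the RPM specfiles actually work; the helper name loadable_modules is hypothetical. (If DBI itself is installed, DBI->available_drivers offers a similar listing.)

```perl
use strict;
use warnings;

# Probe for modules at run time; a missing or broken module (e.g. one
# whose client libraries are absent) is skipped instead of being fatal.
sub loadable_modules {
    my @found;
    for my $mod (@_) {
        (my $file = "$mod.pm") =~ s{::}{/}g;    # DBD::Pg -> DBD/Pg.pm
        push @found, $mod if eval { require $file; 1 };
    }
    return @found;
}

my @drivers = loadable_modules(qw(DBD::mysql DBD::Pg DBD::Oracle));
print @drivers ? "Available drivers: @drivers\n"
               : "No database drivers found\n";
```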

BTW DBD::Oracle is on CPAN. I thought that would make it easy to 
construct an RPM? (There are few if any binaries, though, for a reason.
Compiling DBD::Oracle may be a charm on some platforms but involve major
tweaking on others. I've been there multiple times; I know
what I'm talking about.)

	-hilmar

-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From allenday at ucla.edu  Fri Jan 28 14:50:08 2005
From: allenday at ucla.edu (Allen Day)
Date: Fri Jan 28 14:46:06 2005
Subject: [Bioperl-l] Re: RPMs for Bioperl and GMOD
In-Reply-To: <48F109E2-7156-11D9-8A2B-000A95AE92B0@gnf.org>
References: <Pine.LNX.4.58.0501252214470.3069@sumo.ctrl.ucla.edu>
	<Pine.LNX.4.58.0501272032380.1190@sumo.ctrl.ucla.edu>
	<48F109E2-7156-11D9-8A2B-000A95AE92B0@gnf.org>
Message-ID: <Pine.LNX.4.58.0501281138170.18383@sumo.ctrl.ucla.edu>

> Do you mean your RPM or bioperl-db on Oracle? I'm running the latter 
> all the time.

i mean the RPM.  it is the same as bioperl-db cvs head as of last night.

> > I'd also like someone with Oracle to help me make a DBD::Oracle rpm.
> > Having a DBD::Oracle RPM will allow me to leave the Oracle code in
> > Bioperl-DB.
> 
> If installing the supposed DBD::Oracle is then a prerequisite for being 
> able to install the rest, then you are taking the wrong path. 
> DBD::Oracle itself will depend on the Oracle client libraries being 
> installed which aren't even available on all platforms, aside from the 
> fact that installing those is beyond your control and involves 
> downloading about 350MB from OTN.
>
> Frankly, I can't believe that there is no way to specify dependencies
> that are optional. Why would you require all of DBD::mysql, DBD::Pg, and
> DBD::Oracle if all a person wants is mysql?? All of these will link to
> compiled runtime libraries and why should a failure to install DBD::Pg
> be of any concern to someone who wants to use mysql?

the problem is something internal to the rpm installer -- it determines 
perl library dependencies at install-time rather than requiring you to 
explicitly specify perl packages in the rpm metafiles (aka specfile).

so, for instance, if i tried to install perl-Generic-Genome-Browser, i 
might get an error like:

  requires perl(Bio::Root::Root)

which could be removed by one of:

  (1) installing the perl-bioperl package
  (2) installing bioperl from cvs
  (3) installing bioperl from cpan

there may be a way to code into the metafile to ignore missing perl
dependencies detected in the installation process -- i need to look into
this.
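A rough illustration of the mechanism described above: rpmbuild's Perl dependency generator (perl.req) scans the packaged Perl files for use/require statements when the package is built, records each module as a perl(Module::Name) requirement, and rpm then enforces those at install time. The Python sketch below imitates only that scan. The function name perl_requires and the sample package are hypothetical, and the real generator handles many more cases (POD sections, heredocs, version requirements).

```python
import re

# Matches "use Foo::Bar" or "require Foo::Bar" at the start of a line.
# Names must start with a letter or underscore, so "require 5.006;" is ignored.
_DEP_RE = re.compile(r"^\s*(?:use|require)\s+([A-Za-z_][\w:]*)", re.MULTILINE)


def perl_requires(source: str) -> list[str]:
    """Return sorted, de-duplicated perl(Module) requirements found in source."""
    deps = set()
    for name in _DEP_RE.findall(source):
        # Simplification: skip lowercase pragmas such as strict or warnings.
        # Real generators may list these as requirements too.
        if name[0].islower():
            continue
        deps.add(f"perl({name})")
    return sorted(deps)


if __name__ == "__main__":
    # Hypothetical module source, not taken from bioperl-db.
    sample = """
package My::Hypothetical::OracleDriver;
use strict;
use Bio::Root::Root;
use DBD::Oracle qw(:ora_types);
require DBI;
"""
    for dep in perl_requires(sample):
        print(dep)
```

Because every use/require line becomes a hard requirement, an Oracle-specific module anywhere in the package is enough to make perl(DBD::Oracle) mandatory for the whole RPM.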

> BTW DBD::Oracle is on CPAN. I thought that would make it easy to 
> construct an RPM? (There are few if any binaries though - for a reason. 
> Compiling DBD::Oracle may be a charm on some but involve some major 
> tweaking on other platforms. I've been there multiple times, I know 
> what I'm talking about.)

given what i've said above, if i had a DBD::Oracle perl module installed,
it would prevent rpm from throwing errors about missing dependency
"perl(DBD::Oracle)".  however, i can't build DBD::Oracle into an rpm
because the make process links to the oracle headers and .so files.  the 
DBD::Oracle can be made w/o having explicit dependencies on the oracle 
binary install, so it would install on a machine that didn't have oracle 
installed (but wouldn't work).  so as far as a bioperl-db rpm goes, here 
are the options i'm looking into:

  (1) get a binary perl-DBD-Oracle rpm built by someone with Oracle,
      leaving out the binary Oracle file dependency.  distribute 
      bioperl-db from cvs as-is
  (2) patch Oracle classes out of bioperl-db as part of the rpm build
      process.  distribute modified bioperl-db.
  (3) modify rpm "detection of installed perl modules" functionality
      to have rpm explicitly ignore missing DBD::Oracle dependency.

(1) and (2) will definitely work.  i don't yet know the feasibility of
(3).

-allen
From jason.stajich at duke.edu  Fri Jan 28 15:05:06 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Fri Jan 28 15:00:55 2005
Subject: [Bioperl-l] Re: RPMs for Bioperl and GMOD
In-Reply-To: <Pine.LNX.4.58.0501281138170.18383@sumo.ctrl.ucla.edu>
References: <Pine.LNX.4.58.0501252214470.3069@sumo.ctrl.ucla.edu>
	<Pine.LNX.4.58.0501272032380.1190@sumo.ctrl.ucla.edu>
	<48F109E2-7156-11D9-8A2B-000A95AE92B0@gnf.org>
	<Pine.LNX.4.58.0501281138170.18383@sumo.ctrl.ucla.edu>
Message-ID: <4e40de988838c77a0768bb96cb4ea1c5@duke.edu>


--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/
On Jan 28, 2005, at 2:50 PM, Allen Day wrote:

>> Do you mean your RPM or bioperl-db on Oracle? I'm running the latter
>> all the time.
>
> i mean the RPM.  it is the same as bioperl-db cvs head as of last night.
>
>>> I'd also like someone with Oracle to help me make a DBD::Oracle rpm.
>>> Having a DBD::Oracle RPM will allow me to leave the Oracle code in
>>> Bioperl-DB.
>>
>> If installing the supposed DBD::Oracle is then a prerequisite for being
>> able to install the rest, then you are taking the wrong path.
>> DBD::Oracle itself will depend on the Oracle client libraries being
>> installed which aren't even available on all platforms, aside from the
>> fact that installing those is beyond your control and involves
>> downloading about 350MB from OTN.
>>
>> Frankly, I can't believe that there is no way to specify dependencies
>> that are optional. Why would you require all of DBD::mysql, DBD::Pg, and
>> DBD::Oracle if all a person wants is mysql?? All of these will link to
>> compiled runtime libraries and why should a failure to install DBD::Pg
>> be of any concern to someone who wants to use mysql?
>
> the problem is something internal to the rpm installer -- it determines
> perl library dependencies at install-time rather than requiring you to
> explicitly specify perl packages in the rpm metafiles (aka specfile).
>
What are you using to generate the specfiles in the first place?  Are 
you using cpan2rpm?

> so, for instance, if i tried to install perl-Generic-Genome-Browser, i
> might get an error like:
>
>   requires perl(Bio::Root::Root)
>
> which could be removed by one of:
>
>   (1) installing the perl-bioperl package
>   (2) installing bioperl from cvs
>   (3) installing bioperl from cpan
>
> there may be a way to code into the metafile to ignore missing perl
> dependencies detected in the installation process -- i need to look into
> this.
>
>> BTW DBD::Oracle is on CPAN. I thought that would make it easy to
>> construct an RPM? (There are few if any binaries though - for a reason.
>> Compiling DBD::Oracle may be a charm on some but involve some major
>> tweaking on other platforms. I've been there multiple times, I know
>> what I'm talking about.)
>
> given what i've said above, if i had a DBD::Oracle perl module installed,
> it would prevent rpm from throwing errors about missing dependency
> "perl(DBD::Oracle)".  however, i can't build DBD::Oracle into an rpm
> because the make process links to the oracle headers and .so files.  the
> DBD::Oracle can be made w/o having explicit dependencies on the oracle
> binary install, so it would install on a machine that didn't have oracle
> installed (but wouldn't work).  so as far as a bioperl-db rpm goes, here
> are the options i'm looking into:
>
>   (1) get a binary perl-DBD-Oracle rpm built by someone with Oracle,
>       leaving out the binary Oracle file dependency.  distribute
>       bioperl-db from cvs as-is
>   (2) patch Oracle classes out of bioperl-db as part of the rpm build
>       process.  distribute modified bioperl-db.
>   (3) modify rpm "detection of installed perl modules" functionality
>       to have rpm explicitly ignore missing DBD::Oracle dependency.
>
> (1) and (2) will definitely work.  i don't yet know the feasibility of
> (3).
>
> -allen
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>

From lstein at cshl.edu  Fri Jan 28 10:41:47 2005
From: lstein at cshl.edu (Lincoln Stein)
Date: Fri Jan 28 15:01:58 2005
Subject: [Bioperl-l] Re: [GMOD-devel] RPMs for Bioperl and GMOD
In-Reply-To: <Pine.LNX.4.58.0501272032380.1190@sumo.ctrl.ucla.edu>
References: <Pine.LNX.4.58.0501252214470.3069@sumo.ctrl.ucla.edu>
	<Pine.LNX.4.58.0501272032380.1190@sumo.ctrl.ucla.edu>
Message-ID: <200501281041.48221.lstein@cshl.edu>

Hi Allen,

Don't release a gbrowse RPM if it is even slightly broken.

Lincoln

On Thursday 27 January 2005 11:48 pm, Allen Day wrote:
> SUMMARY:
> ========
>
> I've successfully built RPMs for the GBrowse, Bioperl, and
> Bioperl-DB dependency trees.  These are for CVS HEAD, not the most
> recent releases. I've only tested on Fedora Core 2 and RedHat 9. 
> Fedora Core 2 packages are available here:
>
> http://sumo.genetics.ucla.edu/~allenday/flute-fc2/i386/
> http://sumo.genetics.ucla.edu/~allenday/flute-fc2/noarch/
>
> The source RPMs are available as well
>
> http://sumo.genetics.ucla.edu/~allenday/flute-fc2/SRPMS
>
> A few notes on what non-obvious steps needed to be taken to make
> this work:
>
>   * pruned Bioperl-DB to remove Oracle dependencies
>   * pruned Gbrowse to remove AcePerl dependencies
>   * pruned Bioperl to remove AcePerl dependencies
>   * pruned SOAP::Lite to remove MQSeries dependencies
>   * pruned Gbrowse to remove Mac OS X dependencies
>   * pruned a few modules (Net::DNS, Mail::Sender, etc) to remove
>     Win32::* dependencies.
>   * piggybacked on existing RPMs for rrdtool, Template Toolkit,
>     Module-Build, and CPANPLUS.  Thanks to Dag Wieers [1] for these.
>
>
>
> TODO:
> =====
>
> The Bioperl install is fully functional as far as I can tell.
>
> I'd appreciate it if someone with Oracle and DBD::Oracle installed
> could give Bioperl-DB a spin and verify that it works.
>
> I'd also like someone with Oracle to help me make a DBD::Oracle
> rpm. Having a DBD::Oracle RPM will allow me to leave the Oracle
> code in Bioperl-DB.
>
> The Gbrowse install is slightly broken, but this is mainly due to a
> major rewrite that's taking place right now.  I'll make another
> announcement to the GMOD list when the Gbrowse RPM works out of the
> box.
>
> -Allen
>
> [1] http://dag.wieers.com
>
>
> _______________________________________________
> Gmod-devel mailing list
> Gmod-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/gmod-devel

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724

NOTE: Please copy Sandra Michelsen <michelse@cshl.edu> on
all emails regarding scheduling and other time-critical topics.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050128/caa4e330/attachment.bin
From hlapp at gnf.org  Fri Jan 28 16:49:55 2005
From: hlapp at gnf.org (Hilmar Lapp)
Date: Fri Jan 28 16:45:56 2005
Subject: [Bioperl-l] Re: RPMs for Bioperl and GMOD
In-Reply-To: <Pine.LNX.4.58.0501281138170.18383@sumo.ctrl.ucla.edu>
Message-ID: <8C2D5F1C-7176-11D9-9251-000A959EB4C4@gnf.org>

Like this statement or not, but I think installing all kinds of CPAN 
packages onto somebody's machine irrespective of whether somebody is 
ever going to use - or need - them, let alone them working in the first 
place due to compiled code dependencies being absent, is a really *bad* 
idea.

It basically defies the concept of modular packaging to begin with, and 
sounds way too intrusive for my taste.

Unless I misunderstand what Jason is saying then this is not even 
necessary and is in no way an inherent shortcoming that inevitably 
comes with RPMs. So unless I'm missing something here I understand that 
Jason is saying you can have RPMs and still not litter your system with 
DBD::blah or other modules for which you don't even have the client 
libraries installed, and still be able to install those at a later time 
because the respective pieces of code have not been pruned (which I 
think is actually also a bad idea).

	-hilmar

On Friday, January 28, 2005, at 11:50  AM, Allen Day wrote:

>> Do you mean your RPM or bioperl-db on Oracle? I'm running the latter
>> all the time.
>
> i mean the RPM.  it is the same as bioperl-db cvs head as of last night.
>
>>> I'd also like someone with Oracle to help me make a DBD::Oracle rpm.
>>> Having a DBD::Oracle RPM will allow me to leave the Oracle code in
>>> Bioperl-DB.
>>
>> If installing the supposed DBD::Oracle is then a prerequisite for being
>> able to install the rest, then you are taking the wrong path.
>> DBD::Oracle itself will depend on the Oracle client libraries being
>> installed which aren't even available on all platforms, aside from the
>> fact that installing those is beyond your control and involves
>> downloading about 350MB from OTN.
>>
>> Frankly, I can't believe that there is no way to specify dependencies
>> that are optional. Why would you require all of DBD::mysql, DBD::Pg, and
>> DBD::Oracle if all a person wants is mysql?? All of these will link to
>> compiled runtime libraries and why should a failure to install DBD::Pg
>> be of any concern to someone who wants to use mysql?
>
> the problem is something internal to the rpm installer -- it determines
> perl library dependencies at install-time rather than requiring you to
> explicitly specify perl packages in the rpm metafiles (aka specfile).
>
> so, for instance, if i tried to install perl-Generic-Genome-Browser, i
> might get an error like:
>
>   requires perl(Bio::Root::Root)
>
> which could be removed by one of:
>
>   (1) installing the perl-bioperl package
>   (2) installing bioperl from cvs
>   (3) installing bioperl from cpan
>
> there may be a way to code into the metafile to ignore missing perl
> dependencies detected in the installation process -- i need to look into
> this.
>
>> BTW DBD::Oracle is on CPAN. I thought that would make it easy to
>> construct an RPM? (There are few if any binaries though - for a reason.
>> Compiling DBD::Oracle may be a charm on some but involve some major
>> tweaking on other platforms. I've been there multiple times, I know
>> what I'm talking about.)
>
> given what i've said above, if i had a DBD::Oracle perl module installed,
> it would prevent rpm from throwing errors about missing dependency
> "perl(DBD::Oracle)".  however, i can't build DBD::Oracle into an rpm
> because the make process links to the oracle headers and .so files.  the
> DBD::Oracle can be made w/o having explicit dependencies on the oracle
> binary install, so it would install on a machine that didn't have oracle
> installed (but wouldn't work).  so as far as a bioperl-db rpm goes, here
> are the options i'm looking into:
>
>   (1) get a binary perl-DBD-Oracle rpm built by someone with Oracle,
>       leaving out the binary Oracle file dependency.  distribute
>       bioperl-db from cvs as-is
>   (2) patch Oracle classes out of bioperl-db as part of the rpm build
>       process.  distribute modified bioperl-db.
>   (3) modify rpm "detection of installed perl modules" functionality
>       to have rpm explicitly ignore missing DBD::Oracle dependency.
>
> (1) and (2) will definitely work.  i don't yet know the feasibility of
> (3).
>
> -allen
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


From allenday at ucla.edu  Fri Jan 28 19:49:22 2005
From: allenday at ucla.edu (Allen Day)
Date: Fri Jan 28 19:45:19 2005
Subject: [Bioperl-l] Re: RPMs for Bioperl and GMOD
In-Reply-To: <8C2D5F1C-7176-11D9-9251-000A959EB4C4@gnf.org>
References: <8C2D5F1C-7176-11D9-9251-000A959EB4C4@gnf.org>
Message-ID: <Pine.LNX.4.58.0501281633250.22585@sumo.ctrl.ucla.edu>

okay, i've looked into this.  short answer: you cannot specify to omit
automatically determined dependencies without "lying" in the rpm specfile
and stating that a package provides a perl module that it, in fact, does
not.

for example, i can add a statement to the bioperl-db rpm stating that it
provides perl(DBD::Oracle), but not actually add DBD/Oracle.pm to the
package.  there is a thread extensively discussing this aspect of the rpm
build system here:

http://www.redhat.com/archives/rpm-list/2004-February/msg00083.html

if i'm making a package for private use only, i don't mind doing this, but
if this package is to be for public consumption i don't want to lie about
what is and is not provided.  i take the same stance on all the other perl
modules in the bioperl dependency tree, including esoteric modules such as
Net::Jabber and GD::Graph3d.

the only viable option i see here is to patch Oracle dependencies out of
bioperl-db.  that is what i will do until i have working Oracle and
perl-DBD-Oracle packages in-hand.
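A toy model of the "lying" trade-off, assuming rpm's install-time check reduces to "every requirement must appear in some package's provides" (real rpm also compares versions and file dependencies, which this ignores). The package names and dependency sets are hypothetical illustrations, not the contents of the actual bioperl-db RPM.

```python
from dataclasses import dataclass, field


@dataclass
class Package:
    name: str
    requires: set[str] = field(default_factory=set)
    provides: set[str] = field(default_factory=set)


def unresolved(packages: list[Package]) -> dict[str, set[str]]:
    """Map each package name to requirements that no package in the set provides."""
    available: set[str] = set()
    for pkg in packages:
        # Treat each package as implicitly providing its own name.
        available |= pkg.provides | {pkg.name}
    missing = {}
    for pkg in packages:
        gap = pkg.requires - available
        if gap:
            missing[pkg.name] = gap
    return missing


# Hypothetical auto-generated requirements for a bioperl-db package.
honest = Package("perl-bioperl-db",
                 requires={"perl(DBI)", "perl(DBD::Oracle)"},
                 provides={"perl(Bio::DB::Hypothetical)"})
dbi = Package("perl-DBI", provides={"perl(DBI)"})

print(unresolved([honest, dbi]))  # perl(DBD::Oracle) is reported missing

# The "lie": declare perl(DBD::Oracle) as provided without shipping DBD/Oracle.pm.
liar = Package(honest.name, honest.requires,
               honest.provides | {"perl(DBD::Oracle)"})
print(unresolved([liar, dbi]))    # check passes, yet the module is still absent
```

The second check passes only because the metadata now claims something the package does not contain, which is exactly the objection to using this for a public package.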

-allen


On Fri, 28 Jan 2005, Hilmar Lapp wrote:

> Like this statement or not, but I think installing all kinds of CPAN 
> packages onto somebody's machine irrespective of whether somebody is 
> ever going to use - or need - them, let alone them working in the first 
> place due to compiled code dependencies being absent, is a really *bad* 
> idea.
> 
> It basically defies the concept of modular packaging to begin with, and 
> sounds way too intrusive for my taste.
> 
> Unless I misunderstand what Jason is saying then this is not even 
> necessary and is in no way an inherent shortcoming that inevitably 
> comes with RPMs. So unless I'm missing something here I understand that 
> Jason is saying you can have RPMs and still not litter your system with 
> DBD::blah or other modules for which you don't even have the client 
> libraries installed, and still be able to install those at a later time 
> because the respective pieces of code have not been pruned (which I 
> think is actually also a bad idea).
> 
> 	-hilmar
> 
> On Friday, January 28, 2005, at 11:50  AM, Allen Day wrote:
> 
> >> Do you mean your RPM or bioperl-db on Oracle? I'm running the latter
> >> all the time.
> >
> > i mean the RPM.  it is the same as bioperl-db cvs head as of last night.
> >
> >>> I'd also like someone with Oracle to help me make a DBD::Oracle rpm.
> >>> Having a DBD::Oracle RPM will allow me to leave the Oracle code in
> >>> Bioperl-DB.
> >>
> >> If installing the supposed DBD::Oracle is then a prerequisite for being
> >> able to install the rest, then you are taking the wrong path.
> >> DBD::Oracle itself will depend on the Oracle client libraries being
> >> installed which aren't even available on all platforms, aside from the
> >> fact that installing those is beyond your control and involves
> >> downloading about 350MB from OTN.
> >>
> >> Frankly, I can't believe that there is no way to specify dependencies
> >> that are optional. Why would you require all of DBD::mysql, DBD::Pg, and
> >> DBD::Oracle if all a person wants is mysql?? All of these will link to
> >> compiled runtime libraries and why should a failure to install DBD::Pg
> >> be of any concern to someone who wants to use mysql?
> >
> > the problem is something internal to the rpm installer -- it determines
> > perl library dependencies at install-time rather than requiring you to
> > explicitly specify perl packages in the rpm metafiles (aka specfile).
> >
> > so, for instance, if i tried to install perl-Generic-Genome-Browser, i
> > might get an error like:
> >
> >   requires perl(Bio::Root::Root)
> >
> > which could be removed by one of:
> >
> >   (1) installing the perl-bioperl package
> >   (2) installing bioperl from cvs
> >   (3) installing bioperl from cpan
> >
> > there may be a way to code into the metafile to ignore missing perl
> > dependencies detected in the installation process -- i need to look into
> > this.
> >
> >> BTW DBD::Oracle is on CPAN. I thought that would make it easy to
> >> construct an RPM? (There are few if any binaries though - for a reason.
> >> Compiling DBD::Oracle may be a charm on some but involve some major
> >> tweaking on other platforms. I've been there multiple times, I know
> >> what I'm talking about.)
> >
> > given what i've said above, if i had a DBD::Oracle perl module installed,
> > it would prevent rpm from throwing errors about missing dependency
> > "perl(DBD::Oracle)".  however, i can't build DBD::Oracle into an rpm
> > because the make process links to the oracle headers and .so files.  the
> > DBD::Oracle can be made w/o having explicit dependencies on the oracle
> > binary install, so it would install on a machine that didn't have oracle
> > installed (but wouldn't work).  so as far as a bioperl-db rpm goes, here
> > are the options i'm looking into:
> >
> >   (1) get a binary perl-DBD-Oracle rpm built by someone with Oracle,
> >       leaving out the binary Oracle file dependency.  distribute
> >       bioperl-db from cvs as-is
> >   (2) patch Oracle classes out of bioperl-db as part of the rpm build
> >       process.  distribute modified bioperl-db.
> >   (3) modify rpm "detection of installed perl modules" functionality
> >       to have rpm explicitly ignore missing DBD::Oracle dependency.
> >
> > (1) and (2) will definitely work.  i don't yet know the feasibility of
> > (3).
> >
> > -allen
> >
> 
From hlapp at gnf.org  Fri Jan 28 20:00:30 2005
From: hlapp at gnf.org (Hilmar Lapp)
Date: Fri Jan 28 19:56:24 2005
Subject: [Bioperl-l] Re: RPMs for Bioperl and GMOD
In-Reply-To: <Pine.LNX.4.58.0501281633250.22585@sumo.ctrl.ucla.edu>
References: <8C2D5F1C-7176-11D9-9251-000A959EB4C4@gnf.org>
	<Pine.LNX.4.58.0501281633250.22585@sumo.ctrl.ucla.edu>
Message-ID: <2BB48BC3-7191-11D9-8A2B-000A95AE92B0@gnf.org>

Ah - I think I misunderstood Jason - he probably meant when installing 
the RPM you ignore certain dependencies? So why don't you allow people 
to ignore dependencies?

I'll stick to my guns here that I don't think this is a good approach, 
and not just due to DBD::Oracle. Why do you want somebody to install 
DBD::Pg if she doesn't have or intend to use PostgreSQL? What are you 
going to tell a sysadmin who wants a clean system? Why install all 
kinds of esoteric packages into somebody's perl installation without 
even asking, and even without some of them working?? Why does CPAN ask 
before it gets and installs a dependency?

My opinion anyways, and I'll shut up with this.

	-hilmar

On Jan 28, 2005, at 4:49 PM, Allen Day wrote:

> okay, i've looked into this.  short answer: you cannot specify to omit
> automatically determined dependencies without "lying" in the rpm specfile
> and stating that a package provides a perl module that it, in fact, does
> not.
>
> for example, i can add a statement to the bioperl-db rpm stating that it
> provides perl(DBD::Oracle), but not actually add DBD/Oracle.pm to the
> package.  there is a thread extensively discussing this aspect of the rpm
> build system here:
>
> http://www.redhat.com/archives/rpm-list/2004-February/msg00083.html
>
> if i'm making a package for private use only, i don't mind doing this, but
> if this package is to be for public consumption i don't want to lie about
> what is and is not provided.  i take the same stance on all the other perl
> modules in the bioperl dependency tree, including esoteric modules such as
> Net::Jabber and GD::Graph3d.
>
> the only viable option i see here is to patch Oracle dependencies out of
> bioperl-db.  that is what i will do until i have working Oracle and
> perl-DBD-Oracle packages in-hand.
>
> -allen
>
>
> On Fri, 28 Jan 2005, Hilmar Lapp wrote:
>
>> Like this statement or not, but I think installing all kinds of CPAN
>> packages onto somebody's machine irrespective of whether somebody is
>> ever going to use - or need - them, let alone them working in the first
>> place due to compiled code dependencies being absent, is a really *bad*
>> idea.
>>
>> It basically defies the concept of modular packaging to begin with, and
>> sounds way too intrusive for my taste.
>>
>> Unless I misunderstand what Jason is saying then this is not even
>> necessary and is in no way an inherent shortcoming that inevitably
>> comes with RPMs. So unless I'm missing something here I understand that
>> Jason is saying you can have RPMs and still not litter your system with
>> DBD::blah or other modules for which you don't even have the client
>> libraries installed, and still be able to install those at a later time
>> because the respective pieces of code have not been pruned (which I
>> think is actually also a bad idea).
>>
>> 	-hilmar
>>
>> On Friday, January 28, 2005, at 11:50  AM, Allen Day wrote:
>>
>>>> Do you mean your RPM or bioperl-db on Oracle? I'm running the latter
>>>> all the time.
>>>
>>> i mean the RPM.  it is the same as bioperl-db cvs head as of last night.
>>>
>>>>> I'd also like someone with Oracle to help me make a DBD::Oracle rpm.
>>>>> Having a DBD::Oracle RPM will allow me to leave the Oracle code in
>>>>> Bioperl-DB.
>>>>
>>>> If installing the supposed DBD::Oracle is then a prerequisite for being
>>>> able to install the rest, then you are taking the wrong path.
>>>> DBD::Oracle itself will depend on the Oracle client libraries being
>>>> installed which aren't even available on all platforms, aside from the
>>>> fact that installing those is beyond your control and involves
>>>> downloading about 350MB from OTN.
>>>>
>>>> Frankly, I can't believe that there is no way to specify dependencies
>>>> that are optional. Why would you require all of DBD::mysql, DBD::Pg, and
>>>> DBD::Oracle if all a person wants is mysql?? All of these will link to
>>>> compiled runtime libraries and why should a failure to install DBD::Pg
>>>> be of any concern to someone who wants to use mysql?
>>>
>>> the problem is something internal to the rpm installer -- it determines
>>> perl library dependencies at install-time rather than requiring you to
>>> explicitly specify perl packages in the rpm metafiles (aka specfile).
>>>
>>> so, for instance, if i tried to install perl-Generic-Genome-Browser, i
>>> might get an error like:
>>>
>>>   requires perl(Bio::Root::Root)
>>>
>>> which could be removed by one of:
>>>
>>>   (1) installing the perl-bioperl package
>>>   (2) installing bioperl from cvs
>>>   (3) installing bioperl from cpan
>>>
>>> there may be a way to code into the metafile to ignore missing perl
>>> dependencies detected in the installation process -- i need to look into
>>> this.
>>>
>>>> BTW DBD::Oracle is on CPAN. I thought that would make it easy to
>>>> construct an RPM? (There are few if any binaries though - for a reason.
>>>> Compiling DBD::Oracle may be a charm on some but involve some major
>>>> tweaking on other platforms. I've been there multiple times, I know
>>>> what I'm talking about.)
>>>
>>> given what i've said above, if i had a DBD::Oracle perl module installed,
>>> it would prevent rpm from throwing errors about missing dependency
>>> "perl(DBD::Oracle)".  however, i can't build DBD::Oracle into an rpm
>>> because the make process links to the oracle headers and .so files.  the
>>> DBD::Oracle can be made w/o having explicit dependencies on the oracle
>>> binary install, so it would install on a machine that didn't have oracle
>>> installed (but wouldn't work).  so as far as a bioperl-db rpm goes, here
>>> are the options i'm looking into:
>>>
>>>   (1) get a binary perl-DBD-Oracle rpm built by someone with Oracle,
>>>       leaving out the binary Oracle file dependency.  distribute
>>>       bioperl-db from cvs as-is
>>>   (2) patch Oracle classes out of bioperl-db as part of the rpm build
>>>       process.  distribute modified bioperl-db.
>>>   (3) modify rpm "detection of installed perl modules" functionality
>>>       to have rpm explicitly ignore missing DBD::Oracle dependency.
>>>
>>> (1) and (2) will definitely work.  i don't yet know the feasibility of
>>> (3).
>>>
>>> -allen
>>>
>>
>>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From allenday at ucla.edu  Fri Jan 28 20:01:39 2005
From: allenday at ucla.edu (Allen Day)
Date: Fri Jan 28 19:57:27 2005
Subject: [Bioperl-l] Re: RPMs for Bioperl and GMOD
In-Reply-To: <4e40de988838c77a0768bb96cb4ea1c5@duke.edu>
References: <Pine.LNX.4.58.0501252214470.3069@sumo.ctrl.ucla.edu>
	<Pine.LNX.4.58.0501272032380.1190@sumo.ctrl.ucla.edu>
	<48F109E2-7156-11D9-8A2B-000A95AE92B0@gnf.org>
	<Pine.LNX.4.58.0501281138170.18383@sumo.ctrl.ucla.edu>
	<4e40de988838c77a0768bb96cb4ea1c5@duke.edu>
Message-ID: <Pine.LNX.4.58.0501281700590.22585@sumo.ctrl.ucla.edu>

no, i'm using RPM::Specfile's cpanflute2.  it's similar.  i autogenerate, 
then add patches/tweaks as necessary.

-allen


On Fri, 28 Jan 2005, Jason Stajich wrote:

> 
> --
> Jason Stajich
> jason.stajich at duke.edu
> http://www.duke.edu/~jes12/
> On Jan 28, 2005, at 2:50 PM, Allen Day wrote:
> 
> >> Do you mean your RPM or bioperl-db on Oracle? I'm running the latter
> >> all the time.
> >
> > i mean the RPM.  it is the same as bioperl-db cvs head as of last night.
> >
> >>> I'd also like someone with Oracle to help me make a DBD::Oracle rpm.
> >>> Having a DBD::Oracle RPM will allow me to leave the Oracle code in
> >>> Bioperl-DB.
> >>
> >> If installing the supposed DBD::Oracle is then a prerequisite for being
> >> able to install the rest, then you are taking the wrong path.
> >> DBD::Oracle itself will depend on the Oracle client libraries being
> >> installed which aren't even available on all platforms, aside from the
> >> fact that installing those is beyond your control and involves
> >> downloading about 350MB from OTN.
> >>
> >> Frankly, I can't believe that there is no way to specify dependencies
> >> that are optional. Why would you require all of DBD::mysql, DBD::Pg, and
> >> DBD::Oracle if all a person wants is mysql?? All of these will link to
> >> compiled runtime libraries and why should a failure to install DBD::Pg
> >> be of any concern to someone who wants to use mysql?
> >
> > the problem is something internal to the rpm installer -- it determines
> > perl library dependencies at install-time rather than requiring you to
> > explicitly specify perl packages in the rpm metafiles (aka specfile).
> >
> What are you using to generate the specfiles in the first place?  Are 
> you using cpan2rpm?
> 
> > so, for instance, if i tried to install perl-Generic-Genome-Browser, i
> > might get an error like:
> >
> >   requires perl(Bio::Root::Root)
> >
> > which could be removed by one of:
> >
> >   (1) installing the perl-bioperl package
> >   (2) installing bioperl from cvs
> >   (3) installing bioperl from cpan
> >
> > there may be a way to code into the metafile to ignore missing perl
> > dependencies detected in the installation process -- i need to look into
> > this.
> >
> >> BTW DBD::Oracle is on CPAN. I thought that would make it easy to
> >> construct an RPM? (There are few if any binaries though - for a reason.
> >> Compiling DBD::Oracle may be a charm on some but involve some major
> >> tweaking on other platforms. I've been there multiple times, I know
> >> what I'm talking about.)
> >
> > given what i've said above, if i had a DBD::Oracle perl module installed,
> > it would prevent rpm from throwing errors about missing dependency
> > "perl(DBD::Oracle)".  however, i can't build DBD::Oracle into an rpm
> > because the make process links to the oracle headers and .so files.  the
> > DBD::Oracle can be made w/o having explicit dependencies on the oracle
> > binary install, so it would install on a machine that didn't have oracle
> > installed (but wouldn't work).  so as far as a bioperl-db rpm goes, here
> > are the options i'm looking into:
> >
> >   (1) get a binary perl-DBD-Oracle rpm built by someone with Oracle,
> >       leaving out the binary Oracle file dependency.  distribute
> >       bioperl-db from cvs as-is
> >   (2) patch Oracle classes out of bioperl-db as part of the rpm build
> >       process.  distribute modified bioperl-db.
> >   (3) modify rpm "detection of installed perl modules" functionality
> >       to have rpm explicitly ignore missing DBD::Oracle dependency.
> >
> > (1) and (2) will definitely work.  i don't yet know the feasibility of
> > (3).
> >
> > -allen
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l@portal.open-bio.org
> > http://portal.open-bio.org/mailman/listinfo/bioperl-l
> >
> 
From hlapp at gnf.org  Fri Jan 28 20:03:14 2005
From: hlapp at gnf.org (Hilmar Lapp)
Date: Fri Jan 28 19:59:07 2005
Subject: [Bioperl-l] Re: RPMs for Bioperl and GMOD
In-Reply-To: <Pine.LNX.4.58.0501281633250.22585@sumo.ctrl.ucla.edu>
References: <8C2D5F1C-7176-11D9-9251-000A959EB4C4@gnf.org>
	<Pine.LNX.4.58.0501281633250.22585@sumo.ctrl.ucla.edu>
Message-ID: <8DA2A4A6-7191-11D9-8A2B-000A95AE92B0@gnf.org>

BTW,

%define _use_internal_dependency_generator 0

is not an option?
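As a sketch of the filtering route this question points toward: with RPM's internal dependency generator switched off, a spec file can (assuming the classic RPM 4.x `%__find_requires` macro) name a wrapper that pipes the stock `find-requires` output through a filter. The Perl filter below drops selected `perl(Module)` requirements read on STDIN. The exclusion list and the helper name `filter_requires` are illustrative assumptions, not part of RPM or Bioperl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical exclusion list: modules whose auto-detected requirement
# should not end up in the package's Requires.
my @EXCLUDE = ('DBD::Oracle');

# Return the requirement lines that do not name an excluded module.
# Lines look like "perl(Module::Name)" or "perl(Module::Name) >= 1.2";
# anything else (e.g. shared-library requirements) is passed through.
sub filter_requires {
    my ($lines, $exclude) = @_;
    my %skip = map { $_ => 1 } @$exclude;
    return grep {
        my ($module) = /^\s*perl\(([^)]+)\)/;
        !(defined $module && $skip{$module});
    } @$lines;
}

# Filter mode: read the stock generator's output on STDIN and echo the
# surviving lines. Skipped when STDIN is a terminal or the file is loaded
# from other code.
if (!caller && !-t STDIN) {
    print for filter_requires([<STDIN>], \@EXCLUDE);
}
```

In practice a small wrapper along the lines of `/usr/lib/rpm/find-requires "$@" | perl filter-requires.pl` would be what `%__find_requires` points at; the location of `find-requires` differs between distributions and RPM versions, so check it on the build host.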

	-hilmar

On Jan 28, 2005, at 4:49 PM, Allen Day wrote:

> okay, i've looked into this.  short answer: you cannot specify to omit
> automatically determined dependencies without "lying" in the rpm
> specfile and stating that a package provides a perl module that it, in
> fact, does not.
>
> for example, i can add a statement to the bioperl-db rpm stating that
> it provides perl(DBD::Oracle), but not actually add DBD/Oracle.pm to
> the package.  there is a thread extensively discussing this aspect of
> the rpm build system here:
>
> http://www.redhat.com/archives/rpm-list/2004-February/msg00083.html
>
> if i'm making a package for private use only, i don't mind doing this,
> but if this package is to be for public consumption i don't want to lie
> about what is and is not provided.  i take the same stance on all the
> other perl modules in the bioperl dependency tree, including esoteric
> modules such as Net::Jabber and GD::Graph3d.
>
> the only viable option i see here is to patch Oracle dependencies out
> of bioperl-db.  that is what i will do until i have working Oracle and
> perl-DBD-Oracle packages in-hand.
>
> -allen
>
>
> On Fri, 28 Jan 2005, Hilmar Lapp wrote:
>
>> Like this statement or not, but I think installing all kinds of CPAN
>> packages onto somebody's machine irrespective of whether somebody is
>> ever going to use - or need - them, let alone them working in the
>> first place due to compiled code dependencies being absent, is a
>> really *bad* idea.
>>
>> It basically defies the concept of modular packaging to begin with,
>> and sounds way too intrusive for my taste.
>>
>> Unless I misunderstand what Jason is saying then this is not even
>> necessary and is in no way an inherent shortcoming that inevitably
>> comes with RPMs. So unless I'm missing something here I understand
>> that Jason is saying you can have RPMs and still not litter your
>> system with DBD::blah or other modules for which you don't even have
>> the client libraries installed, and still be able to install those at
>> a later time because the respective pieces of code have not been
>> pruned (which I think is actually also a bad idea).
>>
>> 	-hilmar
>>
>> On Friday, January 28, 2005, at 11:50  AM, Allen Day wrote:
>>
>>>> Do you mean your RPM or bioperl-db on Oracle? I'm running the latter
>>>> all the time.
>>>
>>> i mean the RPM.  it is the same as bioperl-db cvs head as of last
>>> night.
>>>
>>>>> I'd also like someone with Oracle to help me make a DBD::Oracle
>>>>> rpm. Having a DBD::Oracle RPM will allow me to leave the Oracle
>>>>> code in Bioperl-DB.
>>>>
>>>> If installing the supposed DBD::Oracle is then a prerequisite for
>>>> being able to install the rest, then you are taking the wrong path.
>>>> DBD::Oracle itself will depend on the Oracle client libraries being
>>>> installed which aren't even available on all platforms, aside from
>>>> the fact that installing those is beyond your control and involves
>>>> downloading about 350MB from OTN.
>>>>
>>>> Frankly, I can't believe that there is no way to specify
>>>> dependencies that are optional. Why would you require all of
>>>> DBD::mysql, DBD::Pg, and DBD::Oracle if all a person wants is
>>>> mysql?? All of these will link to compiled runtime libraries and why
>>>> should a failure to install DBD::Pg be of any concern to someone who
>>>> wants to use mysql?
>>>
>>> the problem is something internal to the rpm installer -- it
>>> determines perl library dependencies at install-time rather than
>>> requiring you to explicitly specify perl packages in the rpm
>>> metafiles (aka specfile).
>>>
>>> so, for instance, if i tried to install perl-Generic-Genome-Browser,
>>> i might get an error like:
>>>
>>>   requires perl(Bio::Root::Root)
>>>
>>> which could be removed by one of:
>>>
>>>   (1) installing the perl-bioperl package
>>>   (2) installing bioperl from cvs
>>>   (3) installing bioperl from cpan
>>>
>>> there may be a way to code into the metafile to ignore missing perl
>>> dependencies detected in the installation process -- i need to look
>>> into this.
>>>
>>>> BTW DBD::Oracle is on CPAN. I thought that would make it easy to
>>>> construct an RPM? (There's few if any binaries though - for a
>>>> reason. Compiling DBD::Oracle may be a charm on some but involve some
>>>> major tweaking on other platforms. I've been there multiple times, I
>>>> know what I'm talking about.)
>>>
>>> given what i've said above, if i had a DBD::Oracle perl module
>>> installed, it would prevent rpm from throwing errors about missing
>>> dependency "perl(DBD::Oracle)".  however, i can't build DBD::Oracle
>>> into an rpm because the make process links to the oracle headers and
>>> .so files.  the DBD::Oracle can be made w/o having explicit
>>> dependencies on the oracle binary install, so it would install on a
>>> machine that didn't have oracle installed (but wouldn't work).  so as
>>> far as a bioperl-db rpm goes, here are the options i'm looking into:
>>>
>>>   (1) get a binary perl-DBD-Oracle rpm built by someone with Oracle,
>>>       leaving out the binary Oracle file dependency.  distribute
>>>       bioperl-db from cvs as-is
>>>   (2) patch Oracle classes out of bioperl-db as part of the rpm build
>>>       process.  distribute modified bioperl-db.
>>>   (3) modify rpm "detection of installed perl modules" functionality
>>>       to have rpm explicitly ignore missing DBD::Oracle dependency.
>>>
>>> (1) and (2) will definitely work.  i don't yet know the feasibility
>>> of (3).
>>>
>>> -allen
>>>
>>
>>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From allenday at ucla.edu  Fri Jan 28 20:09:11 2005
From: allenday at ucla.edu (Allen Day)
Date: Fri Jan 28 20:05:25 2005
Subject: [Bioperl-l] Re: RPMs for Bioperl and GMOD
In-Reply-To: <8DA2A4A6-7191-11D9-8A2B-000A95AE92B0@gnf.org>
References: <8C2D5F1C-7176-11D9-9251-000A959EB4C4@gnf.org>
	<Pine.LNX.4.58.0501281633250.22585@sumo.ctrl.ucla.edu>
	<8DA2A4A6-7191-11D9-8A2B-000A95AE92B0@gnf.org>
Message-ID: <Pine.LNX.4.58.0501281704260.22585@sumo.ctrl.ucla.edu>

it's no different than lying about what the package provides.  you're
still going to have components in the package that will not function,
because not all package dependencies have been installed.

patching out the dependent code is really the most honest and least
problematic solution here.

-allen


On Fri, 28 Jan 2005, Hilmar Lapp wrote:

> BTW,
> 
> %define _use_internal_dependency_generator 0
> 
> is not an option?
> 
> 	-hilmar
> 
> On Jan 28, 2005, at 4:49 PM, Allen Day wrote:
> 
> > okay, i've looked into this.  short answer: you cannot specify to omit
> > automatically determined dependencies without "lying" in the rpm 
> > specfile
> > and stating that a package provides a perl module that it, in fact, 
> > does
> > not.
> >
> > for example, i can add a statement to the bioperl-db rpm stating that 
> > it
> > provides perl(DBD::Oracle), but not actually add DBD/Oracle.pm to the
> > package.  there is a thread extensively discussing this aspect of the 
> > rpm
> > build system here:
> >
> > http://www.redhat.com/archives/rpm-list/2004-February/msg00083.html
> >
> > if i'm making a package for private use only, i don't mind doing this, 
> > but
> > if this package is to be for public consumption i don't want to lie 
> > about
> > what is and is not provided.  i take the same stance on all the other 
> > perl
> > modules in the bioperl dependency tree, including esoteric modules 
> > such as
> > Net::Jabber and GD::Graph3d.
> >
> > the only viable option i see here is to patch Oracle dependencies out 
> > of
> > bioperl-db.  that is what i will do until i have working Oracle and
> > perl-DBD-Oracle packages in-hand.
> >
> > -allen
> >
> >
> > On Fri, 28 Jan 2005, Hilmar Lapp wrote:
> >
> >> Like this statement or not, but I think installing all kinds of CPAN
> >> packages onto somebody's machine irrespective of whether somebody is
> >> ever going to use - or need - them, let alone them working in the 
> >> first
> >> place due to compiled code dependencies being absent, is a really 
> >> *bad*
> >> idea
> >>
> >> It basically defies the concept of modular packaging to begin with, 
> >> and
> >> sounds way too intrusive for my taste.
> >>
> >> Unless I misunderstand what Jason is saying then this is not even
> >> necessary and is in no way an inherent shortcoming that inevitably
> >> comes with RPMs. So unless I'm missing something here I understand 
> >> that
> >> Jason is saying you can have RPMs and still not litter your system 
> >> with
> >> DBD::blah or other modules for which you don't even have the client
> >> libraries installed, and still be able to install those at a later 
> >> time
> >> because the respective pieces of code have not been pruned (which I
> >> think is actually also a bad idea).
> >>
> >> 	-hilmar
> >>
> >> On Friday, January 28, 2005, at 11:50  AM, Allen Day wrote:
> >>
> >>>> Do you mean your RPM or bioperl-db on Oracle? I'm running the latter
> >>>> all the time.
> >>>
> >>> i mean the RPM.  it is the same as bioperl-db cvs head as of last
> >>> night.
> >>>
> >>>>> I'd also like someone with Oracle to help me make a DBD::Oracle 
> >>>>> rpm.
> >>>>> Having a DBD::Oracle RPM will allow me to leave the Oracle code in
> >>>>> Bioperl-DB.
> >>>>
> >>>> If installing the supposed DBD::Oracle is then a prerequisite for
> >>>> being
> >>>> able to install the rest, then you are taking the wrong path.
> >>>> DBD::Oracle itself will depend on the Oracle client libraries being
> >>>> installed which aren't even available on all platforms, aside from 
> >>>> the
> >>>> fact that installing those is beyond your control and involves
> >>>> downloading about 350MB from OTN.
> >>>>
> >>>> Frankly, I can't believe that there is no way to specify 
> >>>> dependencies
> >>>> that are optional. Why would you require all of DBD::mysql, DBD::Pg,
> >>>> and
> >>>> DBD::Oracle if all a persons wants is mysql?? All of these will link
> >>>> to
> >>>> compiled runtime libraries and why should a failure to install 
> >>>> DBD::Pg
> >>>> be of any concern to someone who wants to use mysql?
> >>>
> >>> the problem is something internal to the rpm installer -- it 
> >>> determines
> >>> perl library dependencies at install-time rather than requiring you 
> >>> to
> >>> explicitly specify perl packages in the rpm metafiles (aka specfile).
> >>>
> >>> so, for instance, if i i tried to install 
> >>> perl-Generic-Genome-Browser,
> >>> i
> >>> might get an error like:
> >>>
> >>>   requires perl(Bio::Root::Root)
> >>>
> >>> which could be removed by one of:
> >>>
> >>>   (1) installing the perl-bioperl package
> >>>   (2) installing bioperl from cvs
> >>>   (3) installing bioperl from cpan
> >>>
> >>> there may be a way to code into the metafile to ignore missing perl
> >>> dependencies detected in the installation process -- i need to look
> >>> into
> >>> this.
> >>>
> >>>> BTW DBD::Oracle is on CPAN. I thought that would make it easy to
> >>>> construct an RPM? (There's few if any binaries though - for a 
> >>>> reason.
> >>>> Compiling DBD::Oracle may be a charm on some but involve some major
> >>>> tweaking on other platforms. I've been there multiple times, I know
> >>>> what I'm talking about.)
> >>>
> >>> given what i've said above, if i had a DBD::Oracle perl module
> >>> installed,
> >>> it would prevent rpm from throwing errors about missing dependency
> >>> "perl(DBD::Oracle)".  however, i can't build DBD::Oracle into an rpm
> >>> because the make process links to the oracle headers and .so files.
> >>> the
> >>> DBD::Oracle can be made w/o having explicit dependencies on the 
> >>> oracle
> >>> binary install, so it would install on a machine that didn't have
> >>> oracle
> >>> installed (but wouldn't work).  so as far as a bioperl-db rpm goes,
> >>> here
> >>> are the options i'm looking into:
> >>>
> >>>   (1) get a binary perl-DBD-Oracle rpm built by someone with Oracle,
> >>>       leaving out the binary Oracle file dependency.  distribute
> >>>       bioperl-db from cvs as-is
> >>>   (2) patch Oracle classes out of bioperl-db as part of the rpm build
> >>>       process.  distribute modified bioperl-db.
> >>>   (3) modify rpm "detection of installed perl modules" functionality
> >>>       to have rpm explicitly ignore missing DBD::Oracle dependency.
> >>>
> >>> (1) and (2) will definitely work.  i don't yet know the feasibility 
> >>> of
> >>> (3).
> >>>
> >>> -allen
> >>>
> >>
> >>
> 
From allenday at ucla.edu  Fri Jan 28 20:15:31 2005
From: allenday at ucla.edu (Allen Day)
Date: Fri Jan 28 20:11:29 2005
Subject: [Bioperl-l] Re: RPMs for Bioperl and GMOD
In-Reply-To: <2BB48BC3-7191-11D9-8A2B-000A95AE92B0@gnf.org>
References: <8C2D5F1C-7176-11D9-9251-000A959EB4C4@gnf.org>
	<Pine.LNX.4.58.0501281633250.22585@sumo.ctrl.ucla.edu>
	<2BB48BC3-7191-11D9-8A2B-000A95AE92B0@gnf.org>
Message-ID: <Pine.LNX.4.58.0501281709200.22585@sumo.ctrl.ucla.edu>

On Fri, 28 Jan 2005, Hilmar Lapp wrote:

> Ah - I think I misunderstood Jason - he probably meant when installing
> the RPM you ignore certain dependencies? So why don't you allow people
> to ignore dependencies?

you can force rpms to ignore dependencies and install anyway.  if you're
trying to make a purely rpm-maintained system though, this leads to
missing package dependency problems further down the road... in my
experience force installing packages generally causes bigger problems than
it's worth.

> I'll stick to my guns here that I don't think this is a good approach,
> and not just due to DBD::Oracle. Why do you want somebody to install
> DBD::Pg if she doesn't have or intend to use PostgreSQL? What are you
> going to tell a sysadmin who wants a clean system? Why install all kinds
> of esoteric packages into somebody's perl installation without even
> asking, and even without some of them working?? Why does CPAN ask before
> it gets and installs a dependency?

if a user is fortunate enough to be in the position to have a system
administrator to do these installation and configuration tasks for them,
by all means they should do a source-based install or resolve the
dependencies in some other way.

but for the graduate student that just wants to get gbrowse up and running
to visualize some data he's working with, saving several hours interacting
with CPAN, make, gcc, autoconf, tweaking configuration files, etc in
exchange for installation of an extra module or two might sound like a
good deal.

-Allen


> My opinion anyways, and I'll shut up with this.
> 
> 	-hilmar
> 
From perlguy at hotmail.com  Sat Jan 29 16:34:04 2005
From: perlguy at hotmail.com (Philip Parker)
Date: Sat Jan 29 16:30:53 2005
Subject: [Bioperl-l] Request for info on volunteering...
Message-ID: <BAY13-F3815008721E7127FCDE1CCAF7A0@phx.gbl>

I'm curious about volunteering for the BioPerl project. I have 4 years of 
professional Perl programming experience and have an interest in 
bioinformatics.

Philip Parker -  perlguy@hotmail.com


From hlapp at gmx.net  Sat Jan 29 18:19:25 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat Jan 29 18:15:19 2005
Subject: [Bioperl-l] struggling with Bio::FeatureIO and
	Bio::SeqFeature::Annotated
In-Reply-To: <Pine.LNX.4.58.0501250141240.22458@sumo.ctrl.ucla.edu>
Message-ID: <374835EA-724C-11D9-A311-000A959EB4C4@gmx.net>


On Tuesday, January 25, 2005, at 01:45  AM, Allen Day wrote:

>>
>> Also, do you think it will be possible to convert the 
>> Bio::SeqFeature::Annotated features into persistent ones so that 
>> these can be stored in BioSQL ? I'll try to test that out today.
>
> no idea.  my guess is not without substantial effort.
>

There shouldn't be a problem serializing them, as long as
SeqFeature::Annotated implements SeqFeatureI.

The problem is rather that you will get them out in a slightly 
different fashion.

Provided my understanding of SeqFeature::Annotated is correct (which it
may not be!), all tags are treated (stored) like any other tag, unlike
SeqFeature::Generic, which has methods primary_tag and source_tag that
store their values separately.

So, upon retrieval of such a feature you would probably have the 
primary_tag and source_tag values in the tag/value system as well. This 
may or may not be an issue.

Furthermore, SeqFeature::Annotated does away with having both tag/values
and an annotation bundle and stores everything in the latter. Bioperl-db
uses SeqFeature::AnnotationAdaptor to access a feature's tags and
annotations as if there were only an annotation bundle, which is what
SeqFeature::Annotated does too, but AnnotationAdaptor assumes that the
underlying SeqFeatureI implementation stores them separately. The
result is that when you plug a SeqFeature::Annotated into
SeqFeature::AnnotationAdaptor, every tag/value may be reported both by
the plugged feature's get_tag_values() and annotation->get_Annotations()
methods, which may lead to redundant storage (and retrieval).

So at worst you may get duplication of all tag/value pairs for a 
feature.

If you retrieve features directly (instead of automatically as those 
attached to the sequence you retrieved), then you may even be able to 
circumvent this problem by providing a SeqFeatureI factory that 
instantiates SeqFeature::Annotated instead of SeqFeature::Generic 
(which is the default). Bioperl-db will again set the tag/value 
properties through the AnnotationAdaptor, but if the plugged feature is 
a SeqFeature::Annotated instance, it may take care of the duplication 
because redundant set operations will probably overwrite the previous 
one (because everything is stored in the annotation bundle).

Bottom line is, provided SeqFeature::Annotated implements SeqFeatureI 
it will be stored - just the result may have some redundancy in the 
annotation and tags. To know exactly it would need to be debugged, 
which I think nobody's done yet.
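To make the redundancy described here concrete, the toy model below uses plain Perl data structures. It is not Bioperl code: `make_unified_feature`, `naive_pairs` and `dedup_pairs` are hypothetical stand-ins for a feature that keeps tags and annotations in one store (as SeqFeature::Annotated is understood to do above) and for an adaptor that, like AnnotationAdaptor as described, assumes the two are stored separately and therefore reports both.

```perl
use strict;
use warnings;

# Toy feature that keeps tags and annotations in ONE store.
# Loosely modelled on the description above; not the real SeqFeature API.
sub make_unified_feature {
    my (%tags) = @_;
    my %store = map { $_ => [ @{ $tags{$_} } ] } keys %tags;
    return {
        tag_names   => sub { sort keys %store },
        tag_values  => sub { @{ $store{ $_[0] } || [] } },
        # The "annotation bundle" view reads the very same store.
        annotations => sub {
            map { my $k = $_; map { [ $k, $_ ] } @{ $store{$k} } } sort keys %store;
        },
    };
}

# Naive adaptor: assumes tags and annotations are disjoint, so it
# reports (tag, value) pairs from both views.
sub naive_pairs {
    my ($f) = @_;
    my @from_tags = map {
        my $t = $_;
        map { [ $t, $_ ] } $f->{tag_values}->($t);
    } $f->{tag_names}->();
    return (@from_tags, $f->{annotations}->());
}

# De-duplicating variant: keep each (tag, value) pair only once.
sub dedup_pairs {
    my %seen;
    return grep { my $key = join "\0", @$_; !$seen{$key}++ } naive_pairs(@_);
}
```

With two tags carrying three values in total, e.g. `make_unified_feature(note => ['a', 'b'], gene => ['x'])`, the naive adaptor reports six pairs and the de-duplicating one three. Whether bioperl-db really behaves this way would still need to be confirmed by debugging, as noted above.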

Also, if I'm wrong w.r.t. SeqFeature::Annotated's behaviour, any 
education from its authors will be welcome ...

	-hilmar
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


From sutripa at vbi.vt.edu  Sat Jan 29 23:17:58 2005
From: sutripa at vbi.vt.edu (Sucheta Tripathy)
Date: Sat Jan 29 23:14:04 2005
Subject: [Bioperl-l] GD::Font does not work
Message-ID: <1182.199.3.136.4.1107058678.squirrel@webmail.vbi.vt.edu>


Hi,

I searched the archive but could not find a solution to this problem. My
apologies, if this problem has already been discussed.

Recently some of my CGI scripts started failing with the error message
"premature end of script headers". After running one of them on the
command line, I found that it gives a segmentation fault wherever
GD::Font is used.

example:
my $image=GD::Image->new();
$image->string(gdLargeFont,.....) etc.

After commenting these lines the script runs fine.

I tried re-installing GD, but it says GD is up to date.

Is there a workaround to make this work?

Many thanks

Sucheta

-- 
Sucheta Tripathy
Virginia Bioinformatics Institute Phase-I
Washington street.
Virginia Tech.
Blacksburg,VA 24061-0447
phone:(540)231-8138
Fax:  (540) 231-2606
From Guillaume.Rousse at inria.fr  Sun Jan 30 12:23:26 2005
From: Guillaume.Rousse at inria.fr (Guillaume Rousse)
Date: Sun Jan 30 12:19:24 2005
Subject: [Bioperl-l] various problems with mdk bioperl package
Message-ID: <41FD180E.4040103@inria.fr>

Hello. I'm the maintainer of the mdk bioperl packages. Here are some
problems I had with the latest release.

First, one test fails as part of the full rpm build process:
t/SeqFeatCollection..........FAILED test 425
         Failed 1/432 tests, 99.77% okay

However, I'm unable to reproduce it by manually issuing 'make test' in
the build directory, or by running the test directly:
perl -Iblib t/SeqFeatCollection.t -> OK

With BIOPERLDEBUG set for more verbose output, other tests fail:
not ok 421
# Test 421 got: '0' (t/SeqFeatCollection.t at line 156 fail #406)
#     Expected: '6'
not ok 423
# Test 423 got: '0' (t/SeqFeatCollection.t at line 156 fail #408)
#     Expected: '4'

This is perl 5.8.6, without thread support, on mandrake cooker. There is 
no special environment variable used during package building that could 
explain the different results.

Second, man page generation is disabled by default, using some strange 
construction in Makefile.PL:
sub MY::manifypods {
   my $self = shift;
   #print STDERR "In manifypods moment\n";
   if( 1 ) {
     return "\nmanifypods : pure_all\n\t$self->{NOECHO}\$(NOOP)\n"
   }
   else {
     return $self->SUPER::manifypods(@_);
   }
}
If the goal is just to avoid man page generation, why not
INSTALLMAN3DIR => undef ?
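For comparison, ExtUtils::MakeMaker documents parameters for this without overriding `MY::manifypods`: `MAN1PODS` and `MAN3PODS` list the PODs to convert, so empty hashes should mean no man pages are generated, and current MakeMaker documentation also describes `INSTALLMAN1DIR`/`INSTALLMAN3DIR => 'none'` for skipping only their installation. A sketch of the extra arguments, assuming they are merged into the existing `WriteMakefile(...)` call (the `%no_manpages` name is illustrative):

```perl
use strict;
use warnings;

# Empty hashes tell MakeMaker there are no PODs to convert, so no
# section-1 (scripts) or section-3 (modules) man pages are built.
my %no_manpages = (
    MAN1PODS => {},
    MAN3PODS => {},
);

# Hypothetical usage: in the real Makefile.PL these keys would be added to
# the arguments already passed to WriteMakefile(), e.g.
#   WriteMakefile(%existing_args, %no_manpages);
```

Check the MakeMaker version shipped on the build host before relying on either approach.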

Third, I'd like to split the package a bit to avoid pulling in so many
dependencies. Here is the complete list of external dependencies of
the current package, as automatically computed by rpm:
perl(CGI)
perl(CGI::Carp)
perl(Cache::FileCache)
perl(Carp)
perl(Class::AutoClass)
perl(Clone)
perl(DBI)
perl(DB_File)
perl(Data::Dumper)
perl(Data::Stag)
perl(Data::Stag::XMLWriter)
perl(Digest::MD5)
perl(Dumpvalue)
perl(English)
perl(Error)
perl(Exporter)
perl(Fcntl)
perl(File::Basename)
perl(File::Path)
perl(File::Spec)
perl(File::Temp)
perl(FileHandle)
perl(GD)
perl(Getopt::Long)
perl(Getopt::Std)
perl(HTML::Entities)
perl(HTML::HeadParser)
perl(HTML::Parser)
perl(HTTP::Request::Common)
perl(HTTP::Response)
perl(IO::File)
perl(IO::Handle)
perl(IO::Socket)
perl(IO::String)
perl(LWP)
perl(LWP::Simple)
perl(LWP::UserAgent)
perl(Math::BigFloat)
perl(POSIX)
perl(Pod::Usage)
perl(SOAP::Lite)
perl(SVG::Graph)
perl(SVG::Graph::Data)
perl(SVG::Graph::Data::Node)
perl(SVG::Graph::Data::Tree)
perl(Storable)
perl(Symbol)
perl(TestInterface)
perl(Text::Shellwords)
perl(Text::Wrap)
perl(Tie::Handle)
perl(Tie::RefHash)
perl(Tree::DAG_Node)
perl(UNIVERSAL)
perl(URI)
perl(URI::Escape)
perl(XML::DOM)
perl(XML::DOM::XPath)
perl(XML::Handler::Subs)
perl(XML::Parser)
perl(XML::Parser::PerlSAX)
perl(XML::SAX)
perl(XML::SAX::Base)
perl(XML::SAX::Writer)
perl(XML::Twig)
perl(XML::Writer) >= 0.4
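Many entries in this list ship with perl itself, so one way to see which requirements actually pull in extra packages is to check them against Module::CoreList, which records the modules bundled with each perl release. (Module::CoreList is itself bundled only with perl 5.8.9/5.10 and later, so an older build host would need it from CPAN.) The sketch below uses a hypothetical helper, `classify_requires`, that splits rpm-style `perl(...)` lines into core and non-core modules for a given perl version.

```perl
use strict;
use warnings;
use Module::CoreList;

# Split rpm-style requirement lines ("perl(Mod)" or "perl(Mod) >= ver") into
# modules bundled with the given perl release and everything else.
# $perl_version uses Module::CoreList's numeric form, e.g. 5.008006 for 5.8.6.
sub classify_requires {
    my ($perl_version, @requires) = @_;
    my $bundled = $Module::CoreList::version{$perl_version} || {};
    my (@core, @external);
    for my $req (@requires) {
        # Skip lines that are not perl(...) requirements.
        my ($module) = $req =~ /^\s*perl\(([^)]+)\)/ or next;
        if (exists $bundled->{$module}) {
            push @core, $module;
        }
        else {
            push @external, $module;
        }
    }
    return (\@core, \@external);
}
```

Run against this list with `5.008006` (the perl 5.8.6 mentioned earlier), modules such as GD, IO::String and XML::Writer end up in the non-core group; those are the natural candidates for optional subpackages.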

Just having the Bio::DB branch in a subpackage would be enough to avoid 
a mandatory dependency on Ace. What else could I split ?

Fourth, there are still two scripts in the main bioperl archive relying
on bioperl-run: bp_pairwise_kaks.pl and bp_blast2tree.pl. They should
really be moved there, to avoid circular dependencies.
-- 
The engine falls out of the car the day after the warranty expires
		-- Murphy's Driving Laws n°18
From allenday at ucla.edu  Sun Jan 30 15:26:57 2005
From: allenday at ucla.edu (Allen Day)
Date: Sun Jan 30 15:22:54 2005
Subject: [Bioperl-l] various problems with mdk bioperl package
In-Reply-To: <41FD180E.4040103@inria.fr>
References: <41FD180E.4040103@inria.fr>
Message-ID: <Pine.LNX.4.58.0501301224440.10644@sumo.ctrl.ucla.edu>

I was thinking about this as well.  I agree that scripts requiring
bioperl-db should be moved out of the bioperl-live repository.

We might also think about distributing a bioperl-core package that
contains Bio::Root::*, and separate packages for each of the *IO
subsystems (SeqIO, SearchIO, FeatureIO, etc).
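The proposed split could be expressed as a simple rule that assigns each module to a package. This is only an illustration of the idea: the package names (`bioperl-core`, `bioperl-seqio` and so on) and the helper `subpackage_for` are hypothetical, and real packaging would still have to track `use` dependencies that cross package boundaries.

```perl
use strict;
use warnings;

# Map a module name to a (hypothetical) subpackage under the proposed split.
sub subpackage_for {
    my ($module) = @_;
    # Bio::Root and everything below it form the core package.
    return 'bioperl-core' if $module =~ /^Bio::Root(?:::|$)/;
    # Each *IO subsystem (Bio::SeqIO, Bio::SearchIO::blast, ...) gets its own package.
    if ($module =~ /^Bio::(\w+IO)(?:::|$)/) {
        return 'bioperl-' . lc $1;
    }
    # Everything else stays in the main package.
    return 'bioperl';
}
```

Because the Bio::Root rule is checked first, Bio::Root::IO stays in the core package rather than being treated as an IO subsystem.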

-Allen


From hlapp at gmx.net  Sun Jan 30 21:35:36 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sun Jan 30 21:31:33 2005
Subject: [Bioperl-l] various problems with mdk bioperl package
In-Reply-To: <Pine.LNX.4.58.0501301224440.10644@sumo.ctrl.ucla.edu>
Message-ID: <CA07A490-7330-11D9-BDB0-000A959EB4C4@gmx.net>

I think you may be confusing Bio::DB::GFF with Bio::DB::BioSQL. AFAIK 
there hasn't been any test in bioperl-live that would require 
bioperl-db for a long time.

	-hilmar
On Sunday, January 30, 2005, at 12:26  PM, Allen Day wrote:

> I was thinking about this as well.  I agree that bioperl-db requiring
> scripts should be moved out of the bioperl-live repository.
>
> We might also think about distributing a bioperl-core package that
> contains Bio::Root::*, and separate packages for each of the *IO
> subsystems (SeqIO, SearchIO, FeatureIO, etc).
>
> -Allen
>
>
>> Third, I'd like to split the package a little bit, to avoid drawing in so
>> many dependencies. Here is the whole list of external dependencies of
>> the current package, as automatically computed by rpm:
>> perl(CGI)
>> perl(CGI::Carp)
>> perl(Cache::FileCache)
>> perl(Carp)
>> perl(Class::AutoClass)
>> perl(Clone)
>> perl(DBI)
>> perl(DB_File)
>> perl(Data::Dumper)
>> perl(Data::Stag)
>> perl(Data::Stag::XMLWriter)
>> perl(Digest::MD5)
>> perl(Dumpvalue)
>> perl(English)
>> perl(Error)
>> perl(Exporter)
>> perl(Fcntl)
>> perl(File::Basename)
>> perl(File::Path)
>> perl(File::Spec)
>> perl(File::Temp)
>> perl(FileHandle)
>> perl(GD)
>> perl(Getopt::Long)
>> perl(Getopt::Std)
>> perl(HTML::Entities)
>> perl(HTML::HeadParser)
>> perl(HTML::Parser)
>> perl(HTTP::Request::Common)
>> perl(HTTP::Response)
>> perl(IO::File)
>> perl(IO::Handle)
>> perl(IO::Socket)
>> perl(IO::String)
>> perl(LWP)
>> perl(LWP::Simple)
>> perl(LWP::UserAgent)
>> perl(Math::BigFloat)
>> perl(POSIX)
>> perl(Pod::Usage)
>> perl(SOAP::Lite)
>> perl(SVG::Graph)
>> perl(SVG::Graph::Data)
>> perl(SVG::Graph::Data::Node)
>> perl(SVG::Graph::Data::Tree)
>> perl(Storable)
>> perl(Symbol)
>> perl(TestInterface)
>> perl(Text::Shellwords)
>> perl(Text::Wrap)
>> perl(Tie::Handle)
>> perl(Tie::RefHash)
>> perl(Tree::DAG_Node)
>> perl(UNIVERSAL)
>> perl(URI)
>> perl(URI::Escape)
>> perl(XML::DOM)
>> perl(XML::DOM::XPath)
>> perl(XML::Handler::Subs)
>> perl(XML::Parser)
>> perl(XML::Parser::PerlSAX)
>> perl(XML::SAX)
>> perl(XML::SAX::Base)
>> perl(XML::SAX::Writer)
>> perl(XML::Twig)
>> perl(XML::Writer) >= 0.4
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


From hlapp at gmx.net  Sun Jan 30 23:10:49 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sun Jan 30 23:07:20 2005
Subject: [Bioperl-l] struggling with Bio::FeatureIO and
	Bio::SeqFeature::Annotated
In-Reply-To: <Pine.LNX.4.58.0501250141240.22458@sumo.ctrl.ucla.edu>
Message-ID: <172B7F31-733E-11D9-BDB0-000A959EB4C4@gmx.net>


On Tuesday, January 25, 2005, at 01:45  AM, Allen Day wrote:

>>> because Bio::SeqFeature::Annotated holds annotations as
>>> object pointers
>>> rather than strings.  We can fix this with a stringification
>>> overload, but I noticed that the code exists to do this in the 
>>> Bio::Annotation::*
>>> classes but is commented out, and I'm not sure why.  Maybe
>>> Hilmar can shed some light on this.
>>>

Sorry, I think I missed this. I don't know what pieces of code you're 
talking about, so I can't shed light either. Where did you see the 
commented out stringification overload? I checked SimpleValue and 
couldn't see anything.

Generally, I'd comment that if a method is supposed to return an array 
of strings but in violation returns an array of objects, then adding 
stringification overload to the returned objects' implementations is 
the wrong strategy.
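To make the distinction concrete, here is a minimal, self-contained Perl sketch. My::SimpleValue and My::StringySimpleValue are hypothetical stand-ins, not BioPerl classes; the only idea borrowed from Bio::Annotation::SimpleValue is its value() accessor. Interpolating a plain blessed hash yields the `Class=HASH(0x...)` text seen in bug reports, while either asking for value() or adding a `""` overload gives back the string.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for an annotation value object; it only mimics
# the shape of Bio::Annotation::SimpleValue (a hash with a value() accessor).
package My::SimpleValue;

sub new {
    my ($class, %args) = @_;
    return bless { value => $args{-value} }, $class;
}

sub value { return $_[0]->{value} }

# Hypothetical subclass that overloads stringification, i.e. the
# "put the overload back" option discussed in this thread.
package My::StringySimpleValue;
our @ISA = ('My::SimpleValue');
use overload '""' => sub { $_[0]->value }, fallback => 1;

package main;

my $plain   = My::SimpleValue->new(-value => '1001');
my $stringy = My::StringySimpleValue->new(-value => '1001');

# Interpolating a plain object gives its class and address,
# e.g. "My::SimpleValue=HASH(0x...)".
my $as_text = "$plain";

# Option 1: whoever promised a string asks the object for it explicitly.
my $via_accessor = $plain->value;

# Option 2: the object stringifies itself through the overload.
my $via_overload = "$stringy";

print "$as_text\n$via_accessor\n$via_overload\n";
```

With option 1 the method that is documented to return strings calls value() itself, so callers never see objects; option 2 hides the mismatch rather than removing it, which is the concern raised above.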

	-hilmar
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


From lstein at cshl.edu  Mon Jan 31 10:21:15 2005
From: lstein at cshl.edu (Lincoln Stein)
Date: Mon Jan 31 13:16:01 2005
Subject: [Bioperl-l] Re: [GMOD-devel] Re: RPMs for Bioperl and GMOD
In-Reply-To: <8C2D5F1C-7176-11D9-9251-000A959EB4C4@gnf.org>
References: <8C2D5F1C-7176-11D9-9251-000A959EB4C4@gnf.org>
Message-ID: <200501311021.16482.lstein@cshl.edu>

I agree with Hilmar's sentiment.  If it is possible, the RPMs should 
only install what is necessary to bring up the core functionality of 
the modules in question.

Lincoln


On Friday 28 January 2005 04:49 pm, Hilmar Lapp wrote:
> Like this statement or not, but I think installing all kinds of
> CPAN packages onto somebody's machine irrespective of whether
> somebody is ever going to use - or need - them, let alone them
> working in the first place due to compiled code dependencies being
> absent, is a really *bad* idea
>
> It basically defies the concept of modular packaging to begin with,
> and sounds way too intrusive for my taste.
>
> Unless I misunderstand what Jason is saying then this is not even
> necessary and is in no way an inherent shortcoming that inevitably
> comes with RPMs. So unless I'm missing something here I understand
> that Jason is saying you can have RPMs and still not litter your
> system with DBD::blah or other modules for which you don't even
> have the client libraries installed, and still be able to install
> those at a later time because the respective pieces of code have
> not been pruned (which I think is actually also a bad idea).
>
> 	-hilmar
>
> On Friday, January 28, 2005, at 11:50  AM, Allen Day wrote:
> >> Do you mean your RPM or bioperl-db on Oracle? I'm running the
> >> latter all the time.
> >
> > i mean the RPM.  it is the same as bioperl-db cvs head as of last
> > night.
> >
> >>> I'd also like someone with Oracle to help me make a DBD::Oracle
> >>> rpm. Having a DBD::Oracle RPM will allow me to leave the Oracle
> >>> code in Bioperl-DB.
> >>
> >> If installing the supposed DBD::Oracle is then a prerequisite
> >> for being
> >> able to install the rest, then you are taking the wrong path.
> >> DBD::Oracle itself will depend on the Oracle client libraries
> >> being installed which aren't even available on all platforms,
> >> aside from the fact that installing those is beyond your control
> >> and involves downloading about 350MB from OTN.
> >>
> >> Frankly, I can't believe that there is no way to specify
> >> dependencies that are optional. Why would you require all of
> >> DBD::mysql, DBD::Pg, and
> >> DBD::Oracle if all a person wants is mysql?? All of these will
> >> link to
> >> compiled runtime libraries and why should a failure to install
> >> DBD::Pg be of any concern to someone who wants to use mysql?
> >
> > the problem is something internal to the rpm installer -- it
> > determines perl library dependencies at install-time rather than
> > requiring you to explicitly specify perl packages in the rpm
> > metafiles (aka specfile).
> >
> > so, for instance, if i tried to install
> > perl-Generic-Genome-Browser, i
> > might get an error like:
> >
> >   requires perl(Bio::Root::Root)
> >
> > which could be removed by one of:
> >
> >   (1) installing the perl-bioperl package
> >   (2) installing bioperl from cvs
> >   (3) installing bioperl from cpan
> >
> > there may be a way to code into the metafile to ignore missing
> > perl dependencies detected in the installation process -- i need
> > to look into
> > this.
> >
> >> BTW DBD::Oracle is on CPAN. I thought that would make it easy to
> >> construct an RPM? (There's few if any binaries though - for a
> >> reason. Compiling DBD::Oracle may be a charm on some but involve
> >> some major tweaking on other platforms. I've been there multiple
> >> times, I know what I'm talking about.)
> >
> > given what i've said above, if i had a DBD::Oracle perl module
> > installed,
> > it would prevent rpm from throwing errors about missing
> > dependency "perl(DBD::Oracle)".  however, i can't build
> > DBD::Oracle into an rpm because the make process links to the
> > oracle headers and .so files. the
> > DBD::Oracle can be made w/o having explicit dependencies on the
> > oracle binary install, so it would install on a machine that
> > didn't have oracle
> > installed (but wouldn't work).  so as far as a bioperl-db rpm
> > goes, here
> > are the options i'm looking into:
> >
> >   (1) get a binary perl-DBD-Oracle rpm built by someone with
> > Oracle, leaving out the binary Oracle file dependency. 
> > distribute bioperl-db from cvs as-is
> >   (2) patch Oracle classes out of bioperl-db as part of the rpm
> > build process.  distribute modified bioperl-db.
> >   (3) modify rpm "detection of installed perl modules"
> > functionality to have rpm explicitly ignore missing DBD::Oracle
> > dependency.
> >
> > (1) and (2) will definitely work.  i don't yet know the
> > feasibility of (3).
> >
> > -allen

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724

NOTE: Please copy Sandra Michelsen <michelse@cshl.edu> on
all emails regarding scheduling and other time-critical topics.
From lstein at cshl.edu  Mon Jan 31 10:35:40 2005
From: lstein at cshl.edu (Lincoln Stein)
Date: Mon Jan 31 13:16:05 2005
Subject: [Bioperl-l] Re: RPMs for Bioperl and GMOD
In-Reply-To: <Pine.LNX.4.58.0501281633250.22585@sumo.ctrl.ucla.edu>
References: <8C2D5F1C-7176-11D9-9251-000A959EB4C4@gnf.org>
	<Pine.LNX.4.58.0501281633250.22585@sumo.ctrl.ucla.edu>
Message-ID: <200501311035.40955.lstein@cshl.edu>

Perhaps we should split the modules into bioperl-db and 
bioperl-db-oracle.

And so forth.

Lincoln


On Friday 28 January 2005 07:49 pm, Allen Day wrote:
> okay, i've looked into this.  short answer: you cannot specify to
> omit automatically determined dependencies without "lying" in the
> rpm specfile and stating that a package provides a perl module that
> it, in fact, does not.
>
> for example, i can add a statement to the bioperl-db rpm stating
> that it provides perl(DBD::Oracle), but not actually add
> DBD/Oracle.pm to the package.  there is a thread extensively
> discussing this aspect of the rpm build system here:
>
> http://www.redhat.com/archives/rpm-list/2004-February/msg00083.html
>
> if i'm making a package for private use only, i don't mind doing
> this, but if this package is to be for public consumption i don't
> want to lie about what is and is not provided.  i take the same
> stance on all the other perl modules in the bioperl dependency
> tree, including esoteric modules such as Net::Jabber and
> GD::Graph3d.
>
> the only viable option i see here is to patch Oracle dependencies
> out of bioperl-db.  that is what i will do until i have working
> Oracle and perl-DBD-Oracle packages in-hand.
>
> -allen
>
> On Fri, 28 Jan 2005, Hilmar Lapp wrote:
> > Like this statement or not, but I think installing all kinds of
> > CPAN packages onto somebody's machine irrespective of whether
> > somebody is ever going to use - or need - them, let alone them
> > working in the first place due to compiled code dependencies
> > being absent, is a really *bad* idea
> >
> > It basically defies the concept of modular packaging to begin
> > with, and sounds way too intrusive for my taste.
> >
> > Unless I misunderstand what Jason is saying then this is not even
> > necessary and is in no way an inherent shortcoming that
> > inevitably comes with RPMs. So unless I'm missing something here
> > I understand that Jason is saying you can have RPMs and still not
> > litter your system with DBD::blah or other modules for which you
> > don't even have the client libraries installed, and still be able
> > to install those at a later time because the respective pieces of
> > code have not been pruned (which I think is actually also a bad
> > idea).
> >
> > 	-hilmar
> >
> > On Friday, January 28, 2005, at 11:50  AM, Allen Day wrote:
> > >> Do you mean your RPM or bioperl-db on Oracle? I'm running the
> > >> latter all the time.
> > >
> > > i mean the RPM.  it is the same as bioperl-db cvs head as of
> > > last night.
> > >
> > >>> I'd also like someone with Oracle to help me make a
> > >>> DBD::Oracle rpm. Having a DBD::Oracle RPM will allow me to
> > >>> leave the Oracle code in Bioperl-DB.
> > >>
> > >> If installing the supposed DBD::Oracle is then a prerequisite
> > >> for being
> > >> able to install the rest, then you are taking the wrong path.
> > >> DBD::Oracle itself will depend on the Oracle client libraries
> > >> being installed which aren't even available on all platforms,
> > >> aside from the fact that installing those is beyond your
> > >> control and involves downloading about 350MB from OTN.
> > >>
> > >> Frankly, I can't believe that there is no way to specify
> > >> dependencies that are optional. Why would you require all of
> > >> DBD::mysql, DBD::Pg, and
> > >> DBD::Oracle if all a person wants is mysql?? All of these
> > >> will link to
> > >> compiled runtime libraries and why should a failure to install
> > >> DBD::Pg be of any concern to someone who wants to use mysql?
> > >
> > > the problem is something internal to the rpm installer -- it
> > > determines perl library dependencies at install-time rather
> > > than requiring you to explicitly specify perl packages in the
> > > rpm metafiles (aka specfile).
> > >
> > > so, for instance, if i tried to install
> > > perl-Generic-Genome-Browser, i
> > > might get an error like:
> > >
> > >   requires perl(Bio::Root::Root)
> > >
> > > which could be removed by one of:
> > >
> > >   (1) installing the perl-bioperl package
> > >   (2) installing bioperl from cvs
> > >   (3) installing bioperl from cpan
> > >
> > > there may be a way to code into the metafile to ignore missing
> > > perl dependencies detected in the installation process -- i
> > > need to look into
> > > this.
> > >
> > >> BTW DBD::Oracle is on CPAN. I thought that would make it easy
> > >> to construct an RPM? (There's few if any binaries though - for
> > >> a reason. Compiling DBD::Oracle may be a charm on some but
> > >> involve some major tweaking on other platforms. I've been
> > >> there multiple times, I know what I'm talking about.)
> > >
> > > given what i've said above, if i had a DBD::Oracle perl module
> > > installed,
> > > it would prevent rpm from throwing errors about missing
> > > dependency "perl(DBD::Oracle)".  however, i can't build
> > > DBD::Oracle into an rpm because the make process links to the
> > > oracle headers and .so files. the
> > > DBD::Oracle can be made w/o having explicit dependencies on the
> > > oracle binary install, so it would install on a machine that
> > > didn't have oracle
> > > installed (but wouldn't work).  so as far as a bioperl-db rpm
> > > goes, here
> > > are the options i'm looking into:
> > >
> > >   (1) get a binary perl-DBD-Oracle rpm built by someone with
> > > Oracle, leaving out the binary Oracle file dependency. 
> > > distribute bioperl-db from cvs as-is
> > >   (2) patch Oracle classes out of bioperl-db as part of the rpm
> > > build process.  distribute modified bioperl-db.
> > >   (3) modify rpm "detection of installed perl modules"
> > > functionality to have rpm explicitly ignore missing DBD::Oracle
> > > dependency.
> > >
> > > (1) and (2) will definitely work.  i don't yet know the
> > > feasibility of (3).
> > >
> > > -allen

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724

NOTE: Please copy Sandra Michelsen <michelse@cshl.edu> on
all emails regarding scheduling and other time-critical topics.
From lstein at cshl.edu  Mon Jan 31 10:52:14 2005
From: lstein at cshl.edu (Lincoln Stein)
Date: Mon Jan 31 13:16:09 2005
Subject: [Bioperl-l] GD::Font does not work
In-Reply-To: <1182.199.3.136.4.1107058678.squirrel@webmail.vbi.vt.edu>
References: <1182.199.3.136.4.1107058678.squirrel@webmail.vbi.vt.edu>
Message-ID: <200501311052.14693.lstein@cshl.edu>

Try removing old versions of libgd (the C library, not the perl 
module) and installing libgd 2.0.33 or higher.

Lincoln

On Saturday 29 January 2005 11:17 pm, Sucheta Tripathy wrote:
> Hi,
>
> I searched the archive but could not find a solution to this
> problem. My apologies, if this problem has already been discussed.
>
> Recently some of my CGI scripts threw error message saying
> "premature end of script headers". Finally, I found out after
> running the script in commandline that, it gives a segmentation
> fault wherever there is a use of: GD::Font.
>
> example:
> my $image=GD::Image->new();
> $image->string(gdLargeFont,.....) etc.
>
> After commenting these lines the script runs fine.
>
> I tried re-installing GD, but it's saying GD up to date.
>
> Is there a way around to make this working?
>
> Many thanks
>
> Sucheta

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724

NOTE: Please copy Sandra Michelsen <michelse@cshl.edu> on
all emails regarding scheduling and other time-critical topics.
From lstein at cshl.edu  Mon Jan 31 12:00:11 2005
From: lstein at cshl.edu (Lincoln Stein)
Date: Mon Jan 31 13:16:15 2005
Subject: Fwd: Re: [Bioperl-l] GD::Font does not work
Message-ID: <200501311200.11743.lstein@cshl.edu>

This is a followup from Sucheta, who was able to fix the GD::Font 
issues by updating libgd (the C library, not the perl module).

Lincoln

----------  Forwarded Message  ----------

Subject: Re: [Bioperl-l] GD::Font does not work
Date: Monday 31 January 2005 11:25 am
From: Sucheta Tripathy <sutripa@vbi.vt.edu>
To: Lincoln Stein <lstein@cshl.edu>

Thanks, I already did that and it worked fine for me.

Sucheta

At 10:52 AM 1/31/2005 -0500, you wrote:
>Try removing old versions of libgd (the C library, not the perl
>module) and installing libgd 2.0.33 or higher.
>
>Lincoln
>
>On Saturday 29 January 2005 11:17 pm, Sucheta Tripathy wrote:
> > Hi,
> >
> > I searched the archive but could not find a solution to this
> > problem. My apologies, if this problem has already been
> > discussed.
> >
> > Recently some of my CGI scripts threw error message saying
> > "premature end of script headers". Finally, I found out after
> > running the script in commandline that, it gives a segmentation
> > fault wherever there is a use of: GD::Font.
> >
> > example:
> > my $image=GD::Image->new();
> > $image->string(gdLargeFont,.....) etc.
> >
> > After commenting these lines the script runs fine.
> >
> > I tried re-installing GD, but it's saying GD up to date.
> >
> > Is there a way around to make this working?
> >
> > Many thanks
> >
> > Sucheta
>
>--
>Lincoln D. Stein
>Cold Spring Harbor Laboratory
>1 Bungtown Road
>Cold Spring Harbor, NY 11724
>
>NOTE: Please copy Sandra Michelsen <michelse@cshl.edu> on
>all emails regarding scheduling and other time-critical topics.

-------------------------------------------------------

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724

NOTE: Please copy Sandra Michelsen <michelse@cshl.edu> on
all emails regarding scheduling and other time-critical topics.
From Guillaume.Rousse at inria.fr  Mon Jan 31 15:45:32 2005
From: Guillaume.Rousse at inria.fr (Guillaume Rousse)
Date: Mon Jan 31 15:42:23 2005
Subject: [Bioperl-l] Re: RPMs for Bioperl and GMOD
In-Reply-To: <200501311035.40955.lstein@cshl.edu>
References: <8C2D5F1C-7176-11D9-9251-000A959EB4C4@gnf.org>	<Pine.LNX.4.58.0501281633250.22585@sumo.ctrl.ucla.edu>
	<200501311035.40955.lstein@cshl.edu>
Message-ID: <41FE98EC.4010008@inria.fr>

I'm taking the discussion in the middle, so I may be wrong...

Lincoln Stein wrote:
> Perhaps we should split the modules into bioperl-db and 
> bioperl-db-oracle.
This isn't needed. Splitting a package into subpackages is a packager's 
decision that doesn't require any action from upstream developers. It 
would just bring everyone additional work.

> And so forth.
> 
> Lincoln
> 
> 
> On Friday 28 January 2005 07:49 pm, Allen Day wrote:
> 
>>okay, i've looked into this.  short answer: you cannot specify to
>>omit automatically determined dependencies without "lying" in the
>>rpm specfile and stating that a package provides a perl module that
>>it, in fact, does not.
>>
>>for example, i can add a statement to the bioperl-db rpm stating
>>that it provides perl(DBD::Oracle), but not actually add
>>DBD/Oracle.pm to the package.
I don't think so. Unless this is a Mandrake-specific rpm patch, you can 
always use exceptions to the automatic requires/provides computation:
%define _requires_exceptions perl(DBD::Oracle)

And if that doesn't work, you can also disable automatic dependency 
computation completely:
AutoReqProv: no
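For comparison, the alternatives mentioned in this thread would look roughly like the spec fragment below. Treat it as a sketch under assumptions: `_requires_exceptions` is the Mandrake macro referred to above and may not be honoured by other rpm builds, and `__requires_exclude` only exists in later upstream rpm releases (4.9 and newer), so it is listed for completeness rather than as something available in 2005. These are alternatives; a real spec file would normally use just one of them.

```
# Mandrake rpm: drop one automatically detected requirement
%define _requires_exceptions perl(DBD::Oracle)

# Later upstream rpm (4.9+): regex filter on generated Requires
%global __requires_exclude ^perl\\(DBD::Oracle\\)

# Blunt alternative: turn off automatic Requires/Provides generation
AutoReqProv: no
```

The exclusion approaches keep every other detected dependency intact, whereas `AutoReqProv: no` leaves the packager to list all requirements by hand.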

BTW, why do you bother dealing with rpm when distributions such as 
Debian or Mandrake already provide official packages, and the biolinux 
project provides Red Hat and SuSE packages too?

-- 
If you improve or tinker with something long enough, eventually it will 
break or malfunction
		-- Murphy's In Laws n°8
From hlapp at gnf.org  Mon Jan 31 16:28:25 2005
From: hlapp at gnf.org (Hilmar Lapp)
Date: Mon Jan 31 16:24:32 2005
Subject: [Bioperl-l] Re: RPMs for Bioperl and GMOD
In-Reply-To: <200501311035.40955.lstein@cshl.edu>
References: <8C2D5F1C-7176-11D9-9251-000A959EB4C4@gnf.org>
	<Pine.LNX.4.58.0501281633250.22585@sumo.ctrl.ucla.edu>
	<200501311035.40955.lstein@cshl.edu>
Message-ID: <0A944023-73CF-11D9-9995-000A95AE92B0@gnf.org>


On Jan 31, 2005, at 7:35 AM, Lincoln Stein wrote:

> Perhaps we should split the modules into bioperl-db and
> bioperl-db-oracle.
>
> And so forth.

Sure you could ... but where do you draw the line? E.g., 
gbrowse-pgsql-png-no-gif-SVG-no-staden-Ace-berkeleyDB.rpm ... I mean, 
applying this to all dependencies will lead to permutations of several 
dependencies - which I'm not sure will further the goal of making it 
easier on the end user's end ...

Just my two cents ...

	-hilmar
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

From allenday at ucla.edu  Mon Jan 31 17:27:36 2005
From: allenday at ucla.edu (Allen Day)
Date: Mon Jan 31 17:23:42 2005
Subject: [Bioperl-l] [Bioperl-guts-l] [Bug 1742] New: GFF parser messes
	attributes (fwd)
Message-ID: <Pine.LNX.4.58.0501311424530.17005@sumo.ctrl.ucla.edu>

Here's the stringification problem being discussed in another thread.  It
came up in a 1.5 branch bug report.  Objections to putting the
stringification overload back?

-Allen

---------- Forwarded message ----------
Date: Mon, 31 Jan 2005 15:24:19 -0500
From: bugzilla-daemon@portal.open-bio.org
To: bioperl-guts-l@bioperl.org
Subject: [Bioperl-guts-l] [Bug 1742] New: GFF parser messes attributes

http://bugzilla.open-bio.org/show_bug.cgi?id=1742

           Summary: GFF parser messes attributes
           Product: Bioperl
           Version: 1.5 branch
          Platform: Macintosh
        OS/Version: MacOS X
            Status: NEW
          Severity: major
          Priority: P2
         Component: Core Components
        AssignedTo: bioperl-guts-l@bioperl.org
        ReportedBy: jldai@yahoo.com


In BioPerl 1.5.0, I use Bio::Tools::GFF and Bio::SeqIO to parse this GFF string:

8255763	tigrscan	final-exon	67	558	56.8	-	2	transgrp "1001";

into a Bio::SeqIO object, and later print it out as an EMBL file, resulting in:

FT   final-exon      complement(67..558)
FT                   /transgrp="Bio::Annotation::SimpleValue=HASH(0x93a5d8)"
FT                   /note="score=56.8"
FT                   /note="frame=2"

The value of tag "transgrp" should be 1001.

Same script worked fine in BioPerl-1.4


------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
_______________________________________________
Bioperl-guts-l mailing list
Bioperl-guts-l@portal.open-bio.org
http://portal.open-bio.org/mailman/listinfo/bioperl-guts-l
From ed at compbio.berkeley.edu  Mon Jan 31 18:07:56 2005
From: ed at compbio.berkeley.edu (Ed Green)
Date: Mon Jan 31 18:05:07 2005
Subject: [Bioperl-l] dssp script
In-Reply-To: <41F13730.1080203@compbio.berkeley.edu>
References: <5F7CE35370B6CF429AA3CA960ECC278001638D65@EXCHANGE2.charite.de>
	<41F13730.1080203@compbio.berkeley.edu>
Message-ID: <41FEBA4C.3090906@compbio.berkeley.edu>

Peter-
Just checked in these fixes to Bio::Structure::SecStr::DSSP::Res (Res.pm).
You may find the new residues() iterator method useful.

Regards,
Ed Green


Ed Green wrote:
> Dear Peter,
> 
> These two are in fact bugs that I will fix. The first results because of 
>  the presence of 'termination residues' that don't have residue numbers. 
> Their residue numbers, then, can't be compared numerically. Fortunately, 
> this bug won't result in wrong results as we want this comparison to 
> always be false anyway. The solution to this is to first check if either 
> of the termination residue signals are set and if so, don't do this 
> numerical comparison.
> 
> The second, blank line(s) at end of file will also be fixed.
> 
> Beware that there is, I think, a bug in your script. It appears that you 
> are attempting to iterate over all residues. However, iterating A:1 .. 
> A:max doesn't get it done because of the crazy way residues can be 
> numbered in PDB files: you'll miss all the residues with altloc codes 
> (A:27A, A:27B, A:27C, e.g.).
> 
> To make this easy an iterator is called for. It will just return all 
> 'real' residues for the pdb file or for a specified chain - I'll try to 
> get that done this weekend.
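The pitfall described above can be sketched in a few lines of standalone Perl. The residue labels are hypothetical, and this is not the residues() iterator that was later checked in (its interface is not shown in this thread); it only shows why a numeric 1..$max loop cannot reach labels with a letter suffix (in PDB terms, an insertion code) and one way to order such labels.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical residue labels for one chain: number plus optional letter.
my @labels = qw(28 27B 26 27 27A 29);

# Split a label into its numeric part and its (possibly empty) suffix.
sub split_label {
    my ($label) = @_;
    my ($num, $icode) = $label =~ m/^(-?\d+)([A-Za-z]?)$/
        or die "unexpected residue label '$label'";
    return ($num, $icode);
}

# Order by residue number first, then by suffix ('' sorts before 'A').
my @ordered = sort {
    my ($na, $ia) = split_label($a);
    my ($nb, $ib) = split_label($b);
    $na <=> $nb or $ia cmp $ib;
} @labels;

# A plain 1..$max loop only ever produces bare numbers,
# so labels such as 27A and 27B are never visited.
my %from_range = map { $_ => 1 } 1 .. 29;
my @missed = grep { !$from_range{$_} } @ordered;

print "ordered: @ordered\n";    # ordered: 26 27 27A 27B 28 29
print "missed:  @missed\n";     # missed:  27A 27B
```

Iterating over the labels actually present in the file, as an iterator would, avoids guessing which IDs exist.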
> 
> Regards,
> Ed Green
> 
> Robinson, Peter wrote:
> 
>> Dear BioPerlers,
>>
>> I am writing a script to use the BioPerl DSSP module to print out a 
>> list of phi and psi angles for all applicable residues  of all chains. 
>> Although the results are correct, I get the following error message at 
>> the end of each chain:
>>
>> Argument "" isn't numeric in numeric eq (==) at 
>> /usr/local/share/perl/5.8.4/Bio/Structure/SecStr/DSSP/Res.pm line 1168.
>>
>> and I am not quite sure where it is coming from. Perhaps I am using 
>> the wrong part of the API, but I am trying to get a list of all 
>> residues for each chain as follows:
>>
>> foreach my $ch (@chains) {
>>   my $ss_elements_pts = $dssp->secBounds($ch);
>>   print "Chain $ch:\n";
>>   my $pos = 0;
>>   my $max = 0;
>>   foreach my $stretch (@{$ss_elements_pts}) {
>>     my $start = $stretch->[0];
>>     my $end = $stretch->[1];
>>     if ($end =~ m/(\d+)/) { $end = $1; }
>>     if ($end > $max) { $max = $end; }
>>   }
>>   ## $max is now the last residue number in this chain
>>   for my $res (1..$max) {
>>     my $residueID = $res . ":" . $ch;
>>     my ($phi,$psi,$SS,$SSsum,$AA);
>>     eval { $phi = $dssp->resPhi($residueID);};
>>     etc.
>>
>> The full script is appended to the bottom of this mail.
>>
>>
>> I also noticed what might be a minor bug in the module DSSP/Res.pm; 
>> when I use dsspcmbi to analyze a PDB file, it produces a results file 
>> with an empty last line. This causes a crash:
>>
>> Use of uninitialized value in chomp at 
>> /usr/local/share/perl/5.8.4/Bio/Structure/SecStr/DSSP/Res.pm line 
>> 1284, <GEN1> line 955.
>>
>>
>>  If I manually remove this last empty line, there is no error. By 
>> adding the following line at Res.pm l.1284, you can fix the problem:
>>
>>
>>  while ( chomp( $cur = <$file> ) ) {
>>       next if ($cur =~ m/^\s*$/);  
>> *********************************************
>>     $res_num = substr( $cur, 0, 5 );
>>     $res_num =~ s/\s//g;
>>     $self->{ 'Res' }->[ $res_num ] = &_parseResLine( $cur );
>>     }
>> }
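A slightly more defensive version of the loop proposed above tests the read with defined() before calling chomp, so the loop ends at end of file without ever handing undef to chomp. It also avoids relying on chomp's return value (the number of characters removed), which is 0 for a final line that lacks a newline. This is a standalone sketch: the two sample lines are made up and far shorter than real DSSP records, and it only collects residue numbers instead of calling the module's internal _parseResLine().

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up residue lines followed by a trailing blank line.
my $data = <<'END';
    1    1 A M              0   0
    2    2 A K              0   0

END

open my $fh, '<', \$data or die "cannot open in-memory file: $!";

my @res_nums;
# defined() stops cleanly at end of file.
while ( defined( my $cur = <$fh> ) ) {
    chomp $cur;
    next if $cur =~ m/^\s*$/;    # skip empty or whitespace-only lines
    my $res_num = substr( $cur, 0, 5 );
    $res_num =~ s/\s//g;
    push @res_nums, $res_num;
}
close $fh;

print "@res_nums\n";    # prints "1 2"
```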
>>
>>
>>
>>
>> Thanks in advance for any tips! Peter
>> Peter N. Robinson, M.D.
>> Institute of Medical Genetics
>> Charité University Hospital
>> Augustenburger Platz 1
>> 13353 Berlin
>> Germany
>> ++49-30-450 569124
>> peter.robinson@charite.de
>> http://www.charite.de/ch/medgen/robinson
>> Beware of bugs in the above code; I have only proved it correct, not 
>> tried it. -Donald Knuth, computer scientist (1938- )
>>
>> ########################
>>
>> #!/usr/bin/perl -w
>> use IO::File;
>> use Bio::Structure::SecStr::DSSP::Res;
>> use Data::Dumper;
>>
>>
>> =pod
>> parseDSSP.pl
>> Script to parse the output of DSSP using the BioPerl module
>> Bio::Structure::SecStr::DSSP::Res. To use it, process a PDB
>> file with dssp or dsspcmbi, and pass the resulting file to this 
>> script. For more information on dssp and BioPerl see the
>> module documentation at http://bioperl.org
>>
>> @email peter.robinson@charite.de
>> 21 January, 2005
>>
>> =cut
>>
>>
>>
>> my $file = shift(@ARGV) || "pdb43ca.dssp";   # DSSP file from the command line
>> my $dssp = Bio::Structure::SecStr::DSSP::Res->new( -file => $file );
>>
>> my $pdbID = $dssp->pdbID();
>> my $auth  = $dssp->pdbAuthor();
>> my $cmpd = $dssp->pdbCompound();
>> my $pdb_date = $dssp->pdbDate();
>> my $header = $dssp->pdbHeader();
>> my $pdbSource = $dssp->pdbSource();
>>
>> print "PDB entry $pdbID \n\tauthor:\t$auth",
>>   "\n\tCompound:\t$cmpd",
>>   "\n\tDate:\t$pdb_date",
>>   "\n\tHeader:\t$header",
>>   "\n\tsource:\t$pdbSource\n\n";
>>
>> my $totalRes = $dssp->numResidues();
>> print "Total residue count (all chains): $totalRes\n";
>>
>>
>> my $surArea= $dssp->totSurfArea();
>> print "Total accessible surface area:\t$surArea  (square Ang)\n";
>>
>>
>> my $chainRef = $dssp->chains();
>> my @chains = sort @{$chainRef};
>> print "Chain[s]:\n";
>> foreach my $ch (@chains) {
>>   print "\t$ch";
>> }
>> print "\n";
>>
>> my $hb = $dssp->hBonds();
>> print "H BONDS.\n";
>> print "TYPE O(I)-->H-N(J): $hb->[0]\n",
>>   "IN PARALLEL BRIDGES: $hb->[1]\n",
>>   "IN ANTIPARALLEL BRIDGES: $hb->[2]\n",
>>   "TYPE O(I)-->H-N(I-5): $hb->[3]\n",
>>   "TYPE O(I)-->H-N(I-4): $hb->[4]\n",
>>   "TYPE O(I)-->H-N(I-3): $hb->[5]\n",
>>   "TYPE O(I)-->H-N(I-2): $hb->[6]\n",
>>   "TYPE O(I)-->H-N(I-1): $hb->[7]\n",
>>   "TYPE O(I)-->H-N(I+0): $hb->[8]\n",
>>   "TYPE O(I)-->H-N(I+1): $hb->[9]\n",
>>   "TYPE O(I)-->H-N(I+2): $hb->[10]\n",
>>   "TYPE O(I)-->H-N(I+3): $hb->[11]\n",
>>   "TYPE O(I)-->H-N(I+4): $hb->[12]\n",
>>   "TYPE O(I)-->H-N(I+5): $hb->[13]\n",
>>   "\n";
>>
>>     
>> foreach my $ch (@chains) {
>>   my $ss_elements_pts = $dssp->secBounds($ch);
>>   print "Chain $ch:\n";
>>   my $max = 0;
>>   foreach my $stretch (@{$ss_elements_pts}) {
>>     my $end = $stretch->[1];
>>     if ($end =~ m/(\d+)/) { $end = $1; }
>>     if ($end > $max) { $max = $end; }
>>   }
>>   ## $max is now the last residue in this chain
>>   for my $res (1..$max) {
>>     my $residueID = $res . ":" . $ch;
>>     my ($phi,$psi,$SS,$SSsum,$AA);
>>     eval { $phi = $dssp->resPhi($residueID);};
>>     eval { $psi = $dssp->resPsi($residueID);};
>>     eval { $SS = $dssp->resSecStr($residueID);};
>>     eval { $SSsum = $dssp->resSecStrSum($residueID);};
>>     $AA = $dssp->resAA($residueID);
>>     $phi = $phi || "n/a";
>>     $psi = $psi || "n/a";
>>     $SS = $SS || "-";
>>     $SSsum = "-" unless defined $SSsum;   # avoid undef warnings under -w
>>     my $SSclass;
>>     if ($SSsum eq "H") { $SSclass = "helix"; }
>>     elsif ($SSsum eq "T") { $SSclass = "turn"; }
>>     elsif ($SSsum eq "B") { $SSclass = "beta"; }
>>     else { $SSclass = $SSsum; }
>>     print "$residueID) [$AA] phi:$phi psi:$psi SecStruct: $SS ($SSclass)\n";
>>   }
>> }
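The if/elsif chain that turns the secondary-structure summary code into a class name can also be written as a table lookup, which keeps the mapping in one place. The sketch below uses a hypothetical helper `ss_class` with the same three codes the script handles (H, T, B); it does not assume anything about which other codes resSecStrSum() may return, and simply passes unknown codes through. An undefined code becomes "-" so that printing it does not warn.

```perl
use strict;
use warnings;

# Same mapping as the script's if/elsif chain, kept in a hash.
my %SS_CLASS = ( H => 'helix', T => 'turn', B => 'beta' );

# Hypothetical helper: map a summary code to a readable class name.
# Unknown codes pass through unchanged; undef becomes "-".
sub ss_class {
    my ($sum) = @_;
    return '-' unless defined $sum;
    return exists $SS_CLASS{$sum} ? $SS_CLASS{$sum} : $sum;
}

print join( ' ', map { ss_class($_) } 'H', 'T', 'B', 'X', undef ), "\n";
```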
>>
>>
>>
>>
>>
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l@portal.open-bio.org
>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
From hlapp at gnf.org  Mon Jan 31 18:34:29 2005
From: hlapp at gnf.org (Hilmar Lapp)
Date: Mon Jan 31 18:30:52 2005
Subject: [Bioperl-l] Re: [Bioperl-guts-l] [Bug 1742] New: GFF parser messes
	attributes (fwd)
In-Reply-To: <Pine.LNX.4.58.0501311424530.17005@sumo.ctrl.ucla.edu>
References: <Pine.LNX.4.58.0501311424530.17005@sumo.ctrl.ucla.edu>
Message-ID: <A6DFCF98-73E0-11D9-9995-000A95AE92B0@gnf.org>

This is coming from SeqFeature::Annotated, right? What's wrong with 
making SeqFeature::Annotated return a tag's value as a string as 
demanded by the contract instead of returning an object?

IMNSHO stringification overload band-aids rather than fixes the 
problem, and also introduces a trip wire that sooner or later will be 
triggered by someone unsuspecting. I.e., it makes the code more 
brittle, not more robust. You won't like me for this, but I do think 
it's the wrong strategy.

	-hilmar

On Jan 31, 2005, at 2:27 PM, Allen Day wrote:

> Here's the stringification problem being discussed in another thread. It
> came up in a 1.5 branch bug report. Objections to putting the
> stringification overload back?
>
> -Allen
>
> ---------- Forwarded message ----------
> Date: Mon, 31 Jan 2005 15:24:19 -0500
> From: bugzilla-daemon@portal.open-bio.org
> To: bioperl-guts-l@bioperl.org
> Subject: [Bioperl-guts-l] [Bug 1742] New: GFF parser messes attributes
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=1742
>
>            Summary: GFF parser messes attributes
>            Product: Bioperl
>            Version: 1.5 branch
>           Platform: Macintosh
>         OS/Version: MacOS X
>             Status: NEW
>           Severity: major
>           Priority: P2
>          Component: Core Components
>         AssignedTo: bioperl-guts-l@bioperl.org
>         ReportedBy: jldai@yahoo.com
>
>
> In BioPerl 1.5.0, I use Bio::Tools::GFF and Bio::SeqIO to parse the GFF
> string:
>
> 8255763	tigrscan	final-exon	67	558	56.8	-	2	transgrp "1001";
>
> into a Bio::SeqIO object and later print it out as an EMBL file, resulting
> in:
>
> FT   final-exon      complement(67..558)
> FT                   /transgrp="Bio::Annotation::SimpleValue=HASH(0x93a5d8)"
> FT                   /note="score=56.8"
> FT                   /note="frame=2"
>
> The value of tag "transgrp" should be 1001.
>
> Same script worked fine in BioPerl-1.4
>
>
>
> ------- You are receiving this mail because: -------
> You are the assignee for the bug, or are watching the assignee.
> _______________________________________________
> Bioperl-guts-l mailing list
> Bioperl-guts-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-guts-l
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------
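For reference, here is what the expected result for the bug report's GFF line looks like when the line is split by hand. This is a simplified illustration, not Bio::Tools::GFF's own parser: the helper `parse_gff2_line` is hypothetical, and it assumes nine tab-separated columns with `tag "value"` pairs separated by semicolons in the attribute column and no escaped quotes. It shows that the transgrp tag should come out as the plain string 1001.

```perl
use strict;
use warnings;

# Hypothetical helper: split a GFF2 line into its nine columns and
# collect tag/value pairs from the attribute column (simplified syntax).
sub parse_gff2_line {
    my ($line) = @_;
    chomp $line;
    my @cols = split /\t/, $line, 9;
    die "expected 9 columns, got " . scalar(@cols) . "\n" unless @cols == 9;
    my %attr;
    foreach my $pair ( split /\s*;\s*/, $cols[8] ) {
        next unless $pair =~ m/^\s*(\S+)\s+"?([^"]*)"?\s*$/;
        push @{ $attr{$1} }, $2;    # a tag may occur more than once
    }
    return ( \@cols, \%attr );
}

# The line from the bug report, rebuilt with explicit tabs
my $gff = join "\t", qw(8255763 tigrscan final-exon 67 558 56.8 - 2), 'transgrp "1001";';
my ( $cols, $attr ) = parse_gff2_line($gff);
my $transgrp = $attr->{transgrp}[0];
print "$cols->[2] $cols->[3]..$cols->[4] transgrp=$transgrp\n";
```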

From allenday at ucla.edu  Mon Jan 31 19:56:21 2005
From: allenday at ucla.edu (Allen Day)
Date: Mon Jan 31 19:52:11 2005
Subject: [Bioperl-l] Re: [Bioperl-guts-l] [Bug 1742] New: GFF parser messes
 attributes (fwd)
In-Reply-To: <A6DFCF98-73E0-11D9-9995-000A95AE92B0@gnf.org>
References: <Pine.LNX.4.58.0501311424530.17005@sumo.ctrl.ucla.edu>
	<A6DFCF98-73E0-11D9-9995-000A95AE92B0@gnf.org>
Message-ID: <Pine.LNX.4.58.0501311648070.17005@sumo.ctrl.ucla.edu>

This bug isn't coming from SeqFeature::Annotated; it comes from the refactor
of SeqFeatureI to inherit from AnnotatableI. This means get_tag_values() and
similar methods now get and set attributes in a Bio::Annotation::Collection
store. Simple strings passed in are turned into objects when added to the
store, and are given back in object form.

It was never specified in Bio::SeqFeatureI, even before refactoring, that
the returned values of annotation tags should be strings.  Code using the
interface just assumed this was the case, and it wasn't a bad assumption
given that Bio::SeqFeature::Generic was the only instantiable class and
did use strings as values rather than objects.
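The EMBL output in the bug report, `/transgrp="Bio::Annotation::SimpleValue=HASH(0x93a5d8)"`, is Perl's default stringification of a blessed hash reference. Until the interface question is settled, calling code could normalize whatever get_tag_values() returns. The sketch below is a caller-side workaround, not a BioPerl API: `My::SimpleValue` is a hypothetical stand-in for an annotation value class (Bio::Annotation::SimpleValue does provide a value() method), and `tag_values_as_strings` is a hypothetical helper.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Hypothetical stand-in for an annotation value object.
package My::SimpleValue;
sub new   { my ( $class, $v ) = @_; return bless { value => $v }, $class; }
sub value { return $_[0]->{value}; }

package main;

# Hypothetical helper: objects with a value() method are unwrapped,
# plain scalars pass through unchanged.
sub tag_values_as_strings {
    return map { ( blessed($_) && $_->can('value') ) ? $_->value : $_ } @_;
}

my $obj = My::SimpleValue->new('1001');
print "$obj\n";    # default stringification, e.g. My::SimpleValue=HASH(0x...)
print join( ',', tag_values_as_strings( $obj, 'plain' ) ), "\n";    # prints 1001,plain
```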

-Allen


From hlapp at gnf.org  Mon Jan 31 20:18:23 2005
From: hlapp at gnf.org (Hilmar Lapp)
Date: Mon Jan 31 20:14:49 2005
Subject: [Bioperl-l] Re: [Bioperl-guts-l] [Bug 1742] New: GFF parser messes
	attributes (fwd)
In-Reply-To: <Pine.LNX.4.58.0501311648070.17005@sumo.ctrl.ucla.edu>
Message-ID: <2A70A6B0-73EF-11D9-860C-000A959EB4C4@gnf.org>


On Monday, January 31, 2005, at 04:56  PM, Allen Day wrote:

> This bug isn't coming from SeqFeature::Annotated, it's from the refactor
> of SeqFeatureI to inherit from AnnotatableI.  Meaning get_tag_values() and
> similar functions are now get/setting attributes to a
> Bio::Annotation::Collection store.  Simple strings passed in are turned into
> objects when added to the store, and given back in object form.
>
> It was never specified in Bio::SeqFeatureI, even before refactoring, that
> the returned values of annotation tags should be strings.  Code using the
> interface just assumed this was the case, and it wasn't a bad assumption
> given that Bio::SeqFeature::Generic was the only instantiable class and
> did use strings as values rather than objects.

If it wasn't a bad assumption and if everybody made that assumption, 
what's so great about breaking that?

What SeqFeatureI stated was:

  Title   : get_tag_values
  Usage   : @values = $self->get_tag_values('some_tag')
  Function:
  Returns : An array comprising the values of the specified tag.
  Args    : a string

So you might say the term 'values' does not say it must be a string, 
yet in the synopsis that's exactly how the method is demonstrated.

I think it's fair to say that, implicitly through the usage pattern, the
contract has become that you have to return a string here. Breaking this
so as to return objects may be a great idea and a great change, but only
in bioperl-2.0.

Otherwise, you demand that all current and future Bio::AnnotationI 
implementations are properly stringification-overloaded, and that 
people are perfectly aware that $annvalue and "$annvalue" are two very 
different things.
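To make the $annvalue versus "$annvalue" point concrete, here is a small sketch with a hypothetical class `My::OverloadedValue` that overloads stringification. Interpolation and `eq` see the plain string, but the value itself is still a blessed reference, so any code that inspects it with ref() or blessed() behaves differently depending on whether it was quoted first.

```perl
use strict;
use warnings;

# Hypothetical value class with a stringification overload;
# fallback => 1 lets eq and friends use the stringified form.
package My::OverloadedValue;
use overload '""' => sub { $_[0]->{value} }, fallback => 1;
sub new { my ( $class, $v ) = @_; return bless { value => $v }, $class; }

package main;

my $v = My::OverloadedValue->new('1001');
print "interpolated: $v\n";                                     # interpolated: 1001
print "ref(\$v): ", ref($v), "\n";                              # ref($v): My::OverloadedValue
print "ref(\"\$v\"): '", ref("$v"), "'\n";                      # ref("$v"): ''
print "eq test: ", ( $v eq '1001' ? 'true' : 'false' ), "\n";  # eq test: true
```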


	-hilmar

-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------