From elia at tigem.it  Tue Mar  1 06:51:16 2005
From: elia at tigem.it (Elia Stupka)
Date: Tue Mar  1 07:51:08 2005
Subject: [Bioperl-l] Proposal for bio-perl updates: ACE assembly file
In-Reply-To: <200502281005.06990.jswanson@iastate.edu>
References: <200502141205.52256.jswanson@iastate.edu>
	<200502281005.06990.jswanson@iastate.edu>
Message-ID: <4f12c65ac919697fd8a7e9220db182fd@tigem.it>

Hi Jordan,

I have been doing some work on Bio::Assembly::Contig myself recently,
and have also been in touch with the author (Robson) about it. Perhaps
the best thing would be for the three of us to have a chat about this
object, try to revamp it a little with our improvements, and then
Robson or I can check it in?

regards,

Elia

On Feb 28, 2005, at 5:05 PM, Jordan Swanson wrote:

> On Monday 14 February 2005 12:05 pm, Jordan Swanson wrote:
>> Hi,
>> I am new to bioperl, but I have a proposal for updating bioperl with
>> some of the code I have been using.
>>
>> Bioperl packages currently exist that open ACE assembly files (output
>> by phrap/cap3 and other assembly programs).  However, the current
>> code brings in the entire file in one call:
>>
>> my $assembly_in =
>> 	 Bio::Assembly::IO->new(-file=>"input.ace",
>> 						-format=>'ace');
>>
>> my $assembly = $assembly_in->next_assembly;
>>
>> I am working on a large EST assembly project (roughly 150K) and our
>> assembly files have been around 200 MB in size.  For many of our
>> applications, we only need to process one contig at a time, not to
>> mention that reading the entire assembly at once requires a large
>> amount of memory and/or disc space.
>>
>> I have developed some code that reads in contigs one at a time,
>> therefore using only the amount of space needed for one contig
>> object. A brief synopsis:
>>
>> my $contig_in = ContigIO->new(-file=>$filename, -format=>'ace');
>> while( my $contig = $contig_in->next_contig)
>> {
>> 	do_stuff_with_contig($contig);
>> }
>>
>> Furthermore, there is currently no code that writes out ACE files or
>> reverses a contig's orientation.  I have developed some code that
>> implements both, and if you would have it, I would like to submit this
>> code.  I have been working on converting this code to a more
>> bioperl-friendly format (inheriting from bioseq objects, using the
>> bioperl IO system, bioperl-style warnings and so forth).
>>
>> I would appreciate some advice on how to proceed, specifically on
>> inheriting from the correct classes and avoiding duplication of code.
>> My initial thoughts:
>>
>> * Pull the parsing code out of Assembly::IO::ace.pm and into a new
>>   ContigIO::ace.pm (possibly inherited from AlignIO, since the contig
>>   object is an AssemblyI)
>> * Alter Assembly::IO::ace.pm to use ContigIO.pm to load each entire
>>   contig, and to output the assembly
>> * Incorporate my reverse_contig function somewhere (it is like revcom
>>   for Bio::SeqI, so possibly in the ContigI.pm file)
>>
>> Thoughts?
>
> I've gone ahead and incorporated my changes into bioperl-compliant
> objects.
>
> * Bio/Assembly/ContigIO.pm created
> * Bio/Assembly/ContigIO directory created
> * Bio/Assembly/ContigIO/ace.pm created
> * Bio/Assembly/IO/ace.pm modified to use Bio::Assembly::Contig
> * Bio/Assembly/Contig.pm modified to allow base segments and to add a
>   revcom method
> * t/ContigIO.t created
>
> How does one submit code for inspection/review/incorporation?  I used
> cvs to check out the code I've been using, but "cvs add" is not
> working at my permission level.
>
>
>
>
> -- 
> Jordan M Swanson
> Department of Ecology, Evolution, and Organismal Biology
> Iowa State University
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
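The contig-at-a-time reading proposed in the quoted message can be sketched without BioPerl. This is only an illustration of the streaming idea, not Jordan's ContigIO code: the names contig_iterator and _start_contig are hypothetical, and the sketch assumes only that each contig in an ACE file starts with a "CO" line carrying the contig name, base count and read count.

```perl
use strict;
use warnings;

# Return an iterator (a closure) that yields one contig record per call.
# Each record is a hash ref holding the contig name, two counts taken
# from its CO line, and the raw lines that follow it. Only one contig
# is held in memory at a time.
sub contig_iterator {
    my ($fh) = @_;
    my $pending;    # a CO line already read that starts the next contig
    return sub {
        my $contig;
        if (defined $pending) {
            $contig = _start_contig($pending);
            undef $pending;
        }
        while (defined(my $line = <$fh>)) {
            if ($line =~ /^CO\s/) {
                if ($contig) {    # next contig begins: hand back the current one
                    $pending = $line;
                    return $contig;
                }
                $contig = _start_contig($line);
            }
            elsif ($contig) {
                push @{ $contig->{lines} }, $line;
            }
        }
        return $contig;    # the last contig, or undef at end of file
    };
}

# Parse "CO <name> <bases> <reads> ..." into a fresh record.
sub _start_contig {
    my ($co_line) = @_;
    my (undef, $name, $n_bases, $n_reads) = split ' ', $co_line;
    return { name => $name, bases => $n_bases, reads => $n_reads, lines => [] };
}

# Tiny made-up ACE fragment, read from memory instead of a 200 MB file.
my $ace = "AS 2 3\n\nCO Contig1 8 2 1 U\nACGTACGT\n\nCO Contig2 4 1 1 C\nTTGA\n";
open my $fh, '<', \$ace or die "cannot open in-memory file: $!";
my $next_contig = contig_iterator($fh);
while (my $contig = $next_contig->()) {
    print "$contig->{name}: $contig->{bases} bases, $contig->{reads} reads\n";
}
```

The caller's loop has the same shape as the next_contig synopsis above: memory use is bounded by the largest single contig rather than by the whole assembly.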
---
Telethon Institute of Genetics and Medicine
Via Pietro Castellino, 111
80131 Napoli

Tel. +39 081 6132 335
Fax. +39 081 560 98 77

From jswanson at iastate.edu  Tue Mar  1 10:13:23 2005
From: jswanson at iastate.edu (Jordan Swanson)
Date: Tue Mar  1 10:13:34 2005
Subject: [Bioperl-l] Proposal for bio-perl updates: ACE assembly file
In-Reply-To: <4f12c65ac919697fd8a7e9220db182fd@tigem.it>
References: <200502141205.52256.jswanson@iastate.edu>
	<200502281005.06990.jswanson@iastate.edu>
	<4f12c65ac919697fd8a7e9220db182fd@tigem.it>
Message-ID: <200503010913.23399.jswanson@iastate.edu>

On Tuesday 01 March 2005 05:51 am, Elia Stupka wrote:
> Hi Jordan,
>
> I have been doing some work on Bio::Assembly::Contig myself recently, and
> have also been in touch with the author (Robson) about it. Perhaps the
> best thing would be for the three of us to have a chat about this
> object, try to revamp it a little with our improvements, and then
> Robson or I can check it in?

Good idea. You and Robson can expect a copy of my changes very soon.


-- 
Jordan M Swanson   
Department of Ecology, Evolution, and Organismal Biology 
431 Bessey Hall 
Iowa State University 
Ames, IA 50011 
Lab 515 294-7098 
FAX: 515-294-1337 
From s_waechter at gmx.net  Tue Mar  1 11:05:12 2005
From: s_waechter at gmx.net (Stefan Wächter)
Date: Tue Mar  1 11:00:29 2005
Subject: [Bioperl-l] which one and how to configure(blastall)
In-Reply-To: <42234138.4020903@csit.fsu.edu>
References: <42234138.4020903@csit.fsu.edu>
Message-ID: <422492B8.8060905@gmx.net>

Hi Yanfeng,

Try this (I am assuming that your BLAST installation is in
/home/yanfeng/blast-2.2.10-amd64-linux/blast-2.2.10).

Create a file named .ncbirc (don't forget the leading dot) in
/home/yanfeng and write the following in this file:

    [NCBI]
    DATA="/home/yanfeng/blast-2.2.10-amd64-linux/blast-2.2.10/data"

Save the file.

In the data dir, BLAST will find the BLOSUM tables.

In your BLAST installation dir you will also find a db dir. That's a
good place to store your BLAST databases.

Set the environment variables in your .bashrc (or .profile, .cshrc, ...,
depending on your shell). I know it's trivial, but ...

Something like this for bash:

BLASTDIR=/home/yanfeng/blast-2.2.10-amd64-linux/blast-2.2.10
export BLASTDIR

BLASTDB=/home/yanfeng/blast-2.2.10-amd64-linux/blast-2.2.10/db
export BLASTDB


Additionally, it is a good idea to add
/home/yanfeng/blast-2.2.10-amd64-linux/blast-2.2.10 to your PATH variable.

As a last step you have to install one of the NCBI databases in the
BLASTDB dir or create one with formatdb.
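
Before calling StandAloneBlast it can help to check from Perl that these settings are actually visible to the script. The following is a minimal sketch, not part of BioPerl: the function name blast_env_problems is hypothetical, and it assumes blastall sits either directly in BLASTDIR or in a bin subdirectory of it, which depends on how the tarball unpacks.

```perl
use strict;
use warnings;

# Return a list of human-readable problems with the BLAST-related
# settings in the given environment hash (normally \%ENV).
sub blast_env_problems {
    my ($env) = @_;
    my @problems;
    for my $var (qw(BLASTDIR BLASTDB)) {
        my $dir = $env->{$var};
        if (!defined $dir || $dir eq '') {
            push @problems, "$var is not set";
        }
        elsif (!-d $dir) {
            push @problems, "$var points to '$dir', which is not a directory";
        }
    }
    # Assumption: blastall lives in BLASTDIR itself or in BLASTDIR/bin.
    my $dir = $env->{BLASTDIR};
    if (defined $dir && -d $dir
        && !-x "$dir/blastall" && !-x "$dir/bin/blastall") {
        push @problems, "no executable blastall found under $dir";
    }
    return @problems;
}

# Report anything suspicious in the current environment.
print "$_\n" for blast_env_problems(\%ENV);
```

If this prints nothing, both variables point at existing directories and a blastall executable was found; a message such as "BLASTDIR is not set" means the shell configuration has not been picked up by the process running the script.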

Hope I could help

Cheers
Stefan


yanfeng wrote:

> Hi, Sorry to bother you again.
> I want to download the blast program now.
> I want to run blast and get a blast report.
> I do not know which one I should install and how to configure it (is
> that like "export BLASTDIR=/data1/blast/"?)
> I use
> BEGIN {
> $ENV{'BLASTDIR'} =
> '/home/yanfeng/blast-2.2.10-amd64-linux/blast-2.2.10/';
> }
>
> but it does not work.
>
> blast-2.2.10-amd64-linux.tar.gz 
> <ftp://ftp.ncbi.nlm.nih.gov/blast/executables/release/2.2.10/blast-2.2.10-amd64-linux.tar.gz> 
>
> blast-2.2.10-ia32-linux.tar.gz 
> <ftp://ftp.ncbi.nlm.nih.gov/blast/executables/release/2.2.10/blast-2.2.10-ia32-linux.tar.gz> 
>
> blast-2.2.10-ia64-linux.tar.gz 
> <ftp://ftp.ncbi.nlm.nih.gov/blast/executables/release/2.2.10/blast-2.2.10-ia64-linux.tar.gz> 
>
>
>
> My perl script
> use Bio::SeqIO;
> use Bio::Seq;
> use Bio::Tools::Run::StandAloneBlast;
> # read the query sequence from a FASTA file
> $seqio_obj = Bio::SeqIO->new(-file => 'mun_lab.fasta',
>                           -format => 'fasta' );
> $seq_obj = $seqio_obj->next_seq;
> #print $seq_obj->seq,"\n";
> @params = (program  => 'blastn',
>         database => 'db.fa' );
> $blast_obj = Bio::Tools::Run::StandAloneBlast->new(@params);
> $report_obj = $blast_obj->blastall($seq_obj);
> $result_obj = $report_obj->next_result;
> print $result_obj->num_hits;
>
>

From biolist at brinkman.mbb.sfu.ca  Wed Mar  2 13:36:43 2005
From: biolist at brinkman.mbb.sfu.ca (Matthew Laird)
Date: Wed Mar  2 13:31:44 2005
Subject: [Bioperl-l] blastall & StandAloneBlast
Message-ID: <Pine.LNX.4.44.0503021007050.8854-100000@c001>

Hi all,

I'm yet again being faced with a mysterious crash in blastall and Bioperl 
that has been occurring for the past year.  I'm receiving more reports from 
people around the world using our software also experiencing this problem, 
and the only answer I once received about the problem was, "That shouldn't 
be possible."

Anyhow, the error occurs when blastall is called from StandAloneBlast.pm: 
Blast returns with a -1 error code, which causes the Bioperl module to 
throw an exception.  We've had reports of this occurring on multiple Linux 
distributions as well as on Solaris and OS X.  But it doesn't happen on 
all machines, even if they're running the same distribution.

The crash output is below....

Fatal error:
------------- EXCEPTION  -------------
MSG: blastall call crashed: -1 /usr/local/blast/blastall -p  blastp  -d
"/usr/local/psort/conf/analysis/sclblast/gramneg/sclblast"  -i
/var/tmp/6m0QxSirC3  -e  1e-09  -o  /var/tmp/AKCDNMCTyo  -F  F

STACK Bio::Tools::Run::StandAloneBlast::_runblast
/usr/lib/perl5/site_perl/5.8.3/Bio/Tools/Run/StandAloneBlast.pm:751
STACK Bio::Tools::Run::StandAloneBlast::_generic_local_blast
/usr/lib/perl5/site_perl/5.8.3/Bio/Tools/Run/StandAloneBlast.pm:698
STACK Bio::Tools::Run::StandAloneBlast::blastall
/usr/lib/perl5/site_perl/5.8.3/Bio/Tools/Run/StandAloneBlast.pm:553
STACK Bio::Tools::Run::SCLBlast::blast
/usr/lib/perl5/site_perl/5.8.3/Bio/Tools/Run/SCLBlast.pm:135
STACK Bio::Tools::PSort::Module::SCLBlast::run
/usr/lib/perl5/site_perl/5.8.3/Bio/Tools/PSort/Module/SCLBlast.pm:72
STACK Bio::Tools::PSort::Pathway::__ANON__
/usr/lib/perl5/site_perl/5.8.3/Bio/Tools/PSort/Pathway.pm:194
STACK Bio::Tools::PSort::Pathway::traverse
/usr/lib/perl5/site_perl/5.8.3/Bio/Tools/PSort/Pathway.pm:157
STACK Bio::Tools::PSort::classify
/usr/lib/perl5/site_perl/5.8.3/Bio/Tools/PSort.pm:160
STACK (eval) /usr/local/psort/bin/psort:320
STACK toplevel /usr/local/psort/bin/psort:320

--------------------------------------

The line in StandAloneBlast.pm we track the problem back to is:
$self->throw("$executable call crashed: $? $commandstring\n")unless 
($status==0) ;

Odd thing is, Blast DOES run.  If one comments out this line in 
StandAloneBlast.pm, the execution succeeds perfectly fine.  When I've 
edited the error message being thrown to give more details, perl says the 
error is related to a process not being able to be created, which is even 
weirder.

So, for some odd reason either blastall is passing back this -1 or perl is 
giving it back to bioperl for whatever reason.  The only advice we have 
for people is to comment out this line in StandAloneBlast.pm.
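
For reference, Perl documents a return value of -1 from system() as a failure to start the program or an error from the wait call, with the reason in $!. The documentation for wait also notes that -1 can appear when child processes are being reaped automatically (for example with $SIG{CHLD} set to 'IGNORE'), in which case the command itself may well have run. Whether that is what happens on the affected machines is not established here. The sketch below, with a hypothetical helper named describe_status, only shows how the raw status can be decoded so that the three cases are told apart.

```perl
use strict;
use warnings;

# Turn the raw wait status from system() into a readable description.
# $status is the value of $? and $errno the text of $!, both captured
# immediately after the call.
sub describe_status {
    my ($status, $errno) = @_;
    return "ok" if $status == 0;
    if ($status == -1) {
        return "could not start or wait for the child: $errno";
    }
    if (my $sig = $status & 127) {
        return "killed by signal $sig";
    }
    return "exited with code " . ($status >> 8);
}

# Example: run a child (perl itself, via $^X) that exits with code 3,
# then decode the outcome.
system($^X, '-e', 'exit 3');
my ($status, $errno) = ($?, "$!");
print describe_status($status, $errno), "\n";
```

Reporting the decoded status together with $! in the exception, rather than the bare number, would show directly which of these cases a user is hitting.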

Anyone have any thoughts or advice on where this problem is coming from?

Thanks.

-- 
Matthew Laird
SysAdmin/Developer, Brinkman Laboratory, MBB Dept.
Simon Fraser University


From dcj at sanger.ac.uk  Wed Mar  2 08:27:55 2005
From: dcj at sanger.ac.uk (Daniel Jeffares)
Date: Wed Mar  2 14:16:18 2005
Subject: [Bioperl-l] Bio::LiveSeq::Transcript query from new bioperl user
Message-ID: <E2D86D4A-8B1E-11D9-BF4A-000D933C5076@sanger.ac.uk>

This is a request for help from a *very* new bioperl user. I'm also 
pretty new to perl....

I want to use the Bio::LiveSeq::Transcript->new method to make a 
transcript object from an .embl file.

I then want to use the $frame = $transcript->frame($label) method so 
that I can trim sub-sections of the transcript to include only complete 
codons.

So, in other words, I want to collect subsets of a transcript 
(coordinates that I have defined earlier), and then trim those 
coordinates to the nearest complete codons, and then get the sequence of 
those coordinates.

____________________________
Daniel Jeffares
Wellcome Trust Sanger Institute
Wellcome Trust Genome Campus
Hinxton, Cambridge, CB10 1SA, UK
Phone: +44(0)1223 834244 x 7297
____________________________

From elia at tigem.it  Wed Mar  2 14:46:30 2005
From: elia at tigem.it (Elia Stupka)
Date: Wed Mar  2 14:41:25 2005
Subject: [Bioperl-l] blastall & StandAloneBlast
In-Reply-To: <Pine.LNX.4.44.0503021007050.8854-100000@c001>
References: <Pine.LNX.4.44.0503021007050.8854-100000@c001>
Message-ID: <fa613b51d577b305bef061424eb7b1f6@tigem.it>

Could you send the output you get when you comment out the throw and print 
out the errors you mention below?

Elia

On 2 Mar 2005, at 19:36, Matthew Laird wrote:

> When I've edited the error message being thrown to give more details,
> perl says the error is related to a process not being able to be
> created, which is even weirder.

From lairdm at sfu.ca  Wed Mar  2 14:51:58 2005
From: lairdm at sfu.ca (Matthew Laird)
Date: Wed Mar  2 14:46:44 2005
Subject: [Bioperl-l] blastall & StandAloneBlast
In-Reply-To: <fa613b51d577b305bef061424eb7b1f6@tigem.it>
Message-ID: <Pine.LNX.4.44.0503021151010.8854-100000@c001>

When I comment out the throw there is no output because the program 
executes correctly.  blastall runs and returns the results through 
bioperl.  That's the mysterious part of this.

On Wed, 2 Mar 2005, Elia Stupka wrote:

> Could you send the output you get when you comment the throw and print 
> out the errors you mention below?
> 
> Elia
> 
> On 2 Mar 2005, at 19:36, Matthew Laird wrote:
> 
> > When I've edited the error message being thrown to give more details,
> > perl says the error is related to a process not being able to be
> > created, which is even weirder.
> 
> 

-- 
Matthew Laird
SysAdmin/Developer, Brinkman Laboratory, MBB Dept.
Simon Fraser University


From elia at tigem.it  Wed Mar  2 15:03:28 2005
From: elia at tigem.it (Elia Stupka)
Date: Wed Mar  2 14:58:20 2005
Subject: [Bioperl-l] blastall & StandAloneBlast
In-Reply-To: <Pine.LNX.4.44.0503021151010.8854-100000@c001>
References: <Pine.LNX.4.44.0503021151010.8854-100000@c001>
Message-ID: <37c1c72203e2dcd41eb3cacae33146e7@tigem.it>

Sorry, I worded my question badly. I meant the part where you mentioned 
that printing more details about the error gave you something weird 
about a process not being able to be created:

> edited the error message being thrown to give more details, perl says
> the error is related to a process not being able to be created, which
> is even weirder.
>

Elia

From lairdm at sfu.ca  Wed Mar  2 18:56:07 2005
From: lairdm at sfu.ca (Matthew Laird)
Date: Wed Mar  2 18:51:13 2005
Subject: [Bioperl-l] blastall & StandAloneBlast
In-Reply-To: <37c1c72203e2dcd41eb3cacae33146e7@tigem.it>
Message-ID: <Pine.LNX.4.44.0503021554140.8854-100000@c001>

Alas no.  I no longer have any machines around I had to do the hack on.  I 
just tried to install it on two other machines and it sadly ran fine.... 
I'm reluctant to harass any of the users who have emailed us and ask them 
to intentionally "break" their install (by uncommenting that line) to help 
us test this.

Anyhow, I had just added $! to the error message and it said something 
along the lines of "Process can not be created."

On Wed, 2 Mar 2005, Elia Stupka wrote:

> Sorry, wrote my answer badly, I meant when you mentioned that printing 
> more details about the error it gave you something weird about process 
> not being able to be created:
> 
> > edited the error message being thrown to give more details, perl says
> > the error is related to a process not being able to be created, which
> > is even weirder.
> >
> 
> Elia
> 
> 

-- 
Matthew Laird
SysAdmin/Developer, Brinkman Laboratory, MBB Dept.
Simon Fraser University


From heikki at nildram.co.uk  Thu Mar  3 02:50:04 2005
From: heikki at nildram.co.uk (Heikki Lehvaslaiho)
Date: Thu Mar  3 02:45:10 2005
Subject: [Bioperl-l] Bio::LiveSeq::Transcript query from new bioperl user
In-Reply-To: <E2D86D4A-8B1E-11D9-BF4A-000D933C5076@sanger.ac.uk>
References: <E2D86D4A-8B1E-11D9-BF4A-000D933C5076@sanger.ac.uk>
Message-ID: <200503030750.04401.heikki@nildram.co.uk>

Daniel,

While LiveSeq can be used for this, there is quite a lot of overhead in 
creating those objects. If you are going to apply this in a high-throughput
pipeline, I recommend you retrieve the CDS feature from the standard 
SeqIO-produced sequence object and determine the frame yourself.
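
Working out the frame by hand comes down to modular arithmetic on the coordinates. The sketch below is one way to do it under stated assumptions: coordinates are 1-based and inclusive, the first codon starts at a known position ($cds_start), and "trimming" means shrinking the range inward to whole codons. The function name trim_to_codons is hypothetical and the sequence is made up.

```perl
use strict;
use warnings;

# Shrink the 1-based inclusive range [$start, $end] so that it begins on
# the first base of a codon and ends on the third base of a codon.
# $cds_start is the coordinate of the first base of the first codon.
# Returns ($new_start, $new_end), or an empty list if no complete codon fits.
sub trim_to_codons {
    my ($start, $end, $cds_start) = @_;
    my $lead      = ($start - $cds_start) % 3;      # bases into the current codon
    my $new_start = $start + (3 - $lead) % 3;       # move forward to a codon start
    my $tail      = ($end - $cds_start + 1) % 3;    # bases past the last full codon
    my $new_end   = $end - $tail;                   # move back to a codon end
    return $new_end > $new_start ? ($new_start, $new_end) : ();
}

# Example: a 12-base coding sequence whose first codon starts at position 1.
my $cds = 'ATGGCCAAGTAA';
if (my ($s, $e) = trim_to_codons(2, 10, 1)) {
    # substr is 0-based, so shift the start by one
    print "$s-$e ", substr($cds, $s - 1, $e - $s + 1), "\n";
}
```

For the range 2..10 this keeps positions 4..9, i.e. the two complete codons GCC and AAG. If expanding outward to the enclosing codons is wanted instead, the same remainders can be subtracted from the start and added to the end.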

Let me know in more detail what you want to do and I'll help you.

	-Heikki

On Wednesday 02 March 2005 13:27, Daniel Jeffares wrote:
> This is a request for help from a *very* new bioperl user. I'm also
> pretty new to perl....
>
> I want to use the Bio::LiveSeq::Transcript->new method to make a
> transcript object from an .embl file.
>
> I then want to use the $frame = $transcript->frame($label) method so
> that I can trim sub-sections of the transcript to include only complete
> codons.
>
> So, in other words, I want to collect subsets of a transcript
> (coordinates that I have defined earlier), and then trim those
> coordinates to the nearest complete codons, and then get the sequence
> of those coordinates.
>
> ____________________________
> Daniel Jeffares
> Wellcome Trust Sanger Institute
> Wellcome Trust Genome Campus
> Hinxton, Cambridge, CB10 1SA, UK
> Phone: +44(0)1223 834244 x 7297
> ____________________________

-- 
______ _/      _/_____________________________________________________
      _/      _/                      http://www.ebi.ac.uk/mutations/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_ebi _ac _uk
    _/_/_/_/_/  EMBL Outstation, European Bioinformatics Institute
   _/  _/  _/  Wellcome Trust Genome Campus, Hinxton
  _/  _/  _/  Cambridge, CB10 1SD, United Kingdom
     _/      Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________
From venancio at iq.usp.br  Thu Mar  3 08:11:27 2005
From: venancio at iq.usp.br (Thiago Motta Venancio)
Date: Thu Mar  3 09:21:00 2005
Subject: [Bioperl-l] GFF question
Message-ID: <42270CFF.1080502@iq.usp.br>

Hi folks.
I would like a more detailed explanation of how to construct 
GFF files from the outputs of several programs, such as genescan and 
repeatmasker.
Thanks in advance.
Thiago

-- 
Thiago Motta Venancio - PhD student in Bioinformatics


From jrm at compbio.dundee.ac.uk  Thu Mar  3 10:26:00 2005
From: jrm at compbio.dundee.ac.uk (Jon manning)
Date: Thu Mar  3 10:23:31 2005
Subject: [Bioperl-l] gap/ambiguous character only sequences: Bio::PrimarySeq
Message-ID: <1109863560.20641.154.camel@tick.compbio.dundee.ac.uk>

Hi All,

For a lot of the stuff I'm doing at the moment I'm chopping up
alignments and playing with the bits etc. I've had to nobble
Bio::PrimarySeq to allow the resulting gap-only sequences in
Bio::LocatableSeq- I understand the rationale behind this check, and
it's a useful default, but could we perhaps have an option to allow
tolerance instead? If such exists, I'd be grateful if someone could
point me in the right direction! 

Thanks,

Jon

From ak at ebi.ac.uk  Thu Mar  3 16:00:44 2005
From: ak at ebi.ac.uk (Andreas Kahari)
Date: Thu Mar  3 15:55:35 2005
Subject: [Bioperl-l] FYI: BioPerl port for OpenBSD
Message-ID: <20050303210044.GA8592@ebi.ac.uk>

List,

Just for your information, I noticed that a port of BioPerl
1.5.0 recently got committed to the OpenBSD ports tree as
"biology/bioperl".  So if there is anyone out there doing
bioinformatics on OpenBSD (I know only of myself), this might be
mildly interesting to investigate.

I haven't had the time to try the port out yet, and since I tend
to go with bioperl-live from CVS anyway it might take some time
before I do.

Something that might possibly be interesting to others is that
the port apparently patches the code to use Text::ParseWords in
place of Text::Shellwords.  The Text::ParseWords is part of the
standard Perl 5.8.6 installation on OpenBSD systems, so that
kinda makes sense, and it gets rid of a dependency.

OpenBSD users following the CURRENT development branch know
where to go if they are intrigued...


Cheers,
Andreas

ps: I don't have anything to do with this, really.


-- 
Andreas Kähäri
EMBL-EBI/ensembl

1024D/C2E163CB
From jason.stajich at duke.edu  Fri Mar  4 11:53:39 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Fri Mar  4 15:59:49 2005
Subject: [Bioperl-l] Re: [Gmod-gbrowse] Parser
In-Reply-To: <42289EC0.1020505@ime.usp.br>
References: <42289EC0.1020505@ime.usp.br>
Message-ID: <0803522c2b529e44030542e66b046c07@duke.edu>

Bio::SearchIO::psl will pretty much do this for you.

There is a search2gff script which may work out of the box or you may 
have to tweak a little to get the right fields to the right place.  It 
is in scripts/utilities/search2gff.PLS.  There had been some off-by-one 
errors a while ago with the SearchIO psl parser; I *think* that is 
fixed now.

I don't know that anyone has contributed a RepeatMasker parser to 
Bioperl.

-jason
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

On Mar 4, 2005, at 12:45 PM, Thiago Motta Venancio wrote:

> Hi folks.
> Anyone here knows where can i find parsers that build GFF?
> I have one parser for RepeatMasker output and one for Blast output.
> I need a parser that converts PSL (Blat) output to GFF.
> Thanks in advance.
> Thiago
>
> -- 
> Thiago Motta Venancio - PhD student in Bioinformatics
>
>
>
>
> _______________________________________________
> Gmod-gbrowse mailing list
> Gmod-gbrowse@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse
>
From jason.stajich at duke.edu  Fri Mar  4 16:06:35 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Fri Mar  4 16:04:55 2005
Subject: [Bioperl-l] Re: [Gmod-gbrowse] Parser
In-Reply-To: <4228A3ED.7020501@ime.usp.br>
References: <42289EC0.1020505@ime.usp.br>
	<0803522c2b529e44030542e66b046c07@duke.edu>
	<4228A3ED.7020501@ime.usp.br>
Message-ID: <a251afa4c0cf1d67fa27fc58dbb07bde@duke.edu>

You'll have to read up on the SearchIO system for it to make any sense.
http://bioperl.org/HOWTOs/SearchIO/index.html

The HSPs are the "features" which are written back out with the 
Bio::Tools::GFF module.

Most of the work is already done in the search2gff.PLS script for you 
-- there is a lot of code in there to handle asking for query or hit 
strand (you can only output one or the other) and filtering.

The code is in bioperl scripts/utilities directory or you can pull it 
down here:
http://bioperl.org/SRC/bioperl-live/scripts/utilities/search2gff.PLS

So you want to run it like this (argument order doesn't matter)
perl search2gff -o myout.gff -i myinput.psl -f psl
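
To make the target format concrete, here is a minimal sketch of what one output line looks like, written without BioPerl. It is not the search2gff code: the function name gff_line and all field values are made up for illustration, and it assumes only the standard nine tab-separated GFF columns (seqname, source, feature, start, end, score, strand, frame, attributes).

```perl
use strict;
use warnings;

# Build one tab-separated GFF line from a hash describing a match,
# for example one HSP reported against either the query or the hit.
sub gff_line {
    my (%f) = @_;
    my $strand = '.';
    if (defined $f{strand}) {
        $strand = '+' if $f{strand} > 0;
        $strand = '-' if $f{strand} < 0;
    }
    # GFF wants start <= end regardless of strand
    my ($start, $end) = sort { $a <=> $b } @f{qw(start end)};
    return join "\t",
        $f{seqname}, $f{source}, $f{feature},
        $start, $end,
        (defined $f{score} ? $f{score} : '.'),
        $strand,
        '.',    # frame: left unset for a similarity match
        (defined $f{attributes} ? $f{attributes} : '');
}

# Hypothetical values standing in for one parsed PSL alignment.
print gff_line(
    seqname    => 'chr1',
    source     => 'blat',
    feature    => 'similarity',
    start      => 1200,
    end        => 1450,
    score      => 245,
    strand     => -1,
    attributes => 'Target "est42"',
), "\n";
```

Each HSP contributes one such line for whichever side (query or hit) was requested, which is why only one of the two can be written per run.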

-jason
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

On Mar 4, 2005, at 1:07 PM, Thiago Motta Venancio wrote:

> Dear Jason.
> Thanks for replying to me.
> I saw the documentation of your package before writing to the list,
> but I did not understand how to use it.
> Sorry about my limited knowledge of Bioperl.
> Here is your code:
>
> use Bio::SearchIO;
>  my $parser = new Bio::SearchIO(-file   => 'file.psl',
>                                 -format => 'psl');
>  while( my $result = $parser->next_result ) {
>  }
>
>
> The question is where to specify the GFF format...
> Regards.
> Thiago
>
> -- 
> Thiago Motta Venancio - PhD student in Bioinformatics
>
>
From gowribio2004 at yahoo.co.in  Mon Mar  7 02:03:20 2005
From: gowribio2004 at yahoo.co.in (Gowri Karthik)
Date: Mon Mar  7 01:58:06 2005
Subject: [Bioperl-l] regarding bioperl project
Message-ID: <20050307070320.21704.qmail@web8510.mail.in.yahoo.com>

sir/madam
I am doing my PGDBI. I want to do my project in BioPerl. Can anyone help me by suggesting a topic that will be helpful for my career?
Thank you.
regards,
gowri  

		
From jaiswal at iitk.ac.in  Mon Mar  7 02:00:51 2005
From: jaiswal at iitk.ac.in (jaiswal@iitk.ac.in)
Date: Mon Mar  7 13:37:27 2005
Subject: [Bioperl-l] need some help about  pqs
Message-ID: <3481.172.28.124.134.1110178851.squirrel@nwebmail.iitk.ac.in>

Dear sir,

I am a student of Bioinformatics. I am sending you a problem that I am
facing at the moment with PQS; it is attached to this mail.
The problem is this:
1. I have 6000 proteins which I selected for my research (see the
attachment, list_id). In the first step I run each PDB ID on the PQS
page, which gives me output in the 2nd format, and then going on to the
3rd step gives me the result as a .mol file, which is what I need.

This all works, and doing it manually is fine for 100 or 200 proteins,
but for more than 6000 proteins it is a tedious job. Can you help me do
this by some method other than manually, or is there a script for
downloading all these files?
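
A batch job of this kind usually has two parts: cleaning the ID list, then fetching one file per ID. The sketch below shows that shape using only core Perl modules. Everything specific in it is an assumption: the function names are hypothetical, and the URL template is a placeholder that would have to be replaced with the real address pattern of the server.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Read PDB IDs from a filehandle: one ID per line, blank lines ignored,
# duplicates dropped while keeping the original order.
sub read_ids {
    my ($fh) = @_;
    my (%seen, @ids);
    while (my $line = <$fh>) {
        $line =~ s/^\s+|\s+$//g;
        next if $line eq '';
        my $id = uc $line;
        push @ids, $id unless $seen{$id}++;
    }
    return @ids;
}

# Fetch one file per ID into $dir. $template is a sprintf pattern with
# a single %s for the ID; it is supplied by the caller.
sub fetch_all {
    my ($template, $dir, @ids) = @_;
    my $http = HTTP::Tiny->new;
    for my $id (@ids) {
        my $url = sprintf $template, $id;
        my $res = $http->mirror($url, "$dir/$id.mol");
        warn "$id: $res->{status} $res->{reason}\n" unless $res->{success};
        sleep 1;    # pause between requests to go easy on the server
    }
}

# Example with a few IDs from the attached list, read from memory.
my $list = "12AS\n\n153L\n\n1A9X\n\n1A9X\n";
open my $fh, '<', \$list or die "cannot open in-memory list: $!";
my @ids = read_ids($fh);
print scalar(@ids), " unique IDs: @ids\n";

# Placeholder template, not a real service address:
# fetch_all('http://example.org/pqs/%s.mol', '.', @ids);
```

The attached list contains repeated IDs, so deduplicating first avoids downloading the same entry several times.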


              waiting for reply ..

                                      thanks..


----------------------------------------------------
        Ashish Kumar Jaiswal
        MScBioinformatics
        c/o Dr. Balaji Prakash
        Structural Biology Lab,
        Department of Biological Sciences
        and Bioengineering,
        Indian Institute of Technology, Kanpur,
        UP-208016, INDIA

        Ph:  +91-512-2594024
        FAX: +91-512-2594010
        Email: jaiswal@iitk.ac.in
----------------------------------------------------


-------------- next part --------------
12AS

153L

16PK

16VP

1A04

1A05

1A0C

1A0I

1A0P

1A12

1A17

1A1X

1A26

1A2P

1A2V

1A2X

1A2Z

1A38

1A3A

1A3C

1A3W

1A41

1A44

1A49

1A4E

1A4I

1A4M

1A4S

1A4Y

1A53

1A58

1A59

1A5C

1A5T

1A5Z

1A62

1A6C

1A6D

1A6F

1A6J

1A6M

1A6Q

1A6Z

1A76

1A78

1A79

1A7J

1A7T

1A81

1A87

1A88

1A8B

1A8D

1A8H

1A8I

1A8L

1A8P

1A8Q

1A8R

1A8S

1A8Y

1A99

1A9X

1A9X

1AA6

1AA7

1AB4

1AC6

1ACC

1ACF

1AD1

1AD2

1AD3

1ADE

1ADO

1ADW

1AE1

1AE7

1AE9

1AEP

1AF5

1AF6

1AF7

1AFW

1AG9

1AGI

1AGJ

1AGQ

1AGR

1AGX

1AH7

1AHS

1AIH

1AJ2

1AJ8

1AJS

1AK2

1AKO

1AKY

1AKZ

1AL3

1ALU

1ALY

1AM2

1AM5

1AM7

1AMF

1AMK

1AMU

1AMW

1AMX

1AN7

1AN8

1ANF

1ANV

1AOA

1AOC

1AOE

1AOG

1AOH

1AOP

1AOR

1AQC

1AQE

1AQM

1AQU

1AQZ

1AR1

1ARB

1ARO

1AS4

1ASH

1ASS

1AT0

1AT3

1ATG

1ATI

1ATL

1ATZ

1AUA

1AUI

1AUN

1AUO

1AUT

1AUY

1AVA

1AVG

1AVO

1AVQ

1AVW

1AW1

1AW9

1AX4

1AX8

1AXC

1AXD

1AXI

1AXN

1AYA

1AYF

1AYL

1AYM

1AYX

1AYZ

1AZ9

1AZO

1AZS

1AZW

1AZZ

1AZZ

1B00

1B04

1B06

1B09

1B0A

1B0B

1B0N

1B0U

1B12

1B16

1B1U

1B1Y

1B24

1B25

1B2P

1B33

1B34

1B35

1B35

1B35

1B3U

1B43

1B48

1B4P

1B56

1B5E

1B5L

1B5P

1B5T

1B63

1B65

1B66

1B6A

1B6E

1B6G

1B6R

1B74

1B77

1B78

1B79

1B7B

1B7E

1B7G

1B7Y

1B7Y

1B80

1B8A

1B8D

1B8D

1B8M

1B8M

1B8O

1B8P

1B93

1B9B

1B9H

1B9L

1B9O

1B9V

1BAM

1BAW

1BB9

1BBT

1BBT

1BBT

1BCF

1BCO

1BD0

1BD2

1BD2

1BD2

1BD2

1BD3

1BD8

1BDB

1BDF

1BDG

1BDM

1BDY

1BE9

1BEA

1BEB

1BEC

1BED

1BEF

1BEV

1BEV

1BEV

1BF2

1BF6

1BFG

1BFT

1BG2

1BG6

1BG7

1BGC

1BGF

1BGP

1BGV

1BGX

1BGX

1BGX

1BHD

1BHE

1BHT

1BI0

1BI5

1BI7

1BI7

1BIA

1BIF

1BIH

1BIO

1BJ7

1BJF

1BJJ

1BJN

1BJT

1BKB

1BKC

1BKC

1BKC

1BKF

1BKJ

1BKP

1BKR

1BKZ

1BLE

1BLX

1BLX

1BM9

1BMP

1BN7

1BN8

1BND

1BO1

1BO4

1BOB

1BOL

1BOO

1BOU

1BOU

1BOW

1BPB

1BPO

1BQB

1BQC

1BQK

1BQU

1BQY

1BR2

1BR9

1BRT

1BRU

1BRW

1BS0

1BS2

1BSG

1BSM

1BT3

1BTN

1BU2

1BU6

1BUC

1BUD

1BUE

1BUN

1BUO

1BUP

1BUV

1BUV

1BVP

1BVS

1BVY

1BVY

1BW0

1BWD

1BWV

1BWV

1BWW

1BX4

1BXB

1BXN

1BXN

1BXT

1BY5

1BYB

1BYF

1BYG

1BYI

1BYK

1BYR

1BYW

1BZY

1C02

1C0D

1C0P

1C1D

1C1K

1C1L

1C1Y

1C25

1C2P

1C2Y

1C3A

1C3A

1C3C

1C3G

1C3H

1C3M

1C3P

1C41

1C44

1C4K

1C4O

1C4R

1C4X

1C4Z

1C4Z

1C52

1C5C

1C5C

1C6V

1C7G

1C7K

1C7Q

1C8B

1C8N

1C8U

1C8Z

1C9K

1CBF

1CBG

1CBK

1CBS

1CBY

1CC1

1CC1

1CC3

1CCR

1CCW

1CCW

1CD1

1CD3

1CD3

1CD3

1CD3

1CD8

1CDD

1CDO

1CDP

1CDY

1CEO

1CER

1CEW

1CEX

1CF1

1CF2

1CFR

1CFY

1CFZ

1CG2

1CG5

1CGH

1CHD

1CHK

1CHM

1CHU

1CI0

1CI3

1CI9

1CID

1CII

1CIY

1CJA

1CJB

1CJC

1CJW

1CJX

1CKE

1CKI

1CKM

1CL1

1CLC

1CLI

1CLV

1CM0

1CM4

1CM8

1CMB

1CMX

1CN3

1CNT

1CNU

1CNV

1CNZ

1CO6

1COJ

1COL

1COT

1COV

1COV

1COV

1COZ

1CP2

1CP9

1CP9

1CPN

1CPT

1CPY

1CQ3

1CQQ

1CQX

1CR1

1CR5

1CRB

1CRU

1CRZ

1CS1

1CS6

1CSE

1CSH

1CSN

1CT5

1CT9

1CTQ

1CTT

1CUK

1CUO

1CV8

1CVL

1CVR

1CVS

1CVS

1CWN

1CWV

1CX4

1CXC

1CXQ

1CXZ

1CYD

1CYG

1CYX

1CZA

1CZT

1CZY

1D09

1D09

1D0C

1D0Q

1D1G

1D1J

1D1Q

1D2E

1D2N

1D2O

1D2S

1D2T

1D2Z

1D2Z

1D3G

1D3L

1D3V

1D3Y

1D4A

1D4D

1D4O

1D4T

1D4V

1D4V

1D5C

1D5N

1D5R

1D5T

1D6A

1D6M

1D6R

1D7M

1D7P

1D7U

1D7Y

1D8C

1D8I

1D8U

1D8W

1D9C

1DAB

1DAR

1DB3

1DBF

1DBQ

1DBW

1DBX

1DCE

1DCE

1DCF

1DCI

1DCO

1DCQ

1DCS

1DCU

1DD3

1DD5

1DD9

1DDG

1DDJ

1DDW

1DDZ

1DEE

1DEE

1DEK

1DEU

1DEV

1DF7

1DFC

1DFO

1DFX

1DG6

1DG9

1DGF

1DGJ

1DGS

1DHN

1DHR

1DHS

1DI0

1DI1

1DI6

1DIN

1DIQ

1DIV

1DJ0

1DJ2

1DJ7

1DJX

1DK0

1DK8

1DKG

1DKG

1DKI

1DKL

1DKU

1DKZ

1DL2

1DL5

1DLC

1DLF

1DLF

1DLJ

1DLP

1DLW

1DLY

1DM5

1DM9

1DMG

1DMH

1DML

1DN0

1DN0

1DN1

1DN1

1DNP

1DO5

1DOF

1DOI

1DOS

1DOW

1DOZ

1DPE

1DPG

1DPI

1DPS

1DPT

1DQ3

1DQA

1DQE

1DQG

1DQI

1DQN

1DQT

1DQU

1DQV

1DQZ

1DRW

1DRY

1DSS

1DSY

1DT0

1DT9

1DTD

1DTL

1DTO

1DU5

1DUG

1DUN

1DUS

1DUV

1DV1

1DV8

1DVK

1DVO

1DVP

1DW0

1DWK

1DWN

1DX5

1DX5

1DXE

1DXH

1DXJ

1DXL

1DXR

1DXR

1DXR

1DXR

1DXY

1DY2

1DY5

1DY9

1DYN

1DYO

1DYP

1DYQ

1DYR

1DYS

1DYT

1DZ3

1DZB

1DZB

1DZF

1DZI

1DZK

1DZL

1DZO

1DZR

1E0C

1E0F

1E0T

1E1H

1E1H

1E1O

1E2K

1E2T

1E2W

1E2Y

1E3D

1E3I

1E3J

1E3P

1E4C

1E4E

1E4F

1E4I

1E4Y

1E5D

1E5E

1E5K

1E5M

1E5P

1E5Q

1E5R

1E5X

1E6B

1E6C

1E6I

1E6U

1E6V

1E6V

1E6V

1E6W

1E6Y

1E6Y

1E6Y

1E7L

1E7N

1E7W

1E8C

1E8G

1E8Y

1E9G

1E9I

1E9L

1E9M

1E9R

1E9Z

1E9Z

1EA0

1EA7

1EA9

1EAF

1EAG

1EAI

1EAJ

1EAQ

1EAR

1EAX

1EAZ

1EB6

1EB7

1EBA

1EBD

1EBF

1EC7

1ECA

1ECF

1ECM

1ECS

1ED1

1ED9

1EDG

1EDO

1EDQ

1EDT

1EDY

1EDZ

1EE0

1EE2

1EE6

1EE8

1EEJ

1EEM

1EEQ

1EER

1EER

1EEX

1EEX

1EEX

1EF1

1EF8

1EFD

1EFH

1EFN

1EFP

1EFP

1EFU

1EFU

1EFV

1EFV

1EG2

1EG3

1EG5

1EG7

1EGA

1EGI

1EGU

1EGZ

1EH1

1EH9

1EHI

1EHK

1EHK

1EHW

1EHY

1EI1

1EI5

1EI6

1EI7

1EIA

1EJ0

1EJ2

1EJ8

1EJA

1EJB

1EJD

1EJE

1EJF

1EJX

1EJX

1EJX

1EK0

1EK6

1EK9

1EKB

1EKE

1EKG

1EKJ

1EKQ

1EKR

1EL1

1EL5

1EL6

1ELK

1ELR

1ELT

1ELU

1ELW

1EM2

1EM8

1EM8

1EM9

1EMS

1ENF

1ENP

1ENY

1EO6

1EO9

1EO9

1EOK

1EOV

1EP0

1EP3

1EP3

1EP5

1EP7

1EPA

1EPF

1EPU

1EPW

1EPX

1EQ2

1EQ9

1EQC

1EQF

1EQR

1EQW

1ERJ

1ERV

1ES0

1ES0

1ES5

1ES6

1ES8

1ES9

1ESC

1ESL

1ESO

1ESW

1ET9

1ETE

1ETU

1EU1

1EU3

1EU8

1EUA

1EUD

1EUD

1EUH

1EUM

1EUP

1EUV

1EUW

1EV1

1EV1

1EV1

1EV2

1EV2

1EV7

1EVH

1EVS

1EVX

1EVY

1EW0

1EW2

1EW3

1EW4

1EW6

1EWF

1EWR

1EX0

1EX2

1EX9

1EXB

1EXB

1EXM

1EXQ

1EXR

1EXS

1EXT

1EXU

1EYB

1EYE

1EYH

1EYL

1EYQ

1EYS

1EYS

1EYS

1EYS

1EYV

1EZ0

1EZ3

1EZ4

1EZF

1EZI

1EZJ

1EZW

1F00

1F02

1F05

1F06

1F07

1F08

1F0I

1F0K

1F0L

1F0X

1F0Y

1F15

1F1C

1F1E

1F1G

1F1J

1F1M

1F1O

1F1S

1F1U

1F20

1F28

1F2D

1F2E

1F2N

1F2T

1F2T

1F2V

1F32

1F35

1F39

1F3G

1F3H

1F3L

1F3M

1F3U

1F3U

1F3V

1F3V

1F46

1F4P

1F52

1F58

1F58

1F5J

1F5M

1F5N

1F5Q

1F5Q

1F5V

1F60

1F6B

1F6D

1F6F

1F6F

1F6Y

1F74

1F75

1F76

1F7C

1F7D

1F7L

1F7S

1F80

1F83

1F86

1F89

1F8F

1F8M

1F97

1F9A

1F9M

1F9V

1F9Y

1F9Z

1FA2

1FAO

1FB6

1FC3

1FC4

1FC6

1FCD

1FCD

1FCH

1FCJ

1FCQ

1FCY

1FD9

1FDQ

1FDR

1FEC

1FEH

1FEP

1FF3

1FFG

1FFG

1FFT

1FFT

1FFT

1FFT

1FFV

1FFV

1FFV

1FGJ

1FGK

1FGU

1FGV

1FGV

1FGY

1FH0

1FH9

1FHF

1FHG

1FI2

1FI4

1FI8

1FIO

1FIT

1FJ2

1FJH

1FJJ

1FKM

1FKN

1FL0

1FL1

1FL2

1FLE

1FLG

1FLJ

1FLK

1FLL

1FLM

1FM0

1FM4

1FM9

1FM9

1FMB

1FMC

1FMD

1FMD

1FMD

1FMJ

1FMT

1FMU

1FN9

1FNH

1FNL

1FNN

1FNO

1FNT

1FNT

1FNT

1FNT

1FNT

1FNT

1FNT

1FNT

1FNT

1FNT

1FNT

1FNT

1FNT

1FNT

1FNT

1FNU

1FNY

1FO0

1FO0

1FO0

1FO1

1FO3

1FO8

1FOB

1FOE

1FOE

1FON

1FOT

1FP1

1FP2

1FP3

1FP5

1FP6

1FPO

1FPR

1FPZ

1FQI

1FQJ

1FQJ

1FQT

1FR2

1FRB

1FRF

1FRF

1FS0

1FS0

1FS1

1FS5

1FS7

1FSG

1FSL

1FT5

1FT9

1FTP

1FTR

1FTS

1FUE

1FUI

1FUK

1FUR

1FUS

1FUX

1FV1

1FV1

1FVG

1FVI

1FVK

1FVP

1FVR

1FVU

1FVU

1FW1

1FWX

1FX2

1FX3

1FX7

1FX8

1FXK

1FXK

1FXK

1FXO

1FXW

1FXW

1FXX

1FXY

1FXZ

1FY7

1FYE

1FYH

1FYH

1FYV

1FYX

1FZQ

1FZV

1FZY

1G0C

1G0D

1G0H

1G0O

1G0S

1G0Y

1G16

1G1K

1G1Q

1G24

1G29

1G2A

1G2I

1G2N

1G2O

1G2Q

1G2R

1G31

1G3K

1G3N

1G3N

1G3N

1G3P

1G3Q

1G40

1G41

1G43

1G4I

1G4M

1G4U

1G4U

1G4Y

1G4Y

1G55

1G57

1G5A

1G5B

1G5H

1G5Q

1G5T

1G5Z

1G60

1G61

1G62

1G66

1G6A

1G6G

1G6H

1G6O

1G6Q

1G6S

1G72

1G73

1G73

1G7N

1G7S

1G87

1G8A

1G8E

1G8I

1G8K

1G8K

1G8L

1G8M

1G8P

1G8S

1G99

1G9G

1G9K

1GA6

1GA8

1GAD

1GAK

1GC5

1GCA

1GCI

1GCV

1GCV

1GCY

1GD0

1GD1

1GD6

1GD7

1GD8

1GDE

1GDH

1GEE

1GEF

1GEG

1GEH

1GEN

1GEQ

1GES

1GFF

1GFF

1GG2

1GG2

1GG3

1GG4

1GG6

1GGL

1GGP

1GGP

1GGX

1GH2

1GH6

1GH6

1GHE

1GHP

1GHQ

1GHQ

1GHR

1GHS

1GIQ

1GJ7

1GJW

1GK8

1GK8

1GK9

1GK9

1GKA

1GKA

1GKD

1GKL

1GKM

1GKP

1GKR

1GKU

1GKZ

1GL0

1GL1

1GL4

1GM6

1GME

1GMI

1GMM

1GMU

1GMX

1GMY

1GMZ

1GNG

1GNK

1GNL

1GNT

1GNU

1GNW

1GNX

1GO3

1GO3

1GO4

1GO4

1GOI

1GOJ

1GOT

1GOT

1GOX

1GP0

1GP1

1GP6

1GPC

1GPH

1GPJ

1GPL

1GPM

1GPP

1GPQ

1GPQ

1GPR

1GQ6

1GQ8

1GQE

1GQN

1GQO

1GQP

1GQV

1GQZ

1GR0

1GR3

1GRH

1GRJ

1GS0

1GS5

1GS9

1GSA

1GSK

1GSM

1GSO

1GSU

1GT1

1GTE

1GTK

1GTM

1GTT

1GTZ

1GU2

1GU6

1GU7

1GUD

1GUL

1GUQ

1GUX

1GUX

1GUZ

1GV3

1GV9

1GVE

1GVH

1GVJ

1GVK

1GVN

1GVZ

1GW5

1GW5

1GW5

1GW5

1GWC

1GWE

1GWI

1GWJ

1GWK

1GWS

1GWU

1GWY

1GX1

1GX3

1GXC

1GXJ

1GXM

1GXQ

1GXR

1GXY

1GYG

1GYH

1GYO

1GYT

1GYV

1GZ2

1GZ6

1GZG

1GZQ

1GZQ

1GZS

1GZS

1H03

1H05

1H09

1H0B

1H0C

1H0H

1H0H

1H0P

1H0X

1H12

1H16

1H1A

1H1D

1H1N

1H1O

1H1Y

1H21

1H2B

1H2E

1H2I

1H2K

1H2S

1H30

1H32

1H32

1H3D

1H3F

1H3G

1H3N

1H3Q

1H41

1H4A

1H4G

1H4R

1H4V

1H4X

1H54

1H5B

1H5Q

1H5W

1H5Y

1H65

1H6D

1H6G

1H6H

1H6K

1H6L

1H6O

1H6P

1H6T

1H6U

1H6V

1H6W

1H6Z

1H70

1H72

1H7C

1H7E

1H7M

1H7S

1H7Z

1H80

1H8D

1H8E

1H8T

1H8T

1H8T

1H8U

1H97

1H99

1H9H

1H9M

1H9S

1HA1

1HBN

1HBN

1HBN

1HC1

1HC7

1HCB

1HCV

1HCZ

1HD2

1HD7

1HDC

1HDF

1HDG

1HDH

1HDI

1HDK

1HDM

1HDM

1HDO

1HE1

1HE1

1HEK

1HEU

1HF2

1HF8

1HFC

1HFE

1HFE

1HFO

1HFU

1HFX

1HG3

1HG4

1HG8

1HGX

1HH1

1HH2

1HH8

1HHS

1HHU

1HHY

1HI9

1HIW

1HIX

1HJ8

1HJ9

1HJR

1HK8

1HKF

1HKG

1HKH

1HKK

1HKQ

1HKW

1HKX

1HL2

1HL9

1HLB

1HLC

1HLE

1HLM

1HLW

1HM6

1HM9

1HMC

1HMT

1HMY

1HN0

1HNE

1HNJ

1HNN

1HO8

1HPG

1HQ0

1HQ8

1HQS

1HQV

1HQZ

1HR6

1HR6

1HR8

1HR8

1HRK

1HRO

1HRU

1HS6

1HSB

1HSK

1HSS

1HT6

1HT8

1HTJ

1HTM

1HTP

1HTQ

1HTR

1HTW

1HU3

1HUF

1HUL

1HUP

1HUS

1HUW

1HUX

1HV5

1HV8

1HV9

1HVX

1HVY

1HW1

1HW5

1HW6

1HW7

1HWX

1HX0

1HX1

1HX1

1HX6

1HX8

1HXH

1HXI

1HXM

1HXM

1HXN

1HXR

1HXX

1HY5

1HY7

1HYE

1HYH

1HYN

1HYO

1HYQ

1HZ4

1HZD

1HZF

1HZI

1HZP

1HZT

1I0D

1I0R

1I0Z

1I12

1I19

1I1G

1I1I

1I1J

1I1N

1I1R

1I1R

1I1W

1I24

1I2A

1I2K

1I2M

1I2M

1I2S

1I31

1I36

1I39

1I3C

1I3U

1I3Z

1I40

1I4A

1I4D

1I4D

1I4J

1I4M

1I4N

1I4O

1I4O

1I4U

1I4W

1I52

1I58

1I5E

1I5G

1I5N

1I5P

1I60

1I6A

1I6P

1I6V

1I6V

1I6V

1I76

1I78

1I7G

1I7H

1I7K

1I7N

1I7Q

1I7Q

1I7W

1I7W

1I8A

1I8D

1I8J

1I8K

1I8K

1I8L

1I8L

1I8N

1I8O

1I8T

1I9G

1I9S

1I9W

1I9Z

1IA6

1IA8

1IA9

1IAE

1IAP

1IAR

1IAR

1IAT

1IAY

1IAZ

1IBJ

1IBY

1IC6

1ICP

1ICR

1ICX

1ID0

1ID1

1ID2

1IDK

1IDP

1IDR

1IDS

1IE9

1IEJ

1IFC

1IFQ

1IFR

1IG0

1IG3

1IG8

1IGM

1IGM

1IGW

1IH7

1IHB

1IHG

1IHK

1IHM

1IHN

1IHO

1IHP

1IHS

1IHU

1II2

1II5

1II7

1IIB

1IIC

1IIR

1IJ5

1IJB

1IJQ

1IJT

1IJX

1IJY

1IK6

1IK9

1IKN

1IKN

1IKN

1IKP

1IKT

1ILR

1IM3

1IM3

1IM4

1IM5

1IM8

1IMJ

1IN0

1IN4

1INL

1INP

1IO0

1IO1

1IO2

1IO7

1IOD

1IOD

1IOF

1IOM

1IOW

1IQ0

1IQ4

1IQ5

1IQ6

1IQ8

1IQA

1IQC

1IQP

1IQR

1IQV

1IR6

1IRD

1IRD

1IRJ

1IRU

1IRU

1IRU

1IRU

1IRU

1IRU

1IRU

1IRU

1IRU

1IRU

1IRU

1IRU

1IRU

1IS1

1IS2

1IS3

1IS8

1IS9

1ISC

1ISE

1ISP

1ISS

1IST

1IT2

1ITB

1ITB

1ITH

1ITK

1ITV

1ITW

1ITX

1ITZ

1IU4

1IU8

1IUG

1IUH

1IUJ

1IUK

1IUQ

1IV3

1IV8

1IVH

1IW0

1IWD

1IWE

1IWH

1IWH

1IWL

1IWM

1IWP

1IWP

1IWP

1IX9

1IXC

1IXH

1IXK

1IXL

1IXM

1IXS

1IXV

1IXZ

1IY8

1IY9

1IYB

1IYE

1IYH

1IYK

1IYN

1IYS

1IYX

1IZ0

1IZ5

1IZ6

1IZC

1IZM

1IZN

1IZN

1IZO

1J05

1J05

1J09

1J0A

1J0H

1J0M

1J0P

1J0W

1J1B

1J1D

1J1D

1J1D

1J1I

1J1J

1J1L

1J1M

1J1T

1J1Y

1J20

1J24

1J27

1J2G

1J2J

1J2P

1J2Q

1J2Q

1J2R

1J2Y

1J2Z

1J30

1J31

1J32

1J33

1J34

1J34

1J36

1J3A

1J3B

1J3K

1J3K

1J3L

1J3N

1J3U

1J3V

1J3W

1J48

1J4N

1J4T

1J54

1J58

1J5P

1J5S

1J5U

1J5V

1J5W

1J5X

1J5Y

1J6O

1J6R

1J6U

1J6W

1J6X

1J71

1J72

1J77

1J79

1J7D

1J7D

1J7G

1J7J

1J7N

1J7X

1J83

1J8B

1J8F

1J8M

1J8S

1J8U

1J93

1J97

1J98

1J9A

1J9B

1J9L

1JA1

1JA9

1JAD

1JAE

1JAG

1JAK

1JAL

1JAT

1JAT

1JAY

1JB0

1JB0

1JB0

1JB0

1JB0

1JB2

1JB3

1JB9

1JBE

1JBG

1JBK

1JBO

1JBO

1JBW

1JC4

1JC9

1JCF

1JD0

1JD1

1JD5

1JDH

1JDL

1JDR

1JDW

1JE5

1JE6

1JEB

1JEB

1JEH

1JEO

1JEQ

1JEQ

1JER

1JET

1JF8

1JFB

1JFL

1JFM

1JFR

1JFU

1JFX

1JFZ

1JG1

1JGC

1JGS

1JGT

1JH6

1JHD

1JHF

1JHG

1JHJ

1JHL

1JHL

1JHL

1JHN

1JHS

1JI0

1JI1

1JI2

1JI4

1JI5

1JI6

1JIA

1JIG

1JIH

1JIL

1JIW

1JIW

1JIX

1JJ7

1JJF

1JJI

1JJO

1JJT

1JJV

1JK0

1JK0

1JK3

1JK7

1JKE

1JKG

1JKG

1JKM

1JKM

1JKS

1JKX

1JL0

1JL1

1JL3

1JL5

1JLJ

1JLN

1JLT

1JLV

1JLW

1JLY

1JM1

1JM6

1JMK

1JMM

1JMS

1JMT

1JMU

1JMU

1JMV

1JMX

1JMX

1JNI

1JNP

1JNR

1JNR

1JNU

1JOC

1JOG

1JOP

1JOS

1JOT

1JOV

1JPA

1JPD

1JPM

1JPZ

1JQ5

1JQB

1JQE

1JQG

1JQI

1JQK

1JQL

1JQL

1JQN

1JQO

1JR0

1JR1

1JR2

1JR7

1JR8

1JR9

1JRL

1JRO

1JRO

1JRR

1JS1

1JS3

1JS9

1JSF

1JSG

1JSS

1JSU

1JSU

1JSW

1JSX

1JT6

1JTD

1JTD

1JTG

1JTG

1JTV

1JU3

1JUB

1JUG

1JUO

1JUQ

1JUV

1JV1

1JVB

1JVN

1JVQ

1JVQ

1JVW

1JW7

1JW9

1JWI

1JWI

1JWQ

1JX2

1JX2

1JX6

1JX7

1JXG

1JXH

1JXN

1JY1

1JY5

1JYA

1JYE

1JYH

1JYK

1JYO

1JYO

1JZ8

1JZN

1JZT

1K04

1K07

1K0D

1K0G

1K0M

1K0R

1K0W

1K0Z

1K12

1K1B

1K1D

1K1E

1K1X

1K20

1K24

1K28

1K28

1K2E

1K2F

1K2W

1K2X

1K2X

1K32

1K38

1K3E

1K3I

1K3P

1K3R

1K3S

1K3T

1K3V

1K3Y

1K44

1K47

1K4I

1K4M

1K4N

1K4Z

1K55

1K5D

1K5D

1K5D

1K5J

1K5N

1K5N

1K66

1K68

1K6K

1K75

1K77

1K7H

1K7I

1K7J

1K7K

1K87

1K8F

1K8K

1K8K

1K8K

1K8K

1K8K

1K8K

1K8K

1K8R

1K8R

1K8T

1K92

1K94

1K9V

1K9X

1KA1

1KA9

1KA9

1KAC

1KAC

1KAF

1KAG

1KAM

1KAO

1KAP

1KAS

1KB0

1KB5

1KB5

1KB5

1KB5

1KB9

1KB9

1KB9

1KB9

1KB9

1KB9

1KB9

1KB9

1KBL

1KBV

1KCF

1KCG

1KCG

1KCM

1KCQ

1KCV

1KCV

1KCX

1KCZ

1KDJ

1KEA

1KEK

1KEQ

1KEW

1KEX

1KEZ

1KF6

1KF6

1KF6

1KF6

1KFI

1KFW

1KG0

1KG0

1KG0

1KG2

1KGA

1KGC

1KGC

1KGD

1KGN

1KGS

1KHB

1KHC

1KHD

1KHI

1KHQ

1KHT

1KHV

1KHX

1KHY

1KI0

1KIC

1KID

1KIG

1KIJ

1KIY

1KJ1

1KJN

1KJQ

1KJW

1KJY

1KK1

1KKC

1KKE

1KKH

1KKM

1KKM

1KKO

1KL1

1KL7

1KL9

1KLF

1KLF

1KLI

1KLL

1KLO

1KLT

1KLU

1KLU

1KLU

1KLX

1KM4

1KM8

1KMH

1KMH

1KMI

1KMI

1KMJ

1KMM

1KMO

1KMQ

1KMT

1KMV

1KN1

1KN1

1KNB

1KNC

1KNG

1KNV

1KNW

1KNX

1KNY

1KO3

1KO6

1KO7

1KO9

1KOA

1KOB

1KOE

1KOL

1KON

1KOP

1KP0

1KPF

1KPG

1KPI

1KPS

1KPS

1KPT

1KQ3

1KQ6

1KQF

1KQF

1KQF

1KQN

1KQP

1KQR

1KQW

1KR4

1KR7

1KRH

1KRQ

1KRR

1KS5

1KS8

1KS9

1KSH

1KSH

1KSK

1KSO

1KT1

1KT6

1KTE

1KTG

1KTK

1KTK

1KU0

1KU1

1KU9

1KUF

1KUT

1KV3

1KV5

1KV7

1KV9

1KVK

1KW3

1KWG

1KWH

1KWI

1KWM

1KWS

1KXG

1KXO

1KXP

1KXP

1KXU

1KXV

1KXV

1KY3

1KY9

1KYA

1KYF

1KYH

1KYQ

1KYZ

1KZ1

1KZ7

1KZ7

1KZH

1KZL

1KZQ

1KZY

1KZY

1L0B

1L0O

1L0O

1L0Q

1L0W

1L1D

1L1J

1L1L

1L1N

1L1O

1L1O

1L1O

1L1Q

1L1Y

1L2H

1L2L

1L2T

1L2W

1L3I

1L3P

1L4I

1L4U

1L5J

1L5O

1L5V

1L5X

1L6J

1L6M

1L6P

1L6R

1L6W

1L7A

1L7D

1L7L

1L7V

1L7V

1L8A

1L8D

1L8K

1L8N

1L8Q

1L8W

1L9K

1L9V

1L9X

1LA1

1LA6

1LA6

1LAM

1LAR

1LB3

1LB6

1LBA

1LBQ

1LBU

1LBV

1LC0

1LC5

1LCI

1LCT

1LCY

1LDJ

1LDM

1LDN

1LDT

1LE6

1LEH

1LF2

1LF6

1LF7

1LFD

1LFK

1LFO

1LFP

1LFW

1LG7

1LGP

1LGY

1LH0

1LHP

1LHT

1LI4

1LI5

1LII

1LIT

1LIU

1LJ2

1LJ5

1LJ8

1LJ9

1LK5

1LKF

1LKI

1LKK

1LKP

1LKT

1LL2

1LL7

1LLA

1LLC

1LLD

1LLN

1LLU

1LM4

1LM5

1LM6

1LM7

1LM8

1LM8

1LME

1LMI

1LML

1LMO

1LNL

1LNQ

1LNS

1LNW

1LNZ

1LO6

1LO7

1LOP

1LOX

1LP3

1LP9

1LP9

1LP9

1LP9

1LPB

1LPG

1LPG

1LPJ

1LQ9

1LQA

1LQL

1LQS

1LQS

1LQT

1LQY

1LRV

1LRW

1LRZ

1LS1

1LS6

1LSH

1LSH

1LSS

1LST

1LSU

1LSW

1LT7

1LTK

1LTL

1LTO

1LTZ

1LU1

1LU4

1LUA

1LUC

1LUC

1LUF

1LUG

1LUQ

1LUR

1LV7

1LVA

1LVF

1LVG

1LVL

1LVM

1LVO

1LVW

1LW3

1LW7

1LWB

1LWD

1LWJ

1LXA

1LXJ

1LY1

1LYC

1LYQ

1LYV

1LYW

1LZJ

1LZL

1M0D

1M0K

1M0S

1M0U

1M0W

1M0Z

1M15

1M1C

1M1E

1M1F

1M1L

1M1N

1M1N

1M1S

1M22

1M2D

1M2K

1M2O

1M2O

1M2R

1M2X

1M2Z

1M32

1M33

1M3K

1M3S

1M3U

1M3Y

1M40

1M45

1M46

1M48

1M4J

1M4L

1M4R

1M4V

1M4Y

1M4Z

1M53

1M55

1M56

1M56

1M56

1M5H

1M5I

1M5N

1M5Q

1M5S

1M5Y

1M61

1M65

1M6D

1M6E

1M6H

1M6I

1M6J

1M6K

1M6P

1M6S

1M6Y

1M70

1M72

1M7B

1M7G

1M7S

1M7V

1M7X

1M7Y

1M85

1M8N

1M8P

1M8T

1M8Z

1M93

1M98

1M9I

1M9U

1M9X

1M9X

1M9Z

1MA1

1MA3

1MAI

1MAS

1MB3

1MB4

1MBA

1MBM

1MBX

1MBX

1MC2

1MC3

1MCP

1MCP

1MCT

1MD6

1MD8

1MDA

1MDA

1MDA

1MDB

1MDC

1MDW

1ME4

1MEM

1MEO

1MFM

1MFO

1MG2

1MG2

1MG2

1MG2

1MG4

1MG7

1MGP

1MGT

1MH1

1MHH

1MHH

1MHM

1MHQ

1MHY

1MHY

1MHY

1MI3

1MI8

1MIJ

1MIL

1MIO

1MIO

1MIU

1MIW

1MIX

1MJ0

1MJ3

1MJ5

1MJF

1MJH

1MJN

1MJT

1MJU

1MJU

1MK4

1MKA

1MKF

1MKH

1MKI

1MKM

1MKP

1MKY

1MKZ

1ML0

1ML4

1ML8

1ML9

1MLA

1MLD

1MLW

1MML

1MMQ

1MN4

1MN8

1MNA

1MNG

1MO0

1MO3

1MO9

1MOQ

1MOU

1MOZ

1MP8

1MP9

1MPG

1MPP

1MPX

1MPY

1MQ0

1MQ4

1MQB

1MQE

1MQI

1MQK

1MQK

1MQS

1MR1

1MR7

1MRG

1MRJ

1MRZ

1MS6

1MS9

1MSC

1MSK

1MSL

1MSP

1MT0

1MT5

1MTP

1MTY

1MTY

1MTY

1MTZ

1MU2

1MU2

1MU5

1MUC

1MUG

1MUK

1MUW

1MV5

1MV8

1MVE

1MVF

1MVH

1MVL

1MVO

1MW5

1MW7

1MW9

1MWM

1MWQ

1MWV

1MWW

1MX3

1MX9

1MXE

1MXG

1MXH

1MXI

1MXR

1MXS

1MY6

1MY7

1MYT

1MZ4

1MZ8

1MZA

1MZB

1MZG

1MZH

1MZJ

1MZN

1MZR

1MZU

1MZW

1MZY

1N00

1N08

1N0E

1N0U

1N0W

1N0X

1N0X

1N11

1N12

1N13

1N1B

1N1C

1N1L

1N1Q

1N28

1N2A

1N2E

1N2F

1N2S

1N2Z

1N3L

1N3Y

1N40

1N45

1N46

1N4K

1N4Q

1N4W

1N4X

1N4X

1N57

1N5D

1N5N

1N5U

1N62

1N62

1N62

1N67

1N6A

1N7H

1N7K

1N7V

1N7Z

1N81

1N82

1N83

1N8F

1N8I

1N8J

1N8P

1N8V

1N93

1N97

1N9B

1N9L

1N9P

1N9W

1NA5

1NA6

1NA8

1NAQ

1NAR

1NB2

1NB9

1NBA

1NBC

1NBF

1NBQ

1NBU

1NBW

1NBW

1NC5

1NC7

1NCI

1NCN

1NCQ

1NCQ

1NCQ

1NCW

1NCW

1NCX

1ND1

1ND2

1ND2

1ND2

1ND7

1NE2

1NE6

1NE7

1NE8

1NE9

1NEK

1NEK

1NEK

1NEK

1NEU

1NEX

1NEX

1NEY

1NF1

1NF2

1NF3

1NF3

1NF9

1NFG

1NFP

1NFV

1NG0

1NG2

1NG4

1NG6

1NGK

1NGN

1NGV

1NGV

1NH1

1NH8

1NHK

1NHP

1NHY

1NHZ

1NI3

1NI4

1NI4

1NI5

1NI9

1NIG

1NIJ

1NIR

1NIW

1NJ1

1NJ8

1NJF

1NJH

1NJK

1NJR

1NKG

1NKI

1NKO

1NKQ

1NKR

1NKS

1NKT

1NKV

1NLF

1NLN

1NLQ

1NLS

1NLT

1NLX

1NM2

1NM3

1NM8

1NME

1NMM

1NMM

1NMO

1NMU

1NMU

1NN4

1NN5

1NN7

1NNA

1NNF

1NNG

1NNH

1NNI

1NNL

1NNQ

1NNS

1NNW

1NNX

1NO1

1NO5

1NO7

1NOA

1NOF

1NOG

1NOS

1NOX

1NOZ

1NP3

1NP6

1NP7

1NPB

1NPE

1NPE

1NPL

1NPP

1NPU

1NPY

1NQ6

1NQ7

1NQE

1NQJ

1NQK

1NQN

1NQU

1NQZ

1NR0

1NR9

1NRF

1NRI

1NRJ

1NRJ

1NRK

1NRL

1NRR

1NRV

1NRW

1NRZ

1NS5

1NSJ

1NSL

1NST

1NSW

1NSZ

1NT2

1NT2

1NT4

1NTF

1NTG

1NTH

1NTM

1NTM

1NTM

1NTM

1NTM

1NTM

1NTV

1NTY

1NU0

1NU5

1NU7

1NU7

1NU7

1NUE

1NUK

1NUL

1NUN

1NUN

1NUU

1NUY

1NV8

1NVM

1NVM

1NVT

1NVU

1NVU

1NW1

1NW3

1NW9

1NWA

1NWP

1NWW

1NWZ

1NX8

1NX9

1NXH

1NXJ

1NXK

1NXM

1NXP

1NXQ

1NXU

1NY1

1NY5

1NY7

1NY7

1NYC

1NYK

1NYL

1NYR

1NYT

1NZ0

1NZ6

1NZA

1NZE

1NZI

1NZJ

1NZN

1NZO

1NZY

1O04

1O08

1O0E

1O0I

1O0S

1O0U

1O0W

1O0X

1O0Y

1O12

1O13

1O14

1O17

1O1X

1O1Y

1O1Z

1O20

1O22

1O26

1O2D

1O3U

1O3Y

1O4R

1O4S

1O4T

1O4U

1O4V

1O4W

1O4Y

1O4Z

1O50

1O51

1O54

1O58

1O59

1O5H

1O5I

1O5K

1O5L

1O5O

1O5U

1O5X

1O5Z

1O60

1O63

1O65

1O66

1O69

1O6B

1O6C

1O6D

1O6E

1O6L

1O6O

1O6O

1O6S

1O6S

1O6Y

1O6Z

1O73

1O75

1O7E

1O7F

1O7I

1O7J

1O7N

1O7N

1O7Q

1O7X

1O88

1O89

1O8B

1O8V

1O8X

1O91

1O94

1O94

1O94

1O97

1O97

1O98

1O9D

1O9G

1O9I

1O9J

1O9R

1O9W

1OA4

1OA8

1OAA

1OAC

1OAF

1OAH

1OAL

1OAO

1OAO

1OAP

1OAQ

1OAQ

1OB1

1OB1

1OB3

1OB8

1OBB

1OBD

1OBF

1OBO

1OBR

1OC0

1OC2

1OCK

1OCS

1OCV

1OCX

1OCY

1OD3

1OD5

1OD9

1ODF

1ODK

1ODM

1ODO

1ODZ

1OE0

1OE1

1OE8

1OE9

1OE9

1OEJ

1OEP

1OEQ

1OEW

1OEY

1OF3

1OF5

1OF5

1OF8

1OFC

1OFD

1OFH

1OFH

1OFU

1OFU

1OFV

1OFW

1OFZ

1OGA

1OGA

1OGA

1OGA

1OGD

1OGI

1OGL

1OGO

1OGP

1OGQ

1OGY

1OGY

1OH0

1OH2

1OHE

1OHF

1OHG

1OHL

1OHT

1OHU

1OHV

1OHZ

1OI0

1OI1

1OI2

1OI4

1OI6

1OI7

1OIH

1OIS

1OIU

1OIU

1OIV

1OJ1

1OJ4

1OJ5

1OJ7

1OJQ

1OJR

1OJS

1OJT

1OJX

1OK7

1OKC

1OKG

1OKI

1OKJ

1OKK

1OKK

1OKQ

1OKR

1OKT

1OL0

1OL5

1OLL

1OLM

1OLP

1OLR

1OLT

1OLZ

1OM4

1OMI

1OMO

1OMR

1OMW

1OMW

1OMZ

1ON0

1ON2

1ON3

1ONC

1ONF

1ONL

1ONR

1ONW

1OO0

1OO0

1OO2

1OOE

1OOH

1OOP

1OOP

1OOP

1OOY

1OPC

1OPO

1OQ1

1OQ9

1OQC

1OQE

1OQF

1OQQ

1OQV

1OR0

1OR0

1OR4

1OR7

1OR8

1ORE

1ORF

1ORJ

1ORR

1ORS

1ORS

1ORS

1ORU

1ORY

1OS8

1OSC

1OSH

1OSM

1OSN

1OSP

1OSP

1OSP

1OSY

1OT8

1OTG

1OTH

1OTJ

1OTK

1OTS

1OTS

1OTS

1OTV

1OU0

1OU5

1OU8

1OUO

1OUT

1OUT

1OUV

1OUW

1OVL

1OVM

1OVN

1OW1

1OW4

1OWL

1OX0

1OX3

1OX8

1OXD

1OXJ

1OXK

1OXK

1OXW

1OXX

1OY0

1OY3

1OY3

1OY5

1OYC

1OYE

1OYG

1OYJ

1OYS

1OYW

1OYZ

1OZ2

1OZ6

1OZ7

1OZ7

1OZ9

1OZB

1OZH

1P0F

1P0H

1P0K

1P0Y

1P0Z

1P15

1P16

1P1J

1P1L

1P1M

1P1X

1P22

1P22

1P27

1P27

1P2F

1P2Z

1P32

1P35

1P3C

1P3D

1P3R

1P3W

1P3Y

1P42

1P4A

1P4C

1P4D

1P4K

1P4L

1P4L

1P4O

1P4P

1P4T

1P4U

1P4X

1P57

1P57

1P5D

1P5F

1P5J

1P5Q

1P5S

1P5T

1P5V

1P5V

1P5Z

1P6O

1P6P

1P6X

1P77

1P7G

1P7K

1P7K

1P7O

1P80

1P8T

1P90

1P91

1P99

1P9B

1P9E

1P9H

1P9L

1P9O

1P9R

1P9S

1P9Y

1PA1

1PA2

1PA7

1PAM

1PAQ

1PAZ

1PB1

1PB6

1PB7

1PBE

1PBG

1PBJ

1PBK

1PBW

1PBY

1PBY

1PC6

1PCL

1PCQ

1PCX

1PCZ

1PDG

1PDK

1PDK

1PDO

1PDU

1PE1

1PE9

1PEA

1PEQ

1PEW

1PEX

1PF5

1PFF

1PFK

1PFO

1PFV

1PFZ

1PG4

1PG5

1PG5

1PG6

1PGI

1PGJ

1PGR

1PGR

1PGS

1PGT

1PGU

1PGV

1PGW

1PGW

1PHK

1PHO

1PHP

1PHS

1PI1

1PI4

1PIE

1PII

1PIN

1PIW

1PIX

1PJ3

1PJ5

1PJA

1PJC

1PJH

1PJM

1PJN

1PJQ

1PJR

1PJX

1PK5

1PK6

1PK6

1PK6

1PKH

1PKL

1PKO

1PKP

1PL4

1PL5

1PL8

1PLQ

1PM1

1PM4

1PMA

1PMA

1PME

1PMI

1PMJ

1PMN

1PMP

1PMT

1PMY

1PN0

1PN2

1PN3

1PNE

1PNO

1PO5

1POA

1POC

1POI

1POI

1POT

1POX

1PP0

1PP2

1PPJ

1PPJ

1PPJ

1PPJ

1PPJ

1PPJ

1PPO

1PPR

1PQ1

1PQ3

1PQ4

1PQ7

1PQH

1PQW

1PQZ

1PR9

1PRE

1PRT

1PRT

1PRT

1PRX

1PRZ

1PS1

1PS9

1PSD

1PSQ

1PSR

1PSU

1PSW

1PSZ

1PT6

1PTM

1PU5

1PU6

1PUC

1PUI

1PUJ

1PUO

1PV1

1PV5

1PV8

1PV9

1PVA

1PVC

1PVC

1PVC

1PVD

1PVG

1PVM

1PVN

1PVT

1PVV

1PW4

1PWA

1PWB

1PWG

1PWV

1PX0

1PXF

1PXV

1PXV

1PXW

1PXY

1PXZ

1PY5

1PYA

1PYB

1PYF

1PYK

1PYO

1PYO

1PYT

1PYT

1PYT

1PZ1

1PZ4

1PZ7

1PZG

1PZL

1PZM

1PZN

1PZS

1PZT

1PZV

1PZX

1Q06

1Q0B

1Q0P

1Q0Q

1Q0R

1Q0S

1Q0U

1Q12

1Q13

1Q15

1Q16

1Q16

1Q16

1Q1C

1Q1F

1Q1H

1Q1L

1Q1R

1Q1S

1Q1U

1Q20

1Q23

1Q2W

1Q2Y

1Q32

1Q33

1Q35

1Q3B

1Q3E

1Q3I

1Q3O

1Q3Q

1Q3X

1Q40

1Q40

1Q42

1Q44

1Q45

1Q46

1Q4M

1Q4R

1Q4U

1Q52

1Q5D

1Q5H

1Q5N

1Q5Q

1Q5Q

1Q5X

1Q5Z

1Q67

1Q6H

1Q6O

1Q6W

1Q6X

1Q6Z

1Q74

1Q77

1Q79

1Q7B

1Q7E

1Q7F

1Q7H

1Q7L

1Q7R

1Q7S

1Q88

1Q8A

1Q8B

1Q8C

1Q8D

1Q8F

1Q8I

1Q8R

1Q8U

1Q8Y

1Q90

1Q90

1Q90

1Q90

1Q92

1Q98

1Q9C

1Q9I

1Q9J

1Q9U

1QA7

1QA7

1QA9

1QAD

1QAH

1QAP

1QAU

1QAV

1QAZ

1QB0

1QB2

1QB3

1QB7

1QBA

1QBE

1QBK

1QBK

1QBZ

1QC6

1QC7

1QC9

1QCQ

1QCS

1QCX

1QCZ

1QD1

1QD6

1QD9

1QDL

1QDL

1QDM

1QE0

1QE3

1QE5

1QEZ

1QF8

1QF9

1QFH

1QFJ

1QFM

1QFT

1QFT

1QG3

1QGD

1QGH

1QGJ

1QGN

1QGO

1QGQ

1QGR

1QGT

1QGV

1QH4

1QH5

1QH8

1QH8

1QHD

1QHF

1QHL

1QHO

1QHQ

1QHT

1QHV

1QHX

1QI9

1QIB

1QID

1QIP

1QJ4

1QJ8

1QJB

1QJP

1QJV

1QK1

1QKI

1QKK

1QKM

1QKR

1QKS

1QL0

1QLA

1QLA

1QLA

1QLE

1QLE

1QLE

1QLE

1QLE

1QLM

1QLP

1QLW

1QM4

1QME

1QMG

1QMJ

1QMO

1QMO

1QMV

1QMY

1QN2

1QNG

1QNI

1QNT

1QNX

1QO0

1QO0

1QO3

1QO3

1QO3

1QO5

1QO7

1QO8

1QOI

1QOP

1QOP

1QOR

1QOU

1QOX

1QOY

1QP8

1QPC

1QPG

1QPO

1QPX

1QQ5

1QQ9

1QQE

1QQF

1QQG

1QQK

1QQL

1QQQ

1QQR

1QR2

1QR4

1QRE

1QS0

1QS0

1QSA

1QSD

1QSM

1QST

1QTF

1QTJ

1QTN

1QTO

1QTW

1QTX

1QU1

1QU7

1QU9

1QUA

1QUQ

1QUQ

1QUS

1QUU

1QV0

1QV9

1QVB

1QVC

1QVE

1QW2

1QW9

1QWD

1QWG

1QWI

1QWJ

1QWK

1QWL

1QWR

1QWT

1QWY

1QWZ

1QX1

1QXH

1QXM

1QXO

1QXY

1QY1

1QY5

1QY6

1QY7

1QY9

1QYC

1QYD

1QYI

1QYN

1QYR

1QYS

1QZ5

1QZ7

1QZ9

1QZF

1QZN

1QZT

1QZU

1QZZ

1R03

1R0D

1R0K

1R0M

1R0P

1R0R

1R0T

1R0U

1R0V

1R0W

1R12

1R13

1R17

1R18

1R1K

1R1K

1R1M

1R1Q

1R26

1R29

1R2F

1R2J

1R2Q

1R2R

1R30

1R31

1R3C

1R3D

1R3F

1R3H

1R3J

1R3J

1R3J

1R3N

1R3S

1R3U

1R44

1R45

1R4C

1R4P

1R4Q

1R4U

1R4V

1R4W

1R4X

1R53

1R59

1R5A

1R5B

1R5I

1R5I

1R5I

1R5J

1R5L

1R5P

1R5Q

1R5R

1R5T

1R5Y

1R61

1R62

1R6D

1R6F

1R6L

1R6N

1R6V

1R6W

1R6X

1R75

1R76

1R7A

1R7L

1R85

1R88

1R89

1R8G

1R8J

1R8N

1R8S

1R8S

1R9C

1R9D

1R9G

1R9H

1R9J

1R9L

1R9O

1R9W

1RA0

1RA4

1RA6

1RA9

1RBD

1RBL

1RBL

1RC2

1RC6

1RC9

1RCD

1RCQ

1RCU

1RCW

1RD5

1RDO

1RDS

1RDT

1RDT

1RE5

1RE9

1REG

1REQ

1REQ

1REW

1REW

1RF3

1RF6

1RFE

1RFM

1RFN

1RFS

1RFY

1RFZ

1RG8

1RG9

1RGX

1RGZ

1RH1

1RH2

1RHC

1RHF

1RHS

1RHY

1RI5

1RI6

1RI7

1RIE

1RIF

1RII

1RIL

1RIQ

1RJ1

1RJ8

1RJB

1RJD

1RJO

1RJW

1RK6

1RK8

1RK8

1RKB

1RKD

1RKQ

1RKT

1RKU

1RKX

1RL0

1RL2

1RL4

1RL6

1RLH

1RLI

1RLJ

1RLK

1RLM

1RLR

1RLW

1RM4

1RM6

1RM6

1RM6

1RM8

1RMD

1RMW

1RNF

1RO2

1RO5

1RO7

1ROC

1ROW

1RP3

1RP4

1RPM

1RPN

1RPX

1RPY

1RQ0

1RQ2

1RQB

1RQJ

1RQP

1RQW

1RR7

1RRE

1RRM

1RRO

1RRP

1RRP

1RSS

1RSY

1RT8

1RTF

1RTQ

1RTR

1RTT

1RTU

1RTV

1RTW

1RTY

1RU0

1RU4

1RU7

1RU7

1RU8

1RUR

1RUR

1RUT

1RV3

1RV9

1RVE

1RVG

1RVK

1RVV

1RW0

1RW1

1RW6

1RW7

1RWH

1RWI

1RWR

1RWT

1RWY

1RWZ

1RX0

1RXD

1RXQ

1RXX

1RXY

1RY2

1RY6

1RY9

1RYA

1RYB

1RYL

1RYO

1RYP

1RYP

1RYP

1RYP

1RYP

1RYP

1RYP

1RYP

1RYP

1RYP

1RYP

1RZ1

1RZ2

1RZ3

1RZ4

1RZ6

1RZF

1RZF

1RZH

1RZH

1RZH

1RZM

1RZN

1RZO

1RZU

1S0A

1S0P

1S0U

1S14

1S16

1S1C

1S1D

1S1F

1S1M

1S1P

1S1Q

1S21

1S28

1S2E

1S2K

1S2W

1S2X

1S35

1S3E

1S3G

1S3I

1S3J

1S3M

1S3S

1S3S

1S48

1S4B

1S4D

1S4E

1S4K

1S4Q

1S4V

1S4Y

1S57

1S58

1S5A

1S5D

1S5D

1S5J

1S5L

1S5L

1S5L

1S5L

1S5L

1S5L

1S5L

1S5P

1S5T

1S5U

1S68

1S69

1S6C

1S6Y

1S70

1S70

1S7I

1S7J

1S7M

1S7O

1S7Z

1S8N

1S95

1S96

1S98

1S99

1S9A

1S9J

1S9P

1S9R

1S9U

1S9V

1S9V

1SA0

1SA0

1SA0

1SAC

1SAT

1SAW

1SB8

1SBF

1SBP

1SBQ

1SBW

1SBX

1SBZ

1SC3

1SCF

1SCJ

1SCT

1SCT

1SCZ

1SD4

1SDI

1SDM

1SDO

1SDW

1SE0

1SE8

1SEB

1SEB

1SEB

1SED

1SEF

1SEI

1SEK

1SEN

1SES

1SEZ

1SF8

1SF9

1SFD

1SFE

1SFF

1SFL

1SFN

1SFP

1SFR

1SFX

1SG1

1SG1

1SG6

1SGH

1SGJ

1SGL

1SGM

1SGP

1SGW

1SGX

1SH0

1SH5

1SH8

1SHE

1SHS

1SHU

1SHY

1SHY

1SI5

1SI8

1SIG

1SIQ

1SIX

1SJ2

1SJD

1SJW

1SJY

1SK4

1SK7

1SKQ

1SKY

1SKY

1SKZ

1SL8

1SLQ

1SLU

1SLU

1SM2

1SMB

1SML

1SMO

1SMP

1SMP

1SMR

1SMT

1SMV

1SNC

1SNN

1SNR

1SNY

1SNZ

1SO2

1SO7

1SOT

1SOX

1SP3

1SP8

1SPG

1SPG

1SPP

1SPP

1SPV

1SPX

1SQ1

1SQ2

1SQ2

1SQ4

1SQ5

1SQ9

1SQE

1SQH

1SQI

1SQJ

1SQK

1SQL

1SQS

1SQU

1SR4

1SR4

1SR4

1SR7

1SR8

1SRA

1SRD

1SRQ

1SRR

1SRV

1SS4

1SSQ

1SSX

1ST9

1STF

1STM

1STZ

1SU0

1SU1

1SU8

1SUM

1SUR

1SUU

1SUW

1SV6

1SV8

1SVA

1SVI

1SVM

1SVP

1SVS

1SVV

1SVY

1SW5

1SW6

1SWV

1SWX

1SX7

1SXG

1SXJ

1SXJ

1SXJ

1SXJ

1SXJ

1SXJ

1SXR

1SY7

1SYR

1SYY

1SZ2

1SZ6

1SZ6

1SZ9

1SZB

1SZH

1SZI

1SZO

1SZP

1SZQ

1SZW

1T01

1T06

1T0A

1T0B

1T0F

1T0H

1T0H

1T0I

1T0J

1T0J

1T0L

1T0N

1T0Q

1T0Q

1T0T

1T10

1T11

1T15

1T16

1T1D

1T1G

1T1J

1T2A

1T2D

1T2L

1T2W

1T33

1T35

1T3B

1T3C

1T3D

1T3E

1T3I

1T3Q

1T3Q

1T3Q

1T3T

1T3U

1T3W

1T43

1T46

1T47

1T4B

1T4G

1T4H

1T4O

1T4W

1T56

1T57

1T5B

1T5H

1T5I

1T5J

1T5L

1T5O

1T5R

1T5Y

1T61

1T61

1T62

1T64

1T6A

1T6C

1T6E

1T6G

1T6G

1T6J

1T6L

1T6S

1T6T

1T6U

1T70

1T71

1T72

1T73

1T77

1T79

1T7F

1T7M

1T7R

1T7V

1T8P

1T8Q

1T8T

1T92

1T94

1T95

1T9B

1T9F

1T9H

1T9K

1TA0

1TA8

1TAZ

1TBF

1TBM

1TBR

1TBR

1TC1

1TC5

1TD4

1TD5

1TD6

1TDH

1TDJ

1TDQ

1TDQ

1TE2

1TE5

1TE6

1TED

1TEL

1TEV

1TEX

1TF0

1TF1

1TF4

1TF5

1TF7

1TFE

1TFF

1TFR

1TFU

1TFX

1TFZ

1TG8

1TGS

1TGZ

1TH0

1TH1

1TH1

1TH8

1TH8

1THF

1THM

1THQ

1THT

1THX

1TIA

1TIB

1TID

1TID

1TIE

1TII

1TIQ

1TIS

1TJ7

1TJC

1TJG

1TJG

1TJL

1TJN

1TJO

1TJV

1TJY

1TK4

1TK9

1TKE

1TKI

1TKS

1TL2

1TL9

1TLJ

1TLQ

1TLT

1TLU

1TLY

1TM0

1TM8

1TME

1TME

1TME

1TMK

1TML

1TMO

1TMY

1TN3

1TN6

1TN6

1TNR

1TNR

1TO0

1TO3

1TO6

1TO9

1TOA

1TOC

1TOC

1TOI

1TOL

1TON

1TP6

1TQ4

1TQ8

1TQG

1TQH

1TQI

1TQN

1TQX

1TQY

1TQY

1TR0

1TR9

1TRB

1TRE

1TRK

1TS9

1TSJ

1TT5

1TT5

1TT7

1TT8

1TU1

1TU3

1TU7

1TU9

1TUA

1TUE

1TUE

1TUH

1TUL

1TUV

1TUW

1TVD

1TVF

1TVG

1TVL

1TW3

1TW4

1TW6

1TW9

1TWD

1TWF

1TWF

1TWF

1TWF

1TWF

1TWF

1TWF

1TWF

1TWI

1TWL

1TWU

1TWY

1TX2

1TX4

1TX4

1TXD

1TXG

1TXJ

1TXK

1TXN

1TXO

1TXU

1TY0

1TY4

1TY9

1TYF

1TYG

1TYV

1TYY

1TZ0

1TZ9

1TZA

1TZF

1TZJ

1TZL

1TZP

1TZY

1TZY

1TZY

1TZY

1TZZ

1U00

1U02

1U04

1U08

1U0J

1U0K

1U0M

1U0S

1U0V

1U11

1U14

1U1I

1U1J

1U1Z

1U24

1U2C

1U2K

1U2M

1U2X

1U2Z

1U3D

1U3Y

1U4G

1U4J

1U4N

1U59

1U5H

1U5K

1U5P

1U5R

1U5U

1U60

1U61

1U69

1U6D

1U6G

1U6G

1U6G

1U6L

1U6M

1U79

1U7B

1U7G

1U7I

1U7K

1U7L

1U7N

1U7P

1U83

1U8S

1U8V

1U8W

1U8X

1U8Z

1U94

1U9A

1U9C

1U9D

1U9J

1U9K

1UA2

1UAC

1UAC

1UAC

1UAD

1UAI

1UAL

1UAN

1UAR

1UAS

1UAX

1UAY

1UAZ

1UB0

1UB2

1UB3

1UB4

1UB7

1UB9

1UBK

1UBK

1UBY

1UC2

1UC8

1UCT

1UCY

1UCY

1UCY

1UD0

1UD2

1UD9

1UDC

1UDD

1UDH

1UDS

1UDX

1UDZ

1UE5

1UE8

1UEA

1UEA

1UEB

1UED

1UEH

1UES

1UF2

1UF2

1UF2

1UF3

1UF5

1UF9

1UFA

1UFB

1UFK

1UFO

1UFR

1UFY

1UG6

1UGM

1UGN

1UGP

1UGP

1UGX

1UH5

1UHN

1UHV

1UI0

1UI5

1UIK

1UIR

1UIS

1UIU

1UIY

1UIZ

1UJ2

1UJ6

1UJK

1UJM

1UJN

1UJP

1UK8

1UKF

1UKG

1UKJ

1UKK

1UKL

1UKU

1UKV

1UKV

1UKW

1UKZ

1UL9

1ULH

1ULI

1ULI

1ULK

1ULQ

1ULS

1ULU

1ULV

1ULY

1ULZ

1UM0

1UM2

1UM5

1UM5

1UM8

1UMD

1UMD

1UMG

1UMH

1UMK

1UMM

1UMN

1UMR

1UMR

1UMV

1UMW

1UN0

1UN2

1UN3

1UN7

1UN8

1UNA

1UNF

1UNL

1UNL

1UNN

1UNN

1UNQ

1UOC

1UOH

1UOK

1UOL

1UOU

1UOW

1UOZ

1UP7

1UP8

1UP9

1UPB

1UPI

1UPK

1UPQ

1UPS

1UPT

1UPV

1UQR

1UQT

1UQW

1UQX

1UR3

1UR4

1UR5

1URH

1URJ

1URR

1URS

1URU

1URV

1URZ

1US0

1US3

1US5

1US7

1US7

1USC

1USG

1USJ

1USP

1USU

1USU

1USX

1USY

1USY

1UT1

1UT7

1UT9

1UTH

1UTN

1UTY

1UU1

1UU3

1UUF

1UUH

1UUL

1UUQ

1UUR

1UUY

1UUZ

1UUZ

1UV0

1UV7

1UW4

1UW6

1UW7

1UWH

1UWK

1UWS

1UWV

1UWZ

1UX5

1UX6

1UX8

1UXA

1UXO

1UXT

1UXY

1UXZ

1UY2

1UYJ

1UYL

1UYN

1UYP

1UYR

1UZ1

1UZ5

1UZB

1UZE

1UZP

1UZV

1UZX

1V00

1V02

1V04

1V0D

1V0E

1V0L

1V10

1V1A

1V1O

1V1Q

1V25

1V29

1V29

1V2A

1V2D

1V2X

1V2Z

1V30

1V33

1V37

1V3V

1V3W

1V3Y

1V43

1V47

1V4A

1V4P

1V4S

1V4V

1V4X

1V4X

1V54

1V54

1V54

1V54

1V54

1V58

1V5D

1V5V

1V5X

1V6C

1V6I

1V6S

1V6T

1V6Z

1V70

1V74

1V7C

1V7L

1V7R

1V7W

1V7Z

1V84

1V8A

1V8B

1V8C

1V8D

1V8F

1V8G

1V8P

1V8Y

1V93

1V97

1V9C

1V9D

1V9L

1V9M

1V9T

1V9Y

1VA4

1VA6

1VAK

1VAP

1VAV

1VB5

1VBF

1VC1

1VC4

1VCA

1VCL

1VCP

1VD5

1VDC

1VDH

1VDK

1VDR

1VDU

1VDW

1VE6

1VE9

1VEA

1VEC

1VEI

1VES

1VET

1VET

1VF5

1VF5

1VF5

1VF5

1VF7

1VFJ

1VFR

1VFS

1VFV

1VG0

1VG0

1VG8

1VGG

1VGQ

1VGS

1VGW

1VGY

1VH0

1VH1

1VH4

1VH5

1VH6

1VH9

1VHC

1VHE

1VHH

1VHI

1VHK

1VHM

1VHN

1VHO

1VHQ

1VHR

1VHS

1VHU

1VHV

1VHW

1VHX

1VHY

1VHZ

1VI0

1VI1

1VI2

1VI4

1VI6

1VI7

1VI9

1VIA

1VIC

1VIM

1VIO

1VIP

1VIU

1VIZ

1VJ1

1VJ2

1VJ7

1VJE

1VJF

1VJG

1VJH

1VJL

1VJN

1VJO

1VJP

1VJR

1VJT

1VJU

1VJV

1VJX

1VJZ

1VK0

1VK1

1VK2

1VK3

1VK4

1VK6

1VKC

1VKD

1VKE

1VKF

1VKH

1VKI

1VKJ

1VKK

1VKM

1VKN

1VKO

1VKP

1VKU

1VKV

1VKW

1VKY

1VKZ

1VL0

1VL1

1VL2

1VL4

1VL5

1VL6

1VL7

1VL8

1VLA

1VLB

1VLC

1VLF

1VLF

1VLG

1VLH

1VLI

1VLJ

1VLM

1VLO

1VLP

1VLQ

1VLR

1VLS

1VLU

1VLV

1VLW

1VM6

1VM7

1VM9

1VMA

1VMB

1VMD

1VME

1VMF

1VMH

1VMI

1VMJ

1VMK

1VMO

1VNS

1VOK

1VOM

1VP2

1VP4

1VP5

1VP6

1VP7

1VP8

1VPA

1VPB

1VPD

1VPE

1VPH

1VPJ

1VPK

1VPL

1VPM

1VPN

1VPP

1VPQ

1VPX

1VPY

1VQ0

1VQ2

1VQQ

1VQR

1VQS

1VQV

1VQW

1VR2

1VRT

1VRT

1VSR

1VYB

1VYD

1VYF

1VYI

1VYR

1VYU

1VZ0

1VZ6

1VZE

1VZO

1VZV

1VZY

1W07

1W0D

1W0I

1W0M

1W0N

1W0P

1W15

1W1H

1W1W

1W1W

1W1Z

1W23

1W25

1W27

1W2F

1W2W

1W2W

1W2Y

1W30

1W32

1W3B

1W3I

1W3O

1W3U

1W44

1W4X

1W5B

1W5F

1W5T

1W6K

1W6N

1W6S

1W6U

1W74

1W7B

1W7L

1W7W

1W85

1W85

1W8A

1W8I

1W8M

1W8O

1W96

1W97

1W9A

1W9C

1W9P

1W9S

1WA5

1WA5

1WA5

1WAD

1WB1

1WBA

1WC3

1WC9

1WCH

1WD5

1WD6

1WD7

1WDA

1WDC

1WDD

1WDD

1WDE

1WDI

1WDJ

1WDK

1WDK

1WDN

1WDU

1WDV

1WDY

1WE1

1WEH

1WEK

1WER

1WF3

1WF4

1WF4

1WFX

1WG8

1WGB

1WHI

1WIW

1WJ9

1WJG

1WJX

1WK2

1WK4

1WKC

1WKQ

1WKR

1WLF

1WLG

1WLJ

1WM1

1WMD

1WMG

1WMS

1WMU

1WMU

1WMX

1WMZ

1WND

1WOH

1WOQ

1WOS

1WOU

1WP1

1WP5

1WP6

1WPB

1WPG

1WPH

1WPN

1WPO

1WPW

1WQ8

1WR8

1WS8

1WSA

1WTD

1WTL

1WTY

1WU2

1WU3

1WUB

1WUE

1WUF

1WUU

1WV2

1WVI

1WWB

1WWC

1WWW

1WWW

1X6M

1X6O

1X72

1X79

1X7D

1X7F

1X7G

1X7Y

1X7Y

1X82

1X87

1X8H

1X8M

1X8Q

1X8V

1X91

1X92

1X94

1X99

1X9F

1X9F

1X9F

1X9F

1X9G

1X9I

1X9Z

1XA0

1XA1

1XA3

1XA6

1XAA

1XAR

1XAU

1XB2

1XB2

1XB3

1XB7

1XBB

1XBN

1XBT

1XBW

1XC2

1XC3

1XCB

1XCC

1XCG

1XCG

1XCL

1XCO

1XD3

1XD5

1XD7

1XDI

1XDN

1XDT

1XDY

1XDZ

1XE0

1XE1

1XE3

1XE7

1XEA

1XEB

1XED

1XER

1XEW

1XEW

1XEY

1XFH

1XFI

1XFJ

1XFK

1XFO

1XFP

1XFP

1XFS

1XG0

1XG5

1XG7

1XG8

1XG9

1XGK

1XGS

1XHC

1XHD

1XHK

1XHL

1XHO

1XHX

1XI3

1XI6

1XI9

1XIA

1XIM

1XIO

1XIP

1XIQ

1XIW

1XIW

1XIW

1XIZ

1XJ5

1XJC

1XJD

1XJU

1XKI

1XKK

1XKL

1XKN

1XKQ

1XKR

1XKS

1XKT

1XL4

1XLY

1XM3

1XM5

1XM7

1XM8

1XMA

1XMB

1XMC

1XMP

1XMR

1XMT

1XMX

1XNB

1XNF

1XNI

1XNV

1XNZ

1XO5

1XO7

1XOR

1XOU

1XP4

1XP8

1XPC

1XPJ

1XPM

1XPP

1XQ1

1XQ4

1XQ5

1XQ5

1XQ6

1XQ9

1XQA

1XQB

1XQG

1XQM

1XQO

1XQU

1XR4

1XR5

1XR7

1XRG

1XRH

1XRI

1XRK

1XS1

1XS5

1XSJ

1XSM

1XSO

1XSQ

1XSV

1XSZ

1XT9

1XTC

1XTC

1XTE

1XTG

1XTO

1XTP

1XU1

1XU2

1XU9

1XUB

1XUU

1XUV

1XV1

1XV2

1XVA

1XVH

1XVI

1XVP

1XVP

1XVQ

1XVS

1XVX

1XW6

1XW8

1XWA

1XWL

1XWM

1XWS

1XWV

1XX4

1XX6

1XX7

1XXF

1XXF

1XXL

1XY7

1XYG

1XYN

1XYP

1XYZ

1XZP

1XZP

1Y01

1Y01

1Y02

1Y08

1Y0E

1Y0G

1Y0H

1Y0Z

1Y13

1Y14

1Y14

1Y1L

1Y1O

1Y1X

1Y23

1Y2I

1Y2T

1Y4T

1Y60

1Y63

1Y6H

1Y6J

1Y6L

1Y7E

1Y88

1Y8C

1Y9I

1Y9Q

1Y9U

1YAA

1YAC

1YAL

1YAT

1YAV

1YB1

1YB5

1YBE

1YBF

1YCC

1YCN

1YCQ

1YCR

1YCS

1YCS

1YDG

1YDH

1YDW

1YEM

1YEY

1YFM

1YFO

1YFQ

1YG2

1YGE

1YGH

1YGP

1YGS

1YNA

1YPR

1YRG

1YTT

1ZBD

1ZBD

1ZFJ

1ZIN

1ZNC

1ZPD

1ZRN

1ZYM

256B

2A0B

2A2U

2AAA

2AAK

2ABK

2ACT

2ADM

2AE2

2AHJ

2AHJ

2AK3

2APS

2ARC

2ASR

2AT2

2AY1

2AYH

2AYQ

2AZA

2BAA

2BB2

2BBK

2BBK

2BC2

2BCE

2BEM

2BES

2BTM

2BTV

2BTV

2CAS

2CAU

2CB5

2CBL

2CCY

2CEV

2CHR

2CKB

2CKB

2CKB

2CMD

2CND

2CUA

2CY3

2CYH

2DLD

2DPM

2DRI

2E2A

2E2C

2EBN

2EIF

2END

2ENG

2FCB

2FCR

2FHE

2FOK

2FRV

2FRV

2GDM

2GMF

2GPR

2GSA

2GSQ

2HBG

2HFT

2HGS

2HHM

2HLC

2HMZ

2HRV

2HVM

2ILA

2ILK

2LDX

2LHB

2LIS

2LJR

2LTN

2MAD

2MAD

2MCM

2MEV

2MEV

2MEV

2MHR

2MNR

2MYS

2MYS

2MYS

2NAC

2NAP

2NCD

2OAT

2PF2

2PGD

2PGK

2PIA

2PII

2PKA

2PLC

2POR

2PRD

2PSP

2PTD

2PTH

2PVA

2PVB

2RHE

2RIG

2RMC

2RSL

2RSP

2SAK

2SAS

2SCP

2SCU

2SCU

2SGA

2SLI

2SPC

2SQC

2STV

2TBV

2TCT

2TGI

2TMG

2TNF

2TOH

2TPS

2TPT

2TRC

2TRC

2TRX

2UCZ

2VHB

2VSG

2XAT

2ZNC

3ADK

3APP

3C2C

3CAO

3CBH

3CLA

3CMS

3COX

3CSU

3CTS

3DFR

3EUG

3EZM

3FAP

3FIB

3GCB

3GRS

3KVT

3LAD

3LYN

3LZT

3MAG

3MDD

3NUL

3PCC

3PGA

3PMG

3PRN

3PRO

3PSG

3RAB

3RP2

3SDH

3SEB

3SIL

3SXL

3TAT

3TDT

3TGL

3THI

3TSS

3ULL

3VUB

4BCL

4CAT

4CPA

4FIV

4HB1

4MDH

4PFK

4PGA

4SBV

4SGB

4TS1

4UAG

4UBP

4XIA

5CSM

5CYT

5EAU

5NUL

5PAL

5RUB

5TMP

7A3H

7AAT

7AHL

7FD1

7LYZ

7MDH

7ODC

7TAA

830C

8ABP

8ACN

8TLN

9WGA

PDB ID


-------------- next part --------------
A non-text attachment was scrubbed...
Name: 1.doc
Type: application/octet-stream
Size: 296960 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050307/0ed1e79d/1-0001.obj
From s0460205 at sms.ed.ac.uk  Mon Mar  7 09:24:57 2005
From: s0460205 at sms.ed.ac.uk (SG Edwards)
Date: Mon Mar  7 13:38:26 2005
Subject: [Bioperl-l] Extraction from UniProt flatfile
Message-ID: <1110205497.422c64392756e@sms.ed.ac.uk>


Hi,

I am writing a Perl program that will extract data from a UniProt
flatfile so that I can automatically load it into
my PostgreSQL database. I am extracting the name, protein ID number,
references, etc. from the file.

 Does anyone know if there is a script available to do this already?

 Many thanks,

 Stephen

From lstein at cshl.edu  Mon Mar  7 12:56:34 2005
From: lstein at cshl.edu (Lincoln Stein)
Date: Mon Mar  7 13:38:27 2005
Subject: [Bioperl-l] bump in GD::SVG
In-Reply-To: <1108754778.4216415a6dea9@webmail.njit.edu>
References: <1108754778.4216415a6dea9@webmail.njit.edu>
Message-ID: <200503071256.34583.lstein@cshl.edu>

Sorry for the delay.  I have forwarded this bug report to Todd Harris, 
who maintains GD::SVG.  Offhand I don't see a good explanation for 
this behavior, as GD::SVG is at a level below the code that does the 
bumping.

It would help to send a script that elicits the behavior.

Lincoln

On Friday 18 February 2005 02:26 pm, hz5@njit.edu wrote:
> Hi everybody,
> My bump setting in GD::SVG for the generic glyph doesn't work. Has this
> happened to anyone, or is it just me?
> (Setting bump to 0 doesn't make the exons align on one line.)
> Thanks!
> haibo
> =========================================================
> Haibo Zhang, PhD student
> Computational Biology, NJIT & Rutgers University
> Center for Applied Genomics, PHRI
> http://afs13.njit.edu/~hz5
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724

NOTE: Please copy Sandra Michelsen <michelse@cshl.edu> on
all emails regarding scheduling and other time-critical topics.
From lstein at cshl.edu  Mon Mar  7 13:02:58 2005
From: lstein at cshl.edu (Lincoln Stein)
Date: Mon Mar  7 13:38:28 2005
Subject: [Bioperl-l] two colors on the same track, still not work
In-Reply-To: <1108946376.42192dc8f1bb2@webmail.njit.edu>
References: <1108946376.42192dc8f1bb2@webmail.njit.edu>
Message-ID: <200503071302.59272.lstein@cshl.edu>

Here's one guess. You have to make sure that the %colors hash is 
visible from the callback under Perl lexical scoping rules.  If you 
use "use strict" and the -w switch then Perl will warn about this 
type of common error.

If this doesn't work please send a complete test case script that 
shows the problem.  Make the script as short as possible and remove 
dependencies on other aspects of your system.  Also send the version 
numbers for Perl and BioPerl.
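
To illustrate the scoping point, here is a minimal, self-contained sketch in plain Perl. It does not use Bio::Graphics at all: the MockFeature package and the make_bgcolor_callback helper are hypothetical stand-ins, and the source-tag values 'UTR' and 'coding' are assumed purely for illustration. The point is only that %colors is declared with "my" in a scope that encloses the anonymous sub, before the sub is created, so the callback can still see the hash when it is called later.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for a sequence feature; only source_tag() is modelled.
package MockFeature;
sub new        { my ($class, %args) = @_; return bless {%args}, $class }
sub source_tag { return $_[0]->{source_tag} }

package main;

# Build a colour callback that closes over a lexical %colors hash.
# Because %colors is declared with "my" before the anonymous sub is
# created, the sub can still see it when it is called later.
sub make_bgcolor_callback {
    my %colors = @_;
    return sub {
        my $feature = shift;
        my $color   = $colors{ $feature->source_tag };
        # Fall back to a default rather than returning undef.
        return defined $color ? $color : 'gray';
    };
}

my $bgcolor = make_bgcolor_callback(UTR => 'blue', coding => 'brown');

# Print the colour chosen for each (assumed) source tag.
for my $tag (qw(UTR coding other)) {
    my $feature = MockFeature->new(source_tag => $tag);
    printf "%-6s => %s\n", $tag, $bgcolor->($feature);
}
```

In a real script the same pattern would mean declaring "my %colors = (...)" above the add_track call, in the same file scope as the -bgcolor callback. If the hash is declared inside another block or after the callback, "use strict" should flag it at compile time.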

Lincoln

On Sunday 20 February 2005 07:39 pm, hz5@njit.edu wrote:
> Hi everybody,
> I am still struggling with two colors on the same track, if anybody
> can help me, I would appreciate it a lot!
>
> I want to have utr blue, coding seq brown, so I have utr
> splitlocations in one feature, and coding seq splitlocations in
> another:
> ######################################################
> 		my $f1 = Bio::SeqFeature::Generic->new(
> 				      -primary_tag => $geneid,
> 				      -seq_id      => $nm,
> 				      -source_tag  => $UTR_str,
> 				      -location    => $splitlocation_utr,
> 		);
> 		my $f = Bio::SeqFeature::Generic->new(
> 				      -primary_tag => $geneid,
> 				      -source_tag  => $coding_str,
> 				      -seq_id      => $nm,
> 				      -location    => $splitlocation,
> 		);
> 		push @allft, $f1;
> 		push @allft, $f;
>
> Then I try to render @allft on one track, but color the UTR and coding
> sequence differently using a subroutine for bgcolor:
> #################################################
> 	my $track_nm = $panel ->add_track(\@allft,
> 			 -glyph   => 'generic',
> 			 -font2color     => 'blue',
> 			 -connector   => 'solid',
> 			 -bump => $bump,
> 			 -description => sub{
> 				my $f_tmp = shift;
> 				if($f_tmp->source_tag eq $dHSP_str){
> 					return '';
> 				}else{
> 					return $f_tmp->seq_id;
> 				}
> 				},
> 			-bgcolor => sub{
> 				my $f_tmp = shift;
> 				print "**".$colors{$f_tmp->source_tag}."\n";
> 				return $colors{$f_tmp->source_tag};
> 				},
> 			);
>
> I have %colors keyed by the source_tag I put in the features, but it
> doesn't work.
>
> Does anybody know how to achieve this?
> Thanks!!!!
> haibo
>

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724

NOTE: Please copy Sandra Michelsen <michelse@cshl.edu> on
all emails regarding scheduling and other time-critical topics.
From Ned.Young at tufts.edu  Mon Mar  7 13:27:05 2005
From: Ned.Young at tufts.edu (Ned Young)
Date: Mon Mar  7 13:38:29 2005
Subject: [Bioperl-l] Need help using AlignIO
Message-ID: <9d0af755cfda0c309264e58c73a9f248@tufts.edu>

Hi,

I must not be using AlignIO right, for when I try to read in an 
alignment and then output it to a file, I get an empty file.

I'm trying to write a script for the design of multiplex SNP primers, 
and, after looking at several modules, thought that AlignIO would be 
good.

Can someone give me a pointer?  Here's a trimmed down version of my 
script, to show the problem, as well as the input file I've been using. 
  I run the script by typing:
./test3.pl test.fasta

-------------- next part --------------
A non-text attachment was scrubbed...
Name: test3.pl
Type: application/text
Size: 535 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050307/30f90540/test3-0001.bin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test.fasta
Type: application/text
Size: 73730 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050307/30f90540/test-0001.bin
-------------- next part --------------


Any other modules I should look at?

Yours truly,

Ned Young	
Department of Biomedical Sciences
Division of Infectious Diseases
Tufts University School of Veterinary Medicine
200 Westboro Rd.
N. Grafton, MA 01536
508-887-4540
From jason.stajich at duke.edu  Mon Mar  7 13:48:49 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Mon Mar  7 13:47:40 2005
Subject: [Bioperl-l] gap/ambiguous character only sequences:
	Bio::PrimarySeq
In-Reply-To: <1109863560.20641.154.camel@tick.compbio.dundee.ac.uk>
References: <1109863560.20641.154.camel@tick.compbio.dundee.ac.uk>
Message-ID: <d9d035a2ec3942bd597fb9cdcc533c5d@duke.edu>

I think you are talking about _guess_alphabet?

You can always override the _guess_alphabet method; I posted a solution to 
this last month:
http://portal.open-bio.org/pipermail/bioperl-l/2005-February/018253.html

Does that work for you?  It warns instead of throwing when the sequence is 
all gaps.  You can make it even quieter if you like, of course.

-jason
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

On Mar 3, 2005, at 10:26 AM, Jon manning wrote:

> Hi All,
>
> For a lot of the stuff I'm doing at the moment I'm chopping up
> alignments and playing with the bits etc. I've had to nobble
> Bio::PrimarySeq to allow the resulting gap-only sequences in
> Bio::LocatableSeq- I understand the rationale behind this check, and
> it's a useful default, but could we perhaps have an option to allow
> tolerance instead? If such exists, I'd be grateful if someone could
> point me in the right direction!
>
> Thanks,
>
> Jon
>
From jason.stajich at duke.edu  Mon Mar  7 13:51:09 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Mon Mar  7 13:51:11 2005
Subject: [Bioperl-l] GFF question
In-Reply-To: <42270CFF.1080502@iq.usp.br>
References: <42270CFF.1080502@iq.usp.br>
Message-ID: <4dd8e4a7f8f5c914952ff9ab3a2ac063@duke.edu>

All of the Parsers produce Bio::SeqFeatureI objects (well nearly all of 
them).  SeqFeatureI objects can be written out to GFF with 
Bio::Tools::GFF (and presumably Bio::FeatureIO).   Some of the 
genefeature parsers try and build Gene objects so you may have to 
untangle them some to get at the underlying exons and write each of 
those out to GFF as well.
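
As a rough sketch of the "untangle and write out" step for one parser, the script below pulls the exons out of Genscan gene predictions and writes them as GFF. The helper names (genscan_exons, write_features_as_gff), the file names in the usage line and the choice of GFF version 2 are assumptions for illustration; the BioPerl calls used are Bio::Tools::Genscan's next_prediction and exons, and Bio::Tools::GFF's write_feature. Depending on the parser, you may need to set seq_id on each feature yourself before writing.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::Tools::GFF;
use Bio::Tools::Genscan;

# Flatten Genscan gene predictions into their underlying exon features.
sub genscan_exons {
    my ($genscan_file) = @_;
    my $parser = Bio::Tools::Genscan->new(-file => $genscan_file);
    my @exons;
    while (my $gene = $parser->next_prediction) {
        push @exons, $gene->exons;
    }
    $parser->close;
    return @exons;
}

# Write any list of Bio::SeqFeatureI objects to a GFF file.
# Returns the number of features written.
sub write_features_as_gff {
    my ($out_file, @features) = @_;
    my $gffout = Bio::Tools::GFF->new(-file        => ">$out_file",
                                      -gff_version => 2);
    $gffout->write_feature($_) for @features;
    $gffout->close;
    return scalar @features;
}

# Usage: ./genscan2gff.pl result.genscan result.gff
if (@ARGV == 2) {
    my $n = write_features_as_gff($ARGV[1], genscan_exons($ARGV[0]));
    print "Wrote $n exon(s) to $ARGV[1]\n";
}
```

The writer is kept separate from the Genscan-specific part so that features from any other parser can be passed to it unchanged.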

There isn't a Repeatmasker parser in Bioperl that I know of although 
Ensembl has one which could be ported some day.

-jason
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

On Mar 3, 2005, at 8:11 AM, Thiago Motta Venancio wrote:

> Hi folks.
> I would like to get a more detailed explanation about how to construct 
> GFF files with the outputs of several programs, like genescan, 
> repeatmasker...
> thanks in advance.
> Thiago
>
> -- 
> Thiago Motta Venancio - PhD student in Bioinformatics
>
>
From jason.stajich at duke.edu  Mon Mar  7 13:56:01 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Mon Mar  7 13:53:28 2005
Subject: [Bioperl-l] Need help using AlignIO
In-Reply-To: <9d0af755cfda0c309264e58c73a9f248@tufts.edu>
References: <9d0af755cfda0c309264e58c73a9f248@tufts.edu>
Message-ID: <d7cabe46bb8c5b6d3715d01bb8682312@duke.edu>

You are not providing the input file, so no alignments are being read in.

my $in  = Bio::AlignIO->new();

should be

# or whatever format you have it stored in
my $in  = Bio::AlignIO->new(-file => 'filename.aln', -format => 'fasta');

Or, if you want it to come from the command line, you need to specify it:

my $in  = Bio::AlignIO->new(-file => shift @ARGV, -format => 'fasta');

If you wanted to revert to the old behavior (> Bioperl 1.2) where 
either cmdline or STDIN would be re-directed as input you need the 
special ARGV handle.

# or whatever format you have it in, can't mix formats...
my $in  = Bio::AlignIO->new(-fh => \*ARGV, -format => 'fasta');
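
For completeness, here is a minimal sketch of the whole read-then-write round trip. The convert_alignment helper, the file names in the usage line and the choice of 'clustalw' as the output format are assumptions made for illustration; the only BioPerl calls are Bio::AlignIO->new, next_aln and write_aln.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::AlignIO;

# Copy every alignment in $in_file to $out_file, converting formats.
# Returns the number of alignments written.
sub convert_alignment {
    my ($in_file, $in_format, $out_file, $out_format) = @_;

    # Without -file (or -fh) the reader has no input to parse.
    my $in  = Bio::AlignIO->new(-file => $in_file,     -format => $in_format);
    my $out = Bio::AlignIO->new(-file => ">$out_file", -format => $out_format);

    my $count = 0;
    while (my $aln = $in->next_aln) {
        $out->write_aln($aln);
        $count++;
    }
    return $count;
}

# Usage: ./convert.pl test.fasta test.aln
if (@ARGV == 2) {
    my $n = convert_alignment($ARGV[0], 'fasta', $ARGV[1], 'clustalw');
    print "Wrote $n alignment(s) to $ARGV[1]\n";
}
```

Returning the count makes the "empty output file" case easy to spot: a result of 0 means nothing was read, which usually points at the input file or its format.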


--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

On Mar 7, 2005, at 1:27 PM, Ned Young wrote:

> Hi,
>
> I must not be using AlignIO right, for when I try to read in an 
> alignment and then output it to a file, I get an empty file.
>
> I'm trying to write a script for the design of multiplex SNP primers, 
> and, after looking at several modules, thought that AlignIO would be 
> good.
>
> Can someone give me a pointer?  Here's a trimmed down version of my 
> script, to show the problem, as well as the input file I've been 
> using.  I run the script by typing:
> ./test3.pl test.fasta
>
> <test3.pl><test.fasta>
>
> Any other modules I should look at?
>
> Yours truly,
>
> Ned Young	
> Department of Biomedical Sciences
> Division of Infectious Diseases
> Tufts University School of Veterinary Medicine
> 200 Westboro Rd.
> N. Grafton, MA 01536
> 508-887-4540
From barry.moore at genetics.utah.edu  Mon Mar  7 18:03:29 2005
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Mon Mar  7 17:58:18 2005
Subject: [Bioperl-l] need some help about  pqs
In-Reply-To: <3481.172.28.124.134.1110178851.squirrel@nwebmail.iitk.ac.in>
References: <3481.172.28.124.134.1110178851.squirrel@nwebmail.iitk.ac.in>
Message-ID: <422CDDC1.9090701@genetics.utah.edu>

Ashish-

I'll take a stab at this.  If I understand correctly, you want to get 
PDB files for about 6,000 proteins.  There is a Perl script on the PDB 
website that will download PDB, mmCIF, structure factor, and NMR restraint 
files from the database.  You can find it here:
ftp://ftp.rcsb.org/pub/pdb/software/getPdbStructures.pl

Barry

-----------------------------

Dear Sir,

I am a student of bioinformatics. I am writing about a problem I am
currently facing with PQS; the details are attached to this mail.

The problem is this: I have selected 6000 proteins for my research (see
the attachment, list_id). In the first step I submit each PDB ID on the PQS
page, which gives me output in the second step, and going on to the third
step then gives me the result as a .mol file, which is what I need.

This all works, and it is fine for 100 or 200 proteins, which can be done
manually, but for more than 6000 proteins it is a tedious job. Can you help
me do this by some method other than by hand, or is there a script for
downloading all these files?

Waiting for your reply.

Thanks.


----------------------------------------------------
        Ashish Kumar Jaiswal
        MScBioinformatics
        c/o Dr. Balaji Prakash
        Structural Biology Lab,
        Department of Biological Sciences
        and Bioengineering,
        Indian Institute of Technology, Kanpur,
        UP-208016, INDIA

        Ph:  +91-512-2594024
        FAX: +91-512-2594010
        Email: jaiswal@iitk.ac.in
----------------------------------------------------

From heikki at ebi.ac.uk  Tue Mar  8 04:11:19 2005
From: heikki at ebi.ac.uk (Heikki Lehvaslaiho)
Date: Tue Mar  8 04:08:26 2005
Subject: [Bioperl-l] Extraction from UniProt flatfile
In-Reply-To: <1110205497.422c64392756e@sms.ed.ac.uk>
References: <1110205497.422c64392756e@sms.ed.ac.uk>
Message-ID: <200503080911.20036.heikki@ebi.ac.uk>

Take a look at the BioSQL project. There is a CVS repository called 
bioperl-db. It contains the script load_seqdatabase.pl, which does what you 
need. The database schema is in a separate repository, biosql-schema, as it 
is shared among several language projects.

	-Heikki


> I am writing a perl program that will extract data from a UniProt
> flatfile so that I can automatically put data into
> my PostgreSQL database. I am taking out name, protein ID number,
> references etc from the file.
>
>  Does anyone know if there is a script available to do this already?
>
>  Many thanks,
>
>  Stephen
>

-- 
______ _/      _/_____________________________________________________
      _/      _/                      http://www.ebi.ac.uk/mutations/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_ebi _ac _uk
    _/_/_/_/_/  EMBL Outstation, European Bioinformatics Institute
   _/  _/  _/  Wellcome Trust Genome Campus, Hinxton
  _/  _/  _/  Cambridge, CB10 1SD, United Kingdom
     _/      Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________
From sdavis2 at mail.nih.gov  Tue Mar  8 05:57:38 2005
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Tue Mar  8 05:52:21 2005
Subject: [Bioperl-l] Extraction from UniProt flatfile
In-Reply-To: <200503080911.20036.heikki@ebi.ac.uk>
References: <1110205497.422c64392756e@sms.ed.ac.uk>
	<200503080911.20036.heikki@ebi.ac.uk>
Message-ID: <fbae656479b2e13c6405feac0fa720c3@mail.nih.gov>

Stephen,

There is another alternative that may meet your needs.  The folks at 
the UCSC genome browser maintain a relational version of UniProt 
(i.e., a MySQL database) here:

http://hgdownload.cse.ucsc.edu/goldenPath/uniProt/database/

that is available for download.

You can also connect directly to it (for SQL queries) via their public 
genome MySQL server.  Connection information is:

host:      genome-mysql.cse.ucsc.edu
user:      genome
password:  none (leave it blank)

Hope this helps.

Sean

On Mar 8, 2005, at 4:11 AM, Heikki Lehvaslaiho wrote:

> Take a look at the BioSQL project. There is a cvs repository called
> bioperl-db. It contains the script load_seqdatabase.pl, that does what 
> you
> need. The database schema is in a repository biosql-schema as it is 
> shared
> among several language projects.
>
> 	-Heikki
>
>
>
>> I am writing a perl program that will extract data from a UniProt
>> flatfile so that I can automatically put data into
>> my PostgreSQL database. I am taking out name, protein ID number,
>> references etc from the file.
>>
>>  Does anyone know if there is a script available to do this already?
>>
>>  Many thanks,
>>
>>  Stephen
>>

From mail2doreen at gmx.de  Tue Mar  8 06:15:59 2005
From: mail2doreen at gmx.de (mail2doreen@gmx.de)
Date: Tue Mar  8 06:14:00 2005
Subject: [Bioperl-l] (no subject)
Message-ID: <3768.1110280559@www63.gmx.net>

Hello all,

I need to train a gene prediction program, and for that I have to convert
a multi-GFF file into GenBank format.
Does anyone know a program to use for that?

Many thanks

From jason.stajich at duke.edu  Tue Mar  8 08:21:07 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Tue Mar  8 08:16:49 2005
Subject: [Bioperl-l] (no subject)
In-Reply-To: <3768.1110280559@www63.gmx.net>
References: <3768.1110280559@www63.gmx.net>
Message-ID: <f6b908e7f99828ab70830edc1618c32f@duke.edu>

Get the features with Bio::Tools::GFF.
Read in the sequence with Bio::SeqIO.
Call $seq->add_SeqFeature with all the features you got from 
Bio::Tools::GFF.
Write out the sequence with its new features with Bio::SeqIO, using the 
'genbank' format.
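
The steps above can be sketched roughly as follows. This is a sketch under stated assumptions, not a tested converter: it assumes the sequences are in a FASTA file, that the first column of the GFF matches each sequence's ID, and that the GFF is version 2. The gff_to_genbank helper and the file names in the usage line are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;
use Bio::Tools::GFF;

# Attach the features from a GFF file to the matching sequences in a
# FASTA file and write the result in GenBank format.
# Returns the number of sequences written.
sub gff_to_genbank {
    my ($gff_file, $fasta_file, $out_file, $gff_version) = @_;
    $gff_version ||= 2;    # assumption: adjust to match your GFF files

    # 1. Collect the features, grouped by the sequence they belong to.
    my %features_for;
    my $gffio = Bio::Tools::GFF->new(-file        => $gff_file,
                                     -gff_version => $gff_version);
    while (my $feature = $gffio->next_feature) {
        push @{ $features_for{ $feature->seq_id } }, $feature;
    }
    $gffio->close;

    # 2. Read each sequence, 3. add its features, 4. write GenBank.
    my $in  = Bio::SeqIO->new(-file => $fasta_file,  -format => 'fasta');
    my $out = Bio::SeqIO->new(-file => ">$out_file", -format => 'genbank');
    my $count = 0;
    while (my $seq = $in->next_seq) {
        my $features = $features_for{ $seq->display_id } || [];
        $seq->add_SeqFeature(@$features);
        $out->write_seq($seq);
        $count++;
    }
    return $count;
}

# Usage: ./gff2genbank.pl genes.gff genome.fasta genes.gb
if (@ARGV == 3) {
    my $n = gff_to_genbank(@ARGV);
    print "Wrote $n sequence(s) to $ARGV[2]\n";
}
```

Grouping the features by seq_id first is one way to handle a GFF file that describes several sequences; sequences with no matching features are still written, just without a feature table of their own.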

-jason
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

On Mar 8, 2005, at 6:15 AM, mail2doreen@gmx.de wrote:

> Hello all,
>
> i need to train a gene prediction programm and for that i have to 
> convert
> a multi gff file into genbank format!
> Does anyone know a programm to use for that?
>
> Many thanks
>
From akarger at CGR.Harvard.edu  Tue Mar  8 12:12:26 2005
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Tue Mar  8 12:06:22 2005
Subject: [Bioperl-l] Request for advice and pointers on a project to help
	biologists do simple formatting and analysis
Message-ID: <339D68B133EAD311971E009027DC4797022FD7BD@montecarlo.cgr.harvard.edu>

Hi.

I've gotten the impression - in my short time in bioinformatics - that
biologists get very frustrated with data formatting and analysis tasks.
Which is too bad, because many of these tasks are trivial for someone with a
bit of Perl knowledge. Then again, we can't force them to learn Perl, even
if it would be For Their Own Good.

I was thinking it would be useful to have a toolkit of outrageously simple
Perl one-liners.  Here's one:

    # Merge two lists, removing duplicates (logical OR)
    perl -ne '$seen{$_}++; END {print keys %seen}' file1 file2 > outfile

A biologist (call her Sue) would look through a website containing a bunch
of (searchable, categorized, etc.) scripts, cut & paste the Perl into Unix
(from a website), then backspace over the filenames and type in their own
filenames, and end up with something like this on the command line:

myhost>perl -ne '$seen{$_}++; END {print keys %seen}' genes1 genes2 >
all_genes

The biologist hits return & voilà! Instant data munging!
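
One hedged aside on the one-liner above: "keys %seen" returns the lines in no particular order, so the merged list will usually come out shuffled relative to the input files. A minimal order-preserving variant, written out as a small script, might look like the sketch below; the merge_unique name is just for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the given lines with duplicates removed, keeping each line
# at the position where it was first seen.
sub merge_unique {
    my %seen;
    return grep { !$seen{$_}++ } @_;
}

# Usage: ./merge.pl genes1 genes2 > all_genes
# <> reads, in order, every file named on the command line.
if (@ARGV) {
    print merge_unique(<>);
}
```

The same idea still fits on one line: perl -ne 'print unless $seen{$_}++' file1 file2 > outfile. Either way, a last line without a trailing newline will not match an otherwise identical line that has one.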

Of course, I'm not the first one to identify this problem or try to solve
it.  But I think I'm working on a slightly different problem than previous
solutions, and my (complete lack of) interface is different too.  Here's the
"prior art" I've seen in this area, compared and contrasted with my idea.
- EMBOSS et al.: solving harder bioinformatics problems; Interface is Unix
executables
- Bioperl's bioscripts: harder problems; Perl executables
- Taverna / myGrid: fancy GUI interface (but I do think of my scripts as
"shims")

I'm really aiming for the lowest of low-hanging fruit here. I don't want
scripts that run Blast or do fancy analysis. Rather, we'll have scripts like
the above to merge lists, or get the standard deviation of column 7 of
tabular data, or get the GenBank IDs of the top 10 hits from a BLAST output,
or whatever. These are all tasks that're trivial in (Bio)Perl - and some you
can even do in Excel - but most biologists won't know either Perl or fancy
Excel.  Think of it as pipelining software for your vterm100.
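
To give a feel for how small these tools are, here is a sketch of the "standard deviation of column 7" example written out long-hand. It assumes tab-delimited input and the sample (n - 1) form of the standard deviation; the std_dev and column_values helper names are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample standard deviation (n - 1 in the denominator) of a list of numbers.
# Returns undef when fewer than two values are given.
sub std_dev {
    my @values = @_;
    return if @values < 2;
    my $mean = 0;
    $mean += $_ / @values for @values;
    my $sum_sq = 0;
    $sum_sq += ($_ - $mean) ** 2 for @values;
    return sqrt($sum_sq / (@values - 1));
}

# Pull one column (1-based) out of tab-delimited lines, skipping
# rows where that column is missing or not a plain number.
sub column_values {
    my ($column, @lines) = @_;
    my @values;
    for my $line (@lines) {
        chomp $line;
        my @fields = split /\t/, $line;
        my $value  = $fields[ $column - 1 ];
        next unless defined $value
            && $value =~ /^-?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?$/;
        push @values, $value;
    }
    return @values;
}

# Usage: ./stddev.pl table.txt
if (@ARGV) {
    my $sd = std_dev(column_values(7, <>));
    print defined $sd ? "$sd\n" : "Need at least two numeric values\n";
}
```

Skipping non-numeric rows means a header line is ignored silently, which is convenient here but also hides malformed data; a stricter tool might warn instead.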

Why one-liners?
- really, really fast development of new tools (especially compared with GUI
tools)
- no installation necessary, no dependencies (except Perl)
- no download necessary; just cut and paste a tool from the web page
- biologist doesn't need to learn an interface
- if a biologist learns just a bit of Perl, they can tweak the one-liners:
much easier than writing from scratch, but makes tools much more flexible
- take advantage of existing tools' APIs: perl -MBio::Perl -e '...'

Potential problems:
- psychological barrier to using the command line (I figure I'll aim at
the Unix-aware subset of biologists first, and leave complete World
Domination to Phase 2.)
- we can't fit error-handling into one-liners. Caveat scriptor

So my questions for you bioperlers (finally!):
- Are there other projects that have tried to solve this niche of problems
i.e., allowing biologists to do simple formatting & analysis of biological
or tabular data?
- Are there at least discussions of this issue that I could read somewhere
for ideas (e.g., bioperl-l archive)?
- Does anyone have any free advice (positive or negative or both) to offer
for this project?
- Are there any other lists I should post these questions to?

The working name for my toolbox of bio scripts is "Scriptome".  If it ever
gets off the ground (and anyone cares), I'll post more info about it, along
with a request for more advice, I'm sure.  

Thanks,
-Amir Karger
akarger@cgr.harvard.edu

From skirov at utk.edu  Tue Mar  8 13:20:04 2005
From: skirov at utk.edu (Stefan Kirov)
Date: Tue Mar  8 13:16:03 2005
Subject: [Bioperl-l] Request for advice and pointers on a project to help
	biologists do simple formatting and analysis
In-Reply-To: <339D68B133EAD311971E009027DC4797022FD7BD@montecarlo.cgr.harvard.edu>
References: <339D68B133EAD311971E009027DC4797022FD7BD@montecarlo.cgr.harvard.edu>
Message-ID: <422DECD4.5080406@utk.edu>

I like this idea a lot.
First, my answer to your first two questions: no and no.
But I bet many biologists would scream in pain just hearing the word 
"console" (as you mentioned). So I offer a step 0 (bait to learn a little UNIX).
Imagine a simple web form that is hooked to the Perl interpreter (this might 
be tricky from a security standpoint; still, it could be restricted in several 
ways) and does (amazingly) what the biologist types in. This would have 
to include file uploads/downloads as well. Of course the capabilities 
would be quite restricted, but appetite comes with eating, as some say, 
and suddenly the console might not be such a bad idea (thus Mac shares would 
go up :-) ).

Amir Karger wrote:

>Hi.
>
>I've gotten the impression - in my short time in bioinformatics - that
>biologists get very frustrated with data formatting and analysis tasks.
>Which is too bad, because many of these tasks are trivial for someone with a
>bit of Perl knowledge. Then again, we can't force them to learn Perl, even
>if it would be For Their Own Good.
>
>I was thinking it would be useful to have a toolkit of outrageously simple
>Perl one-liners.  Here's one:
>
>    # Merge two lists, removing duplicates (logical OR)
>    perl -ne '$seen{$_}++; END {print keys %seen}' file1 file2 > outfile
>
>A biologist (call her Sue) would look through a website containing a bunch
>of (searchable, categorized, etc.) scripts, cut & paste the Perl into Unix
>(from a website), then backspace over the filenames and type in their own
>filenames, and end up with something like this on the command line:
>
>myhost>perl -ne '$seen{$_}++; END {print keys %seen}' genes1 genes2 >
>all_genes
>
>The biologist hits return & voilà! Instant data munging!
>
>Of course, I'm not the first one to identify this problem or try to solve
>it.  But I think I'm working on a slightly different problem than previous
>solutions, and my (complete lack of) interface is different too.  Here's the
>"prior art" I've seen in this area, compared and contrasted with my idea.
>- EMBOSS et al.: solving harder bioinformatics problems; Interface is Unix
>executables
>- Bioperl's bioscripts: harder problems; Perl executables
>- Taverna / myGrid: fancy GUI interface (but I do think of my scripts as
>"shims")
>
>I'm really aiming for the lowest of low-hanging fruit here. I don't want
>scripts that run Blast or do fancy analysis. Rather, we'll have scripts like
>the above to merge lists, or get the standard deviation of column 7 of
>tabular data, or get the GenBank IDs of the top 10 hits from a BLAST output,
>or whatever. These are all tasks that're trivial in (Bio)Perl - and some you
>can even do in Excel - but most biologists won't know either Perl or fancy
>Excel.  Think of it as pipelining software for your vterm100.
>
>Why one-liners?
>- really, really fast development of new tools (especially compared with GUI
>tools)
>- no installation necessary, no dependencies (except Perl)
>- no download necessary; just cut and paste a tool from the web page
>- biologist doesn't need to learn an interface
>- if a biologist learns just a bit of Perl, they can tweak the one-liners:
>much easier than writing from scratch, but makes tools much more flexible
>- take advantage of existing tools' APIs: perl -MBio::Perl -e '...'
>
>Potential problems:
>- psychological barrier to using the command line (I figure I'll aim at
>the Unix-aware subset of biologists first, and leave complete World
>Domination to Phase 2.)
>- we can't fit error-handling into one-liners. Caveat scriptor
>
>So my questions for you bioperlers (finally!):
>- Are there other projects that have tried to solve this niche of problems
>i.e., allowing biologists to do simple formatting & analysis of biological
>or tabular data?
>- Are there at least discussions of this issue that I could read somewhere
>for ideas (e.g., bioperl-l archive)?
>- Does anyone have any free advice (positive or negative or both) to offer
>for this project?
>- Are there any other lists I should post these questions to?
>
>The working name for my toolbox of bio scripts is "Scriptome".  If it ever
>gets off the ground (and anyone cares), I'll post more info about it, along
>with a request for more advice, I'm sure.  
>
>Thanks,
>-Amir Karger
>akarger@cgr.harvard.edu
>

-- 
Stefan Kirov, Ph.D.
University of Tennessee/Oak Ridge National Laboratory
5700 bldg, PO BOX 2008 MS6164
Oak Ridge TN 37831-6164
USA
tel +865 576 5120
fax +865-576-5332
e-mail: skirov@utk.edu
sao@ornl.gov

"And the wars go on with brainwashed pride
For the love of God and our human rights
And all these things are swept aside"

From akarger at CGR.Harvard.edu  Tue Mar  8 14:05:26 2005
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Tue Mar  8 13:58:42 2005
Subject: [Bioperl-l] Request for advice and pointers on a project to
	help biologists do simple formatting and analysis
Message-ID: <339D68B133EAD311971E009027DC4797022FD7BF@montecarlo.cgr.harvard.edu>

> From: Stefan Kirov [mailto:skirov@utk.edu] 
> 
> I like a lot this idea.
> First my answer to your first 2 questions: no, no.
> But I bet many biologists would scream in pain just hearing the word 
> console (as you mentioned). 

You make an excellent point. There are a number of avenues we've thought of
for making this tool more accessible to non-UNIX folks.  However, each one
of them requires some extra work in planning issues like security, command
paths, accessing input files, shell variables, etc. Because we've already
got a bunch of users here who need to use UNIX to use our computing cluster,
I figured I could have a prototype that only works from the UNIX command
line. If it works well on those folks, we can think about extending things
later (with the caveat that I want to be careful to keep the interface VERY
lightweight, because I don't trust myself to build a portable, "intuitive"
GUI.)

> So I offer 0 step (bait to learn a little UNIX).
> Imagine a simple web form that is hooked to the perl interpreter (might
> be tricky from a security point, still it could be restricted in several
> ways) and does (amazingly) what the biologist types in. This would have
> to include file uploads/downloads as well. Of course the capabilities
> will be quite restricted, but the appetite comes with eating as some say
> and suddenly the console might be not a bad idea (thus Mac shares would
> go up :-) ).

-Amir

> Amir Karger wrote:
> >
> >I was thinking it would be useful to have a toolkit of outrageously simple
> >Perl one-liners.
> 
From harris at cshl.edu  Tue Mar  8 13:29:17 2005
From: harris at cshl.edu (Todd Harris)
Date: Tue Mar  8 17:34:45 2005
Subject: [Bioperl-l] Request for advice and pointers on a project to
	help biologists do simple formatting and analysis
In-Reply-To: <422DECD4.5080406@utk.edu>
Message-ID: <BE533D0D.17D3A%harris@cshl.edu>

Hi Amir - 

I like this idea.  You could also have the scripts process @ARGV so no
hand-editing would be necessary.   You might even just make the scripts
executable droplets which would be even easier to use.
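
A minimal sketch of what an @ARGV-driven version of the "merge two lists"
one-liner might look like. The script name merge_lists.pl and the
merge_unique helper are made up for illustration; unlike the one-liner, this
version keeps lines in first-seen order and reports files it cannot open.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the unique lines from the given files, in first-seen order.
# (Hypothetical helper; the original one-liner prints hash keys, whose
# order is not predictable.)
sub merge_unique {
    my @files = @_;
    my (%seen, @merged);
    for my $file (@files) {
        open my $fh, '<', $file or die "Cannot open $file: $!\n";
        while (my $line = <$fh>) {
            chomp $line;
            push @merged, $line unless $seen{$line}++;
        }
        close $fh;
    }
    return @merged;
}

# Filenames come from the command line, so nothing needs hand-editing.
print "$_\n" for merge_unique(@ARGV);
```

It would be run as: perl merge_lists.pl genes1 genes2 > all_genes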

Todd

> On 3/8/05 11:20 AM, Stefan Kirov wrote:

> I like a lot this idea.
> First my answer to your first 2 questions: no, no.
> But I bet many biologists would scream in pain just hearing the word
> console (as you mentioned). So I offer 0 step (bait to learn a little UNIX).
> Imagine a simple web form that is hooked to the perl interpreter (might
> be tricky from a security point, still it could be restricted in several
> ways) and does (amazingly) what the biologist types in. This would have
> to include file uploads/downloads as well. Of course the capabilities
> will be quite restricted, but the appetite comes with eating as some say
> and suddenly the console might be not a bad idea (thus Mac shares would
> go up :-) ).
> 
> Amir Karger wrote:
> 
>> Hi.
>> 
>> I've gotten the impression - in my short time in bioinformatics - that
>> biologists get very frustrated with data formatting and analysis tasks.
>> Which is too bad, because many of these tasks are trivial for someone with a
>> bit of Perl knowledge. Then again, we can't force them to learn Perl, even
>> if it would be For Their Own Good.
>> 
>> I was thinking it would be useful to have a toolkit of outrageously simple
>> Perl one-liners.  Here's one:
>> 
>>    # Merge two lists, removing duplicates (logical OR)
>>    perl -ne '$seen{$_}++; END {print keys %seen}' file1 file2 > outfile
>> 
>> A biologist (call her Sue) would look through a website containing a bunch
>> of (searchable, categorized, etc.) scripts, cut & paste the Perl into Unix
>> (from a website), then backspace over the filenames and type in their own
>> filenames, and end up with something like this on the command line:
>> 
>> myhost>perl -ne '$seen{$_}++; END {print keys %seen}' genes1 genes2 >
>> all_genes
>> 
>> The biologist hits return & voilà! Instant data munging!
>> 
>> Of course, I'm not the first one to identify this problem or try to solve
>> it.  But I think I'm working on a slightly different problem than previous
>> solutions, and my (complete lack of) interface is different too.  Here's the
>> "prior art" I've seen in this area, compared and contrasted with my idea
From nelsonrt at iastate.edu  Tue Mar  8 18:36:57 2005
From: nelsonrt at iastate.edu (Rex Nelson)
Date: Tue Mar  8 18:33:53 2005
Subject: [Bioperl-l] Request for advice and pointers on a project to 
	help biologists do simple formatting and analysis
In-Reply-To: <BE533D0D.17D3A%harris@cshl.edu>
References: <BE533D0D.17D3A%harris@cshl.edu>
Message-ID: <p06020417be53e412bf2f@[129.186.230.13]>

Todd and Amir:

If you are running OS X there is a program 
Platypus which makes applications which would be 
quite suitable for simple perl/shell scripts. 
The output and input options are a little limited 
but to do defined little jobs it would work.  It 
allows you to put scripts inside an OS X 
application with or without drag and drop 
ability.  I don't know about "does (amazingly) 
what the biologist types in" but it would do 
defined jobs by clicking or drag-n-drop.

Rex

>Hi Amir -
>
>I like this idea.  You could also have the scripts process @ARGV so no
>hand-editing would be necessary.   You might even just make the scripts
>executable droplets which would be even easier to use.
>
>Todd
>
>>  On 3/8/05 11:20 AM, Stefan Kirov wrote:
>
>>  I like a lot this idea.
>>  First my answer to your first 2 questions: no, no.
>>  But I bet many biologists would scream in pain just hearing the word
>>  console (as you mentioned). So I offer 0 step (bait to learn a little UNIX).
>>  Imagine a simple web form that is hooked to the perl interpreter (might
>>  be tricky from a security point, still it could be restricted in several
>>  ways) and does (amazingly) what the biologist types in. This would have
>>  to include file uploads/downloads as well. Of course the capabilities
>>  will be quite restricted, but the appetite comes with eating as some say
>>  and suddenly the console might be not a bad idea (thus Mac shares would
>>  go up :-) ).
>>
>>  Amir Karger wrote:
>>
>>>  Hi.
>>>
>>>  I've gotten the impression - in my short time in bioinformatics - that
>>>  biologists get very frustrated with data formatting and analysis tasks.
>>>  Which is too bad, because many of these tasks are trivial for someone with a
>>>  bit of Perl knowledge. Then again, we can't force them to learn Perl, even
>>>  if it would be For Their Own Good.
>>>
>>>  I was thinking it would be useful to have a toolkit of outrageously simple
>>>  Perl one-liners.  Here's one:
>>>
>>>     # Merge two lists, removing duplicates (logical OR)
>>>     perl -ne '$seen{$_}++; END {print keys %seen}' file1 file2 > outfile
>>>
>>>  A biologist (call her Sue) would look through a website containing a bunch
>>>  of (searchable, categorized, etc.) scripts, cut & paste the Perl into Unix
>>>  (from a website), then backspace over the filenames and type in their own
>>>  filenames, and end up with something like this on the command line:
>>>
>>>  myhost>perl -ne '$seen{$_}++; END {print keys %seen}' genes1 genes2 >
>>>  all_genes
>>>
>>>  The biologist hits return & voilà! Instant data munging!
>>>
>>>  Of course, I'm not the first one to identify this problem or try to solve
>>>  it.  But I think I'm working on a slightly different problem than previous
>>>  solutions, and my (complete lack of) interface is different too.  Here's the
>>>  "prior art" I've seen in this area, compared and contrasted with my idea


-- 
Rex Nelson Ph.D.
Postdoctoral Scientist
nelsonrt@iastate.edu
(515) 294-1297
~~~_/) ~~~

From daniel.lang at biologie.uni-freiburg.de  Wed Mar  9 05:20:13 2005
From: daniel.lang at biologie.uni-freiburg.de (Daniel Lang)
Date: Wed Mar  9 05:16:26 2005
Subject: [Bioperl-l] strange error after changing to RC1.5
Message-ID: <422ECDDD.40404@biologie.uni-freiburg.de>

Hi,
I'm retrieving seq objects from a local biosql db (using the latest cvs 
version of bioperl-db) and e.g. writing them with SeqIO. After changing 
from a cvs version ~ 12/04 to RC1.5 or latest cvs version, I get the 
following error:

Operation `ne': no method found,
	left argument in overloaded package Bio::Annotation::Reference,
	right argument has no overloaded magic at
	/usr/lib/perl5/site_perl/5.6.1/Bio/DB/Persistent/PersistentObject.pm
	line 534, <GEN1> line 1.

The module PersistentObject.pm hasn't changed and in Reference.pm there 
is only this change:

diff bioperl-live-Dec04/Bio/Annotation/Reference.pm bioperl-live/Bio/Annotation/Reference.pm
1c1
< # $Id: Reference.pm,v 1.21 2004/08/19 20:13:32 lapp Exp $
---
> # $Id: Reference.pm,v 1.22 2005/02/02 22:13:22 allenday Exp $
56c56,57
< # use overload '""' => \&as_text;
---
> use overload '""' => sub { $_[0]->title || ''};
> use overload 'eq' => sub { "$_[0]" eq "$_[1]" };

I've reversed this, but no positive result - the error remains...
Any hints?
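
For reference, a small self-contained sketch of how this class of error can
arise in plain Perl. It is only an illustration and may not be what is
happening at line 534 of PersistentObject.pm. The two packages are
hypothetical stand-ins, not the real Bio::Annotation::Reference: a class
that overloads '""' and 'eq' without a fallback setting has no 'ne', whereas
fallback => 1 lets Perl revert to its ordinary string comparison.

```perl
use strict;
use warnings;

# Hypothetical stand-in: overloads "" and eq only, no 'fallback' given.
{
    package RefNoFallback;
    use overload
        '""' => sub { $_[0]->{title} || '' },
        'eq' => sub { "$_[0]" eq "$_[1]" };
    sub new { my ($class, $title) = @_; return bless { title => $title }, $class }
}

# Same overloads plus fallback => 1, so operators that are not overloaded
# (such as ne) fall back to Perl's normal behaviour on the string form.
{
    package RefWithFallback;
    use overload
        '""' => sub { $_[0]->{title} || '' },
        'eq' => sub { "$_[0]" eq "$_[1]" },
        fallback => 1;
    sub new { my ($class, $title) = @_; return bless { title => $title }, $class }
}

package main;

# Try the 'ne' operator on an object and report what happened.
sub try_ne {
    my ($obj) = @_;
    my $result = eval { $obj ne 'something else' ? 'differs' : 'same' };
    return defined $result ? $result : "died: $@";
}

print try_ne(RefNoFallback->new('A title')),   "\n";  # dies: no method found
print try_ne(RefWithFallback->new('A title')), "\n";  # compares as strings
```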

Thanks in advance,
Daniel


From chad at dieselwurks.com  Tue Mar  8 22:15:07 2005
From: chad at dieselwurks.com (Chad Matsalla)
Date: Wed Mar  9 13:18:58 2005
Subject: [Bioperl-l] Aggressive aggregation?
Message-ID: <Pine.LNX.4.50.0503082112350.14042-100000@sausage.usask.ca>


Subject: Aggressive Aggregators

Greetings all,

I'm looking for help in presenting Blast hits in GBrowse.

I blasted Brassica EST sequences against the Arabidopsis
pseudochromosome assemblies in order to store them in a Bio::DB::GFF
database. I used a tool based on bp_search2gff.pl to `convert' blast
reports into gff. A sample of that gff is below[1].

My problem is partly based on a peculiarity of Blast and partly based on
the behavior of the aggregators in GBrowse and I'm wondering if someone
else has seen this.

Arabidopsis has five chromosomes. In order to get the coordinates
necessary to place ESTs on the chromosomes I created a blast database
containing 5 query sequences - chr1, chr2, chr3, chr4, chr5.

My problem presents itself when an EST hits at more than one place on a
Chromosome.  Let us say that on chr1 there is a cluster of HSPs for the
est chad1 at position 1000, a second cluster at position 10,000 and a
third cluster at 50,000. Blast will indicate a SINGLE hit on chr1.

SO, I manually find clusters of HSPs and create GFF that resembles that
below[1]. Yes I know that wublast has an option to prevent that
behavior.
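
One way the manual clustering step might be sketched in plain Perl. The
cluster_hsps function, the 2000 bp gap threshold and the coordinates are all
assumptions chosen for illustration; strand and query coordinates are
ignored here, which a real version would need to consider.

```perl
use strict;
use warnings;

# Group HSPs (each a [start, end] pair on the chromosome) into clusters.
# A new cluster starts whenever the gap between an HSP and the furthest
# end seen so far in the current cluster exceeds $max_gap.
sub cluster_hsps {
    my ($max_gap, @hsps) = @_;
    my @clusters;
    my $cluster_end;
    for my $hsp (sort { $a->[0] <=> $b->[0] } @hsps) {
        my ($start, $end) = @$hsp;
        if (!@clusters || $start - $cluster_end > $max_gap) {
            push @clusters, [];
            $cluster_end = $end;
        }
        push @{ $clusters[-1] }, $hsp;
        $cluster_end = $end if $end > $cluster_end;
    }
    return @clusters;
}

# Made-up HSP coordinates near positions 1000, 10,000 and 50,000.
my @hsps = ([1000, 1075], [1100, 1150], [10000, 10080], [50000, 50120]);
for my $cluster (cluster_hsps(2000, @hsps)) {
    my $first = $cluster->[0][0];
    my ($last) = sort { $b <=> $a } map { $_->[1] } @$cluster;
    printf "match %d-%d (%d HSPs)\n", $first, $last, scalar @$cluster;
}
```

Each cluster would then become one `match' line plus its HSP lines in the
GFF.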

The problem is that the `match' aggregator joins all of the `matches'
together.  I understand that it's because all of the matches have the
same Target - that's necessary to have the proper sequence appear while
viewing base-base alignments.

HSPs:        <-->  <-->  <-->                 <-->  <-->  <-->
matches:     <-------------->                 <-------------->

What I get : <-->--<-->--<-->-----------------<-->--<-->--<-->
What I want: <-->--<-->--<-->                 <-->--<-->--<-->

How do I get what I want? In my gbrowse.conf I tried the standard
`match' aggregator and a custom aggregator: csmmatch{csmhsp/csmmatch}


Chad Matsalla


[1]
chr1 aafcest     HSP   1     75    .     +     .     Target "Sequence:chad1" 1 75
chr1 aafcest     HSP   100   150   .     +     .     Target "Sequence:chad1" 100 150
chr1 aafcest     match 1     150   .     +     .     Target "Sequence:chad1" 1 150

chr1 aafcest     HSP   200   275   .     -     .     Target "Sequence:chad1" 200 275
chr1 aafcest     HSP   300   450   .     -     .     Target "Sequence:chad1" 300 450
chr1 aafcest     match 200   450   .     -     .     Target "Sequence:chad1" 200 450


From s0460205 at sms.ed.ac.uk  Wed Mar  9 09:24:05 2005
From: s0460205 at sms.ed.ac.uk (SG Edwards)
Date: Wed Mar  9 13:19:07 2005
Subject: [Bioperl-l] uniprot flatfile extraction
Message-ID: <1110378245.422f07057f49c@sms.ed.ac.uk>

Hi, sorry if this is basic but I've read the documentation and am still
confused!!

I wish to extract uniprot flatfile data into my database. I want to get the
following variables:

Protein ID, length, description, molecular weight, sequence, comments, cross
references, disulphide bonds, species, entered date, last modified, last
annotated, protein synonyms.

I know that I can get some of these (e.g. protein ID, length) using Bioperl but
can I get all of the data also or am I better writing my own from scratch?

Thanks

From jason.stajich at duke.edu  Wed Mar  9 13:42:47 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Wed Mar  9 13:37:47 2005
Subject: [Bioperl-l] Aggressive aggregation?
In-Reply-To: <Pine.LNX.4.50.0503082112350.14042-100000@sausage.usask.ca>
References: <Pine.LNX.4.50.0503082112350.14042-100000@sausage.usask.ca>
Message-ID: <9fed82865c1265db7eedb183112cd228@duke.edu>

So personally, I wouldn't use default BLASTN.

I'd use WU-BLAST with the -links option (this has worked well for  
mapping Brassica ESTs to Arabidopsis in my experience).  Then you can  
parse the BLAST (writing your own slightly customized version of  
search2gff which looks at the $hsp->link option to group things).  I  
just lectured on this today in fact:

http://people.genome.duke.edu/~jes12/BGT203.2005/projects/find_duplicates/scripts/draw_hits.pl
http://people.genome.duke.edu/~jes12/BGT203.2005/projects/find_duplicates/scripts/draw_hits_perlink.pl


Or if you are willing to have a little more overhead - exonerate  
(http://www.ebi.ac.uk/~guy/exonerate/) with the est2genome model which  
will try and splice the EST onto the genome for you as well.  You can  
dump out GFF directly which needs to be massaged a little before  
loading into Bio::DB::GFF.

-jason
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

On Mar 8, 2005, at 10:15 PM, Chad Matsalla wrote:

>
>
> Subject: Aggressive Aggregators
>
> Greetings all,
>
> I'm looking for help in presenting Blast hits in GBrowse.
>
> I blasted Brassica EST sequences against the Arabidopsis
> pseudochromosome assemblies in order to store them in a Bio::DB::GFF
> database. I used a tool based bp_search2gff.pl to `convert' blast
> reports into gff. A sample of that gff is below[1].
>
> My problem is partly based on a peculiarity of Blast and partly based  
> on
> the behavior of the aggregators in GBrowse and I'm wondering if someone
> else has seen this.
>
> Arabidopsis has five chromosomes. In order to get the coordinates
> necessary to place ESTs on the chromosomes I created a blast database
> containing 5 query sequences - chr1, chr2, chr3, chr4, chr5.
>
> My problem presents itself when an EST hits at more than once place on  
> a
> Chromosome.  Let us say that on chr1 there is a cluster of HSPs for the
> est chad1 at position 1000, a second cluster at position 10,000 and a
> third cluster at 50,000. Blast will indicate a SINGLE hit on chr1.
>
> SO, I manually find clusters of HSPs and create GFF that resembles that
> below[1]. Yes I know that wublast has an option to prevent that
> behavior.
>
> The problem is that the `match' aggregator joins all of the `matches'
> together.  I understand that it's because all of the matches have the
> same Target - that's necessary to have the proper sequence appear while
> viewing base-base alignments.
>
> HSPs:        <-->  <-->  <-->                 <-->  <-->  <-->
> matches:     <-------------->                 <-------------->
>
> What I get : <-->--<-->--<-->-----------------<-->--<-->--<-->
> What I want: <-->--<-->--<-->                 <-->--<-->--<-->
>
> How do I get what I want? In my gbrowse.conf I tried the standard
> `match' aggregator and a custom aggregator: csmmatch{csmhsp/csmmatch}
>
>
> Chad Matsalla
>
>
> [1]
> chr1 aafcest     HSP   1     75    .     +     .     Target "Sequence:chad1" 1 75
> chr1 aafcest     HSP   100   150   .     +     .     Target "Sequence:chad1" 100 150
> chr1 aafcest     match 1     150   .     +     .     Target "Sequence:chad1" 1 150
>
> chr1 aafcest     HSP   200   275   .     -     .     Target "Sequence:chad1" 200 275
> chr1 aafcest     HSP   300   450   .     -     .     Target "Sequence:chad1" 300 450
> chr1 aafcest     match 200   450   .     -     .     Target "Sequence:chad1" 200 450
>
>
From akarger at CGR.Harvard.edu  Wed Mar  9 13:46:17 2005
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Wed Mar  9 13:39:37 2005
Subject: [Bioperl-l] Request for advice and pointers on a project to help 
	biologists do simple formatting and analysis
Message-ID: <339D68B133EAD311971E009027DC4797022FD7CB@montecarlo.cgr.harvard.edu>

In a private mail, Richard Copley wrote:
>Amir Karger wrote:
>> I was thinking it would be useful to have a toolkit of outrageously simple
>> Perl one-liners.  Here's one:
>> 
>>     # Merge two lists, removing duplicates (logical OR)
>>     perl -ne '$seen{$_}++; END {print keys %seen}' file1 file2 > outfile
>
>sort -u file1 file2

I know that many of the tasks proposed for the Scriptome can be done with
grep, sed, cut, Word, or Excel. I'm planning on implementing head, sort,
join, and lots of others. But how many experimental biologists are familiar
with Unix cut? How many bother to learn even the least fancy Excel
functions?  I think not many, because they have other things to worry about.


One reason so many people have created integrated toolboxes is so that
biologists only need to learn how to use one tool, rather than learning 30
or whatever Unix commands.  The goal of Scriptome is that they only need to
learn one tool AND that the learning curve for that tool is very small. And
we make the learning curve small by using an extremely lightweight interface
(most of solving a problem involves searching on a website) rather than by
trying to create an intuitive GUI.  After all, how many folks  other than
Apple have created GUIs that are intuitive for more than a small subset of
people?

-Amir

From amackey at pcbi.upenn.edu  Wed Mar  9 14:00:40 2005
From: amackey at pcbi.upenn.edu (Aaron J. Mackey)
Date: Wed Mar  9 13:55:20 2005
Subject: [Bioperl-l] Aggressive aggregation?
In-Reply-To: <Pine.LNX.4.50.0503082112350.14042-100000@sausage.usask.ca>
References: <Pine.LNX.4.50.0503082112350.14042-100000@sausage.usask.ca>
Message-ID: <3503c6582ad58219fe9c590fe09a0f46@pcbi.upenn.edu>


> My problem is partly based on a peculiarity of Blast and partly based 
> on
> the behavior of the aggregators in GBrowse and I'm wondering if someone
> else has seen this.

Welcome to the party ;)

> My problem presents itself when an EST hits at more than once place on 
> a
> Chromosome.

Besides Jason's recommendation to use a splicing-aware tool (exonerate 
is one, but Spidey is also a good one, and is based on BLASTN already), 
you have another issue which is that your GFF Targets need to be 
uniquely named.  This is a well-known drawback of GFF prior to GFF3, 
and a continuing issue with GBrowse when using the current Bio::DB::GFF 
(which is not yet GFF3-savvy).

> chr1 aafcest     HSP   1     75    .     +     .     Target "Sequence:chad1" 1 75
> chr1 aafcest     HSP   100   150   .     +     .     Target "Sequence:chad1" 100 150
> chr1 aafcest     match 1     150   .     +     .     Target "Sequence:chad1" 1 150
>
> chr1 aafcest     HSP   200   275   .     -     .     Target "Sequence:chad1" 200 275
> chr1 aafcest     HSP   300   450   .     -     .     Target "Sequence:chad1" 300 450
> chr1 aafcest     match 200   450   .     -     .     Target "Sequence:chad1" 200 450


These need to be Target "Sequence:chad1-1" and "Sequence:chad1-2" or 
some such.  This also means that if you're saving the ESTs in the 
database (for sequence alignment display), you'll have to save them 
redundantly under chad1-1, chad1-2, etc.  The same problem arises with 
BLASTX searches against protein databases.
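
A rough sketch of how the renaming could be done as a preprocessing step on
the GFF, before loading. The uniquify_targets function is hypothetical; it
assumes whitespace-delimited lines shaped like the sample above, and that
`match' features for the same Target on one chromosome do not overlap, so
each HSP sits inside exactly one match.

```perl
use strict;
use warnings;

# Give each 'match' feature its own Target name (NAME-1, NAME-2, ...) and
# relabel every feature with the name of the match that contains it.
sub uniquify_targets {
    my @lines = @_;
    my (%count, @matches);

    # First pass: number the match features per original Target name.
    for my $line (@lines) {
        my ($ref, undef, $type, $start, $end) = split ' ', $line;
        next unless defined $type && $type eq 'match';
        my ($name) = $line =~ /Target\s+"([^"]+)"/ or next;
        push @matches, {
            ref => $ref, start => $start, end => $end,
            old => $name, new => $name . '-' . ++$count{$name},
        };
    }

    # Second pass: rewrite the Target of each feature inside a match.
    my @out;
    for my $line (@lines) {
        my ($ref, undef, $type, $start, $end) = split ' ', $line;
        my ($name) = $line =~ /Target\s+"([^"]+)"/;
        if (defined $name) {
            my ($m) = grep {
                $_->{old} eq $name && $_->{ref} eq $ref
                    && $_->{start} <= $start && $end <= $_->{end}
            } @matches;
            $line =~ s/Target(\s+)"\Q$name\E"/Target$1"$m->{new}"/ if $m;
        }
        push @out, $line;
    }
    return @out;
}

# Filter usage: perl uniquify.pl < in.gff > out.gff
my @gff = <STDIN>;
print uniquify_targets(@gff);
```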

Now, you could write a custom aggregator that de-aggregated multiple 
chad1 "match" features, assigning the contained HSPs to each, but there 
is no such "default" behavior.  Let me know if there's general interest 
for this ...

Anxiously awaiting GFF3-support in Bio::DB::GFF,
-Aaron

--
Aaron J. Mackey, Ph.D.
Dept. of Biology, Goddard 212
University of Pennsylvania       email:  amackey@pcbi.upenn.edu
415 S. University Avenue         office: 215-898-1205
Philadelphia, PA  19104-6017     fax:    215-746-6697

From amackey at pcbi.upenn.edu  Wed Mar  9 14:04:12 2005
From: amackey at pcbi.upenn.edu (Aaron J. Mackey)
Date: Wed Mar  9 13:58:49 2005
Subject: [Bioperl-l] Aggressive aggregation?
In-Reply-To: <9fed82865c1265db7eedb183112cd228@duke.edu>
References: <Pine.LNX.4.50.0503082112350.14042-100000@sausage.usask.ca>
	<9fed82865c1265db7eedb183112cd228@duke.edu>
Message-ID: <a423489fd8530fd020adc8f09ade11d9@pcbi.upenn.edu>

I also recommend -span1 to ensure consistent HSP ordering and 
orientation ... FYI, DPS (part of AAT package) is another good option 
for fast intron-savvy alignments.

-Aaron

On Mar 9, 2005, at 1:42 PM, Jason Stajich wrote:

> I'd use WU-BLAST with the -links option
--
Aaron J. Mackey, Ph.D.
Dept. of Biology, Goddard 212
University of Pennsylvania       email:  amackey@pcbi.upenn.edu
415 S. University Avenue         office: 215-898-1205
Philadelphia, PA  19104-6017     fax:    215-746-6697

From echuong at gmail.com  Wed Mar  9 17:23:56 2005
From: echuong at gmail.com (Edward Chuong)
Date: Wed Mar  9 17:18:56 2005
Subject: [Bioperl-l] PAML nssites model result object
Message-ID: <244d2e0e050309142370997ce4@mail.gmail.com>

Hi all,

I'm trying to parse PAML results, and running into some trouble. I'm
using branch specific omega model, and I want to get the branch
specific ka/ks values out.
http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/Tools/Phylo/PAML.pm?rev=HEAD&cvsroot=bioperl&content-type=text/vnd.viewcvs-markup
says that $node->param('omega') should work, but Data::Dumper shows
that this value isn't stored in the node (only branch lengths and seq
IDs appear to be stored).

I'm assuming that I can get these values out of the
get_NSSite_result() Bio::Tools::Phylo::PAML::ModelResult object, but
I'm not sure how to call it. The current synopsis uses
"get_model_params" but it seems to be out of date because it's not in
the current source. The docs at
http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/Tools/Phylo/PAML/Result.pm?rev=HEAD&cvsroot=bioperl&content-type=text/vnd.viewcvs-markup
say to use
    my @results = @{$self->get_NSSite_results};
That looks like a mistake, and I've tried
    @result = $result->get_NSSite_results
but that doesn't work either (just get undefined objs).

Am I doing something wrong, or is this functionality still being
worked on? I've tried using both 1.4 and the LIVE versions. Any help
is appreciated, thanks!

-Ed
From lstein at cshl.edu  Wed Mar  9 15:47:29 2005
From: lstein at cshl.edu (Lincoln Stein)
Date: Wed Mar  9 17:36:20 2005
Subject: [Bioperl-l] Aggressive aggregation?
In-Reply-To: <Pine.LNX.4.50.0503082112350.14042-100000@sausage.usask.ca>
References: <Pine.LNX.4.50.0503082112350.14042-100000@sausage.usask.ca>
Message-ID: <200503091547.29838.lstein@cshl.edu>

Each of the multiple hits should have its own unique target name.  You 
can do this by appending a .01, .02, etc to the end of the Target 
name.

Lincoln


On Tuesday 08 March 2005 10:15 pm, Chad Matsalla wrote:
> Subject: Aggressive Aggregators
>
> Greetings all,
>
> I'm looking for help in presenting Blast hits in GBrowse.
>
> I blasted Brassica EST sequences against the Arabidopsis
> pseudochromosome assemblies in order to store them in a
> Bio::DB::GFF database. I used a tool based bp_search2gff.pl to
> `convert' blast reports into gff. A sample of that gff is below[1].
>
> My problem is partly based on a peculiarity of Blast and partly
> based on the behavior of the aggregators in GBrowse and I'm
> wondering if someone else has seen this.
>
> Arabidopsis has five chromosomes. In order to get the coordinates
> necessary to place ESTs on the chromosomes I created a blast
> database containing 5 query sequences - chr1, chr2, chr3, chr4,
> chr5.
>
> My problem presents itself when an EST hits at more than once place
> on a Chromosome.  Let us say that on chr1 there is a cluster of
> HSPs for the est chad1 at position 1000, a second cluster at
> position 10,000 and a third cluster at 50,000. Blast will indicate
> a SINGLE hit on chr1.
>
> SO, I manually find clusters of HSPs and create GFF that resembles
> that below[1]. Yes I know that wublast has an option to prevent
> that behavior.
>
> The problem is that the `match' aggregator joins all of the
> `matches' together.  I understand that it's because all of the
> matches have the same Target - that's necessary to have the proper
> sequence appear while viewing base-base alignments.
>
> HSPs:        <-->  <-->  <-->                 <-->  <-->  <-->
> matches:     <-------------->                 <-------------->
>
> What I get : <-->--<-->--<-->-----------------<-->--<-->--<-->
> What I want: <-->--<-->--<-->                 <-->--<-->--<-->
>
> How do I get what I want? In my gbrowse.conf I tried the standard
> `match' aggregator and a custom aggregator:
> csmmatch{csmhsp/csmmatch}
>
>
> Chad Matsalla
>
>
> [1]
> chr1 aafcest     HSP   1     75    .     +     .     Target "Sequence:chad1" 1 75
> chr1 aafcest     HSP   100   150   .     +     .     Target "Sequence:chad1" 100 150
> chr1 aafcest     match 1     150   .     +     .     Target "Sequence:chad1" 1 150
>
> chr1 aafcest     HSP   200   275   .     -     .     Target "Sequence:chad1" 200 275
> chr1 aafcest     HSP   300   450   .     -     .     Target "Sequence:chad1" 300 450
> chr1 aafcest     match 200   450   .     -     .     Target "Sequence:chad1" 200 450
>
>

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724

NOTE: Please copy Sandra Michelsen <michelse@cshl.edu> on
all emails regarding scheduling and other time-critical topics.
From jason.stajich at duke.edu  Wed Mar  9 17:41:24 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Wed Mar  9 17:36:26 2005
Subject: [Bioperl-l] PAML nssites model result object
In-Reply-To: <244d2e0e050309142370997ce4@mail.gmail.com>
References: <244d2e0e050309142370997ce4@mail.gmail.com>
Message-ID: <896034a8342912841a4a0d0a0686353e@duke.edu>

Skipped content of type multipart/mixed

From jason.stajich at duke.edu  Wed Mar  9 18:01:34 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Wed Mar  9 17:56:11 2005
Subject: [Bioperl-l] PAML nssites model result object
In-Reply-To: <896034a8342912841a4a0d0a0686353e@duke.edu>
References: <244d2e0e050309142370997ce4@mail.gmail.com>
	<896034a8342912841a4a0d0a0686353e@duke.edu>
Message-ID: <f18fbe9b369789b605a28574c8628c1b@duke.edu>

Resend with code pasted....

#!/usr/bin/perl -w
use strict;
use Bio::Tools::Phylo::PAML;

my $outcodeml = shift(@ARGV);

my $paml_parser = Bio::Tools::Phylo::PAML->new(-file => "./$outcodeml",
					       -dir => "./");
my $result = $paml_parser->next_result();
my $MLmatrix = $result->get_MLmatrix(); # get MaxLikelihood Matrix
my @otus = $result->get_seqs;
if( $#{$MLmatrix} < 0 ) {
     while( my $tree = $result->next_tree ) {
	for my $node ( $tree->get_nodes ) {
	    my $id;
	    if( $node->is_Leaf() ) {
		$id = $node->id;
	    } else {
		$id = "(".join(",", map { $_->id } grep { $_->is_Leaf }
			   $node->get_all_Descendents) .")";
	    }
	    if( ! $node->ancestor || ! $node->has_tag('t') ) {
		# skip when no values have been associated with this node
		# (like the root node)
		next;
	    }
	    # I know this looks complicated
	    # but we use the get_tag_values method to pull out the annotations
	    # for each branch
	    # The ()[0] around the call is because get_tag_values returns a list
	    # if we want to just get the 1st item in the list we have
	    # to tell Perl we are treating it like an array.
	    # in the future get_tag_values needs to be smart and just
	    # return the 1st item in the array if called in scalar
	    # context

	    printf "%s\tt=%.3f\tS=%.1f\tN=%.1f\tdN/dS=%.4f\tdN=%.4f\tdS=%.4f\tS*dS=%.1f\tN*dN=%.1f\n",
	    $id,
	    map { ($node->get_tag_values($_))[0] }
	    qw(t S N dN/dS dN dS), 'S*dS', 'N*dN';
	}
     }
} else {
     my $i =0;
     my @seqs = $result->get_seqs;
     for my $row ( @$MLmatrix ) {
	print $seqs[$i++]->display_id, join("\t",@$row), "\n";
     }
}
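
The ()[0] idiom described in the comments of the script is a Perl list slice. A minimal sketch of how it differs from calling the same routine in scalar context follows; get_values is a hypothetical stand-in, not the real get_tag_values method, and the numbers are only illustrative.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a method that returns a list of tag values.
sub get_values {
    my @values = (0.613, 0.768);
    return @values;
}

# List slice: call in list context, then keep only the first element.
my $first = (get_values())[0];

# Scalar context: returning an array yields its element count, not the first value.
my $count = get_values();

print "first=$first count=$count\n";
```

This is why the script wraps each get_tag_values call in parentheses before indexing it.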

On Mar 9, 2005, at 5:41 PM, Jason Stajich wrote:

> I just updated things last week so this is brand-spanking-new.  I  
> don't know if I connected everything up for NSsites stuff quite yet   
> as that is handled in - the branch-specific parsing should work now.    
> I don't know if the synopsis code is really up to snuff either.  When  
> I get around to it I will try and see what still needs to be connected  
> in NSsites parsing.
>
> I don't think $node->param() is going to work -  
> $node->get_tag_values() is the way I've implemented it.
>
> <00parse_codeml.pl>
>
> -jason
> --
> Jason Stajich
> jason.stajich at duke.edu
> http://www.duke.edu/~jes12/
>
> On Mar 9, 2005, at 5:23 PM, Edward Chuong wrote:
>
>> Hi all,
>>
>> I'm trying to parse PAML results, and running into some trouble. I'm
>> using branch specific omega model, and I want to get the branch
>> specific ka/ks values out.
>> http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/ 
>> Tools/Phylo/PAML.pm?rev=HEAD&cvsroot=bioperl&content-type=text/ 
>> vnd.viewcvs-markup
>> says that $node->param('omega') should work, but Data::Dumper shows
>> that this value isn't stored in the node (only branch lengths and seq
>> IDs appear to be stored).
>>
>> I'm assuming that I can get these values out of the
>> get_NSSite_result() Bio::Tools::Phylo::PAML::ModelResult object, but
>> I'm not sure how to call it. The current synopsis uses
>> "get_model_params" but it seems to be out of date because it's not in
>> the current source. The docs at
>> http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/ 
>> Tools/Phylo/PAML/Result.pm?rev=HEAD&cvsroot=bioperl&content- 
>> type=text/vnd.viewcvs-markup
>> say to use my
>> @results = @{$self->get_NSSite_results};
>> --that looks like a mistake, and I've tried
>> @result = $result->get_NSSite_results but that doesn't work either
>> (just get undefined objs).
>>
>> Am I doing something wrong, or is this functionality still being
>> worked on? I've tried using both 1.4 and the LIVE versions. Any help
>> is appreciated, thanks!
>>
>> -Ed
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l@portal.open-bio.org
>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>>

From echuong at gmail.com  Wed Mar  9 18:37:21 2005
From: echuong at gmail.com (Edward Chuong)
Date: Wed Mar  9 18:32:24 2005
Subject: [Bioperl-l] PAML nssites model result object
In-Reply-To: <f18fbe9b369789b605a28574c8628c1b@duke.edu>
References: <244d2e0e050309142370997ce4@mail.gmail.com>
	<896034a8342912841a4a0d0a0686353e@duke.edu>
	<f18fbe9b369789b605a28574c8628c1b@duke.edu>
Message-ID: <244d2e0e0503091537d5f283d@mail.gmail.com>

Hi Jason,

Thanks for the help.

The code seems to get stuck at 

 if( ! $node->ancestor || ! $node->has_tag('t') ) {
(this condition turns out true for every node, not just root, so it
always hits "next")

I used Data::Dumper to check on the node and I've pasted the
results--it seems like those tags aren't being sent in?


Thanks!
-Ed

'_root_cleanup_methods' => [
		  sub { "DUMMY" }
		],
'_creation_id' => 0,
'_branch_length' => '0.613722',
'_desc' => {},
'_id' => 'NP_033437.2_mus',
'_ancestor' => bless( {
	 '_root_cleanup_methods' => [
					  $VAR1->{'_root_cleanup_methods'}[0]
					],
	 '_creation_id' => 3,
	 '_desc' => {
				  '2' => bless( {
					  '_root_cleanup_methods' => [
						   $VAR1->{'_root_cleanup_methods'}[0]
						 ],
					  '_creation_id' => 2,
					  '_branch_length' => '0.768322',
					  '_desc' => {},
					  '_id' => 'PM_BWp0001H02f',
					  '_ancestor' => $VAR1->{'_ancestor'},
					  '_root_verbose' => 0
					}, 'Bio::Tree::Node' ),
				  '0' => $VAR1,
				  '1' => bless( {
					  '_root_cleanup_methods' => [
												   $VAR1->{'_root_cleanup_methods'}[0]
												 ],
					  '_creation_id' => 1,
					  '_branch_length' => '0.366319',
					  '_desc' => {},
					  '_id' => 'NP_742070.1_rat',
					  '_ancestor' => $VAR1->{'_ancestor'},
					  '_root_verbose' => 0
					}, 'Bio::Tree::Node' )
				},
	 '_id' => '',
	 '_height' => undef,
	 '_root_verbose' => 0
   }, 'Bio::Tree::Node' ),
'_root_verbose' => 0
}, 'Bio::Tree::Node' );


On Wed, 9 Mar 2005 18:01:34 -0500, Jason Stajich <jason.stajich@duke.edu> wrote:
> Resend with code pasted....
> 
> #!/usr/bin/perl -w
> use strict;
> use Bio::Tools::Phylo::PAML;
> 
> my $outcodeml = shift(@ARGV);
> 
> my $paml_parser = new Bio::Tools::Phylo::PAML(-file => "./$outcodeml",
>                                               -dir => "./");
> my $result = $paml_parser->next_result();
> my $MLmatrix = $result->get_MLmatrix(); # get MaxLikelihood Matrix
> my @otus = $result->get_seqs;
> if( $#{$MLmatrix} < 0 ) {
>      for my $tree ($result->next_tree ) {
>         for my $node ( $tree->get_nodes ) {
>             my $id;
>             if( $node->is_Leaf() ) {
>                 $id = $node->id;
>             } else {
>                 $id = "(".join(",", map { $_->id } grep { $_->is_Leaf }
>                            $node->get_all_Descendents) .")";
>             }
>             if( ! $node->ancestor || ! $node->has_tag('t') ) {
>                 # skip when no values have been associated with this node
>                 # (like the root node)
>                 next;
>             }
>              # I know this looks complicated
>             # but we use the get_tag_values method to pull out the annotations
>             # for each branch
>             # The ()[0] around the call is because get_tag_values returns a
> list
>             # if we want to just get the 1st item in the list we have
>             # to tell Perl we are treating it like an array.
>             # in the future get_tag_values needs to be smart and just
>             # return the 1st item in the array if called in scalar
>             # context
> 
>             printf
> "%s\tt=%.3f\tS=%.1f\tN=%.1f\tdN/
> dS=%.4f\tdN=%.4f\tdS=%.4f\tS*dS=%.1f\tN*dN=%.1f\n",
>             $id,
>             map { ($node->get_tag_values($_))[0] }
>             qw(t S N dN/dS dN dS), 'S*dS', 'N*dN';
>         }
>      }
> } else {
>      my $i =0;
>      my @seqs = $result->get_seqs;
>      for my $row ( @$MLmatrix ) {
>         print $seqs[$i++]->display_id, join("\t",@$row), "\n";
>      }
> }
> 
> On Mar 9, 2005, at 5:41 PM, Jason Stajich wrote:
> 
> > I just updated things last week so this is brand-spanking-new.  I
> > don't know if I connected everything up for NSsites stuff quite yet
> > as that is handled in - the branch-specific parsing should work now.
> > I don't know if the synopsis code is really up to snuff either.  When
> > I get around to it I will try and see what still needs to be connected
> > in NSsites parsing.
> >
> > I don't think $node->param() is going to work -
> > $node->get_tag_values() is the way I've implemented it.
> >
> > <00parse_codeml.pl>
> >
> > -jason
> > --
> > Jason Stajich
> > jason.stajich at duke.edu
> > http://www.duke.edu/~jes12/
> >
> > On Mar 9, 2005, at 5:23 PM, Edward Chuong wrote:
> >
> >> Hi all,
> >>
> >> I'm trying to parse PAML results, and running into some trouble. I'm
> >> using branch specific omega model, and I want to get the branch
> >> specific ka/ks values out.
> >> http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/
> >> Tools/Phylo/PAML.pm?rev=HEAD&cvsroot=bioperl&content-type=text/
> >> vnd.viewcvs-markup
> >> says that $node->param('omega') should work, but Data::Dumper shows
> >> that this value isn't stored in the node (only branch lengths and seq
> >> IDs appear to be stored).
> >>
> >> I'm assuming that I can get these values out of the
> >> get_NSSite_result() Bio::Tools::Phylo::PAML::ModelResult object, but
> >> I'm not sure how to call it. The current synopsis uses
> >> "get_model_params" but it seems to be out of date because it's not in
> >> the current source. The docs at
> >> http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/
> >> Tools/Phylo/PAML/Result.pm?rev=HEAD&cvsroot=bioperl&content-
> >> type=text/vnd.viewcvs-markup
> >> say to use my
> >> @results = @{$self->get_NSSite_results};
> >> --that looks like a mistake, and I've tried
> >> @result = $result->get_NSSite_results but that doesn't work either
> >> (just get undefined objs).
> >>
> >> Am I doing something wrong, or is this functionality still being
> >> worked on? I've tried using both 1.4 and the LIVE versions. Any help
> >> is appreciated, thanks!
> >>
> >> -Ed
> >>
> 
> 


-- 
Edward Chuong
(949) 939-2732
AIM: edawad85
From davila at ioc.fiocruz.br  Wed Mar  9 21:42:55 2005
From: davila at ioc.fiocruz.br (davila)
Date: Wed Mar  9 21:42:51 2005
Subject: [Bioperl-l] Mysql columns and Blast evalues
Message-ID: <8D44604203DAF9438BF9123B4A08C779575FC1@alpha.ioc.fiocruz.br>

Hi All,
 
I am not sure whether this has already been discussed, but I was not able to find anything using Google...
 
I am trying to store BLAST e-values (parsed with SearchIO) in MySQL tables (MyISAM). The column in question is double(11,2). Would that be OK for really small e-values (e.g. 1e-197)? I am using MySQL 4.1.10 and only see "0" (zero) in the tables. When I set the column to double(11,3) I can see smaller e-values (like the one mentioned above)...
 
Another problem is printing those e-values on screen; we are using CGI and sprintf like this:
 
$e_value = sprintf ( "%0.1e", $blast_hits->[$i][2]/1e200 );
 
But we only see values such as "0.0e+00" on screen...
 
Any tips, would be greatly appreciated.
 
Kindest regards, Alberto
 
 
From skirov at utk.edu  Wed Mar  9 22:25:46 2005
From: skirov at utk.edu (Stefan Kirov)
Date: Wed Mar  9 22:21:15 2005
Subject: [Bioperl-l] Mysql columns and Blast evalues
In-Reply-To: <8D44604203DAF9438BF9123B4A08C779575FC1@alpha.ioc.fiocruz.br>
References: <8D44604203DAF9438BF9123B4A08C779575FC1@alpha.ioc.fiocruz.br>
Message-ID: <422FBE3A.3080006@utk.edu>

Try to store it as a varchar, it will not make much difference.
Your sprintf is OK I guess; I suspect either $blast_hits or $i is actually undef,
and undef/anything (except undef or 0) is 0. You can use CGI::Debug (for
example, use CGI::Debug (report=>'everything', on=>'anything');) to trace
the vars in a CGI script. There are other ways to debug a CGI script,
including command line -d (try searching google for debug CGI).
Hope this helps.
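
Another possible cause of the "0.0e+00" output, offered as an assumption rather than a diagnosis of the actual script: dividing an already tiny e-value by 1e200 can push it below the range of a double, where it becomes exactly zero. The sketch below uses the 1e-197 example value from the original post; format_evalue is a hypothetical helper name.

```perl
use strict;
use warnings;

# Format an e-value with the same "%0.1e" pattern used in the original post.
sub format_evalue {
    my ($evalue) = @_;
    return sprintf("%0.1e", $evalue);
}

my $evalue = 1e-197;    # example value from the thread

# The value itself is well within the range of a double.
print format_evalue($evalue), "\n";

# 1e-197 / 1e200 would be 1e-397, which underflows to zero.
print format_evalue($evalue / 1e200), "\n";
```

If that is what happens in the CGI script, formatting the parsed value directly, without the division, would be the first thing to try.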

davila wrote:

>Hi All,
> 
>Not sure you already discussed this but I was not able to find anything by using google...
> 
>I am trying to store parsed Blast e-values (parsed with SearchIO) into mysql tables (MyISAM), the column in question is double(11,2)... would it be ok for really small e-values (eg: 1e-197) ? I am using MySQL 4.1.10 and only see "0" (zero) in the tables... when I set the column to double(11,3) then can see smaller evalues (like the above mentioned) ...
> 
>Another problem is to print in the screen those evalues, actually we are using CGI and sprintf like this:
> 
>$e_value = sprintf ( "%0.1e", $blast_hits->[$i][2]/1e200 );
> 
>But can only see values as "0.0e+00" in the screen... 
> 
>Any tips, would be greatly appreciated.
> 
>Kindest regards, Alberto
> 
> 
>
From ak at ebi.ac.uk  Thu Mar 10 04:43:29 2005
From: ak at ebi.ac.uk (Andreas Kahari)
Date: Thu Mar 10 04:38:28 2005
Subject: [Bioperl-l] Request for advice and pointers on a project to help
	biologists d o simple formatting and analysis
In-Reply-To: <339D68B133EAD311971E009027DC4797022FD7CB@montecarlo.cgr.harvard.edu>
References: <339D68B133EAD311971E009027DC4797022FD7CB@montecarlo.cgr.harvard.edu>
Message-ID: <20050310094329.GB29547@ebi.ac.uk>

I'm not quite sure what this has to do with bioperl...

On Wed, Mar 09, 2005 at 01:46:17PM -0500, Amir Karger wrote:
> In a private mail, Richard Copley wrote:

Forwarding private emails to mailing lists are we?

> >Amir Karger wrote:
> >> I was thinking it would be useful to have a toolkit of outrageously
> simple
> >> Perl one-liners.  Here's one:

http://www.oreilly.com/catalog/cookbook/

> >> 
> >>     # Merge two lists, removing duplicates (logical OR)
> >>     perl -ne '$seen{$_}++; END {print keys %seen}' file1 file2 > outfile
> >
> >sort -u file1 file2
> 
> I know that many of the tasks proposed for the Scriptome can be done with
> grep, sed, cut, Word, or Excel. I'm planning on implementing head, sort,
> join, and lots of others. But how many experimental biologists are familiar
> with Unix cut? How many bother to learn even the least fancy Excel
> functions?  I think not many, because they have other things to worry about.

Hmmm, comparing 'cut' and 'sed' with Word and Excel?  Oh well.

The philosophy of Unix utilities is to do only one thing,
but to do it very well.  In the case with the 'sort' utility
for example, it will most likely use an out-of-core sorting
algorithm to cope with files larger than the available memory
of the machine, and will probably be a fair bit quicker and
more flexible than your own implementation.

> One reason so many people have created integrated toolboxes is so that
> biologists only need to learn how to use one tool, rather than learning 30
> or whatever Unix commands.  The goal of Scriptome is that they only need to
> learn one tool AND that the learning curve for that tool is very small. And
> we make the learning curve small by using an extremely lightweight interface
> (most of solving a problem involves searching on a website) rather than by
> trying to create an intuitive GUI.  After all, how many folks  other than
> Apple have created GUIs that are intuitive for more than a small subset of
> people?

The reason why so many people are creating integrated toolboxes
(really, are they?) is probably just because so many people
before them didn't do it right.  Mind you, doing it "right" is
not possible.

I do understand that there is a need for integrated utilities
with easy-to-press buttons, and I won't try to put you off
working on those kinds of projects, but...

What would an experimental biologist, who is not familiar with
'sort', 'cut' or 'join', do with a Perl script that implemented
that functionality?  Wouldn't it be better to provide a
high-level interface to common tasks, like parsing the output
from various programs and providing simple ways of accessing
and manipulating sequence features etc.?  If you find ways to
expand the application area of BioPerl, or if you rationalize
and improve existing BioPerl code, then I'm sure the BioPerl
maintainers would be happy to consider committing your code to
the project.


Regards,
Andreas

-- 
Andreas Kähäri
EMBL-EBI/ensembl

1024D/C2E163CB
From davila at ioc.fiocruz.br  Thu Mar 10 04:42:19 2005
From: davila at ioc.fiocruz.br (davila)
Date: Thu Mar 10 04:42:29 2005
Subject: RES: [Bioperl-l] Mysql columns and Blast evalues
Message-ID: <8D44604203DAF9438BF9123B4A08C779575FC3@alpha.ioc.fiocruz.br>

Hi Stefan,
 
Thanks for the tips !
 
I guess the problem with using VARCHAR is that it limits comparisons on the actual e-values: if I want to select or show only e-values greater or smaller than 1e-50, would that work OK?
 
I would like to know which other (MySQL) column types colleagues are using to store their BLAST e-values (any further details would be appreciated).
 
Thanks.
 
 
-----Original Message-----
From: Stefan Kirov [mailto:skirov@utk.edu]
Sent: Thu 10/3/2005 00:25
To: davila
Cc: bioperl-l@portal.open-bio.org
Subject: Re: [Bioperl-l] Mysql columns and Blast evalues


From avilella at gmail.com  Thu Mar 10 08:08:57 2005
From: avilella at gmail.com (Albert Vilella)
Date: Thu Mar 10 08:10:18 2005
Subject: [Bioperl-l] stockholm AlignIO write_aln method
Message-ID: <1110460137.8027.6.camel@magneto>

Hi,

I would like to use the write_aln method for the stockholm format,
which is currently unimplemented.

As it isn't implemented, I would like to ask where I could find the
documentation for the format and the minimal set of rules a write_aln
method should obey.

Anyone?

Bests,

    Albert.

From skirov at utk.edu  Thu Mar 10 08:21:28 2005
From: skirov at utk.edu (Stefan Kirov)
Date: Thu Mar 10 08:16:34 2005
Subject: RES: [Bioperl-l] Mysql columns and Blast evalues
In-Reply-To: <8D44604203DAF9438BF9123B4A08C779575FC3@alpha.ioc.fiocruz.br>
References: <8D44604203DAF9438BF9123B4A08C779575FC3@alpha.ioc.fiocruz.br>
Message-ID: <423049D8.3070407@utk.edu>

I think it should work OK. And I think the values you want to store are 
too small for float (53 is the smallest: you should use float(53,53), 
which defines both storage and display precision).
I am personally using Oracle so it is a different game. One thing you can 
do is store the exponent, even just as an integer:
my $exp=int(log($blast_eval));
or
my $exp=log($blast_eval);
and store it as float.
When you want to work with the number again:
my $blast_eval=exp($exp);   # exp() is the inverse of Perl's natural log()
Stefan
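
A minimal sketch of the store-the-logarithm idea, assuming a base-10 logarithm is kept in a floating-point column so that the stored number reads like the exponent of the e-value. The helper names are hypothetical. Note that truncating the logarithm with int() would discard the mantissa, so the sketch keeps the full value.

```perl
use strict;
use warnings;

# Convert an e-value to its base-10 logarithm for storage.
# Returns undef for e-values of zero or less, since log() cannot take them.
sub evalue_to_log10 {
    my ($evalue) = @_;
    return undef if $evalue <= 0;
    return log($evalue) / log(10);
}

# Recover the e-value from the stored base-10 logarithm.
sub log10_to_evalue {
    my ($log10) = @_;
    return 10 ** $log10;
}

my $stored   = evalue_to_log10(1e-197);
my $restored = log10_to_evalue($stored);
printf "stored=%.4f restored=%.1e\n", $stored, $restored;
```

Filtering on such a column then becomes a plain numeric comparison, for example stored values below -50 for e-values smaller than 1e-50.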

davila wrote:

>Hi Stefan,
> 
>Thanks for the tips !
> 
>I guess the problem of using VARCHAR could be the limitations to compare the real evalues, so if I want to do something or only show evalues greater or smaller than 1e-50 would it work ok ?
> 
>I wonder to know what other (mysql) column types (any further details would be appreciated) colleagues are using to store their Blast evalues ?
> 
>Thanks.
> 
> 
>-----Original Message-----
>From: Stefan Kirov [mailto:skirov@utk.edu]
>Sent: Thu 10/3/2005 00:25
>To: davila
>Cc: bioperl-l@portal.open-bio.org
>Subject: Re: [Bioperl-l] Mysql columns and Blast evalues
>
>
> 
>  
>

From avilella at ub.edu  Thu Mar 10 08:31:16 2005
From: avilella at ub.edu (Albert Vilella)
Date: Thu Mar 10 08:25:45 2005
Subject: [Bioperl-l] hapmap.pm startingcol now 11?
Message-ID: <1110461476.8193.6.camel@magneto>

Hi all,

AFAICS, Hapmap dump files have (since Dec 2004?) an extra field previous
to the starting column for the first genotype, so the $startingcol in
hapmap.pm should change from 10 to 11 (see end of message).

Can anyone confirm? 

I'm getting a MSG:

-------------------- WARNING ---------------------
MSG: cannot add NA06993 as a genotype skipping
--------------------------------------------------

And I'm not sure whether it is related to this or not.

Bests,

    Albert.

hapmap.pm
---------------------------
sub _pivot {
    my ($self) = @_;

    my (@cols,@rows,@idheader);
    while ($_ = $self->_readline){
	chomp($_);
	next if( /^\s*\#/ || /^\s+$/ || ! length($_) );
	if( /^rs\#\s+alleles\s+chrom\s+pos\s+strand/ ) {
	    @idheader = split $self->flag('field_delimiter');
	} else { 
	    push @cols, [split $self->flag('field_delimiter')];
	}
    }
    #Post Dec 2004. Previously was 10
    my $startingcol = 11;

    $self->{'_header'} = [ map { $_->[0] } @cols];
    for my $n ($startingcol.. $#{ $cols[ 0 ]}) { 
	my $column = [ $idheader[$n],
		       map{ $_->[ $n ] } @cols ];	
	push (@rows, $column); 
    }
    $self->{'_pivot'} = [@rows];
    $self->{'_i'} = 0;
}
---------------------------
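
To make the effect of $startingcol easier to see, here is a small self-contained sketch of the same pivot step with the starting column passed in as a parameter. The function name pivot_from and the tiny table are made up for illustration and are not part of hapmap.pm or of a real HapMap dump.

```perl
use strict;
use warnings;

# Turn per-SNP rows into per-sample columns, starting at column $startingcol.
# Each returned element is [ sample_id, genotype_for_row_1, genotype_for_row_2, ... ].
sub pivot_from {
    my ($startingcol, $header, @rows) = @_;
    my @pivot;
    for my $n ($startingcol .. $#{$header}) {
        push @pivot, [ $header->[$n], map { $_->[$n] } @rows ];
    }
    return @pivot;
}

# Hypothetical table: two annotation columns followed by two sample columns.
my $header = [ 'rs#', 'alleles', 'S1', 'S2' ];
my @rows   = ( [ 'rs1', 'A/G', 'AA', 'AG' ],
               [ 'rs2', 'C/T', 'CC', 'CT' ] );

print join(" ", @$_), "\n" for pivot_from(2, $header, @rows);
```

With a starting column that is one too small, the last annotation column would be pivoted as if it were a sample, which is the kind of shift the proposed change from 10 to 11 is meant to avoid.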
From jason.stajich at duke.edu  Thu Mar 10 08:30:32 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Thu Mar 10 08:30:21 2005
Subject: [Bioperl-l] stockholm AlignIO write_aln method
In-Reply-To: <1110460137.8027.6.camel@magneto>
References: <1110460137.8027.6.camel@magneto>
Message-ID: <98e7cc2b3a0c45ccb1f8b674181850d3@duke.edu>

Albert

write_aln should take a list of Bio::Align::AlignI objects (more 
concretely, Bio::SimpleAlign objects) and write them out using 
$self->_print.
AlignIO::clustalw is a good example.

Stockholm format is documented here:
http://www.cgb.ki.se/cgb/groups/sonnhammer/Stockholm.html

We don't really support the notion of #=GF, #=GC, #=GS, #=GR lines in 
Bioperl at this point although that would be nice so we can store and 
manipulate data like secondary structure strings within Bioperl.  I 
know some people had talked about this a long time ago, I don't think 
anything was done...
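
As a rough idea of the minimum such a writer has to emit, here is a sketch that formats name/sequence pairs as a bare Stockholm block: the "# STOCKHOLM 1.0" header, one line per sequence, and the "//" terminator, with no #=GF, #=GC, #=GS or #=GR lines. format_stockholm is a hypothetical standalone helper, not BioPerl code; a real write_aln would take the names and sequences from the alignment objects and send the text through $self->_print.

```perl
use strict;
use warnings;

# Format a list of [ name, aligned_sequence ] pairs as a minimal Stockholm block.
sub format_stockholm {
    my (@pairs) = @_;

    # Pad names to the longest one so the sequences line up in a column.
    my $width = 0;
    for my $pair (@pairs) {
        my $len = length $pair->[0];
        $width = $len if $len > $width;
    }

    my $out = "# STOCKHOLM 1.0\n";
    for my $pair (@pairs) {
        my ($name, $seq) = @$pair;
        $out .= sprintf("%-*s %s\n", $width, $name, $seq);
    }
    $out .= "//\n";
    return $out;
}

print format_stockholm([ 'seq1', 'ACGT-' ], [ 's2', 'AC-TA' ]);
```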

-jason
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

On Mar 10, 2005, at 8:08 AM, Albert Vilella wrote:

> Hi,
>
> I'm willing to use the unimplemented write_aln method in stockholm
> format.
>
> As it isn't implemented, I would like to ask where could I find the doc
> files for the format and the minimal the set of rules write_aln method
> should obey,
>
> Anyone?
>
> Bests,
>
>     Albert.
>
From fernan at iib.unsam.edu.ar  Thu Mar 10 08:48:14 2005
From: fernan at iib.unsam.edu.ar (Fernan Aguero)
Date: Thu Mar 10 08:43:21 2005
Subject: [Bioperl-l] Proposal for bio-perl updates: ACE assembly file
In-Reply-To: <4f12c65ac919697fd8a7e9220db182fd@tigem.it>
References: <200502141205.52256.jswanson@iastate.edu>
	<200502281005.06990.jswanson@iastate.edu>
	<4f12c65ac919697fd8a7e9220db182fd@tigem.it>
Message-ID: <20050310134814.GE27364@iib.unsam.edu.ar>

+----[ Elia Stupka <elia@tigem.it> (01.Mar.2005 10:17):
|
| Hi Jordan, 
| 
| I have been doing some work on Contig::Assembly myself recently, and 
| have also been in touch with the author (Robson) about it. Perhaps the 
| best thing would be for the three of us to have a chat about this 
| object, try to revamp it a little with our improvements, and then 
| Robson or I can check it in? 
| 
| regards, 
| 
| Elia 
|
+----]

Hi!

We have just got a need to produce .ace files and noticed
that this functionality was lacking. 

I also saw the recent thread about this topic on the list.
Question: has this moved forward since the last message was
posted (March 1st)?

If so, are the  proposed changes in a form that can be
applied and tested by others (a recursive diff, perhaps
against a recent CVS checkout or against the 1.5-release)

Thanks in advance,

Fernan
From skirov at utk.edu  Thu Mar 10 09:14:40 2005
From: skirov at utk.edu (Stefan Kirov)
Date: Thu Mar 10 09:09:42 2005
Subject: [Bioperl-l] Entrez Gene ASN
Message-ID: <42305650.30403@utk.edu>

Hi guys!
I have done some (mostly) serious thinking about ASN Entrez Gene parsing 
and I propose we do my favorite thing- postpone everything we cannot 
deal with right now. If you want it to sound better: take a gradual 
approach where we store the data we can deal with in the existing 
Bioperl objects and skipping the rest for now.
In details:
ASN gene record can be correctly represented as a tree. I have written a 
simple parser for my own purposes which is storing the following:
node_id---|
                  --parent
                  --level
                  --tag
                  --values
What I do then is get specific levels and tags and build different 
objects. So level 2 with parent EntrezGene (which is the root level and 
has no information) is the gene description and has tags such as gene, name, 
etc; at levels 3, 5 and 6 you can get the complete species definition by 
looking for orgname and org as tags and records with parent mod (which 
is a value for orgname; descend down the branch).
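
A minimal sketch of the flat node table described above, under the assumption that each node is stored as a hash keyed by node id with parent, level, tag and values fields. The ids and values are invented for illustration and select_nodes is a hypothetical helper, not the parser itself.

```perl
use strict;
use warnings;

# Hypothetical flattened tree: node_id => { parent, level, tag, values }.
my %nodes = (
    1 => { parent => undef, level => 1, tag => 'EntrezGene', values => [] },
    2 => { parent => 1,     level => 2, tag => 'gene',       values => ['example-gene'] },
    3 => { parent => 1,     level => 2, tag => 'name',       values => ['example name'] },
    4 => { parent => 2,     level => 3, tag => 'orgname',    values => ['mod'] },
);

# Return the ids of nodes matching the optional level, tag and parent_tag filters.
sub select_nodes {
    my ($nodes, %want) = @_;
    return sort { $a <=> $b } grep {
        my $n = $nodes->{$_};
        (!defined $want{level} || $n->{level} == $want{level})
            && (!defined $want{tag} || $n->{tag} eq $want{tag})
            && (!defined $want{parent_tag}
                || (defined $n->{parent}
                    && $nodes->{ $n->{parent} }{tag} eq $want{parent_tag}));
    } keys %$nodes;
}

# Level-2 records directly under the root, i.e. the gene description.
my @gene_description = select_nodes(\%nodes, level => 2, parent_tag => 'EntrezGene');
print "@gene_description\n";
```

Objects for the gene, the organism and so on could then be built from the selected records, one query per level and tag combination.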
I am using this approach to store most of the data in a relational 
database without going through Bioperl. What I ultimately want to do is 
use standard Bioperl modules. However, I don't think we have an object 
that can efficiently represent the structure (correct me if I am wrong). 
I think it may be a good idea to have a container object, possibly 
Bio::Gene that may contain multiple Bio::Seq objects (with or without 
real sequence). I believe we can borrow some structure and code from 
EnsEMBL gene representation (way to contain multiple transcripts, etc., 
not the database interactions certainly).
Please let me know what you think.
Stefan
From skirov at utk.edu  Thu Mar 10 09:20:49 2005
From: skirov at utk.edu (Stefan Kirov)
Date: Thu Mar 10 09:15:38 2005
Subject: [Bioperl-l] Request for advice and pointers on a project to help
	biologists d o simple formatting and analysis
In-Reply-To: <20050310094329.GB29547@ebi.ac.uk>
References: <339D68B133EAD311971E009027DC4797022FD7CB@montecarlo.cgr.harvard.edu>
	<20050310094329.GB29547@ebi.ac.uk>
Message-ID: <423057C1.1090700@utk.edu>

Allow me to disagree. My understanding is that this project is more about 
making biologists "computationally hungry" than about creating effective 
applications from a computational point of view. So I think it is more of 
an outreach project (did I get this right, Amir?).  As for Bioperl: the next 
logical thing for any biologist who is starting to use the computer as 
something more than a typewriter is to use something like Bioperl, 
because it is quite easy to understand and use (in many cases anyway).
Stefan

Andreas Kahari wrote:

>I'm not quite sure what this has to do with bioperl...
>
>On Wed, Mar 09, 2005 at 01:46:17PM -0500, Amir Karger wrote:
>  
>
>>In a private mail, Richard Copley wrote:
>>    
>>
>
>Forwarding private emails to mailing lists are we?
>
>  
>
>>>Amir Karger wrote:
>>>      
>>>
>>>>I was thinking it would be useful to have a toolkit of outrageously
>>>>        
>>>>
>>simple
>>    
>>
>>>>Perl one-liners.  Here's one:
>>>>        
>>>>
>
>http://www.oreilly.com/catalog/cookbook/
>
>  
>
>>>>    # Merge two lists, removing duplicates (logical OR)
>>>>    perl -ne '$seen{$_}++; END {print keys %seen}' file1 file2 > outfile
>>>>        
>>>>
>>>sort -u file1 file2
>>>      
>>>
>>I know that many of the tasks proposed for the Scriptome can be done with
>>grep, sed, cut, Word, or Excel. I'm planning on implementing head, sort,
>>join, and lots of others. But how many experimental biologists are familiar
>>with Unix cut? How many bother to learn even the least fancy Excel
>>functions?  I think not many, because they have other things to worry about.
>>    
>>
>
>Hmmm, comparing 'cut' and 'sed' with Word and Excel?  Oh well.
>
>The philosophy of Unix utilities is to do only one thing,
>but to do it very well.  In the case with the 'sort' utility
>for example, it will most likely use an out-of-core sorting
>algorithm to cope with files larger than the available memory
>of the machine, and will probably be a fair bit quicker and
>flexible than your own implementation.
>
>  
>
>>One reason so many people have created integrated toolboxes is so that
>>biologists only need to learn how to use one tool, rather than learning 30
>>or whatever Unix commands.  The goal of Scriptome is that they only need to
>>learn one tool AND that the learning curve for that tool is very small. And
>>we make the learning curve small by using an extremely lightweight interface
>>(most of solving a problem involves searching on a website) rather than by
>>trying to create an intuitive GUI.  After all, how many folks  other than
>>Apple have created GUIs that are intuitive for more than a small subset of
>>people?
>>    
>>
>
>The reason why so many people are creating integrating toolboxes
>(really, are they?) is probably just because so many people
>before them didn't do it right.  Mind you, doing it "right" is
>not possible.
>
>I do understand that there is a need for integrated utilities
>with easy-to-press buttons, and I won't try to put you off
>working on those kind of projects, but...
>
>What would an experimental biologists, who is not familiar with
>'sort', 'cut' or 'join', do with a Perl script that implemented
>those functionalities?  Wouldn't it be better to provide a
>high-level interface to common tasks, like parsing the output
>from various programs and providing simple ways of accessing
>and manipulating sequence features etc.  If you find ways to
>expand the application area of BioPerl, or if you rationalize
>and improve existing BioPerl code, then I'm sure the BioPerl
>maintainers would be happy to consider commiting your code to
>the project.
>
>
>
>Regards,
>Andreas
>
>  
>

-- 
Stefan Kirov, Ph.D.
University of Tennessee/Oak Ridge National Laboratory
5700 bldg, PO BOX 2008 MS6164
Oak Ridge TN 37831-6164
USA
tel +865 576 5120
fax +865-576-5332
e-mail: skirov@utk.edu
sao@ornl.gov

"And the wars go on with brainwashed pride
For the love of God and our human rights
And all these things are swept aside"

From amackey at pcbi.upenn.edu  Thu Mar 10 09:34:47 2005
From: amackey at pcbi.upenn.edu (Aaron J. Mackey)
Date: Thu Mar 10 09:29:59 2005
Subject: RES: [Bioperl-l] Mysql columns and Blast evalues
In-Reply-To: <8D44604203DAF9438BF9123B4A08C779575FC3@alpha.ioc.fiocruz.br>
References: <8D44604203DAF9438BF9123B4A08C779575FC3@alpha.ioc.fiocruz.br>
Message-ID: <f1cb0064b221a2e83c7fad234d40a105@pcbi.upenn.edu>

Many databases store mantissa and exponent separately, e.g. 4.5e-100 
gets stored as 4.5 (low-precision float) and -100 (signed "medium" 
integer)

That way you can continue to use native database filters:

SELECT *
FROM   hit
WHERE  hit.exponent <= -6
    OR  (hit.exponent = -5 AND hit.mantissa = 1)

This will identify all hits with E values less than or equal to 1e-5 
(if you don't care about those equal to exactly 1e-5, you can drop the 
OR clause).

This mechanism also allows you to format the mantissa for printing 
precision, without worrying about converting the entire thing to a 
less-precise double:

    my $evalue = sprintf("%0.1fe%d", $mantissa, $exponent);
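
For the loading side, a sketch of how an e-value could be split into the two columns, assuming a normalized mantissa (one digit before the decimal point) rounded to two decimals. split_evalue is a hypothetical helper, and the "e-100" handling assumes the e-value may arrive as a raw report string without a leading digit.

```perl
use strict;
use warnings;

# Split an e-value such as "4.5e-100" into (mantissa, exponent).
sub split_evalue {
    my ($evalue) = @_;

    # Assumption: some report strings omit the leading digit, e.g. "e-100".
    $evalue = "1$evalue" if $evalue =~ /^e/i;

    # Let sprintf normalize the number, then pick the two parts apart.
    my ($mantissa, $exponent) =
        sprintf("%.2e", $evalue) =~ /^(\d\.\d+)e([-+]\d+)$/;
    return ($mantissa + 0, $exponent + 0);
}

my ($mantissa, $exponent) = split_evalue("4.5e-100");
print sprintf("%0.1fe%d", $mantissa, $exponent), "\n";
```

The two return values map directly onto the mantissa and exponent columns used in the query above.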

-Aaron

On Mar 10, 2005, at 4:42 AM, davila wrote:

> I wonder to know what other (mysql) column types (any further details 
> would be appreciated) colleagues are using to store their Blast 
> evalues ?
>
--
Aaron J. Mackey, Ph.D.
Dept. of Biology, Goddard 212
University of Pennsylvania       email:  amackey@pcbi.upenn.edu
415 S. University Avenue         office: 215-898-1205
Philadelphia, PA  19104-6017     fax:    215-746-6697

From jason.stajich at duke.edu  Thu Mar 10 09:48:02 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Thu Mar 10 09:42:40 2005
Subject: [Bioperl-l] PAML nssites model result object
In-Reply-To: <244d2e0e0503091537d5f283d@mail.gmail.com>
References: <244d2e0e050309142370997ce4@mail.gmail.com>
	<896034a8342912841a4a0d0a0686353e@duke.edu>
	<f18fbe9b369789b605a28574c8628c1b@duke.edu>
	<244d2e0e0503091537d5f283d@mail.gmail.com>
Message-ID: <4ad236e4a716973b61ce63f1aa251a31@duke.edu>

The script needs to be adjusted for NSsites because there are trees  
associated with each model result, so you need one more loop over the  
get_NSSite_results. I added some code to the script to print out the  
positively selected sites as well.

#!/usr/bin/perl -w
use strict;
use Bio::Tools::Phylo::PAML;

my $outcodeml = shift(@ARGV);

my $paml_parser = new Bio::Tools::Phylo::PAML(-file => "./$outcodeml",
					      -dir => "./");
my $result = $paml_parser->next_result();
my $MLmatrix = $result->get_MLmatrix(); # get MaxLikelihood Matrix
my @otus = $result->get_seqs;
# process the NSsites results
for my $ns_result ( $result->get_NSSite_results ) {
     print "model ", $ns_result->model_num, " ",
           $ns_result->model_description, "\n";
     while ( my $tree = $ns_result->next_tree ) {
	for my $node ( $tree->get_nodes ) {
	    my $id;
	    if( $node->is_Leaf() ) {
		$id = $node->id;
	    } else {
		$id = "(".join(",", map { $_->id } grep { $_->is_Leaf }
			       $node->get_all_Descendents) .")";
	    }
	    if( ! $node->ancestor || ! $node->has_tag('t') ) {
		# skip when no values have been associated with this node
		# (like the root node)
		next;
	    }
	    printf "%s\tt=%.3f\tS=%.1f\tN=%.1f\tdN/dS=%.4f\tdN=%.4f\tdS=%.4f\tS*dS=%.1f\tN*dN=%.1f\n",
	    $id,
	    map { ($node->get_tag_values($_))[0] }
	    qw(t S N dN/dS dN dS), 'S*dS', 'N*dN';
	}
     }
     print "positively selected sites:\n";
     # get the positively selected sites
     for my $site ( $ns_result->get_pos_selected_sites ) {
	print join(" ", @$site), "\n";
     }
     print "\n";
}
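
A note on the ( ... )[0] list slice used in the printf arguments above. The sketch below uses a hypothetical stand-in sub, get_values, rather than the real get_tag_values method, and only illustrates plain Perl context behaviour when a sub returns an array:

```perl
use strict;
use warnings;

# Hypothetical stand-in for a method that returns a list of values.
sub get_values {
    my @values = (0.613722, 0.768322);
    return @values;
}

my $count        = get_values();        # scalar context: number of elements
my $first        = (get_values())[0];   # list slice: first element
my ($also_first) = get_values();        # list assignment: first element

print "count=$count first=$first also_first=$also_first\n";
```

Calling such a sub in scalar context gives the element count rather than the first value, which is why the slice (or a parenthesized assignment) is needed.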


--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

On Mar 9, 2005, at 6:37 PM, Edward Chuong wrote:

> Hi Jason,
>
> Thanks for the help.
>
> The code seems to get stuck at
>
>  if( ! $node->ancestor || ! $node->has_tag('t') ) {
> (this condition turns out true for every node, not just root, so it
> always hits "next")
>
> I used Data::Dumper to check on the node and I've pasted the
> results--it seems like those tags aren't being sent in?
>
>
> Thanks!
> -Ed
>
> '_root_cleanup_methods' => [
> 		  sub { "DUMMY" }
> 		],
> '_creation_id' => 0,
> '_branch_length' => '0.613722',
> '_desc' => {},
> '_id' => 'NP_033437.2_mus',
> '_ancestor' => bless( {
> 	 '_root_cleanup_methods' => [
> 					  $VAR1->{'_root_cleanup_methods'}[0]
> 					],
> 	 '_creation_id' => 3,
> 	 '_desc' => {
> 				  '2' => bless( {
> 					  '_root_cleanup_methods' => [
> 						   $VAR1->{'_root_cleanup_methods'}[0]
> 						 ],
> 					  '_creation_id' => 2,
> 					  '_branch_length' => '0.768322',
> 					  '_desc' => {},
> 					  '_id' => 'PM_BWp0001H02f',
> 					  '_ancestor' => $VAR1->{'_ancestor'},
> 					  '_root_verbose' => 0
> 					}, 'Bio::Tree::Node' ),
> 				  '0' => $VAR1,
> 				  '1' => bless( {
> 					  '_root_cleanup_methods' => [
> 												   $VAR1->{'_root_cleanup_methods'}[0]
> 												 ],
> 					  '_creation_id' => 1,
> 					  '_branch_length' => '0.366319',
> 					  '_desc' => {},
> 					  '_id' => 'NP_742070.1_rat',
> 					  '_ancestor' => $VAR1->{'_ancestor'},
> 					  '_root_verbose' => 0
> 					}, 'Bio::Tree::Node' )
> 				},
> 	 '_id' => '',
> 	 '_height' => undef,
> 	 '_root_verbose' => 0
>    }, 'Bio::Tree::Node' ),
> '_root_verbose' => 0
> }, 'Bio::Tree::Node' );
>
>
> On Wed, 9 Mar 2005 18:01:34 -0500, Jason Stajich  
> <jason.stajich@duke.edu> wrote:
>> Resend with code pasted....
>>
>> #!/usr/bin/perl -w
>> use strict;
>> use Bio::Tools::Phylo::PAML;
>>
>> my $outcodeml = shift(@ARGV);
>>
>> my $paml_parser = new Bio::Tools::Phylo::PAML(-file => "./$outcodeml",
>>                                               -dir => "./");
>> my $result = $paml_parser->next_result();
>> my $MLmatrix = $result->get_MLmatrix(); # get MaxLikelihood Matrix
>> my @otus = $result->get_seqs;
>> if( $#{$MLmatrix} < 0 ) {
>>      for my $tree ($result->next_tree ) {
>>         for my $node ( $tree->get_nodes ) {
>>             my $id;
>>             if( $node->is_Leaf() ) {
>>                 $id = $node->id;
>>             } else {
>>                 $id = "(".join(",", map { $_->id } grep { $_->is_Leaf }
>>                            $node->get_all_Descendents) .")";
>>             }
>>             if( ! $node->ancestor || ! $node->has_tag('t') ) {
>>                 # skip when no values have been associated with this node
>>                 # (like the root node)
>>                 next;
>>             }
>>              # I know this looks complicated
>>             # but we use the get_tag_values method to pull out the
>>             # annotations for each branch
>>             # The ()[0] around the call is because get_tag_values
>>             # returns a list
>>             # if we want to just get the 1st item in the list we have
>>             # to tell Perl we are treating it like an array.
>>             # in the future get_tag_values needs to be smart and just
>>             # return the 1st item in the array if called in scalar
>>             # context
>>
>>             printf "%s\tt=%.3f\tS=%.1f\tN=%.1f\tdN/dS=%.4f\tdN=%.4f\tdS=%.4f\tS*dS=%.1f\tN*dN=%.1f\n",
>>             $id,
>>             map { ($node->get_tag_values($_))[0] }
>>             qw(t S N dN/dS dN dS), 'S*dS', 'N*dN';
>>         }
>>      }
>> } else {
>>      my $i =0;
>>      my @seqs = $result->get_seqs;
>>      for my $row ( @$MLmatrix ) {
>>         print $seqs[$i++]->display_id, join("\t",@$row), "\n";
>>      }
>> }
>>
>> On Mar 9, 2005, at 5:41 PM, Jason Stajich wrote:
>>
>>> I just updated things last week so this is brand-spanking-new.  I
>>> don't know if I connected everything up for NSsites stuff quite yet
>>> as that is handled in - the branch-specific parsing should work now.
>>> I don't know if the synopsis code is really up to snuff either.  When
>>> I get around to it I will try and see what still needs to be  
>>> connected
>>> in NSsites parsing.
>>>
>>> I don't think $node->param() is going to work -
>>> $node->get_tag_values() is the way I've implemented it.
>>>
>>> <00parse_codeml.pl>
>>>
>>> -jason
>>> --
>>> Jason Stajich
>>> jason.stajich at duke.edu
>>> http://www.duke.edu/~jes12/
>>>
>>> On Mar 9, 2005, at 5:23 PM, Edward Chuong wrote:
>>>
>>>> Hi all,
>>>>
>>>> I'm trying to parse PAML results, and running into some trouble. I'm
>>>> using branch specific omega model, and I want to get the branch
>>>> specific ka/ks values out.
>>>> http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/
>>>> Tools/Phylo/PAML.pm?rev=HEAD&cvsroot=bioperl&content-type=text/
>>>> vnd.viewcvs-markup
>>>> says that $node->param('omega') should work, but Data::Dumper shows
>>>> that this value isn't stored in the node (only branch lengths and  
>>>> seq
>>>> IDs appear to be stored).
>>>>
>>>> I'm assuming that I can get these values out of the
>>>> get_NSSite_result() Bio::Tools::Phylo::PAML::ModelResult object, but
>>>> I'm not sure how to call it. The current synopsis uses
>>>> "get_model_params" but it seems to be out of date because it's not  
>>>> in
>>>> the current souce. The docs at
>>>> http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/
>>>> Tools/Phylo/PAML/Result.pm?rev=HEAD&cvsroot=bioperl&content-
>>>> type=text/vnd.viewcvs-markup
>>>> say to use my
>>>> @results = @{$self->get_NSSite_results};
>>>> --that looks like a mistake, and I've tried
>>>> @result = $result->get_NSSite_results but that doesn't work either
>>>> (just get undefined objs).
>>>>
>>>> Am I doing something wrong, or is this functionality still being
>>>> worked on? I've tried using both 1.4 and the LIVE versions. Any help
>>>> is appreciated, thanks!
>>>>
>>>> -Ed
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l@portal.open-bio.org
>>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>>>>
>>
>>
>
>
> -- 
> Edward Chuong
> (949) 939-2732
> AIM: edawad85
>

From akarger at CGR.Harvard.edu  Thu Mar 10 10:08:27 2005
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Thu Mar 10 10:02:08 2005
Subject: [Bioperl-l] Request for advice and pointers on a project to
	help biologists do simple formatting and analysis
Message-ID: <339D68B133EAD311971E009027DC4797022FD7DA@montecarlo.cgr.harvard.edu>

[snipped throughout for "brevity"]

> From: Andreas Kahari [mailto:ak@ebi.ac.uk] 
> 
> I'm not quite sure what this has to do with bioperl...

1. From http://www.bioperl.org: "The Bioperl server provides an online
resource for modules, scripts, and web links for developers of Perl-based
software for life science research." I assumed bioperl-l was for discussions
of doing Bio with Perl.  

2. I asked in my original mail: "Are there any other lists I should post
these questions to?" but no one has suggested any lists or newsgroups yet.

3. My original mail also said, "take advantage of existing tools' APIs: perl
-MBio::Perl -e '...'"  

> On Wed, Mar 09, 2005 at 01:46:17PM -0500, Amir Karger wrote:
> 
> > >Amir Karger wrote:
> > >> I was thinking it would be useful to have a 
> > >> toolkit of outrageously simple
> > >> Perl one-liners.  Here's one:
> 
> http://www.oreilly.com/catalog/cookbook/

How many biologists who don't use Perl will read the Perl cookbook? Or were
you just making a suggestion of where I could take scripts from?

Actually, looking through the table of contents, I see only a few recipes
that would fit.  In any case, writing the scripts is not the hard part; it's
knowing which scripts will be useful and helping biologists find the right
ones to solve their particular problems.

> > I know that many of the tasks proposed for the Scriptome 
> > can be done with
> > grep, sed, cut, Word, or Excel.  But how many experimental 
> > biologists are familiar
> > with Unix cut? I think not many, because they have other 
> things to worry about.
> 
> Hmmm, comparing 'cut' and 'sed' with Word and Excel?  Oh well.

I'm not comparing the quality of sed vs. Find/Replace. Most biologists (at
least here) prefer Windows. They already use Excel to look at their data.
Excel has functions to do simple data analysis, but my impression is that
few biologists use those functions.

> The philosophy of Unix utilities is to do only one thing,
> but to do it very well.  In the case with the 'sort' utility
> for example, it will most likely use an out-of-core sorting
> algorithm to cope with files larger than the available memory
> of the machine, and will probably be a fair bit quicker and
> flexible than your own implementation.

The Scriptome is not aiming at sorting gigabyte files; does a biologist want
to sort an entire Genbank file? I think much more often they'll want to sort
< 10 MB lists of genes or whatever.  On small files, the sorting algorithm
doesn't matter. If they do try to sort too big a file, the script will
break, and they'll need to try a different tool. I'm not claiming that my
solution will solve every conceivable task, just the easy ones. 

> I do understand that there is a need for integrated utilities
> with easy-to-press buttons, and I won't try to put you off
> working on those kind of projects, but...
> 
> What would an experimental biologists, who is not familiar with
> 'sort', 'cut' or 'join', do with a Perl script that implemented
> those functionalities?

sort, cut, or join files! I don't think I understand your question.
An experimental biologist who knows just a little Unix can take a sorting
script, paste it to the command line, and use it.  We're talking about use
cases where the biologist knows exactly what they want to do - sort a file,
merge files together, pull out the 8th column from the data into a new file,
etc. - but not how to implement a solution.
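
As an illustration of how small such a tool can be, here is a minimal sketch of the "pull out the 8th column" case. It assumes tab-delimited input, and extract_column is a hypothetical name, not an existing Scriptome or Bioperl function:

```perl
use strict;
use warnings;

# Hypothetical helper: return the given (1-based) column of each
# tab-delimited line; lines that are too short yield an empty string.
sub extract_column {
    my ($col, @lines) = @_;
    my @out;
    for my $line (@lines) {
        chomp(my $copy = $line);
        my @fields = split /\t/, $copy, -1;   # -1 keeps trailing empty fields
        push @out, defined $fields[$col - 1] ? $fields[$col - 1] : '';
    }
    return @out;
}

# Roughly equivalent one-liner (file names are placeholders):
#   perl -F'\t' -lane 'print $F[7]' data.txt > column8.txt

print join("\n", extract_column(2, "geneA\t1.5\n", "geneB\t0.2\n")), "\n";
```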

Who knows? Maybe eventually we'll decide to put "sort -u file1 file2" as a
"script". But we wouldn't want to use *only* Unix commands because that
ignores all the stuff Unix can't (easily) do.  

>  Wouldn't it be better to provide a
> high-level interface to common tasks, like parsing the output
> from various programs and providing simple ways of accessing
> and manipulating sequence features etc.

That's exactly what I want to do. My interface is searching for a tool on a
website and pasting it onto the Unix command line.  

>  If you find ways to
> expand the application area of BioPerl, or if you rationalize
> and improve existing BioPerl code, then I'm sure the BioPerl
> maintainers would be happy to consider commiting your code to
> the project.

I believe my project is complementary to Bioperl's bioscripts, but it aims
at a different set of tasks, namely, tasks that are so simple that
Bioperlers haven't bothered to commit the scripts to CVS. If I want to count
how many microarray hits have names and how many just have CG numbers, I'll
do it in a Perl one-liner that takes 3 minutes to write and maybe 10 for
debugging and formatting. Why bother committing that to CVS? Well, an
experimental biologist in my group gave me that exact example, and told me
she spent 20 minutes counting and double-checking. If she had had 1000 hits
instead of 100, she would have needed hours to count.  More likely, she
would have just given up.
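
A minimal sketch of that counting task, assuming one identifier per hit and assuming that a "CG number" looks like CG followed by digits (both are guesses about the data, and count_hits is a hypothetical name):

```perl
use strict;
use warnings;

# Hypothetical helper: count identifiers that are bare CG numbers versus
# anything else (treated here as a gene name).
sub count_hits {
    my @ids = @_;
    my %count = (cg => 0, named => 0);
    for my $id (@ids) {
        next unless defined $id && length $id;   # skip blank entries
        if ($id =~ /^CG\d+$/) { $count{cg}++ } else { $count{named}++ }
    }
    return %count;
}

my %hits = count_hits(qw(CG1234 dpp CG9876 wg en));
print "named=$hits{named} CG=$hits{cg}\n";
```

The same logic fits on one command line, for example (hits.txt being a placeholder file with the identifier in the first column):
perl -lane '$F[0] =~ /^CG\d+$/ ? $cg++ : $named++; END { printf "named=%d CG=%d\n", $named, $cg }' hits.txt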

To put it another way, I'm aiming to make hard things possible -
specifically things that are hard for biologists who aren't programmers.
Bioperl, on the other hand, is focusing on things that are hard (or hard to
do right, or at least annoying) even for programmers.

I am making at least a couple of assumptions about the niche I'm aiming for:
people who know how to use the command line but don't know Perl.
1. There are many such people (or enough to care about)
2. They will be able to put the "atomic" scripts together to solve real
problems (first join two files with a script, sort with another script,
remove duplicates with a third)
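
To make the second assumption concrete, here is a sketch of three "atomic" steps chained together. It takes "join" to mean simple concatenation of two lists, which is an assumption, and the function names are hypothetical:

```perl
use strict;
use warnings;

# Each step is a tiny function; chaining them mirrors chaining the scripts.
sub merge_lists { return map { @$_ } @_ }                      # concatenate array refs
sub sorted      { return sort { $a cmp $b } @_ }               # plain string sort
sub unique      { my %seen; return grep { !$seen{$_}++ } @_ }  # keep first occurrence

my @list1  = qw(wg dpp en);
my @list2  = qw(hh dpp wg);
my @result = unique(sorted(merge_lists(\@list1, \@list2)));
print join("\n", @result), "\n";
```

For plain text files the whole chain collapses to the "sort -u file1 file2" mentioned earlier; the point of separate steps is that each one can be swapped for something Unix cannot easily do.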

I may be wrong about either of these.  It may be that even with the
Scriptome tools, you have to "think like a programmer" to do these sorts of
tasks, and that many biologists' brains just don't work that way. But I
think it's worth trying.

-Amir
From akarger at CGR.Harvard.edu  Thu Mar 10 10:14:01 2005
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Thu Mar 10 10:08:06 2005
Subject: [Bioperl-l] Request for advice and pointers on a project to
	help biologists do simple formatting and analysis
Message-ID: <339D68B133EAD311971E009027DC4797022FD7DB@montecarlo.cgr.harvard.edu>

Stefan Kirov wrote [responding to Andreas]:
> 
> Allow me disagree. My understanding is this project is more 
> about making biologist "computational hungry" rather than 
> creating effective applications from a computation point of view. 
> So I think it is more of 
> an outreach project (did I get this right Amir).  

Well, not quite. I absolutely want these tools to be useful, but I don't
expect them to solve all problems. On the other hand, there is definitely
the potential that the scripts will:

1. Provide very useful examples that biologists can "tweak". This is much
easier than writing programs from scratch, so it can provide a much less
threatening intro to programming.

2. Demonstrate to biologists that Perl is extremely useful.

> Bioperl. The next 
> logical thing for any biologist who is starting to use the 
> computer as 
> something more than a typewriter is to use something like Bioperl, 
> because it is quite easy to understand and use (in many cases anyway).

Absolutely.  Don't get me wrong: if we could convince everyone on Earth to
learn Perl, I'd be thrilled. And for every experimental biologist to learn
Bioperl. I'm a big fan.  It's just that a lot of people don't have the time
or will to learn programming.

-Amir
From mbasu at mail.nih.gov  Thu Mar 10 10:41:16 2005
From: mbasu at mail.nih.gov (Malay)
Date: Thu Mar 10 10:35:31 2005
Subject: [Bioperl-l] Request for advice and pointers on a project to
	help biologists do simple formatting and analysis
In-Reply-To: <339D68B133EAD311971E009027DC4797022FD7DA@montecarlo.cgr.harvard.edu>
References: <339D68B133EAD311971E009027DC4797022FD7DA@montecarlo.cgr.harvard.edu>
Message-ID: <42306A9C.5030908@mail.nih.gov>

Hello Amir:

Without going into any arguments, I'll put in my two cents. The 
mentality of helping out biologists is a fundamental mistake. Most of the 
biologists who come into this field already know the tricks of the 
game; if not, they hire someone who knows. But toolmakers in the field 
believe they have to help biologists, which is why there are too many 
non-specialized tools in the field.

Toolmakers should now concentrate on tools for specialists. That is 
where the main dearth is, and it requires a great effort to actually 
satisfy experts in the field. Create tools for the experts if you can.

-Malay


Amir Karger wrote:
> [snipped throughout for "brevity"]
> 
> 
>>From: Andreas Kahari [mailto:ak@ebi.ac.uk] 
>>
>>I'm not quite sure what this has to do with bioperl...
> 
> 
> 1. From http://www.bioperl.org: "The Bioperl server provides an online
> resource for modules, scripts, and web links for developers of Perl-based
> software for life science research." I assumed bioperl-l was for disucssions
> of doing Bio with Perl.  
> 
> 2. I asked in my original mail: "Are there any other lists I should post
> these questions to?" but no one has suggested any lists or newsgroups yet.
> 
> 3. My original mail also said, "take advantage of existing tools' APIs: perl
> -MBio::Perl -e '...'"  
> 
> 
>>On Wed, Mar 09, 2005 at 01:46:17PM -0500, Amir Karger wrote:
>>
>>
>>>>Amir Karger wrote:
>>>>
>>>>>I was thinking it would be useful to have a 
>>>>>toolkit of outrageously simple
>>>>>Perl one-liners.  Here's one:
>>
>>http://www.oreilly.com/catalog/cookbook/
> 
> 
> How many biologists who don't use Perl will read the Perl cookbook? Or were
> you just making a suggestion of where I could take scripts from?
> 
> Actually, looking through the table of contents, I see only a few recipes
> that would fit.  In any case, writing the scripts is not the hard part; it's
> knowing which scripts will be useful and helping biologists find the right
> ones to solve their particular problems.
> 
> 
>>>I know that many of the tasks proposed for the Scriptome 
>>>can be done with
>>>grep, sed, cut, Word, or Excel.  But how many experimental 
>>>biologists are familiar
>>>with Unix cut? I think not many, because they have other 
>>
>>things to worry about.
>>
>>Hmmm, comparing 'cut' and 'sed' with Word and Excel?  Oh well.
> 
> 
> I'm not comparing the quality of sed vs. Find/Replace. Most biologists (at
> least here) prefer Windows. They already use Excel to look at their data.
> Excel has functions to do simple data analysis, but my impression is that
> few biologists use those functions.
> 
> 
>>The philosophy of Unix utilities is to do only one thing,
>>but to do it very well.  In the case with the 'sort' utility
>>for example, it will most likely use an out-of-core sorting
>>algorithm to cope with files larger than the available memory
>>of the machine, and will probably be a fair bit quicker and
>>flexible than your own implementation.
> 
> 
> The Scriptome is not aiming at sorting gigabyte files; does a biologist want
> to sort an entire Genbank file? I think much more often they'll want to sort
> < 10 MB lists of genes or whatever.  On small files, the sorting algorithm
> doesn't matter. If they do try to sort too big a file, the script will
> break, and they'll need to try a different tool. I'm not claiming that my
> solution will solve every conceivable task, just the easy ones. 
> 
> 
>>I do understand that there is a need for integrated utilities
>>with easy-to-press buttons, and I won't try to put you off
>>working on those kind of projects, but...
>>
>>What would an experimental biologists, who is not familiar with
>>'sort', 'cut' or 'join', do with a Perl script that implemented
>>those functionalities?
> 
> 
> sort, cut, or join files! I don't think I understand your question.
> An experimental biologist who knows just a little Unix can take a sorting
> script, paste it to the command line, and use it.  We're talking about use
> cases where the biologist knows exactly what they want to do - sort a file,
> merge files together, pull out the 8th column from the data into a new file,
> etc. - but not how to implement a solution.
> 
> Who knows? Maybe eventually we'll decide to put "sort -u file1 file2" as a
> "script". But we wouldn't want to use *only* Unix commands because that
> ignores all the stuff Unix can't (easily) do.  
> 
> 
>> Wouldn't it be better to provide a
>>high-level interface to common tasks, like parsing the output
>>from various programs and providing simple ways of accessing
>>and manipulating sequence features etc.
> 
> 
> That's exactly what I want to do. My interface is searching for a tool on a
> website and pasting it onto the Unix command line.  
> 
> 
>> If you find ways to
>>expand the application area of BioPerl, or if you rationalize
>>and improve existing BioPerl code, then I'm sure the BioPerl
>>maintainers would be happy to consider commiting your code to
>>the project.
> 
> 
> I believe my project is complementary to Bioperl's bioscripts, but it aims
> at a different set of tasks, namely, tasks that are so simple that
> Bioperlers haven't bothered to commit the scripts to CVS. If I want to count
> how many microarray hits have names and how many just have CG numbers, I'll
> do it in a Perl one-liner that takes 3 minutes to write and maybe 10 for
> debugging and formatting. Why bother committing that to CVS? Well, an
> experimental biologist in my group gave me that exact example, and told me
> she spent 20 minutes counting and double-checking. If she had had 1000 hits
> instead of 100, she would have needed hours to count.  More likely, she
> would have just given up.
> 
> To put it another way, I'm aiming to make hard things possible -
> specifically things that are hard for biologists who aren't programmers.
> Bioperl, on the other hand, is focusing on things that are hard (or hard to
> do right, or at least annoying) even for programmers.
> 
> I am making at least a couple assumptions about the niche I'm aiming for:
> people who know how to use the command line but don't know Perl.
> 1. There are many such people (or enough to care about)
> 2. They will be able to put the "atomic" scripts together to solve real
> problems (first join two files with a script, sort with another script,
> remove duplicates with a third)
> 
> I may be wrong about either of these.  It may be that even with the
> Scriptome tools, you have to "think like a programmer" to do these sorts of
> tasks, and that many biologists' brains just don't work that way. But I
> think it's worth trying.
> 
> -Amir

From akarger at CGR.Harvard.edu  Thu Mar 10 10:58:43 2005
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Thu Mar 10 10:51:58 2005
Subject: [Bioperl-l] Request for advice and pointers on a project to
	help biologists do simple formatting and analysis
Message-ID: <339D68B133EAD311971E009027DC4797022FD7E1@montecarlo.cgr.harvard.edu>

Malay wrote:
> Most of the 
> biologists who come into this field already knows the tricks of the 
> game, 

Really? I haven't been in the field for very long, but I've already met
(smart, talented) people who are counting lines in Excel by hand, or merging
files and eliminating duplicates by hand. The worst part is, I suspect there
are things they give up doing just because they think it will be too much
work.

(By the way, I'm talking about *experimental* biologists who need to do work
on computers, not about computational biologists.)

> if not they hire someone who knows. 

Yes, that's me. But I would much prefer to collaborate with someone on an
interesting research project that pushes the boundaries of using
computational biology to help experimental biology rather than write a Perl
one-liner to merge two gene lists for the 800th time. We have a lot of
clients, and if we just give them the solution, then we'll need to do it
again the next time they have a marginally different problem.  (In some
ways, you can think of the Scriptome as just a FAQ.)

-Amir 
From hancy at gene.ucl.ac.be  Thu Mar 10 11:58:19 2005
From: hancy at gene.ucl.ac.be (hancy)
Date: Thu Mar 10 11:53:21 2005
Subject: RES: [Bioperl-l] Mysql columns and Blast evalues
Message-ID: <42307CAB.6080909@gene.ucl.ac.be>

I'm using DOUBLE UNSIGNED to store blast evalues in MySQL, which allows 
you to store 0 and positive values from ~2.2e-308 to ~1.8e308.
Anything with an evalue between 0 and ~2.2e-308 will be set to 0 
automatically (it shouldn't be a great loss of information anyway).

Hope this can help,
fred.
From chad at dieselwurks.com  Thu Mar 10 12:21:01 2005
From: chad at dieselwurks.com (Chad Matsalla)
Date: Thu Mar 10 12:34:17 2005
Subject: [Bioperl-l] Aggressive aggregation?
In-Reply-To: <3503c6582ad58219fe9c590fe09a0f46@pcbi.upenn.edu>
References: <Pine.LNX.4.50.0503082112350.14042-100000@sausage.usask.ca>
	<3503c6582ad58219fe9c590fe09a0f46@pcbi.upenn.edu>
Message-ID: <Pine.LNX.4.50.0503101105050.19841-100000@sausage.usask.ca>


On Wed, 9 Mar 2005, Aaron J. Mackey wrote:
> > chr1 aafcest     HSP   200   275   .     -     .     Target
> > "Sequence:chad1" 200 275
> > chr1 aafcest     HSP   300   450   .     -     .     Target
> > "Sequence:chad1" 300 450
> > chr1 aafcest     match 200   450   .     -     .     Target
> > "Sequence:chad1" 200 450
>
>
> These need to be Target "Sequence:chad1-1" and "Sequence:chad1-2" or
> some such.  This also means that if you're saving the ESTs in the
> database (for sequence alignment display), you'll have to save them
> redundantly under chad1-1, chad1-2, etc.

This is horrible. I want to fix this.

> Now, you could write a custom aggregator that de-aggregated multiple
> chad1 "match" features, assigning the contained HSPs to each, but there
> is no such "default" behavior.  Let me know if there's general interest
> for this ...

I think there is, and I volunteer to write it. I'm new to the Bio::DB
subsystem but I'm eager to dive in. Can you help me by providing a
general flowchart on what you'd do to create this? What should the
Aggregator be called? Hmm. Bio::DB::GFF::Aggregator::manymatch ?
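
As a starting point for discussion, here is a rough sketch of the grouping logic only, written in plain Perl with hypothetical hash records rather than the real Bio::DB::GFF feature or aggregator classes. It assumes that each HSP belongs to the first "match" on the same target whose span contains it:

```perl
use strict;
use warnings;

# Hypothetical records: hash refs with type, target, start and end keys.
# Returns copies of the match records, each with a 'parts' list of its HSPs.
sub group_hsps_by_match {
    my @features = @_;
    my @matches  = map  { +{ %$_, parts => [] } }
                   grep { $_->{type} eq 'match' } @features;
    for my $hsp (grep { $_->{type} eq 'HSP' } @features) {
        for my $match (@matches) {
            next unless $match->{target} eq $hsp->{target}
                     && $hsp->{start} >= $match->{start}
                     && $hsp->{end}   <= $match->{end};
            push @{ $match->{parts} }, $hsp;
            last;    # an HSP is assigned to one match only
        }
    }
    return @matches;
}
```

Overlapping matches on the same target would need a better rule than "first containing match"; this sketch does not attempt that.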

Chad Matsalla

From palmeida at igc.gulbenkian.pt  Thu Mar 10 12:45:47 2005
From: palmeida at igc.gulbenkian.pt (Paulo Almeida)
Date: Thu Mar 10 12:40:21 2005
Subject: [Bioperl-l] Request for advice and pointers on a project to
	help biologists do simple formatting and analysis
In-Reply-To: <339D68B133EAD311971E009027DC4797022FD7E1@montecarlo.cgr.harvard.edu>
References: <339D68B133EAD311971E009027DC4797022FD7E1@montecarlo.cgr.harvard.edu>
Message-ID: <28072.192.168.50.3.1110476747.squirrel@webmail.igc.gulbenkian.pt>

> I've already met (smart, talented) people who are counting lines in Excel

You mean there are other ways of doing that?! Good God! All these years
that I've been counting 1000s of lines by hand...

>> if not they hire someone who knows.
>
> We have a lot of clients, and if we just give them the solution, then
> we'll need to do it again the next time they have a marginally different
> problem.

Profit!

Seriously though, I work in a lab where everyone else is an
experimental biologist and I think there may be demand for a project
like yours. The only doubt I have is whether those people who count by
hand, or give up, will make the effort to find the Scriptome project. I
suppose that is a marketing issue, and the debate on demand is mostly
based on different personal experiences. If you think it's worth it (if
only to help the people you work with), and go ahead with it, I can help
you with some simple scripts I have (to split files, get a certain column
in a csv file, etc).

Paulo

From letondal at pasteur.fr  Thu Mar 10 12:54:13 2005
From: letondal at pasteur.fr (Catherine Letondal)
Date: Thu Mar 10 12:44:39 2005
Subject: [Bioperl-l] Request for advice and pointers on a project to help
	biologists do simple formatting and analysis
In-Reply-To: <339D68B133EAD311971E009027DC4797022FD7DA@montecarlo.cgr.harvard.edu>
References: <339D68B133EAD311971E009027DC4797022FD7DA@montecarlo.cgr.harvard.edu>
Message-ID: <c4f3367bdb25d2bc434ca2970a5ccac9@pasteur.fr>

Hi,

On Mar 10, 2005, at 4:08 PM, Amir Karger wrote:

> 2. I asked in my original mail: "Are there any other lists I should 
> post
> these questions to?" but no one has suggested any lists or newsgroups 
> yet

You can discuss your excellent idea here: edu-sig@python.org
(it's not really a python discussion list, they discuss general 
end-user issues)

--
Catherine Letondal -- Institut Pasteur

From cjfields at uiuc.edu  Thu Mar 10 14:07:21 2005
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu Mar 10 14:02:10 2005
Subject: [Bioperl-l] Re: Request for advice and pointers on a project to
	help biologists do simple formatting and analysis
In-Reply-To: <42306A9C.5030908@mail.nih.gov>
References: <339D68B133EAD311971E009027DC4797022FD7DA@montecarlo.cgr.harvard.edu>
	<42306A9C.5030908@mail.nih.gov>
Message-ID: <6.1.1.1.2.20050310100611.01b44938@express.cites.uiuc.edu>

I completely disagree.  I am a biologist first and a programmer second 
(shock!!!), though I believe that many full-time bioinformaticians would 
agree with my view.  I could spend an enormous amount of time trying to 
accomplish a routine repetitive task in Perl, Java, or whatever your 
language of choice is.  However, the Bioperl (and OpenBio) community has 
made my job much easier.  ANY contribution, whether it is a module, 
package, or a script, is helpful, as long as someone can use 
it.  Furthermore, to make the snap judgement that every biologist entering 
the field already comes equipped with the tools is a bit short-sighted and 
naive.  I do agree that there are many "non-specific" tools (i.e. multiple 
methods for phylogenetic analysis, multiple alignment, etc), but I think 
that any person worth their salt would find that to be a benefit and not a 
problem.  I personally like having multiple methods available.

I could also make the argument that the "experts" in the field, if they 
live up to that title, can actually design the tools for their specific 
(specialist) needs.  Who knows their specific needs better, anyway?  In 
other words, why hire a carpenter to do the plumbing?  It doesn't make much 
sense to have somebody with little to no knowledge of RNA structure, for 
example, design an algorithm ad hoc for another RNA structure expert.

Anyway, I think we're getting a bit off topic here...

My two cents,

Chris

At 09:41 AM 3/10/2005, Malay wrote:
>Hello Amir:
>
>Without going into any arguments, I'll put my two cents into it. The 
>mentality to help out biologists is a fundamental mistake. Most of the 
>biologists who come into this field already knows the tricks of the game, 
>if not they hire someone who knows. But toolmakers in the fields believe 
>they have to help biologists, that's why there are too many 
>non-specialized tools in the field.
>
>Toolmakers should now concentrate on tools for specialists. There are 
>where the main dearth is and it requires a great effort to actually 
>satisfy experts in the field. Create tools for the experts if you can.
>
>-Malay

__________________________________

Chris Fields - Postdoctoral Researcher
Lab of Dr. Robert Switzer

Address:

University of Illinois at Urbana-Champaign
Dept. of Biochemistry - 323 RAL
600 S. Mathews Ave.
Urbana, IL 61801

Phone : (217) 333-7098
Fax : (217) 244-5858 

From jswanson at iastate.edu  Thu Mar 10 14:03:39 2005
From: jswanson at iastate.edu (Jordan Swanson)
Date: Thu Mar 10 14:04:12 2005
Subject: [Bioperl-l] Proposal for bio-perl updates: ACE assembly file
In-Reply-To: <20050310134814.GE27364@iib.unsam.edu.ar>
References: <200502141205.52256.jswanson@iastate.edu>
	<4f12c65ac919697fd8a7e9220db182fd@tigem.it>
	<20050310134814.GE27364@iib.unsam.edu.ar>
Message-ID: <200503101303.40339.jswanson@iastate.edu>

On Thursday 10 March 2005 07:48 am, Fernan Aguero wrote:
> +----[ Elia Stupka <elia@tigem.it> (01.Mar.2005 10:17):
> | Hi Jordan,
> |
> | I have been doing some work on Contig::Assembly myself recently, and
> | have also been in touch with the author (Robson) about it. Perhaps the
> | best thing would be for the three of us to have a chat about this
> | object, try to revamp it a little with our improvements, and then
> | Robson or I can check it in?
> |
> | regards,
> |
> | Elia
>
> +----]
>
> Hi!
>
> We have just got a need to produce .ace files and noticed
> that this functionality was lacking.
>
> I also saw the recent thread about this topic on the list.
> Question: has this moved forward since the last message was
> posted (March 1st)?
>
> If so, are the proposed changes in a form that can be
> applied and tested by others (a recursive diff, perhaps
> against a recent CVS checkout or against the 1.5 release)?

I sent a copy of the code I have written to Elia and Robson.  Robson mentioned 
that he was extremely busy this month, and that he would be willing to 
discuss it at a later time.  Later today, I can send you a zipped up file (my 
diff-skills are non-existent, so I can't do it that way without poring 
through the manual) of the code we have been using, which is working well for 
the features that our lab uses.  Of course, I would appreciate any feedback 
that you could offer, as well.

---  
Jordan M Swanson   
Department of Ecology, Evolution, and Organismal Biology 
431 Bessey Hall 
Iowa State University 
Ames, IA 50011 
Lab 515 294-7098 
FAX: 515-294-1337 
From echuong at gmail.com  Thu Mar 10 15:47:17 2005
From: echuong at gmail.com (Edward Chuong)
Date: Thu Mar 10 15:42:05 2005
Subject: [Bioperl-l] PAML nssites model result object
In-Reply-To: <4ad236e4a716973b61ce63f1aa251a31@duke.edu>
References: <244d2e0e050309142370997ce4@mail.gmail.com>
	<896034a8342912841a4a0d0a0686353e@duke.edu>
	<f18fbe9b369789b605a28574c8628c1b@duke.edu>
	<244d2e0e0503091537d5f283d@mail.gmail.com>
	<4ad236e4a716973b61ce63f1aa251a31@duke.edu>
Message-ID: <244d2e0e050310124735d62b56@mail.gmail.com>

Hi,

I think the problem is that the $result object, according to Dumper,
doesn't store any ModelResult (NSSite_results), so the for loop
condition in this code ($result->get_NSSite_results) is never true. Is
this working on a mlc file that you have, and if so, can you send it
so I can see if it's a problem on my side?

Thanks
-Ed


On Thu, 10 Mar 2005 09:48:02 -0500, Jason Stajich
<jason.stajich@duke.edu> wrote:
> The script needs to be adjusted for NSsites because there are trees
> associated with each model result, so you need one more loop on the
> get_NSSite_results. I added some code to the script to print out the
> positively selected sites as well.
> 
> #!/usr/bin/perl -w
> use strict;
> use Bio::Tools::Phylo::PAML;
> 
> my $outcodeml = shift(@ARGV);
> 
> my $paml_parser = new Bio::Tools::Phylo::PAML(-file => "./$outcodeml",
>                                               -dir => "./");
> my $result = $paml_parser->next_result();
> my $MLmatrix = $result->get_MLmatrix(); # get MaxLikelihood Matrix
> my @otus = $result->get_seqs;
> # process the NSsites results
> for my $ns_result ( $result->get_NSSite_results ) {
>      print "model ", $ns_result->model_num, " ",
>            $ns_result->model_description, "\n";
>      while ( my $tree = $ns_result->next_tree ) {
>         for my $node ( $tree->get_nodes ) {
>             my $id;
>             if( $node->is_Leaf() ) {
>                 $id = $node->id;
>             } else {
>                 $id = "(".join(",", map { $_->id } grep { $_->is_Leaf }
>                                $node->get_all_Descendents) .")";
>             }
>             if( ! $node->ancestor || ! $node->has_tag('t') ) {
>                 # skip when no values have been associated with this node
>                 # (like the root node)
>                 next;
>             }
>             printf "%s\tt=%.3f\tS=%.1f\tN=%.1f\tdN/dS=%.4f\tdN=%.4f\tdS=%.4f\tS*dS=%.1f\tN*dN=%.1f\n",
>                 $id,
>                 map { ($node->get_tag_values($_))[0] }
>                 qw(t S N dN/dS dN dS), 'S*dS', 'N*dN';
>         }
>      }
>      print "positively selected sites:\n";
>      # get the positively selected sites
>      for my $site ( $ns_result->get_pos_selected_sites ) {
>         print join(" ", @$site), "\n";
>      }
>      print "\n";
> }
> 
> --
> Jason Stajich
> jason.stajich at duke.edu
> http://www.duke.edu/~jes12/
> 
> On Mar 9, 2005, at 6:37 PM, Edward Chuong wrote:
> 
> > Hi Jason,
> >
> > Thanks for the help.
> >
> > The code seems to get stuck at
> >
> >  if( ! $node->ancestor || ! $node->has_tag('t') ) {
> > (this condition turns out true for every node, not just root, so it
> > always hits "next")
> >
> > I used Data::Dumper to check on the node and I've pasted the
> > results--it seems like those tags aren't being sent in?
> >
> >
> > Thanks!
> > -Ed
> >
> > '_root_cleanup_methods' => [
> >                 sub { "DUMMY" }
> >               ],
> > '_creation_id' => 0,
> > '_branch_length' => '0.613722',
> > '_desc' => {},
> > '_id' => 'NP_033437.2_mus',
> > '_ancestor' => bless( {
> >        '_root_cleanup_methods' => [
> >                                         $VAR1->{'_root_cleanup_methods'}[0]
> >                                       ],
> >        '_creation_id' => 3,
> >        '_desc' => {
> >                                 '2' => bless( {
> >                                         '_root_cleanup_methods' => [
> >                                                  $VAR1->{'_root_cleanup_methods'}[0]
> >                                                ],
> >                                         '_creation_id' => 2,
> >                                         '_branch_length' => '0.768322',
> >                                         '_desc' => {},
> >                                         '_id' => 'PM_BWp0001H02f',
> >                                         '_ancestor' => $VAR1->{'_ancestor'},
> >                                         '_root_verbose' => 0
> >                                       }, 'Bio::Tree::Node' ),
> >                                 '0' => $VAR1,
> >                                 '1' => bless( {
> >                                         '_root_cleanup_methods' => [
> >                                                                                                  $VAR1->{'_root_cleanup_methods'}[0]
> >                                                                                                ],
> >                                         '_creation_id' => 1,
> >                                         '_branch_length' => '0.366319',
> >                                         '_desc' => {},
> >                                         '_id' => 'NP_742070.1_rat',
> >                                         '_ancestor' => $VAR1->{'_ancestor'},
> >                                         '_root_verbose' => 0
> >                                       }, 'Bio::Tree::Node' )
> >                               },
> >        '_id' => '',
> >        '_height' => undef,
> >        '_root_verbose' => 0
> >    }, 'Bio::Tree::Node' ),
> > '_root_verbose' => 0
> > }, 'Bio::Tree::Node' );
> >
> >
> > On Wed, 9 Mar 2005 18:01:34 -0500, Jason Stajich
> > <jason.stajich@duke.edu> wrote:
> >> Resend with code pasted....
> >>
> >> #!/usr/bin/perl -w
> >> use strict;
> >> use Bio::Tools::Phylo::PAML;
> >>
> >> my $outcodeml = shift(@ARGV);
> >>
> >> my $paml_parser = new Bio::Tools::Phylo::PAML(-file => "./$outcodeml",
> >>                                               -dir => "./");
> >> my $result = $paml_parser->next_result();
> >> my $MLmatrix = $result->get_MLmatrix(); # get MaxLikelihood Matrix
> >> my @otus = $result->get_seqs;
> >> if( $#{$MLmatrix} < 0 ) {
> >>      for my $tree ($result->next_tree ) {
> >>         for my $node ( $tree->get_nodes ) {
> >>             my $id;
> >>             if( $node->is_Leaf() ) {
> >>                 $id = $node->id;
> >>             } else {
> >>                 $id = "(".join(",", map { $_->id } grep { $_->is_Leaf }
> >>                            $node->get_all_Descendents) .")";
> >>             }
> >>             if( ! $node->ancestor || ! $node->has_tag('t') ) {
> >>                 # skip when no values have been associated with this node
> >>                 # (like the root node)
> >>                 next;
> >>             }
> >>             # I know this looks complicated
> >>             # but we use the get_tag_values method to pull out the
> >>             # annotations for each branch
> >>             # The ()[0] around the call is because get_tag_values
> >>             # returns a list
> >>             # if we want to just get the 1st item in the list we have
> >>             # to tell Perl we are treating it like an array.
> >>             # in the future get_tag_values needs to be smart and just
> >>             # return the 1st item in the array if called in scalar
> >>             # context
> >>
> >>             printf "%s\tt=%.3f\tS=%.1f\tN=%.1f\tdN/dS=%.4f\tdN=%.4f\tdS=%.4f\tS*dS=%.1f\tN*dN=%.1f\n",
> >>                 $id,
> >>                 map { ($node->get_tag_values($_))[0] }
> >>                 qw(t S N dN/dS dN dS), 'S*dS', 'N*dN';
> >>         }
> >>      }
> >> } else {
> >>      my $i =0;
> >>      my @seqs = $result->get_seqs;
> >>      for my $row ( @$MLmatrix ) {
> >>         print $seqs[$i++]->display_id, join("\t",@$row), "\n";
> >>      }
> >> }
> >>
> >> On Mar 9, 2005, at 5:41 PM, Jason Stajich wrote:
> >>
> >>> I just updated things last week so this is brand-spanking-new.  I
> >>> don't know if I connected everything up for NSsites stuff quite yet
> >>> as that is handled in - the branch-specific parsing should work now.
> >>> I don't know if the synopsis code is really up to snuff either.  When
> >>> I get around to it I will try and see what still needs to be
> >>> connected
> >>> in NSsites parsing.
> >>>
> >>> I don't think $node->param() is going to work -
> >>> $node->get_tag_values() is the way I've implemented it.
> >>>
> >>> <00parse_codeml.pl>
> >>>
> >>> -jason
> >>> --
> >>> Jason Stajich
> >>> jason.stajich at duke.edu
> >>> http://www.duke.edu/~jes12/
> >>>
> >>> On Mar 9, 2005, at 5:23 PM, Edward Chuong wrote:
> >>>
> >>>> Hi all,
> >>>>
> >>>> I'm trying to parse PAML results, and running into some trouble. I'm
> >>>> using branch specific omega model, and I want to get the branch
> >>>> specific ka/ks values out.
> >>>> http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/
> >>>> Tools/Phylo/PAML.pm?rev=HEAD&cvsroot=bioperl&content-type=text/
> >>>> vnd.viewcvs-markup
> >>>> says that $node->param('omega') should work, but Data::Dumper shows
> >>>> that this value isn't stored in the node (only branch lengths and
> >>>> seq
> >>>> IDs appear to be stored).
> >>>>
> >>>> I'm assuming that I can get these values out of the
> >>>> get_NSSite_result() Bio::Tools::Phylo::PAML::ModelResult object, but
> >>>> I'm not sure how to call it. The current synopsis uses
> >>>> "get_model_params" but it seems to be out of date because it's not
> >>>> in
> >>>> the current source. The docs at
> >>>> http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/
> >>>> Tools/Phylo/PAML/Result.pm?rev=HEAD&cvsroot=bioperl&content-
> >>>> type=text/vnd.viewcvs-markup
> >>>> say to use my
> >>>> @results = @{$self->get_NSSite_results};
> >>>> --that looks like a mistake, and I've tried
> >>>> @result = $result->get_NSSite_results but that doesn't work either
> >>>> (just get undefined objs).
> >>>>
> >>>> Am I doing something wrong, or is this functionality still being
> >>>> worked on? I've tried using both 1.4 and the LIVE versions. Any help
> >>>> is appreciated, thanks!
> >>>>
> >>>> -Ed
> >>>> _______________________________________________
> >>>> Bioperl-l mailing list
> >>>> Bioperl-l@portal.open-bio.org
> >>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
> >>>>
> >>
> >>
> >
> >
> > --
> > Edward Chuong
> > (949) 939-2732
> > AIM: edawad85
> >
> 
> 


-- 
Edward Chuong
(949) 939-2732
AIM: edawad85
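
Editor's note on the ( ... )[0] idiom used around get_tag_values in the
quoted scripts: a Perl sub that returns an array gives back its elements
in list context but the element count in scalar context, so a list slice
(or a list assignment) is needed to pick out the first value. The sketch
below is only an illustration; tag_values is a hypothetical stand-in and
not the Bio::Tree::Node implementation.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a method that returns a list of tag values;
# this is not the BioPerl implementation.
sub tag_values {
    my @values = (0.125, 0.5);
    return @values;
}

my $slice   = (tag_values())[0];   # list slice: first element
my ($first) = tag_values();        # list assignment: first element
my $count   = tag_values();        # scalar context: number of elements

print "slice=$slice first=$first count=$count\n";
```

The list-assignment form, my ($first) = ..., is an equivalent way to take
the first value when a slice feels too cryptic.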
From davila at ioc.fiocruz.br  Thu Mar 10 16:36:15 2005
From: davila at ioc.fiocruz.br (Alberto Davila)
Date: Thu Mar 10 16:21:53 2005
Subject: RES: [Bioperl-l] Mysql columns and Blast evalues
In-Reply-To: <42307CAB.6080909@gene.ucl.ac.be>
References: <42307CAB.6080909@gene.ucl.ac.be>
Message-ID: <1110490575.7016.25.camel@kineto>

Thanks to all of you guys.

DOUBLE UNSIGNED worked very well for me ;-)

Cheers, Alberto

On Thu, 2005-03-10 at 17:58 +0100, hancy wrote:
> I'm using DOUBLE UNSIGNED to store blast evalues in MySQL, which allows 
> you to store 0 and positive values from ~2.2e-308 to ~1.8e308.
> Anything with an evalue between 0 and e-308 will be set to 0 
> automatically (it shouldn't be a great loss of information anyway).
> 
> Hope this can help,
> fred.
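
Editor's note: the range described above can be mimicked on the Perl side
before a value is handed to the database. This is a minimal sketch under
stated assumptions: normalize_evalue is a hypothetical helper name, the
"e-120" style input (an exponent with no leading digit) is assumed to be
one of the forms the E-value text may take, and the threshold is the
smallest normalized double, which is the ~2.2e-308 figure quoted above.

```perl
use strict;
use warnings;

# Smallest normalized positive double (the ~2.2e-308 limit quoted above).
my $MIN_DOUBLE = 2.2250738585072014e-308;

# Turn an E-value string into a number for a DOUBLE UNSIGNED column.
# normalize_evalue is a hypothetical helper, not part of BioPerl.
sub normalize_evalue {
    my ($text) = @_;
    $text = "1$text" if $text =~ /^e/i;          # "e-120" becomes "1e-120"
    my $value = $text + 0;                       # numeric conversion
    return $value < $MIN_DOUBLE ? 0 : $value;    # flush tiny values to zero
}

printf "%s -> %g\n", $_, normalize_evalue($_) for qw(3e-50 e-120 1e-400 0.0);
```

Flushing to zero in the script keeps the stored value the same whether or
not the database would have done the rounding itself.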

From Peter.Robinson at t-online.de  Thu Mar 10 16:28:23 2005
From: Peter.Robinson at t-online.de (Peter.Robinson@t-online.de)
Date: Thu Mar 10 16:21:58 2005
Subject: [Bioperl-l] Entrez Gene ASN
In-Reply-To: <42305650.30403@utk.edu>
References: <42305650.30403@utk.edu>
Message-ID: <20050310212823.GB5392@anna>

On Thu, Mar 10, 2005 at 09:14:40AM -0500, Stefan Kirov wrote:
> Hi guys!
> I have done some (mostly) serious thinking about ASN Entrez Gene parsing 
> and I propose we do my favorite thing- postpone everything we cannot 
> deal with right now. If you want it to sound better: take a gradual 
> approach where we store the data we can deal with in the existing 
> Bioperl objects and skip the rest for now.
> In detail:
> An ASN gene record can be correctly represented as a tree. I have written a 
> simple parser for my own purposes which stores the following:
> node_id---|
>           |--parent
>           |--level
>           |--tag
>           |--values
> What I do then is get specific levels and tags and build different 
> objects. So level 2 with parent EntrezGene (which is the root level and 
> has no information) is the gene description and has tags such as gene, name, 
> etc.; at levels 3, 5 and 6 you can get the complete species definition by 
> looking for orgname and org as tags and records with parent mod (which 
> is a value for orgname; descend down the branch).
> I am using this approach to store most of the data in a relational 
> database without going through Bioperl. What I ultimately want to do is 
> use standard Bioperl modules. However, I don't think we have an object 
> that can efficiently represent the structure (correct me if I am wrong). 
> I think it may be a good idea to have a container object, possibly 
> Bio::Gene that may contain multiple Bio::Seq objects (with or without 
> real sequence). I believe we can borrow some structure and code from 
> EnsEMBL gene representation (way to contain multiple transcripts, etc., 
> not the database interactions certainly).
> Please let me know what you think.
> Stefan


Hi Stefan,

From the work I have done on this issue, it would seem that your suggestion
is quite promising. Let me know if you need some help on this. How is the
performance that you are seeing to date?

best,
Peter
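
Editor's note: the node table Stefan describes (node_id with parent,
level, tag and values) can be sketched as a plain hash of hashes. The
tags and values below are invented for illustration, and find_nodes and
tag_path are hypothetical helper names; nothing here is taken from his
parser or from an existing Bioperl module.

```perl
use strict;
use warnings;

# Flat node table: node_id => { parent, level, tag, values }.
# The tags and values are made up for illustration.
my %nodes = (
    1 => { parent => undef, level => 1, tag => 'EntrezGene', values => [] },
    2 => { parent => 1,     level => 2, tag => 'gene',       values => ['example-symbol'] },
    3 => { parent => 1,     level => 2, tag => 'name',       values => ['example name'] },
    4 => { parent => 2,     level => 3, tag => 'org',        values => ['example species'] },
);

# Return the ids of nodes at a given level carrying a given tag.
sub find_nodes {
    my ($table, $level, $tag) = @_;
    return sort { $a <=> $b }
           grep { $table->{$_}{level} == $level && $table->{$_}{tag} eq $tag }
           keys %$table;
}

# Follow parent links from a node up to the root and return the tag path.
sub tag_path {
    my ($table, $id) = @_;
    my @path;
    while (defined $id) {
        unshift @path, $table->{$id}{tag};
        $id = $table->{$id}{parent};
    }
    return join '/', @path;
}

for my $id (find_nodes(\%nodes, 2, 'gene')) {
    print tag_path(\%nodes, $id), " => @{ $nodes{$id}{values} }\n";
}
```

Selecting by level and tag and then walking the parent links is one way
to build richer objects out of such a table.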


From Ned.Young at tufts.edu  Thu Mar 10 16:22:54 2005
From: Ned.Young at tufts.edu (Ned Young)
Date: Thu Mar 10 16:29:30 2005
Subject: [Bioperl-l] naive question about Bio::Tools::Primer3
Message-ID: <90B8234D-91AA-11D9-B249-000D93ADEE80@tufts.edu>

Dear All,

I was wondering if you could help me.  I'm not very experienced using  
bioperl objects and therefore have a question.

How do I best get the individual result lines from the primer3 output  
file, using Bio::Tools::Primer3?

I'll include the script I tried and the file it parsed.
When I run it, I get "HASH(0xccfc)".

#!/usr/bin/perl -w
use lib "/Users/Ned/Documents/Perl/bioperl_source/bioperl-1.4";
use Bio::AlignIO;
use Bio::Tools::Primer3;
# read a primer3 output file
my $primer3=Bio::Tools::Primer3->new(-file=>"p3test1.out");
#put the left- and right-primer stuff into hashes.
my $primer=$primer3->next_primer;
print "The right primer in the stream is ",  
$primer->get_primer('-right_primer')->seq->seq, "\n";
# to return results
print $primer3->primer_results(0,'PRIMER_LEFT_INPUT');

primer3_core output file:
PRIMER_SEQUENCE_ID=test01
SEQUENCE=ACTTGATATAGCGTAAATCGATTTGCAGAGATCAACTTGCTATAACGTAACTCGATTGCAATG 
ATGCTTAGCCATGCGTAGTCTGATCCTGATGCCGTGATGGCACTCATGGCGTACTCTATGAGAGTC
PRIMER_LEFT_INPUT=ACTTGATATAGCGTAAATCG
PRIMER_RIGHT_INPUT=GACTCTCATAGAGTACGCCA
TARGET=21,1
PRIMER_PAIR_MAX_MISPRIMING=12
PRIMER_PAIR_MAX_TEMPLATE_MISPRIMING=24
PRIMER_PRODUCT_SIZE_RANGE=70-129
PRIMER_OPT_SIZE=20
PRIMER_MIN_SIZE=15
PRIMER_MAX_SIZE=36
PRIMER_PICK_ANYWAY=1
PRIMER_FILE_FLAG=1
PRIMER_EXPLAIN_FLAG=1
PRIMER_ERROR=1
PRIMER_WARNING=Left primer is unacceptable: Tm too low/High end self  
complementarity; Right primer is unacceptable: Tm too low/High end self  
complementarity
PRIMER_PAIR_EXPLAIN=considered 1, ok 1
PRIMER_PAIR_PENALTY=17.0819
PRIMER_LEFT_PENALTY=10.012436
PRIMER_RIGHT_PENALTY=7.069468
PRIMER_LEFT_SEQUENCE=ACTTGATATAGCGTAAATCG
PRIMER_RIGHT_SEQUENCE=GACTCTCATAGAGTACGCCA
PRIMER_LEFT=0,20
PRIMER_RIGHT=128,20
PRIMER_LEFT_TM=49.988
PRIMER_RIGHT_TM=52.931
PRIMER_LEFT_GC_PERCENT=35.000
PRIMER_RIGHT_GC_PERCENT=50.000
PRIMER_LEFT_SELF_ANY=6.00
PRIMER_RIGHT_SELF_ANY=8.00
PRIMER_LEFT_SELF_END=4.00
PRIMER_RIGHT_SELF_END=4.00
PRIMER_LEFT_END_STABILITY=8.6000
PRIMER_RIGHT_END_STABILITY=11.7000
PRIMER_LEFT_TEMPLATE_MISPRIMING=14.0000
PRIMER_RIGHT_TEMPLATE_MISPRIMING=8.0000
PRIMER_PAIR_COMPL_ANY=5.00
PRIMER_PAIR_COMPL_END=3.00
PRIMER_PRODUCT_SIZE=129
PRIMER_PAIR_TEMPLATE_MISPRIMING=22.00
=

Yours truly,
Ned Young	
Department of Biomedical Sciences
Division of Infectious Diseases
Tufts University School of Veterinary Medicine
200 Westboro Rd.
N. Grafton, MA 01536
508-887-4540
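
Editor's note: output such as "HASH(0xccfc)" is what Perl prints when a
hash reference is handed to print, which suggests the call in the script
above is returning a reference that still needs to be dereferenced. The
sketch below does not use Bio::Tools::Primer3 at all; it assumes only the
TAG=VALUE layout of the primer3_core output shown above, and
parse_boulder_record is a hypothetical helper name.

```perl
use strict;
use warnings;

# Parse one primer3_core record (TAG=VALUE lines, ended by a lone "=")
# into a hash reference. Hypothetical helper, not a BioPerl method.
sub parse_boulder_record {
    my @lines = @_;
    my %record;
    for my $line (@lines) {
        chomp $line;
        last if $line eq '=';                     # record terminator
        my ($tag, $value) = split /=/, $line, 2;  # split on the first "=" only
        next unless defined $value;               # skip lines without "="
        $record{$tag} = $value;
    }
    return \%record;
}

my $result = parse_boulder_record(
    "PRIMER_LEFT_SEQUENCE=ACTTGATATAGCGTAAATCG\n",
    "PRIMER_RIGHT_SEQUENCE=GACTCTCATAGAGTACGCCA\n",
    "PRIMER_LEFT_TM=49.988\n",
    "=\n",
);

# Printing $result itself would show HASH(0x...); dereference it instead.
print "left primer: $result->{PRIMER_LEFT_SEQUENCE}\n";
for my $tag (sort keys %$result) {
    print "$tag\t$result->{$tag}\n";
}
```

The same dereferencing pattern (the arrow for one key, %$ref for all
keys) applies to any hash reference a module returns.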

From pmiguel at purdue.edu  Thu Mar 10 17:35:01 2005
From: pmiguel at purdue.edu (Phillip San Miguel)
Date: Thu Mar 10 17:30:07 2005
Subject: [Bioperl-l] Living on the edg with 1.5?
Message-ID: <4230CB95.2000701@purdue.edu>


    Because GBROWSE wants bioperl v 1.5, I've moved to that on most of the 
platforms I use. But I've noticed two (presumably) unrelated glitches 
just in using it today. Should I really be using v 1.4? I'm not a 
developer, and don't really welcome additional headaches. Advice?

    Here are the bugs:

perl -e 'use Bio::Perl; $seq_object = get_sequence("genbank","u11059"); 
write_sequence(">test","genbank",$seq_object);'


This fetches U11059 from GenBank and tries to print it in GenBank format, 
but in v 1.5 the qualifier values come out as stringified objects.

Running Version 1.5:

...
FEATURES             Location/Qualifiers
     source          1..7313
                     /mol_type="Bio::Annotation::SimpleValue=HASH(0x9b3870)"
                     
/tissue_type="Bio::Annotation::SimpleValue=HASH(0x9b38b8)"
                     /db_xref="Bio::Annotation::SimpleValue=HASH(0x9b3828)"
                     
/transposon="Bio::Annotation::SimpleValue=HASH(0x9b0968)"
                     /strain="Bio::Annotation::SimpleValue=HASH(0x9b3900)"
                     
/chromosome="Bio::Annotation::SimpleValue=HASH(0x9b3948)"
                     /organism="Bio::Annotation::SimpleValue=HASH(0x9b3990)"
     LTR             1..649
                     /label=Bio::Annotation::SimpleValue=HASH(0x9aefd4)
     TATA_signal     304..310
                     /label=Bio::Annotation::SimpleValue=HASH(0x9b55fc)
     misc_feature    651..659
                     /label=Bio::Annotation::SimpleValue=HASH(0x9b5764)
...

Running Version 1.4 (no problem):

...
FEATURES             Location/Qualifiers
     source          1..7313
                     /transposon="retrotransposon"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:4577"
                     /tissue_type="leaf"
                     /strain="A188"
                     /chromosome="7"
                     /organism="Zea mays"
     LTR             1..649
                     /label=upstreamLTR
     TATA_signal     304..310
                     /label=upstream
     misc_feature    651..659
                     /label=PBSsite
...

The other 1.5 bug I found today: The following one-liner demonstrates it:

perl -e 'use Bio::SeqIO;use Bio::Seq::PrimaryQual; $qual_object = 
Bio::Seq::PrimaryQual->new(-qual=> "10 20 30 40 50 40 30 20 10", -id => 
"test", -format => "qual"); $qual_out = Bio::SeqIO->new(-file => 
">test", -format => "qual");$qual_out->write_seq($qual_object);'

When I run it under Version 1.5 the correct output file is produced but 
I also get the following output sent to STDOUT:

'_root_verbose' => 0
'display_id' => 'test'
'qual' => ARRAY(0x5b07e8)
   0  10
   1  20
   2  30
   3  40
   4  50
   5  40
   6  30
   7  20
   8  10

Under Version 1.4 everything is fine. (No extraneous STDOUT is created.)

This looks like someone uncommented a Data::Dumper print somewhere, but 
I wasn't able to find it.

-- 
Phillip SanMiguel
Purdue Genomics Core Facility
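
Editor's note: values such as "Bio::Annotation::SimpleValue=HASH(0x...)"
are what Perl produces when an object without overloaded stringification
is interpolated into a string. The sketch below reproduces that effect
with a hypothetical stand-in class (My::SimpleValue); it illustrates the
general mechanism only and is not a diagnosis of the 1.5 code.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an annotation object; not the BioPerl class.
package My::SimpleValue;
sub new   { my ($class, %args) = @_; return bless { value => $args{-value} }, $class }
sub value { return $_[0]->{value} }

package main;

my $qualifier = My::SimpleValue->new(-value => 'genomic DNA');

# Interpolating the object prints its class name and address ...
my $wrong = qq(/mol_type="$qualifier");
# ... while calling the accessor prints the stored text.
my $right = sprintf '/mol_type="%s"', $qualifier->value;

print "$wrong\n";
print "$right\n";
```

A writer that receives objects where it used to receive plain strings
would show exactly this kind of difference between two releases.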

From sanges at biogem.it  Thu Mar 10 17:49:01 2005
From: sanges at biogem.it (Remo Sanges)
Date: Thu Mar 10 17:43:42 2005
Subject: [Bioperl-l] Living on the edg with 1.5?
In-Reply-To: <4230CB95.2000701@purdue.edu>
References: <4230CB95.2000701@purdue.edu>
Message-ID: <a9d8fd5fab5d88dae6783565248b9da1@biogem.it>

On Mar 10, 2005, at 11:35 PM, Phillip San Miguel wrote:

>
>    Because GBROWSE wants bioperl v 1.5, I've moved to that on most of 
> the platforms I use. But I've noticed two (presumably) unrelated 
> glitches just in using it today. Should I really be using v 1.4? I'm 
> not a developer, and don't really welcome additional headaches. 
> Advice?

I'm in the same situation and basically I'm using
bioperl v 1.5 with GBROWSE v 1.62 and
bioperl v 1.4 for my production programs...

Any other advice or suggestions?

Thanks

Remo

From echuong at gmail.com  Thu Mar 10 18:23:53 2005
From: echuong at gmail.com (Edward Chuong)
Date: Thu Mar 10 18:19:58 2005
Subject: [Bioperl-l] PAML nssites model result object
In-Reply-To: <c130d2ae21cb10760c7dfa5f751a7bdb@duke.edu>
References: <244d2e0e050309142370997ce4@mail.gmail.com>
	<896034a8342912841a4a0d0a0686353e@duke.edu>
	<f18fbe9b369789b605a28574c8628c1b@duke.edu>
	<244d2e0e0503091537d5f283d@mail.gmail.com>
	<4ad236e4a716973b61ce63f1aa251a31@duke.edu>
	<244d2e0e050310124735d62b56@mail.gmail.com>
	<c130d2ae21cb10760c7dfa5f751a7bdb@duke.edu>
Message-ID: <244d2e0e05031015235d8a64e@mail.gmail.com>

Hey,

Some progress when I use your file: it now does return a PAML::Result
object, but there's an error:

Can't use an undefined value as an ARRAY reference at
/Library/Perl/5.8.1/Bio/Tools/Phylo/PAML/ModelResult.pm line 308,
<GEN0> line 329.

It appears because the code is trying to access the positive selection
array, which doesn't exist for the nssites = 0 object. There is no
"dnds_site_classes", which the other nssites models (= 1, = 2, etc.)
have, even though I think there should be. I can't find where these
values are stored; they don't appear to be stored on the individual
nodes as the documentation would suggest.

I've found that when I specify only one model for nssites (nssites =
0 or 1 or 2), the get_NSSite_results loop doesn't work, but if I specify
more than one for the PAML run it does.

I've sent the mlc files I have in a private e-mail, if you have time to check.

Thanks so much!
-Ed

On Thu, 10 Mar 2005 16:38:59 -0500, Jason Stajich
<jason.stajich@duke.edu> wrote:
> I'm using t/data/codeml_nssites.mlc
> 
> --
> Jason Stajich
> jason.stajich at duke.edu
> http://www.duke.edu/~jes12/
> 
> On Mar 10, 2005, at 3:47 PM, Edward Chuong wrote:
> 
> > Hi,
> >
> > I think the problem is that the $result object, according to Dumper,
> > doesn't store any ModelResult (NSSite_results), so the for loop
> > condition in this code ($result->get_NSSite_results) is never true. Is
> > this working on a mlc file that you have, and if so, can you send it
> > so I can see if it's a problem on my side?
> >
> > Thanks
> > -Ed
> >
> >
> > On Thu, 10 Mar 2005 09:48:02 -0500, Jason Stajich
> > <jason.stajich@duke.edu> wrote:
> >> The script needs to be adjusted for NSsites because there are trees
> >> associated with each model result, so you need one more loop on the
> >> get_NSSite_results. I added some code to the script to print out the
> >> positively selected sites as well.
> >>
> >> #!/usr/bin/perl -w
> >> use strict;
> >> use Bio::Tools::Phylo::PAML;
> >>
> >> my $outcodeml = shift(@ARGV);
> >>
> >> my $paml_parser = new Bio::Tools::Phylo::PAML(-file => "./$outcodeml",
> >>                                               -dir => "./");
> >> my $result = $paml_parser->next_result();
> >> my $MLmatrix = $result->get_MLmatrix(); # get MaxLikelihood Matrix
> >> my @otus = $result->get_seqs;
> >> # process the NSsites results
> >> for my $ns_result ( $result->get_NSSite_results ) {
> >>      print "model ", $ns_result->model_num, " ",
> >> $ns_result->model_description, "\n";
> >>      while ( my $tree = $ns_result->next_tree ) {
> >>         for my $node ( $tree->get_nodes ) {
> >>             my $id;
> >>             if( $node->is_Leaf() ) {
> >>                 $id = $node->id;
> >>             } else {
> >>                 $id = "(".join(",", map { $_->id } grep { $_->is_Leaf
> >> }
> >>                                $node->get_all_Descendents) .")";
> >>             }
> >>             if( ! $node->ancestor || ! $node->has_tag('t') ) {
> >>                 # skip when no values have been associated with this
> >> node
> >>                 # (like the root node)
> >>                 next;
> >>             }
> >>             printf "%s\tt=%.3f\tS=%.1f\tN=%.1f\tdN/dS=%.4f\tdN=%.4f\tdS=%.4f\tS*dS=%.1f\tN*dN=%.1f\n",
> >>                 $id,
> >>                 map { ($node->get_tag_values($_))[0] }
> >>                 qw(t S N dN/dS dN dS), 'S*dS', 'N*dN';
> >>         }
> >>      }
> >>      print "positively selected sites:\n";
> >>      # get the positively selected sites
> >>      for my $site ( $ns_result->get_pos_selected_sites ) {
> >>         print join(" ", @$site), "\n";
> >>      }
> >>      print "\n";
> >> }
> >>
> >> --
> >> Jason Stajich
> >> jason.stajich at duke.edu
> >> http://www.duke.edu/~jes12/
> >>
> >> On Mar 9, 2005, at 6:37 PM, Edward Chuong wrote:
> >>
> >>> Hi Jason,
> >>>
> >>> Thanks for the help.
> >>>
> >>> The code seems to get stuck at
> >>>
> >>>  if( ! $node->ancestor || ! $node->has_tag('t') ) {
> >>> (this condition turns out true for every node, not just root, so it
> >>> always hits "next")
> >>>
> >>> I used Data::Dumper to check on the node and I've pasted the
> >>> results--it seems like those tags aren't being sent in?
> >>>
> >>>
> >>> Thanks!
> >>> -Ed
> >>>
> >>> '_root_cleanup_methods' => [
> >>>                 sub { "DUMMY" }
> >>>               ],
> >>> '_creation_id' => 0,
> >>> '_branch_length' => '0.613722',
> >>> '_desc' => {},
> >>> '_id' => 'NP_033437.2_mus',
> >>> '_ancestor' => bless( {
> >>>        '_root_cleanup_methods' => [
> >>>
> >>> $VAR1->{'_root_cleanup_methods'}[0]
> >>>                                       ],
> >>>        '_creation_id' => 3,
> >>>        '_desc' => {
> >>>                                 '2' => bless( {
> >>>                                         '_root_cleanup_methods' => [
> >>>
> >>> $VAR1->{'_root_cleanup_methods'}[0]
> >>>                                                ],
> >>>                                         '_creation_id' => 2,
> >>>                                         '_branch_length' =>
> >>> '0.768322',
> >>>                                         '_desc' => {},
> >>>                                         '_id' => 'PM_BWp0001H02f',
> >>>                                         '_ancestor' =>
> >>> $VAR1->{'_ancestor'},
> >>>                                         '_root_verbose' => 0
> >>>                                       }, 'Bio::Tree::Node' ),
> >>>                                 '0' => $VAR1,
> >>>                                 '1' => bless( {
> >>>                                         '_root_cleanup_methods' => [
> >>>
> >>>                             $VAR1->{'_root_cleanup_methods'}[0]
> >>>
> >>>                           ],
> >>>                                         '_creation_id' => 1,
> >>>                                         '_branch_length' =>
> >>> '0.366319',
> >>>                                         '_desc' => {},
> >>>                                         '_id' => 'NP_742070.1_rat',
> >>>                                         '_ancestor' =>
> >>> $VAR1->{'_ancestor'},
> >>>                                         '_root_verbose' => 0
> >>>                                       }, 'Bio::Tree::Node' )
> >>>                               },
> >>>        '_id' => '',
> >>>        '_height' => undef,
> >>>        '_root_verbose' => 0
> >>>    }, 'Bio::Tree::Node' ),
> >>> '_root_verbose' => 0
> >>> }, 'Bio::Tree::Node' );
> >>>
> >>>
> >>> On Wed, 9 Mar 2005 18:01:34 -0500, Jason Stajich
> >>> <jason.stajich@duke.edu> wrote:
> >>>> Resend with code pasted....
> >>>>
> >>>> #!/usr/bin/perl -w
> >>>> use strict;
> >>>> use Bio::Tools::Phylo::PAML;
> >>>>
> >>>> my $outcodeml = shift(@ARGV);
> >>>>
> >>>> my $paml_parser = new Bio::Tools::Phylo::PAML(-file =>
> >>>> "./$outcodeml",
> >>>>                                               -dir => "./");
> >>>> my $result = $paml_parser->next_result();
> >>>> my $MLmatrix = $result->get_MLmatrix(); # get MaxLikelihood Matrix
> >>>> my @otus = $result->get_seqs;
> >>>> if( $#{$MLmatrix} < 0 ) {
> >>>>      for my $tree ($result->next_tree ) {
> >>>>         for my $node ( $tree->get_nodes ) {
> >>>>             my $id;
> >>>>             if( $node->is_Leaf() ) {
> >>>>                 $id = $node->id;
> >>>>             } else {
> >>>>                 $id = "(".join(",", map { $_->id } grep {
> >>>> $_->is_Leaf
> >>>> }
> >>>>                            $node->get_all_Descendents) .")";
> >>>>             }
> >>>>             if( ! $node->ancestor || ! $node->has_tag('t') ) {
> >>>>                 # skip when no values have been associated with this
> >>>> node
> >>>>                 # (like the root node)
> >>>>                 next;
> >>>>             }
> >>>>              # I know this looks complicated
> >>>>             # but we use the get_tag_values method to pull out the
> >>>> annotations
> >>>>             # for each branch
> >>>>             # The ()[0] around the call is because get_tag_values
> >>>> returns a
> >>>> list
> >>>>             # if we want to just get the 1st item in the list we
> >>>> have
> >>>>             # to tell Perl we are treating it like an array.
> >>>>             # in the future get_tag_values needs to be smart and
> >>>> just
> >>>>             # return the 1st item in the array if called in scalar
> >>>>             # context
> >>>>
> >>>>             printf
> >>>> "%s\tt=%.3f\tS=%.1f\tN=%.1f\tdN/
> >>>> dS=%.4f\tdN=%.4f\tdS=%.4f\tS*dS=%.1f\tN*dN=%.1f\n",
> >>>>             $id,
> >>>>             map { ($node->get_tag_values($_))[0] }
> >>>>             qw(t S N dN/dS dN dS), 'S*dS', 'N*dN';
> >>>>         }
> >>>>      }
> >>>> } else {
> >>>>      my $i =0;
> >>>>      my @seqs = $result->get_seqs;
> >>>>      for my $row ( @$MLmatrix ) {
> >>>>         print $seqs[$i++]->display_id, join("\t",@$row), "\n";
> >>>>      }
> >>>> }
> >>>>
> >>>> On Mar 9, 2005, at 5:41 PM, Jason Stajich wrote:
> >>>>
> >>>>> I just updated things last week so this is brand-spanking-new.  I
> >>>>> don't know if I connected everything up for NSsites stuff quite yet
> >>>>> as that is handled in - the branch-specific parsing should work
> >>>>> now.
> >>>>> I don't know if the synopsis code is really up to snuff either.
> >>>>> When
> >>>>> I get around to it I will try and see what still needs to be
> >>>>> connected
> >>>>> in NSsites parsing.
> >>>>>
> >>>>> I don't think $node->param() is going to work -
> >>>>> $node->get_tag_values() is the way I've implemented it.
> >>>>>
> >>>>> <00parse_codeml.pl>
> >>>>>
> >>>>> -jason
> >>>>> --
> >>>>> Jason Stajich
> >>>>> jason.stajich at duke.edu
> >>>>> http://www.duke.edu/~jes12/
> >>>>>
> >>>>> On Mar 9, 2005, at 5:23 PM, Edward Chuong wrote:
> >>>>>
> >>>>>> Hi all,
> >>>>>>
> >>>>>> I'm trying to parse PAML results, and running into some trouble.
> >>>>>> I'm using the branch specific omega model, and I want to get the
> >>>>>> branch specific ka/ks values out.
> >>>>>> http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/Tools/Phylo/PAML.pm?rev=HEAD&cvsroot=bioperl&content-type=text/vnd.viewcvs-markup
> >>>>>> says that $node->param('omega') should work, but Data::Dumper
> >>>>>> shows that this value isn't stored in the node (only branch
> >>>>>> lengths and seq IDs appear to be stored).
> >>>>>>
> >>>>>> I'm assuming that I can get these values out of the
> >>>>>> get_NSSite_result() Bio::Tools::Phylo::PAML::ModelResult object,
> >>>>>> but I'm not sure how to call it. The current synopsis uses
> >>>>>> "get_model_params" but it seems to be out of date because it's
> >>>>>> not in the current source. The docs at
> >>>>>> http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/Tools/Phylo/PAML/Result.pm?rev=HEAD&cvsroot=bioperl&content-type=text/vnd.viewcvs-markup
> >>>>>> say to use
> >>>>>> my @results = @{$self->get_NSSite_results};
> >>>>>> -- that looks like a mistake, and I've tried
> >>>>>> @result = $result->get_NSSite_results but that doesn't work either
> >>>>>> (I just get undefined objs).
> >>>>>>
> >>>>>> Am I doing something wrong, or is this functionality still being
> >>>>>> worked on? I've tried using both 1.4 and the LIVE versions. Any
> >>>>>> help is appreciated, thanks!
> >>>>>>
> >>>>>> -Ed
> >>>>>> _______________________________________________
> >>>>>> Bioperl-l mailing list
> >>>>>> Bioperl-l@portal.open-bio.org
> >>>>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
> >>>>>>


-- 
Edward Chuong
(949) 939-2732
AIM: edawad85
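
A note on the ()[0] idiom in the script quoted above: the sketch below
shows, with a plain Perl sub standing in for get_tag_values, how a list
slice differs from a scalar assignment. The stand-in sub and its data
are made up for illustration; how the real BioPerl method behaves in
scalar context is not something this example claims.

```perl
use strict;
use warnings;

# Hypothetical stand-in for $node->get_tag_values($tag): returns a list.
my %tags = ( t => [0.123], 'dN/dS' => [ 0.4567, 0.5 ] );

sub get_tag_values {
    my ($tag) = @_;
    return @{ $tags{$tag} || [] };
}

# List slice: call in list context, keep element 0.
my $first = ( get_tag_values('dN/dS') )[0];

# Scalar assignment: this stand-in returns an array, so we get its size.
my $count = get_tag_values('dN/dS');

# List assignment with parentheses also picks up the first element.
my ($also_first) = get_tag_values('dN/dS');

printf "first=%.4f count=%d also_first=%.4f\n",
    $first, $count, $also_first;
```

The parenthesised forms are the ones to use whenever the value itself,
rather than the number of values, is wanted.
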
From allenday at ucla.edu  Thu Mar 10 20:36:33 2005
From: allenday at ucla.edu (Allen Day)
Date: Thu Mar 10 20:31:27 2005
Subject: [Bioperl-l] Living on the edg with 1.5?
In-Reply-To: <4230CB95.2000701@purdue.edu>
References: <4230CB95.2000701@purdue.edu>
Message-ID: <Pine.LNX.4.58.0503101735380.8204@sumo.ctrl.ucla.edu>

This is fixed on cvs HEAD.  I haven't ported the changes to the 1.5.1 
bugfix branch yet.

I can't comment on the other bug you report.

-Allen

> ...
> FEATURES             Location/Qualifiers
>      source          1..7313
>                      /mol_type="Bio::Annotation::SimpleValue=HASH(0x9b3870)"
>                      /tissue_type="Bio::Annotation::SimpleValue=HASH(0x9b38b8)"
>                      /db_xref="Bio::Annotation::SimpleValue=HASH(0x9b3828)"
>                      /transposon="Bio::Annotation::SimpleValue=HASH(0x9b0968)"
>                      /strain="Bio::Annotation::SimpleValue=HASH(0x9b3900)"
>                      /chromosome="Bio::Annotation::SimpleValue=HASH(0x9b3948)"
>                      /organism="Bio::Annotation::SimpleValue=HASH(0x9b3990)"
>      LTR             1..649
>                      /label=Bio::Annotation::SimpleValue=HASH(0x9aefd4)
>      TATA_signal     304..310
>                      /label=Bio::Annotation::SimpleValue=HASH(0x9b55fc)
>      misc_feature    651..659
>                      /label=Bio::Annotation::SimpleValue=HASH(0x9b5764)
> ...
From allenday at ucla.edu  Thu Mar 10 20:57:36 2005
From: allenday at ucla.edu (Allen Day)
Date: Thu Mar 10 20:52:29 2005
Subject: [Bioperl-l] strange error after changing to RC1.5
In-Reply-To: <422ECDDD.40404@biologie.uni-freiburg.de>
References: <422ECDDD.40404@biologie.uni-freiburg.de>
Message-ID: <Pine.LNX.4.58.0503101755580.8204@sumo.ctrl.ucla.edu>

I'm unable to test the code in PersistentObject.pm as I don't have biosql
set up, but you might try adding this to Reference.pm

  use overload 'ne' => sub { "$_[0]" ne "$_[1]" }

Please let me know if this fixes your error and I'll add this 'ne'
overload to all the Bio::Annotation::* classes on HEAD.
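
For anyone who wants to reproduce the failure outside of bioperl-db,
here is a minimal sketch. The two My::* classes are hypothetical
stand-ins, not BioPerl code: one overloads only '""' and 'eq' (as in the
Reference.pm diff quoted below), the other also gets the 'ne' overload
suggested above.

```perl
use strict;
use warnings;

# Stand-in class: stringification and 'eq' only, no 'ne', no fallback.
package My::RefNoNe;
use overload
    '""' => sub { $_[0]->{title} || '' },
    'eq' => sub { "$_[0]" eq "$_[1]" };
sub new { my ( $class, %args ) = @_; return bless {%args}, $class }

# Same class with the 'ne' overload added.
package My::RefWithNe;
use overload
    '""' => sub { $_[0]->{title} || '' },
    'eq' => sub { "$_[0]" eq "$_[1]" },
    'ne' => sub { "$_[0]" ne "$_[1]" };
sub new { my ( $class, %args ) = @_; return bless {%args}, $class }

package main;

my $broken = My::RefNoNe->new( title => 'A paper' );
my $fixed  = My::RefWithNe->new( title => 'A paper' );

# 'eq' works for both classes.
my $eq_ok = ( $broken eq 'A paper' ) ? 1 : 0;

# 'ne' dies for the class that does not overload it.
my $ne_error = '';
eval { my $r = ( $broken ne 'other' ); 1 } or $ne_error = $@;

# 'ne' works once the overload is present.
my $ne_ok = ( $fixed ne 'other' ) ? 1 : 0;

print "eq works: $eq_ok\n";
print "ne without overload: ", ( $ne_error ? "died" : "ok" ), "\n";
print "ne with overload: $ne_ok\n";
```

Because no fallback is set, Perl does not derive 'ne' from the other
overloads, which is why the operator has to be listed explicitly.
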

-Allen


On Wed, 9 Mar 2005, Daniel Lang wrote:

> Hi,
> I'm retrieving seq objects from a local biosql db (using the latest cvs 
> version of bioperl-db) and e.g. writing them with SeqIO. After changing 
> from a cvs version ~ 12/04 to RC1.5 or latest cvs version, I get the 
> following error:
> 
> Operation `ne': no method found, left argument in overloaded package 
> Bio::Annotation::Reference, right argument has no overloaded magic at 
> /usr/lib/perl5/site_perl/5.6.1/Bio/DB/Persistent/PersistentObject.pm 
> line 534, <GEN1> line 1.
> 
> The module PersistentObject.pm hasn't changed and in Reference.pm there 
> is only this change:
> 
> diff bioperl-live-Dec04/Bio/Annotation/Reference.pm 
> bioperl-live/Bio/Annotation/Reference.pm
> 1c1
> < # $Id: Reference.pm,v 1.21 2004/08/19 20:13:32 lapp Exp $
> ---
>  > # $Id: Reference.pm,v 1.22 2005/02/02 22:13:22 allenday Exp $
> 56c56,57
> < # use overload '""' => \&as_text;
> ---
>  > use overload '""' => sub { $_[0]->title || ''};
>  > use overload 'eq' => sub { "$_[0]" eq "$_[1]" };
> 
> I've reversed this, but no positive result - the error remains...
> Any hints?
> 
> Thanks in advance,
> Daniel
From elia at tigem.it  Fri Mar 11 03:02:03 2005
From: elia at tigem.it (Elia Stupka)
Date: Fri Mar 11 03:07:03 2005
Subject: [Bioperl-l] Proposal for bio-perl updates: ACE assembly file
In-Reply-To: <200503101303.40339.jswanson@iastate.edu>
References: <200502141205.52256.jswanson@iastate.edu>
	<4f12c65ac919697fd8a7e9220db182fd@tigem.it>
	<20050310134814.GE27364@iib.unsam.edu.ar>
	<200503101303.40339.jswanson@iastate.edu>
Message-ID: <17ac8b6cebdb61a12a42d6cb429a3e9b@tigem.it>

I was going to have a look at both the ACE issue and the more 
general issues that I have found less than ideal, such as:
-resetting LocatableSeq coordinates (why not keep useful coords?)
-Having both features and LocatableSeqs, when LocatableSeqs give you 
both the gapped sequence as well as the coordinates

Will write in about 24 hours (I hope) with a decent outcome (I hope)

Cheers,

Elia


On Mar 10, 2005, at 8:03 PM, Jordan Swanson wrote:

> On Thursday 10 March 2005 07:48 am, Fernan Aguero wrote:
>> +----[ Elia Stupka <elia@tigem.it> (01.Mar.2005 10:17):
>> | Hi Jordan,
>> |
>> | I have been doing some work on Contig::Assembly myself recently, and
>> | have also been in touch with the author (Robson) about it. Perhaps 
>> the
>> | best thing would be for the three of us to have a chat about this
>> | object, try to revamp it a little with our improvements, and then
>> | Robson or I can check it in?
>> |
>> | regards,
>> |
>> | Elia
>>
>> +----]
>>
>> Hi!
>>
>> We have just got a need to produce .ace files and noticed
>> that this functionality was lacking.
>>
>> I also saw the recent thread about this topic on the list.
>> Question: has this moved forward since the last message was
>> posted (March 1st)?
>>
>> If so, are the proposed changes in a form that can be
>> applied and tested by others (a recursive diff, perhaps
>> against a recent CVS checkout or against the 1.5-release)?
>
> I sent a copy of the code I have written to Elia and Robson.  Robson
> mentioned that he was extremely busy this month, and that he would be
> willing to discuss it at a later time.  Later today, I can send you a
> zipped up file of the code we have been using (my diff skills are
> non-existent, so I can't do it that way without poring through the
> manual), which is working well for the features that our lab uses.
> Of course, I would appreciate any feedback that you could offer, as
> well.
>
> ---
> Jordan M Swanson
> Department of Ecology, Evolution, and Organismal Biology
> 431 Bessey Hall
> Iowa State University
> Ames, IA 50011
> Lab 515 294-7098
> FAX: 515-294-1337
---
Telethon Institute of Genetics and Medicine
Via Pietro Castellino, 111
80131 Napoli

Tel. +39 081 6132 335
Fax. +39 081 560 98 77

From Richard.Adams at ed.ac.uk  Fri Mar 11 04:19:07 2005
From: Richard.Adams at ed.ac.uk (Richard Adams)
Date: Fri Mar 11 04:14:56 2005
Subject: [Bioperl-l] 1.6 release
Message-ID: <4231628B.4010007@ed.ac.uk>

Hello,
Is there any schedule for the 1.6 release?
Just to know by when I have to get my modules working...

Richard

-- 
Dr Richard Adams
Psychiatric Genetics Group,
Medical Genetics,
Molecular Medicine Centre,
Western General Hospital,
Crewe Rd West,
Edinburgh UK
EH4 2XU

Tel: 44 131 651 1084
richard.adams@ed.ac.uk


From sutripa at vbi.vt.edu  Thu Mar 10 11:29:53 2005
From: sutripa at vbi.vt.edu (Sucheta Tripathy)
Date: Fri Mar 11 04:38:45 2005
Subject: RES: [Bioperl-l] Mysql columns and Blast evalues
In-Reply-To: <8D44604203DAF9438BF9123B4A08C779575FC3@alpha.ioc.fiocruz.b
 r>
Message-ID: <5.1.0.14.0.20050310112935.02066100@mail.vbi.vt.edu>


Try storing as double.
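
To illustrate why a numeric column helps, here is a small Perl sketch
comparing text ordering (what you get when e-values are compared as
VARCHAR strings) with numeric ordering (what a floating-point column
such as DOUBLE gives). The e-values are made up for the example.

```perl
use strict;
use warnings;

# Made-up e-values in the text form found in BLAST reports.
my @evalues = ( '4e-50', '3e-7', '2e-100', '0.002', '5e-51' );

# String order: what comparing the values as text gives.
my @as_text = sort { $a cmp $b } @evalues;

# Numeric order: what comparing the values as numbers gives.
my @as_number = sort { $a <=> $b } @evalues;

# Numeric filter: keep only hits with an e-value below 1e-50.
my @significant = grep { $_ < 1e-50 } @evalues;

print "text order:    @as_text\n";
print "numeric order: @as_number\n";
print "below 1e-50:   @significant\n";
```

In text order 2e-100 sorts after 0.002, so range queries on a VARCHAR
column would return the wrong rows. A double-precision column covers
exponents down to roughly 1e-308, which is ample for the thresholds
discussed here.
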

Sucheta

At 06:42 AM 3/10/2005 -0300, davila wrote:
>Hi Stefan,
>
>Thanks for the tips !
>
>I guess the problem with using VARCHAR is that it limits numeric 
>comparison of the e-values. If I want to filter, or show only e-values 
>greater or smaller than 1e-50, would that work OK?
>
>I would also like to know which other MySQL column types colleagues are 
>using to store their BLAST e-values (any further details would be 
>appreciated).
>
>Thanks.
>
>
>-----Original Message-----
>From: Stefan Kirov [mailto:skirov@utk.edu]
>Sent: Thu 10/3/2005 00:25
>To: davila
>Cc: bioperl-l@portal.open-bio.org
>Subject: Re: [Bioperl-l] Mysql columns and Blast evalues

From daniel.lang at biologie.uni-freiburg.de  Fri Mar 11 05:06:19 2005
From: daniel.lang at biologie.uni-freiburg.de (Daniel Lang)
Date: Fri Mar 11 05:01:05 2005
Subject: [Bioperl-l] strange error after changing to RC1.5
In-Reply-To: <Pine.LNX.4.58.0503101755580.8204@sumo.ctrl.ucla.edu>
References: <422ECDDD.40404@biologie.uni-freiburg.de>
	<Pine.LNX.4.58.0503101755580.8204@sumo.ctrl.ucla.edu>
Message-ID: <42316D9B.801@biologie.uni-freiburg.de>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi Allen,

When I add the line to all Bio::Annotation::*, we run into various other
errors, e.g. :
Can't call method "primary_key" on an undefined value at
/usr/lib/perl5/site_perl/5.6.1/Bio/DB/BioSQL/BasePersistenceAdaptor.pm
line 1325.

What are these overloaded methods for?
Who/What is calling ->ne()?

I've tried using the latest cvs version with the Annotation classes from
December: this error is gone, but then all SeqFeature tag values are
stringified memory addresses:

/clone_lib="Bio::Annotation::SimpleValue=HASH(0x76883bb8)"
/tissue_type="Bio::Annotation::SimpleValue=HASH(0x76883ccc)"
/clone="Bio::Annotation::SimpleValue=HASH(0x76883d14)"
/organism="Bio::Annotation::SimpleValue=HASH(0x76883d5c)"
/lab_host="Bio::Annotation::SimpleValue=HASH(0x76883dec)"
/db_xref="Bio::Annotation::SimpleValue=HASH(0x76883e34)"
/mol_type="Bio::Annotation::SimpleValue=HASH(0x76885360)"
/note="Bio::Annotation::SimpleValue=HASH(0x76883da4)"
...

To make it even more complicated, I've dumped both seq objects (the one
with all classes from Dec '04 and the bioperl-live one with only the
Annotation classes from Dec '04) and there is no diff!?
The Seq, SeqI, RichSeq and SeqFeature::Generic objects didn't change since
then...
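
For reference, the Class=HASH(0x...) text is what Perl prints for any
object that has no '""' overload. A minimal sketch with a hypothetical
wrapper class (My::PlainValue is not a BioPerl class) shows the
difference between interpolating the object and calling an accessor:

```perl
use strict;
use warnings;

# Hypothetical wrapper with no overloading; it stands in for an
# annotation object stored where a plain string used to be.
package My::PlainValue;
sub new { my ( $class, $value ) = @_; return bless { value => $value }, $class }
sub value { return $_[0]->{value} }

package main;

my $tag = My::PlainValue->new('Zea mays');

# Interpolating the object yields its class name and address.
my $as_object = qq{/organism="$tag"};

# Calling the accessor yields the stored string.
my $as_value = sprintf '/organism="%s"', $tag->value;

print "$as_object\n";
print "$as_value\n";
```

So the writer either needs the objects to stringify themselves, or it
needs to call the accessor before printing.
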

- -Daniel

Allen Day wrote:
| I'm unable to test the code in PersistentObject.pm as I don't have biosql
| set up, but you might try adding this to Reference.pm
|
|   use overload 'ne' => sub { "$_[0]" ne "$_[1]" }
|
| Please let me know if this fixes your error and I'll add this 'ne'
| overload to all the Bio::Annotation::* classes on HEAD.
|
| -Allen
|
|
| On Wed, 9 Mar 2005, Daniel Lang wrote:
|
|
|>Hi,
|>I'm retrieving seq objects from a local biosql db (using the latest cvs
|>version of bioperl-db) and e.g. writing them with SeqIO. After changing
|>from a cvs version ~ 12/04 to RC1.5 or latest cvs version, I get the
|>following error:
|>
|>Operation `ne': no method found, left argument in overloaded package
|>Bio::Annotation::Reference, right argument has no overloaded magic at
|>/usr/lib/perl5/site_perl/5.6.1/Bio/DB/Persistent/PersistentObject.pm
|>line 534, <GEN1> line 1.
|>
|>The module PersistentObject.pm hasn't changed and in Reference.pm there
|>is only this change:
|>
|>diff bioperl-live-Dec04/Bio/Annotation/Reference.pm
|>bioperl-live/Bio/Annotation/Reference.pm
|>1c1
|>< # $Id: Reference.pm,v 1.21 2004/08/19 20:13:32 lapp Exp $
|>---
|> > # $Id: Reference.pm,v 1.22 2005/02/02 22:13:22 allenday Exp $
|>56c56,57
|>< # use overload '""' => \&as_text;
|>---
|> > use overload '""' => sub { $_[0]->title || ''};
|> > use overload 'eq' => sub { "$_[0]" eq "$_[1]" };
|>
|>I've reversed this, but no positive result - the error remains...
|>Any hints?
|>
|>Thanks in advance,
|>Daniel

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (MingW32)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFCMW2bmJnbCpJAG3ARAvbAAJ966Qc8RBFbhlFL0VpVo073N1sEWgCdF7jM
56Ozp3Rl2HdHxxXipeJnx8w=
=OHJA
-----END PGP SIGNATURE-----
From rich at thevillas.eclipse.co.uk  Fri Mar 11 09:09:06 2005
From: rich at thevillas.eclipse.co.uk (rich)
Date: Fri Mar 11 09:05:22 2005
Subject: [Bioperl-l] hapmap.pm startingcol now 11?
In-Reply-To: <1110461476.8193.6.camel@magneto>
References: <1110461476.8193.6.camel@magneto>
Message-ID: <4231A682.6080100@thevillas.eclipse.co.uk>


Hi,

yes,  you're right.
Jason, I seem to remember you were going to give me cvs access to make 
fixes. Could you give me access so that I can make the change?

cheers
Rich


Albert Vilella wrote:

>Hi all,
>
>AFAICS, Hapmap dump files have (since Dec 2004?) an extra field previous
>to the starting column for the first genotype, so the $startingcol in
>hapmap.pm should change from 10 to 11 (see end of message).
>
>Can anyone confirm? 
>
>I'm getting a MSG:
>
>-------------------- WARNING ---------------------
>MSG: cannot add NA06993 as a genotype skipping
>--------------------------------------------------
>
>And I'm not sure whether it is related to this or not,
>
>Bests,
>
>    Albert.
>
>hapmap.pm
>---------------------------
>sub _pivot {
>    my ($self) = @_;
>
>    my (@cols,@rows,@idheader);
>    while ($_ = $self->_readline){
>	chomp($_);
>	next if( /^\s*\#/ || /^\s+$/ || ! length($_) );
>	if( /^rs\#\s+alleles\s+chrom\s+pos\s+strand/ ) {
>	    @idheader = split $self->flag('field_delimiter');
>	} else { 
>	    push @cols, [split $self->flag('field_delimiter')];
>	}
>    }
>    #Post Dec 2004. Previously was 10
>    my $startingcol = 11;
>
>    $self->{'_header'} = [ map { $_->[0] } @cols];
>    for my $n ($startingcol.. $#{ $cols[ 0 ]}) { 
>	my $column = [ $idheader[$n],
>		       map{ $_->[ $n ] } @cols ];	
>	push (@rows, $column); 
>    }
>    $self->{'_pivot'} = [@rows];
>    $self->{'_i'} = 0;
>}
>---------------------------
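
One way to avoid hard-coding the starting column at all would be to
look it up in the header line. The sketch below is only an idea, not
the hapmap.pm code: both sub names are hypothetical, the header and
rows are made up and shortened, and the /^NA\d+$/ pattern for sample
IDs is an assumption based on the ID in the warning above (NA06993).

```perl
use strict;
use warnings;

# Return the index of the first header field that looks like a sample ID.
# The pattern is an assumption, not something taken from hapmap.pm.
sub find_starting_col {
    my (@header) = @_;
    for my $i ( 0 .. $#header ) {
        return $i if $header[$i] =~ /^NA\d+$/;
    }
    return;    # no sample column found
}

# Pivot rows of fields into one array per sample:
# [ sample_id, genotype_in_row_1, genotype_in_row_2, ... ]
sub pivot_genotypes {
    my ( $header, $rows, $start ) = @_;
    my @pivot;
    for my $n ( $start .. $#{$header} ) {
        push @pivot, [ $header->[$n], map { $_->[$n] } @$rows ];
    }
    return \@pivot;
}

# Made-up, shortened header and rows (real dumps have more fixed columns).
my @header = ( 'rs#', qw(alleles chrom pos strand QCcode NA06985 NA06993) );
my @rows   = (
    [qw(rs1 A/G chr1 100 + QC+ AA AG)],
    [qw(rs2 C/T chr1 200 + QC+ CT TT)],
);

my $start = find_starting_col(@header);
my $pivot = pivot_genotypes( \@header, \@rows, $start );
print join( "\t", @$_ ), "\n" for @$pivot;
```

With a lookup like this an extra fixed column in a newer dump format
would shift the starting column automatically, provided the sample IDs
keep a recognisable pattern.
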

From allenday at ucla.edu  Fri Mar 11 12:59:58 2005
From: allenday at ucla.edu (Allen Day)
Date: Fri Mar 11 12:54:47 2005
Subject: [Bioperl-l] strange error after changing to RC1.5
In-Reply-To: <42316D9B.801@biologie.uni-freiburg.de>
References: <422ECDDD.40404@biologie.uni-freiburg.de>
	<Pine.LNX.4.58.0503101755580.8204@sumo.ctrl.ucla.edu>
	<42316D9B.801@biologie.uni-freiburg.de>
Message-ID: <Pine.LNX.4.58.0503110955460.3609@sumo.ctrl.ucla.edu>

> What are these overloaded methods for?
> Who/What is calling ->ne()?

The SeqFeatureI class has been made, under the hood, to use 
Bio::AnnotationCollection for storing annotations.  
Bio::AnnotationCollection holds Bio::AnnotationI objects, not the simple 
strings that were held in older SeqFeatureI implementing classes.  There 
is still a lot of code in bioperl (and bioperl-db I take it) that wants to 
treat the annotations as strings, so we add overloading to allow this to 
happen.  No one is calling the eq() method directly; it gets triggered when 
someone does something like this:

  if ( $obj1->dbxref eq $obj2->dbxref ) { }

It sounds like this might not be what is causing the problem for you, but 
I thought you should be aware as you debug.
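
As an aside, Perl's overload pragma also has a fallback option that
lets the string comparison operators be derived from '""' alone. This
is not what the Bio::Annotation classes in this thread do; the sketch
below uses a hypothetical My::Value class just to show the mechanism.

```perl
use strict;
use warnings;

# Hypothetical annotation-like value object; not a BioPerl class.
package My::Value;
use overload
    '""'     => sub { $_[0]->{value} },
    fallback => 1;    # let Perl fall back to plain string operators
sub new { my ( $class, $value ) = @_; return bless { value => $value }, $class }

package main;

my $x = My::Value->new('taxon:4577');
my $y = My::Value->new('taxon:4577');
my $z = My::Value->new('taxon:9606');

# None of eq, ne or cmp is overloaded; all work on the stringified values.
my $same      = ( $x eq $y )           ? 1 : 0;
my $different = ( $x ne $z )           ? 1 : 0;
my $vs_string = ( $x eq 'taxon:4577' ) ? 1 : 0;
my @sorted    = sort { $a cmp $b } ( $z, $x );

print "same=$same different=$different vs_string=$vs_string\n";
print "sorted: @sorted\n";
```

With fallback set, code that treats the objects as strings keeps
working without listing every comparison operator by hand.
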

-Allen
From hlapp at gmx.net  Fri Mar 11 13:11:24 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Fri Mar 11 13:07:37 2005
Subject: [Bioperl-l] strange error after changing to RC1.5
In-Reply-To: <Pine.LNX.4.58.0503101755580.8204@sumo.ctrl.ucla.edu>
Message-ID: <FAE79F74-9258-11D9-8881-000A959EB4C4@gmx.net>

I suggest that all the fancy overloading be removed from core bioperl 
modules. If we need overloading for stringification or comparison 
operators in one of our core modules I think we are making a mistake.

This is part of the huge mess introduced when the SeqFeatureI 
architecture was carelessly changed days before release. It's a 
prototypical example of what not to do in a project that's as widely 
used as bioperl.

*Every single bit* of those changes needs to be rolled back from the 
release and if nobody else has done it by then I will do so in two 
weeks.

	-hilmar

On Thursday, March 10, 2005, at 05:57  PM, Allen Day wrote:

> I'm unable to test the code in PersistentObject.pm as I don't have 
> biosql
> set up, but you might try adding this to Reference.pm
>
>   use overload 'ne' => sub { "$_[0]" ne "$_[1]" }
>
> Please let me know if this fixes your error and I'll add this 'ne'
> overload to all the Bio::Annotation::* classes on HEAD.
>
> -Allen
>
>
> On Wed, 9 Mar 2005, Daniel Lang wrote:
>
>> Hi,
>> I'm retrieving seq objects from a local biosql db (using the latest 
>> cvs version of bioperl-db) and e.g. writing them with SeqIO. After
>> changing from a cvs version ~ 12/04 to RC1.5 or latest cvs version, I
>> get the following error:
>>
>> Operation `ne': no method found, left argument in overloaded package
>> Bio::Annotation::Reference, right argument has no overloaded magic at
>> /usr/lib/perl5/site_perl/5.6.1/Bio/DB/Persistent/PersistentObject.pm
>> line 534, <GEN1> line 1.
>>
>> The module PersistentObject.pm hasn't changed and in Reference.pm 
>> there is only this change:
>>
>> diff bioperl-live-Dec04/Bio/Annotation/Reference.pm
>> bioperl-live/Bio/Annotation/Reference.pm
>> 1c1
>> < # $Id: Reference.pm,v 1.21 2004/08/19 20:13:32 lapp Exp $
>> ---
>> > # $Id: Reference.pm,v 1.22 2005/02/02 22:13:22 allenday Exp $
>> 56c56,57
>> < # use overload '""' => \&as_text;
>> ---
>> > use overload '""' => sub { $_[0]->title || ''};
>> > use overload 'eq' => sub { "$_[0]" eq "$_[1]" };
>>
>> I've reversed this, but no positive result - the error remains...
>> Any hints?
>>
>> Thanks in advance,
>> Daniel
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


From hlapp at gmx.net  Fri Mar 11 13:12:36 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Fri Mar 11 13:07:52 2005
Subject: [Bioperl-l] Living on the edg with 1.5?
In-Reply-To: <4230CB95.2000701@purdue.edu>
Message-ID: <25B516B6-9259-11D9-8881-000A959EB4C4@gmx.net>

Don't use 1.5.0 for now. Its SeqFeatureI is broken and is not, nor ever 
was, ready for release. Sorry about this glitch.

	-hilmar

On Thursday, March 10, 2005, at 02:35  PM, Phillip San Miguel wrote:

>
>    Because GBROWSE wants bioperl v 1.5 I've moved to that on most of 
> the platforms I use. But I've noticed two (presumably) unrelated 
> glitches just in using it today. Should I really be using v 1.4? I'm 
> not a developer, and don't really welcome additional headaches. Advice?
>
>    Here are the bugs:
>
> perl -e 'use Bio::Perl; $seq_object = 
> get_sequence("genbank","u11059"); 
> write_sequence(">test","genbank",$seq_object);'
>
>
> fetches U11059 from genbank and tries to print in genbank format but 
> something is wrong in v 1.5.
>
> Running Version 1.5:
>
> ...
> FEATURES             Location/Qualifiers
>     source          1..7313
>                     /mol_type="Bio::Annotation::SimpleValue=HASH(0x9b3870)"
>                     /tissue_type="Bio::Annotation::SimpleValue=HASH(0x9b38b8)"
>                     /db_xref="Bio::Annotation::SimpleValue=HASH(0x9b3828)"
>                     /transposon="Bio::Annotation::SimpleValue=HASH(0x9b0968)"
>                     /strain="Bio::Annotation::SimpleValue=HASH(0x9b3900)"
>                     /chromosome="Bio::Annotation::SimpleValue=HASH(0x9b3948)"
>                     /organism="Bio::Annotation::SimpleValue=HASH(0x9b3990)"
>     LTR             1..649
>                     /label=Bio::Annotation::SimpleValue=HASH(0x9aefd4)
>     TATA_signal     304..310
>                     /label=Bio::Annotation::SimpleValue=HASH(0x9b55fc)
>     misc_feature    651..659
>                     /label=Bio::Annotation::SimpleValue=HASH(0x9b5764)
> ...
>
> Running Version 1.4 (no problem):
>
> ...
> FEATURES             Location/Qualifiers
>     source          1..7313
>                     /transposon="retrotransposon"
>                     /mol_type="genomic DNA"
>                     /db_xref="taxon:4577"
>                     /tissue_type="leaf"
>                     /strain="A188"
>                     /chromosome="7"
>                     /organism="Zea mays"
>     LTR             1..649
>                     /label=upstreamLTR
>     TATA_signal     304..310
>                     /label=upstream
>     misc_feature    651..659
>                     /label=PBSsite
> ...
>
> The other 1.5 bug I found today: The following one-liner demonstrates 
> it:
>
> perl -e 'use Bio::SeqIO;use Bio::Seq::PrimaryQual; $qual_object = 
> Bio::Seq::PrimaryQual->new(-qual=> "10 20 30 40 50 40 30 20 10", -id 
> => "test", -format => 'qual'); $qual_out = Bio::SeqIO->new(-file => 
> ">test", -format => 'qual');$qual_out->write_seq($qual_object);'
>
> When I run it under Version 1.5 the correct output file is produced 
> but I also get the following output sent to STDOUT:
>
> '_root_verbose' => 0
> 'display_id' => 'test'
> 'qual' => ARRAY(0x5b07e8)
>   0  10
>   1  20
>   2  30
>   3  40
>   4  50
>   5  40
>   6  30
>   7  20
>   8  10
>
> Under Version 1.4 everything is fine. (No extraneous STDOUT is 
> created.)
>
> This looks like someone uncommented a Data::Dumper print somewhere, 
> but I wasn't able to find it.
>
> -- 
> Phillip SanMiguel
> Purdue Genomics Core Facility
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


From hlapp at gmx.net  Fri Mar 11 13:17:23 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Fri Mar 11 13:11:55 2005
Subject: [Bioperl-l] Entrez Gene ASN
In-Reply-To: <42305650.30403@utk.edu>
Message-ID: <D0BFA07A-9259-11D9-8881-000A959EB4C4@gmx.net>

Gene shouldn't be fundamentally different from LocusLink, and LocusLink 
was represented as an annotated SeqI within bioperl.

If at all possible I'd still like it to remain that way for Gene in 
order to allow for a smooth transition from LL to Gene for code that's 
been using the former.

If you want to emphasize the fact that it's a container for sequences, 
then that sounds like a ClusterI to me, which can be richly annotated 
too.

Note also that NCBI is working on an ASN.1->XML converter. Personally, 
I'm inclined to wait for that converter to appear, but other priorities 
may prevail.

Let me know what you think.

	-hilmar

On Thursday, March 10, 2005, at 06:14  AM, Stefan Kirov wrote:

> Hi guys!
> I have done some (mostly) serious thinking about ASN Entrez Gene 
> parsing and I propose we do my favorite thing- postpone everything we 
> cannot deal with right now. If you want it to sound better: take a 
> gradual approach where we store the data we can deal with in the 
> existing Bioperl objects and skipping the rest for now.
> In details:
> ASN gene record can be correctly represented as a tree. I have written 
> a simple parser for my own purposes which is storing the following:
> node_id---|
>                  --parent
>                  --level
>                  --tag
>                  --values
> What I do then is get specific levels and tags and build different 
> objects. So level 2 with parent EntrezGene (which is the root level 
> and has no information) is gene description and has tags such as gene, 
> name, etc.; at levels 3, 5 and 6 you can get the complete species 
> definition by looking for orgname and org as tags and records with 
> parent mod (which is a value for orgname; descend down the branch).
> I am using this approach to store most of the data in a relational 
> database without going through Bioperl. What I ultimately want to do 
> is use standard Bioperl modules. However, I don't think we have an 
> object that can efficiently represent the structure (correct me if I 
> am wrong). I think it may be a good idea to have a container object, 
> possibly Bio::Gene that may contain multiple Bio::Seq objects (with or 
> without real sequence). I believe we can borrow some structure and 
> code from EnsEMBL gene representation (way to contain multiple 
> transcripts, etc., not the database interactions certainly).
> Please let me know what you think.
> Stefan
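
To make the node table described above concrete, here is a rough Perl
sketch of such a flat tree keyed by node id, with a lookup by level and
tag and a walk up to the root. The field names follow the description
(parent, level, tag, values); the tags, values and both sub names are
made up for illustration and are not taken from the actual parser.

```perl
use strict;
use warnings;

# Flat node table keyed by node id. All tags and values are made up.
my %nodes = (
    1 => { parent => undef, level => 1, tag => 'EntrezGene', values => [] },
    2 => { parent => 1, level => 2, tag => 'gene',   values => ['geneX'] },
    3 => { parent => 1, level => 2, tag => 'name',   values => ['example'] },
    4 => { parent => 1, level => 2, tag => 'source', values => [] },
    5 => { parent => 4, level => 3, tag => 'org',    values => ['Homo sapiens'] },
);

# Select node ids by level and tag, optionally restricted to one parent.
sub find_nodes {
    my ( $nodes, %want ) = @_;
    return sort { $a <=> $b } grep {
        my $n = $nodes->{$_};
        $n->{level} == $want{level}
            && $n->{tag} eq $want{tag}
            && ( !defined $want{parent}
            || ( defined $n->{parent} && $n->{parent} == $want{parent} ) );
    } keys %$nodes;
}

# Walk from a node up to the root, collecting the tags on the way.
sub path_to_root {
    my ( $nodes, $id ) = @_;
    my @path;
    while ( defined $id ) {
        unshift @path, $nodes->{$id}{tag};
        $id = $nodes->{$id}{parent};
    }
    return @path;
}

my ($org_id) = find_nodes( \%nodes, level => 3, tag => 'org' );
my $path = join '/', path_to_root( \%nodes, $org_id );
print "$path: @{ $nodes{$org_id}{values} }\n";
```

Objects for a gene, a species and so on could then be assembled from
the selected nodes, leaving anything not yet understood in the table.
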
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


From hlapp at gmx.net  Fri Mar 11 13:19:44 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Fri Mar 11 13:14:27 2005
Subject: [Bioperl-l] uniprot flatfile extraction
In-Reply-To: <1110378245.422f07057f49c@sms.ed.ac.uk>
Message-ID: <24B0CE20-925A-11D9-8881-000A959EB4C4@gmx.net>

Basically everything that's in the UniProt file should be found on the 
RichSeqI object returned from the parser (Bio::SeqIO::swiss). If it's 
in the feature table you'll find it as annotation (tag/value) of the 
features held by the seq object ($seq->get_SeqFeatures). Other stuff 
like dbxrefs are in the annotation bundle ($seq->annotation).
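
As a rough sketch of what that looks like in practice (untested
against a current UniProt release): the sub name and output layout are
illustrative, and the 'DISULFID' feature key and the 'dblink' and
'comment' annotation keys are what I recall the swiss parser using, so
please check them against your BioPerl version. The script needs
BioPerl installed and a flat file path as its argument.

```perl
use strict;
use warnings;

# Print a few fields from each entry of a Swiss-Prot/UniProt flat file.
sub summarize_swiss_file {
    my ($path) = @_;
    require Bio::SeqIO;    # loaded lazily; needs BioPerl installed

    my $in = Bio::SeqIO->new( -file => $path, -format => 'swiss' );
    while ( my $seq = $in->next_seq ) {
        printf "%s\t%d aa\t%s\n",
            $seq->display_id, $seq->length, $seq->desc // '';

        # Feature table: features carry tag/value annotation.
        for my $feat ( $seq->get_SeqFeatures ) {
            next unless $feat->primary_tag eq 'DISULFID';    # assumed key
            printf "  disulfide bond %d..%d\n", $feat->start, $feat->end;
        }

        # Annotation bundle: cross references and comments.
        my $ann = $seq->annotation;
        for my $link ( $ann->get_Annotations('dblink') ) {
            printf "  xref %s:%s\n", $link->database, $link->primary_id;
        }
        for my $comment ( $ann->get_Annotations('comment') ) {
            printf "  comment %s\n", $comment->text;
        }
    }
    return;
}

# Run only when a file name is given on the command line.
summarize_swiss_file( $ARGV[0] ) if @ARGV;
```

The remaining fields from the list below (dates, species, synonyms) are
reached the same way, through the seq object or its annotation bundle.
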

	-hilmar

On Wednesday, March 9, 2005, at 06:24  AM, SG Edwards wrote:

> Hi, sorry if this is basic but I've read the documentation and am still
> confused!!
>
> I wish to extract uniprot flatfile data into my database. I want to 
> get the
> following variables:
>
> Protein ID, length, description, molecular weight, sequence, comments, 
> cross
> references, disulphide bonds, species, entered date, last modified, 
> last
> annotated, protein synonyms.
>
> I know that I can get some of these (e.g. protein ID, length) using 
> Bioperl but
> can I get all of the data also or am I better writing my own from 
> scratch?
>
> Thanks
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


From hlapp at gmx.net  Fri Mar 11 13:22:38 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Fri Mar 11 13:19:13 2005
Subject: [Bioperl-l] strange error after changing to RC1.5
In-Reply-To: <422ECDDD.40404@biologie.uni-freiburg.de>
Message-ID: <8C8C7E3A-925A-11D9-8881-000A959EB4C4@gmx.net>

Try removing the overload altogether.

Or, possibly better yet, don't use 1.5. I'm saying this because it 
should be the responsibility of the one who created the mess and didn't 
test it to clean it up and test rigorously, and not the responsibility of 
the community out there.

	-hilmar

On Wednesday, March 9, 2005, at 02:20  AM, Daniel Lang wrote:

> Hi,
> I'm retrieving seq objects from a local biosql db (using the latest 
> cvs version of bioperl-db) and e.g. writing them with SeqIO. After 
> changing from a cvs version ~ 12/04 to RC1.5 or latest cvs version, I 
> get the following error:
>
> Operation `ne': no method found, left argument in overloaded package 
> Bio::Annotation::Reference, right argument has no overloaded magic at 
> /usr/lib/perl5/site_perl/5.6.1/Bio/DB/Persistent/PersistentObject.pm 
> line 534, <GEN1> line 1.
>
> The module PersistentObject.pm hasn't changed and in Reference.pm 
> there is only this change:
>
> diff bioperl-live-Dec04/Bio/Annotation/Reference.pm 
> bioperl-live/Bio/Annotation/Reference.pm
> 1c1
> < # $Id: Reference.pm,v 1.21 2004/08/19 20:13:32 lapp Exp $
> ---
> > # $Id: Reference.pm,v 1.22 2005/02/02 22:13:22 allenday Exp $
> 56c56,57
> < # use overload '""' => \&as_text;
> ---
> > use overload '""' => sub { $_[0]->title || ''};
> > use overload 'eq' => sub { "$_[0]" eq "$_[1]" };
>
> I've reversed this, but no positive result - the error remains...
> Any hints?
>
> Thanks in advance,
> Daniel
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


From skirov at utk.edu  Fri Mar 11 13:36:29 2005
From: skirov at utk.edu (Stefan Kirov)
Date: Fri Mar 11 13:31:41 2005
Subject: [Bioperl-l] Entrez Gene ASN
In-Reply-To: <20050310212823.GB5392@anna>
References: <42305650.30403@utk.edu> <20050310212823.GB5392@anna>
Message-ID: <4231E52D.5030908@utk.edu>


>
>Hi Stefan,
>
>from the work I have done on this issue it would seem that your suggestion is quite promising. Let me know if you need some help on this.
>
I will definitely need some as right now I am just extracting data that 
I need for my own project. Therefore some help would be nice with respect 
to the boring task of deciding where some data should go and how exactly 
to capture it.

> How is the performance that you are seeing to date?
>
>  
>
Hard to tell as I am not parsing everything. Rough estimate is a few 
seconds and I doubt it will grow significantly.

>best,
>Peter
>
>  
>
Stefan
From skirov at utk.edu  Fri Mar 11 14:02:16 2005
From: skirov at utk.edu (Stefan Kirov)
Date: Fri Mar 11 13:56:53 2005
Subject: [Bioperl-l] Entrez Gene ASN
In-Reply-To: <D0BFA07A-9259-11D9-8881-000A959EB4C4@gmx.net>
References: <D0BFA07A-9259-11D9-8881-000A959EB4C4@gmx.net>
Message-ID: <4231EB38.8040809@utk.edu>


Hilmar Lapp wrote:

> Gene shouldn't be fundamentally different from LocusLink, and 
> LocusLink was represented as an annotated SeqI within bioperl.

It is not, you are right.

>
> If at all possible I'd still like it to remain that way for Gene in 
> order to allow for a smooth transition from LL to Gene for code that's 
> been using the former.
>
hmmmm, backward compatibility is a good thing, but sometimes it may be hard to 
achieve.

> If you want to emphasize the fact that it's a container for sequences, 
> then that sounds like a ClusterI to me, which can be richly annotated 
> too.

Let me disagree here. Cluster is designed for independent sequences, 
whereas Gene should deal with sequences that have a hierarchical 
relationship among themselves. This is one of the issues I think the Seq 
object is not designed to deal with.  What we need is:
genome (Bio::Seq)
   |--transcript (Bio::Seq)
   |     |--protein (Bio::Seq)
   |--transcript (Bio::Seq)
         |--protein (Bio::Seq)
etc. As an alternative one can store the relationships in a separate 
ontology object, but I don't think this is really effective.
As many genome and transcript entries exist, it will be easy to lose 
the relations.

Another significant concern I have is that if we store everything as 
SeqFeature the overhead may become huge (some records have hundreds 
of different features) and any user of the parser will have to do quite 
a bit of data mining to find the relevant feature. One approach would be to 
add more Bio::Annotation:: objects (for example Bio::Annotation::STS, 
Bio::Annotation::GRIF, etc). And one last thing: orthology (which again 
could be based on ontology) and synteny are things that should be in the 
Gene (or LocusLink) object.
We may decide to create a simplified (Bio::Seq, no relationships) or 
more complex object (Gene), based on the user request.
I hope this does not sound too confusing as I am buried in the Gene 
ASN structure and I am quickly approaching quiet madness.

>
> Note also that NCBI is working on an ASN.1->XML converter. Personally, 
> I'm inclined to wait for that converter to appear, but other 
> priorities may prevail.
>
I have waited for a while. If they cannot parse their own data...? 
Anyway, some issues will still be there even if we have the XML.
Stefan

> Let me know what you think.
>
>     -hilmar
>
> On Thursday, March 10, 2005, at 06:14  AM, Stefan Kirov wrote:
>
>> Hi guys!
>> I have done some (mostly) serious thinking about ASN Entrez Gene 
>> parsing and I propose we do my favorite thing- postpone everything we 
>> cannot deal with right now. If you want it to sound better: take a 
>> gradual approach where we store the data we can deal with in the 
>> existing Bioperl objects and skipping the rest for now.
>> In details:
>> ASN gene record can be correctly represented as a tree. I have 
>> written a simple parser for my own purposes which is storing the 
>> following:
>> node_id---|
>>                  --parent
>>                  --level
>>                  --tag
>>                  --values
>> What I do then is get specific levels and tags and build different 
>> objects. So level 2 with parent EntrezGene (which is the root level 
>> and has no information) is gene description and has tags such as 
>> gene, name, etc; at level 3, 5 and 6 you can get the complete species 
>> definition by looking for orgname and org as tags and records with 
>> parent mod (which is a value for orgname, descend down the branch).
>> I am using this approach to store most of the data in a relational 
>> database without going through Bioperl. What I ultimately want to do 
>> is use standard Bioperl modules. However, I don't think we have an 
>> object that can efficiently represent the structure (correct me if I 
>> am wrong). I think it may be a good idea to have a container object, 
>> possibly Bio::Gene that may contain multiple Bio::Seq objects (with 
>> or without real sequence). I believe we can borrow some structure and 
>> code from EnsEMBL gene representation (way to contain multiple 
>> transcripts, etc., not the database interactions certainly).
>> Please let me know what you think.
>> Stefan
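To make the node table from the quoted proposal a bit more concrete, here is a rough sketch of how such a flat structure (node id mapped to parent, level, tag and values) could be held and queried in plain Perl. The tags and values are invented for illustration and do not come from a real Entrez Gene record; find_nodes and tag_path are hypothetical helper names, not existing Bioperl functions.

```perl
use strict;
use warnings;

# Hypothetical node table: node_id => { parent, level, tag, values }.
# All tags and values below are made up for the example.
my %nodes = (
    1 => { parent => undef, level => 1, tag => 'EntrezGene', values => [] },
    2 => { parent => 1,     level => 2, tag => 'gene',       values => [] },
    3 => { parent => 2,     level => 3, tag => 'locus',      values => ['ABC1'] },
    4 => { parent => 2,     level => 3, tag => 'desc',       values => ['example gene'] },
    5 => { parent => 1,     level => 2, tag => 'source',     values => [] },
    6 => { parent => 5,     level => 3, tag => 'orgname',    values => ['Homo sapiens'] },
);

# Return the ids of all nodes at a given level that carry a given tag.
sub find_nodes {
    my ($nodes, $level, $tag) = @_;
    my @ids = sort { $a <=> $b }
              grep { $nodes->{$_}{level} == $level && $nodes->{$_}{tag} eq $tag }
              keys %$nodes;
    return @ids;
}

# Walk from a node up to the root and return the tag path, root first.
sub tag_path {
    my ($nodes, $id) = @_;
    my @path;
    while (defined $id) {
        unshift @path, $nodes->{$id}{tag};
        $id = $nodes->{$id}{parent};
    }
    return join '/', @path;
}

my @org_ids = find_nodes(\%nodes, 3, 'orgname');
my $path    = tag_path(\%nodes, $org_ids[0]);
print "$path = @{ $nodes{ $org_ids[0] }{values} }\n";
```

Selecting by level and tag keeps the parser itself ignorant of the record's meaning; the parent link is what lets a later step decide which Bioperl object a value belongs to.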

-- 
Stefan Kirov, Ph.D.
University of Tennessee/Oak Ridge National Laboratory
5700 bldg, PO BOX 2008 MS6164
Oak Ridge TN 37831-6164
USA
tel +865 576 5120
fax +865-576-5332
e-mail: skirov@utk.edu
sao@ornl.gov

"And the wars go on with brainwashed pride
For the love of God and our human rights
And all these things are swept aside"

From allenday at ucla.edu  Fri Mar 11 14:36:11 2005
From: allenday at ucla.edu (Allen Day)
Date: Fri Mar 11 14:32:47 2005
Subject: [Bioperl-l] strange error after changing to RC1.5
In-Reply-To: <FAE79F74-9258-11D9-8881-000A959EB4C4@gmx.net>
References: <FAE79F74-9258-11D9-8881-000A959EB4C4@gmx.net>
Message-ID: <Pine.LNX.4.58.0503111126240.31639@sumo.ctrl.ucla.edu>

On Fri, 11 Mar 2005, Hilmar Lapp wrote:

> I suggest that all the fancy overloading is removed from core bioperl
> modules. If we need overloading for stringification or comparison
> operators in one of our core modules I think we are making a mistake.

The overloading is only there because assumptions have been made that
annotations will be strings.  This assumption was okay previously because
the Bio::Annotation* modules were previously "non core" -- there was no
unified annotation system in bioperl.  Now these modules are being made
core, and this is part of the growing pain.

I'm doing what I can to address the bug reports related to these changes
as they come in, and I don't think anyone will disagree that I'm doing so
in a timely manner.  However, I cannot fix bugs or field questions on
biosql modules and would appreciate some cooperation/assistance from the
biosql developers.

> This is part of the huge mess introduced when the SeqFeatureI
> architecture was carelessly changed days before release. It's a
> prototypical example for what not to do in a project that's as widely
> used as bioperl.

The SeqFeatureI changes were being gradually made in the 1-2 months prior
to the 1.5 release.  The release was, may I remind you, a *developer*
release and not expected to be bug free.

> *Every single bit* of those changes need to be rolled back from the
> release and if nobody else has done it by then I will do so in two
> weeks.

Fine for the 1.5.1 branch, although I don't agree that this should be done
on the main trunk.

-Allen


> 	-hilmar
> 
> On Thursday, March 10, 2005, at 05:57  PM, Allen Day wrote:
> 
> > I'm unable to test the code in PersistentObject.pm as I don't have 
> > biosql
> > set up, but you might try adding this to Reference.pm
> >
> >   use overload 'ne' => sub { "$_[0]" ne "$_[1]" }
> >
> > Please let me know if this fixes your error and I'll add this 'ne'
> > overload to all the Bio::Annotation::* classes on HEAD.
> >
> > -Allen
> >
> >
> > On Wed, 9 Mar 2005, Daniel Lang wrote:
> >
> >> Hi,
> >> I'm retrieving seq objects from a local biosql db (using the latest 
> >> cvs
> >> version of bioperl-db) and e.g. writing them with SeqIO. After changing
> >> from a cvs version ~ 12/04 to RC1.5 or latest cvs version, I get the
> >> following error:
> >>
> >> Operation `ne': no method found, left argument in overloaded package
> >> Bio::Annotation::Reference, right argument has no overloaded magic at
> >> /usr/lib/perl5/site_perl/5.6.1/Bio/DB/Persistent/PersistentObject.pm
> >> line 534, <GEN1> line 1.
> >>
> >> The module PersistentObject.pm hasn't changed and in Reference.pm 
> >> there
> >> is only this change:
> >>
> >> diff bioperl-live-Dec04/Bio/Annotation/Reference.pm
> >> bioperl-live/Bio/Annotation/Reference.pm
> >> 1c1
> >> < # $Id: Reference.pm,v 1.21 2004/08/19 20:13:32 lapp Exp $
> >> ---
> >>> # $Id: Reference.pm,v 1.22 2005/02/02 22:13:22 allenday Exp $
> >> 56c56,57
> >> < # use overload '""' => \&as_text;
> >> ---
> >>> use overload '""' => sub { $_[0]->title || ''};
> >>> use overload 'eq' => sub { "$_[0]" eq "$_[1]" };
> >>
> >> I've reversed this, but no positive result - the error remains...
> >> Any hints?
> >>
> >> Thanks in advance,
> >> Daniel
> >>
> >>
> >>
From sutripa at vbi.vt.edu  Sat Mar 12 08:35:42 2005
From: sutripa at vbi.vt.edu (Sucheta Tripathy)
Date: Sat Mar 12 08:30:33 2005
Subject: [Bioperl-l] drawing additional axes with GD::Graph
Message-ID: <2598.199.3.136.4.1110634542.squirrel@webmail.vbi.vt.edu>


Dear group,

I am trying to add 2 more horizontal lines to the plot I have, using
GD::Graph. For example, with the X,Y origin at 0,0, additional horizontal
lines at y values .78 and .84. Please suggest something other than
$im->line().

Thanks

Sucheta


-- 
Sucheta Tripathy
Virginia Bioinformatics Institute Phase-I
Washington street.
Virginia Tech.
Blacksburg,VA 24061-0447
phone:(540)231-8138
Fax:  (540) 231-2606
From Mingyi.Liu at gpc-biotech.com  Sat Mar 12 10:43:44 2005
From: Mingyi.Liu at gpc-biotech.com (Liu, Mingyi)
Date: Sat Mar 12 10:37:40 2005
Subject: [Bioperl-l] Entrez Gene ASN parsers
Message-ID: <15C0817A76D1B74C8E3EEA0FADE464A4CBAC2D@sw-wal-beta.gpc-biotech.com>

Hello,

I have just released a project on sourceforge that contains 4 different parsers for Entrez Gene ASN files based on regex, Parse::RecDescent, Parse::Yapp, and Perl-byacc.  They differ in performance and the regex-based parser is the best performer, processing over 13000 records a minute on average (It finishes the 900+ MB human annotation file in 11 minutes on one Intel Xeon 2.4 GHz CPU).  The other parsers are at least a few fold slower but I included them since it'd be of interest to people learning to use those tools or choosing among the tools for a practical project.  All parsers are short OO-modules (<100 lines if not counting POD/YACC-generated code), so they are easy to use and understand.

Right now my parsers do not assemble data into Bioperl objects (because for my project I only needed to put them into a proprietary XML format, which is not released (not that it's anything special, just IP issues.  Without IP issues, I could've released the parser code in Feb.)).  They behave like XML-parsers, namely, they parse entrez gene records and assemble content into data structures only.  But I hope it could serve as a base on which Bioperl objects can be built (the data structure is easy to use).  Please feel free to use the code for any Bioperl or other projects as I released it under the GPL (thanks to my company and a collaborating company's consent).

Please also feel free to contact me if you have any suggestion or bug report.

The URL for the sourceforge project is http://sourceforge.net/projects/egparser/

Thanks,

Mingyi

Dr. Mingyi Liu
Computational Biologist
GPC Biotech Inc.
610 Lincoln St.
Waltham, MA 02451
USA

From skirov at utk.edu  Sat Mar 12 17:59:16 2005
From: skirov at utk.edu (Stefan Kirov)
Date: Sat Mar 12 17:54:04 2005
Subject: [Bioperl-l] Entrez Gene ASN parsers
In-Reply-To: <15C0817A76D1B74C8E3EEA0FADE464A4CBAC2D@sw-wal-beta.gpc-biotech.com>
References: <15C0817A76D1B74C8E3EEA0FADE464A4CBAC2D@sw-wal-beta.gpc-biotech.com>
Message-ID: <42337444.1050102@utk.edu>

Mingyi,
I looked at the code (EntrezGene) and so far it seems to me it gives, as 
you claim, a pretty accurate and easy to understand data structure (a few 
dead entries and some 0 size arrays, but nothing major).
The only concern I have is the data structure. If you want to 
achieve a better structure (non-redundant, two level where possible or a 
collection of Bioperl objects) this will slow things down. I guess I 
will compare how the code I wrote compares to yours and choose the 
faster one. I think this makes sense.
Stefan

Liu, Mingyi wrote:

>Hello,
>
>I have just released a project on sourceforge that contains 4 different parsers for Entrez Gene ASN files based on regex, Parse::RecDescent, Parse::Yapp, and Perl-byacc.  They differ in performance and the regex-based parser is the best performer, processing over 13000 records a minute on average (It finishes the 900+ MB human annotation file in 11 minutes on one Intel Xeon 2.4 GHz CPU).  The other parsers are at least a few fold slower but I included them since it'd be of interest to people learning to use those tools or choosing among the tools for a practical project.  All parsers are short OO-modules (<100 lines if not counting POD/YACC-generated code), so they are easy to use and understand.
>
>Right now my parsers do not assemble data into Bioperl objects (because for my project I only needed to put them into a proprietary XML format, which is not released (not that it's anything special, just IP issues.  Without IP issues, I could've released the parser code in Feb.)).  They behave like XML-parsers, namely, they parse entrez gene records and assemble content into data structures only.  But I hope it could serve as a base that Bioperl objects can be built (the data structure is easy to use).  Please feel free to use the code for any Bioperl or other projects as I released them under GPL (thanks to my company and a collaborating company's consent).
>
>Please also feel free to contact me if you have any suggestion or bug report.
>
>The URL for the sourceforge project is http://sourceforge.net/projects/egparser/
>
>Thanks,
>
>Mingyi
>
>Dr. Mingyi Liu
>Computational Biologist
>GPC Biotech Inc.
>610 Lincoln St.
>Waltham, MA 02451
>USA
>

-- 
Stefan Kirov, Ph.D.
University of Tennessee/Oak Ridge National Laboratory
5700 bldg, PO BOX 2008 MS6164
Oak Ridge TN 37831-6164
USA
tel +865 576 5120
fax +865-576-5332
e-mail: skirov@utk.edu
sao@ornl.gov

"And the wars go on with brainwashed pride
For the love of God and our human rights
And all these things are swept aside"

From Mingyi.Liu at gpc-biotech.com  Sat Mar 12 18:50:41 2005
From: Mingyi.Liu at gpc-biotech.com (Liu, Mingyi)
Date: Sat Mar 12 18:44:33 2005
Subject: [Bioperl-l] Entrez Gene ASN parsers
Message-ID: <15C0817A76D1B74C8E3EEA0FADE464A4CBAC2E@sw-wal-beta.gpc-biotech.com>

Hi, Stefan,

Yes, the advantage and disadvantage of my approach are that my parsers do not take the underlying data into account.  By totally ignoring the data content and focusing just on format, this approach ensured that no data will be left behind in parsing and that the development of the parsers would be very fast, and the parsers perform very well.  In addition, even if NCBI changes the data content, the parser will most likely work just fine without any modifications.

However, this does result in a data structure that is not consolidated into, for example, the two level type you'd want.  The data structure generated merely reflects however NCBI chose to structure their Entrez Gene ASN files.  Building Bioperl objects based on my parser would take some serious effort (1-2 weeks).  It is definitely doable though, and the performance should not slow down much.  The benchmark I gave included not just the time for parsing and data structure construction, but also data structure trimming, which traverses almost the entire data structure and makes changes.  But the instantiation of Bioperl objects may make the whole process slow down a few fold.

Regardless, I totally agree that it's the best if you could do a comparison and choose the most suitable approach.

BTW, can you send me example entries for which there are dead entries or 0-sized array in my parser?  I wonder if it's a problem of Entrez Gene file or my parser, since I simply let the data structure mirror the file.  But if it isn't, then I would want to check if it's a bug.  I did process the full human genome into XML files and did not see any empty elements or attributes, and the parser runs on entire mouse and rat genomes without problem, which is expected.

Thanks,

Mingyi

> -----Original Message-----
> From: Stefan Kirov [mailto:skirov@utk.edu]
> Sent: Saturday, March 12, 2005 5:59 PM
> To: Liu, Mingyi
> Cc: bioperl-l@portal.open-bio.org
> Subject: Re: [Bioperl-l] Entrez Gene ASN parsers
> 
> 
> Mingyi,
> I looked at the code (EntrezGene) and so far it seems to me 
> it gives as 
> you claim pretty accurate and easy to understand data structure (few 
> dead entries and some 0 size array, but nothing major).
> The only concern I have is that the data structure. If you want to 
> achieve a better structure (non-redundant, two level where 
> possible or a 
> collection of Bioperl objects) this will slow things down. I guess I 
> will compare how the code I wrote compares to yours and choose the 
> faster one. I think this makes sense.
> Stefan
> 

From hlapp at gmx.net  Sat Mar 12 19:33:24 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat Mar 12 19:28:07 2005
Subject: [Bioperl-l] Entrez Gene ASN parsers
In-Reply-To: <15C0817A76D1B74C8E3EEA0FADE464A4CBAC2E@sw-wal-beta.gpc-biotech.com>
Message-ID: <82B4FA2E-9357-11D9-B647-000A959EB4C4@gmx.net>

I kind of like this approach, i.e., have a general purpose low-level 
parser that you are reasonably confident will never be the 
bottleneck, and then build a bioperl parser on top of it that now can 
focus its code on assembling the desired data structure as opposed to 
the file format itself.

And of course assembling that data structure will slow things down a 
lot but hey, either you want an object hierarchy in (bio-)perl or you 
don't.
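
As a rough illustration of the layering described above, here is a sketch with a generic low-level parser and a separate assembly step on top of it. The input format is a deliberately simplified brace-nested toy grammar, not NCBI's actual ASN.1, and parse_block and build_gene are hypothetical names; neither is taken from Mingyi's parsers or from bioperl.

```perl
use strict;
use warnings;

# Layer 1: generic parser for a toy format. Each entry is "tag value" or
# "tag { ... }", entries are separated by commas. It knows nothing about
# genes; it only mirrors the nesting of the input in nested hashes.
sub parse_block {
    my ($text_ref) = @_;
    my %node;
    while (1) {
        $$text_ref =~ /\G[\s,]+/gc;                    # skip whitespace and commas
        last if $$text_ref =~ /\G\}/gc;                # end of the current block
        last unless $$text_ref =~ /\G([\w-]+)\s*/gc;   # tag name, else end of input
        my $tag = $1;
        if ($$text_ref =~ /\G\{/gc) {
            $node{$tag} = parse_block($text_ref);      # nested block
        }
        elsif ($$text_ref =~ /\G"([^"]*)"/gc) {
            $node{$tag} = $1;                          # quoted string value
        }
        elsif ($$text_ref =~ /\G([^\s,}]+)/gc) {
            $node{$tag} = $1;                          # bare word or number
        }
    }
    return \%node;
}

# Layer 2: assemble a domain-level record from the generic structure.
# Only this layer needs to change if the desired object model changes.
sub build_gene {
    my ($raw) = @_;
    my $gene = $raw->{gene} || {};
    my %record = (
        id       => $gene->{id},
        symbol   => $gene->{symbol},
        organism => $gene->{organism}{name},
    );
    return \%record;
}

# Invented input; the values are placeholders, not real annotation.
my $text = 'gene { id 1, symbol "ABC1", organism { name "Homo sapiens", taxon 9606 } }';
my $raw  = parse_block(\$text);
my $gene = build_gene($raw);
print "$gene->{symbol} ($gene->{organism}), id $gene->{id}\n";
```

The point of the split is that different top-level builders can share the same low-level pass, which fits the case where people disagree about the target object model.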

Also, given the thread and previous ones, that ominous bioperl data 
structure may be very fluid initially, or even result in different 
top-level parsers depending on how compatible the different visions are 
for what to get out of that parser.

	-hilmar

On Saturday, March 12, 2005, at 03:50  PM, Liu, Mingyi wrote:

> Hi, Stefan,
>
> Yes, the advantage and disadvantage of my approach are that my parsers 
> do not take the underlying data into account.  By totally ignoring the 
> data content and focusing just on format, this approach ensured that 
> no data will be left behind in parsing and that the development of the 
> parsers would be very fast, and the parsers perform very well.  In 
> addition, even if NCBI changes the data content, the parser will most 
> likely work just fine without any modifications.
>
> However, this does result in a data structure that is not consolidated 
> into, for example, the two level type you'd want.  The data structure 
> generated merely reflects however NCBI chose to structure their Entrez 
> Gene ASN files.  Building Bioperl objects based on my parser would 
> take some serious efforts (1-2 weeks).  It is definitely doable 
> though, and the performance should not slow down much.  The benchmark 
> I gave included not just the time for parsing and data structure 
> construction, but also data structure trimming, which traverses almost 
> the entire data structure and make changes.  But the initiation of 
> Bioperl objects may make the whole process slow down a few fold.
>
> Regardless, I totally agree that it's the best if you could do a 
> comparison and choose the most suitable approach.
>
> BTW, can you send me example entries for which there are dead entries 
> or 0-sized array in my parser?  I wonder if it's a problem of Entrez 
> Gene file or my parser, since I simply let the data structure mirror 
> the file.  But if it isn't, then I would want to check if it's a bug.  
> I did process the full human genome into XML files and did not see any 
> empty elements or attributes, and the parser runs on entire mouse and 
> rat genomes without problem, which is expected.
>
> Thanks,
>
> Mingyi
>
>> -----Original Message-----
>> From: Stefan Kirov [mailto:skirov@utk.edu]
>> Sent: Saturday, March 12, 2005 5:59 PM
>> To: Liu, Mingyi
>> Cc: bioperl-l@portal.open-bio.org
>> Subject: Re: [Bioperl-l] Entrez Gene ASN parsers
>>
>>
>> Mingyi,
>> I looked at the code (EntrezGene) and so far it seems to me
>> it gives as
>> you claim pretty accurate and easy to understand data structure (few
>> dead entries and some 0 size array, but nothing major).
>> The only concern I have is that the data structure. If you want to
>> achieve a better structure (non-redundant, two level where
>> possible or a
>> collection of Bioperl objects) this will slow things down. I guess I
>> will compare how the code I wrote compares to yours and choose the
>> faster one. I think this makes sense.
>> Stefan
>>
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


From hlapp at gmx.net  Sat Mar 12 19:55:44 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat Mar 12 19:50:16 2005
Subject: [Bioperl-l] Entrez Gene ASN
In-Reply-To: <4231EB38.8040809@utk.edu>
Message-ID: <A16CAA7A-935A-11D9-B647-000A959EB4C4@gmx.net>


On Friday, March 11, 2005, at 11:02  AM, Stefan Kirov wrote:

>
>
> Hilmar Lapp wrote:
>
>> Gene shouldn't be fundamentally different from LocusLink, and 
>> LocusLink was represented as an annotated SeqI within bioperl.
>
> It is not, you are right.
>
>>
>> If at all possible I'd still like it to remain that way for Gene in 
>> order to allow for a smooth transition from LL to Gene for code 
>> that's been using the former.
>>
> hmmmm, back compatibility is good thing, but sometimes it may be hard 
> to achieve.

Well, now you contradict yourself. Above you agree that Gene and 
LocusLink are fundamentally the same, and here you say representing 
them in a compatible fashion may be hard to achieve ...

There are problems indeed though, read on ...

>
>> If you want to emphasize the fact that it's a container for 
>> sequences, then that sounds like a ClusterI to me, which can be 
>> richly annotated too.
>
> Let me disagree here. Cluster is designed for independent sequences, 
> where Gene should deal with sequences, that have hierarchical 
> relationship among themselves.

Two notes here. First, ClusterI is not designed for independent 
sequences. It is just meant as a container for sequences, be those 
related to each other or not.

Second, the ability to represent hierarchical relationships between 
sequences is basically absent from bioperl, not just from ClusterI 
(aside from ClusterI representing a relationship between the containing 
seq and the contained seqs).

We should think seriously before we add that capability. Most of the 
people and effort in the field towards hierarchical relationships 
between biological entities with sequence takes place in the domain of 
feature hierarchies, *not* sequence hierarchies. See GFF3, SO, GBrowse, 
Chado, and related efforts.

The only place I know where sequence hierarchies are extensively used 
is in our local adaptation of Biosql, and we do all of this in SQL (as 
bioperl and therefore bioperl-db has zero support for it).

It's possible, but I'm not sure it's also wise, to duplicate the support for 
feature hierarchies to sequences ... Wouldn't it in the end benefit 
more people if you were able to tie Gene into the Unflattener that 
Chris wrote?

>  This is one of the issues I think  Seq object is not designed to deal 
> with.  What we need is:
> genome--(Bio::Seq)-
>                   |--transcript(Bio::Seq)
>                                          |--protein(Bio::Seq)
>                     |--transcript(Bio::Seq)
>                                          |--protein(Bio::Seq)

Well, yeah, if you replace Bio::Seq with Bio::SeqFeatureI you are 
pretty close to GFF3 and a growing wealth of support for it.
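
To illustrate what a feature hierarchy looks like in GFF3 terms, here is a small sketch that rebuilds the gene / transcript / CDS tree from the ID and Parent attributes in column 9. The records, coordinates and identifiers are invented, and the code uses plain hashes rather than Bio::SeqFeatureI objects, so it shows the idea only and not how bioperl or GBrowse implement it.

```perl
use strict;
use warnings;

# Invented GFF3-style records (nine tab-separated columns).
my @gff3 = (
    join("\t", 'chr1', 'example', 'gene', 1000, 5000, '.', '+', '.', 'ID=gene1'),
    join("\t", 'chr1', 'example', 'mRNA', 1000, 5000, '.', '+', '.', 'ID=tx1;Parent=gene1'),
    join("\t", 'chr1', 'example', 'mRNA', 1200, 5000, '.', '+', '.', 'ID=tx2;Parent=gene1'),
    join("\t", 'chr1', 'example', 'CDS',  1300, 4800, '.', '+', '0', 'ID=cds1;Parent=tx1'),
    join("\t", 'chr1', 'example', 'CDS',  1400, 4800, '.', '+', '0', 'ID=cds2;Parent=tx2'),
);

my (%type_of, %children, @roots);
for my $line (@gff3) {
    my @col = split /\t/, $line;
    # Column 9 holds semicolon-separated key=value attributes.
    my %attr = map { my ($k, $v) = split /=/, $_, 2; ($k => $v) } split /;/, $col[8];
    $type_of{ $attr{ID} } = $col[2];
    if (defined $attr{Parent}) {
        push @{ $children{ $attr{Parent} } }, $attr{ID};
    }
    else {
        push @roots, $attr{ID};    # no Parent attribute: top-level feature
    }
}

# Render the hierarchy depth first as an indented outline.
sub outline {
    my ($id, $depth) = @_;
    my @lines = (('  ' x $depth) . "$type_of{$id} $id");
    push @lines, outline($_, $depth + 1) for @{ $children{$id} || [] };
    return @lines;
}

my @tree = map { outline($_, 0) } @roots;
print "$_\n" for @tree;
```

The parent/child links live entirely in the attributes, so the same flat file can carry an arbitrarily deep hierarchy without any nested container objects.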

>
> Another significant concern I have is that if we store everything as 
> SeqFeature or the overhead may become huge (some records have hundreds 
> of different features)

Have you talked to Lincoln about this? I believe GBrowse is dealing 
pretty well with this huge overhead but I may be missing something here.


> [...] and any user of the parser will have to do quite of a data 
> mining to find the relevant feature. One approach would be to add more 
> Bio::Annotation:: objects (for example Bio::Annotation::STS, 
> Bio::Annotation::GRIF, etc).

Possibly. Bio::Annotation objects was in fact what I was primarily 
referring to when I spoke about annotation.

> We may decide to create a simplified (Bio::Seq, no relationships) or 
> more complex object (Gene), based on the user request.

Just as an aside, I guess you know that there is a Gene object already, 
but it's feature based.

	-hilmar
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


From hlapp at gmx.net  Sat Mar 12 21:43:13 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat Mar 12 21:37:48 2005
Subject: [Bioperl-l] strange error after changing to RC1.5
In-Reply-To: <Pine.LNX.4.58.0503111126240.31639@sumo.ctrl.ucla.edu>
Message-ID: <A541F5CE-9369-11D9-B647-000A959EB4C4@gmx.net>

My first response to this was a long rant about almost every single one 
of your statements and which may have been mildly entertaining for 
people while the TV is on commercial. In the end I calmed down and 
thought people have probably better things to do than reading my rants 
(should I start a bioperl blog?), so here is the same in a gist and 
without (most of) the rant.

	- In my opinion the annotation system is core, like everything is by 
definition that attaches to a Bio::SeqI.

	- I'm not ever going to turn away people who took to the code to fill 
gaps or ambiguities in the documentation - API assumptions based on 
what the code did for years count as a binding contract just as 
expressly written contracts do.

	- I am strongly opposed to the notion that your customers should do 
the testing for your wild innovations as opposed to yourself doing that 
in advance, regardless of how fast or slow you respond to bug reports; 
people have better things to do than ironing out your revolution.

	- I *am* going to back out the changes from the main trunk; 
traditionally, in bioperl the main trunk has *not* been used for wild 
experiments the repercussions of which were not really clear - instead 
people opened their branches for that.

Allen feel free to reintroduce your changes and overloads and all kinds 
of crazy stuff on a branch that you open. We need the main trunk free 
of debris as the road to the next releases to come. Feel free to wreck 
the train elsewhere. People need the bugfixes now and Lincoln's 
additions that aren't in 1.4.x.

Of course, this being a community project, everybody who disagrees 
please feel free to speak up and if people want to stop me I'll be more 
than glad to step down - but then be prepared to step up yourself and 
take care of the mess.

	-hilmar
	
On Friday, March 11, 2005, at 11:36  AM, Allen Day wrote:

> On Fri, 11 Mar 2005, Hilmar Lapp wrote:
>
>> I suggest that all the fancy overloading is removed from core bioperl
>> modules. If we need overloading for stringification or comparison
>> operators in one of our core modules I think we are making a mistake.
>
> The overloading is only there because assumptions have been made that
> annotations will be strings.  This assumption was okay previously 
> because
> the Bio::Annotation* modules were previously "non core" -- there was no
> unified annotation system in bioperl.  Now these modules are being made
> core, and this is part of the growing pain.
>
> I'm doing what I can to address the bug reports related to these 
> changes
> as they come in, and I don't think anyone will disagree that I'm doing 
> so
> in a timely manner.  However, I cannot fix bugs or field questions on
> biosql modules and would appreciate some cooperation/assistance from 
> the
> biosql developers.
>
>> This is part of the huge mess introduced when the SeqFeatureI
>> architecture was carelessly changed days before release. It's a
>> prototypical example for what not to do in a project that's as widely
>> used as bioperl.
>
> The SeqFeatureI changes were being gradually made in the 1-2 months 
> prior
> to the 1.5 release.  The release was, may I remind you, a *developer*
> release and not expected to be bug free.
>
>> *Every single bit* of those changes need to be rolled back from the
>> release and if nobody else has done it by then I will do so in two
>> weeks.
>
> Fine for the 1.5.1 branch, although I don't agree that this should be 
> done
> on the main trunk.
>
> -Allen
>
>
>> 	-hilmar
>>
>> On Thursday, March 10, 2005, at 05:57  PM, Allen Day wrote:
>>
>>> I'm unable to test the code in PersistentObject.pm as I don't have
>>> biosql
>>> set up, but you might try adding this to Reference.pm
>>>
>>>   use overload 'ne' => sub { "$_[0]" ne "$_[1]" }
>>>
>>> Please let me know if this fixes your error and I'll add this 'ne'
>>> overload to all the Bio::Annotation::* classes on HEAD.
>>>
>>> -Allen
>>>
>>>
>>> On Wed, 9 Mar 2005, Daniel Lang wrote:
>>>
>>>> Hi,
>>>> I'm retrieving seq objects from a local biosql db (using the latest
>>>> cvs
>>>> version of bioperl-db) and e.g. writing them with SeqIO. After 
>>>> changing
>>>> from a cvs version ~ 12/04 to RC1.5 or latest cvs version, I get the
>>>> following error:
>>>>
>>>> Operation `ne': no method found, left argument in overloaded 
>>>> package
>>>> Bio::Annotation::Reference, right argument has no overloaded magic 
>>>> at
>>>> /usr/lib/perl5/site_perl/5.6.1/Bio/DB/Persistent/PersistentObject.pm
>>>> line 534, <GEN1> line 1.
>>>>
>>>> The module PersistentObject.pm hasn't changed and in Reference.pm
>>>> there
>>>> is only this change:
>>>>
>>>> diff bioperl-live-Dec04/Bio/Annotation/Reference.pm
>>>> bioperl-live/Bio/Annotation/Reference.pm
>>>> 1c1
>>>> < # $Id: Reference.pm,v 1.21 2004/08/19 20:13:32 lapp Exp $
>>>> ---
>>>>> # $Id: Reference.pm,v 1.22 2005/02/02 22:13:22 allenday Exp $
>>>> 56c56,57
>>>> < # use overload '""' => \&as_text;
>>>> ---
>>>>> use overload '""' => sub { $_[0]->title || ''};
>>>>> use overload 'eq' => sub { "$_[0]" eq "$_[1]" };
>>>>
>>>> I've reversed this, but no positive result - the error remains...
>>>> Any hints?
>>>>
>>>> Thanks in advance,
>>>> Daniel
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l@portal.open-bio.org
>>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>>>>
>>>
>>>
>>
>
>
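The `ne' failure quoted above can be reproduced in isolation. What
follows is a minimal sketch, not the actual Bio::Annotation::Reference
code: Demo::RefStrict and Demo::RefFallback are hypothetical stand-ins,
and ne_error is a helper invented for the illustration. The first class
mirrors the quoted diff (only '""' and 'eq' are overloaded, with no
fallback), so Perl has no `ne' to call and cannot derive one from `eq'.
The second class shows one alternative to adding `ne' by hand, namely
letting Perl fall back to its native string operators.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for Bio::Annotation::Reference; not the real class.
package Demo::RefStrict;
# Mirrors the quoted diff: '""' and 'eq' only, no fallback.
use overload
    '""' => sub { $_[0]->{title} || '' },
    'eq' => sub { "$_[0]" eq "$_[1]" };
sub new { my ($class, $title) = @_; return bless { title => $title }, $class }

package Demo::RefFallback;
# Same stringification, but Perl may fall back to its native operators.
use overload
    '""'     => sub { $_[0]->{title} || '' },
    fallback => 1;
sub new { my ($class, $title) = @_; return bless { title => $title }, $class }

package main;

# Returns the error text from evaluating "$obj ne $string", or '' on success.
sub ne_error {
    my ($obj, $string) = @_;
    my $ok = eval { my $differs = ($obj ne $string); 1 };
    return $ok ? '' : $@;
}

my $strict   = Demo::RefStrict->new('A title');
my $fallback = Demo::RefFallback->new('A title');

print "strict:   ", (ne_error($strict,   'other') || "ok\n");
print "fallback: ", (ne_error($fallback, 'other') || "ok\n");
```

Adding an explicit 'ne' entry, as suggested in the thread, fixes only
that one operator; `fallback => 1' covers the other string comparisons
as well, at the price of making every object silently usable as a
string.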
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


From brian_osborne at cognia.com  Sat Mar 12 22:01:50 2005
From: brian_osborne at cognia.com (Brian Osborne)
Date: Sat Mar 12 21:56:24 2005
Subject: [Bioperl-l] strange error after changing to RC1.5
In-Reply-To: <A541F5CE-9369-11D9-B647-000A959EB4C4@gmx.net>
Message-ID: <GPENLDEIJJHJLHOAJBBPGEMBCCAA.brian_osborne@cognia.com>

Hilmar,

If I'm not mistaken this proposal to back out these changes was made
previously, and not by you. There were no objections to this proposal at
that time.

Brian O.


-----Original Message-----
From: bioperl-l-bounces@portal.open-bio.org
[mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Hilmar Lapp
Sent: Saturday, March 12, 2005 9:43 PM
To: Allen Day
Cc: Daniel Lang; BioPerl-List; OBDA BioSQL
Subject: Re: [Bioperl-l] strange error after changing to RC1.5


My first response to this was a long rant about almost every single one
of your statements and which may have been mildly entertaining for
people while the TV is on commercial. In the end I calmed down and
thought people have probably better things to do than reading my rants
(should I start a bioperl blog?), so here is the same in a gist and
without (most of) the rant.

	- In my opinion the annotation system is core, like everything is by
definition that attaches to a Bio::SeqI.

	- I'm not ever going to turn away people who took to the code to fill
gaps or ambiguities in the documentation - API assumptions based on
what the code did for years count as a binding contract just as
expressly written contracts do.

	- I am strongly opposed to the notion that your customers should do
the testing for your wild innovations as opposed to yourself doing that
in advance, regardless of how fast or slow you respond to bug reports;
people have better things to do than ironing out your revolution.

	- I *am* going to back out the changes from the main trunk;
traditionally, in bioperl the main trunk has *not* been used for wild
experiments the repercussions of which were not really clear - instead
people opened their branches for that.

Allen feel free to reintroduce your changes and overloads and all kinds
of crazy stuff on a branch that you open. We need the main trunk free
of debris as the road to the next releases to come. Feel free to wreck
the train elsewhere. People need the bugfixes now and Lincoln's
additions that aren't in 1.4.x.

Of course, this being a community project, everybody who disagrees
please feel free to speak up and if people want to stop me I'll be more
than glad to step down - but then be prepared to step up yourself and
take care of the mess.

	-hilmar

On Friday, March 11, 2005, at 11:36  AM, Allen Day wrote:

> On Fri, 11 Mar 2005, Hilmar Lapp wrote:
>
>> I suggest that all the fancy overloading is removed from core bioperl
>> modules. If we need overloading for stringification or comparison
>> operators in one of our core modules I think we are making a mistake.
>
> The overloading is only there because assumptions have been made that
> annotations will be strings.  This assumption was okay previously because
> the Bio::Annotation* modules were previously "non core" -- there was no
> unified annotation system in bioperl.  Now these modules are being made
> core, and this is part of the growing pain.
>
> I'm doing what I can to address the bug reports related to these
> changes
> as they come in, and I don't think anyone will disagree that I'm doing
> so
> in a timely manner.  However, I cannot fix bugs or field questions on
> biosql modules and would appreciate some cooperation/assistance from
> the
> biosql developers.
>
>> This is part of the huge mess introduced when the SeqFeatureI
>> architecture was carelessly changed days before release. It's a
>> prototypical example for what not to do in a project that's as widely
>> used as bioperl.
>
> The SeqFeatureI changes were being gradually made in the 1-2 months
> prior
> to the 1.5 release.  The release was, may I remind you, a *developer*
> release and not expected to be bug free.
>
>> *Every single bit* of those changes need to be rolled back from the
>> release and if nobody else has done it by then I will do so in two
>> weeks.
>
> Fine for the 1.5.1 branch, although I don't agree that this should be
> done
> on the main trunk.
>
> -Allen
>
>
>> 	-hilmar
>>
>> On Thursday, March 10, 2005, at 05:57  PM, Allen Day wrote:
>>
>>> I'm unable to test the code in PersistentObject.pm as I don't have
>>> biosql
>>> set up, but you might try adding this to Reference.pm
>>>
>>>   use overload 'ne' => sub { "$_[0]" ne "$_[1]" }
>>>
>>> Please let me know if this fixes your error and I'll add this 'ne'
>>> overload to all the Bio::Annotation::* classes on HEAD.
>>>
>>> -Allen
>>>
>>>
>>> On Wed, 9 Mar 2005, Daniel Lang wrote:
>>>
>>>> Hi,
>>>> I'm retrieving seq objects from a local biosql db (using the latest
>>>> cvs version of bioperl-db) and e.g. writing them with SeqIO. After
>>>> changing from a cvs version ~ 12/04 to RC1.5 or latest cvs version,
>>>> I get the following error:
>>>>
>>>> Operation `ne': no method found,
>>>>     left argument in overloaded package Bio::Annotation::Reference,
>>>>     right argument has no overloaded magic at
>>>> /usr/lib/perl5/site_perl/5.6.1/Bio/DB/Persistent/PersistentObject.pm
>>>> line 534, <GEN1> line 1.
>>>>
>>>> The module PersistentObject.pm hasn't changed and in Reference.pm
>>>> there is only this change:
>>>>
>>>> diff bioperl-live-Dec04/Bio/Annotation/Reference.pm
>>>> bioperl-live/Bio/Annotation/Reference.pm
>>>> 1c1
>>>> < # $Id: Reference.pm,v 1.21 2004/08/19 20:13:32 lapp Exp $
>>>> ---
>>>>> # $Id: Reference.pm,v 1.22 2005/02/02 22:13:22 allenday Exp $
>>>> 56c56,57
>>>> < # use overload '""' => \&as_text;
>>>> ---
>>>>> use overload '""' => sub { $_[0]->title || ''};
>>>>> use overload 'eq' => sub { "$_[0]" eq "$_[1]" };
>>>>
>>>> I've reversed this, but no positive result - the error remains...
>>>> Any hints?
>>>>
>>>> Thanks in advance,
>>>> Daniel
>>>>
>>>>
>>>>
>>>
>>>
>>
>
>
--
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------




From hlapp at gmx.net  Sat Mar 12 22:21:32 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat Mar 12 22:17:57 2005
Subject: [Bioperl-l] strange error after changing to RC1.5
In-Reply-To: <GPENLDEIJJHJLHOAJBBPGEMBCCAA.brian_osborne@cognia.com>
Message-ID: <FF8197C2-936E-11D9-B647-000A959EB4C4@gmx.net>

In my recollection the previous proposal was to back them out of the
branch to be created for v1.5.1, whereas they were to remain in the
main trunk and thereby be implicitly accepted for and included in all
future development of bioperl.

My proposal is that the basis for 1.5.1 is the main trunk, so they
need to be backed out of the main trunk, and whoever is in favor of
those changes (i.e., Allen) should first prove their viability on a
branch before bothering anybody else with them again.

I do feel that there are no objections to my proposal other than from  
Allen, but I may be missing something or someone may not have spoken up  
yet.

	-hilmar

On Saturday, March 12, 2005, at 07:01  PM, Brian Osborne wrote:

> Hilmar,
>
> If I'm not mistaken this proposal to back out these changes was made
> previously, and not by you. There were no objections to this proposal  
> at
> that time.
>
> Brian O.
>
>
> -----Original Message-----
> From: bioperl-l-bounces@portal.open-bio.org
> [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Hilmar Lapp
> Sent: Saturday, March 12, 2005 9:43 PM
> To: Allen Day
> Cc: Daniel Lang; BioPerl-List; OBDA BioSQL
> Subject: Re: [Bioperl-l] strange error after changing to RC1.5
>
>
> My first response to this was a long rant about almost every single one
> of your statements and which may have been mildly entertaining for
> people while the TV is on commercial. In the end I calmed down and
> thought people have probably better things to do than reading my rants
> (should I start a bioperl blog?), so here is the same in a gist and
> without (most of) the rant.
>
> 	- In my opinion the annotation system is core, like everything is by
> definition that attaches to a Bio::SeqI.
>
> 	- I'm not ever going to turn away people who took to the code to fill
> gaps or ambiguities in the documentation - API assumptions based on
> what the code did for years count as a binding contract just as
> expressly written contracts do.
>
> 	- I am strongly opposed to the notion that your customers should do
> the testing for your wild innovations as opposed to yourself doing that
> in advance, regardless of how fast or slow you respond to bug reports;
> people have better things to do than ironing out your revolution.
>
> 	- I *am* going to back out the changes from the main trunk;
> traditionally, in bioperl the main trunk has *not* been used for wild
> experiments the repercussions of which were not really clear - instead
> people opened their branches for that.
>
> Allen feel free to reintroduce your changes and overloads and all kinds
> of crazy stuff on a branch that you open. We need the main trunk free
> of debris as the road to the next releases to come. Feel free to wreck
> the train elsewhere. People need the bugfixes now and Lincoln's
> additions that aren't in 1.4.x.
>
> Of course, this being a community project, everybody who disagrees
> please feel free to speak up and if people want to stop me I'll be more
> than glad to step down - but then be prepared to step up yourself and
> take care of the mess.
>
> 	-hilmar
>
> On Friday, March 11, 2005, at 11:36  AM, Allen Day wrote:
>
>> On Fri, 11 Mar 2005, Hilmar Lapp wrote:
>>
>>> I suggest that all the fancy overloading is removed from core bioperl
>>> modules. If we need overloading for stringification or comparison
>>> operators in one of our core modules I think we are making a mistake.
>>
>> The overloading is only there because assumptions have been made that
>> annotations will be strings.  This assumption was okay previously because
>> the Bio::Annotation* modules were previously "non core" -- there was no
>> unified annotation system in bioperl.  Now these modules are being made
>> core, and this is part of the growing pain.
>>
>> I'm doing what I can to address the bug reports related to these
>> changes
>> as they come in, and I don't think anyone will disagree that I'm doing
>> so
>> in a timely manner.  However, I cannot fix bugs or field questions on
>> biosql modules and would appreciate some cooperation/assistance from
>> the
>> biosql developers.
>>
>>> This is part of the huge mess introduced when the SeqFeatureI
>>> architecture was carelessly changed days before release. It's a
>>> prototypical example for what not to do in a project that's as widely
>>> used as bioperl.
>>
>> The SeqFeatureI changes were being gradually made in the 1-2 months
>> prior
>> to the 1.5 release.  The release was, may I remind you, a *developer*
>> release and not expected to be bug free.
>>
>>> *Every single bit* of those changes need to be rolled back from the
>>> release and if nobody else has done it by then I will do so in two
>>> weeks.
>>
>> Fine for the 1.5.1 branch, although I don't agree that this should be
>> done
>> on the main trunk.
>>
>> -Allen
>>
>>
>>> 	-hilmar
>>>
>>> On Thursday, March 10, 2005, at 05:57  PM, Allen Day wrote:
>>>
>>>> I'm unable to test the code in PersistentObject.pm as I don't have
>>>> biosql
>>>> set up, but you might try adding this to Reference.pm
>>>>
>>>>   use overload 'ne' => sub { "$_[0]" ne "$_[1]" }
>>>>
>>>> Please let me know if this fixes your error and I'll add this 'ne'
>>>> overload to all the Bio::Annotation::* classes on HEAD.
>>>>
>>>> -Allen
>>>>
>>>>
>>>> On Wed, 9 Mar 2005, Daniel Lang wrote:
>>>>
>>>>> Hi,
>>>>> I'm retrieving seq objects from a local biosql db (using the latest
>>>>> cvs version of bioperl-db) and e.g. writing them with SeqIO. After
>>>>> changing from a cvs version ~ 12/04 to RC1.5 or latest cvs version,
>>>>> I get the following error:
>>>>>
>>>>> Operation `ne': no method found,
>>>>>     left argument in overloaded package Bio::Annotation::Reference,
>>>>>     right argument has no overloaded magic at
>>>>> /usr/lib/perl5/site_perl/5.6.1/Bio/DB/Persistent/PersistentObject.pm
>>>>> line 534, <GEN1> line 1.
>>>>>
>>>>> The module PersistentObject.pm hasn't changed and in Reference.pm
>>>>> there is only this change:
>>>>>
>>>>> diff bioperl-live-Dec04/Bio/Annotation/Reference.pm
>>>>> bioperl-live/Bio/Annotation/Reference.pm
>>>>> 1c1
>>>>> < # $Id: Reference.pm,v 1.21 2004/08/19 20:13:32 lapp Exp $
>>>>> ---
>>>>>> # $Id: Reference.pm,v 1.22 2005/02/02 22:13:22 allenday Exp $
>>>>> 56c56,57
>>>>> < # use overload '""' => \&as_text;
>>>>> ---
>>>>>> use overload '""' => sub { $_[0]->title || ''};
>>>>>> use overload 'eq' => sub { "$_[0]" eq "$_[1]" };
>>>>>
>>>>> I've reversed this, but no positive result - the error remains...
>>>>> Any hints?
>>>>>
>>>>> Thanks in advance,
>>>>> Daniel
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>>
> --
> -------------------------------------------------------------
> Hilmar Lapp                            email: lapp at gnf.org
> GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
> -------------------------------------------------------------
>
>
>
>
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


From Mingyi.Liu at gpc-biotech.com  Sat Mar 12 23:12:37 2005
From: Mingyi.Liu at gpc-biotech.com (Liu, Mingyi)
Date: Sat Mar 12 23:08:14 2005
Subject: [Bioperl-l] Entrez Gene ASN parsers
Message-ID: <15C0817A76D1B74C8E3EEA0FADE464A4CBAC2F@sw-wal-beta.gpc-biotech.com>

> -----Original Message-----
> From: Hilmar Lapp [mailto:hlapp@gmx.net]
> Sent: Saturday, March 12, 2005 7:33 PM
> To: Liu, Mingyi
> Cc: Stefan Kirov; bioperl-l@portal.open-bio.org
> Subject: Re: [Bioperl-l] Entrez Gene ASN parsers
> 
> 
> I kind of like this approach, i.e., have a general purpose low-level 
> parser that you have reasonable confidence in will never be the 
> bottleneck, and then build a bioperl parser on top of it that now can 
> focus its code on assembling the desired data structure as opposed to 
> the file format itself.
>

That was my intention too.  I saw plenty of requests that NCBI release Entrez Gene in XML format. But suppose NCBI did release XML-formatted Entrez Gene files. To build bioperl objects from those files one could take one of two approaches:
1. write a module that deals with (parses) the XML tags directly and codes everything, including object instantiation, alongside the parsing code.

Or, more likely,
2. write a module that uses an XML parser, lets it do its work and build a data structure, and then creates all objects from that data structure.  This way there's a clear code separation, and one only needs to worry about the data, not the parsing.

My parser does to NCBI's ASN.1 EntrezGene file what an XML parser does to a yet-to-exist XML-formatted EntrezGene file (or better than it, if NCBI decides to code Entrez Gene in the XML format that Eutils provide).  And it performs better than XML parsers.  So I really don't think there's any need for an XML file from NCBI.
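
The separation described in approach 2 can be sketched in a few lines.
This is only an illustration under loud assumptions: the "key: value"
input is a made-up toy format (not ASN.1 and not XML), and parse_record
and Demo::Gene are hypothetical names, not part of any parser or of
BioPerl. Stage 1 knows only the file format and returns a plain data
structure; stage 2 knows only that data structure and builds objects.

```perl
use strict;
use warnings;

# Stage 1: a format-only parser. It knows nothing about genes; it just
# turns "key: value" lines (a toy format) into a hash of array refs.
sub parse_record {
    my ($text) = @_;
    my %data;
    for my $line (split /\n/, $text) {
        next unless $line =~ /^\s*([\w-]+)\s*:\s*(.*?)\s*$/;
        push @{ $data{$1} }, $2;
    }
    return \%data;
}

# Stage 2: object assembly. It never sees the file format, only the
# data structure produced by stage 1.
package Demo::Gene;    # hypothetical class, not a BioPerl module

sub new_from_data {
    my ($class, $data) = @_;
    return bless {
        id       => $data->{'gene-id'}[0],
        symbol   => $data->{symbol}[0],
        synonyms => $data->{synonym} || [],
    }, $class;
}
sub id       { return $_[0]{id} }
sub symbol   { return $_[0]{symbol} }
sub synonyms { return $_[0]{synonyms} }

package main;

my $text = join "\n",
    'gene-id: 1',
    'symbol: GENE1',
    'synonym: G1',
    'synonym: GN1';

my $data = parse_record($text);                 # plain data, usable as-is
my $gene = Demo::Gene->new_from_data($data);    # or wrapped in an object

printf "%s (%s): %s\n", $gene->symbol, $gene->id,
    join(', ', @{ $gene->synonyms });
```

A caller who only wants a few fields can stop after parse_record and
skip the cost of building objects.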

> And of course assembling that data structure will slow things down a 
> lot but hey, either you want an object hierarchy in (bio-)perl or you 
> don't.

I also agree that with an external parser users can choose what they like: (bio)perl objects containing the Entrez Gene data, or the raw data structure to pick and choose data from directly.  That is more flexible for both developers and users.

Just my two cents.

Mingyi

From hlapp at gmx.net  Sat Mar 12 23:54:14 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat Mar 12 23:48:43 2005
Subject: [Bioperl-l] Entrez Gene ASN parsers
In-Reply-To: <15C0817A76D1B74C8E3EEA0FADE464A4CBAC2F@sw-wal-beta.gpc-biotech.com>
Message-ID: <F2C643E2-937B-11D9-B647-000A959EB4C4@gmx.net>


On Saturday, March 12, 2005, at 08:12  PM, Liu, Mingyi wrote:

>
> My parser does to NCBI's ASN.1 EntrezGene file what an XML parser does 
> to a yet-to-exist XML-formatted EntrezGene file (or better than it, if 
> NCBI decides to code Entrez Gene in the XML format that Eutils 
> provide).

This is apparently what they will be doing, or at least my 
understanding of it. The discomforting thing is that it's taken them so 
long already to come up with that supposedly little tool. In fact, 
apparently the fact they weren't able to provide the off-line tool yet 
is the reason that they're still maintaining the LocusLink download. 
That's what they told me in a response to an inquiry. Although from 
Monday on they'll remove C.elegans and fruitfly from LL_tmpl. Not good.

> And it performs better than XML parsers.

Actually, even an expat-based XML parser would be orders of
magnitude slower than your regexp-based one.

The question is how safe are your regexps from possibly unexpected 
things like escaped quotes or an escaped curly brace that's part of a 
string and not end of an entity etc or whatever might confuse your 
regexps.

Maybe in ASN.1 this isn't a big deal? I just have too little knowledge 
about ASN.1 to make any judgment here.
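
To make the concern concrete, here is a hedged sketch of one way to keep
braces and quotes inside strings from confusing a regexp-based reader.
It is not Mingyi's parser. It assumes that an embedded quote inside a
string is written as a doubled quote (""), which is my understanding of
ASN.1 text notation; tokenize, brace_depth and the sample text are all
made up for the illustration.

```perl
use strict;
use warnings;

# Split text into tokens: quoted strings (with "" as an embedded quote),
# braces, commas, and bare words. The string alternative comes first, so
# braces inside a string never surface as structural tokens.
sub tokenize {
    my ($text) = @_;
    my @tokens;
    while ($text =~ /\G\s*("(?:[^"]|"")*"|[{},]|[^\s{},"]+)/gc) {
        push @tokens, $1;
    }
    return @tokens;
}

# Net nesting depth, counting only braces that are tokens of their own.
sub brace_depth {
    my ($text) = @_;
    my $depth = 0;
    for my $token (tokenize($text)) {
        $depth++ if $token eq '{';
        $depth-- if $token eq '}';
    }
    return $depth;
}

# Made-up sample: both strings contain braces, one contains doubled quotes.
my $sample = 'Gene-ref { locus "A{1}", desc "say ""hi"" }" }';

# Counting raw characters is thrown off by the braces inside the strings.
my $naive = ($sample =~ tr/{//) - ($sample =~ tr/}//);

printf "naive count: %d, token-aware count: %d\n",
    $naive, brace_depth($sample);
```

The token-aware count balances to zero for this sample, while the raw
character count does not.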

>
> So I really don't think there's any need for an XML file from NCBI.

Yeah, I actually started to change my mind w.r.t. waiting for the XML 
format.

	-hilmar
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


From Mingyi.Liu at gpc-biotech.com  Sun Mar 13 00:17:59 2005
From: Mingyi.Liu at gpc-biotech.com (Liu, Mingyi)
Date: Sun Mar 13 00:14:11 2005
Subject: [Bioperl-l] Entrez Gene ASN parsers
Message-ID: <15C0817A76D1B74C8E3EEA0FADE464A4CBAC30@sw-wal-beta.gpc-biotech.com>

> > My parser does to NCBI's ASN.1 EntrezGene file what an XML 
> parser does 
> > to a yet-to-exist XML-formatted EntrezGene file (or better 
> than it, if 
> > NCBI decides to code Entrez Gene in the XML format that Eutils 
> > provide).
> 
> This is apparently what they will be doing, or at least my 
> understanding of it. 

That's logical, but not good.  I really don't like the XML format Eutils provided.  In fact, I heard few people did.

> The question is how safe are your regexps from possibly unexpected 
> things like escaped quotes or an escaped curly brace that's part of a 
> string and not end of an entity etc or whatever might confuse your 
> regexps.

It's not a problem. My parsers already deal with these situations.  So far, nothing in the latest human, mouse, or rat files breaks the parser.  I didn't test on other genomes, but they should work fine.

BTW, an unrelated question: Do you know why my reply mails always start new threads in the Bioperl-l mailing list archive, whereas others' (like yours) form a nice thread?

Thanks

Mingyi

From hlapp at gmx.net  Sun Mar 13 00:49:14 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sun Mar 13 00:43:43 2005
Subject: [Bioperl-l] Entrez Gene ASN parsers
In-Reply-To: <15C0817A76D1B74C8E3EEA0FADE464A4CBAC30@sw-wal-beta.gpc-biotech.com>
Message-ID: <A1D1A2DC-9383-11D9-B647-000A959EB4C4@gmx.net>


On Saturday, March 12, 2005, at 09:17  PM, Liu, Mingyi wrote:
>
> BTW, an unrelated question: Do you know why my reply mails always
> start new threads in the Bioperl-l mailing list archive, whereas
> others' (like yours) form a nice thread?

No idea. I don't know which header mailman uses to open a new
thread or to recognize an existing one. However, I notice the following
differences between your and my email headers that might pertain to
threads:

Only in your reply:

Thread-Topic: Entrez Gene ASN parsers
Thread-Index: AcUnGiosN/cq6k3WQYeV1nHcNmaGJg==

Only in my reply:

In-Reply-To: 
<15C0817A76D1B74C8E3EEA0FADE464A4CBAC2F@sw-wal-beta.gpc-biotech.com>

The latter is precisely the message ID of your email that I replied to:

Message-ID: 
<15C0817A76D1B74C8E3EEA0FADE464A4CBAC2F@sw-wal-beta.gpc-biotech.com>
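
For illustration only, here is a sketch of how an archiver could thread
messages purely from these two headers. Whether mailman works this way
is an assumption, not something established in this thread;
parse_headers, thread_root and the example.org message IDs are made up.
A reply that carries Thread-Topic but no In-Reply-To, like the third
message below, ends up as the root of its own thread.

```perl
use strict;
use warnings;

# Parse a raw header block into a hash keyed by lower-cased header name.
sub parse_headers {
    my ($raw) = @_;
    $raw =~ s/\n[ \t]+/ /g;    # unfold continuation lines
    my %header;
    for my $line (split /\n/, $raw) {
        $header{ lc $1 } = $2 if $line =~ /^([\w-]+):\s*(.*)$/;
    }
    return \%header;
}

# Follow In-Reply-To links upward until a message has no known parent.
# The %seen guard stops the walk if the links ever form a cycle.
sub thread_root {
    my ($id, $parent_of) = @_;
    my %seen;
    while (defined $parent_of->{$id} && !$seen{$id}++) {
        $id = $parent_of->{$id};
    }
    return $id;
}

# Made-up messages: an original, a reply with In-Reply-To (folded over
# two lines), and a reply that only carries Thread-Topic.
my @messages = (
    join("\n", 'Message-ID: <1@example.org>', 'Subject: parsers'),
    join("\n", 'Message-ID: <2@example.org>', 'In-Reply-To:',
               ' <1@example.org>', 'Subject: Re: parsers'),
    join("\n", 'Message-ID: <3@example.org>', 'Thread-Topic: parsers',
               'Subject: RE: parsers'),
);

my %parent_of;
for my $raw (@messages) {
    my $header = parse_headers($raw);
    $parent_of{ $header->{'message-id'} } = $header->{'in-reply-to'};
}

for my $id (sort keys %parent_of) {
    printf "%s is in the thread rooted at %s\n",
        $id, thread_root($id, \%parent_of);
}
```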

	-hilmar

>
> Thanks
>
> Mingyi
>
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


From cain at cshl.edu  Sun Mar 13 04:11:17 2005
From: cain at cshl.edu (Scott Cain)
Date: Sun Mar 13 04:07:04 2005
Subject: [Bioperl-l] strange error after changing to RC1.5
In-Reply-To: <A541F5CE-9369-11D9-B647-000A959EB4C4@gmx.net>
Message-ID: <Pine.GSO.4.05.10503130407020.19110-100000@phage.cshl.edu>

Hilmar,

I sympathize with your frustration, but backing out those changes will
break several tools that I've written for chado/gmod.  This is an
area of active development, and users are frequently advised to update
to bioperl-live.  Since there are users of that as well, how do we decide
who gets to feel the pain?

Scott

----------------------------------------------------------------------
Scott Cain, Ph. D.				 	 cain@cshl.org
GMOD Coordinator, http://www.gmod.org/			 (216)392-3087
----------------------------------------------------------------------


On Sat, 12 Mar 2005, Hilmar Lapp wrote:

> My first response to this was a long rant about almost every single one 
> of your statements and which may have been mildly entertaining for 
> people while the TV is on commercial. In the end I calmed down and 
> thought people have probably better things to do than reading my rants 
> (should I start a bioperl blog?), so here is the same in a gist and 
> without (most of) the rant.
> 
> 	- In my opinion the annotation system is core, like everything is by 
> definition that attaches to a Bio::SeqI.
> 
> 	- I'm not ever going to turn away people who took to the code to fill 
> gaps or ambiguities in the documentation - API assumptions based on 
> what the code did for years count as a binding contract just as 
> expressly written contracts do.
> 
> 	- I am strongly opposed to the notion that your customers should do 
> the testing for your wild innovations as opposed to yourself doing that 
> in advance, regardless of how fast or slow you respond to bug reports; 
> people have better things to do than ironing out your revolution.
> 
> 	- I *am* going to back out the changes from the main trunk; 
> traditionally, in bioperl the main trunk has *not* been used for wild 
> experiments the repercussions of which were not really clear - instead 
> people opened their branches for that.
> 
> Allen feel free to reintroduce your changes and overloads and all kinds 
> of crazy stuff on a branch that you open. We need the main trunk free 
> of debris as the road to the next releases to come. Feel free to wreck 
> the train elsewhere. People need the bugfixes now and Lincoln's 
> additions that aren't in 1.4.x.
> 
> Of course, this being a community project, everybody who disagrees 
> please feel free to speak up and if people want to stop me I'll be more 
> than glad to step down - but then be prepared to step up yourself and 
> take care of the mess.
> 
> 	-hilmar
> 	
> On Friday, March 11, 2005, at 11:36  AM, Allen Day wrote:
> 
> > On Fri, 11 Mar 2005, Hilmar Lapp wrote:
> >
> >> I suggest that all the fancy overloading is removed from core bioperl
> >> modules. If we need overloading for stringification or comparison
> >> operators in one of our core modules I think we are making a mistake.
> >
> > The overloading is only there because assumptions have been made that
> > annotations will be strings.  This assumption was okay previously because
> > the Bio::Annotation* modules were previously "non core" -- there was no
> > unified annotation system in bioperl.  Now these modules are being made
> > core, and this is part of the growing pain.
> >
> > I'm doing what I can to address the bug reports related to these 
> > changes
> > as they come in, and I don't think anyone will disagree that I'm doing 
> > so
> > in a timely manner.  However, I cannot fix bugs or field questions on
> > biosql modules and would appreciate some cooperation/assistance from 
> > the
> > biosql developers.
> >
> >> This is part of the huge mess introduced when the SeqFeatureI
> >> architecture was carelessly changed days before release. It's a
> >> prototypical example for what not to do in a project that's as widely
> >> used as bioperl.
> >
> > The SeqFeatureI changes were being gradually made in the 1-2 months 
> > prior
> > to the 1.5 release.  The release was, may I remind you, a *developer*
> > release and not expected to be bug free.
> >
> >> *Every single bit* of those changes need to be rolled back from the
> >> release and if nobody else has done it by then I will do so in two
> >> weeks.
> >
> > Fine for the 1.5.1 branch, although I don't agree that this should be 
> > done
> > on the main trunk.
> >
> > -Allen
> >
> >
> >> 	-hilmar
> >>
> >> On Thursday, March 10, 2005, at 05:57  PM, Allen Day wrote:
> >>
> >>> I'm unable to test the code in PersistentObject.pm as I don't have
> >>> biosql
> >>> set up, but you might try adding this to Reference.pm
> >>>
> >>>   use overload 'ne' => sub { "$_[0]" ne "$_[1]" }
> >>>
> >>> Please let me know if this fixes your error and I'll add this 'ne'
> >>> overload to all the Bio::Annotation::* classes on HEAD.
> >>>
> >>> -Allen
> >>>
> >>>
> >>> On Wed, 9 Mar 2005, Daniel Lang wrote:
> >>>
> >>>> Hi,
> >>>> I'm retrieving seq objects from a local biosql db (using the latest
> >>>> cvs version of bioperl-db) and e.g. writing them with SeqIO. After
> >>>> changing from a cvs version ~ 12/04 to RC1.5 or latest cvs version,
> >>>> I get the following error:
> >>>>
> >>>> Operation `ne': no method found,
> >>>>     left argument in overloaded package Bio::Annotation::Reference,
> >>>>     right argument has no overloaded magic at
> >>>> /usr/lib/perl5/site_perl/5.6.1/Bio/DB/Persistent/PersistentObject.pm
> >>>> line 534, <GEN1> line 1.
> >>>>
> >>>> The module PersistentObject.pm hasn't changed and in Reference.pm
> >>>> there is only this change:
> >>>>
> >>>> diff bioperl-live-Dec04/Bio/Annotation/Reference.pm
> >>>> bioperl-live/Bio/Annotation/Reference.pm
> >>>> 1c1
> >>>> < # $Id: Reference.pm,v 1.21 2004/08/19 20:13:32 lapp Exp $
> >>>> ---
> >>>>> # $Id: Reference.pm,v 1.22 2005/02/02 22:13:22 allenday Exp $
> >>>> 56c56,57
> >>>> < # use overload '""' => \&as_text;
> >>>> ---
> >>>>> use overload '""' => sub { $_[0]->title || ''};
> >>>>> use overload 'eq' => sub { "$_[0]" eq "$_[1]" };
> >>>>
> >>>> I've reversed this, but no positive result - the error remains...
> >>>> Any hints?
> >>>>
> >>>> Thanks in advance,
> >>>> Daniel
> >>>>
> >>>>
> >>>>
> >>>
> >>>
> >>
> >
> >
> -- 
> -------------------------------------------------------------
> Hilmar Lapp                            email: lapp at gnf.org
> GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
> -------------------------------------------------------------
> 
> 
> 
> 


From hlapp at gmx.net  Sun Mar 13 05:06:46 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sun Mar 13 05:02:01 2005
Subject: [Bioperl-l] strange error after changing to RC1.5
In-Reply-To: <Pine.GSO.4.05.10503130407020.19110-100000@phage.cshl.edu>
Message-ID: <9BD28A60-93A7-11D9-9542-000A959EB4C4@gmx.net>

Can you formulate as a test case how you use the API such that it
depends on those changes that I may be targeting?

Or are these test cases in a gmod package that can be set up reasonably
simply?

Are you sure that what you depend on are the changes to the  
Bio::SeqFeatureI and Bio::Annotation::* modules? If it's only  
SeqFeature::Annotated, my plan was to leave that as much intact as  
possible.

	-hilmar

On Sunday, March 13, 2005, at 01:11  AM, Scott Cain wrote:

> Hilmar,
>
> I sympathize with your frustration, but backing out those changes will
> break several tools that I've written for chado/gmod.  This is an
> area of active development, and users are frequently advised to update
> to bioperl-live.  Since there are users of that as well, how do we  
> decide
> who gets to feel the pain?
>
> Scott
>
> ----------------------------------------------------------------------
> Scott Cain, Ph. D.				 	 cain@cshl.org
> GMOD Coordinator, http://www.gmod.org/			 (216)392-3087
> ----------------------------------------------------------------------
>
>
> On Sat, 12 Mar 2005, Hilmar Lapp wrote:
>
>> My first response to this was a long rant about almost every single  
>> one
>> of your statements and which may have been mildly entertaining for
>> people while the TV is on commercial. In the end I calmed down and
>> thought people have probably better things to do than reading my rants
>> (should I start a bioperl blog?), so here is the same in a gist and
>> without (most of) the rant.
>>
>> 	- In my opinion the annotation system is core, like everything is by
>> definition that attaches to a Bio::SeqI.
>>
>> 	- I'm not ever going to turn away people who took to the code to fill
>> gaps or ambiguities in the documentation - API assumptions based on
>> what the code did for years count as a binding contract just as
>> expressly written contracts do.
>>
>> 	- I am strongly opposed to the notion that your customers should do
>> the testing for your wild innovations as opposed to yourself doing  
>> that
>> in advance, regardless of how fast or slow you respond to bug reports;
>> people have better things to do than ironing out your revolution.
>>
>> 	- I *am* going to back out the changes from the main trunk;
>> traditionally, in bioperl the main trunk has *not* been used for wild
>> experiments the repercussions of which were not really clear - instead
>> people opened their branches for that.
>>
>> Allen feel free to reintroduce your changes and overloads and all  
>> kinds
>> of crazy stuff on a branch that you open. We need the main trunk free
>> of debris as the road to the next releases to come. Feel free to wreck
>> the train elsewhere. People need the bugfixes now and Lincoln's
>> additions that aren't in 1.4.x.
>>
>> Of course, this being a community project, everybody who disagrees
>> please feel free to speak up and if people want to stop me I'll be  
>> more
>> than glad to step down - but then be prepared to step up yourself and
>> take care of the mess.
>>
>> 	-hilmar
>> 	
>> On Friday, March 11, 2005, at 11:36  AM, Allen Day wrote:
>>
>>> On Fri, 11 Mar 2005, Hilmar Lapp wrote:
>>>
>>>> I suggest that all the fancy overloading is removed from core  
>>>> bioperl
>>>> modules. If we need overloading for stringification or comparison
>>>> operators in one of our core modules I think we are making a  
>>>> mistake.
>>>
>>> The overloading is only there because assumptions have been made that
>>> annotations will be strings.  This assumption was okay previously
>>> because
>>> the Bio::Annotation* modules were previously "non core" -- there was  
>>> no
>>> unified annotation system in bioperl.  Now these modules are being  
>>> made
>>> core, and this is part of the growing pain.
>>>
>>> I'm doing what I can to address the bug reports related to these
>>> changes
>>> as they come in, and I don't think anyone will disagree that I'm  
>>> doing
>>> so
>>> in a timely manner.  However, I cannot fix bugs or field questions on
>>> biosql modules and would appreciate some cooperation/assistance from
>>> the
>>> biosql developers.
>>>
>>>> This is part of the huge mess introduced when the SeqFeatureI
>>>> architecture was carelessly changed days before release. It's a
>>>> prototypical example for what not to do in a project that's as  
>>>> widely
>>>> used as bioperl.
>>>
>>> The SeqFeatureI changes were being gradually made in the 1-2 months
>>> prior
>>> to the 1.5 release.  The release was, may I remind you, a *developer*
>>> release and not expected to be bug free.
>>>
>>>> *Every single bit* of those changes need to be rolled back from the
>>>> release and if nobody else has done it by then I will do so in two
>>>> weeks.
>>>
>>> Fine for the 1.5.1 branch, although I don't agree that this should be
>>> done
>>> on the main trunk.
>>>
>>> -Allen
>>>
>>>
>>>> 	-hilmar
>>>>
>>>> On Thursday, March 10, 2005, at 05:57  PM, Allen Day wrote:
>>>>
>>>>> I'm unable to test the code in PersistentObject.pm as I don't have
>>>>> biosql
>>>>> set up, but you might try adding this to Reference.pm
>>>>>
>>>>>   use overload 'ne' => sub { "$_[0]" ne "$_[1]" }
>>>>>
>>>>> Please let me know if this fixes your error and I'll add this 'ne'
>>>>> overload to all the Bio::Annotation::* classes on HEAD.
>>>>>
>>>>> -Allen
>>>>>
>>>>>
>>>>> On Wed, 9 Mar 2005, Daniel Lang wrote:
>>>>>
>>>>>> Hi,
>>>>>> I'm retrieving seq objects from a local biosql db (using the  
>>>>>> latest
>>>>>> cvs
>>>>>> version of bioperl-db) and e.g. writing them with SeqIO. After
>>>>>> changing
>>>>>> from a cvs version ~ 12/04 to RC1.5 or latest cvs version, I get  
>>>>>> the
>>>>>> following error:
>>>>>>
>>>>>> Operation `ne': no method found, left argument in overloaded
>>>>>> package Bio::Annotation::Reference, right argument has no
>>>>>> overloaded magic at
>>>>>> /usr/lib/perl5/site_perl/5.6.1/Bio/DB/Persistent/PersistentObject.pm
>>>>>> line 534, <GEN1> line 1.
>>>>>>
>>>>>> The module PersistentObject.pm hasn't changed and in Reference.pm
>>>>>> there
>>>>>> is only this change:
>>>>>>
>>>>>> diff bioperl-live-Dec04/Bio/Annotation/Reference.pm
>>>>>> bioperl-live/Bio/Annotation/Reference.pm
>>>>>> 1c1
>>>>>> < # $Id: Reference.pm,v 1.21 2004/08/19 20:13:32 lapp Exp $
>>>>>> ---
>>>>>>> # $Id: Reference.pm,v 1.22 2005/02/02 22:13:22 allenday Exp $
>>>>>> 56c56,57
>>>>>> < # use overload '""' => \&as_text;
>>>>>> ---
>>>>>>> use overload '""' => sub { $_[0]->title || ''};
>>>>>>> use overload 'eq' => sub { "$_[0]" eq "$_[1]" };
>>>>>>
>>>>>> I've reversed this, but no positive result - the error remains...
>>>>>> Any hints?
>>>>>>
>>>>>> Thanks in advance,
>>>>>> Daniel
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>> -- 
>> -------------------------------------------------------------
>> Hilmar Lapp                            email: lapp at gnf.org
>> GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
>> -------------------------------------------------------------
>>
>>
>>
>>
>
>
>
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


From mail2doreen at gmx.de  Sun Mar 13 06:54:49 2005
From: mail2doreen at gmx.de (mail2doreen@gmx.de)
Date: Sun Mar 13 06:50:13 2005
Subject: [Bioperl-l] gff2genbank
Message-ID: <29912.1110714889@www58.gmx.net>

Hello,

I convert a GFF file into GenBank format using

Bio::Tools::GFF
Bio::SeqIO

but the CDS features in the output are not joined:

CDS             complement(17262..17813)
                     /gene_id="899.t00001"
                     /transcript_id="899.m00029"
CDS             complement(17879..18174)
                     /gene_id="899.t00001"
                     /transcript_id="899.m00029"

This is what I need:

CDS             complement(join(17262..17813,17879..18174))

How can I solve this problem?

Greetings


From mingyi.liu at gpc-biotech.com  Sun Mar 13 10:03:41 2005
From: mingyi.liu at gpc-biotech.com (Mingyi Liu)
Date: Sun Mar 13 09:59:43 2005
Subject: [Bioperl-l] 
	Porting Entrez Gene parser to Biojava, Biopython, Biophp, even C++
In-Reply-To: <F2C643E2-937B-11D9-B647-000A959EB4C4@gmx.net>
References: <F2C643E2-937B-11D9-B647-000A959EB4C4@gmx.net>
Message-ID: <4234564D.7010906@gpc-biotech.com>

I forgot to mention another advantage: a small, purely regex-based 
parser is very easy to port to any language that supports Perl-style 
regular expressions, such as Java, Python, PHP, or C++ with PCRE (used by 
PHP and Python). 

There could potentially be a performance hit for any Perl parsers ported 
into those languages, mainly because AFAIK they lack full support for 
all the Perl regex modifiers.  So unless I missed something, we'd have 
to either code some modifier logic in the program or use string 
replacement.  Nevertheless, the smaller the parser, the less pain in 
porting.

Just some more cents (and advocation) :)

From dalke at dalkescientific.com  Sun Mar 13 14:07:55 2005
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Sun Mar 13 14:03:01 2005
Subject: [Bioperl-l]  Porting Entrez Gene parser to Biojava, Biopython,
	Biophp, even C++
In-Reply-To: <4234564D.7010906@gpc-biotech.com>
References: <F2C643E2-937B-11D9-B647-000A959EB4C4@gmx.net>
	<4234564D.7010906@gpc-biotech.com>
Message-ID: <97e5143a0b434ab721cd023556a1a5b2@dalkescientific.com>

Mingyi Liu wrote:
> I forgot to mention another advantage of having a purely regex based 
> small parser means very easy porting into any language that supports 
> perl styled regular expressions, like Java, Python, PHP, C++ with PCRE 
> (used by php and python).

I developed Martel ( http://www.dalkescientific.com/Martel/ ) to
do just this sort of thing - describe a typical bioinformatics
file format as a set of declarations instead of as a set of code.

It works but turns out to be hard to maintain.  Here's a
list of problems I came up with:

   - regexps are hard to write and debug
       Could be improved with some sort of development/
       testing environment

   - Martel's grammars are hard to edit
       When a grammar changes it's not possible to say "the
       new format is the old format but change this one
       bottom level node".  I'm actually considering
       switching over to a DOM-style description of the
       tree so I can use XSLT as the editing language.
       Except that I think XSLT's grammar is clumsy and ugly.

   - Martel needs everything in memory
       I implemented a hack to parse a record at a time but
       it's a hack and fails (except on large memory machines)
       for people who want to read a chromosome at a time.
       I would also like it to be feed based instead of
       pull based.

I found that normal regular expressions weren't quite powerful
enough to handle the format, so I needed to implement a new
feature for file formats that include a count N, the number
of records, followed by N repeats of those records.

When I wrote my grammars I did so in strict mode, and reported
a bunch of errors to the database providers.  The advantage
is that wrong formats aren't accidentally parsed.  The disadvantage
is that minor changes break the parser.

I don't see any solution to this other than having someone
track the file formats over time.

> There could potentially be performance hit to any perl parsers
> ported into those languages.  Mainly because AFAIK there is a
> lack of full support for all the modifiers for Perl regex, so
> unless I missed something, we'd have to either code some modifier
> logic in the program or use string replacement.

I looked at the regexps.  The ones that Python doesn't
support are \G and the compilation flags /cg .  They won't
be in Python because the start/end positions are available
as local variables and not as implicit globals.  It
uses a different stylism.
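
To make the difference concrete, here is a minimal sketch of position-based scanning in Python. It is not code from either parser: the token pattern and the function name are made up for illustration. A compiled pattern's match() method takes an explicit start position, which plays the role of \G with /gc in Perl.

```python
import re

# Hypothetical token pattern for illustration only; not the real Entrez Gene grammar.
TOKEN = re.compile(r'\s*([{},]|"(?:[^"]|"")*"|[\w-]+)')

def tokens(text):
    """Yield (start, token) pairs, resuming each match where the last one ended."""
    pos = 0
    while pos < len(text):
        m = TOKEN.match(text, pos)   # anchored at pos, like \G in Perl
        if m is None:
            break                    # trailing whitespace or unrecognized input
        yield m.start(1), m.group(1)
        pos = m.end()                # explicit position instead of an implicit global

print(list(tokens('geneid 1, status live')))
```

Because the position is an ordinary local variable, nothing is stored on the pattern or the string between calls.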

Years ago I did some timing tests for parsing SWISS-PROT
records using a large number of parsers (~20).  I found
a wide range of timings, from 1 minute to 40 minutes.
The diversity is because there are many different types
of things that might be done with a file.  If the task
is simple ("how many records are in this file?") then a
simple parser is all that's needed.

http://biopython.org/pipermail/biopython/2001-January/000472.html
http://biopython.org/pipermail/biopython-dev/2001-January/000257.html

The first of these lists some tasks that can't be done
with your approach, like being able to index all the
records in a file by byte position.

Parsers can also get better performance by assuming the
file format is correct.  E.g., your EntrezGene.pm doesn't
detect if the file was truncated (I fed it only the first
1000 lines of the human genome file) while the context-free
parsers you have will at least generate an error that
the parentheses are unbalanced.

One thing I note, investigating a question of Hilmar's,
is that your tokenization of strings isn't quite complete.
Double-quoted "strings" that contain a double quote are
escaped ""with doubled"" double quotes.  Your tokenizer
doesn't convert the double quotes into single ones.  My
Martel code has the same problem.  It needed another
layer to describe how to unescape strings and handle
word spilling.

> Just some more cents (and advocation) :)

This email too is advocation.  I like the idea of having
one set of format definitions that can be shared
across the different code bases.  It's proved rather
difficult and tedious to implement.  I hope that
some of my experience will help you or the next
person working on the problem.

					Andrew Dalke
					dalke@dalkescientific.com

From mingyi.liu at gpc-biotech.com  Sun Mar 13 16:44:57 2005
From: mingyi.liu at gpc-biotech.com (Mingyi Liu)
Date: Sun Mar 13 16:41:36 2005
Subject: [Bioperl-l]  Porting Entrez Gene parser to Biojava, Biopython,
	Biophp, even C++
In-Reply-To: <97e5143a0b434ab721cd023556a1a5b2@dalkescientific.com>
References: <F2C643E2-937B-11D9-B647-000A959EB4C4@gmx.net>
	<4234564D.7010906@gpc-biotech.com>
	<97e5143a0b434ab721cd023556a1a5b2@dalkescientific.com>
Message-ID: <4234B459.7020109@gpc-biotech.com>

Andrew Dalke wrote:

> When I wrote my grammars I did so in strict mode, and reported
> a bunch of errors to the database providers.  The advantage
> is that wrong formats aren't accidently parsed.  The disadvantage
> is that minor changes break the parser.
>
> I don't see any solution to this other than having someone
> track the file formats over time.
>
Sure. If there are arbitrary and drastic changes to the file format, 
someone must be watching for them.  But one of my points was that my 
parser would likely stay valid even if NCBI changes their data 
definitions, because it's very unlikely that NCBI will change the file 
structure/format even when they change data definitions (recall that I 
said my parser doesn't care about data content?).

> I looked at the regexps.  The ones that Python doesn't
> support are \G and the compilation flags /cg .  They won't
> be in Python because the start/end positions are available
> as local variables and not as implicit globals.  It
> uses a different stylism.
>
You're right.  The /cg modifiers are exactly the ones I was talking 
about.  \G is actually supported by PCRE, so very likely in Python too 
since Python uses PCRE (please check again).  Nonetheless, without /cg, 
\G means little.  That's why I said there's gonna be a performance hit.

> The first of these lists some tasks that can't be done
> with your approach, like being able to index all the
> records in a file by byte position.
>
Not really.  If you really want those, my parser code can be easily 
modified to record the file byte position of each token.

> Parsers can also get better performance by assuming the
> file format is correct.  Eg, your EntrezGene.pm doesn't
> detect if the file was truncated (I fed it only the first
> 1000 lines of the human genome file) while the context-free
> parsers you have will at least generate an error that
> the parenthesis are unbalanced.

Yeah, my parser does not give many warnings at the current stage.  I 
certainly wouldn't mind someone taking my code and adding exception 
handling.  But frankly many parsers do not excel in this department.  
Even some XML parsers only warn when something breaks the parser.

>
> One thing I note, investigating a question of Hilmar's,
> is that your tokenization of strings isn't quite complete.
> Double-quoted "strings" that contain a double quote are
> escaped ""with doubled"" double quotes.  Your tokenizer
> doesn't convert the double quotes into single ones.  My
> Martel code has the same problem.  It needed another
> layer to describe how to unescape strings and handle
> word spilling.
>
You caught me.  I was just being lazy - I noticed this a while ago, but 
decided to delay a bit since I have 4 different parsers that need to be 
modified.  Then I forgot. (It's probably my fault: I actually remembered 
this last night too, and uploaded the files anyway because it's so 
simple for anybody to fix.)

I'd say you're really exaggerating when you say my tokenization of 
strings isn't complete based on this.  Not unescaping the "" escape has 
nothing to do with tokenization (it's a post-processing step after 
tokenization).  It simply takes one simple regex to fix it, no other 
layer needed.
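
As a rough illustration of how small that post-processing step is, here is a sketch in Python. The function name is hypothetical, and it assumes the token still carries its surrounding double quotes; it is not the code from the released parsers.

```python
import re

def unquote(token):
    """Strip the outer quotes from a string token and collapse "" into "."""
    if len(token) >= 2 and token[0] == '"' and token[-1] == '"':
        token = token[1:-1]          # drop the delimiters first
    return re.sub(r'""', '"', token) # then undo the doubled-quote escape

print(unquote('"a ""quoted"" word"'))
```

The order matters: stripping the delimiters before the substitution keeps an empty string token ("") from being turned into a single quote.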

Thanks for your suggestions.  I think problems specific to Martel might 
not apply in this case, since the Entrez Gene file structure/format is 
really simple and likely to stay very stable.  That's why I was 
proposing sharing this code base across languages.

Thanks,

Mingyi

From mingyi.liu at gpc-biotech.com  Sun Mar 13 17:47:08 2005
From: mingyi.liu at gpc-biotech.com (Mingyi Liu)
Date: Sun Mar 13 17:42:58 2005
Subject: [Bioperl-l] Entrez Gene parsers updated
In-Reply-To: <97e5143a0b434ab721cd023556a1a5b2@dalkescientific.com>
References: <F2C643E2-937B-11D9-B647-000A959EB4C4@gmx.net>
	<4234564D.7010906@gpc-biotech.com>
	<97e5143a0b434ab721cd023556a1a5b2@dalkescientific.com>
Message-ID: <4234C2EC.1020805@gpc-biotech.com>

Hi, Andrew,

Thanks to your quick spotting of the unescaping "" issue, I realized that 
unless I fix it, my parsers are gonna be tagged "incomplete" :-) .  So I 
just released version 1.01, which fixes this.  I guess I can't get away 
with the early-Linux/OSS-developer mentality - "I've done the hard work, 
so for the small things like documentation or fixing tiny bugs in open 
source software, it's the user's responsibility", especially when I 
myself really dislike that attitude. :-)

Thanks again & best,

Mingyi

From dalke at dalkescientific.com  Sun Mar 13 19:34:11 2005
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Sun Mar 13 19:29:13 2005
Subject: [Bioperl-l]  Porting Entrez Gene parser to Biojava, Biopython,
	Biophp, even C++
In-Reply-To: <4234B459.7020109@gpc-biotech.com>
References: <F2C643E2-937B-11D9-B647-000A959EB4C4@gmx.net>
	<4234564D.7010906@gpc-biotech.com>
	<97e5143a0b434ab721cd023556a1a5b2@dalkescientific.com>
	<4234B459.7020109@gpc-biotech.com>
Message-ID: <8a66c16373c43a36b13b98acda3288a3@dalkescientific.com>

Mingyi Liu wrote:
> Sure. If there's arbitrary and drastic changes to file format, there 
> must be someone watching the change .  But one of my points was that 
> my parser would likely stay valid even if NCBI changes their data 
> definitions because it's very unlikely that NCBI changes their file 
> structure/format,

Ah, I was mixing two topics - using this set of regexps to parse
this file format and the general topic of using regexps portably
to parse a range of file formats.

> \G is actually supprted by PCRE, so very likely in Python too since 
> Python uses PCRE (please check again).  Nonetheless, without /cg, \G 
> means little.  That's why I said there's gonna be a performance hit.

Python used to use pcre but that was replaced with sre some years
back, in part to support Unicode-based regexps.

It looks like Java's java.util.regex does support the \G
anchor, according to
   http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/Pattern.html

Personally I don't like the lack of thread safety because
that value depends on previous interactions of the pattern.
I think perl solved it by making those values be thread local,
but I'm not sure.

>> The first of these lists some tasks that can't be done
>> with your approach, like being able to index all the
>> records in a file by byte position.
>>
> Not really.  If you really want those, my parser code can be easily 
> modified to record the file byte position of each token.

The code I looked at took a string and there was outer
scaffolding to identify the record locations.

   my $parser = GI::Parser::EntrezGene->new();
   open(IN, "Homo_sapiens") || die "...";
   $/ = "Entrezgene ::= {";
   while(<IN>)
   {
     chomp;
     next unless /\S/;

     my $text = (/^\s*Entrezgene ::= ({.*)/si)? $1 : "{" . $_;
     my $value = $parser->parse($text, 2);
     # ... do something with $value ...
   }

The actual record extraction was not part of the EntrezGene
library so I don't see what you could modify.  Perhaps add
an "offset" field to the parse method?
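
For comparison, a sketch of what record-level indexing by byte position could look like, written in Python. The function names are hypothetical, the record separator is the one from the Perl snippet above, and two simplifications are assumed: the whole file fits in memory as bytes, and the separator text never occurs inside a quoted string.

```python
import re

# Record separator taken from the Perl snippet above.
RECORD_START = re.compile(rb'Entrezgene ::= \{')

def record_offsets(data):
    """Return the byte offset at which each record starts in a bytes buffer."""
    return [m.start() for m in RECORD_START.finditer(data)]

def records(data):
    """Yield (offset, record_bytes) pairs by slicing between successive starts."""
    starts = record_offsets(data)
    for begin, end in zip(starts, starts[1:] + [len(data)]):
        yield begin, data[begin:end]

sample = b'Entrezgene ::= {\n a 1\n}\nEntrezgene ::= {\n b 2\n}\n'
print(record_offsets(sample))
```

Each offset could be stored in an index and later passed to seek() to pull out one record without reading the rest of the file.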

If you do get the byte positions of terms in the ASN.1
(e.g. to report "syntax error at line 1234 column 56") then
you would need to use the $` and $' variables, which perlvar
warns are slow, so your timings would change.

> Yeah, my parser does not give much warnings at current
> stage.  I certainly wouldn't mind someone taking my code
> and add exception handling.  But frankly many parsers do
> not excel in this department.  Even some XML parsers only
> warn when something breaks the parser.

Sadly the fun part for most people is making the parser
work correctly with correct data.  Few people like making
parsing code correctly handle incorrect data.  Hence all
the parsers which "do not excel in this department."


> You caught me.  I was just being lazy - I noticed this a while ago, 
> but decided to delay a bit since I have 4 different parsers that need 
> to be modified.
   ...
> I'd say you're really exaggerating when you said my tokenization of 
> string isn't complete based on this.

There are several layers to parsing.  One is identifying the lexical
components, which can be done with regular expressions.  The lexer
should convert these into tokens that the parser can use, which
may include things like unescaping quotes, concatenating strings,
normalizing different numeric representations (0xa == 10 == 012
  -> the integer 10).
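
The numeric case can be sketched in a few lines of Python. This is an illustration of the normalization step only, with a hypothetical function name, and it assumes C-style, non-negative literals in which a leading 0x means hexadecimal and a leading 0 means octal.

```python
def normalize_int(literal):
    """Convert a C-style integer literal (hex, octal, or decimal) to an int."""
    text = literal.strip().lower()
    if text.startswith('0x'):
        return int(text, 16)     # int() accepts the 0x prefix when base is 16
    if len(text) > 1 and text.startswith('0'):
        return int(text, 8)      # leading zero means octal in this convention
    return int(text, 10)

print(normalize_int('0xa'), normalize_int('10'), normalize_int('012'))
```

All three spellings come out as the same integer, so the parser downstream never sees the difference.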

I don't actually know how to distinguish between these two
parts of the lexer.  One is the LHS of the pattern definition
and the other is the result of applying the RHS actions to the
matched components.  If the actions were a null-op then there
is no difference.

Your parser though doesn't return a token stream, it returns
a parse tree, so you've already passed the step where any
sort of data conversion / normalization should take place.

But if you define that your parse tree returns the raw text
representation then it is complete.  My question - which I
haven't been able to resolve for Martel - is how should code
like this, which tries to be cross-platform, handle what
is semantically one item when it's represented as multiple
components in the input format?

Here are two examples to show how tricky that is:

      url "http://www.ncbi.nlm.nih.gov/sutils/evv.cgi?taxid=9606&conti
g=NT_009714.16&gene=A2M&lid=2&from=1979284&to=2027463"

       text "There is a significant genetic association of the 5 bp deletion
  and two novel polymorphisms in alpha-2-macroglobulin alpha-2-macroglobulin
  precursor with AD",

In the first the "\n" should be removed while in the second
it should be replaced with a space.

It would be nice if this behavior was also the same cross-platform.

>   Not unescaping the "" escape has nothing to do with tokenization 
> (it's a post-processing step after tokenization).  It simply take one 
> simple regex to fix it, no other layer needed.

It's post tokenization and pre parse tree assembly.  For this
case it's a simple regexp search/replace but 1) how is that handled
in a cross platform manner and 2) for the general problem it's
not as simple as a regexp.


> Thanks for your suggestions.  I think problems specific to Martel 
> might not apply in this case since Entrez Gene file structure/format 
> is really simple, and they are likely to stay very stable.  That's why 
> I was proposing sharing this code base across languages.

Indeed some of the problems don't apply.  But speaking solely for
myself and not for the Biopython project I would rather use a
validating parser that reported at least imbalanced parens,
roughly equivalent to checking for well-formed XML.

One question I have is that while I know the file format is stable,
given that it's based on ASN.1, what are the chances of new tags
being added which are still valid ASN.1 but which are not yet
present in the existing files?

For example, in reading the ASN.1 spec at
  http://asn1.elibel.tm.fr/en/standards/index.htm#x680
I see that ASN.1 could include a real number but the
Homo_sapiens file doesn't have one and your parser doesn't
handle it (it looks for [\w-]).  Mmm, and there are many
more data types in full ASN.1.

As far as I can tell, if NCBI does add a new data type that
your code doesn't support then it's very hard to tell that
the code is ignoring problems.

Consider a floating point date value (not legal according to
NCBI but legal ASN.1, I think; just testing the idea):

   track-info {
     geneid 1,
     status live,
     create-date std {
       year 2003.43,
       month 8,
       day 28,
       hour 20,
       minute 30,
       second 0
     },


Your code converts that into

     'track-info' => [
              {
               'geneid' => '1',
               'create-date' => [
                      {
                       'std' => [
                            {
                             'year' => [
                                    {
                                     '2003' => [
                                         undef
                                         ]
                                     }
                               ]
                             }
                           ]
                      }
                   ],
               'status' => 'live'
           }
       ]

That doesn't seem like the right behavior.


BTW, looking at what you do, I don't understand why you handle
the explicit types fields as you do.  Why does

           tag id 9606
turn into
          'tag' => [
             {
               'id' => '11'
             }
           ],

As far as I can tell there's only a single data type
there so what about omitting the list reference?

          'tag' => {
               'id' => '11'
             },

But I don't know enough about ASN.1.


					Andrew
					dalke@dalkescientific.com

From mingyi.liu at gpc-biotech.com  Sun Mar 13 21:44:57 2005
From: mingyi.liu at gpc-biotech.com (Mingyi Liu)
Date: Sun Mar 13 21:40:53 2005
Subject: [Bioperl-l]  Porting Entrez Gene parser to Biojava, Biopython,
	Biophp, even C++
In-Reply-To: <8a66c16373c43a36b13b98acda3288a3@dalkescientific.com>
References: <F2C643E2-937B-11D9-B647-000A959EB4C4@gmx.net>	<4234564D.7010906@gpc-biotech.com>	<97e5143a0b434ab721cd023556a1a5b2@dalkescientific.com>	<4234B459.7020109@gpc-biotech.com>
	<8a66c16373c43a36b13b98acda3288a3@dalkescientific.com>
Message-ID: <4234FAA9.9050102@gpc-biotech.com>

Andrew Dalke wrote:

> Python used to use pcre but that was replaced with sre some years
> back, in part to support Unicode-based regexps.
>
I see.  Doesn't matter anyway.  I do want to note that this \G /cg is 
purely for parser efficiency, so s/// would work just fine, except that 
it would be at least an order of magnitude slower with large Entrez Gene 
records.  So just as I said, porting is fine, but performance will take 
a hit.  Then again, any parser relying on regex would need \G /cg for 
performance, and would take the same hit when ported over.

> The code I looked at took a string and there was outer
> scaffolding to identify the record locations.
>
> The actual record extraction was not part of the EntrezGene
> library so I don't see what you could modify.  Perhaps add
> an "offset" field to the parse method?
>
Seems what you're looking for in a parser is a do-it-all text 
processor.  It parses, it indexes, and it adapts (read on for my comment 
on this one).  But I strictly said my parser is a parser only.  Now with 
that out of the way, let me address your question: yes, since my parser 
is a parser only, if you want to use it for indexing purposes, then 
you'd have to keep the position in outer scaffolding or custom programs, 
and make simple changes like calling the pos function after token 
generation to record the position of the token in the input string (a 
truncated Entrez Gene record).  It's all doable, but I just wouldn't put 
the indexing code into a parser.

> If you do get the byte positions of terms in the ASN.1
> (eg to report "syntax error at line 1234 column 56") then
> you would need to use the $` and $' fields, which perlvar
> warns is slow, so your timings would change.

Yeah, I know. If my parser tries to do more, sure it'd get slower. ;-)

> There are several layers to parsing.  ...
>
> But if you define that your parse tree returns the raw text
> representation then it is complete.  My question - which I
> haven't been able to resolve for Martel - is how should code
> like this, which tries to be cross-platform, handle what
> is semantically one item when it's represented as multiple
> components in the input format?
>
> Here are two examples to show how tricky that is
>
>      url "http://www.ncbi.nlm.nih.gov/sutils/evv.cgi?taxid=9606&conti
> g=NT_009714.16&gene=A2M&lid=2&from=1979284&to=2027463"
>
>       text "There is a significant genetic association of the 5 bp 
> deletion
>  and two novel polymorphisms in alpha-2-macroglobulin 
> alpha-2-macroglobulin
>  precursor with AD",
>
> In the first the "\n" should be removed while in the second
> it should be replaced with a space.
>
> It would be nice if this behavior was also the same cross-platform.
>
I think the phrase you were looking for instead of "what is semantically 
one item when it's represented as multiple components in the input 
format?" is simply "context-sensitive rules".  Context-sensitivity can 
be cross-platform, but my parser does not need to deal with it (note 
that how to replace the "\n" really is the user's preference and none of 
the parser's business. You might want to replace the 2nd one with a 
space, but another person might want it replaced with "<br>"). Even if 
you find a better example, I could suggest you look at my 
Parse::RecDescent based parser, since Parse::RecDescent allows 
context-sensitive grammars.  Also, coding context-sensitivity in regex 
is not that hard, but you do need to have a well-defined set of 
scenarios and rules.

>
> It's post tokenization and pre parse tree assembly.  For this
> case it's a simple regexp search/replace but 1) how is that handled
> in a cross platform manner 

My parser is regex based.  Any change in the Perl parser could be 
reflected in other languages (I still prefer "languages" to "platforms" 
though, since this is really the point.  My parsers are already 
cross-platform: they're supported by any platform that supports Perl).  
Some changes could be needed, for example for unsupported modifiers, but 
you wouldn't expect porting across languages to require no work from 
developers, right?  What needs to be done should be determined on a 
case-by-case basis.  I can't think of a generic response that is a 
panacea for all porting cases.

> and 2) for the general problem it's
> not as simple as a regexp.
>
Exactly.  If you read my comments on my parsers, I mentioned that when 
things get more complex, use those grammar-based tools instead.  Right 
now, for Entrez Gene, regex works and it works best, that's why I mostly 
talk about this one.  But you're very welcome to check other ones out 
for completeness.

> Indeed some of the problems don't apply.  But speaking solely for
> myself and not for the Biopython project I would rather use a
> validating parser that reported at least imbalanced parens,
> roughly equivalent to checking for well-formed XML.

Of course.  I could suggest that such checking can easily be added to my 
parser, with one variable tracking depth - that's all that's needed 
since Entrez Gene only has one type of block delimiter.  I'll probably 
do it when I have time next week since it's only 3 lines of code or so.  
But then again, I start to realize that you would rather use some other 
parser anyway.
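
The depth-tracking check described above could look roughly like the 
following minimal sketch.  It is not code from the parser under 
discussion: the sub name brace_error is made up for illustration, and it 
assumes that "" is the only escape used inside a quoted string.

```perl
use strict;
use warnings;

# Returns undef when '{' and '}' balance, otherwise a short error message.
sub brace_error {
    my ($text) = @_;

    # Drop quoted strings first so braces inside values are not counted
    # (assumption: a doubled "" is the only escape inside a quoted string).
    (my $bare = $text) =~ s/"(?:[^"]|"")*"//g;

    my $depth = 0;    # the single variable tracking nesting depth
    while ($bare =~ /([{}])/g) {
        $depth += $1 eq '{' ? 1 : -1;
        return "unexpected '}'" if $depth < 0;
    }
    return $depth ? "$depth unclosed '{'" : undef;
}

for my $sample ('{ a { b 1 } }', '{ a "x}" }', '{ a { b 1 }') {
    my $err = brace_error($sample);
    print "$sample : ", (defined $err ? $err : 'balanced'), "\n";
}
```

A single counter is enough here only because there is one kind of block 
delimiter; a format with several delimiter types would need a stack.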

>
> For example, in reading the ASN.1 spec at
>  http://asn1.elibel.tm.fr/en/standards/index.htm#x680
> I see that ASN.1 could include a real number but the
> Homo_sapiens file doesn't have one and your parser doesn't
> handle it (it looks for [\w-]).  Mmm, and there are many
> more data types in full ASN.1.
>
Mmm, you really tried hard to let me know that my parser can not do it 
all.  ;-) Well, read on for my response.

> As far as I can tell, if NCBI does add a new data type that
> your code doesn't support then it's very hard to tell that
> the code is ignoring problems.

Good point. I'll add one line in the _parse function to do a catch-all 
error reporting.

>
> Consider a floating point date value (not legal according to
> NCBI but legal ASN.1. .. I think - just testing the idea)
>   ...
>       year 2003.43,
>  ...
>
> Your code converts that into
> ...
>                                     '2003' => [
>                                         undef
>                                         ]
> ...
> That doesn't seem like the behavior it should do.
>
Well, your point that my parser is not a general ASN.1 parser is well 
taken, especially since I never claimed it to be one.  If you're looking 
for an ASN.1 perl parser, I heard that on the mailing list someone 
already made one, and it could be of help to you.

>
> BTW, looking at what you do, I don't understand why you handle
> the explicit types fields as you do.  Why does
>
>           tag id 9606
> turn into
>          'tag' => [
>             {
>               'id' => '11'
>             }
>           ],
>
> As far as I can tell there's only a single data type
> there so what about omitting the list reference?
>
>          'tag' => {
>               'id' => '11'
>             },
>
> But I don't know enough about ASN.1.
>
This has nothing to do with ASN. It is all about how uniform the data 
structure could be.  In fact, consider when NCBI decides to do
{
  tag id 12345,
  tag str "whatever"
}
which is far more possible than the cases you considered in earlier 
criticisms, then the data structure would need to become:

        'tag' => [
            {
              'id' => '12345',
              'str' => 'whatever'
            }
          ],
With your suggested approach, this would force the user to test what 
type of reference $hash{'tag'} is before dealing with it either as a 
hash or an array.  With my approach, user always knows to deal with it 
as an array.  This is also exactly the reason (I guess) why XML::Simple 
has option 'ForceArray', if you recall.

Now the promised response to the criticism that my parser doesn't do: 1. 
Indexing of EntrezGene file. 2. Adaptive behavior when new format comes 
out. 3. (semi-?)Automatic cross-language porting. 4. Full support for 
ASN.1 parsing.

It's really simple, in case you didn't already know: my parser is just an 
Entrez Gene parser.  It is not designed to do those things.  You really 
went out of your way to show me that my parser simply doesn't do 
everything, but failed to show me why my parser cannot be a 
reasonable Entrez Gene parser, which is your main point.  Also I don't 
understand why you dismiss my parser right away as a candidate for 
porting to other languages when I could address your valid concern next 
week with a few lines.  Why?  I can understand that you were possibly 
offended by my may-seem-naive enthusiasm of thinking about the prospect 
of porting this fast parser to other languages.  But I was pretty happy 
with the parser I made, simply because:

1. Plenty of people say they have a parser working for Entrez Gene, 
but probably due to various reasons like IP issues or specific 
projects, no one has posted one yet (at least I couldn't find one 
after plenty of searching).  Mine's the first one I could find 
that's publicly available and in Perl.
2. My parser is so short, and not written in guru-style (since I'm far 
from a Perl guru), so it's easy to understand.
3. It's OO with pod and example scripts, so very easy to use.
4. Most importantly, it's freakishly fast without making mistakes with 
the NCBI Entrez Gene downloads.

My enthusiasm is based on the belief that there's not a Perl parser out 
there that's better than mine overall when points 2-4 are considered.  
And point 1 is just a trump card.  I thought it'd be helpful to many who 
want to get a GPL-ed Entrez Gene parser.

Nonetheless, if you just don't want to use my parser, you can simply say 
so (or tell me why it doesn't work as a portable Entrez Gene parser).  
Frankly, reading your emails, initially I was glad that we had a useful 
discussion on parsers, but the endless picking on the progressively 
absurd tasks for an Entrez Gene parser to do (like it's unable to index, 
adapt to arbitrary changes, auto-port, parse full ASN.1 specifications) 
just really changed my opinion, particularly because I doubt anyone 
using any language would be looking for those in an Entrez Gene parser.  
Again, FYI, it's only a parser, and I repeatedly said it's only a parser 
that only constructs a data structure.

But I certainly welcome good suggestions, and I'll add some basic error 
reporting next week.  I didn't think it was needed since again, I 
already parsed and checked results on human, mouse and rat.  But it's 
still a good idea & thanks for the suggestion!  If someday you work out 
a fast parser and/or one that does it all in either python or perl, I'd 
like to know too.  I'm always thrilled to learn useful things.

Thanks,

Mingyi

BTW, I realized that I was a bit overly broad in my last email in my 
criticism of the early attitude that users have to do work to use their 
software.  I should say it's just some of the early software that gave 
such an impression; even though it's only a few, the impression could be 
big.  If that's what's thrown you off, I apologize.

From mingyi.liu at gpc-biotech.com  Sun Mar 13 22:26:35 2005
From: mingyi.liu at gpc-biotech.com (Mingyi Liu)
Date: Sun Mar 13 22:22:45 2005
Subject: [Bioperl-l]  Porting Entrez Gene parser to Biojava, Biopython,
	Biophp, even C++
In-Reply-To: <4234FAA9.9050102@gpc-biotech.com>
References: <F2C643E2-937B-11D9-B647-000A959EB4C4@gmx.net>	<4234564D.7010906@gpc-biotech.com>	<97e5143a0b434ab721cd023556a1a5b2@dalkescientific.com>	<4234B459.7020109@gpc-biotech.com>	<8a66c16373c43a36b13b98acda3288a3@dalkescientific.com>
	<4234FAA9.9050102@gpc-biotech.com>
Message-ID: <4235046B.7070002@gpc-biotech.com>

Mingyi Liu wrote

> This has nothing to do with ASN. It is all about how uniform the data 
> structure could be.  In fact, consider when NCBI decides to do
> {
>  tag id 12345,
>  tag str "whatever"
> }

oops, I really meant:
{
  tag id 12345,
  tag str "whatever",
  tag id 34567
}

I switched to str just as example but forgot that this renders my 
example incorrect.  So now the structure has to become:

      'tag' => [
           {
             'id' => '12345',
             'str' => 'whatever'
           },
           {
             'id' => '34567'
           }
         ]
or one that makes more sense
      'tag' => [
           {
             'id' => '12345'
           },
           {
             'str' => 'whatever'
           },
           {
             'id' => '34567'
           }
         ]
which is my approach.  Again, your approach would require users to test 
the reference type before dealing with the content, and to design two 
ways of dealing with it.  In my approach users always deal 
with it as an array: just one design and no reference testing needed.  If 
you read my comment for the data structure trimming function, you'll see 
some more consideration in this aspect.  It's still not perfect; I hope 
that's not too surprising and doesn't become a reason to dismiss my 
parser altogether. ;-)
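
The practical effect of always using an array reference can be seen in a 
small sketch.  The two records and the helper tag_values below are 
hypothetical and only illustrate the access pattern; they are not taken 
from the parser itself.

```perl
use strict;
use warnings;

# Hypothetical parsed records: 'tag' always holds an array reference,
# whether the source block had one entry or several.
my %single = ( tag => [ { id => '9606' } ] );
my %multi  = ( tag => [ { id => '12345' }, { str => 'whatever' }, { id => '34567' } ] );

# One code path serves both shapes; no ref() test is needed.
sub tag_values {
    my ($record) = @_;
    return map { values %$_ } @{ $record->{tag} };
}

print join(',', tag_values(\%single)), "\n";
print join(',', tag_values(\%multi)),  "\n";
```

With a structure that is sometimes a hash and sometimes an array, the 
same helper would first have to check ref($record->{tag}) and branch.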

Regards,

Mingyi

From ewijaya at singnet.com.sg  Sun Mar 13 22:04:40 2005
From: ewijaya at singnet.com.sg (Edward Wijaya)
Date: Mon Mar 14 02:50:00 2005
Subject: [Bioperl-l] Getting IC & Consensus with Bio::Matrix::PSM::SiteMatrix
In-Reply-To: <956ed650307183b2819321abc990543b@duke.edu>
References: <002b01c519bc$bce75370$6600a8c0@GOLHARMOBILE1>
	<956ed650307183b2819321abc990543b@duke.edu>
Message-ID: <op.snlxh2wvpncm2o@mail.singnet.com.sg>

Hi,

Why does my code below fail to return the IC values?
I thought the method was able to do that.
Is there anything I am missing here?

My second question is about the "consensus" method.
The consensus is generated by choosing the highest probability OR *N if  
prob is too low*

1. How do you define when the probability is *too low*?
2. What is the reasoning behind this implementation?
    e.g. Why does my code below give 'TANGTA' instead of "TATGTA"?

I find this particular module very useful.
I really wish I could make the best use of it.

Thanks so much for your time.
Hope to hear from you again.

---
Regards,
Edward WIJAYA
SINGAPORE


__BEGIN__

#!/usr/bin/perl -w
use strict;
use Data::Dumper;
use Bio::Matrix::PSM::SiteMatrix;

      #Frequency matrix
      my  @pA = (2,19,3,6,8,10);
      my  @pT = (7,3,6,2,20,5);
      my  @pC = (1,2,2,1,1,1);
      my  @pG = (3,1,1,9,8,7);


my %param =( -pA=>\@pA,-pC=>\@pC,-pG=>\@pG,-pT=>\@pT);
my $site=new Bio::Matrix::PSM::SiteMatrix(%param);

my $consensus = $site->consensus;
my $ic = $site->IC; #Why it fails here?


print Dumper $ic;
print Dumper $consensus;

__END__

From s0460205 at sms.ed.ac.uk  Mon Mar 14 03:17:22 2005
From: s0460205 at sms.ed.ac.uk (SG Edwards)
Date: Mon Mar 14 03:13:19 2005
Subject: [Bioperl-l] Full uniprot annotation extraction
Message-ID: <1110788242.423548925f306@sms.ed.ac.uk>

Hi,

I am parsing UniProt flat files and I need to extract as many of the lines as
possible for insertion into an RDBMS.

I use Bio::DB::SwissProt to get the major annotation (e.g. primary accession
number) but is there a way to get other annotation as well (e.g. date of the
last update)?


From tex at biocompute.net  Sun Mar 13 12:35:23 2005
From: tex at biocompute.net (James Thompson)
Date: Mon Mar 14 03:39:20 2005
Subject: [Bioperl-l] Getting IC & Consensus with
	Bio::Matrix::PSM::SiteMatrix
In-Reply-To: <op.snlxh2wvpncm2o@mail.singnet.com.sg>
Message-ID: <Pine.LNX.4.44.0503131217210.16978-100000@biosysadmin.com>

Edward,

1. There is no code in SiteMatrix (or any of the other Bio::Matrix::PSM modules
as far as I know) that calculates information content for you. It's assumed to
be provided as a parameter to the constructor rather than calculated by the
SiteMatrix object itself.

2. I don't know the exact reasoning behind this implementation for calculating
ambiguity, but here's the algorithm to calculate the consensus for an individual
position:

   - Take the frequencies for a given position, multiply them all by ten and divide
   by the total number of characters at that position. In your example for the third
   position, we would transform these numbers:
   { A => 3, T => 6, C => 2, G => 1 }

   into this set of numbers:
   { A => 2.5, T => 5, C => 1.667, G => 0.833 }

   - If none of these numbers are above the threshold (which defaults to 5),
   then return an N for this position.

This algorithm is in the _to_cons method of the Bio::Matrix::PSM::SiteMatrix module
if you'd like to take a peek.
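
The calculation for a single position, as described above, can be written 
out as a minimal sketch.  It follows only the description in this message 
and is not the module's _to_cons code, so the real method may differ in 
details; the helper names scaled_scores and call_position are made up 
for illustration.

```perl
use strict;
use warnings;

# Scale the raw counts for one position: multiply by ten and divide by
# the total number of characters at that position.
sub scaled_scores {
    my (%count) = @_;
    my $total = 0;
    $total += $_ for values %count;
    my %score;
    $score{$_} = $total ? 10 * $count{$_} / $total : 0 for keys %count;
    return %score;
}

# Call the position: the best-scoring base if it is above the threshold,
# otherwise N.
sub call_position {
    my ($threshold, %count) = @_;
    my %score = scaled_scores(%count);
    my ($best) = sort { $score{$b} <=> $score{$a} || $a cmp $b } keys %score;
    return $score{$best} > $threshold ? $best : 'N';
}

my %third = (A => 3, T => 6, C => 2, G => 1);
my %score = scaled_scores(%third);
printf "%s => %.3f\n", $_, $score{$_} for sort keys %score;
print call_position(5, %third), "\n";
```

For the third position T scales to exactly 5, which is not above the 
default threshold of 5, so this sketch returns N.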

I'll defer your other questions to Stefan and the rest of the list. :)

Cheers,

James Thompson


From skirov at utk.edu  Mon Mar 14 08:15:43 2005
From: skirov at utk.edu (Stefan Kirov)
Date: Mon Mar 14 08:16:35 2005
Subject: [Bioperl-l] Getting IC & Consensus with
	Bio::Matrix::PSM::SiteMatrix
In-Reply-To: <Pine.LNX.4.44.0503131217210.16978-100000@biosysadmin.com>
References: <Pine.LNX.4.44.0503131217210.16978-100000@biosysadmin.com>
Message-ID: <42358E7F.7020209@utk.edu>

Edward,
The rules for "too low" are:
single base probability>0.7; combination of two>0.8 and three>0.9 for 
IUPAC consensus and >0.5 for simple consensus. Actually you can 
recalculate the consensus by doing:
$matrix->_calculate_consensus(0.45) (naturally this will set the consensus 
threshold at 0.45). Probably I should document this, though generally 
speaking this method is for internal use only. However, if you do this 
and a position has (A=>0.46,C=>0.01,G=>0.48,T=>0.05), then you will get 
A in the consensus (which is obviously incorrect: it is just the first 
base to surpass the threshold). I can fix this, but do you really want 
to get in your consensus a position with probability less than 0.5? If 
you use IUPAC you will get H (A+T+C). We can easily add an IC 
calculation method if you really need it.
Please let me know if you have further questions.
Stefan
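
The first-base-over-threshold problem mentioned above can be avoided by 
picking the most probable base first and only then comparing it with the 
threshold.  A minimal sketch, not the SiteMatrix code; simple_call is a 
hypothetical helper and covers only the simple (non-IUPAC) consensus:

```perl
use strict;
use warnings;

# Return the most probable base if it exceeds the threshold, else 'N'.
sub simple_call {
    my ($threshold, %prob) = @_;
    my ($best) = sort { $prob{$b} <=> $prob{$a} || $a cmp $b } keys %prob;
    return $prob{$best} > $threshold ? $best : 'N';
}

my %position = (A => 0.46, C => 0.01, G => 0.48, T => 0.05);
print simple_call(0.45, %position), "\n";   # G, the most probable base
print simple_call(0.5,  %position), "\n";   # N, nothing exceeds 0.5
```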

-- 
Stefan Kirov, Ph.D.
University of Tennessee/Oak Ridge National Laboratory
5700 bldg, PO BOX 2008 MS6164
Oak Ridge TN 37831-6164
USA
tel +865 576 5120
fax +865-576-5332
e-mail: skirov@utk.edu
sao@ornl.gov

"And the wars go on with brainwashed pride
For the love of God and our human rights
And all these things are swept aside"

From brian_osborne at cognia.com  Mon Mar 14 08:20:12 2005
From: brian_osborne at cognia.com (Brian Osborne)
Date: Mon Mar 14 08:16:52 2005
Subject: [Bioperl-l] Full uniprot annotation extraction
In-Reply-To: <1110788242.423548925f306@sms.ed.ac.uk>
Message-ID: <GPENLDEIJJHJLHOAJBBPAEMNCCAA.brian_osborne@cognia.com>

SG,

You should take a look at the Feature and Annotation HOWTO
(http://bioperl.org/HOWTOs/Feature-Annotation).

You might also want to consider using bioperl-db, it has scripts that load
sequences into a BioSQL database (Oracle, MySQL, PostgreSQL). This package is
available at http://bioperl.org/Core/Latest/index.shtml.


Brian O.

_______________________________________________
Bioperl-l mailing list
Bioperl-l@portal.open-bio.org
http://portal.open-bio.org/mailman/listinfo/bioperl-l


From ewijaya at singnet.com.sg  Mon Mar 14 04:26:05 2005
From: ewijaya at singnet.com.sg (Edward Wijaya)
Date: Mon Mar 14 09:12:37 2005
Subject: [Bioperl-l] Getting IC & Consensus with
	Bio::Matrix::PSM::SiteMatrix
In-Reply-To: <42358E7F.7020209@utk.edu>
References: <Pine.LNX.4.44.0503131217210.16978-100000@biosysadmin.com>
	<42358E7F.7020209@utk.edu>
Message-ID: <op.snme5rcepncm2o@mail.singnet.com.sg>

Dear Stefan and James,

Thanks so much for answering

On Mon, 14 Mar 2005 21:15:43 +0800, Stefan Kirov <skirov@utk.edu> wrote:

> The rules for too low are:
[snip]
Got it Stef.

> I can fix this, but do you really want to get in your consensus a  
> position with proba less than 0.5?

Yes. Don't you think by default it should be that way?
Besides, it would be nice to have an *option* for how the consensus is  
computed.

> We can easily add IC calculating method if you really need it.
Yes, we definitely need that. I think naturally we would need the  
computation, same as the e-value.

Actually I have a subroutine to compute the IC given the frequency  
matrices.
You can use it to incorporate it into the module if you want, although it  
isn't a great piece of work.
I just thought it might save you time.
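
For readers following the thread, a per-position IC computation could 
look like the minimal sketch below.  It is not the subroutine mentioned 
above and not module code: it assumes the usual Shannon definition for 
DNA with a uniform background, IC = 2 + sum(p * log2(p)), without any 
small-sample correction, and the names column_ic and matrix_ic are made 
up for illustration.

```perl
use strict;
use warnings;

# Information content (in bits) of one column of A, C, G, T counts,
# assuming a uniform background: IC = 2 + sum(p * log2(p)).
sub column_ic {
    my @counts = @_;
    my $total = 0;
    $total += $_ for @counts;
    return 0 unless $total;

    my $ic = 2;    # log2(4), the maximum for a four-letter alphabet
    for my $n (@counts) {
        next unless $n;    # treat 0 * log(0) as 0
        my $p = $n / $total;
        $ic += $p * log($p) / log(2);
    }
    return $ic;
}

# Per-position IC for a frequency matrix given as four array references.
sub matrix_ic {
    my ($pA, $pC, $pG, $pT) = @_;
    return map { column_ic($pA->[$_], $pC->[$_], $pG->[$_], $pT->[$_]) } 0 .. $#$pA;
}

# The frequency matrix from the original question.
my @ic = matrix_ic([2,19,3,6,8,10], [1,2,2,1,1,1], [3,1,1,9,8,7], [7,3,6,2,20,5]);
printf "position %d: %.3f bits\n", $_ + 1, $ic[$_] for 0 .. $#ic;
```

A column with equal counts for all four bases gives 0 bits, and a column 
with a single base gives the maximum of 2 bits.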

> Please let me know if you have further questions.
I'll save them till next time Stef ;-)


-- 
Edward WIJAYA
Singapore
From skirov at utk.edu  Mon Mar 14 09:47:21 2005
From: skirov at utk.edu (Stefan Kirov)
Date: Mon Mar 14 09:41:56 2005
Subject: [Bioperl-l] Getting IC & Consensus with
	Bio::Matrix::PSM::SiteMatrix
In-Reply-To: <op.snme5rcepncm2o@mail.singnet.com.sg>
References: <Pine.LNX.4.44.0503131217210.16978-100000@biosysadmin.com>
	<42358E7F.7020209@utk.edu> <op.snme5rcepncm2o@mail.singnet.com.sg>
Message-ID: <4235A3F9.6000208@utk.edu>

Hi Edward,

Edward Wijaya wrote:

> Dear Stefan and James,
>
> Thanks so much for answering
>
> On Mon, 14 Mar 2005 21:15:43 +0800, Stefan Kirov <skirov@utk.edu> wrote:
>
>> The rules for too low are:
>
> [snip]
> Got it Stef.
>
>> I can fix this, but do you really want to get in your consensus a  
>> position with proba less than 0.5?
>
>
> Yes, Don't you think by default it should be that way?
> Besides it'll be nice to have an *option* of how are we going to get 
> the  Consensus.
>
I'll commit the code tomorrow, just update your bioperl-live

>> We can easily add IC calculating method if you really need it.
>
> Yes, we definitely need that. I think naturally we would need the  
> computation,
> same as e-value.

You mean you want SiteMatrix to compute the e-value? Hmm... we are 
getting a bit out of scope here. Essentially the PSM modules were 
only supposed to provide data structures and parsers. And while IC is 
generally straightforward and contained in the PFM/PSM, this is not the 
case with e-val or p-val. Therefore I am reluctant to put it in the PSM 
collection.

>
> Actually I have the subroutine to compute the IC given the frequency  
> matrices.
> You can use it to incorporate it to the module if you want, although 
> it  isn't a great piece of work.
> I just thought it may save you time.
>
Sure, that would be great. Just send it and I will optimize it if I can 
and put it in. But maybe it should go to Bio::Tools... Any thoughts from 
anyone else?

>> Please let me know if you have further questions.
>
> I'll save them till next time Stef ;-)
>
>
Stefan

From amtd9 at umr.edu  Mon Mar 14 08:29:30 2005
From: amtd9 at umr.edu (Mane, Ajay (UMR-Student))
Date: Mon Mar 14 10:10:12 2005
Subject: [Bioperl-l] query
Message-ID: <58AF0CF509606A49B1770AB5DFF811CE13B4DF@UMR-CMAIL1.umr.edu>


Hi,

I am Ajay from the University of Missouri - Rolla, doing research in
bioinformatics. The bl2seq tool takes 2 sequences to align. I am
interested in a list of sequences and want to compare them. Instead of
entering 2 at a time, I have a large list of pairs to be analysed. How do
I automate the process? Running the tool every time and manually looking
for the point where the coding of proteins starts is time consuming. Can
I write a Perl script to automate the process? Can I get any help on this?
I have gone through the bioperl modules, but could not find anything on bl2seq.

Thanks,
Ajay

From lstein at cshl.edu  Thu Mar 10 16:49:05 2005
From: lstein at cshl.edu (Lincoln Stein)
Date: Mon Mar 14 10:10:42 2005
Subject: [Bioperl-l] Aggressive aggregation?
In-Reply-To: <Pine.LNX.4.50.0503101105050.19841-100000@sausage.usask.ca>
References: <Pine.LNX.4.50.0503082112350.14042-100000@sausage.usask.ca>
	<3503c6582ad58219fe9c590fe09a0f46@pcbi.upenn.edu>
	<Pine.LNX.4.50.0503101105050.19841-100000@sausage.usask.ca>
Message-ID: <200503101649.11352.lstein@cshl.edu>

The problem is tied up with the need for better handling of GFF3 by 
Bio::DB::GFF.  In GFF3 you can separate the Name of a thing and its 
parentage:

	
	ID=match0001;Target=cdna0123 12 462
	ID=match0001;Target=cdna0123 463 963
	ID=match0001;Target=cdna0123 964 2964
	ID=match0002;Target=cdna0123 1 129
	ID=match0002;Target=cdna0123 463 960

This is what the alignment GFF emitter should produce.  Unfortunately, 
when you load this into Bio::DB::GFF, the distinction between the ID 
and the Target is lost and all the lines get aggregated together 
again on the target name cdna0123.
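
The difference between the two groupings can be shown with a small 
sketch that works on the attribute column only.  This is plain Perl, not 
Bio::DB::GFF code, and the helper group_by is made up for illustration.

```perl
use strict;
use warnings;

# Attribute columns from the GFF3 example above.
my @attributes = (
    'ID=match0001;Target=cdna0123 12 462',
    'ID=match0001;Target=cdna0123 463 963',
    'ID=match0001;Target=cdna0123 964 2964',
    'ID=match0002;Target=cdna0123 1 129',
    'ID=match0002;Target=cdna0123 463 960',
);

# Group attribute lines by the first word of the given tag's value
# (for Target this drops the coordinates and keeps only the name).
sub group_by {
    my ($key, @lines) = @_;
    my %groups;
    for my $line (@lines) {
        my %attr;
        for my $pair (split /;/, $line) {
            my ($tag, $value) = split /=/, $pair, 2;
            $attr{$tag} = $value;
        }
        next unless defined $attr{$key};
        my ($name) = split ' ', $attr{$key};
        push @{ $groups{$name} }, $line;
    }
    return \%groups;
}

my $by_id     = group_by('ID',     @attributes);
my $by_target = group_by('Target', @attributes);

printf "ID %s: %d segment(s)\n",     $_, scalar @{ $by_id->{$_} }     for sort keys %$by_id;
printf "Target %s: %d segment(s)\n", $_, scalar @{ $by_target->{$_} } for sort keys %$by_target;
```

Grouping on ID keeps the two matches apart (three and two segments), 
while grouping on the Target name merges all five lines into one group.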

I've got lots of notes on a better Bio::DB::GFF and a sample schema 
and queries.  If someone wants to work on this, I'll hand it over to 
them.  ...Alternatively, perhaps this can be fixed by a much less 
invasive change to the Bio::DB::GFF module.  Perhaps the Target 
should simply be converted into an alias so that it can be 
identified.

Lincoln

On Thursday 10 March 2005 12:21 pm, Chad Matsalla wrote:
> On Wed, 9 Mar 2005, Aaron J. Mackey wrote:
> > > chr1 aafcest     HSP   200   275   .     -     .     Target
> > > "Sequence:chad1" 200 275
> > > chr1 aafcest     HSP   300   450   .     -     .     Target
> > > "Sequence:chad1" 300 450
> > > chr1 aafcest     match 200   450   .     -     .     Target
> > > "Sequence:chad1" 200 450
> >
> > These need to be Target "Sequence:chad1-1" and "Sequence:chad1-2"
> > or some such.  This also means that if you're saving the ESTs in
> > the database (for sequence alignment display), you'll have to
> > save them redundantly under chad1-1, chad1-2, etc.
>
> This is horrible. I want to fix this.
>
> > Now, you could write a custom aggregator that de-aggregated
> > multiple chad1 "match" features, assigning the contained HSPs to
> > each, but there is no such "default" behavior.  Let me know if
> > there's general interest for this ...
>
> I think there is, and I volunteer to write it. I'm new to the
> Bio::DB subsystem but I'm eager to dive in. Can you help me by
> providing a general flowchart on what you'd do to create this? What
> should the Aggregator be called? Hmm.
> Bio::DB::GFF::Aggregator::manymatch ?
>
> Chad Matsalla

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724

NOTE: Please copy Sandra Michelsen <michelse@cshl.edu> on
all emails regarding scheduling and other time-critical topics.
From elia at tigem.it  Sat Mar 12 05:08:08 2005
From: elia at tigem.it (Elia Stupka)
Date: Mon Mar 14 10:10:49 2005
Subject: [Bioperl-l] strange error after changing to RC1.5
In-Reply-To: <Pine.LNX.4.58.0503111126240.31639@sumo.ctrl.ucla.edu>
References: <FAE79F74-9258-11D9-8881-000A959EB4C4@gmx.net>
	<Pine.LNX.4.58.0503111126240.31639@sumo.ctrl.ucla.edu>
Message-ID: <59ff1e0691dce94b58f4bc0a0432ca4a@tigem.it>

>> *Every single bit* of those changes need to be rolled back from the
>> release and if nobody else has done it by then I will do so in two
>> weeks.
>
> Fine for the 1.5.1 branch, although I don't agree that this should be 
> done
> on the main trunk.

I couldn't agree more with Hilmar. I am writing this comment almost as 
an outsider considering my minor development involvement in bioperl 
since 1.4 was rolled out. As an external observer I can assure you that 
the 1.5 changes are causing a lot of trouble in the real world, many of 
which you don't get on the mailing list. Quite a few people are keeping 
1.4 for their day to day work and using 1.5 only when it is required 
(e.g. gbrowse). Bioperl, because of its wide usage by a non-developer 
crowd, has most definitely become the sort of project where code 
elegance, efficiency and conceptual issues are much less of a 
priority than stability and usability.

Elia

---
Telethon Institute of Genetics and Medicine
Via Pietro Castellino, 111
80131 Napoli

Tel. +39 081 6132 335
Fax. +39 081 560 98 77

From jason.stajich at duke.edu  Mon Mar 14 10:13:27 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Mon Mar 14 10:10:52 2005
Subject: [Bioperl-l] 1.6 release
In-Reply-To: <4231628B.4010007@ed.ac.uk>
References: <4231628B.4010007@ed.ac.uk>
Message-ID: <7ce85432cbc83b38f79a3aa5320bfeea@duke.edu>

[using this post to also advocate for volunteers even though you were 
just trying to read on when your module changes needed to go in]

At least from my POV there isn't really a plan for a 1.6 release date.  
I was hoping it could be released before BOSC this summer.

We still need a release-master to do 1.6 and a lot of recently added 
stuff needs to be cleaned up and re-tested before we will think about 
doing a stable release.  I don't know when we will start a 1.6 branch 
in preparation for the release.  I think this time around we will 
branch and make the stable release off the branch instead of our normal 
releasing off the main trunk.  This gives us the flexibility to prune 
modules which are too new or add ports to support backwards 
compatibility.


It was decided that the new Feature/Annotation stuff won't be part of 
the stable release 1.6 but would be considered for 1.8 once it is 
proved to be stable.  If backwards compatible patches can be made so 
the API established in Bioperl 1.4 is still respected (and no 
additional XML or Graph modules are needed for the core Feature and 
Annotation objects to work) we can consider some compromises.   [Scott] 
I realize that GMOD/Gbrowse has begun relying on this so a plan will 
need to be discussed, outlining exactly what new functionality is 
expected.

We will need a volunteer to be the release master/pumpkin and several 
people to help in the testing and bug fixing prior to the release.

-jason
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

On Mar 11, 2005, at 4:19 AM, Richard Adams wrote:

> Hello,
> Is there any schedule for the 1.6 release?
> Just to know by when I have to get my modules working.....
>
> Richard
>
> -- 
> Dr Richard Adams
> Psychiatric Genetics Group,
> Medical Genetics,
> Molecular Medicine Centre,
> Western General Hospital,
> Crewe Rd West,
> Edinburgh UK
> EH4 2XU
>
> Tel: 44 131 651 1084
> richard.adams@ed.ac.uk
>
From ak at ebi.ac.uk  Mon Mar 14 11:02:23 2005
From: ak at ebi.ac.uk (Andreas Kahari)
Date: Mon Mar 14 10:56:53 2005
Subject: [Bioperl-l] query
In-Reply-To: <58AF0CF509606A49B1770AB5DFF811CE13B4DF@UMR-CMAIL1.umr.edu>
References: <58AF0CF509606A49B1770AB5DFF811CE13B4DF@UMR-CMAIL1.umr.edu>
Message-ID: <20050314160223.GB1160@ebi.ac.uk>

I might be a bit naïve, but wouldn't this be solved by putting
all sequences in one file and then blasting it against itself?

I didn't quite get the part where you mention the coding of
proteins, but maybe someone else knows exactly what you mean...

Andreas


-- 
Andreas Kähäri
EMBL-EBI/ensembl

1024D/C2E163CB
From razi at genet.sickkids.on.ca  Mon Mar 14 11:06:10 2005
From: razi at genet.sickkids.on.ca (Razi Khaja)
Date: Mon Mar 14 11:00:40 2005
Subject: [Bioperl-l] query
In-Reply-To: 6667
Message-ID: <20050314160611.51919.qmail@web51605.mail.yahoo.com>

There is documentation available for this at
http://doc.bioperl.org/releases/bioperl-1.4/Bio/AlignIO/bl2seq.html
Razi
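
For the automation part of the question, one option is a small Perl 
loop that builds one bl2seq command line per pair.  This is only a 
sketch under assumptions: it uses the -i, -j, -p and -o flags of the 
legacy NCBI bl2seq binary, the file names and the pair_NNN.bl2seq output 
naming are hypothetical, and the helper bl2seq_commands is made up for 
illustration.

```perl
use strict;
use warnings;

# Build one bl2seq command line per pair of FASTA files.
# Flags are those of the legacy NCBI bl2seq binary:
#   -i first sequence, -j second sequence, -p program, -o output file.
sub bl2seq_commands {
    my ($program, @pairs) = @_;
    my @commands;
    for my $n (0 .. $#pairs) {
        my ($first, $second) = @{ $pairs[$n] };
        push @commands, [
            'bl2seq', '-i', $first, '-j', $second,
            '-p', $program, '-o', sprintf('pair_%03d.bl2seq', $n + 1),
        ];
    }
    return @commands;
}

# Hypothetical input files; in practice the pairs would be read from a list.
my @pairs = ( [ 'seqA.fa', 'seqB.fa' ], [ 'seqC.fa', 'seqD.fa' ] );

for my $cmd (bl2seq_commands('blastn', @pairs)) {
    print join(' ', @$cmd), "\n";
    # Uncomment to actually run the tool once bl2seq is on the PATH:
    # system(@$cmd) == 0 or warn "bl2seq failed: @$cmd\n";
}
```

The resulting report files could then be read with the Bio::AlignIO 
bl2seq format documented at the link above.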



/**
 * Razi Khaja, Bioinformatics Analyst
 * The Hospital for Sick Children, Toronto
 * The Centre for Applied Genomics, www.tcag.ca
 * Tel 416-813-7032, Fax 416-813-8319
 */
From palmeida at igc.gulbenkian.pt  Mon Mar 14 11:25:17 2005
From: palmeida at igc.gulbenkian.pt (Paulo Almeida)
Date: Mon Mar 14 11:20:33 2005
Subject: [Bioperl-l] query
In-Reply-To: <58AF0CF509606A49B1770AB5DFF811CE13B4DF@UMR-CMAIL1.umr.edu>
References: <58AF0CF509606A49B1770AB5DFF811CE13B4DF@UMR-CMAIL1.umr.edu>
Message-ID: <20050314162517.GA3026@bioinf.igc.gulbenkian.pt>

Hi,
I have never used bl2seq from Perl, but this page may help you:

http://doc.bioperl.org/bioperl-run/Bio/Tools/Run/PiseApplication/bl2seq.html

(you can get bioperl-run from: http://bioperl.org/Core/Latest/index.shtml )

-Paulo

On Mon, Mar 14, 2005 at 07:29:30AM -0600, Mane, Ajay (UMR-Student) wrote:
> 
> Hi,
> 
> I am Ajay from University of Missouri - Rolla, doing research in
> bioinformatics. The bl2seq tool takes 2 sequences to align. I am
> interested in a list of sequences and want to compare them. Instead of
> putting 2 at a time, I have a large list of pairs to be analysed. How do
> I automate the process. Everytime running the tool and manually looking
> for the point where the coding of proteins start is time consuming. Can
> I write a perl file to automate the process. Can I get any help on this.
> I have gone through the bioperl modules, but could not find on bl2seq. 
> 
> Thanks,
> Ajay

-- 
Paulo Almeida
Instituto Gulbenkian de Ciencia
Apartado 14, 2781-901, Oeiras, PORTUGAL
tel  +351 21 446 46 35
fax  +351 21 440 79 70
http://www.igc.gulbenkian.pt
From skirov at utk.edu  Mon Mar 14 11:28:40 2005
From: skirov at utk.edu (Stefan Kirov)
Date: Mon Mar 14 11:24:30 2005
Subject: [Bioperl-l] Entrez Gene ASN
In-Reply-To: <A16CAA7A-935A-11D9-B647-000A959EB4C4@gmx.net>
References: <A16CAA7A-935A-11D9-B647-000A959EB4C4@gmx.net>
Message-ID: <4235BBB8.2060900@utk.edu>

Hilmar,

Hilmar Lapp wrote:

>
> On Friday, March 11, 2005, at 11:02  AM, Stefan Kirov wrote:
>
>>
>>
>> Hilmar Lapp wrote:
>>
>>> Gene shouldn't be fundamentally different from LocusLink, and 
>>> LocusLink was represented as an annotated SeqI within bioperl.
>>
>>
>> It is not, you are right.
>>
>>>
>>> If at all possible I'd still like it to remain that way for Gene in 
>>> order to allow for a smooth transition from LL to Gene for code 
>>> that's been using the former.
>>>
>> hmmmm, back compatibility is good thing, but sometimes it may be hard 
>> to achieve.
>
>
> Well, now you contradict yourself. Above you agree that Gene and 
> LocusLink are fundamentally the same, and here you say representing 
> them in a compatible fashion may be hard to achieve ...

Not really. They are fairly similar, but not identical, and moreover I 
believe the LocusLink parser doesn't deal with hierarchies... It just puts 
everything in Annotation objects, thus losing the relationships 
(correct me if I am wrong here). Same with homologs.

>
> There are problems indeed though, read on ...
>
>>
>>> If you want to emphasize the fact that it's a container for 
>>> sequences, then that sounds like a ClusterI to me, which can be 
>>> richly annotated too.
>>
>>
>> Let me disagree here. Cluster is designed for independent sequences, 
>> where Gene should deal with sequences, that have hierarchical 
>> relationship among themselves.
>
>
> Two notes here. First, ClusterI is not designed for independent 
> sequences. It is just meant as a container for sequences, be those 
> related to each other or not.

OK, I meant independent as in "I don't know what is your relationship". 
My point is it is not fit to describe the hierarchy here.

>
> Second, the ability to represent hierarchical relationships between 
> sequences is basically absent from bioperl, not just from ClusterI 
> (aside from ClusterI representing a relationship between the 
> containing seq and the contained seqs).
>
> We should think seriously before we add that capability. Most of the 
> people and effort in the field towards hierarchical relationships 
> between biological entities with sequence takes place in the domain of 
> feature hierarchies, *not* sequence hierarchies. See GFF3, SO, 
> GBrowse, Chado, and related efforts.

I believe it is reasonable to have this functionality. Anyway, I see 
sequence vs. sequence-feature hierarchy more as a philosophical question 
with little practical value (unless I am missing something important). 
By the way, isn't GBrowse MySQL-based?

>
> The only place I know where sequence hierarchies are extensively used 
> is in our local adaptation of Biosql, and we do all of this in SQL (as 
> bioperl and therefore bioperl-db has zero support for it).
>
> It's possible but I'm not sure also wise to duplicate the support for 
> feature hierarchies to sequences ... Wouldn't it in the end benefit 
> more people if you were able to tie in Gene into the Unflattener that 
> Chris wrote?
>
>>  This is one of the issues I think  Seq object is not designed to 
>> deal with.  What we need is:
>> genome--(Bio::Seq)-
>>                   |--transcript(Bio::Seq)
>>                                          |--protein(Bio::Seq)
>>                     |--transcript(Bio::Seq)
>>                                          |--protein(Bio::Seq)
>
>
> Well, yeah, if you replace Bio::Seq with Bio::SeqFeatureI you are 
> pretty close to GFF3 and a growing wealth of support for it.
>
>>
>> Another significant concern I have is that if we store everything as 
>> SeqFeature or the overhead may become huge (some records have 
>> hundreds of different features)
>
>
> Have you talked to Lincoln about this? I believe GBrowse is dealing 
> pretty well with this huge overhead but I may be missing something here.
>
No, I have not, I guess I should...

>
>> [...] and any user of the parser will have to do quite of a data 
>> mining to find the relevant feature. One approach would be to add 
>> more Bio::Annotation:: objects (for example Bio::Annotation::STS, 
>> Bio::Annotation::GRIF, etc).
>
>
> Possibly. Bio::Annotation objects was in fact what I was primarily 
> referring to when I spoke about annotation.
>
So do we agree that Bio::Annotation needs some expansion? What do other 
people think?

>> We may decide to create a simplified (Bio::Seq, no relationships) or 
>> more complex object (Gene), based on the user request.
>
>
> Just as an aside, I guess you know that there is a Gene object 
> already, but it's feature based.

Yes, but actually Bio::LiveSeq::Gene (vs Bio::SeqFeature::Gene) is more 
like what I had in mind (it lacks documentation and relationships I 
think, but is a good start), but still what about phylogeny?

>
>     -hilmar


From amackey at pcbi.upenn.edu  Mon Mar 14 11:57:23 2005
From: amackey at pcbi.upenn.edu (Aaron J. Mackey)
Date: Mon Mar 14 11:53:54 2005
Subject: [Bioperl-l] strange error after changing to RC1.5
In-Reply-To: <59ff1e0691dce94b58f4bc0a0432ca4a@tigem.it>
References: <FAE79F74-9258-11D9-8881-000A959EB4C4@gmx.net>	<Pine.LNX.4.58.0503111126240.31639@sumo.ctrl.ucla.edu>
	<59ff1e0691dce94b58f4bc0a0432ca4a@tigem.it>
Message-ID: <4235C273.7090305@pcbi.upenn.edu>


Elia Stupka wrote:

> As an external observer I can assure you that 
> the 1.5 changes are causing a lot of trouble in the real world, many of 
> which you don't get on the mailing list. Quite a few people are keeping 
> 1.4 for their day to day work and using 1.5 only when it is required 
> (e.g. gbrowse).

So how can we possibly address these issues if we don't know about them? 
1.5 is a developer release, not a stable one.  It wouldn't surprise me 
that "critical" code bases are not ready to use 1.5.

> Bioperl, because of its wide usage by a non-developer 
> crowd has most definitely become the sort of project where code elegance 
> and efficiency and conceptual issues are much less of a priority than 
> stability and usability.

So is BioPerl a stable project, or a dead project?  BioPerl has hardly 
ever been (greatly) concerned with usability ...

-Aaron
From tembe at bioanalysis.org  Mon Mar 14 12:19:59 2005
From: tembe at bioanalysis.org (Waibhav Tembe)
Date: Mon Mar 14 12:25:19 2005
Subject: [Bioperl-l] Help with String::Approx
In-Reply-To: <422208AA.1000709@cenix-bioscience.com>
References: <421F79EE.2080503@bioanalysis.org>
	<422208AA.1000709@cenix-bioscience.com>
Message-ID: <4235C7BF.8090100@bioanalysis.org>

Hello,

Thanks for the advice to use String::Approx.

I installed String::Approx and it seems to be functional. Just as a 
check, I am trying to run different utilities such as adist, aindex etc. 
by following the examples on CPAN's String::Approx page, and can't seem 
to run the aslice utility. For the following code, adist and aindex seem to 
work fine, but aslice outputs something unexpected.

$F = "xxxx";
$S = "zzzxxyxyyy";
print "Edit = ", adist($F, $S), "\n";
$index = aindex($F, $S);
print "Matches at ", $index, "\n";
($index, $size) = aslice($F, $S);
print "Matches at ", $index, "\tSize is ", $size, "\n";
($index, $size, $d) = aslice($F, $S);
print "Matches at ", $index, "\tSize is ", $size, "Distance is ", $d, "\n";

output:
Edit = 1
Matches at 3
Matches at ARRAY(0x9cc9d98)     Size is
Matches at ARRAY(0x9ddfbc0)     Size is Distance is

Any help to fix this and to use Approx utility for:
1. Extracting the approximate match from $S
2. At least finding the length of the match and correct index in $S

will be appreciated.
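
In case it helps while the aslice question is open, the k-differences search asked about earlier in this thread can also be sketched directly in plain Perl with dynamic programming. This is only a minimal sketch, not String::Approx's implementation: the function name k_diff_match and its return convention (start, length, distance) are made up for illustration, and mismatches and indels are each assumed to cost 1.

```perl
use strict;
use warnings;

# Sellers-style approximate search: find the substring of $text closest to
# $pattern in edit distance. Returns (start, length, distance), or an empty
# list when the best distance is larger than $k.
sub k_diff_match {
    my ($pattern, $text, $k) = @_;
    my @p = split //, $pattern;
    my @t = split //, $text;
    my $m = @p;

    # Each cell is [distance, start offset of the alignment in $text].
    # Before any text is read, i pattern characters cost i edits.
    my @prev = map { [ $_, 0 ] } 0 .. $m;
    my @best = ($prev[$m][0], 0, 0);            # (distance, start, end)

    for my $j (1 .. scalar @t) {
        my @cur = ([ 0, $j ]);                  # a match may start anywhere
        for my $i (1 .. $m) {
            my $cost = $p[$i - 1] eq $t[$j - 1] ? 0 : 1;
            my @cand = (
                [ $prev[$i - 1][0] + $cost, $prev[$i - 1][1] ],  # match or mismatch
                [ $prev[$i][0] + 1,         $prev[$i][1] ],      # extra text character
                [ $cur[$i - 1][0] + 1,      $cur[$i - 1][1] ],   # unmatched pattern character
            );
            my $min = $cand[0];
            for my $c (@cand[1, 2]) {
                $min = $c if $c->[0] < $min->[0];
            }
            $cur[$i] = $min;
        }
        @best = ($cur[$m][0], $cur[$m][1], $j) if $cur[$m][0] < $best[0];
        @prev = @cur;
    }
    return if $best[0] > $k;
    return ($best[1], $best[2] - $best[1], $best[0]);
}

my ($F, $S) = ("xxxx", "zzzxxyxyyy");
if (my ($start, $len, $dist) = k_diff_match($F, $S, 1)) {
    # prints: start=3 length=4 distance=1 match=xxyx
    print "start=$start length=$len distance=$dist match=",
          substr($S, $start, $len), "\n";
}
```

The start and length answer both points above: substr($S, $start, $len) extracts the approximate match itself.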

Thanks.

Tembe


Andrew Walsh wrote:

> Hello,
>
> The following cpan module may be of interest:
>
> String::Approx
>
> Cheers,
>
> Andrew
>
>
> Waibhav Tembe wrote:
>
>> Hello,
>>
>> I was wondering if there is any Perl implementation for 
>> "k-differences" string matching algorithm using dynamic programming. 
>> More precisely, given two string s1 and s2, the program finds an 
>> alignment, if one exists, that has less than or equal to k (a 
>> parameter) no. of differences. The differences include mismatches and 
>> indels.
>>
>> Any pointers will be welcome.
>>
>> Thanks.
>>
>> Tembe
>
>

From allenday at ucla.edu  Mon Mar 14 12:48:08 2005
From: allenday at ucla.edu (Allen Day)
Date: Mon Mar 14 12:42:38 2005
Subject: [Bioperl-l] 1.6 release
In-Reply-To: <7ce85432cbc83b38f79a3aa5320bfeea@duke.edu>
References: <4231628B.4010007@ed.ac.uk>
	<7ce85432cbc83b38f79a3aa5320bfeea@duke.edu>
Message-ID: <Pine.LNX.4.58.0503140947170.15826@sumo.ctrl.ucla.edu>

On Mon, 14 Mar 2005, Jason Stajich wrote:

> [using this post to also advocate for volunteers even though you were 
> just trying to read on when your module changes needed to go in]
> 
> At least from my POV there isn't really a plan for a 1.6 release date.  
> I was hoping it could be released before BOSC this summer.
> 
> We still need a release-master to do 1.6 and a lot of recently added 
> stuff needs to be cleaned up and re-tested before we will think about 
> doing a stable release.  I don't know when we will start a 1.6 branch 
> in preparation for the release.  I think this time around we will 
> branch and make the stable release off the branch instead of our normal 
> releasing off the main trunk.  This gives us the flexibility to prune 
> modules which are too new or add ports to support backwards 
> compatibility.
> 
> 
> It was decided that the new Feature/Annotation stuff won't be part of 
> the stable release 1.6 but would be considered for 1.8 once it is 
> proved to be stable.  If backwards compatible patches can be made so 
> the API established in Bioperl 1.4 is still respected (and no 
> additional XML or Graph modules are needed for the core Feature and 
> Annotation objects to work) we can consider some compromises.   [Scott] 

No problem.  I can remove these dependencies.

> I realize that GMOD/Gbrowse has begun relying on this so a plan will 
> need to be discussed, outlining exactly what new functionality is 
> expected.
> 
> We will need a volunteer to be the release master/pumpkin and several 
> people to help in the testing and bug fixing prior to the release.
> 
> -jason
> --
> Jason Stajich
> jason.stajich at duke.edu
> http://www.duke.edu/~jes12/
> 
> On Mar 11, 2005, at 4:19 AM, Richard Adams wrote:
> 
> > Hello,
> > Is there any schedule for the 1.6 release?
> > just to know by when I have to get my modules working.....
> >
> > Richard
> >
> > -- 
> > Dr Richard Adams
> > Psychiatric Genetics Group,
> > Medical Genetics,
> > Molecular Medicine Centre,
> > Western General Hospital,
> > Crewe Rd West,
> > Edinburgh UK
> > EH4 2XU
> >
> > Tel: 44 131 651 1084
> > richard.adams@ed.ac.uk
> >
> 
From pstogios at uhnres.utoronto.ca  Mon Mar 14 14:54:35 2005
From: pstogios at uhnres.utoronto.ca (Peter J Stogios)
Date: Mon Mar 14 14:49:02 2005
Subject: [Bioperl-l] Refseq and Splice Variants
Message-ID: <E3F8573D-94C2-11D9-BB48-000A95BDE2C6@uhnres.utoronto.ca>

Hi,

I am wondering if there is a way of easily identifying Refseq sequences 
that are splice variants of the same gene.  If a gene has multiple 
splice products that are supported by experimental evidence, they get 
their own Refseq identifier, but there is no explicit reference to the 
underlying gene they came from (outside of the identifier line).

What I am trying to do is group sets of Refseq sequences in FASTA 
format into sets of splice variants of the same gene.  Does anyone know 
of a way, using Bioperl, that I can accomplish this?

Thanks,

~
Peter J Stogios
Ph.D. candidate, Privé Lab
Dept. of Medical Biophysics, University of Toronto
Ontario Cancer Institute, Princess Margaret Hospital
e: pstogios@uhnres.utoronto.ca
w: http://xtal.uhnres.utoronto.ca/prive
p: (416) 946-4501x3280

From yanfeng at csit.fsu.edu  Mon Mar 14 15:29:03 2005
From: yanfeng at csit.fsu.edu (yanfeng)
Date: Mon Mar 14 15:23:14 2005
Subject: [Bioperl-l] How to use trnascan.pm
Message-ID: <4235F40F.7020808@csit.fsu.edu>

Hi,
Does anyone know how to use trnascan.pm?
I want to use it to locate the tRNAs in my lab sequences.
Thanks.
Fisher
From skirov at utk.edu  Mon Mar 14 15:34:32 2005
From: skirov at utk.edu (Stefan Kirov)
Date: Mon Mar 14 15:29:07 2005
Subject: [Bioperl-l] Refseq and Splice Variants
In-Reply-To: <E3F8573D-94C2-11D9-BB48-000A95BDE2C6@uhnres.utoronto.ca>
References: <E3F8573D-94C2-11D9-BB48-000A95BDE2C6@uhnres.utoronto.ca>
Message-ID: <4235F558.1060602@utk.edu>

What is your initial ID: RefSeq or gene? Do you want all of them or just 
some? In any case LL_tmpl (the LocusLink file) has this data and there is a 
parser for it (hopefully an Entrez Gene parser will be there soon). Also, 
you can get the gene2refseq file from 
ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/. It is tab-delimited and pretty 
easy to use. You need columns 1 and 4, I think.
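
A minimal sketch of the grouping step in plain Perl, assuming a tab-delimited file with one row per accession. The function name group_by_column and the sample rows are made up for illustration, the '-' placeholder for empty fields is an assumption, and the column numbers (1 and 4, as suggested above) should be checked against the header line of the real file before relying on them.

```perl
use strict;
use warnings;

# Group the values of one tab-delimited column by the values of another.
# Column numbers are 1-based.
sub group_by_column {
    my ($fh, $key_col, $value_col) = @_;
    my %groups;
    while (my $line = <$fh>) {
        next if $line =~ /^#/;              # skip header/comment lines
        chomp $line;
        my @field = split /\t/, $line;
        my ($key, $value) = @field[ $key_col - 1, $value_col - 1 ];
        # '-' is assumed to mark an empty field
        next unless defined $key && defined $value && $value ne '-';
        $groups{$key}{$value} = 1;          # nested hash removes duplicates
    }
    return map { ($_ => [ sort keys %{ $groups{$_} } ]) } keys %groups;
}

# Made-up rows standing in for the real file; open the real one instead with
# open my $fh, '<', 'gene2refseq' or die "cannot open gene2refseq: $!";
my $sample = join '',
    "#made-up header\n",
    "GENE_A\t-\t-\tRNA_1\n",
    "GENE_A\t-\t-\tRNA_2\n",
    "GENE_B\t-\t-\tRNA_3\n";
open my $fh, '<', \$sample or die "cannot open sample: $!";

my %by_gene = group_by_column($fh, 1, 4);
for my $gene (sort keys %by_gene) {
    my @accessions = @{ $by_gene{$gene} };
    # Only keys with more than one accession are candidate splice-variant sets
    print "$gene\t@accessions\n" if @accessions > 1;
}
```

With the grouping in hand, the FASTA records can be sorted into sets by looking up each accession's key.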
Stefan

Peter J Stogios wrote:

> Hi,
>
> I am wondering if there is a way of easily identifying Refseq 
> sequences that are splice variants of the same gene.  If a gene has 
> multiple splice products that are supported by experimental evidence, 
> they get their own Refseq identifier, but there is no explicit 
> reference to the underlying gene they came from (outside of the 
> identifier line).
>
> What I am trying to do is group sets of Refseq sequences in FASTA 
> format into sets of splice variants of the same gene.  Does anyone 
> know of a way, using Bioperl, that I can accomplish this?
>
> Thanks,
>
> ~
> Peter J Stogios
> Ph.D. candidate, Privé Lab
> Dept. of Medical Biophysics, University of Toronto
> Ontario Cancer Institute, Princess Margaret Hospital
> e: pstogios@uhnres.utoronto.ca
> w: http://xtal.uhnres.utoronto.ca/prive
> p: (416) 946-4501x3280


From mingyi.liu at gpc-biotech.com  Mon Mar 14 16:15:32 2005
From: mingyi.liu at gpc-biotech.com (Mingyi Liu)
Date: Mon Mar 14 16:11:34 2005
Subject: [Bioperl-l] Error reporting/Validation implemented
In-Reply-To: <97e5143a0b434ab721cd023556a1a5b2@dalkescientific.com>
References: <F2C643E2-937B-11D9-B647-000A959EB4C4@gmx.net>
	<4234564D.7010906@gpc-biotech.com>
	<97e5143a0b434ab721cd023556a1a5b2@dalkescientific.com>
Message-ID: <4235FEF4.2070901@gpc-biotech.com>

Hi, there,

I just implemented basic error reporting and validation functionalities 
in my Entrez Gene parser in Perl (the regex version).  The validation 
will catch all non-conforming data, while error reporting reports line 
number, error type, and the first 20 (customizable) characters of the 
offending data (but the line number could be incorrect if the format 
resulted in an exception, which is hard to deal with for ASN.1-formatted 
data, although easy for XML parsers). 
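
To make the report format concrete, here is a minimal sketch of what such an error line could look like. It is not the egparser code: the function name format_parse_error, the message layout, and the sample data are all made up for illustration; only the three reported items and the customizable 20-character limit come from the description above.

```perl
use strict;
use warnings;

# Format a parse error as "line N: TYPE near 'snippet'", keeping only the
# first $max characters of the offending data (20 unless told otherwise).
sub format_parse_error {
    my ($line_no, $type, $data, $max) = @_;
    $max = 20 unless defined $max;
    my $snippet = substr $data, 0, $max;
    $snippet =~ s/\s+/ /g;                  # keep the report on one line
    return sprintf "line %d: %s near '%s'", $line_no, $type, $snippet;
}

# Made-up offending data, loosely shaped like ASN.1 text
print format_parse_error(42, 'unbalanced brace', 'gene { locus "ABC" , desc "x"'), "\n";
```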

The speed for the parser of course slowed down, but I'd say it'd still 
beat most parsers hands down.  The full human genome now takes a bit 
over 12 minutes instead of 11 minutes to process on one Intel Xeon 2.4 
GHz CPU.  So I don't think my parser's speed has much to do with 
performing validation or not.

I had also communicated with Stefan Kirov, and it turns out the dead entries 
and 0-sized (should be 1-sized) arrays were simply related to data 
trimming options.  So far, so good.

If anyone is interested, check it out at 
http://www.sourceforge.net/projects/egparser/.

Regards,

Mingyi


From iluminati at earthlink.net  Mon Mar 14 16:22:40 2005
From: iluminati at earthlink.net (iluminati@earthlink.net)
Date: Mon Mar 14 16:14:57 2005
Subject: [Bioperl-l] Strange problem with Bio::Seq::LargePrimarySeq
Message-ID: <423600A0.2080109@earthlink.net>

I'm having this unusual problem with loading this particular module.  I 
need it because I'm working with chromosome-sized sequence files as a part of 
my project, yet it seems to not want to load properly even when it's 
loaded using the following statement:

use Bio::Seq::LargePrimarySeq;

I checked my modules, and the necessary module is there.  It seems to 
just not want to load.  Can anyone be of service?

From garrettsorensen at gmail.com  Mon Mar 14 19:03:49 2005
From: garrettsorensen at gmail.com (Garrett Sorensen)
Date: Mon Mar 14 18:58:27 2005
Subject: [Bioperl-l] Strange problem with Bio::Seq::LargePrimarySeq
In-Reply-To: <423600A0.2080109@earthlink.net>
References: <423600A0.2080109@earthlink.net>
Message-ID: <d8fb9af9050314160361c85403@mail.gmail.com>

I've had the same issue... I ended up breaking down the sequences into
manageable fragments but would really like to get the largePrimarySeq
working.  When I tried loading a chrom size sequence I just sat back
and watched my RAM get used up (2 gigs), then the swap, then the
crash....  So if anyone can help it'd benefit both of us!

Thanks for any help,
Garrett


On Mon, 14 Mar 2005 16:22:40 -0500, iluminati@earthlink.net
<iluminati@earthlink.net> wrote:
> I'm having this unusual problem with loading this particular module.  I
> need b/c I'm working with chromosome-sized sequence files as a part of
> my project, but yet it seems to not want to load properly even when it's
> loaded using the following statement:
> 
> use Bio::Seq::LargePrimarySeq;
> 
> I checked my modules, and the necessary module is there.  It seems to
> just not want to load.  Can anyone be of service?
From skirov at utk.edu  Mon Mar 14 19:50:04 2005
From: skirov at utk.edu (Stefan Kirov)
Date: Mon Mar 14 19:44:52 2005
Subject: [Bioperl-l] Strange problem with Bio::Seq::LargePrimarySeq]
Message-ID: <4236313C.4010006@utk.edu>

First you have to answer a few questions. How do you get the object?
"use Bio::Seq::LargePrimarySeq" does not create an object; it merely makes the code available.
If you post your code here it will be much easier to answer your questions. How do you access the sequence?
(I hope you have read the documentation, which states that it is generally not a good idea to call $seq->seq.)
How big is your /tmp? What are you trying to accomplish, and why do you need the whole seq in memory?
Stefan

I've had the same issue... I ended up breaking down the sequences into
manageable fragments but would really like to get the largePrimarySeq
working.  When I tried loading a chrom size sequence I just sat back
and watched my RAM get used up (2 gigs), then the swap, then the
crash....  So if anyone can help it'd benefit both of us!

Thanks for any help,
Garrett


On Mon, 14 Mar 2005 16:22:40 -0500, iluminati at earthlink.net wrote:
> I'm having this unusual problem with loading this particular module.  I
> need b/c I'm working with chromosome-sized sequence files as a part of
> my project, but yet it seems to not want to load properly even when it's
> loaded using the following statement:
>
> use Bio::Seq::LargePrimarySeq;
>
> I checked my modules, and the necessary module is there.  It seems to
> just not want to load.  Can anyone be of service?

From iluminati at earthlink.net  Mon Mar 14 21:39:43 2005
From: iluminati at earthlink.net (iluminati@earthlink.net)
Date: Mon Mar 14 21:32:14 2005
Subject: [Bioperl-l] Strange problem with Bio::Seq::LargePrimarySeq]
In-Reply-To: <4236313C.4010006@utk.edu>
References: <4236313C.4010006@utk.edu>
Message-ID: <42364AEF.6050106@earthlink.net>

Thanks for asking the questions!  In hindsight, I realized that I 
glossed over the problem in my frustration. 

Anyway, here's the drill.  I created a seq object from a 
chromosome-sized fasta file like so...

use Bio::SeqIO;

# Open the chromosome-sized FASTA file with the largefasta parser
my $seqio = Bio::SeqIO->new('-format' => 'largefasta',
                            '-file'   => Bio::Root::IO->catfile("/Thesis Stuff/Chr$Chromosome/chr$Chromosome.fa"));

# Create the seq object
my $seq = $seqio->next_seq();

 From there, I want to manipulate the sequence and use the functions 
generally available to a seq object.  Now, in order to build the seq 
object, I have to use the Bio::SeqIO::largefasta module.  The reason I 
need the Bio::Seq::LargePrimarySeq module is so that I can manipulate 
the sequence and get to the necessary functions.  However, I get this 
error running the script despite including the 
Bio::Seq::LargePrimarySeq module:

Can't locate object method "add_SeqFeature" via package "Bio::Seq::LargePrimarySeq"
(perhaps you forgot to load "Bio::Seq::LargePrimarySeq"?) at ThesisFrontEndScript.pl
line 94, <GeneExpressionData> line 33294.


I can send you the code in question if you want to get a better look-see. 

Now, the reason I need the whole sequence is two-fold.  For one, I need 
to be able to calculate CG% of genes as an experimental control of my 
project.  The other part is that I need to be able to scan the genome 
for polyA sites with respect to their orientation to L1 sites, and 
there's no simple way to do that other than flat-out scanning the code.  
I'll definitely look into tweaking the /tmp directory if that helps, 
but other than that, I have to at least try and make it work.

Stefan Kirov wrote:

> First you have to answer a few questions. How do you get the object?
> "use Bio::Seq::LargePrimarySeq" does not create an object; it merely 
> makes the code available.
> If you post your code here it will be much easier to answer your 
> questions. How do you access the sequence? (I hope you have read the 
> documentation, which states that it is generally not a good idea to 
> call $seq->seq.)
> How big is your /tmp? What are you trying to accomplish, and why do you 
> need the whole seq in memory?
> Stefan
>
> I've had the same issue... I ended up breaking down the sequences into
> manageable fragments but would really like to get the largePrimarySeq
> working.  When I tried loading a chrom size sequence I just sat back
> and watched my RAM get used up (2 gigs), then the swap, then the
> crash....  So if anyone can help it'd benefit both of us!
>
> Thanks for any help,
> Garrett
>
>
> On Mon, 14 Mar 2005 16:22:40 -0500, iluminati at earthlink.net wrote:
>
>> I'm having this unusual problem with loading this particular module.  I
>> need b/c I'm working with chromosome-sized sequence files as a part of
>> my project, but yet it seems to not want to load properly even when it's
>> loaded using the following statement:
>>
>> use Bio::Seq::LargePrimarySeq;
>>
>> I checked my modules, and the necessary module is there.  It seems to
>> just not want to load.  Can anyone be of service?
>

From amackey at pcbi.upenn.edu  Mon Mar 14 12:39:26 2005
From: amackey at pcbi.upenn.edu (Aaron J. Mackey)
Date: Mon Mar 14 21:43:29 2005
Subject: [Bioperl-l] Aggressive aggregation?
In-Reply-To: <200503101649.11352.lstein@cshl.edu>
References: <Pine.LNX.4.50.0503082112350.14042-100000@sausage.usask.ca>	<3503c6582ad58219fe9c590fe09a0f46@pcbi.upenn.edu>	<Pine.LNX.4.50.0503101105050.19841-100000@sausage.usask.ca>
	<200503101649.11352.lstein@cshl.edu>
Message-ID: <4235CC4E.4060000@pcbi.upenn.edu>

In the "FWIW" category:

This is what I did to break the "aggressive aggregation" (attached 
patch); it relies on the fact that when aggregation occurs, the base 
feature's range always (at least in my use cases so far) contains (or at 
least overlaps) the subfeature's ranges.  So in the code below, when 
more than one base feature is detected, then range checking kicks in. 
This won't help you if, for instance, you're saving separate HSP linking 
information as different hits (because the hits will still overlap), but 
it does solve the more common case of one protein/EST matching in 
multiple, distinct locations on the genome.

-Aaron


-------------- next part --------------
diff -u -r1.30 Aggregator.pm
--- Aggregator.pm       3 Aug 2004 09:17:23 -0000       1.30
+++ Aggregator.pm       14 Mar 2005 17:45:35 -0000
@@ -303,7 +303,7 @@
           ? join ($;,$feature->group,$feature->refseq,$feature->source)
           : join ($;,$feature->group,$feature->refseq);
       if ($main_method && lc $feature->method eq lc $main_method) {
-       $aggregates{$key}{base} ||= $feature->clone;
+       push @{$aggregates{$key}{base}}, $feature->clone;
       } else {
        push @{$aggregates{$key}{subparts}},$feature;
       }
@@ -321,18 +321,29 @@
     if ($require_whole_object && $self->components) {
       next unless $aggregates{$_}{base}; # && $aggregates{$_}{subparts};
     }
-    my $base = $aggregates{$_}{base};
+
+    my $base = shift @{$aggregates{$_}{base} || []};
     unless ($base) { # no base, so create one
       my $first = $aggregates{$_}{subparts}[0];
       $base = $first->clone;     # to inherit parent coordinate system, etc
       $base->score(undef);
       $base->phase(undef);
     }
-    $base->method($pseudo_method);
-    $base->add_subfeature($_) foreach @{$aggregates{$_}{subparts}};
-    $base->adjust_bounds;
-    $base->compound(1);  # set the compound flag
-    push @result,$base;
+    while ($base) {
+      $base->method($pseudo_method);
+      if (@{$aggregates{$_}{base} || []}) {
+       # only capture those subfeatures that overlap the base
+       for my $part (@{$aggregates{$_}{subparts}}) {
+         $base->add_subfeature($part) if $part->overlaps($base, "strong");
+       }
+      } else {
+       $base->add_subfeature($_) foreach @{$aggregates{$_}{subparts}};
+      }
+      $base->adjust_bounds;
+      $base->compound(1);  # set the compound flag
+      push @result,$base;
+      $base = shift @{$aggregates{$_}{base} || []}
+    }
   }
   @$features = @result;
 }
From jason.stajich at duke.edu  Mon Mar 14 21:48:40 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Mon Mar 14 21:43:34 2005
Subject: [Bioperl-l] Strange problem with Bio::Seq::LargePrimarySeq]
In-Reply-To: <42364AEF.6050106@earthlink.net>
References: <4236313C.4010006@utk.edu> <42364AEF.6050106@earthlink.net>
Message-ID: <5e75fad6beff228d45e99a6da3129418@duke.edu>


--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

On Mar 14, 2005, at 9:39 PM, iluminati@earthlink.net wrote:

> Thanks for asking the questions!  In hindsight, I realized that I 
> glossed over the problem in my frustration.
> Anyway, here's the drill.  I created a seq object from a 
> chromosome-sized fasta file like so...
>
> my $seqio = new Bio::SeqIO('-format'=>'largefasta',
>                            '-file'  =>Bio::Root::IO->catfile("/Thesis 
> Stuff/Chr$Chromosome/chr$Chromosome.fa"));
>      #Create the seq object
>    my $seq = $seqio->next_seq();
>
> From there, I want to manipulate the sequence and use the functions 
> generally available to a seq object.  Now, in order to build the 
> seq object, I have to use the Bio::SeqIO::largefasta module.  The reason 
> I need the Bio::Seq::LargePrimarySeq module is so that I can 
> manipulate the sequence and get to the necessary functions.  However, 
> I get this error running the script despite including the 
> Bio::Seq::LargePrimarySeq module:
>
> Can't locate object method "add_SeqFeature" via package 
> "Bio::Seq::LargePrimarySeq" (perhaps you forgot to load 
> "Bio::Seq::LargePrimarySeq"?) at ThesisFrontEndScript.pl line 94, 
> <GeneExpressionData> line 33294.
>
>
> I can send you the code in question if you want to get a better 
> look-see.
> Now, the reason I need the whole sequence is two-fold.  For one, I 
> need to be able to calculate CG% of genes as an experimental control 
> of my project.  The other part is that I need to be able to scan the 
> genome for polyA sites with respect to their orientation to L1 sites, 
> and there's no simple way to do that other than flat-out scanning the 
> code.  I'll definitely look into tweaking the /tmp directory if that 
> helps, but other than that, I have to at least try and make it work.
>
You are still going to need to chunk it into pieces to do the scanning 
anyway - if you call $seq->seq() you will pull the entire chromosome 
into memory.

You should consider using Bio::DB::Fasta, which implements efficient 
indexed access to the sequences.  If you want to add annotation, 
consider the Bio::DB::GFF system; it is really more efficient for all 
of this.

-jason
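
The chunked scan suggested above can be sketched in plain Perl. This is 
only a minimal sketch under stated assumptions, not code from BioPerl: 
scan_in_chunks() and its parameters are hypothetical names, the 
"chromosome" is a short in-memory string, and AATAAA merely stands in 
for whatever polyA pattern is wanted. Windows overlap so that a site 
straddling a window boundary is still found; for a fixed-length motif 
the overlap should be at least the motif length minus one. With 
Bio::DB::Fasta, the fetch callback could presumably wrap 
$db->seq($id, $start, $end) instead of substr().

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Scan a long sequence in overlapping windows so that only one window
# is held in memory at a time.  $fetch is a callback taking 1-based,
# inclusive (start, end) coordinates and returning that substring.
sub scan_in_chunks {
    my ($length, $fetch, $pattern, $chunk, $overlap) = @_;
    die "overlap must be smaller than chunk\n" if $overlap >= $chunk;
    my %seen;    # start => end; a hit found in two windows is stored once
    for (my $start = 1; $start <= $length; $start += $chunk - $overlap) {
        my $end = $start + $chunk - 1;
        $end = $length if $end > $length;
        my $window = $fetch->($start, $end);
        while ($window =~ /$pattern/g) {
            # $-[0] and $+[0] are 0-based match offsets within the window
            $seen{ $start + $-[0] } = $start + $+[0] - 1;
        }
        last if $end == $length;
    }
    return map { [ $_, $seen{$_} ] } sort { $a <=> $b } keys %seen;
}

# Stand-in for a real chromosome: a short in-memory string (assumption).
my $genome = 'ACGT' x 5 . 'AATAAA' . 'GGCC' x 5 . 'AATAAA' . 'TT';
my $fetch  = sub { my ($s, $e) = @_; substr($genome, $s - 1, $e - $s + 1) };

# Window of 16 bases with 5 bases of overlap (motif length minus one).
my @hits = scan_in_chunks(length $genome, $fetch, qr/AATAAA/, 16, 5);
print "polyA signal at $_->[0]..$_->[1]\n" for @hits;
```

Passing the fetch step in as a callback keeps the windowing logic 
independent of where the sequence actually lives (a string, an indexed 
file, or a database).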

> Stefan Kirov wrote:
>
>> First you have to answer a few questions: how do you get the object?
>> "use Bio::Seq::LargePrimarySeq" does not create an object; it merely 
>> makes the code available.
>> If you post your code here it will be much easier to answer your 
>> questions. How do you access the sequence? (I hope you have read the 
>> documentation, which states that it is not generally a good idea to 
>> call $seq->seq.)
>> How big is your /tmp? What are you trying to accomplish and why do 
>> you need the whole seq in memory?
>> Stefan
>>
>> I've had the same issue... I ended up breaking down the sequences into
>> manageable fragments but would really like to get the largePrimarySeq
>> working.  When I tried loading a chrom size sequence I just sat back
>> and watched my RAM get used up (2 gigs), then the swap, then the
>> crash....  So if anyone can help it'd benefit both of us!
>>
>> Thanks for any help,
>> Garrett
>>
>>
>> On Mon, 14 Mar 2005 16:22:40 -0500, iluminati at earthlink.net wrote:
>>
>>> I'm having this unusual problem with loading this particular
>>> module.  I need it b/c I'm working with chromosome-sized sequence
>>> files as a part of my project, but yet it seems to not want to load
>>> properly even when it's loaded using the following statement:
>>>
>>> use Bio::Seq::LargePrimarySeq;
>>>
>>> I checked my modules, and the necessary module is there.  It seems
>>> to just not want to load.  Can anyone be of service?
From hlapp at gmx.net  Tue Mar 15 03:34:37 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue Mar 15 03:29:15 2005
Subject: [Bioperl-l] Full uniprot annotation extraction
In-Reply-To: <1110788242.423548925f306@sms.ed.ac.uk>
Message-ID: <10B58B33-952D-11D9-BBA7-000A959EB4C4@gmx.net>


On Monday, March 14, 2005, at 12:17  AM, SG Edwards wrote:

> I use Bio::DB::SwissProt to get the major annotation (e.g. primary 
> accession
> number) but is there a way to get other annotation also (e.g. date of 
> the last
> update?)
>

Swissprot (and uniprot) entries are parsed by the Bio::SeqIO::swiss 
parser, which returns a Bio::Seq::RichSeqI object. Check out its POD 
for some shortcut methods to get at specific annotation. 
$seq->get_dates() will return an array of dates as present in the 
entry; the date of last update will be the last element.

	-hilmar
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


From birney at ebi.ac.uk  Tue Mar 15 03:52:22 2005
From: birney at ebi.ac.uk (Ewan Birney)
Date: Tue Mar 15 04:21:50 2005
Subject: [Bioperl-l] Refseq and Splice Variants
In-Reply-To: <E3F8573D-94C2-11D9-BB48-000A95BDE2C6@uhnres.utoronto.ca>
References: <E3F8573D-94C2-11D9-BB48-000A95BDE2C6@uhnres.utoronto.ca>
Message-ID: <4236A246.9080506@ebi.ac.uk>


Peter J Stogios wrote:
> Hi,
> 
> I am wondering if there is a way of easily identifying Refseq sequences 
> that are splice variants of the same gene.  If a gene has multiple 
> splice products that are supported by experimental evidence, they get 
> their own Refseq identifier, but there is no explicit reference to the 
> underlying gene they came from (outside of the identifier line).
> 
> What I am trying to do is group sets of Refseq sequences in FASTA format 
> into sets of splice variants of the same gene.  Does anyone know of a 
> way, using Bioperl, that I can accomplish this?
> 

One way to handle this is to use Ensembl's gene/transcript links:
each transcript is linked to its RefSeq if it has one.

The easiest way to do this is via Mart.

Go to Ensembl, click on Mart, click on Human and Ensembl Genes;
in filter make sure you don't have a genome filter on, and optionally
select "Genes with RefSeq IDs" if you are only interested in the
RefSeq subset. Then click on next, and in Output, select

Ensembl Gene-ID, Ensembl Transcript-ID, RefSeq-ID

this will give you the 3 way table to use (you can get this as tab
delimited).
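
Once the table is exported, grouping the RefSeq IDs by gene takes only 
a few lines of plain Perl. The following is a rough sketch under 
assumptions: the column order (gene ID, transcript ID, RefSeq ID), the 
absence of a header line, the helper name group_refseq_by_gene() and 
the identifiers in the sample table are all made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Group RefSeq IDs by gene from a tab-delimited table whose columns are
# assumed to be: gene ID, transcript ID, RefSeq ID.
sub group_refseq_by_gene {
    my ($fh) = @_;
    my %by_gene;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^\s*$/;    # skip blank lines
        my ($gene, $transcript, $refseq) = split /\t/, $line;
        # Transcripts without a RefSeq ID have an empty third column.
        next unless defined $refseq && length $refseq;
        $by_gene{$gene}{$refseq} = 1;
    }
    my %result;
    $result{$_} = [ sort keys %{ $by_gene{$_} } ] for keys %by_gene;
    return %result;
}

# Made-up identifiers, for illustration only.
my $table = join "\n",
    "GENE_A\tTRANS_1\tNM_000001",
    "GENE_A\tTRANS_2\tNM_000002",
    "GENE_B\tTRANS_3\tNM_000003",
    "GENE_B\tTRANS_4\t";
open my $fh, '<', \$table or die "cannot open in-memory table: $!";
my %variants = group_refseq_by_gene($fh);
close $fh;

# Report only genes with more than one RefSeq, i.e. candidate splice
# variant sets.
for my $gene (sort keys %variants) {
    my @ids = @{ $variants{$gene} };
    print "$gene\t", join(',', @ids), "\n" if @ids > 1;
}
```

The resulting per-gene lists can then be used to sort the FASTA 
records into one set per gene.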

En route, you can note in filter how many different constraints
you can make on this :)


> Thanks,
> 
> ~
> Peter J Stogios
> Ph.D. candidate, Privé Lab
> Dept. of Medical Biophysics, University of Toronto
> Ontario Cancer Institute, Princess Margaret Hospital
> e: pstogios@uhnres.utoronto.ca
> w: http://xtal.uhnres.utoronto.ca/prive
> p: (416) 946-4501x3280
> 

From skirov at utk.edu  Tue Mar 15 08:51:02 2005
From: skirov at utk.edu (Stefan Kirov)
Date: Tue Mar 15 08:46:16 2005
Subject: [Bioperl-l] Strange problem with Bio::Seq::LargePrimarySeq]
In-Reply-To: <42364AEF.6050106@earthlink.net>
References: <4236313C.4010006@utk.edu> <42364AEF.6050106@earthlink.net>
Message-ID: <4236E846.4020202@utk.edu>

Your first problem is that you cannot access SeqFeature methods 
(or Annotation methods) because LargePrimarySeq, unlike LargeSeq, 
inherits only from PrimarySeq. Therefore you don't have these methods 
available. Two approaches: create an empty Seq object to hold the 
annotation, or read the file into a new LargeSeq object:
use Bio::Seq::LargeSeq;
my $largeseq = Bio::Seq::LargeSeq->new();
my $id = <>;                                # first line is the FASTA header
while (<>) {
    chomp;
    $largeseq->add_sequence_as_string($_);  # append this line of sequence
}
I don't know what the performance will be with the second approach when 
you add features. The first one is far safer, I think.
The next problem is performance when screening the sequence. Unless you 
have something BIG, it is likely you will have to split the sequence 
into several chunks (then you can check whether a split disrupted a 
signal, site, etc.) or get a few more gigs of RAM (best would be some 
shared memory and a grid if you want to kill a fly with a tank :-) ).
Let me know if you have further questions.
Hope this helps and good luck.
Stefan


From iluminati at earthlink.net  Tue Mar 15 09:27:05 2005
From: iluminati at earthlink.net (iluminati@earthlink.net)
Date: Tue Mar 15 09:22:52 2005
Subject: [Bioperl-l] Strange problem with Bio::Seq::LargePrimarySeq]
In-Reply-To: <4236E846.4020202@utk.edu>
References: <4236313C.4010006@utk.edu> <42364AEF.6050106@earthlink.net>
	<4236E846.4020202@utk.edu>
Message-ID: <4236F0B9.2040509@earthlink.net>

I see your point with regard to splitting it up.  I was debating how 
exactly to do the split myself, and I'll work on that later as I look at 
the code.  One quick question about the code snippet you posted, though.  
Are you trying to create a separate stream from the extant SeqIO?  If 
so, why?  Wouldn't it be redundant?


From skirov at utk.edu  Tue Mar 15 09:53:43 2005
From: skirov at utk.edu (Stefan Kirov)
Date: Tue Mar 15 09:48:14 2005
Subject: [Bioperl-l] Strange problem with Bio::Seq::LargePrimarySeq]
In-Reply-To: <4236F0B9.2040509@earthlink.net>
References: <4236313C.4010006@utk.edu> <42364AEF.6050106@earthlink.net>
	<4236E846.4020202@utk.edu> <4236F0B9.2040509@earthlink.net>
Message-ID: <4236F6F7.9070103@utk.edu>


iluminati@earthlink.net wrote:

> I see your point with regard to splitting it up.  I was debating how 
> exactly to do the split myself, and I'll work on that later as I look 
> at the code.  One quick question about the code snippet you posted, 
> though.  Are you trying to create a separate stream from the extant 
> SeqIO?  If so, why?  Wouldn't it be redundant?
>
No, this just reads a standard file from STDIN (not through BioPerl 
SeqIO) and puts it in a LargeSeq instead of a LargePrimarySeq. I don't 
know why largefasta has been implemented to create a LargePrimarySeq 
rather than a LargeSeq object, but this is the current state.
Stefan


From sasalacolla at libero.it  Tue Mar 15 10:18:17 2005
From: sasalacolla at libero.it (sasalacolla@libero.it)
Date: Tue Mar 15 10:14:05 2005
Subject: [Bioperl-l] help me with "blastall call crashed:-1"
Message-ID: <IDEFUH$2B8B1E413D4B6E087E8386B56BD6B10C@libero.it>

Hi, please help me. I tried to use psort, but i only got this message: 
 
Fatal error: 
------------- EXCEPTION  ------------- 
MSG: blastall call crashed: -1 /usr/bin/blastall -p  blastp  
-d  /usr/local/psort//conf/analysis/sclblast/gramneg/sclblast  
-i  /tmp/IBcvdGTw4w  -e  1e-09  -o  /tmp/NaRiG0fdH8  -F  F 
 
STACK 
Bio::Tools::Run::StandAloneBlast::_runblast /usr/share/perl5/Bio/Tools/Run/StandAloneBlast.pm:732 
STACK 
Bio::Tools::Run::StandAloneBlast::_generic_local_blast /usr/share/perl5/Bio/Tools/Run/StandAloneBlast.pm:680 
STACK 
Bio::Tools::Run::StandAloneBlast::blastall /usr/share/perl5/Bio/Tools/Run/StandAloneBlast.pm:536 
STACK 
Bio::Tools::Run::SCLBlast::blast /usr/local/share/perl/5.8.4/Bio/Tools/Run/SCLBlast.pm:134 
STACK 
Bio::Tools::PSort::Module::SCLBlast::run /usr/local/share/perl/5.8.4/Bio/Tools/PSort/Module/SCLBlast.pm:72 
STACK 
Bio::Tools::PSort::Pathway::__ANON__ /usr/local/share/perl/5.8.4/Bio/Tools/PSort/Pathway.pm:194 
STACK 
Bio::Tools::PSort::Pathway::traverse /usr/local/share/perl/5.8.4/Bio/Tools/PSort/Pathway.pm:157 
STACK 
Bio::Tools::PSort::classify /usr/local/share/perl/5.8.4/Bio/Tools/PSort.pm:160 
STACK (eval) /usr/local/bin/psort:318 
STACK toplevel /usr/local/bin/psort:318 
 
-------------------------------------- 
 



From sanges at biogem.it  Tue Mar 15 11:13:41 2005
From: sanges at biogem.it (Remo Sanges)
Date: Tue Mar 15 11:10:02 2005
Subject: [Bioperl-l] help me with "blastall call crashed:-1"
In-Reply-To: <IDEFUH$2B8B1E413D4B6E087E8386B56BD6B10C@libero.it>
References: <IDEFUH$2B8B1E413D4B6E087E8386B56BD6B10C@libero.it>
Message-ID: <667061348983a652590b02efabc6637d@biogem.it>

On Mar 15, 2005, at 4:18 PM, sasalacolla@libero.it wrote:

> Hi, please help me. I tried to use psort, but i only got this message:
>
> Fatal error:
> ------------- EXCEPTION  -------------
> MSG: blastall call crashed: -1 /usr/bin/blastall -p  blastp
> -d  /usr/local/psort//conf/analysis/sclblast/gramneg/sclblast
> -i  /tmp/IBcvdGTw4w  -e  1e-09  -o  /tmp/NaRiG0fdH8  -F  F
>
> STACK
> Bio::Tools::Run::StandAloneBlast::_runblast 
> /usr/share/perl5/Bio/Tools/Run/StandAloneBlast.pm:732
> STACK
> Bio::Tools::Run::StandAloneBlast::_generic_local_blast 
> /usr/share/perl5/Bio/Tools/Run/StandAloneBlast.pm:680
> STACK
> Bio::Tools::Run::StandAloneBlast::blastall 
> /usr/share/perl5/Bio/Tools/Run/StandAloneBlast.pm:536
> STACK
> Bio::Tools::Run::SCLBlast::blast 
> /usr/local/share/perl/5.8.4/Bio/Tools/Run/SCLBlast.pm:134
> STACK
> Bio::Tools::PSort::Module::SCLBlast::run 
> /usr/local/share/perl/5.8.4/Bio/Tools/PSort/Module/SCLBlast.pm:72
> STACK
> Bio::Tools::PSort::Pathway::__ANON__ 
> /usr/local/share/perl/5.8.4/Bio/Tools/PSort/Pathway.pm:194
> STACK
> Bio::Tools::PSort::Pathway::traverse 
> /usr/local/share/perl/5.8.4/Bio/Tools/PSort/Pathway.pm:157
> STACK
> Bio::Tools::PSort::classify 
> /usr/local/share/perl/5.8.4/Bio/Tools/PSort.pm:160
> STACK (eval) /usr/local/bin/psort:318
> STACK toplevel /usr/local/bin/psort:318

When you ask for help, please send the code that is failing;
otherwise we don't have a good starting point to help you,
and many people will trash your message...

BTW
I have never used PSort.pm, but your problem comes from a blast
call, so here are my two cents based on StandAloneBlast.pm:

From your error message, it seems to me that you at least have a wrong
definition of the local database directory, see here:

> -d  /usr/local/psort//conf/analysis/sclblast/gramneg/sclblast

this probably means that:

1 your BLASTDATADIR is defined to be /usr/local/psort/
Even if this is the right location you should avoid the final '/'
But probably your database is in /conf/analysis/sclblast/gramneg
folder right?

2 you have passed the database with the full path into your
    params: /conf/analysis/sclblast/gramneg/sclblast
    when you needed to simply use 'sclblast'

This is a cut from the code of the module:

If local BLAST databases are not stored in the standard
/data directory, the variable BLASTDATADIR will need to be set 
explicitly

You need to enable Blast to find the directory containing the databases.
This can be done in (at least) two different ways:
   1. define an environmental variable BLASTDATADIR:
       export BLASTDATADIR=/conf/analysis/sclblast/gramneg   or
   2. include a definition of an environmental variable BLASTDATADIR in
       every script that will use StandAloneBlast.pm.

      BEGIN { $ENV{BLASTDATADIR} = "/conf/analysis/sclblast/gramneg"; }
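
Before running the script again, it may help to confirm that the 
database files really are in the directory BLASTDATADIR points to. The 
following is only a sketch under assumptions: missing_db_files() is a 
hypothetical helper, and it assumes a single-volume protein database 
built with formatdb, which normally consists of .phr, .pin and .psq 
files. The demonstration uses a throwaway temporary directory rather 
than a real BLAST installation.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Return the extensions of the protein index files that are missing
# for database $db under directory $dir.
sub missing_db_files {
    my ($dir, $db) = @_;
    return grep {
        my $path = File::Spec->catfile($dir, "$db.$_");
        !-e $path;
    } qw(phr pin psq);
}

# Throwaway directory standing in for BLASTDATADIR; only two of the
# three index files are created, so one should be reported as missing.
my $dir = tempdir(CLEANUP => 1);
for my $ext (qw(phr psq)) {
    open my $fh, '>', File::Spec->catfile($dir, "sclblast.$ext")
        or die "cannot create test file: $!";
    close $fh;
}

my @missing = missing_db_files($dir, 'sclblast');
print @missing
    ? "missing index files: @missing\n"
    : "all index files present\n";
```

In a real script the directory would come from $ENV{BLASTDATADIR} and 
the database name would be the bare name ('sclblast'), not a full path.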


HTH

Remo

From chad at dieselwurks.com  Tue Mar 15 11:31:21 2005
From: chad at dieselwurks.com (Chad Matsalla)
Date: Tue Mar 15 11:25:53 2005
Subject: [Bioperl-l] naive question about Bio::Tools::Primer3
Message-ID: <Pine.LNX.4.50.0503150957130.19705-100000@sausage.usask.ca>


Greetings!

> How do I best get the individual result lines from the primer3 output
> file, using Bio::Tools::Primer3?
>
> I'll include the script I tried and the file it parsed.
> When I run it, I get "HASH(0xccfc)".
>
> #!/usr/bin/perl -w
> use lib "/Users/Ned/Documents/Perl/bioperl_source/bioperl-1.4";
> use Bio::AlignIO;
> use Bio::Tools::Primer3;# read a primer3 output file
> my $primer3=Bio::Tools::Primer3->new(-file=>"p3test1.out");
> #put the left- and right-primer stuff into hashes.
> my $primer=$primer3->next_primer;
> print "The right primer in the stream is ",
> $primer->get_primer('-right_primer')->seq->seq, "\n";
> # to return results
> print $primer3->primer_results(0,'PRIMER_LEFT_INPUT');
                  ^^^^^^^^^^^^^^

I added a couple of examples into t/primer3.t on how this can be done.
Everything is fine in your script until the primer_results line.

The answer to your question is to *not* use the method primer_results()
because that method does not actually create Bio::Seq::PrimedSeq
objects. You should be accessing primers from the stream by the
next_primer method.

Further to that I think that the method primer_results should be renamed
to _primer_results to indicate that it is a private method.

Does anybody object?


Chad Matsalla

From s0460205 at sms.ed.ac.uk  Tue Mar 15 13:37:36 2005
From: s0460205 at sms.ed.ac.uk (SG Edwards)
Date: Tue Mar 15 13:32:00 2005
Subject: [Bioperl-l] Swissprot query - Help!
Message-ID: <1110911856.42372b7016563@sms.ed.ac.uk>

Hi, sorry for the obvious question but I'm really new to Perl/BioPerl!!

I want to run a script that sends a query to swissprot and returns the list of
sequences as Seq objects. I have tried the following code, which throws an
exception 'MSG: Must specify a value for uids to query'.

use Bio::DB::SwissProt;

$query = "Arabidopsis[ORGN] AND topoisomerase[TITL]";

$sp_obj = Bio::DB::SwissProt->new;

$stream_obj = $sp_obj->get_Stream_by_query($query);

while ($seq_obj = $stream_obj->next_seq) {
#print out the id
    print $seq_obj->display_id, "\n";
}

exit;


Any help is greatly appreciated!
From j1gregor at biomail.ucsd.edu  Tue Mar 15 16:07:46 2005
From: j1gregor at biomail.ucsd.edu (James Gregory)
Date: Tue Mar 15 16:02:27 2005
Subject: [Bioperl-l] cannot find path to blastall
Message-ID: <Pine.GSO.4.61.0503151305150.3278@biomail>


I'm trying to set up a standalone blast and I'm getting an error message 
that says

MSG: cannot find path to blastall

code:

my @params = (program => 'blastp',
               database => 'db.psq');
my $factory = Bio::Tools::Run::StandAloneBlast->new(@params);
my $path = '/path/to/blastall/';
$path = $factory->program_path($path);

#BLAST
my $blast_report = $factory->blastall($blast_seq);

any help would be appreciated.

James Gregory
University of California, San Diego
Department of Biological Sciences
From brian_osborne at cognia.com  Tue Mar 15 16:16:59 2005
From: brian_osborne at cognia.com (Brian Osborne)
Date: Tue Mar 15 16:12:58 2005
Subject: [Bioperl-l] cannot find path to blastall
In-Reply-To: <Pine.GSO.4.61.0503151305150.3278@biomail>
Message-ID: <GPENLDEIJJHJLHOAJBBPEEOPCCAA.brian_osborne@cognia.com>

James,

On Unix? Windows? Cygwin? Something else? Also is "/path/to/blastall" the
actual location of blastall? On a Linux machine this might be something like
"/usr/local/bin/blastall". We need to know a bit more.

Brian O.




From j1gregor at biomail.ucsd.edu  Tue Mar 15 16:30:25 2005
From: j1gregor at biomail.ucsd.edu (James Gregory)
Date: Tue Mar 15 16:24:50 2005
Subject: [Bioperl-l] cannot find path to blastall
In-Reply-To: <GPENLDEIJJHJLHOAJBBPEEOPCCAA.brian_osborne@cognia.com>
References: <GPENLDEIJJHJLHOAJBBPEEOPCCAA.brian_osborne@cognia.com>
Message-ID: <Pine.GSO.4.61.0503151322470.3278@biomail>

Fixed it. I thought you could manually set the path within the Perl 
script.  I just added blastall to /usr/local/bin (on a Unix machine) and it 
works, except now I'm having other troubles.

i ran
formatdb -i SLR16.1_prot.txt -o T -n subtilis

where SLR16.1_prot.txt is a fasta formatted file.  I get these files from 
formatdb.

subtilis.phr  subtilis.pin  subtilis.psd  subtilis.psi  subtilis.psq

from my understanding the .psq file is the one you want so i have

#create seq object for blast
my $blast_seq = Bio::Seq->new( '-id' => "$seq_name",
                                '-seq' => "$prot_seq");


#set BLAST params
my @params = (program => 'blastp',
               database => '/path/to/file/subtilis.psq');
my $factory = Bio::Tools::Run::StandAloneBlast->new(@params);

#BLAST
my $blast_report = $factory->blastall($blast_seq);


but i'm getting this error:

Could not find index files for database 
/home/j1gregor/transposon/subtilis.psq

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: blastall call crashed: 256 /usr/local/bin/blastall -p  blastp  -d 
"/home/j1gregor/transposon/subtilis.psq"  -i  /tmp/onOzNhelp8  -o 
/tmp/HlsmLpIFrT

STACK: Error::throw
STACK: Bio::Root::Root::throw 
/usr/lib/perl5/site_perl/5.8.3/Bio/Root/Root.pm:328
STACK: Bio::Tools::Run::StandAloneBlast::_runblast 
/usr/lib/perl5/site_perl/5.8.3/Bio/Tools/Run/StandAloneBlast.pm:732
STACK: Bio::Tools::Run::StandAloneBlast::_generic_local_blast 
/usr/lib/perl5/site_perl/5.8.3/Bio/Tools/Run/StandAloneBlast.pm:680
STACK: Bio::Tools::Run::StandAloneBlast::blastall 
/usr/lib/perl5/site_perl/5.8.3/Bio/Tools/Run/StandAloneBlast.pm:536
STACK: GFP_find.pl:85

thanks again,
James

From brian_osborne at cognia.com  Tue Mar 15 16:43:26 2005
From: brian_osborne at cognia.com (Brian Osborne)
Date: Tue Mar 15 16:37:49 2005
Subject: [Bioperl-l] Swissprot query - Help!
In-Reply-To: <1110911856.42372b7016563@sms.ed.ac.uk>
Message-ID: <GPENLDEIJJHJLHOAJBBPKEPACCAA.brian_osborne@cognia.com>

SG,

Well, I think my example code got you into this, I think I should help you
out!

You can't actually query Swissprot this way, using those text values and
field names; currently you can only do these text queries against GenBank.
You'd use Bio::DB::Query::GenBank for this, not Bio::DB::GenBank. If you
want to query Swissprot you're limited to ids and accession numbers. I will
clarify the HOWTO; it's a bit unclear on this point.
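
For illustration, a minimal sketch of the two routes described above. It assumes
network access and a working BioPerl install; the query object is built with
Bio::DB::Query::GenBank and then handed to Bio::DB::GenBank to stream the
records. The sub names, the choice of the 'nucleotide' database and the
command-line handling are assumptions made for this example only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Text query against GenBank: Bio::DB::Query::GenBank builds the query and
# Bio::DB::GenBank streams back the matching records as Seq objects.
sub genbank_ids_for_query {
    my ($text) = @_;
    # Modules are loaded on demand, when the sub is first called.
    require Bio::DB::GenBank;
    require Bio::DB::Query::GenBank;

    my $query  = Bio::DB::Query::GenBank->new(-db    => 'nucleotide',
                                              -query => $text);
    my $stream = Bio::DB::GenBank->new->get_Stream_by_query($query);

    my @ids;
    while (my $seq = $stream->next_seq) {
        push @ids, $seq->display_id;
    }
    return @ids;
}

# Swissprot lookup: by accession number only, no text fields.
sub swissprot_id_for_accession {
    my ($acc) = @_;
    require Bio::DB::SwissProt;
    my $seq = Bio::DB::SwissProt->new->get_Seq_by_acc($acc);
    return $seq ? $seq->display_id : undef;
}

# Usage: perl query.pl 'Arabidopsis[ORGN] AND topoisomerase[TITL]'
# Both subs go over the network, so nothing runs without an argument.
if (@ARGV) {
    print "$_\n" for genbank_ids_for_query($ARGV[0]);
}
```
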

Brian O.



From skirov at utk.edu  Tue Mar 15 16:46:13 2005
From: skirov at utk.edu (Stefan Kirov)
Date: Tue Mar 15 16:40:49 2005
Subject: [Bioperl-l] Error reporting/Validation implemented
In-Reply-To: <4235FEF4.2070901@gpc-biotech.com>
References: <F2C643E2-937B-11D9-B647-000A959EB4C4@gmx.net>
	<4234564D.7010906@gpc-biotech.com>
	<97e5143a0b434ab721cd023556a1a5b2@dalkescientific.com>
	<4235FEF4.2070901@gpc-biotech.com>
Message-ID: <423757A5.8050405@utk.edu>

Mingyi,
A few things:
I used your parser to produce Bioperl objects based on some of the 
high-level features and compared it to what I have. Your parser is 
considerably faster (about twice as fast), but it is still hard to tell, as 
I am descending further in the hierarchy with mine. At the same time I don't 
think the difference will vanish, so I will start building on top of your 
parser to produce Bioperl objects. I am not sure exactly how I am going 
to deal with the relationships that are necessary, but I'll deal with that 
when I finish everything else.
By the way, it took 9 minutes on a 64-bit Xeon 3.4 GHz, even with Bioperl 
object construction, on the whole Homo_sapiens ASN file. The data that 
went inside the objects was: general description of the genes (symbol, name, 
summary, etc.) and organism description, but none of the truly big parts. 
Unfortunately, I am leaving tomorrow for a conference, so I will have 
more next week at the earliest. Thanks for sharing the code!
Stefan

Mingyi Liu wrote:

> Hi, there,
>
> I just implemented basic error reporting and validation 
> functionalities in my Entrez Gene parser in Perl (the regex version).  
> The validation will catch all non-conforming data, while error 
> reporting reports line number, error type, and the first 20 
> (customizable) characters of the offending data (but the line number 
> could be incorrect if the format resulted in an exception, which is 
> hard to deal with for ASN.1-formatted data, although easy for XML 
> parsers).
> The speed for the parser of course slowed down, but I'd say it'd still 
> beat most parsers hands down.  The full human genome now takes a bit 
> over 12 minutes instead of 11 minutes to process on one Intel Xeon 2.4 
> GHz CPU.  So I don't think my parser's speed has much to do with 
> performing validation or not.
>
> I had also communicated with Stefan Kirov and turns out the dead 
> entries and 0-sized (should be 1-sized) arrays were simply related to 
> data trimming options.  So far, so good.
>
> If anyone is interested, check it out at 
> http://www.sourceforge.net/projects/egparser/.
>
> Regards,
>
> Mingyi
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l


From brian_osborne at cognia.com  Tue Mar 15 16:46:03 2005
From: brian_osborne at cognia.com (Brian Osborne)
Date: Tue Mar 15 16:40:55 2005
Subject: [Bioperl-l] cannot find path to blastall
In-Reply-To: <Pine.GSO.4.61.0503151322470.3278@biomail>
Message-ID: <GPENLDEIJJHJLHOAJBBPOEPACCAA.brian_osborne@cognia.com>

James,

Does:

my @params = (program => 'blastp',
              database => '/home/j1gregor/transposon/subtilis');

work?
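
The point of the suggestion is that blastall's -d option takes the database
name given to formatdb with -n (here "subtilis"), not one of the index files
formatdb writes. A small sketch of stripping the extension before building the
parameter list; blast_db_name is a hypothetical helper, not part of BioPerl.

```perl
use strict;
use warnings;

# Strip a formatdb protein index extension so the path names the database itself.
sub blast_db_name {
    my ($path) = @_;
    (my $name = $path) =~ s/\.(?:phr|pin|psd|psi|psq)\z//;
    return $name;
}

my %params = (
    program  => 'blastp',
    database => blast_db_name('/home/j1gregor/transposon/subtilis.psq'),
);

print "database => $params{database}\n";
# prints: database => /home/j1gregor/transposon/subtilis
```
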

Brian O.




From j1gregor at biomail.ucsd.edu  Tue Mar 15 17:00:18 2005
From: j1gregor at biomail.ucsd.edu (James Gregory)
Date: Tue Mar 15 16:54:42 2005
Subject: [Bioperl-l] cannot find path to blastall
In-Reply-To: <GPENLDEIJJHJLHOAJBBPOEPACCAA.brian_osborne@cognia.com>
References: <GPENLDEIJJHJLHOAJBBPOEPACCAA.brian_osborne@cognia.com>
Message-ID: <Pine.GSO.4.61.0503151354510.3278@biomail>

I think that solved the problem, but now there are more problems.

[blastall] WARNING:  [000.000]  >BA-124-EB41_B01.ab1.Seq: Unable to open 
BLOSUM62
[blastall] WARNING:  [000.000]  >BA-124-EB41_B01.ab1.Seq: 
BlastScoreBlkMatFill returned non-zero status
[blastall] WARNING:  [000.000]  >BA-124-EB41_B01.ab1.Seq: SetUpBlastSearch 
failed.

Do I need to put the entire ./ncbi_toolbox/ncbi/bin into my path 
(/usr/local/bin)?  Although I don't think BLOSUM62 is there; I couldn't find 
it in the ncbi toolbox.

James

From mingyi.liu at gpc-biotech.com  Tue Mar 15 17:01:36 2005
From: mingyi.liu at gpc-biotech.com (Mingyi Liu)
Date: Tue Mar 15 16:56:07 2005
Subject: [Bioperl-l] Error reporting/Validation implemented
In-Reply-To: <423757A5.8050405@utk.edu>
References: <F2C643E2-937B-11D9-B647-000A959EB4C4@gmx.net>
	<4234564D.7010906@gpc-biotech.com>
	<97e5143a0b434ab721cd023556a1a5b2@dalkescientific.com>
	<4235FEF4.2070901@gpc-biotech.com> <423757A5.8050405@utk.edu>
Message-ID: <42375B40.4070207@gpc-biotech.com>

Stefan Kirov wrote:

> Mingyi,
> A few things:
> I used your parser to produce Bioperl objects based on some of the 
> high-level features and compared it to what I have. Your parser is 
> considerably faster (about twice as fast), but it is still hard to tell, 
> as I am descending further in the hierarchy with mine. At the same time I 
> don't think the difference will vanish, so I will start building on top of 
> your parser to produce Bioperl objects. I am not sure exactly how I am 
> going to deal with the relationships that are necessary, but I'll deal 
> with that when I finish everything else.

Hi, Stefan,

Thanks for the comparison result!  That was fast!  Please let me know if 
you need some help using the data structure of my parser.  I'll try to 
provide a skeleton code tonight for you (or maybe in the next couple of 
days since you're away anyway) that comes from my code that extracts all 
data (as far as I can tell) from Entrez Gene.  This way although it 
still does not construct objects for you, at least it's going to be 
easier to find the stuff you want for object construction, which is 
definitely the toughest step of creating a bioperl parser for Entrez Gene.

BTW, I just released version 1.04 with some simple improvements, such as 
attempts (only on *NIX) to open files over 2 GB even if the Perl version 
used does not support them (so that the file 'All_Data' works for me 
without recompiling my Perl), a 'file' option in the 'new' method, etc.  It's 
more convenient to use (check "regex_parser_test.pl" in V1.04 for a 
usage example), somewhat like SeqIO's usage (send in 'file' in new() and 
call next_seq to get the next record).

>
> By the way it took 9 minutes on a 64 bit Xeon  3.4GHz even with 
> Bioperl objects construction on the whole Homo_sapiens ASN file. 

Thanks for sharing the benchmark! It's definitely faster than my Xeon 
2.4 GHz.  I just ran my parser V1.04 on the file All_Data that contains 
all Entrez Gene genomes (about 7.4 GB) and it took the parser 98 minutes 
to finish with no error found.

> The data that went inside the objects was: general description of the genes 
> (symbol, name, summary, etc.) and organism description, but none of the 
> truly big parts. Unfortunately, I am leaving tomorrow for a conference, so 
> I will have more next week at the earliest. Thanks for sharing the code!
> Stefan

Glad to be of help!

Best,

Mingyi

From amtd9 at umr.edu  Tue Mar 15 17:04:07 2005
From: amtd9 at umr.edu (Mane, Ajay (UMR-Student))
Date: Tue Mar 15 17:01:57 2005
Subject: [Bioperl-l] bl2seq of NCBI
Message-ID: <58AF0CF509606A49B1770AB5DFF811CE110813@UMR-CMAIL1.umr.edu>

Hi,
 
I am new to bioperl. I want to use the bl2seq tool of NCBI, giving the input query sequences in a Perl script.
I have gone through the documentation, but it is not clear to me how to start. Can anyone send a sample Perl script which 
uses the bl2seq tool? What needs to be installed?
 
Thanks,
Ajay



From diriano at rz.uni-potsdam.de  Tue Mar 15 17:52:31 2005
From: diriano at rz.uni-potsdam.de (Diego Mauricio Riaño Pachón)
Date: Tue Mar 15 17:48:18 2005
Subject: [Bioperl-l] cannot find path to blastall
References: <GPENLDEIJJHJLHOAJBBPOEPACCAA.brian_osborne@cognia.com>
	<Pine.GSO.4.61.0503151354510.3278@biomail>
Message-ID: <002601c529b1$ac87d360$ac4bfea9@diegoriano>

Hi James

BLOSUM and PAM matrices should be in the data subdir of the main blast
directory.
I would say that the best is to keep all the blast executables together with
your data dir in one directory, something like:
blast_dir
    data
In blast_dir you leave the execs, and in data the matrices, then add
blast_dir to your path; in unix using bash you do it like this:
export PATH=$PATH:/path/to/blast_dir
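
The same thing can be done for a single Perl process, which also makes it easy
to check whether blastall is actually visible before calling it. This is only a
sketch: find_in_path is a hypothetical helper, and /path/to/blast_dir is the
placeholder from the line above, not a real location.

```perl
use strict;
use warnings;
use File::Spec;

# Return the full path of the first executable called $program on PATH, or undef.
sub find_in_path {
    my ($program) = @_;
    for my $dir (File::Spec->path) {
        my $candidate = File::Spec->catfile($dir, $program);
        return $candidate if -f $candidate && -x _;
    }
    return;
}

# Placeholder location; like the export line, but only for this process
# and the programs it starts.
my $blast_dir = '/path/to/blast_dir';
$ENV{PATH} = join ':', grep { defined && length } $ENV{PATH}, $blast_dir;

my $blastall = find_in_path('blastall');
print defined $blastall ? "blastall found at $blastall\n"
                        : "blastall is not on PATH\n";
```
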

I hope this helps

Diego
_______________________________________
Diego Mauricio Riano Pachon
Biologist
Institute of Biology and Biochemistry
Potsdam University
Karl-Liebknecht-Str. 24-25
Haus 20
14476 Golm
Germany
Tel:0331/977-2809
http://www.geocities.com/dmrp.geo/


From charlesh at admin.stedwards.edu  Tue Mar 15 18:44:16 2005
From: charlesh at admin.stedwards.edu (chauser)
Date: Tue Mar 15 18:39:32 2005
Subject: [Bioperl-l] SeqIO - masked seqs
Message-ID: <20aed013cebdc496f11e367760508075@admin.stedwards.edu>

All,

I ran into a glitch when reading sets of EST reads where some reads are 
masked in their entirety - i.e. all bases are X's.
Is there a way to either modify the alphabet to accept X or some other 
solution?

thanks,

chuck


------------- EXCEPTION  -------------
MSG: Got a sequence with no letters in it cannot guess alphabet []
STACK Bio::PrimarySeq::_guess_alphabet 
/usr/local/src/bioperl/core/Bio/PrimarySeq.pm:837
STACK Bio::Seq::SeqFastaSpeedFactory::create 
/usr/local/src/bioperl/core/Bio/Seq/SeqFastaSpeedFactory.pm:137
STACK Bio::SeqIO::fasta::next_seq 
/usr/local/src/bioperl/core/Bio/SeqIO/fasta.pm:143
STACK main::RAW ESTcleanup.pl:81
STACK toplevel ESTcleanup.pl:49

From lzhtom at hotmail.com  Wed Mar 16 01:49:03 2005
From: lzhtom at hotmail.com (zhihua li)
Date: Wed Mar 16 01:43:42 2005
Subject: [Bioperl-l] help on getting annotations
Message-ID: <BAY12-F3224DB16815FA83EA8E49EC7480@phx.gbl>

Hi netter!

I have a series of GenBank accession numbers (GB78091, GB90876, ...) and 
want to get as much information as possible about each item. I want to know 
their UniGene IDs so that I can tell if there are redundancies among them; I 
want to get their gene descriptions or GO annotations so as to group them 
into functional groups; I want to know their KEGG pathway IDs so that I can 
tell which of them are in the same biological pathway, etc.

Of course I could submit the series of accession numbers to each different 
database (GenBank, GO, KEGG...) and get their annotations respectively. But 
as the series contains a large number of items, I think it's better to write 
a Perl script (or use an existing bioperl function) to have it done 
automatically.

Could anyone give me a hint about how i can write the script or use the 
corresponding bioperl function?  I'm new to both perl and bioperl.

Thanks a lot!


From Marc.Logghe at devgen.com  Wed Mar 16 02:53:14 2005
From: Marc.Logghe at devgen.com (Marc Logghe)
Date: Wed Mar 16 02:48:18 2005
Subject: [Bioperl-l] cannot find path to blastall
Message-ID: <BEE28BF86078B6429D6C780635718E21905274@morelia.be.devgen.com>

> executables together with your data dir in one directory, 
> something like:
> blast_dir
>     data
> In blast_dir you leave the execs, and in data the matrices, 
> then add blast_dir to your path, in unix using bash you do it 
> like this:
> export PATH=$PATH:/path/to/blast_dir

You can tell blastall where to find the data files by setting the environment variable BLASTMAT. If you are not sure what that should be, do a search for BLOSUM62. In my case it is in /usr/share/ncbi/data/.
Then you do 'export BLASTMAT=/usr/share/ncbi/data/'
Or you set it in your Perl script 
$ENV{'BLASTMAT'} = '/usr/share/ncbi/data/';
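
If the location differs between machines, the search for BLOSUM62 can be done
in the script itself. A rough sketch: find_matrix_dir is a hypothetical helper,
and the second candidate directory is only a guess; the first is the one
mentioned above.

```perl
use strict;
use warnings;
use File::Spec;

# Return the first directory in the list that contains a BLOSUM62 file, or undef.
sub find_matrix_dir {
    my @dirs = @_;
    for my $dir (@dirs) {
        return $dir if -f File::Spec->catfile($dir, 'BLOSUM62');
    }
    return;
}

# Candidate locations; adjust to wherever your BLAST data directory lives.
my @candidates = ('/usr/share/ncbi/data', '/usr/local/blast/data');
my $matrix_dir = find_matrix_dir(@candidates);

if (defined $matrix_dir) {
    $ENV{BLASTMAT} = $matrix_dir;   # must be set before blastall is launched
    print "BLASTMAT set to $matrix_dir\n";
} else {
    warn "BLOSUM62 not found in: @candidates\n";
}
```
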
HTH,
Marc 

From Marc.Logghe at devgen.com  Wed Mar 16 03:16:09 2005
From: Marc.Logghe at devgen.com (Marc Logghe)
Date: Wed Mar 16 03:11:56 2005
Subject: [Bioperl-l] SeqIO - masked seqs
Message-ID: <BEE28BF86078B6429D6C780635718E21905275@morelia.be.devgen.com>

 
> All,
> 
> I ran into a glitch when reading sets of EST reads where some 
> reads are masked in their entirety - i.e. all bases are X's.
> Is there a way to either modify the alphabet to accept X or 
> some other solution?

I was not able to trace the actual fix, but there was a thread in
December/January about that.
In one of the last messages Nathan was about to fix this:
http://bioperl.org/pipermail/bioperl-l/2005-January/017829.html

Brian added a comment on this alphabet() issue:
http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/SeqIO.pm?cvsroot=bioperl
Have you tried bioperl release 1.5.0 or bioperl-release-1-5-0-rc2?
I guess it should be fixed there.
Is bioperl-release-1-5-0-rc2 behaving better than 1.5.0 with regard to the
Bio::SeqFeatureI architecture?
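
On the all-X reads themselves: where upgrading is not an option, one possible
workaround (a sketch only, not the fix referenced in the thread) is to drop
fully masked reads from the FASTA text before it reaches the parser. The
helpers below are plain Perl and hypothetical; treating N as a masking
character alongside X is an assumption.

```perl
use strict;
use warnings;

# True when a sequence string consists only of masking characters (X or N).
sub fully_masked {
    my ($seq) = @_;
    return $seq =~ /\A[XxNn]+\z/ ? 1 : 0;
}

# Split FASTA text into [header, sequence] pairs, dropping fully masked reads.
sub unmasked_records {
    my ($fasta) = @_;
    my @kept;
    for my $record (split /^>/m, $fasta) {
        next unless length $record;
        my ($header, @lines) = split /\n/, $record;
        my $seq = join '', @lines;
        $seq =~ s/\s+//g;
        next if fully_masked($seq);
        push @kept, [$header, $seq];
    }
    return @kept;
}

my $fasta = ">read1\nACGTXXXXACGT\n>read2\nXXXXXXXXXXXX\n";
print "$_->[0]\n" for unmasked_records($fasta);   # prints: read1
```

The surviving records can then be written back out as FASTA and read as usual.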
Marc

From sdavis2 at mail.nih.gov  Wed Mar 16 06:32:56 2005
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Wed Mar 16 06:27:50 2005
Subject: [Bioperl-l] help on getting annotations
In-Reply-To: <BAY12-F3224DB16815FA83EA8E49EC7480@phx.gbl>
References: <BAY12-F3224DB16815FA83EA8E49EC7480@phx.gbl>
Message-ID: <75f3e4f27a029fd291be8551157eecfc@mail.nih.gov>

Try http://source.stanford.edu/cgi-bin/source/sourceSearch if you are 
using human, mouse, or rat.  If not, then this will be a multi-step 
process (there isn't a bioperl function to do this--you will have to 
write some code).

Sean



From sdavis2 at mail.nih.gov  Wed Mar 16 06:38:34 2005
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Wed Mar 16 06:33:02 2005
Subject: [Bioperl-l] help on getting annotations
In-Reply-To: <BAY12-F3224DB16815FA83EA8E49EC7480@phx.gbl>
References: <BAY12-F3224DB16815FA83EA8E49EC7480@phx.gbl>
Message-ID: <7a5ae3bc874b1fe959f80d01d60a4fbf@mail.nih.gov>

A bit of an off-topic reply coming!

I just noticed that you are also posting to the Bioconductor list about
some microarray-related issues.  The Bioconductor project has a package
called AnnBuilder that will build a HUGE annotation package for you
based on your GenBank accession numbers.  Then you can have your
annotation AND microarray data available via R without having to read
various text files, etc.  There are also R functions to perform all the
usual statistical analyses (enriched ontology categories, KEGG
pathways, etc.).  If you are using R/Bioconductor to do your analyses,
you should really look at the AnnBuilder and annotate packages (and the
related GOstats).

Sean



From kenji at vettatech.com  Wed Mar 16 14:16:26 2005
From: kenji at vettatech.com (Leonardo Kenji Shikida)
Date: Wed Mar 16 14:13:00 2005
Subject: [Bioperl-l] how to parse the GenPept sequence object to get the
	'DBSOURCE' field
Message-ID: <4238860A.8050705@vettatech.com>

Does anyone know how to parse the GenPept sequence object to get the 'DBSOURCE' field?

e.g. human.protein.gpff

LOCUS       NP_000358                245 aa            linear   PRI 31-OCT-2000
DEFINITION  thiopurine S-methyltransferase [Homo sapiens].
ACCESSION   NP_000358
VERSION     NP_000358.1  GI:4507653
DBSOURCE    REFSEQ: accession NM_000367.1  <<==
KEYWORDS    .
SOURCE      Homo sapiens (human)

I found no answer in the docs, and the same question sits unanswered in this list's archives at

http://bioperl.org/pipermail/bioperl-l/2003-June/012438.html

thanks in advance

K.

From khoueiry at lgpd.univ-mrs.fr  Tue Mar 15 04:23:25 2005
From: khoueiry at lgpd.univ-mrs.fr (khoueiry)
Date: Wed Mar 16 17:25:06 2005
Subject: [Bioperl-l] Xmfa2GFF
Message-ID: <1110878606.888.3.camel@DavidLinux>

Hello everybody,

I would like to know if there is a BioPerl script that converts XMFA
files into GFF format for use with GBrowse. The idea is to create a
GFF file so that the alignments can be browsed in GBrowse.

Thanks...

From charlesh at admin.stedwards.edu  Wed Mar 16 08:48:46 2005
From: charlesh at admin.stedwards.edu (chauser)
Date: Wed Mar 16 17:25:09 2005
Subject: [Bioperl-l] SeqIO - masked seqs
In-Reply-To: <BEE28BF86078B6429D6C780635718E21905275@morelia.be.devgen.com>
References: <BEE28BF86078B6429D6C780635718E21905275@morelia.be.devgen.com>
Message-ID: <fb3fbd09490686a2cbf2dcead14c611b@admin.stedwards.edu>

Hi Marc,

I updated to the current CVS and get the same error.  If I tack on a  
single valid base to the offending clone(below) SeqIO reads it.

# $Id: README,v 1.37 2005/03/01 16:56:02 amackey Exp $

o Version

  This is Bioperl version 1.5 from CVS HEAD


 >1115008E10.y1  CHROMAT_FILE: 1115008E10.y1 PHD_FILE: 1115008E10.y1.phd.1 CHEM: term
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXX


------------- EXCEPTION  -------------
MSG: Got a sequence with no letters in it cannot guess alphabet []
STACK Bio::PrimarySeq::_guess_alphabet /usr/local/src/bioperl/core/Bio/PrimarySeq.pm:837
STACK Bio::Seq::SeqFastaSpeedFactory::create /usr/local/src/bioperl/core/Bio/Seq/SeqFastaSpeedFactory.pm:137
STACK Bio::SeqIO::fasta::next_seq /usr/local/src/bioperl/core/Bio/SeqIO/fasta.pm:143
STACK main::RAW ESTcount.pl:81
STACK toplevel ESTcount.pl:49


Chuck


On Mar 16, 2005, at 2:16 AM, Marc Logghe wrote:

>
>> All,
>>
>> I ran into a glitch when reading sets of EST reads where some
>> reads are masked in their entirety - i.e. all bases are X's.
>> Is there a way to either modify the alphabet to accept X or
>> some other solution?
>
> I was not able to trace the actual fix. But there was a thread in
> December/January about that.
> In one of the last messages Nathan was about to fix this:
> http://bioperl.org/pipermail/bioperl-l/2005-January/017829.html
>
> Brian added a comment on this alphabet() issue.
> http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/SeqIO.pm?cvsroot=bioperl
> Have you tried bioperl release 1.5.0 or bioperl-release-1-5-0-rc2 ?
> Guess it should be fixed there.
> Is bioperl-release-1-5-0-rc2 behaving better than 1.5.0 related to the
> Bio::SeqFeatureI architecture ?
> Marc
>
>
From Marc.Logghe at devgen.com  Thu Mar 17 03:29:25 2005
From: Marc.Logghe at devgen.com (Marc Logghe)
Date: Thu Mar 17 03:24:10 2005
Subject: [Bioperl-l] SeqIO - masked seqs
Message-ID: <BEE28BF86078B6429D6C780635718E21905283@morelia.be.devgen.com>

Hi Chuck,
It seems to be fixed after all. The original problem was that even when
you explicitly set the alphabet yourself, BioPerl still tried to guess
the alphabet.
So if you set the alphabet now, it will work. I tested it like
this:
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;

# Read the fully masked record from the __DATA__ section below.
my $in = Bio::SeqIO->new(-fh => \*DATA, -format => 'fasta');
$in->alphabet('dna');  # it fails when you comment out this line !!!
my $out = Bio::SeqIO->new(-fh => \*STDOUT, -format => 'fasta');
my $seq = $in->next_seq;
$out->write_seq($seq);

__DATA__
>1115008E10.y1 CHROMAT_FILE: 1115008E10.y1 PHD_FILE: 1115008E10.y1.phd.1 CHEM: term
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXX

HTH
Marc
 


From nathanhaigh at ukonline.co.uk  Thu Mar 17 04:06:28 2005
From: nathanhaigh at ukonline.co.uk (Nathan Haigh)
Date: Thu Mar 17 04:01:33 2005
Subject: [Bioperl-l] SeqIO - masked seqs
In-Reply-To: <fb3fbd09490686a2cbf2dcead14c611b@admin.stedwards.edu>
References: <BEE28BF86078B6429D6C780635718E21905275@morelia.be.devgen.com>
	<fb3fbd09490686a2cbf2dcead14c611b@admin.stedwards.edu>
Message-ID: <42394894.80506@ukonline.co.uk>

Without going back and double checking, I think this is how things stand
with the current CVS (and probably the 1.5 release). There was a
modification in the module that tries to guess the alphabet of the
sequence in question: X was added to the set of characters that are
removed from the sequence before the alphabet is guessed.
This resulted in the error shown when you have a fully masked sequence,
because nothing is left to guess from. I think the fix I implemented was
in Bio::SeqIO::fasta, and it allows you to set the alphabet manually so
that BioPerl does not try to guess it.

Something like this should circumvent the problem:

my $in = Bio::SeqIO->new(-file     => "inputfilename",
                         -format   => 'Fasta',
                         -alphabet => 'dna');
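
To see in advance which reads are affected, the fully masked records can
be spotted without BioPerl at all. What follows is only a minimal sketch
in plain Perl: the sub name fully_masked_ids is made up for illustration,
and treating X and N as the masking characters is an assumption about
the input, not something BioPerl defines.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): given FASTA text, return the
# IDs of records whose sequence consists only of masking characters.
# Assumption: X and N (either case) are the masking characters.
sub fully_masked_ids {
    my ($fasta_text) = @_;
    my @masked;
    # Split on '>' at the start of a line; the first chunk is empty.
    for my $record (split /^>/m, $fasta_text) {
        next unless length $record;
        my ($header, @lines) = split /\n/, $record;
        next unless defined $header;
        my ($id) = $header =~ /^(\S+)/;
        next unless defined $id;
        my $seq = join '', @lines;
        $seq =~ s/\s+//g;
        # Masked means: non-empty and no character outside X/N.
        push @masked, $id if length($seq) && $seq !~ /[^XxNn]/;
    }
    return @masked;
}

my $example = ">read1 fully masked\nXXXXXXXXXX\nXXXX\n"
            . ">read2 partly masked\nXXXXACGTXX\n";
print join(',', fully_masked_ids($example)), "\n";
```

Reads flagged this way could be skipped or logged before the file goes
to SeqIO, which may be useful on installations where setting the
alphabet does not help.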

Let us know how you get on
Nathan



From Anthony.Underwood at hpa.org.uk  Thu Mar 17 05:17:47 2005
From: Anthony.Underwood at hpa.org.uk (SRMD, Col - Underwood, Anthony)
Date: Thu Mar 17 05:28:58 2005
Subject: [Bioperl-l] Non-implemented methods and bioperl
	documentation/ContigAnalysis
Message-ID: <B71051573CAF1B4196BC4D9601BB58C40E5DE6@hermes.phls.org.uk>

Hi Bioperlers
 
The ContigAnalysis method "single_strand" is not implemented even though
it is described in the documentation. Should methods that are not implemented
not be flagged as such in the documentation, so that people do not
write code that relies on the method only to find that it throws an error saying
this isn't implemented but it's not your fault? It would have been handier
to know this at an earlier stage!
 
Any thoughts?
 
Anthony 
 
Dr Anthony Underwood
Bioinformatics Group | Genomics, Proteomics and Bioinformatics Unit
Centre for Infections
Health Protection Agency
61 Colindale Avenue
London
NW9 5HT
t: 0208 3276466  f: 0208 3276738  e:anthony.underwood@hpa.org.uk
 

From brian_osborne at cognia.com  Thu Mar 17 09:06:49 2005
From: brian_osborne at cognia.com (Brian Osborne)
Date: Thu Mar 17 09:04:08 2005
Subject: [Bioperl-l] how to parse the GenPept sequence object to get
	the'DBSOURCE' field
In-Reply-To: <4238860A.8050705@vettatech.com>
Message-ID: <GPENLDEIJJHJLHOAJBBPAEBCCDAA.brian_osborne@cognia.com>

K,

I've added some code to SeqIO/genbank.pm that appears to work but I can't
commit it until I ask the Bioperl designers a question. Namely, it appears
that this DBSOURCE field is specific to Genbank Protein, so the work of
creating the Annotation::SimpleValue should be in genbank.pm, not
RichSeq.pm, right?

Brian O.



From sasalacolla at libero.it  Thu Mar 17 09:24:11 2005
From: sasalacolla at libero.it (sasalacolla@libero.it)
Date: Thu Mar 17 09:19:09 2005
Subject: [Bioperl-l] help me with Bio-Tools-PSort-2.0.4
Message-ID: <IDI2OB$80482EF1CBF3ADD947C074E515205908@libero.it>

Hi Bioperlers,
please rescue me.
I tried to use a local PSORTb program in $PSORT_ROOT/bin, contained in the
PSORTb module Bio-Tools-PSort-2.0.4 (available at
http://www.psort.org/downloads/index.html), but I only got this error message:
  
  
Fatal error:  
------------- EXCEPTION  -------------  
MSG: blastall call crashed: -1 /usr/bin/blastall -p  blastp   
-d  /usr/local/psort/conf/analysis/sclblast/gramneg/sclblast   
-i  /tmp/OsUPf63CDs  -e  1e-09  -o  /tmp/8JSpO7hLfz  -F  F  
  
STACK Bio::Tools::Run::StandAloneBlast::_runblast /usr/share/perl5/Bio/Tools/Run/StandAloneBlast.pm:732
STACK Bio::Tools::Run::StandAloneBlast::_generic_local_blast /usr/share/perl5/Bio/Tools/Run/StandAloneBlast.pm:680
STACK Bio::Tools::Run::StandAloneBlast::blastall /usr/share/perl5/Bio/Tools/Run/StandAloneBlast.pm:536
STACK Bio::Tools::Run::SCLBlast::blast /usr/local/share/perl/5.8.4/Bio/Tools/Run/SCLBlast.pm:134
STACK Bio::Tools::PSort::Module::SCLBlast::run /usr/local/share/perl/5.8.4/Bio/Tools/PSort/Module/SCLBlast.pm:72
STACK Bio::Tools::PSort::Pathway::__ANON__ /usr/local/share/perl/5.8.4/Bio/Tools/PSort/Pathway.pm:194
STACK Bio::Tools::PSort::Pathway::traverse /usr/local/share/perl/5.8.4/Bio/Tools/PSort/Pathway.pm:157
STACK Bio::Tools::PSort::classify /usr/local/share/perl/5.8.4/Bio/Tools/PSort.pm:160
STACK (eval) /usr/local/bin/psort:318
STACK toplevel /usr/local/bin/psort:318
  
--------------------------------------  
  
  
Following the configuration instructions of this module, I set the
following environment variables in bashrc:

export PSORT_ROOT='/usr/local/psort'
export PSORT_HMMTOP='/home/sandro/Inst_tools_Rev_Vacc/PSORTb/hmmtop2.1'
export PSORT_PFTOOLS='/usr/bin'
export BLASTDIR='/usr/bin'

since blastall is located at /usr/bin/blastall on my machine.
I tried to solve the problem by also setting in bashrc:
export BLASTDATADIR='/conf/analysis/sclblast/gramneg'
but nothing changed in the error message, and nothing changes even if I set
absurd paths for BLASTDATADIR!

Any suggestions? Thank you all, guys.



From sanges at biogem.it  Thu Mar 17 09:49:58 2005
From: sanges at biogem.it (Remo Sanges)
Date: Thu Mar 17 09:45:09 2005
Subject: [Bioperl-l] help me with Bio-Tools-PSort-2.0.4
In-Reply-To: <IDI2OB$80482EF1CBF3ADD947C074E515205908@libero.it>
References: <IDI2OB$80482EF1CBF3ADD947C074E515205908@libero.it>
Message-ID: <43c88d82c20dcbfe6c4ddbe3810fec9b@biogem.it>


On Mar 17, 2005, at 3:24 PM, sasalacolla@libero.it wrote:

> Reading the configuration
> instructions of this module i set the following environmental 
> variables in
> bashrc:
>
> export PSORT_ROOT='/usr/local/psort'
> export PSORT_HMMTOP='/home/sandro/Inst_tools_Rev_Vacc/PSORTb/hmmtop2.1'
> export PSORT_PFTOOLS='/usr/bin'
> export BLASTDIR='/usr/bin'
>
> since blastall is located in my machine in /usr/bin/blastall.
> I tried to solve the problem setting in bashrc:
> export BLASTDATADIR='/conf/analysis/sclblast/gramneg'.
> Anyway nothing changed in the error message, and nothing changes even 
> setting
> absurd addresses for BLASTDATADIR!
>
> Any suggestion? thank you all,guys

Sorry, my fault...
Our coffee machine is broken... ;-)

BLASTDATADIR should point to the 'data' directory of
your BLAST installation, the one that holds the
matrices used by the BLAST binaries.

HTH

Remo

From ewijaya at singnet.com.sg  Wed Mar 16 15:47:43 2005
From: ewijaya at singnet.com.sg (Edward Wijaya)
Date: Thu Mar 17 10:09:45 2005
Subject: [Bioperl-l] Getting IC & Consensus with
	Bio::Matrix::PSM::SiteMatrix - The Code
In-Reply-To: <4235A3F9.6000208@utk.edu>
References: <Pine.LNX.4.44.0503131217210.16978-100000@biosysadmin.com>
	<4235A3F9.6000208@utk.edu> <op.snme5rcepncm2o@mail.singnet.com.sg>
	<42358E7F.7020209@utk.edu>
Message-ID: <op.snqz1tpmpncm2o@mail.singnet.com.sg>

On Mon, 14 Mar 2005 22:47:21 +0800, Stefan Kirov <skirov@utk.edu> wrote:


>>
> Sure, that would be great. Just send it and I will optimize it if I can  
> and put it in. But maybe it should go to Bio::Tools... Any thoughts from  
> anyone else?

Stef,

Sorry for the delay. Attached is the code that computes the PWM and IC,
given an array of strings.

Hope it may be useful.

-- 
Edward WIJAYA
Singapore
-------------- next part --------------
A non-text attachment was scrubbed...
Name: compute_ic_pwm.pl
Type: application/octet-stream
Size: 7075 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050317/de22092c/compute_ic_pwm.obj
From faga at cshl.org  Thu Mar 17 13:42:19 2005
From: faga at cshl.org (Ben Faga)
Date: Thu Mar 17 16:25:14 2005
Subject: [Bioperl-l] Symlink on install
Message-ID: <1111084939.6085.163.camel@ricotta>

Hello everyone,

I've replaced the bp_bulk_load_gff.pl with a script that takes the place
of both itself (mysql version) and bp_pg_bulk_load_gff.pl (postgres
version).

Upon install of bioperl, I want to create a symbolic link from the postgres
version to bp_bulk_load_gff.pl so that this change will be transparent
to people who have been using the postgres version.

I have a working solution but I wouldn't mind hearing suggestions and
critiques.

The way that it works is on make, an external script symlink_scripts.pl
gets created with all the necessary path info.  In the postamble of
Makefile.PL, I inserted a line to call the symlink_scripts.pl file.  

Then on install, symlink_scripts.pl is run and creates the symbolic
link.  I used the Perl symlink function to create the link.  On systems
where symlink doesn't work, it catches the error and prints a note to
the user.  That path is untested, though, since I have only tried it on a
Fedora box.
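
The symlink step with its fallback could look roughly like this. It is a
minimal sketch and not the actual symlink_scripts.pl: the sub name
link_or_note is made up for illustration, and the file names in the
commented example call are placeholders for the installed paths. Perl's
symlink raises an exception on platforms without symbolic links, which is
why the call sits inside eval.

```perl
use strict;
use warnings;

# Hypothetical helper, for illustration only: create $link pointing at
# $target, and print a note instead of dying when that is not possible.
sub link_or_note {
    my ($target, $link) = @_;
    # symlink() raises an exception where symbolic links are unsupported,
    # so the call is wrapped in eval; it returns 0 on ordinary failure.
    my $ok = eval { symlink($target, $link) };
    if ($@) {
        print "Note: symbolic links are not supported here; ",
              "please copy $target to $link by hand.\n";
        return 0;
    }
    unless ($ok) {
        print "Note: could not link $link to $target: $!\n";
        return 0;
    }
    return 1;
}

# Example call with the script names from this message (placeholder
# relative paths; the real script would use the install directory):
# link_or_note('bp_bulk_load_gff.pl', 'bp_pg_bulk_load_gff.pl');
```

Returning 0 rather than dying keeps the install going on systems where
the link cannot be made, which matches the "print a note" behaviour
described above.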

If all of this sounds good, I have a question about where I should place
the symlink_scripts.PLS file.  It has been suggested that I might put it
in the maintenance directory.

Any thoughts?

Ben

From lopaki at gmail.com  Thu Mar 17 17:19:28 2005
From: lopaki at gmail.com (Scott Lambdin)
Date: Thu Mar 17 17:21:09 2005
Subject: [Bioperl-l] Does BioPerl like mpiBlast?
Message-ID: <529e768305031714193ab15b9d@mail.gmail.com>

Help please.  The scientists have found a BLAST job that eats all the
user memory (~4 gigabytes) on the little 32-bit BLAST server I set up
for them.  I was looking at giving them mpiBLAST so that they can
spread the database over several processes, but a requirement is that
the BLAST program be usable from BioPerl.  Would it be hard for them to
use mpiBLAST in BioPerl?   That is, harder than using regular NCBI
BLAST?

--Scott
From hlapp at gmx.net  Thu Mar 17 19:17:39 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu Mar 17 19:13:26 2005
Subject: [Bioperl-l] how to parse the GenPept sequence object to get
	the'DBSOURCE' field
In-Reply-To: <GPENLDEIJJHJLHOAJBBPAEBCCDAA.brian_osborne@cognia.com>
Message-ID: <233951CE-9743-11D9-8711-000A959EB4C4@gmx.net>

Isn't this a dbxref? So, yes the work should be in genbank.pm but it 
should create a Bio::Annotation::DBLink object instead of a 
SimpleValue. DBLink will also properly represent version, accession, 
and database, instead of just a flat string.
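
To make the three parts concrete, here is a minimal sketch in plain Perl
that splits the DBSOURCE line from the example record into database,
accession and version. It is not the genbank.pm code; the sub name
parse_dbsource is hypothetical, and it assumes the
"REFSEQ: accession NM_000367.1" layout from the example record. Other
DBSOURCE layouts would need their own handling.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of BioPerl. Assumes a DBSOURCE line of
# the form "DBSOURCE    REFSEQ: accession NM_000367.1" as in the example
# record; returns undef for anything it does not recognise.
sub parse_dbsource {
    my ($line) = @_;
    return undef
        unless $line =~ /^DBSOURCE\s+(\w+):\s+accession\s+(\S+?)(?:\.(\d+))?\s*$/;
    return {
        database   => $1,
        primary_id => $2,
        version    => $3,    # undef when the accession has no ".N" suffix
    };
}

my $link = parse_dbsource('DBSOURCE    REFSEQ: accession NM_000367.1');
printf "%s %s version %s\n", @{$link}{qw(database primary_id version)};
```

Keeping the pieces separate is the point of the suggestion: each value
can go into its own slot of the annotation object rather than being
stored as one flat string.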

	-hilmar

-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


From s0460205 at sms.ed.ac.uk  Fri Mar 18 06:44:56 2005
From: s0460205 at sms.ed.ac.uk (SG Edwards)
Date: Fri Mar 18 06:39:51 2005
Subject: [Bioperl-l] Loading taxonomy data into BioSQL
Message-ID: <1111146296.423abf385d6e9@sms.ed.ac.uk>


Hi,

Can you please help me with an error message? I have just installed a BioSQL
database and am trying to run the load_ncbi_taxonomy.pl script to get taxonomy
data into my database before I start to load sequences. The database has
been created and is empty. However, I get the following error message:


Cannot open Local file taxdata/taxdump.tar.gz: No such file or directory at load_ncbi_taxonomy.pl line 628
gunzip: taxdata/taxdump.tar.gz: No such file or directory
sh: line 1: cd: taxdata: No such file or directory
tar: taxdump.tar: cannot open: No such file or directory
tar: error is not recoverable: exiting now
loading NCBI taxon database in taxdata:
       ... retrieving all taxon nodes in the database
       ... reading in taxon nodes from nodes.dmp
Couldn't open data file taxdata/nodes.dmp: No such file or directory
rollback ineffective with AutoCommit enabled at load_ncbi_taxonomy.pl line 818.
Use of uninitialized value in concatenation (.) or string at load_ncbi_taxonomy.pl line 820.
rollback failed
From brian_osborne at cognia.com  Fri Mar 18 07:33:00 2005
From: brian_osborne at cognia.com (Brian Osborne)
Date: Fri Mar 18 07:31:29 2005
Subject: [Bioperl-l] how to parse the GenPept sequence object to get
	the'DBSOURCE' field
In-Reply-To: <233951CE-9743-11D9-8711-000A959EB4C4@gmx.net>
Message-ID: <GPENLDEIJJHJLHOAJBBPAECICDAA.brian_osborne@cognia.com>

Hilmar,

Excellent. OK, I need some suggestions as to values, since this is an
annotation that I've never constructed. Here's an example:

DATABASE: GenBank

PRIMARY_ID: AAC12345

OPTIONAL_ID: AAC12345.2

COMMENT: ?

TAGNAME: dblink

NAMESPACE: ?

AUTHORITY: ?

VERSION: 2


Brian O.


-----Original Message-----
From: Hilmar Lapp [mailto:hlapp@gmx.net]
Sent: Thursday, March 17, 2005 7:18 PM
To: Brian Osborne
Cc: Leonardo Kenji Shikida; bioperl-l@portal.open-bio.org
Subject: Re: [Bioperl-l] how to parse the GenPept sequence object to get
the'DBSOURCE' field


Isn't this a dbxref? So, yes the work should be in genbank.pm but it
should create a Bio::Annotation::DBLink object instead of a
SimpleValue. DBLink will also properly represent version, accession,
and database, instead of just a flat string.

	-hilmar

On Thursday, March 17, 2005, at 06:06  AM, Brian Osborne wrote:

> K,
>
> I've added some code to SeqIO/genbank.pm that appears to work but I
> can't
> commit it until I ask the Bioperl designers a question. Namely, it
> appears
> that this DBSOURCE field is specific to Genbank Protein, so the work of
> creating the Annotation::SimpleValue should be in genbank.pm, not
> RichSeq.pm, right?
>
> Brian O.
>
> -----Original Message-----
> From: bioperl-l-bounces@portal.open-bio.org
> [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Leonardo
> Kenji Shikida
> Sent: Wednesday, March 16, 2005 2:16 PM
> To: bioperl-l@portal.open-bio.org
> Subject: [Bioperl-l] how to parse the GenPept sequence object to get
> the'DBSOURCE' field
>
>
> does anyone know how to parse the GenPept sequence object to get the
> 'DBSOURCE' field?
>
> e.g. human.protein.gpff
>
> LOCUS       NP_000358                245 aa            linear   PRI
> 31-OCT-2000
> DEFINITION  thiopurine S-methyltransferase [Homo sapiens].
> ACCESSION   NP_000358
> VERSION     NP_000358.1  GI:4507653
> DBSOURCE    REFSEQ: accession NM_000367.1  <<==
> KEYWORDS    .
> SOURCE      Homo sapiens (human)
>
> I found no answer reading the docs, and there is the same unanswered
> question in this list archives at
>
> http://bioperl.org/pipermail/bioperl-l/2003-June/012438.html
>
> thanks in advance
>
> K.
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
>
--
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


From brian_osborne at cognia.com  Fri Mar 18 08:18:31 2005
From: brian_osborne at cognia.com (Brian Osborne)
Date: Fri Mar 18 08:14:23 2005
Subject: [Bioperl-l] Loading taxonomy data into BioSQL
In-Reply-To: <1111146296.423abf385d6e9@sms.ed.ac.uk>
Message-ID: <GPENLDEIJJHJLHOAJBBPKECJCDAA.brian_osborne@cognia.com>

SG,

=head1 DESCRIPTION

This script loads or updates a biosql schema with the NCBI Taxon
Database. There are a number of options to do with where the biosql
database is (i.e., database name, hostname, user for database,
password, database name).

This script may download the NCBI Taxon Database from the NCBI FTP
server on-the-fly (ftp://ftp.ncbi.nih.gov/pub/taxonomy/). Otherwise it
expects the files to be downloaded already.


Brian O.

-----Original Message-----
From: bioperl-l-bounces@portal.open-bio.org
[mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of SG Edwards
Sent: Friday, March 18, 2005 6:45 AM
To: bioperl-l@portal.open-bio.org
Subject: [Bioperl-l] Loading taxonomy data into BioSQL


Hi,

Can you please help me with an error message? I have just installed a BioSQL
database and am trying to run the load_ncbi_taxonomy.pl script to get
taxonomy
data into my database before I start to load sequences in. The database has
been created and is empty; however, I get the following error message:


Cannot open Local file taxdata/taxdump.tar.gz: No such file or directory at
load_ncbi_taxonom.pl line 628
gunzip: taxdata/taxdump.tar.gz: No such file or directory
sh: line 1: cd: taxdata: No such file or directory
tar: taxdump.tar: cannot open: No such file or directory
tar: error is not recoverable: exiting now
loading NCBI taxon database in taxdata:
       ... retrieving all taxon nodes in the database
       ... reading in taxon nodes from nodes.dmp
Couldn't open data file taxdata/nodes.dmp: No such file or directory
rollback ineffective with AutoCommit enabled at load_ncbi_taxonomy.pl line
818.
Use of uninitialized value in concatenation (.) or string at
load_ncbi_taxonomy.pl line 820.
rollback failed


From s0460205 at sms.ed.ac.uk  Fri Mar 18 08:45:52 2005
From: s0460205 at sms.ed.ac.uk (SG Edwards)
Date: Fri Mar 18 08:40:23 2005
Subject: [Bioperl-l] Loading taxonomy data into BioSQL
In-Reply-To: <GPENLDEIJJHJLHOAJBBPKECJCDAA.brian_osborne@cognia.com>
References: <GPENLDEIJJHJLHOAJBBPKECJCDAA.brian_osborne@cognia.com>
Message-ID: <1111153552.423adb90e3e0d@sms.ed.ac.uk>

I have been trying:

perl load_ncbi_taxonomy.pl -dbname milk -driver Pg -dbuser s0460205 -dbpass
password -download

and this gave me the error message below.
If I download the ncbi_taxonomy data manually and direct the Perl script to
it using:

perl load_ncbi_taxonomy.pl -dbname milk -driver Pg -dbuser s0460205 -dbpass
password -directory /home/s0460205/

This seems to get a bit further but still results in an error:

loading NCBI taxon database in /home/s0460205:
   ... retrieving all taxon nodes in the database
   ... reading in taxon nodes from nodes.dmp
Couldn't open data file taxdata/nodes.dmp: No such file or directory
rollback ineffective with AutoCommit enabled at load_ncbi_taxonomy.pl line
818.
Use of uninitialized value in concatenation (.) or string at
load_ncbi_taxonomy.pl line 820.
rollback failed

It seems to be choking on finding the nodes.dmp but I'm not sure why?!
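
One way to narrow this down before running the loader is to check that the
unpacked dump files are really where the script will look. The following is
only a rough sketch, not part of load_ncbi_taxonomy.pl: the helper name
missing_taxdump_files is made up for illustration, and the assumption that
names.dmp is needed alongside nodes.dmp comes from the contents of the NCBI
taxdump archive, not from this thread.

```perl
use strict;
use warnings;
use File::Spec;

# Return the names of expected dump files that are absent from $dir.
# nodes.dmp and names.dmp are the files this sketch assumes the loader reads.
sub missing_taxdump_files {
    my ($dir, @expected) = @_;
    @expected = qw(nodes.dmp names.dmp) unless @expected;
    return grep {
        my $path = File::Spec->catfile($dir, $_);
        !-e $path;
    } @expected;
}

# Directory to check: first command-line argument, else the "taxdata"
# directory that appears in the error messages above.
my $dir = shift(@ARGV) // 'taxdata';
my @missing = missing_taxdump_files($dir);
print @missing
    ? "missing in $dir: @missing\n"
    : "taxonomy dump looks unpacked in $dir\n";
```

If this reports missing files for the directory given with -directory, the
archive probably still needs to be gunzipped and untarred there.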




From s0460205 at sms.ed.ac.uk  Fri Mar 18 09:05:18 2005
From: s0460205 at sms.ed.ac.uk (SG Edwards)
Date: Fri Mar 18 08:59:39 2005
Subject: [Bioperl-l] Loading taxonomy data into BioSQL
In-Reply-To: <1111153552.423adb90e3e0d@sms.ed.ac.uk>
References: <GPENLDEIJJHJLHOAJBBPKECJCDAA.brian_osborne@cognia.com>
	<1111153552.423adb90e3e0d@sms.ed.ac.uk>
Message-ID: <1111154718.423ae01e25896@sms.ed.ac.uk>

I find that if I manually gunzip and untar the download from NCBI then the script
finds the file nodes.dmp (N.B. not sure if this is a fault with
load_ncbi_taxonomy.pl or something with my system?!)

The script then tries to load the data into the taxon table, but the column
"taxon_id" is of type INTEGER while the script seems to treat it as varchar. So I
either need to change the database column to varchar or change the Perl script
to use INTEGER.

Has anyone had this problem?!




From hlapp at gmx.net  Fri Mar 18 09:17:34 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Fri Mar 18 09:12:18 2005
Subject: [Bioperl-l] Loading taxonomy data into BioSQL
In-Reply-To: <1111154718.423ae01e25896@sms.ed.ac.uk>
Message-ID: <7920B327-97B8-11D9-BAA9-000A959EB4C4@gmx.net>

Why do you believe the script thinks that taxon_id is a varchar? It 
doesn't AFAIK.

Also, not sure why your Pg (you are using PostgreSQL, right?) is in 
auto-commit mode. That doesn't sound right.

	-hilmar

On Friday, March 18, 2005, at 06:05  AM, SG Edwards wrote:

> I find that if I manually gunzip and tar the download from ncbi then 
> the script
> finds the file nodes.dmp (N.B not sure if this is a fault with
> load_ncbi_taxonomy.pl or something with my system?!)
>
> The script then tries to load the data into the taxon table but the 
> column
> "taxon_id" type is INTEGER but the script thinks it is varchar. So 
> either need
> to change the database column to varchar or change the perl script to 
> INTEGER.
>
> Has anyone had this problem?!
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


From s0460205 at sms.ed.ac.uk  Fri Mar 18 09:25:24 2005
From: s0460205 at sms.ed.ac.uk (SG Edwards)
Date: Fri Mar 18 09:20:49 2005
Subject: [Bioperl-l] Loading taxonomy data into BioSQL
In-Reply-To: <7920B327-97B8-11D9-BAA9-000A959EB4C4@gmx.net>
References: <7920B327-97B8-11D9-BAA9-000A959EB4C4@gmx.net>
Message-ID: <1111155924.423ae4d4c422f@sms.ed.ac.uk>

Thanks Hilmar,

Yes, I am using Postgres. Should I take it out of auto-commit mode? I thought
the script dealt with this, but maybe not?

If I run it with:

perl load_ncbi_taxonomy.pl -dbname milk -driver Pg -dbuser s0460205 -dbpass
 password -directory /home/s0460205/

I get the error message:

loading NCBI taxon database in /home/s0460205:
... retrieving all taxon nodes in the database
    ... reading in taxon nodes from nodes.dmp
    ... insert/update/delete taxon nodes
failed to insert node (1;1;1;no rank;1;0): ERROR: column "taxon_id" is of type
integer but expression is of type character varying
HINT: You will need to rewrite or cast the expression




From vaughn at cshl.org  Fri Mar 18 06:55:20 2005
From: vaughn at cshl.org (Matthew Vaughn)
Date: Fri Mar 18 13:32:57 2005
Subject: [Bioperl-l] How to express 'histogram' data in GFF3
Message-ID: <9A6B282E-97A4-11D9-A08F-000A95A26D06@cshl.org>

OK, I've bashed my head against this and have come up short, so now I'm
asking for help. Recently, I decided to upgrade my development system
to BioPerl 1.5 and bring all my code up to GFF3 compliance. This, of
course, includes code that generates GFF files for loading into our
local Generic Genome Browser (1.62).

The problem comes when I try to express histogram data. In the past,
rows like this worked fine as GFF2:

"ChrII	rev1	poly1	1591004	1591068	464.835	-	.	poly1 ChrII:rev1"

but this is invalid for GFF3. As far as I can figure from interpreting
the GFF3 spec, the same record should look something like this:

"ChrII	rev1	poly1	1591004	1591068	464.835	-	.	ID=poly1%3AChrII%3Arev1"

But this violates the GFF3 spec in that ID is now non-unique. Rows
formatted this way also fail to display any histogram data in my browser.

I've considered loading the array data as GFF2 and my annotation data
as GFF3, but that seems, well, inelegant (plus I don't even know if
that will work).
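
For what it's worth, the uniqueness problem on its own could be sidestepped
by keeping the shared label in Name and giving each row its own ID. The
sketch below is only an assumption about one possible layout: the function
name gff2_row_to_gff3 and the "class_name_counter" ID scheme are invented
here, and whether Generic Genome Browser 1.62 draws a histogram from such
rows has not been checked.

```perl
use strict;
use warnings;

# Rewrite column 9 of a GFF2 row ("class name") as GFF3 attributes.
# The shared label goes into Name; ID gets a per-label counter so it stays unique.
sub gff2_row_to_gff3 {
    my ($row, $seen) = @_;
    my @col = split /\t/, $row;
    my ($class, $name) = split ' ', $col[8], 2;
    my $n = ++$seen->{"$class $name"};
    # Percent-encode the characters GFF3 reserves inside attribute values.
    s/([%;=&,\t])/sprintf('%%%02X', ord $1)/ge for $class, $name;
    $col[8] = "ID=${class}_${name}_$n;Name=$name";
    return join "\t", @col;
}

# The example row from this message, converted twice to show the counter.
my %seen;
my $gff2 = join "\t", qw(ChrII rev1 poly1 1591004 1591068 464.835 - .),
                      'poly1 ChrII:rev1';
print gff2_row_to_gff3($gff2, \%seen), "\n" for 1 .. 2;
```

The counter hash is passed in by the caller so that one hash can span a whole
file and IDs stay unique across all rows.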

Any input will be very much appreciated!

Matt

--
Matthew W. Vaughn, Ph.D.
Cold Spring Harbor Laboratory
Delbruck Laboratory / Martienssen Group
1 Bungtown Road
Cold Spring Harbor, NY 11724

phone: (516) 367-8469

From hlapp at gmx.net  Fri Mar 18 23:14:24 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Fri Mar 18 23:11:14 2005
Subject: [Bioperl-l] how to parse the GenPept sequence object to get
	the'DBSOURCE' field
In-Reply-To: <GPENLDEIJJHJLHOAJBBPAECICDAA.brian_osborne@cognia.com>
Message-ID: <60CF6110-982D-11D9-AB95-000A959EB4C4@gmx.net>


On Friday, March 18, 2005, at 04:33  AM, Brian Osborne wrote:

> Hilmar,
>
> Excellent. OK, I need some suggestions as to values, this is an 
> annotation
> that I've never constructed. Here's an example:
>
> DATABASE GenBank
>
> PRIMARY_ID AAC12345
>
> OPTIONAL_ID AAC12345.2

No, leave blank - it is meant for cases where it is really different 
from the primary_id.

>
> COMMENT: ?

right, undef

>
> TAGNAME: dblink

Correct.

>
> NAMESPACE: ?

Ignore. I believe it defaults to database automagically.

>
> AUTHORITY: ?

right, undef

>
> VERSION: 2

right.


Cheers,

	-hilmar
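
To make the values above concrete, here is a rough sketch of how the example
DBSOURCE line from earlier in the thread could be split into those fields.
This is not the code that went into SeqIO/genbank.pm; parse_dbsource is a
made-up helper, and it only handles the single "REFSEQ: accession ..." layout
shown in this thread. The hash keys match the named arguments listed above.

```perl
use strict;
use warnings;

# Split a GenPept DBSOURCE line into the fields a DBLink annotation would carry.
# Only the "REFSEQ: accession NM_000367.1" layout from this thread is handled;
# other DBSOURCE layouts are not covered by this sketch.
sub parse_dbsource {
    my ($line) = @_;
    return unless $line =~ /^DBSOURCE\s+(\w+):\s+accession\s+(\S+)/;
    my ($database, $accession) = ($1, $2);
    my %link = (-database => $database, -tagname => 'dblink');
    # A trailing ".N" is the sequence version; keep it out of the primary ID.
    if ($accession =~ /^(.+)\.(\d+)$/) {
        @link{qw(-primary_id -version)} = ($1, $2);
    } else {
        $link{-primary_id} = $accession;
    }
    return \%link;
}

my $link = parse_dbsource('DBSOURCE    REFSEQ: accession NM_000367.1');
print join(', ', map { "$_ => $link->{$_}" } sort keys %$link), "\n";
# With BioPerl installed, the hash could be handed to the annotation class:
#   my $dblink = Bio::Annotation::DBLink->new(%$link);
```

Optional ID, comment, and authority are simply left out, following the
answers above.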

-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


From brian_osborne at cognia.com  Fri Mar 18 23:24:40 2005
From: brian_osborne at cognia.com (Brian Osborne)
Date: Fri Mar 18 23:21:27 2005
Subject: [Bioperl-l] how to parse the GenPept sequence object to get
	the'DBSOURCE' field
In-Reply-To: <60CF6110-982D-11D9-AB95-000A959EB4C4@gmx.net>
Message-ID: <GPENLDEIJJHJLHOAJBBPCEDKCDAA.brian_osborne@cognia.com>

Hilmar and K,

OK, it seems to read and write properly, so I'll commit. Give it a try, K.

Brian O.



From hlapp at gmx.net  Fri Mar 18 23:40:33 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Fri Mar 18 23:35:32 2005
Subject: [Bioperl-l] Loading taxonomy data into BioSQL
In-Reply-To: <1111155924.423ae4d4c422f@sms.ed.ac.uk>
Message-ID: <07F7F769-9831-11D9-AB95-000A959EB4C4@gmx.net>


On Friday, March 18, 2005, at 06:25  AM, SG Edwards wrote:

> Thanks Hilmar,
>
> Yeah I am using Postgres, should I take it out of auto-commit mode? I  
> thought
> the script deals with this but maybe not?

Why would you ever want to run a database in auto-commit mode unless
that's the only option you have, as with MySQL?

If you run this in auto-commit mode, users will see a totally
inconsistent state for possibly more than half an hour. The script goes
to great lengths not to leave the transaction unless it really doesn't
know any better.
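
As a rough illustration of the pattern (not the script's actual code), a
whole load can be wrapped so that it is either committed in full or rolled
back. The helper name in_one_transaction and the FakeHandle stand-in are
invented for this sketch; with DBI installed, the handle would come from
DBI->connect, whose handles provide the AutoCommit attribute and the commit
and rollback methods used here.

```perl
use strict;
use warnings;

# Run $work inside one transaction on $dbh: commit if it returns, roll back if it dies.
# $dbh is expected to behave like a DBI handle (AutoCommit attribute, commit, rollback).
sub in_one_transaction {
    my ($dbh, $work) = @_;
    local $dbh->{AutoCommit} = 0;    # restored when the sub returns
    my $ok = eval { $work->($dbh); $dbh->commit; 1 };
    unless ($ok) {
        my $err = $@ || 'unknown error';
        eval { $dbh->rollback };     # best effort; keep the original error
        die "load rolled back: $err";
    }
    return 1;
}

# Stand-in handle so the sketch runs without a database; a real script would
# pass the handle returned by DBI->connect instead.
package FakeHandle {
    sub new      { bless { AutoCommit => 1, log => [] }, shift }
    sub commit   { push @{ $_[0]{log} }, 'commit' }
    sub rollback { push @{ $_[0]{log} }, 'rollback' }
}

my $dbh = FakeHandle->new;
in_one_transaction($dbh, sub { 1 });                        # succeeds
eval { in_one_transaction($dbh, sub { die "bad node\n" }) };   # fails
print "@{ $dbh->{log} }\n";
```

Other sessions then only ever see the state before the load or the state
after it, never the half-updated tree.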

>
> If I run it with:
>
> perl load_ncbi_taxonomy.pl -dbname milk -driver Pg -dbuser s0460205  
> -dbpass
>  password -directory /home/s0460205/
>
> I get the error message:
>
> loading NCBI taxon database in /home/s0460205:
> ... retrieving all taxon nodes in the database
>     ... reading in taxon nodes from nodes.dmp
>     ... insert/update/delete taxon nodes
> failed to insert node (1;1;1;no rank;1;0): ERROR: column "taxon_id" is  
> of type
> integer but expression is of type character varying
> HINT: You will need to rewrite or cast the expression

OK, this is the piece that reveals it. It's a bug in DBD::Pg 1.40
against 8.0.x PostgreSQL servers.

Check here for the thread; a fix is in preparation but apparently
doesn't fully catch it yet.

http://gborg.postgresql.org/pipermail/dbdpg-general/2005-March/001514.html

Maybe 1.41 is out already? Or you can downgrade Pg to 7.4.x? Or wait
until the DBD::Pg people have fixed it?

In any event, beyond our control.

	-hilmar

>
>
> Quoting Hilmar Lapp <hlapp@gmx.net>:
>
>> Why do you believe the script thinks that taxon_id is a varchar? It
>> doesn't AFAIK.
>>
>> Also, not sure why your Pg (you are using PostgreSQL, right?) is in
>> auto-commit mode. That doesn't sound right.
>>
>> 	-hilmar
>>
>> On Friday, March 18, 2005, at 06:05  AM, SG Edwards wrote:
>>
>>> I find that if I manually gunzip and tar the download from ncbi then
>>> the script
>>> finds the file nodes.dmp (N.B not sure if this is a fault with
>>> load_ncbi_taxonomy.pl or something with my system?!)
>>>
>>> The script then tries to load the data into the taxon table but the
>>> column
>>> "taxon_id" type is INTEGER but the script thinks it is varchar. So
>>> either need
>>> to change the database column to varchar or change the perl script to
>>> INTEGER.
>>>
>>> Has anyone had this problem?!
>>>
>>>
>>> Quoting s0460205@sms.ed.ac.uk:
>>>
>>>> I have been trying:
>>>>
>>>> perl load_ncbi_taxonomy.pl -dbname milk -driver Pg -dbuser s0460205
>>>> -dbpass
>>>> password -download
>>>>
>>>> and this gave me the error message below.
>>>> If I download the ncbi_taxonomy data manually it and direct the perl
>>>> script
>>>> to
>>>> this using:
>>>>
>>>> perl load_ncbi_taxonomy.pl -dbname milk -driver Pg -dbuser s0460205
>>>> -dbpass
>>>> password -directory /home/s0460205/
>>>>
>>>> This seems to get a bit further but still results in error,
>>>>
>>>> "loading NCBI taxon database in /home/s0460205:
>>>>    ... retrieving all taxon nodes in the database
>>>>    ... reading in taxon nodes from nodes.dmp
>>>> Couldn't open data file taxdata/nodes.dmp: No such file or directory
>>>> rollback ineffective with AutoCommit enabled at  
>>>> load_ncbi_taxonomy.pl
>>>> line
>>>> 818.
>>>> Use of uninitialized value in concatenation (.) or string at
>>>> load_ncbi_taxonomy.pl line 820.
>>>> rollback failed
>>>>
>>>> It seems to be choking on finding the nodes.dmp but I'm not sure  
>>>> why?!
>>>>
>>>>
>>>> Quoting Brian Osborne <brian_osborne@cognia.com>:
>>>>
>>>>> SG,
>>>>>
>>>>> =head1 DESCRIPTION
>>>>>
>>>>> This script loads or updates a biosql schema with the NCBI Taxon
>>>>> Database. There are a number of options to do with where the biosql
>>>>> database is (i.e., database name, hostname, user for database,
>>>>> password, database name).
>>>>>
>>>>> This script may download the NCBI Taxon Database from the NCBI FTP
>>>>> server on-the-fly (ftp://ftp.ncbi.nih.gov/pub/taxonomy/). Otherwise
>>>>> it
>>>>> expects the files to be downloaded already.
>>>>>
>>>>>
>>>>>
>>>>> Brian O.
>>>>>
>>>>> -----Original Message-----
>>>>> From: bioperl-l-bounces@portal.open-bio.org
>>>>> [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of SG  
>>>>> Edwards
>>>>> Sent: Friday, March 18, 2005 6:45 AM
>>>>> To: bioperl-l@portal.open-bio.org
>>>>> Subject: [Bioperl-l] Loading taxonomy data into BioSQL
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Hi,
>>>>>
>>>>> Can you please help me with an error message? I have just  
>>>>> installed a
>>>> BioSQL
>>>>> database and am trying to run the load_ncbi_taxonomy.pl script to  
>>>>> get
>>>>> taxonomy
>>>>> data into my database before I start to load sequences in. The
>>>>> database has
>>>>> been created and is empty, however, I get the following error
>>>>> message:
>>>>>
>>>>>
>>>>> Cannot open Local file taxdata/taxdump.tar.gz: No such file or
>>>>> directory at
>>>>> load_ncbi_taxonom.pl line 628
>>>>> gunzip: taxdata/taxdump.tar.gz: No such file or directory
>>>>> sh: line 1: cd: taxdata: No such file or directory
>>>>> tar: taxdump.tar: cannot open: No such file or directory
>>>>> tar: error is not recoverable: exiting now
>>>>> loading NCBI taxon database in taxdata:
>>>>>        ... retrieving all taxon nodes in the database
>>>>>        ... reading in taxon nodes from nodes.dmp
>>>>> Couldn't open data file taxdata/nodes.dmp: No such file or  
>>>>> directory
>>>>> rollback ineffective with AutoCommit enabled at
>>>>> load_ncbi_taxonomy.pl line
>>>>> 818.
>>>>> Use of uninitialized value in concatenation (.) or string at
>>>>> load_ncbi_taxonomy.pl line 820.
>>>>> rollback failed
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l@portal.open-bio.org
>>>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l@portal.open-bio.org
>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>>
>> --
>> -------------------------------------------------------------
>> Hilmar Lapp                            email: lapp at gnf.org
>> GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
>> -------------------------------------------------------------
>>
>>
>>
>
>
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


From cain at cshl.edu  Sat Mar 19 09:23:54 2005
From: cain at cshl.edu (Scott Cain)
Date: Sat Mar 19 09:18:52 2005
Subject: [Bioperl-l] How to express 'histogram' data in GFF3
In-Reply-To: <9A6B282E-97A4-11D9-A08F-000A95A26D06@cshl.org>
References: <9A6B282E-97A4-11D9-A08F-000A95A26D06@cshl.org>
Message-ID: <1111242234.3557.0.camel@localhost.localdomain>

Matt,

First, let me start by saying there are some "unexplored" areas of GFF3
and GBrowse (at least, they are unexplored for me).

While I haven't tested this, it should work fine.  What you can do is
create one parent feature that encapsulates the entire range, and then
have the data points be child lines of that parent:

ChrII	rev1	region	1	2000000	.	.	.	ID=poly1%3AChrII%3Arev1
ChrII	rev1	poly1	1591004	1591068	464.835	-	.	Parent=poly1%3AChrII%3Arev1

Now whether this will work in a GFF database with GBrowse currently is
an open question (like I said, I haven't tested it); I know it would
work in a chado database and GBrowse.  You might need a custom
aggregator to make it work in a GFF database.
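
A minimal Perl sketch of the parent/child layout described above, untested
against GBrowse or any GFF database. The helper names gff3_escape and
gff3_line are made up for this illustration; the only data row is the one
from Matt's example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Escape characters in a GFF3 attribute value with %XX hex codes.
# This escapes more than the spec strictly requires (e.g. ':'),
# matching the ID=poly1%3AChrII%3Arev1 style used in this thread.
sub gff3_escape {
    my ($value) = @_;
    $value =~ s/([^A-Za-z0-9._-])/sprintf('%%%02X', ord $1)/ge;
    return $value;
}

# Build one tab-separated GFF3 line from the nine column values.
sub gff3_line {
    my (@cols) = @_;
    return join("\t", @cols) . "\n";
}

my $id = gff3_escape('poly1:ChrII:rev1');

# One parent feature spanning the whole range ...
my $parent = gff3_line('ChrII', 'rev1', 'region', 1, 2000000,
                       '.', '.', '.', "ID=$id");

# ... and each data point as a child that points back to it.
my @points = ( [ 1591004, 1591068, 464.835, '-' ] );
my @children = map {
    my ($start, $end, $score, $strand) = @$_;
    gff3_line('ChrII', 'rev1', 'poly1', $start, $end,
              $score, $strand, '.', "Parent=$id");
} @points;

print "##gff-version 3\n", $parent, @children;
```

Only the parent line carries an ID, so the uniqueness question does not
arise for the data points; whether a given loader aggregates the children
correctly still has to be checked.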

On the other hand, I'm not convinced that having all the lines with the
same ID violates the GFF3 spec, as you could probably view this as one
big feature of the whole range, and therefore the ID applies to that one
feature, not to the individual pieces that make up the lines of GFF.  If
you want, you can send me a small sample set of data and I'll see what I
can do.

Scott


On Fri, 2005-03-18 at 06:55 -0500, Matthew Vaughn wrote: 
> OK, I've bashed my head against this and have come up short, so now I'm 
> asking for help. Recently, I decided to upgrade my development system 
> to BioPerl 1.5 and bring all my code up to GFF3 compliance. This of 
> course, includes code that generates GFF files for loading into our 
> local Generic Genome Browser (1.62).
> 
> The problem comes when I try to express histogram data. In the past, 
> rows like this worked fine as GFF2
> 
> "ChrII	rev1	poly1	1591004	1591068	464.835	-	.	poly1 ChrII:rev1"
> 
> but this is invalid for GFF3. As far as I can figure from interpreting 
> the GFF3 spec, the same record should look something like this
> 
> "ChrII	rev1	poly1	1591004	1591068	464.835	-	.	ID=poly1%3AChrII%3Arev1"
> 
> But this violates the GFF3 spec in that ID is now non-unique. Rows 
> formatted thusly also fail to display any histogram data in my browser.
> 
> I've considered loading the array data as GFF2 and my annotation data 
> as GFF3, but that seems, well, inelegant (plus I don't even know if 
> that will work)
> 
> Any input will be very much appreciated!
> 
> Matt
> 
> --
> Matthew W. Vaughn, Ph.D.
> Cold Spring Harbor Laboratory
> Delbruck Laboratory / Martienssen Group
> 1 Bungtown Road
> Cold Spring Harbor, NY 11724
> 
> phone: (516) 367-8469
> 
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain@cshl.org
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory

From brian_osborne at cognia.com  Sat Mar 19 10:29:33 2005
From: brian_osborne at cognia.com (Brian Osborne)
Date: Sat Mar 19 10:39:23 2005
Subject: [Bioperl-l] Symlink on install
In-Reply-To: <1111084939.6085.163.camel@ricotta>
Message-ID: <GPENLDEIJJHJLHOAJBBPCEDPCDAA.brian_osborne@cognia.com>

Ben,

Yes, maintenance/ sounds reasonable. What a surprise!

;-)

Brian O.


-----Original Message-----
From: bioperl-l-bounces@portal.open-bio.org
[mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Ben Faga
Sent: Thursday, March 17, 2005 1:42 PM
To: bioperl-l@bioperl.org
Subject: [Bioperl-l] Symlink on install


Hello everyone,

I've replaced the bp_bulk_load_gff.pl with a script that takes the place
of both itself (mysql version) and bp_pg_bulk_load_gff.pl (postgres
version).

Upon install of bioperl, I want to create a symbolic link from the
postgres version's name to bp_bulk_load_gff.pl so that this change will
be transparent to people who have been using the postgres version.

I have a working solution but I wouldn't mind hearing suggestions and
critiques.

The way that it works is on make, an external script symlink_scripts.pl
gets created with all the necessary path info.  In the postamble of
Makefile.PL, I inserted a line to call the symlink_scripts.pl file.  

Then on install, symlink_scripts.pl is run and creates the symbolic
link.  I used the Perl symlink function to create the link.  On systems
where symlink doesn't work, it catches the error and prints a note to
the user.  That error path is untested, though, since I have only run
it on a Fedora box.
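
A minimal sketch of the "catch the error and print a note" step, assuming
the link is made with Perl's builtin symlink. This is not the actual
symlink_scripts.pl; the name try_symlink and the path in the usage comment
are placeholders.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Try to create $link pointing at $target. Returns 1 on success,
# 0 otherwise; never dies, so the install can carry on.
sub try_symlink {
    my ($target, $link) = @_;

    # On platforms without symlink(), the builtin raises a fatal
    # error, so the call is wrapped in eval.
    my $ok = eval { symlink($target, $link) };
    if ($@) {
        warn "Symbolic links are not supported here; ",
             "please use $target directly.\n";
        return 0;
    }
    if (!$ok) {
        warn "Could not link $link -> $target: $!\n";
        return 0;
    }
    return 1;
}

# Hypothetical usage; the install path is a placeholder.
# try_symlink('bp_bulk_load_gff.pl', '/usr/local/bin/bp_pg_bulk_load_gff.pl');
```

Separating "symlink is unsupported" from "symlink failed" (for example
because the link already exists) lets the note to the user say which of
the two happened.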

If all of this sounds good, I have a question about where I should place
the symlink_scripts.PLS file.  It has been suggested that I might put it
in the maintenance directory.

Any thoughts?

Ben



From glim at mycybernet.net  Sat Mar 19 15:05:00 2005
From: glim at mycybernet.net (glim@mycybernet.net)
Date: Sat Mar 19 15:27:21 2005
Subject: [Bioperl-l] Yet Another Perl Conference, North America,
	2005 Registration now open
Message-ID: <m1DCkda-000YsrC@mail.cybrnet.net>

    ---------->
    Yet Another Perl Conference, North America, 2005 Registration now
    open.

    Conference dates:  Monday - Wednesday 27 - 29 June 2005

    Location:  89 Chestnut Street         http://89chestnut.com/
               University of Toronto
               Toronto, Ontario, Canada

    Info at: http://yapc.org/America

    Direct registration: 
http://donate.perlfoundation.org/index.pl?node=registrant%20info&conference_id=423

    Full registration fee $85 (USD)

    Book now for great deals on accommodations and ensure a space for
    yourself.

    Speaking slots are still open.  If you would like to present at
    YAPC::NA 2005, see: http://yapc.org/America/cfp-2005.shtml

    Details of this announcement:
       http://yapc.org/America/registration-announcement-2005.txt
    <----------

More Details
============

Registration for YAPC::NA (Yet Another Perl Conference, North America)
2005 in Toronto, Ontario, Canada is now open.

The conference registration price is USD$85.  This price includes
admission to all aspects of the conference, respectable amounts of
catering, several activities and a few conference goodies.

The YAPC North America 2005 conference features...

     * Fantastic speakers

         +  most are the core creators of the technology on which
            they present

         +  many are professional IT authors, trainers and conference
            speakers

     * An excellent learning opportunity

     * A chance to meet Perl professionals from all over North America
       and the world

         +  YAPC attendees tend to be very involved in Perl and so are
            another great way to learn more about what the language
            has to offer beyond just what the speakers have to say

     * Extra-curricular / after hours activities

     * A great location in downtown Toronto

All this, and the price is more than an order of magnitude cheaper
than what commercial conferences can offer.  This is because YAPC is a
100% volunteer effort, both from its organizers and its speakers.
Quality is *not* sacrificed to achieve this stunning level of
affordability.

YAPC provides the best value-for-dollar in IT conferences.  And it's a
ton of fun, too.

The dates of the conference are Monday - Wednesday 27-29 June
2005. The location is 89 Chestnut Street in downtown Toronto, Ontario,
Canada. (Note that a different date block was previously announced;
we moved the conference date to accommodate venue availability.)

http://89chestnut.com/  -- a facility within the University of Toronto

If you are at all interested in attending the conference...

                              Book now!

                              Book now!

                              Book now!

We have room for about 400 attendees and we hope to sell out well
in advance of the late June conference date.  However, the critical
matter is that of hotels.

The YAPC::NA 2005 organizers have made group arrangements with several
facilities around the city to provide _excellent_ quality
accommodations in _very_ convenient locations at _terrific_ prices for
the _full_ capacity of conference attendees (around 400 people).

(Finding, booking and paying accommodations is the responsibility of
the attendees, but we will provide you with a list of the hotels and
university dorms to try first based on our group arrangement with them
when you register for the conference.  Also, see the web site at
http://yapc.org/America/accommodations-2005.shtml.  More details will be
up shortly.  The dorm option will be approx. C$55/night, the hotel
options will be more like C$90/night, and for slightly different prices
there will be options for putting more than 1 person in a room.  Exact
details and how to book will be emailed directly to people who have
registered for the conference as soon as they become available.)

*The catch is -- book now!!*  The group reservations will expire in
early May, at which point in time the group rates will mostly still
apply, but the rooms will be given out on an "availability basis".
Which means that someone else outside of the YAPC group can book the
rooms as well.

Make no mistake -- the rooms *will* be sold.  Toronto is a very active
conference city in the summer and there will be _no_ guarantee of
vacancies either at the facilities we made arrangements with or
anywhere else in the city if you leave it to within 6 weeks of the
conference date.  So, if you want to save yourself the likely-fruitless
headache of scrambling around looking for accommodations at the last
minute,

                              Book now!

                              Book now!

                              Book now!

Have any questions?  Email na-help@yapc.org for more details.

Additionally, we are still welcoming submissions for proposals via:

       http://yapc.org/America/cfp-2005.shtml

The close of the call-for-papers is April 18, 2005 at 11:59 pm
(Toronto time).

If you have any questions regarding the call-for-papers or speaking at
YAPC::NA 2005 please email na-author@yapc.org

We would love to hear from potential sponsors. Please contact the
organizers at na-sponsor@yapc.org to learn about the benefits of
sponsorship.


From Nathan.Johnson at astrazeneca.com  Mon Mar 21 11:01:10 2005
From: Nathan.Johnson at astrazeneca.com (Johnson, Nathan)
Date: Mon Mar 21 11:34:38 2005
Subject: [Bioperl-l] cigarline conversion
Message-ID: <BAF5C0B1B1FE5A41B28C71CC977422E9079617DA@UKAPPHRESMSX01>

Hi bioperlers

Does anyone know of a module which converts multiple-alignment cigar
line data (multiple strings with M's and D's but no I's) to a pairwise
format (one string with M's, D's and I's)?

SimpleAlign doesn't seem to do what I want :\

Cheers

Nath

From zhoujie at fudan.edu.cn  Tue Mar 22 04:51:40 2005
From: zhoujie at fudan.edu.cn (zhoujie@fudan.edu.cn)
Date: Tue Mar 22 05:23:56 2005
Subject: [Bioperl-l] How to use proxy in Bioperl?
Message-ID: <7c3e427c5712.7c57127c3e42@fudan.edu.cn>

Hi all,

I'm new to bioperl and here is my question: how do I use a proxy in
bioperl? For example, I'm using the get_Seq_by_acc() method; how can I
get the sequence when I can only access NCBI via a proxy?

Thanks very much.

J Z
From Marc.Logghe at devgen.com  Tue Mar 22 05:39:02 2005
From: Marc.Logghe at devgen.com (Marc Logghe)
Date: Tue Mar 22 05:33:38 2005
Subject: [Bioperl-l] How to use proxy in Bioperl?
Message-ID: <BEE28BF86078B6429D6C780635718E219052B2@morelia.be.devgen.com>

Hi JZ,
Behind the scenes, LWP::Simple is doing the request for you. By default
it reads the proxy from your environment.
I guess setting the HTTP_PROXY env var should solve your problem.
HTH,
Marc

> -----Original Message-----
> From: bioperl-l-bounces@portal.open-bio.org 
> [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of 
> zhoujie@fudan.edu.cn
> Sent: Tuesday, March 22, 2005 10:52 AM
> To: bioperl-l@bioperl.org
> Subject: [Bioperl-l] How to use proxy in Bioperl?
> 
> Hi all,
> 
> I'm new to bioperl and here is my question: How to use proxy 
> in bioperl? For example I'm using get_Seq_by_acc() method, 
> how can I get the sequence when I can only access NCBI via proxy? 
> 
> Thanks very much.
> 
> J Z
> 
> 

From Marc.Logghe at devgen.com  Tue Mar 22 05:58:21 2005
From: Marc.Logghe at devgen.com (Marc Logghe)
Date: Tue Mar 22 05:52:55 2005
Subject: [Bioperl-l] How to use proxy in Bioperl?
Message-ID: <BEE28BF86078B6429D6C780635718E219052B3@morelia.be.devgen.com>

You can also set it in your script:
use Bio::DB::GenBank;
my $gb = Bio::DB::GenBank->new();
# protocol names must be quoted strings, not barewords
$gb->proxy(['ftp', 'http'], 'http://<your.proxy.url>');

I think this also works:
$gb->proxy('http://<your.proxy.url>');

HTH,
Marc

> -----Original Message-----
> From: Marc Logghe 
> Sent: Tuesday, March 22, 2005 11:39 AM
> To: 'zhoujie@fudan.edu.cn'; bioperl-l@bioperl.org
> Subject: RE: [Bioperl-l] How to use proxy in Bioperl?
> 
> Hi JZ,
> In the back LWP::Simple is doing the request for you. By 
> default it reads the proxy from your environment.
> Guess setting the HTTP_PROXY env var should solve your problem.
> HTH,
> Marc
> 
> > -----Original Message-----
> > From: bioperl-l-bounces@portal.open-bio.org
> > [mailto:bioperl-l-bounces@portal.open-bio.org] On Behalf Of 
> > zhoujie@fudan.edu.cn
> > Sent: Tuesday, March 22, 2005 10:52 AM
> > To: bioperl-l@bioperl.org
> > Subject: [Bioperl-l] How to use proxy in Bioperl?
> > 
> > Hi all,
> > 
> > I'm new to bioperl and here is my question: How to use proxy in 
> > bioperl? For example I'm using get_Seq_by_acc() method, how 
> can I get 
> > the sequence when I can only access NCBI via proxy?
> > 
> > Thanks very much.
> > 
> > J Z
> > 
> > 

From mlemieux at bioinfo.ca  Tue Mar 22 12:26:35 2005
From: mlemieux at bioinfo.ca (Madeleine Lemieux)
Date: Tue Mar 22 17:53:22 2005
Subject: [Bioperl-l] Easy switching from wwwBlast to QBlast
In-Reply-To: <75616934-3FED-11D9-B611-000393C44276@duke.edu>
References: <5B03F9DE-3DFA-11D9-AF99-000A95B139D2@bioinfo.ca>
	<75616934-3FED-11D9-B611-000393C44276@duke.edu>
Message-ID: <0097862bb8ee879eaac2ecc69bc1b668@bioinfo.ca>

Jason,

I've used RemoteBlast.pm (1.5 version) as a template for a new module 
I'm calling LocalServerBlast.pm which lets users submit jobs to a 
wwwBlast server. I've also added a subroutine, wwwBlast_sequence, to 
Perl.pm that mimics blast_sequence. I've added support for both 
procedures to pass CGI parameters through to submit_blast. So one can 
do:
	# for QBlast
	use Bio::Perl;
	my $blast_report = blast_sequence($seq, {'-expect' => 1e-6, 
DESCRIPTIONS => 25});
and
	# for wwwBlast
	use Bio::Perl;
	my $blast_report = wwwBlast_sequence($seq, {'-expect' => 1e-6, 
DESCRIPTIONS => 25});

Only the procedure name has to be changed to switch between wwwBlast 
and QBlast. In fact, if a hash of parameters gets set up that includes 
both QBlast and wwwBlast options mixed in together, it doesn't matter 
since each server only looks at the parameters it recognizes and 
ignores all others; as long as the values of the parameters it does 
recognize are set correctly it just works. The only kludge needed for 
this is for ALIGNMENT_VIEW where QBlast expects a string but wwwBlast 
uses numbers to specify the view option. And in both cases, requesting 
a tabular view will cause the blast result parser to fail. I haven't 
bothered catching that particular error since it's the same behaviour 
in both cases.

The interest for me was to use this for prototyping software that will 
eventually hit the NCBI Blast server but without clogging up the NCBI 
queue or wasting my internet connect time while I'm developing. I can 
also imagine it being useful in a center generating its own sequence 
databases and already using wwwBlast.

Since the wwwBlast server doesn't support queues, there's no concept of 
RID and so no need for retrieve_blast in LocalServerBlast; instead, 
submit_blast returns an array of Bio::Tools::BPlite or 
Bio::Tools::Blast objects.

I've also made a slight change to how RemoteBlast.pm checks the return 
status of blast jobs. The HTML returned from the NCBI server contains a 
status line near the top of the file so I just read far enough in the 
response file to pull that information out and then use that, rather 
than the filesize to decide if the job is ready, waiting, or failed.
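
A minimal, self-contained sketch of that status check, not the attached
patch itself. It assumes the response carries a Status= line between the
QBlastInfoBegin and QBlastInfoEnd markers; the name qblast_status and the
sample response fragment are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the lower-cased Status value from the QBlastInfo block,
# or undef if no status line appears before QBlastInfoEnd.
sub qblast_status {
    my ($html) = @_;
    for my $line (split /\n/, $html) {
        last if $line =~ /QBlastInfoEnd/;
        return lc $1 if $line =~ /^\s*Status=(\w+)/;
    }
    return;
}

# Made-up response fragment for illustration only.
my $response = <<'HTML';
<!--
QBlastInfoBegin
	Status=WAITING
QBlastInfoEnd
-->
HTML

my $status = qblast_status($response);
if    (!defined $status)     { print "no status found\n" }
elsif ($status eq 'waiting') { print "job still running\n" }
elsif ($status eq 'ready')   { print "results available\n" }
else                         { print "job failed: $status\n" }
```

Stopping at QBlastInfoEnd means only the top of the response is scanned,
so the check does not depend on how large the rest of the file is.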

I've attached the patch files for Perl.pm and RemoteBlast.pm (cvs diff
-aur against both 1.4 and 1.5) as well as the LocalServerBlast.pm file.
I'm not sure what the protocol is for the "cared for", "copyright" and
"author" notices. I've mostly just modified your and Ewan Birney's
stuff. I'd be happy to care for these modules.

I haven't written any code for the test suite yet but I'll start
working on that soon. Also, upon further reflection, I decided not to
add support for accession # and gi queries to blast_sequence. If
anyone wants that, I can put it back in, but for a first pass I didn't
want to change Perl.pm too much.

I've tested these modules with wwwBlast 2.2.9 and 2.2.10 under MacOS X.

All the best,
Madeleine

> Dear Madeleine -
>
> Great.  Would love for someone to be a maintainer and keeper of this 
> module. All your changes sound great.  I think a new function in 
> Bio::Perl would be the best way to allow providing of a new 
> localserver.  Note that Bio::Perl is supposed to really just be a 
> convenience of just having a list of functions for new users - so 
> there is room for new *well named* functions to be added there.
>
> As for applying the changes - you can submit a patch of differences 
> for your new code versus the current CVS HEAD by making changes and 
> then running "cvs diff -aur " to get the changes in a patch format.  
> You'll want to checkout the code via CVS first - 
> http://cvs.open-bio.org/.  We have to give you an authorized account 
> to be able to apply changes back to the repository though.  Once 
> you've submitted a few fixes to show you understand the toolkit and 
> the coding practices we can see about getting you that account.
>
> -jason
> On Nov 24, 2004, at 4:22 AM, Madeleine Lemieux wrote:
>
>> I've just recently started exploring BioPerl (v.1.4). So far it's 
>> been fun if a little daunting.
>>
>> As an exercise, I decided to try change the blast_sequence subroutine 
>> in Perl.pm so that it would let me send the query to either my local 
>> wwwBlast server or out over my slow, flakey internet connection to 
>> the QBlast server. I did this by adding a parameter LOCALSERVER 
>> which, if set to a URL, redirects the query to that server (e.g. 
>> LOCALSERVER => http://localhost/blast/blast.cgi); otherwise, it 
>> defaults to the server at the NCBI.
>>
>> I've also added support for query by accession or gi # (QBlast only 
>> since wwwBlast doesn't support such queries), submission of multiple 
>> sequences (either in a file or string or string variable), as well as 
>> passing any of the QBlast Put and Get options as parameters. Unlike 
>> the original one, my blast_sequence returns an array of results, not 
>> a single result, so that code calling my version of blast_sequence in 
>> a scalar context would incorrectly get the size of the array.
>>
>> Apart from Perl.pm, the only other file that I had to change was 
>> Bio/Tools/Run/RemoteBlast.pm. I just downloaded the latest release 
>> candidate, 1.5.RC1, and noticed that RemoteBlast.pm has been changed 
>> in ways that overlap with the changes I've made while maintaining 
>> backwards compatibility which my version does not since I was only 
>> working for myself at the time.
>>
>> So my question is: is anyone interested in getting the code I've 
>> developed? If so, a corollary question is: how do I go about 
>> contributing the code? I can pretty easily forward port my changes to 
>> RemoteBlast.pm to the 1.5.RC1 version in order to use the nice 
>> "validate by regexp" trick introduced there and to provide backwards 
>> compatibility. I'm not sure what to do about the Perl.pm module, 
>> though. I guess that the easiest would be to change the name of my 
>> blast_sequence subroutine and add it to Perl.pm since there is no 
>> object interface being altered.
>>
>> As I was working on this, I noticed that the HTML stripping that gets 
>> done on the response from the QBlast server fails on wwwBlast output 
>> since the format of the HTML is a little different (manifests as a 
>> "can't find mid-line data" error when processing the alignments). So 
>> I wrote a generic stripper which removes all HTML tags except those 
>> that contain an end-of-line within the tag itself or an internal, 
>> un-escaped closing angle bracket (>) which wouldn't be valid HTML 
>> anyway, I think. It doesn't touch single angle brackets (>) such as 
>> those found at the beginning of descriptions (>gi ...).
>> 	# html stripper
>> 	# remove simple and closing tags first and then leftover tags
>> 	$str =~ s/<(\/)?\w+>//g;
>> 	$str =~ s/<\D+([^>]*\n*)*>//g;
>>
>> Also, when retrieving RIDs in RemoteBlast.pm (retrieve_rid), the test 
>> for completion relies on the size of the file containing the reply. 
>> This has failed at least once for me. Since there is a status line 
>> near the top of the file in the response, it seems to me that 
>> something along the lines of the following might be more robust:
>> 	# read file until QBlastInfoEnd to pull out status
>> 	my $status = '';
>> 	my $junk = '';
>> 	open(TMP, $tempfile) or $self->throw("cannot open $tempfile");
>>      while( defined (my $line = <TMP>) ) {
>>          last if ($line =~ /QBlastInfoEnd/);
>>          ($junk, $status) = (split /=/, $line) if ($line =~ 
>> /waiting|ready/i);
>>      }
>>      close TMP;
>>
>>      if( $response->is_success ) {
>> 		if ( $status =~ /waiting/i ) {
>>              return 0;
>>           } elsif ( $status =~ /ready/i ) {
>> 		    ...
>> 	     } else { # failed
>> 		    ...
>> 		}
>> 	} ...
>>
>> Finally, let me end by thanking all the BioPerl contributors for 
>> their fine work.
>>
>> Regards,
>> Madeleine
>>
>>
>>
> --
> Jason Stajich
> jason.stajich at duke.edu
> http://www.duke.edu/~jes12/


-------------- next part --------------
A non-text attachment was scrubbed...
Name: RemoteBlast.pm.diff-1.4
Type: application/octet-stream
Size: 22148 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050322/11e8734b/RemoteBlast.pm.diff-1-0002.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: RemoteBlast.pm.diff-1.5
Type: application/octet-stream
Size: 22150 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050322/11e8734b/RemoteBlast.pm.diff-1-0003.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Perl.pm.diff-1.4
Type: application/octet-stream
Size: 21885 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050322/11e8734b/Perl.pm.diff-1-0002.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Perl.pm.diff-1.5
Type: application/octet-stream
Size: 19626 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050322/11e8734b/Perl.pm.diff-1-0003.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: LocalServerBlast.pm
Type: application/octet-stream
Size: 16943 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050322/11e8734b/LocalServerBlast-0001.obj
-------------- next part --------------


From sebastien.moretti at igs.cnrs-mrs.fr  Thu Mar 24 06:05:27 2005
From: sebastien.moretti at igs.cnrs-mrs.fr (Sebastien Moretti)
Date: Thu Mar 24 05:59:52 2005
Subject: [Bioperl-l] [How to add features in genbank flat file]
In-Reply-To: <200502151525.38790.moretti@igs.cnrs-mrs.fr>
References: <200502151525.38790.moretti@igs.cnrs-mrs.fr>
Message-ID: <42429EF7.4050504@igs.cnrs-mrs.fr>

Hello,
No one seems to have a solution to the problem I posted a month ago.

So I changed my approach and now use 'wget' to get the GenBank sequences.
That way I get the full GenBank entry, with most of the features.
I also avoid another bug: COMMENT lines are not well formatted by
the BioPerl script I used (they differ from the COMMENT lines at NCBI),
and blank lines are removed.


	#!/usr/bin/perl -w
	
	use strict;
	use diagnostics;
	use File::Cat;
	
	my $acc=$ARGV[0] or die "\n\tThe accession number you seek for is missing.\n\tTry something like: $0 NM_178432\n\n";
	
	`wget -O output_file.tmp "http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&qty=1&c_start=1&val=$acc&dopt=gbwithparts&send=Send&sendto=t&from=begin&to=end&extrafeatpresent=1&ef_SNP=1&ef_CDD=8&ef_MGC=16&ef_HPRD=32" 2>/dev/null`;
	
	cat ("output_file.tmp", \*STDOUT);
	unlink("output_file.tmp");
	
	# wget -O output_file 'http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&qty=1&c_start=1&val=NM_178432&dopt=gbwithparts&send=Send&sendto=t&from=begin&to=end&extrafeatpresent=1&ef_SNP=1&ef_CDD=8&ef_MGC=16&ef_HPRD=32'
	
	exit;


Sorry, I no longer use BioPerl to query GenBank (though I still use it
for other applications), because BioPerl 1.5 has not fixed the COMMENT
bug or the missing features.

> Hello,
> I saw that the GenBank web site has changed:
> features like 'SNPs' are no longer included in the EST flat files.
> At the NCBI web site, we must click on 'features: SNP' to add them to our flat 
> file.
> 
> With BioPerl 1.4 or 1.5 it is the same: the variation features are no longer 
> included in the EST flat files that I download.
> 
> Here is the script I use:
> 	#!/usr/bin/perl -w
> 	
> 	use strict;
> 	use Bio::DB::GenBank;
> 	use Bio::DB::Query::GenBank;
> 	use Bio::SeqIO;
> 	my $acc=$ARGV[0] or die "\n\tThe accession number you seek for is missing.\n\tTry something like: $0 NM_178432\n\n";
> 	
> 	$acc=$acc."[Accession]";
> 	
> 	my $query_string = "$acc";
> 	my $query = Bio::DB::Query::GenBank->new(-db=>'nucleotide',
> 	                                                 -query=>$query_string);
> 	
> 	my $gb = new Bio::DB::GenBank;
> 	my $stream = $gb->get_Stream_by_query($query);
> 	
> 	my $out=Bio::SeqIO->new(-format=>'genbank');
> 	my $seq = $stream->next_seq();
> 	
> 	my $result=$out->write_seq($seq);
> 	$result =~ s/^1.*$//;
> 	#print $out->write_seq($seq);
> 	print $result;
> 	
> 	exit;
> 
> How can I add most of the features to my nucleotide flat files?
> 
> Thanks

-- 
Sébastien Moretti
http://igs.cnrs-mrs.fr/
CNRS - IGS
31 chemin Joseph Aiguier
13402 Marseille cedex
From cerdman2 at du.edu  Thu Mar 24 12:54:50 2005
From: cerdman2 at du.edu (Colin Erdman)
Date: Thu Mar 24 12:49:26 2005
Subject: [Bioperl-l] Assistance with a BioPerl/Perl project
Message-ID: <0IDV00EJ0B3JVD@smtpout.cair.du.edu>

Hello list,

 
I am a 22-year-old bioinformatics and molecular biology major at the
University of Denver. I just accepted a position with a researcher here and
already have a first assignment. We are working on a comprehensive
chromosome 21 gene database and map, and my first task is to update a list of
known (and curated) human chromosome 21 genes. I have rapidly become
familiar with BioPerl; however, my adviser needs me to use Entrez Gene to
compare the currently known Chr 21 genes (from the query '21[CHR] AND Homo
sapiens[ORGN] AND NOT Pseudogene') with a list of genes that she has
provided in xls and xml format.

The idea is to take the accession numbers in the provided files, pull the
nucleotide sequences for them, and run those against the sequences for
records found with the Entrez Gene query in order to find any newly
annotated (discovered/elucidated?) genes for that sequence. I am aware of
the current problem of BioPerl not being able to parse the EntrezGene
object directly, but I have played with Bio::SeqIO::Gene2accession (and
geneinfo) and the egparser. My programming skills are not completely up to
par, so egparser is tough for me to grasp. Bio::SeqIO::Gene2accession is
more intuitive; however, I am having a terrible time figuring out how to
convert my desired Entrez Gene results into the legacy gene_info and
gene2accession formats. Any suggestions are greatly appreciated. I am very
new at this, so very simple coding examples and explanations help and are
the best way for me to learn.

 
Thanks all!

colin

From sdavis2 at mail.nih.gov  Thu Mar 24 13:49:40 2005
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Thu Mar 24 13:44:01 2005
Subject: [Bioperl-l] Assistance with a BioPerl/Perl project
In-Reply-To: <0IDV00EJ0B3JVD@smtpout.cair.du.edu>
References: <0IDV00EJ0B3JVD@smtpout.cair.du.edu>
Message-ID: <0d3beed247327883b45b2e29ca07a864@mail.nih.gov>

If you are starting with Genbank Accession numbers and want to get to 
Entrez Gene, the "standard" way to do that is to use Unigene.  If you 
go to the Entrez website and choose the Unigene database, you can type 
in your accession and you will be taken to a unigene record.  If you 
click on the "links" section, you can then link to Entrez Gene.

To do this in batch mode, I download Hs.data.gz from NCBI at:

ftp://ftp.ncbi.nih.gov/repository/UniGene/

Then, you can use Bio::ClusterIO to parse Unigene.  Grab the 
accession_number part of each sequence (there is an example of doing 
this in the POD documentation).  You can then make a hash like:

push(@{$acc_hash{$acc}}, $in->unigene_id);

which maps accessions to unigene ids.

Make a second hash that maps unigene to gene using the file:

ftp://ftp.ncbi.nih.gov/gene/DATA/gene2unigene

which will map the unigene ids to gene.

Then, you have the information you need to map from accession to gene 
via unigene.
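
The two-hash mapping described above can be sketched roughly as follows. This is only an illustration under assumptions: the accession/cluster pairs and Gene ids are made up, the sub name genes_for_accession is hypothetical, and the gene2unigene layout is assumed to be two tab-separated columns (GeneID, UniGene cluster) with a '#' header line. In real use the pairs would come from Bio::ClusterIO and the text from the downloaded file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up stand-ins for (accession, UniGene id) pairs taken from the
# parsed clusters, and for the contents of the gene2unigene file.
my @unigene_members = (
    [ 'ACC0001', 'Hs.1001' ],
    [ 'ACC0002', 'Hs.1001' ],
    [ 'ACC0003', 'Hs.1002' ],
);
my $gene2unigene_text = "#GeneID\tUniGene_cluster\n111\tHs.1001\n";

# First hash: accession -> list of UniGene ids.
my %acc_hash;
for my $pair (@unigene_members) {
    my ($acc, $unigene_id) = @$pair;
    push @{ $acc_hash{$acc} }, $unigene_id;
}

# Second hash: UniGene id -> list of Gene ids.
my %unigene2gene;
for my $line (split /\n/, $gene2unigene_text) {
    next if $line =~ /^#/ or $line !~ /\S/;    # skip header and blank lines
    my ($gene_id, $unigene_id) = split /\t/, $line;
    push @{ $unigene2gene{$unigene_id} }, $gene_id;
}

# Join the two hashes: accession -> UniGene id(s) -> Gene id(s).
sub genes_for_accession {
    my ($acc) = @_;
    my (%seen, @genes);
    for my $unigene_id (@{ $acc_hash{$acc} || [] }) {
        for my $gene_id (@{ $unigene2gene{$unigene_id} || [] }) {
            push @genes, $gene_id unless $seen{$gene_id}++;
        }
    }
    return @genes;
}

for my $acc (sort keys %acc_hash) {
    my @genes = genes_for_accession($acc);
    print "$acc\t", (@genes ? join(',', @genes) : 'no Gene entry'), "\n";
}
```

Accessions whose cluster has no line in gene2unigene simply come back with an empty list, which matches the "if there is one" caveat.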

Just a note on Entrez Gene:  the Gene does not represent a sequence, 
but instead a set of sequences.  The sequences are Refseq sequences.  
So, you wouldn't be blasting against "Gene" per se, but against the 
one or several Refseq sequences (if there are any) that represent the 
Gene.

Hope this helps.  Standard disclaimer:  as with perl AND 
bioinformatics, there is more than one way to do this.  And keep in 
mind that Entrez Gene is only one source of annotation; for chromosome 
21, there may be other sites that have more information, specifically 
Ensembl.

Sean



From Kary at ioc.fiocruz.br  Thu Mar 24 15:24:11 2005
From: Kary at ioc.fiocruz.br (Kary Ann Del Carmen Soriano Ocana)
Date: Thu Mar 24 15:23:52 2005
Subject: [Bioperl-l] Help with hmmpfam
Message-ID: <8D44604203DAF9438BF9123B4A08C779B2700A@alpha.ioc.fiocruz.br>

Dear All,

I am new to bioperl and would like (if possible) to obtain some help with the SearchIO module and hmmpfam. I am listing the output (including the error) and my script below:

(partial) output and error:

[kary@vivax inserir_dados]$ perl bioperl_pfam_23_03_05.pl
Passou a definicao do arquivo query
passou abrir o arquivo mmm.hmm

sh: -c: line 0: syntax error near unexpected token `('
sh: -c: line 0: `/usr/local/bin/hmmpfam  -E 0.0001      1       HMMER2.0  [2.3.2]\nNAME  76GJYz8zFm\nLENG  327\nALPH

I put "print" statements throughout the script to see where the error occurs, and it looks like it never enters the while loops over the results (e.g. next_result, next_hit). Any help would be greatly appreciated.

Thanks, Kary

************

Script:

#!/usr/bin/perl -w

use lib "/usr/local/bioperl14";
use lib "/usr/local/bioperl-run-1.4";

use Bio::Search::Result::HMMERResult;
use Bio::Tools::Run::Hmmer;
use Bio::Tools::Run::Hmmpfam;
use strict;

my $query;
my $db;
my $seq;
my $dbfile;
my @array;

$query = "sequencia_fasta_4_arg.txt";

print "Passou a definicao do arquivo query\n";

open (READ, "$query") or die "Cannot open $query: $!";
while (my $sequence = <READ>){
	 for ($sequence) {
		&hmmpfam($sequence);
	#print $seq;
	}	
}
close (READ);

print "Passou leitura do arquivo query\n";
#############################################################################################################################################
sub hmmpfam {
	my ($seq) = @_;
	$db = "mmm.hmm";
	open (DH, "$db") or die "Cannot open $db: $!";

	print "passou abrir o arquivo mmm.hmm\n\n";

	while ($dbfile = <DH>){
	
		#Build a Hmmpfam factory
		my @params = ('DB'=>$dbfile,'E'=>0.0001);


		my $factory = Bio::Tools::Run::Hmmpfam->new(@params);


		# Pass the factory a Bio::Seq object or a file name
		# returns a Bio::SearchIO object
		my $search = $factory->run($seq);
		print "Search: $search\n";

		print "Passou search com parametros \n";


		my @feat;

			my $searchio = new Bio::SearchIO(-format => 'hmmer',
                                 -file   => 'result.hmmer') or die print "Error for open the file";

			while (my $result = $searchio->next_result){
				print "começa o while do NEXT RESULT\n\n";
			while(my $hit = $result->next_hit){
				print "começa o while do HIT - NEXT HIT\n\n";
			while (my $hsp = my $hit->next_hsp){
				print join("\t", ( my$r->query_name,
						$hsp->query->start,
						$hsp->query->end,
						$hit->name,
						$hsp->hit->start,
						$hsp->hit->end,
						$hsp->score,
						$hsp->evalue,
						$hsp->seq_str,
						)), "\n";
				print "terminou o while dos HSPs\n\n";
				
							 }
							}
									}

}


close (DH);
}
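
A hedged reading of the shell error above: the hmmpfam command line contains text that looks like the contents of mmm.hmm ("HMMER2.0 [2.3.2] NAME ..."), which would happen if the DB parameter receives a line read from the file rather than the file name. The sketch below is plain Perl, not the Bio::Tools::Run::Hmmpfam API. The helper hmmpfam_command is hypothetical, and the argument order (options, HMM database, sequence file) is assumed from HMMER 2 usage.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build the hmmpfam argument list from file NAMES.
# It refuses values that contain a newline, such as a line read from the
# HMM file itself.
sub hmmpfam_command {
    my (%arg) = @_;
    for my $key (qw(db query)) {
        die "$key must be a single-line file name\n"
            if !defined $arg{$key} || $arg{$key} eq '' || $arg{$key} =~ /\n/;
    }
    my @cmd = ($arg{program} || 'hmmpfam');
    push @cmd, '-E', $arg{evalue} if defined $arg{evalue};
    push @cmd, $arg{db}, $arg{query};
    return @cmd;
}

my @cmd = hmmpfam_command(
    program => '/usr/local/bin/hmmpfam',
    evalue  => 0.0001,
    db      => 'mmm.hmm',
    query   => 'sequencia_fasta_4_arg.txt',
);
print join(' ', @cmd), "\n";

# Running the list form avoids the shell, so characters such as '(' are
# never interpreted:
# open(my $out, '-|', @cmd) or die "Cannot run hmmpfam: $!";
```

The same idea applies inside the script: pass the file names ("mmm.hmm" and the FASTA file) once, instead of looping over the lines of either file.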

From cerdman2 at du.edu  Thu Mar 24 16:46:36 2005
From: cerdman2 at du.edu (Colin Erdman)
Date: Thu Mar 24 17:42:03 2005
Subject: [Bioperl-l] Assistance with a BioPerl/Perl project
In-Reply-To: <0d3beed247327883b45b2e29ca07a864@mail.nih.gov>
Message-ID: <0IDV00IILLTX7V@smtpout.cair.du.edu>

So in effect, this is just as good as taking the actual nucleotide sequences
(derived using a GenBank lookup) from my static accession number list and
running them against the 'member sequences' of my genes (clusters) of
interest in order to see if any new gene products or information have been
added for that sequence? And where would you expect BLASTN to fit into the
scheme? I apologize for the redundancy; there is just so much to take in!

Thanks,
Colin


From sdavis2 at mail.nih.gov  Thu Mar 24 18:10:08 2005
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Thu Mar 24 18:04:34 2005
Subject: [Bioperl-l] Assistance with a BioPerl/Perl project
In-Reply-To: <0IDV00IILLTX7V@smtpout.cair.du.edu>
References: <0IDV00IILLTX7V@smtpout.cair.du.edu>
Message-ID: <536997326a76511d9638b5340225f03e@mail.nih.gov>

If I understood you correctly, you are starting with a list of genbank 
accession numbers?  If you start with, for example, CR407631:

Go to:

http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?CMD=&DB=unigene

and type in that accession.

You will see the resulting Unigene entry and after one click to get 
details you will be at this page:

http://www.ncbi.nlm.nih.gov/UniGene/clust.cgi?ORG=Hs&CID=2

There is a small "links" link just under the search bar.  Normally, you 
can link from there to Gene (but it appears to be broken at the 
moment).  In any case, with the gene2unigene file from my previous email, 
you can look up a UniGene id and get the Entrez Gene entry (if there is 
one).  The nice thing about using UniGene is that there is no blasting 
involved at all.  What you end up with is an Entrez Gene id (and bonus 
UniGene id) associated with your accession (most of the time, but some 
will not be in UniGene for various reasons).  You can then mine Gene for 
whatever information you want to assign to the accessions.  For that, you 
will need either a gene parser (from sourceforge) or just use the 
tab-delimited text files from the Gene/DATA ftp site noted in my 
previous email to get the information you want.
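
Mining one of the tab-delimited files could look something like the sketch below. It assumes the gene_info column order is tax_id, GeneID, Symbol, LocusTag, Synonyms, dbXrefs, chromosome, map_location, description (please check this against the README on the ftp site). The rows are made up and the sub name genes_on_chromosome is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up rows in the assumed gene_info layout.
my $gene_info_text = join "\n",
    "#tax_id\tGeneID\tSymbol\tLocusTag\tSynonyms\tdbXrefs\tchromosome\tmap_location\tdescription",
    "9606\t900001\tGENEA\t-\t-\t-\t21\t21q22.1\tmade-up example gene A",
    "9606\t900002\tGENEB\t-\t-\t-\t7\t7p15\tmade-up example gene B",
    "10090\t900003\tGenec\t-\t-\t-\t21\t-\tmade-up mouse gene C";

# Collect GeneID => { symbol, description } for one taxon and chromosome.
sub genes_on_chromosome {
    my ($text, $tax_id, $chromosome) = @_;
    my %gene;
    for my $line (split /\n/, $text) {
        next if $line =~ /^#/ or $line !~ /\S/;    # skip header and blank lines
        my @field = split /\t/, $line;
        next unless $field[0] eq $tax_id and $field[6] eq $chromosome;
        $gene{ $field[1] } = { symbol => $field[2], description => $field[8] };
    }
    return \%gene;
}

# Human (taxonomy id 9606) genes on chromosome 21.
my $chr21 = genes_on_chromosome($gene_info_text, '9606', '21');
for my $gene_id (sort { $a <=> $b } keys %$chr21) {
    print join("\t", $gene_id, $chr21->{$gene_id}{symbol},
               $chr21->{$gene_id}{description}), "\n";
}
```

With the real file you would read line by line from a filehandle instead of splitting a string, since the file is large.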


--------------------------------
Now, if you really want the easy way to do the above, go to:

http://genome-www5.stanford.edu/cgi-bin/source/sourceBatchSearch

Here, just paste in your accessions and get whatever information back 
you want--very nice site for this.  (They still call it LocusLink ID, 
but that is a Gene ID as well).

Hope this helps.
Sean



From jason.stajich at duke.edu  Thu Mar 24 20:32:54 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Thu Mar 24 20:27:35 2005
Subject: [Bioperl-l] Help with hmmpfam
In-Reply-To: <8D44604203DAF9438BF9123B4A08C779B2700A@alpha.ioc.fiocruz.br>
References: <8D44604203DAF9438BF9123B4A08C779B2700A@alpha.ioc.fiocruz.br>
Message-ID: <c7e8368fcfccbfe63713b19ced39a1e4@duke.edu>

We would really need your hmmpfam output (result.hmmer) to diagnose the 
problem.

-jason

--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/



From Mark.Hoebeke at jouy.inra.fr  Thu Mar 24 10:59:22 2005
From: Mark.Hoebeke at jouy.inra.fr (Mark Hoebeke)
Date: Thu Mar 24 20:30:34 2005
Subject: [Bioperl-l] Hierarchical location parsing
Message-ID: <1111679962.18235.8.camel@homer>

Hi,

Confronted with a bug related to hierarchical location parsing[1], I
checked the source code of Bio::Factory::FTLocationFactory (both in
1.5 and bioperl-live). The comments around the code clearly state that
hierarchical locations are not supported.

Is this shortcoming due to performance concerns, or just because it
seems tedious to code ;D ?


Mark

[1] Example of hierarchical location description :
	join(1000,join(2000,join(3000,4000)))
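
For what it is worth, nested locations of this shape can be handled with a small recursive-descent parser. The sketch below is plain Perl and is not how FTLocationFactory works. The sub names and the returned hash tree are hypothetical, and only the operators join, order and complement with simple "N" or "N..M" positions are covered (no fuzzy or remote locations).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Parse a location string into a tree of hashes:
#   { op => 'join', parts => [ ... ] }  or  { start => N, end => M }
sub parse_location {
    my ($text) = @_;
    my $str = $text;
    $str =~ s/\s+//g;
    pos($str) = 0;
    my $tree = _parse_node(\$str);
    die "Unexpected trailing text in '$text'\n" if pos($str) != length $str;
    return $tree;
}

sub _parse_node {
    my ($ref) = @_;
    if ($$ref =~ /\G(join|order|complement)\(/gc) {
        my $node = { op => $1, parts => [] };
        do {
            # Each operand may itself be an operator: this is the recursion.
            push @{ $node->{parts} }, _parse_node($ref);
        } while ($$ref =~ /\G,/gc);
        $$ref =~ /\G\)/gc or die "Missing ')' at offset " . pos($$ref) . "\n";
        return $node;
    }
    if ($$ref =~ /\G(\d+)(?:\.\.(\d+))?/gc) {
        return +{ start => $1, end => defined $2 ? $2 : $1 };
    }
    die "Cannot parse location at offset " . (pos($$ref) || 0) . "\n";
}

# Count how many operators are nested inside each other.
sub location_depth {
    my ($node) = @_;
    return 0 unless $node->{op};
    my $max = 0;
    for my $part (@{ $node->{parts} }) {
        my $depth = location_depth($part);
        $max = $depth if $depth > $max;
    }
    return 1 + $max;
}

my $tree = parse_location('join(1000,join(2000,join(3000,4000)))');
print "nesting depth: ", location_depth($tree), "\n";    # prints "nesting depth: 3"
```

The parsing itself is cheap, so the harder part is probably deciding how the nested tree should map onto the existing split-location objects.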


-- 
--------------------------Mark.Hoebeke@jouy.inra.fr----------------------
Unité Statistique & Génome                                     Unité MIG
+33 (0)1 60 87 38 03                  Tél.          +33 (0)1 34 65 28 85
+33 (0)1 60 87 38 09                  Fax.          +33 (0)1 34 65 29 01
Tour Evry 2, 523 pl. des Terrasses             INRA - Domaine de Vilvert
F - 91000 Evry                             F - 78352 Jouy-en-Josas CEDEX

From sdavis2 at mail.nih.gov  Thu Mar 24 20:42:34 2005
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Thu Mar 24 20:36:42 2005
Subject: [Bioperl-l] Fw: [Gene-announce] Announcing gene2xml to facilitate
	conversion of Entrez Gene ASN.1 to XML and more...
Message-ID: <000b01c530db$eadf7010$1f6df345@WATSON>

Forwarded from a colleague:

----- Original Message ----- 
Sent: Thursday, March 24, 2005 6:33 PM
Subject: Fwd: [Gene-announce] Announcing gene2xml to facilitate conversion of Entrez Gene ASN.1 to XML and more...


>>Sender: gene-announce-bounces@ncbi.nlm.nih.gov
>>
>>Contents:
>>   1. new directory on the ftp site
>>   2. release of gene2xml to convert the files in the new directory to XML
>>   3. modifications in Entrez Gene displays
>>   4. modifications in Entrez Gene content
>>   5. Gene chapter in the NCBI handbook
>>
>>1. the new ASN_BINARY subdirectory
>>
>>   We would like to announce that Entrez Gene has added a new subdirectory
>>to its ftp site, namely /DATA/ASN_BINARY. The subdirectories and files in
>>ASN_BINARY have the same scope as the files in the /DATA/ASN subdirectory,
>>namely comprehensive extractions of Entrez Gene records.  The difference
>>between the directories is that the format of files in ASN_BINARY is binary,
>>and the records are organized as an Entrezgene set.  The files in the ASN
>>directory are ASN.1 text and the records are concatenated.
>>
>>
>>2. gene2xml
>>
>>   The ASN_BINARY format is being introduced in conjunction with the tool
>>gene2xml, which readily converts the binary ASN.1 to XML.
>>The tool is available from
>>
>>        ftp://ftp.ncbi.nlm.nih.gov/toolbox/ncbi_tools/cmdline/
>>
>>for multiple platforms.
>>
>>   gene2xml.Darwin-7.8.0-Power_Macintosh.gz
>>   gene2xml.Linux-2.4.23-P3-4G-i686.gz
>>   gene2xml.OSF1-V5.1-alpha.gz
>>   gene2xml.SunOS-5.8-sun4u.gz
>>   gene2xml.win32.exe.gz
>>
>>
>>The documentation for this program is provided in this README file:
>>
>>        ftp://ftp.ncbi.nlm.nih.gov/gene/tools/README.
>>
>>   We would like to draw your attention to some of the functions of
>>gene2xml. If you are interested in Gene records for only one species or
>>strain, and records for that species or strain have not already been
>>provided in a separate file, there is an option (-t) that extracts records
>>based on the NCBI Taxonomy identifier for that species or strain.
>>
>>  There is also an option (-x) to convert the binary ASN.1 Entrezgene set
>>into the concatenated ASN.1 text we have been providing.
>>
>>  Our plan is to provide both formats for an indeterminate period, but then
>>discontinue production of the files in the ASN directory, because that
>>format can be (and is, by us) reproduced with the gene2xml tool.
>>
>>Please be reminded that the Entrezgene specification is here:
>>
>>http://www.ncbi.nlm.nih.gov/IEB/ToolBox/CPP_DOC/lxr/source/src/objects/entrezgene/entrezgene.asn
>>
>>and the DTD for Entrez Gene is here:
>>
>>         http://www.ncbi.nlm.nih.gov/dtd/
>>
>>
>>3. changes in Entrez Gene display
>>
>>    In the next few days, we will be adding limited context-specific help
>>to subdivisions of the Entrez Gene graphic (default) display.  This means
>>that question marks (?) will occur at the far right of the blue bar. These
>>will anchor links to the appropriate subsection of the Entrez Gene help
>>document.
>>
>>4. changes in content
>>
>>    In the next few days, we will be adding a new subsection to the
>>record, 'Alleles'. This section reports the general characteristics of
>>alleles that have been described for a gene, and provides links to more
>>detailed information. This function is being phased in gradually; the
>>current set is for mouse and is being developed from information supplied
>>by Mouse Genome Informatics.
>>
>>
>>5. NCBI Handbook chapter 19
>>
>>
>>The LocusLink chapter of the NCBI handbook
>>
>>http://www.ncbi.nlm.nih.gov/books/bv.fcgi?call=bv.View..ShowTOC&rid=handbook.TOC&depth=2
>>
>>has now been replaced with a chapter describing Entrez Gene.
>>
>>http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=handbook.chapter.ch19
>>
>>We hope this chapter helps answer your questions.  If not, you can email
>>your questions to info@ncbi.nlm.nih.gov or lodge your comments here:
>>
>>         http://www.ncbi.nlm.nih.gov/RefSeq/update.cgi
>>
>>_______________________________________________
>>Gene-announce mailing list
>>Gene-announce@ncbi.nlm.nih.gov
>>http://www.ncbi.nlm.nih.gov/mailman/listinfo/gene-announce
> 


From jason.stajich at duke.edu  Thu Mar 24 20:42:49 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Thu Mar 24 20:37:01 2005
Subject: [Bioperl-l] cigarline conversion
In-Reply-To: <BAF5C0B1B1FE5A41B28C71CC977422E9079617DA@UKAPPHRESMSX01>
References: <BAF5C0B1B1FE5A41B28C71CC977422E9079617DA@UKAPPHRESMSX01>
Message-ID: <13512c74d64cddf8afb07c4b6bf55b1f@duke.edu>

I think you'll have to write it or steal from Ensembl.

I assume it isn't so hard to do by walking through the seqs in the 
alignment.  Propose the algorithm to convert it and maybe some of the 
willing volunteers who listen to the list and want to contribute 
to bioinformatics will volunteer to code it.
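
One possible version of that walk, as a rough sketch only. It assumes the conventions described in the question: each multiple-alignment cigar line uses M for a residue and D for a gap, a missing count means 1, and both lines cover the same number of columns. In the pairwise output, M means both sequences have a residue, I means a residue only in the first (query) sequence, and D means a residue only in the second (target) sequence. Columns where both have a gap are dropped. The sub names are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Expand "3M2D4M" into "MMMDDMMMM" (one character per alignment column).
sub expand_cigar {
    my ($cigar) = @_;
    my $expanded = '';
    while ($cigar =~ /\G(\d*)([MD])/gc) {
        $expanded .= $2 x ($1 eq '' ? 1 : $1);
    }
    die "Unsupported cigar line '$cigar'\n"
        if (pos($cigar) || 0) != length $cigar;
    return $expanded;
}

# Walk the columns of two M/D cigar lines and emit one M/I/D cigar line.
sub pairwise_cigar {
    my ($query_cigar, $target_cigar) = @_;
    my @q = split //, expand_cigar($query_cigar);
    my @t = split //, expand_cigar($target_cigar);
    die "Cigar lines cover different numbers of columns\n" if @q != @t;

    my ($out, $prev, $run) = ('', '', 0);
    for my $i (0 .. $#q) {
        next if $q[$i] eq 'D' && $t[$i] eq 'D';    # gap in both: drop column
        my $op = $q[$i] eq 'D' ? 'D'               # residue only in target
               : $t[$i] eq 'D' ? 'I'               # residue only in query
               :                 'M';              # residue in both
        if ($op eq $prev) {
            $run++;
        }
        else {
            $out .= $run . $prev if $run;          # flush the previous run
            ($prev, $run) = ($op, 1);
        }
    }
    $out .= $run . $prev if $run;
    return $out;
}

print pairwise_cigar('3M2D4M', '5M2D2M'), "\n";    # prints "3M2D2I2M"
```

Whether I and D should be swapped depends on which sequence is treated as the reference, so that convention is worth agreeing on first.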

-jason
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

On Mar 21, 2005, at 8:01 AM, Johnson, Nathan wrote:

> Hi bioperlers
>
> Does anyone know of a module which handles the conversion of multiple
> alignment cigar line format(multiple strings with M and D's but no I's)
> cigar line data to a pairwise format (one string with M,D and I's).
>
> SimpleAlign doesn't seem to do what I want :\
>
> Cheers
>
> Nath
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>

From jason.stajich at duke.edu  Thu Mar 24 20:51:28 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Thu Mar 24 20:45:52 2005
Subject: [Bioperl-l] [How to add features in genbank flat file]
In-Reply-To: <42429EF7.4050504@igs.cnrs-mrs.fr>
References: <200502151525.38790.moretti@igs.cnrs-mrs.fr>
	<42429EF7.4050504@igs.cnrs-mrs.fr>
Message-ID: <d198fea42e81b12e19a577292446ec02@duke.edu>

You seem annoyed that no one solved the problem for you - I hope that  
you realize that if you want a specific feature you can also modify the  
module yourself and provide a patch to the project.

As for the specifics of your problem, perhaps if you highlight what the  
Entrez key-value sets need to be set to in order to get the SNP data, we  
can add it to Bio::DB::Query::GenBank as an option.

Removing the blank lines is part of the SeqIO parsing, but I suppose a  
state variable could be added in genbank.pm to not skip them when in  
the 'COMMENT' state if this is a critical feature for you.

If you are just downloading genbank files it looks like you have a good  
solution so I'm glad you were able to figure it out.

-jason

> Hello,
> No one seems to have a solution to this problem I posted a month ago.
>
> So, I changed my mind and use 'wget' to get the GenBank sequences.
> I get the full GenBank entry, with most of features.
> And I can avoid another bug: COMMENT lines are not well formatted with  
> the BioPerl script I used (not as COMMENT lines are on NCBI), and  
> blank lines are removed.
>
>
> 	#!/usr/bin/perl -w
> 	
> 	use strict;
> 	use diagnostics;
> 	use File::Cat;
> 	
> 	my $acc=$ARGV[0] or die "\n\tThe accession number you seek for is missing.\n\tTry something like: $0 NM_178432\n\n";
> 	
> 	`wget -O output_file.tmp "http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&qty=1&c_start=1&val=$acc&dopt=gbwithparts&send=Send&sendto=t&from=begin&to=end&extrafeatpresent=1&ef_SNP=1&ef_CDD=8&ef_MGC=16&ef_HPRD=32" 2>/dev/null`;
> 	
> 	cat ("output_file.tmp", \*STDOUT);
> 	unlink("output_file.tmp");
> 	
> 	# wget -O output_file 'http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&qty=1&c_start=1&val=NM_178432&dopt=gbwithparts&send=Send&sendto=t&from=begin&to=end&extrafeatpresent=1&ef_SNP=1&ef_CDD=8&ef_MGC=16&ef_HPRD=32'
> 	
> 	exit;
>
>
> Sorry, I don't use BioPerl to Query GenBank (but for other  
> applications) but BioPerl 1.5 has not corrected the COMMENT bug and  
> the missing features.
>
>> Hello,
>> I saw that the GenBank web site has changed:
>> Now, features like 'SNPs' are no longer included in the EST flat files.
>> At the NCBI web site, we must click on 'features: SNP' to add them to  
>> our flat file.
>> With BioPerl, 1.4 or 1.5, it's the same: the variation features are  
>> no longer included in the EST flat files that I download.
>> Here is the script I use:
>> 	#!/usr/bin/perl -w
>> 	
>> 	use strict;
>> 	use Bio::DB::GenBank;
>> 	use Bio::DB::Query::GenBank;
>> 	use Bio::SeqIO;
>> 	my $acc=$ARGV[0] or die "\n\tThe accession number you seek for is missing.\n\tTry something like: $0 NM_178432\n\n";
>> 	
>> 	$acc=$acc."[Accession]";
>> 	
>> 	my $query_string = "$acc";
>> 	my $query = Bio::DB::Query::GenBank->new(-db    => 'nucleotide',
>> 	                                         -query => $query_string);
>> 	
>> 	my $gb = new Bio::DB::GenBank;
>> 	my $stream = $gb->get_Stream_by_query($query);
>> 	
>> 	my $out=Bio::SeqIO->new(-format=>'genbank');
>> 	my $seq = $stream->next_seq();
>> 	
>> 	my $result=$out->write_seq($seq);
>> 	$result =~ s/^1.*$//;
>> 	#print $out->write_seq($seq);
>> 	print $result;
>> 	
>> 	exit;
>> How can I add most of features to my nucleotide flat files ?
>> Thanks
>
> -- 
> Sébastien Moretti
> http://igs.cnrs-mrs.fr/
> CNRS - IGS
> 31 chemin Joseph Aiguier
> 13402 Marseille cedex
>


From jason.stajich at duke.edu  Thu Mar 24 20:52:52 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Thu Mar 24 20:47:00 2005
Subject: [Bioperl-l] Does BioPerl like mpiBlast?
In-Reply-To: <529e768305031714193ab15b9d@mail.gmail.com>
References: <529e768305031714193ab15b9d@mail.gmail.com>
Message-ID: <b81ef7c56d46fa412229d8f388021523@duke.edu>

If you are asking whether it would be hard to parse BLAST output from 
mpiBLAST: no. It should already work with Bio::SearchIO.

As for running mpiBLAST from within bioperl: you could just write a 
simple wrapper module that looks a lot like StandAloneBlast (but 
simpler).

-jason
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

On Mar 17, 2005, at 2:19 PM, Scott Lambdin wrote:

> Help please.  The scientists have found a blast job that eats all the
> user memory (~4Gigabytes) on the little 32-bit blast server I set up
> for them.  I was looking at giving them mpiBLAST so that they can
> spread the database over some processes, but a requirement is to have
> the BLAST program usable by the BioPerl.  Would it be hard for them to
> use mpiBLAST in BioPerl?   That is, harder than using regular NCBI
> BLAST?
>
> --Scott

From jason.stajich at duke.edu  Thu Mar 24 20:55:38 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Thu Mar 24 20:49:51 2005
Subject: [Bioperl-l] Hierarchical location parsing
In-Reply-To: <1111679962.18235.8.camel@homer>
References: <1111679962.18235.8.camel@homer>
Message-ID: <1ac3134a474c60b4b72dde9289c65f24@duke.edu>

Is there a real example where these types of locations exist - why  
can't it be flattened without the nested joins?  At any rate - I don't  
really care to parse these if they never exist "in-nature".  If your  
bugfix soln works and doesn't slow things down we can use it I guess,  
although I prefer a regexp.  I don't really have time to patch or test  
in the near future so it will have to wait for someone to volunteer to  
get to it.

-jason
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

On Mar 24, 2005, at 7:59 AM, Mark Hoebeke wrote:

> Hi,
>
> confronted with a bug related to hierarchical location parsing[1], I
> checked the source code of Bio::Factory::FTLocationFactory.pm (both in
> 1.5 and bioperl-live). The comments around the code clearly state that
> hierarchical locations are not supported.
>
> Is this shortcoming due to performance concerns, or just because it
> seems tedious to code ;D ?
>
>
> Mark
>
> [1] Example of hierarchical location description :
> 	join(1000,join(2000,join(3000,4000)))
>
> --
> --------------------------Mark.Hoebeke@jouy.inra.fr----------------------
> Unité Statistique & Génome                                     Unité MIG
> +33 (0)1 60 87 38 03                  Tél.          +33 (0)1 34 65 28 85
> +33 (0)1 60 87 38 09                  Fax.          +33 (0)1 34 65 29 01
> Tour Evry 2, 523 pl. des Terrasses             INRA - Domaine de Vilvert
> F - 91000 Evry                             F - 78352 Jouy-en-Josas CEDEX
>

From iluminati at earthlink.net  Thu Mar 24 21:10:51 2005
From: iluminati at earthlink.net (iluminati@earthlink.net)
Date: Thu Mar 24 21:03:10 2005
Subject: [Bioperl-l] Question about accessing tags in
	Bio::SeqFeature::Generic
Message-ID: <4243732B.9050406@earthlink.net>

I have a question about the Bio::SeqFeature::Generic that doesn't seem 
clear to me from the docs.  Here's an example of the seq feature I'm 
creating...

my $RepeatElement = new Bio::SeqFeature::Generic(
    -start  => $L1HERVLine[6],
    -end    => $L1HERVLine[7],
    -strand => $L1HERVLine[9],
    -source => 'Repeat',
    -tag    => {
        -repName   => $L1HERVLine[10],
        -repClass  => $L1HERVLine[11],
        -repFamily => $L1HERVLine[12],
    },
);

Now, the feature itself creates fine.  However, it isn't clear how I 
would retrieve information from the tag hash.  The get_tag_value() 
function isn't working for me, and I can't access the hash directly.  
What should I do to be able to access the data?  Let me know, and thanks 
in advance.

From hlapp at gmx.net  Fri Mar 25 02:40:03 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Fri Mar 25 02:34:43 2005
Subject: [Bioperl-l] Question about accessing tags in
	Bio::SeqFeature::Generic
In-Reply-To: <4243732B.9050406@earthlink.net>
Message-ID: <198497D2-9D01-11D9-B83F-000A959EB4C4@gmx.net>

*Always* provide the error message. None of us has a crystal ball. 
'Isn't working for me': why? Because of an error, or because you need 
something that it isn't designed to return?

On Thursday, March 24, 2005, at 06:10  PM, iluminati@earthlink.net 
wrote:

> I have a question about the Bio::SeqFeature::Generic that doesn't 
> seem clear to me from the docs.  Here's an example of the seq feature 
> I'm creating...
>
> my $RepeatElement = new Bio::SeqFeature::Generic(     -start => 
> $L1HERVLine[6],
>                                            -end => $L1HERVLine[7],
>                                            -strand => $L1HERVLine[9],
>                                            -source => 'Repeat',
>                                            -tag =>{
>                                            -repName => $L1HERVLine[10],
>                                            -repClass => 
> $L1HERVLine[11],
>                                            -repFamily => 
> $L1HERVLine[12]}
>                                            );
>
> Now, the feature itself creates fine.  However, it isn't clear how I 
> would retrieve information from the tag hash.  The get_tag_value() 
> function isn't working for me, and I can't access the hash directly.  
> What should I do to be able to access the data?  Let me know, and 
> thanks in advance.
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


From Marc.Logghe at devgen.com  Fri Mar 25 03:04:25 2005
From: Marc.Logghe at devgen.com (Marc Logghe)
Date: Fri Mar 25 02:59:41 2005
Subject: [Bioperl-l] Question about accessing tags
	inBio::SeqFeature::Generic
Message-ID: <BEE28BF86078B6429D6C780635718E210136A535@morelia.be.devgen.com>

> 
> *Always* provide the error message. None of us has a crystal ball. 
> 'isn't working for me' why? because of error or because you 
> need something that it isn't designed to return?

No crystal ball indeed. Intestines of toads might do well also. There
are lots of toads currently migrating and (unfortunately) lots of them
are flattened by cars.
Anyhow, what I could see in there is that you probably call
get_tag_values (watch out here, it is 'get_tag_values', plural, not
'get_tag_value', because you might have multiple values for a certain
tag) in scalar context and not in list context. So, you should be doing
something like:

    my ($repclass) = $seq->get_tag_values('repClass');
    # or $seq->get_tag_values('-repClass') when you want to keep the
    # hyphens in your keys

Also the -tag option takes a hash ref, so I think it is better not to
use hyphens in there for the keys.

HTH and the toad intestines have not let me down ;-)
Marc


> 
> On Thursday, March 24, 2005, at 06:10  PM, iluminati@earthlink.net
> wrote:
> 
> > I have a question about the Bio::SeqFeature::Generic that doesn't 
> > seem clear to me from the docs.  Here's an example of the 
> seq feature 
> > I'm creating...
> >
> > my $RepeatElement = new Bio::SeqFeature::Generic(     -start => 
> > $L1HERVLine[6],
> >                                            -end => $L1HERVLine[7],
> >                                            -strand => 
> $L1HERVLine[9],
> >                                            -source => 'Repeat',
> >                                            -tag =>{
> >                                            -repName => 
> $L1HERVLine[10],
> >                                            -repClass => 
> > $L1HERVLine[11],
> >                                            -repFamily => 
> > $L1HERVLine[12]}
> >                                            );
> >
> > Now, the feature itself creates fine.  However, it isn't 
> clear how I 
> > would retrieve information from the tag hash.  The get_tag_value() 
> > function isn't working for me, and I can't access the hash directly.
> > What should I do to be able to access the data?  Let me know, and 
> > thanks in advance.
> >
> --
> -------------------------------------------------------------
> Hilmar Lapp                            email: lapp at gnf.org
> GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
> -------------------------------------------------------------
> 
> 

From brian_osborne at cognia.com  Fri Mar 25 07:54:05 2005
From: brian_osborne at cognia.com (Brian Osborne)
Date: Fri Mar 25 07:48:26 2005
Subject: [Bioperl-l] Hierarchical location parsing
In-Reply-To: <1111679962.18235.8.camel@homer>
Message-ID: <GPENLDEIJJHJLHOAJBBPMEJECDAA.brian_osborne@cognia.com>

Mark,

I'm afraid I don't know the answer to your question but let me turn the
question around: would you like to help us fix this?

Brian O.

-----Original Message-----
From: bioperl-l-bounces@portal.open-bio.org
[mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Mark Hoebeke
Sent: Thursday, March 24, 2005 10:59 AM
To: bioperl-l@portal.open-bio.org
Subject: [Bioperl-l] Hierarchical location parsing


Hi,

confronted with a bug related to hierarchical location parsing[1], I
checked the source code of Bio::Factory::FTLocationFactory.pm (both in
1.5 and bioperl-live). The comments around the code clearly state that
hierarchical locations are not supported.

Is this shortcoming due to performance concerns, or just because it
seems tedious to code ;D ?


Mark

[1] Example of hierarchical location description :
	join(1000,join(2000,join(3000,4000)))


--
--------------------------Mark.Hoebeke@jouy.inra.fr----------------------
Unité Statistique & Génome                                     Unité MIG
+33 (0)1 60 87 38 03                  Tél.          +33 (0)1 34 65 28 85
+33 (0)1 60 87 38 09                  Fax.          +33 (0)1 34 65 29 01
Tour Evry 2, 523 pl. des Terrasses             INRA - Domaine de Vilvert
F - 91000 Evry                             F - 78352 Jouy-en-Josas CEDEX


From brian_osborne at cognia.com  Fri Mar 25 07:58:00 2005
From: brian_osborne at cognia.com (Brian Osborne)
Date: Fri Mar 25 07:52:06 2005
Subject: [Bioperl-l] Help with hmmpfam
In-Reply-To: <8D44604203DAF9438BF9123B4A08C779B2700A@alpha.ioc.fiocruz.br>
Message-ID: <GPENLDEIJJHJLHOAJBBPEEJFCDAA.brian_osborne@cognia.com>

Kary,

It could be that there's something odd about your hmmpfam output file; for
that reason you should probably show us its contents.

Brian O.

-----Original Message-----
From: bioperl-l-bounces@portal.open-bio.org
[mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Kary Ann Del
Carmen Soriano Ocana
Sent: Thursday, March 24, 2005 3:24 PM
To: bioperl-l@portal.open-bio.org
Cc: maruco@gmail.com
Subject: [Bioperl-l] Help with hmmpfam


Dear All,

I am new to bioperl and would like (if possible) to obtain some help with
the SearchIO module and hmmpfam. I am listing my code below, along with the
output containing the following error:

(partial) output and error:

[kary@vivax inserir_dados]$ perl bioperl_pfam_23_03_05.pl
Passou a definicao do arquivo query
passou abrir o arquivo mmm.hmm

sh: -c: line 0: syntax error near unexpected token `('
sh: -c: line 0: `/usr/local/bin/hmmpfam  -E 0.0001      1       HMMER2.0
[2.3.2]\nNAME  76GJYz8zFm\nLENG  327\nALPH

I put some "print" commands everywhere to see where I am getting the error
and looks like it is not entering/printing the while results (eg:
next_result, next_hit). Any help would be greatly appreciated.

Thanks, Kary

************

Script:

#!/usr/bin/perl -w

use lib "/usr/local/bioperl14";
use lib "/usr/local/bioperl-run-1.4";

use Bio::Search::Result::HMMERResult;
use Bio::Tools::Run::Hmmer;
use Bio::Tools::Run::Hmmpfam;
use strict;

my $query;
my $db;
my $seq;
my $dbfile;
my @array;

$query = "sequencia_fasta_4_arg.txt";

print "Passou a definicao do arquivo query\n";

open (READ, "$query") or die "Cannot open $query: $!";
while (my $sequence = <READ>){
	 for ($sequence) {
		&hmmpfam($sequence);
	#print $seq;
	}
}
close (READ);

print "Passou leitura do arquivo query\n";
############################################################################
#################################################################
sub hmmpfam {
	my ($seq) = @_;
	$db = "mmm.hmm";
	open (DH, "$db") or die "Cannot open $db: $!";

	print "passou abrir o arquivo mmm.hmm\n\n";

	while ($dbfile = <DH>){

		#Build a Hmmpfam factory
		my @params = ('DB'=>$dbfile,'E'=>0.0001);


		my $factory = Bio::Tools::Run::Hmmpfam->new(@params);


		# Pass the factory a Bio::Seq object or a file name
		# returns a Bio::SearchIO object
		my $search = $factory->run($seq);
		print "Search: $search\n";

		print "Passou search com parametros \n";


		my @feat;

			my $searchio = new Bio::SearchIO(-format => 'hmmer',
                                 -file   => 'result.hmmer') or die print
"Error for open the file";

			while (my $result = $searchio->next_result){
				print "começa o while do NEXT RESULT\n\n";
			while(my $hit = $result->next_hit){
				print "começa o while do HIT - NEXT HIT\n\n";
			while (my $hsp = my $hit->next_hsp){
				print join("\t", ( my$r->query_name,
						$hsp->query->start,
						$hsp->query->end,
						$hit->name,
						$hsp->hit->start,
						$hsp->hit->end,
						$hsp->score,
						$hsp->evalue,
						$hsp->seq_str,
						)), "\n";
				print "terminou o while dos HSPs\n\n";

							 }
							}
									}

}


close (DH);
}



From brian_osborne at cognia.com  Fri Mar 25 11:52:46 2005
From: brian_osborne at cognia.com (Brian Osborne)
Date: Fri Mar 25 11:46:52 2005
Subject: [Bioperl-l] Hierarchical location parsing
In-Reply-To: <1111766199.18772.13.camel@homer>
Message-ID: <GPENLDEIJJHJLHOAJBBPCEJLCDAA.brian_osborne@cognia.com>

Mark,

Can you also attach the sequence file that you used in order to test your
code? That way I can write a test specifically for the parsing of
hierarchical locations.

You wrote "I'm not sure the new patch won't slow down location parsing
considerably..." Have you actually timed the parsing using the old and new
code?

Thanks again,

Brian O.

-----Original Message-----
From: Mark Hoebeke [mailto:Mark.Hoebeke@jouy.inra.fr]
Sent: Friday, March 25, 2005 10:57 AM
To: brian.osborne@cognia.com
Cc: bioperl-l@portal.open-bio.org
Subject: Re: [Bioperl-l] Hierarchical location parsing


Hi Brian,

In fact, I filed a bug request (#1765) to which I attached a patch.

I checked that the patched FTLocationFactory.pm and the unpatched one in
the bioperl-live CVS repository exposed the same behaviour when running
'make test'.

Of course, I don't know the variety of location descriptions found in
the test scripts...

Mark


> Mark,
>
> I'm afraid I don't know the answer to your question but let me turn the
> question around: would you like to help us fix this?
>
> Brian O.
>
--
--------------------------Mark.Hoebeke@jouy.inra.fr----------------------
Unité Statistique & Génome                                     Unité MIG
+33 (0)1 60 87 38 03                  Tél.          +33 (0)1 34 65 28 85
+33 (0)1 60 87 38 09                  Fax.          +33 (0)1 34 65 29 01
Tour Evry 2, 523 pl. des Terrasses             INRA - Domaine de Vilvert
F - 91000 Evry                             F - 78352 Jouy-en-Josas CEDEX


From hlapp at gmx.net  Sat Mar 26 23:55:08 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat Mar 26 23:49:10 2005
Subject: [Bioperl-l] Question about accessing tags in
	Bio::SeqFeature::Generic
In-Reply-To: <42445787.8020301@earthlink.net>
Message-ID: <6474577C-9E7C-11D9-A406-000A959EB4C4@gmx.net>

$feature->get_tag_values() will throw an exception if the tag does not 
exist. You need to check first using $feature->has_tag().

Apparently you asked for values for tag 'tag', which is not a tag given 
your initialization example. Instead, -repName, -repClass, etc. are 
tags that you can ask the values for.
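
A short sketch of the check-then-fetch pattern, assuming BioPerl is installed. The coordinates and tag values are made up, the tag keys are written without hyphens, and first_tag_value is a hypothetical helper, not part of Bio::SeqFeature::Generic.

```perl
use strict;
use warnings;
use Bio::SeqFeature::Generic;

# Illustrative values only; the real ones would come from the parsed line.
my $feature = Bio::SeqFeature::Generic->new(
    -start  => 100,
    -end    => 450,
    -strand => 1,
    -tag    => {
        repName   => 'L1PA1',
        repClass  => 'LINE',
        repFamily => 'L1',
    },
);

# Hypothetical helper: return the first value of a tag, or undef when the
# tag is absent, so that get_tag_values() never throws.
sub first_tag_value {
    my ($feat, $tag) = @_;
    return unless $feat->has_tag($tag);
    my ($value) = $feat->get_tag_values($tag);    # list context
    return $value;
}

print "repClass: ", first_tag_value($feature, 'repClass'), "\n";
```

Asking this helper for 'tag' returns undef instead of raising the exception shown in the quoted error message.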

On Friday, March 25, 2005, at 10:25  AM, iluminati@earthlink.net wrote:

> Fair enough.  Here's the error message...
> ------------- EXCEPTION  -------------
> MSG: asking for tag value that does not exist tag
> STACK Bio::SeqFeature::Generic::get_tag_values 
> C:/Perl/site/lib/Bio/SeqFeature/G
> eneric.pm:501
> STACK main::L1PA1presence L1PA1presence.pm:43
> STACK toplevel ThesisScript.pl:167
>
> --------------------------------------
>  I know that it's supposed to return an array from which I can access 
> the tag values, but if I can't get the array, how can I get the tag 
> values?  Thanks for the help.
>
>
> Hilmar Lapp wrote:
>
>> *Always* provide the error message. None of us has a crystal ball. 
>> 'isn't working for me' why? because of error or because you need 
>> something that it isn't designed to return?
>>
>> On Thursday, March 24, 2005, at 06:10  PM, iluminati@earthlink.net 
>> wrote:
>>
>>> I have a question about the Bio::SeqFeature::Generic that doesn't 
>>> seem clear to me from the docs.  Here's an example of the seq 
>>> feature I'm creating...
>>>
>>> my $RepeatElement = new Bio::SeqFeature::Generic(     -start => 
>>> $L1HERVLine[6],
>>>                                            -end => $L1HERVLine[7],
>>>                                            -strand => $L1HERVLine[9],
>>>                                            -source => 'Repeat',
>>>                                            -tag =>{
>>>                                            -repName => 
>>> $L1HERVLine[10],
>>>                                            -repClass => 
>>> $L1HERVLine[11],
>>>                                            -repFamily => 
>>> $L1HERVLine[12]}
>>>                                            );
>>>
>>> Now, the feature itself creates fine.  However, it isn't clear how I 
>>> would retrieve information from the tag hash.  The get_tag_value() 
>>> function isn't working for me, and I can't access the hash directly. 
>>>  What should I do to be able to access the data?  Let me know, and 
>>> thanks in advance.
>>>
>
>

From sutripa at vbi.vt.edu  Sat Mar 26 23:57:21 2005
From: sutripa at vbi.vt.edu (Sucheta Tripathy)
Date: Sat Mar 26 23:51:25 2005
Subject: [Bioperl-l] drawing correct orientations of subject strands
Message-ID: <3563.199.3.136.4.1111899441.squirrel@webmail.vbi.vt.edu>


Hi Group,

I have been trying to plot the correct directions of the HSPs of a
standard blast output using Bio::Graphics. I don't know where I am going
wrong: all the arrows are pointing in one direction.

Any help in this will be greatly appreciated.

Here is what I tried:

 use strict;
  use Bio::Graphics;
  use Bio::SearchIO;

  my $file = shift or die "Usage: blast_graphics.pl <blast file>\n";
  my $out_file = shift;
  my $eval = shift;
  my $num_tracks = shift;

  my $searchio = Bio::SearchIO->new(-file   => $file,
                                    -format => 'blast')
    or die "parse failed";

  my $result = $searchio->next_result() or die "no result";

  my $panel = Bio::Graphics::Panel->new(-length    => $result->query_length,
                                        -width     => 800,
                                        -pad_left  => 10,
                                        -pad_right => 10,
                                       );

  my $full_length = Bio::SeqFeature::Generic->new(-start        => 1,
                                                  -end          => $result->query_length,
                                                  -display_name => $result->query_name,
                                                 );
  $panel->add_track($full_length,
                    -glyph   => 'arrow',
                    -tick    => 2,
                    -fgcolor => 'black',
                    -double  => 1,
                    -label   => 1,
                   );

  my $track = $panel->add_track(-glyph       => 'graded_segments',
                                -label       => 1,
                                -connector   => 'dashed',
                                -bgcolor     => 'blue',
                                -font2color  => 'red',
                                -lineWidth   => 1,
                                -stranded    => 1,
                                -sort_order  => 'high_score',
                                -description => sub {
                                  my $feature = shift;
                                  return unless $feature->has_tag('description');
                                  my ($description) = $feature->each_tag_value('description');
                                  my $score = $feature->score;
                                  "$description, score=$score";
                                 });
my $i=0;
    my $strand;
  while( my $hit = $result->next_hit ) {
    next unless $hit->significance < $eval;
    $i++;
    my $feature = Bio::SeqFeature::Generic->new(-score        => $hit->raw_score,
                                                -display_name => $hit->name,
                                                -strand       => $strand,
                                                -tag          => {
                                                  description => $hit->description
                                                },
                                               );
    while( my $hsp = $hit->next_hsp ) {
      $strand=$hsp->sbjct->strand;
      print "strand is $strand";
      $feature->add_sub_SeqFeature($hsp,'EXPAND');
    }

    $track->add_feature($feature);
  if($i >= $num_tracks){ last;}
  }


open FH,">$out_file" or die "can't open file $out_file for writing\n $!";
  print FH $panel->png;
  close(FH);


many thanks

Sucheta


-- 
Sucheta Tripathy
Virginia Bioinformatics Institute Phase-I
Washington street.
Virginia Tech.
Blacksburg,VA 24061-0447
phone:(540)231-8138
Fax:  (540) 231-2606
From rob at salmonella.org  Sun Mar 27 00:21:40 2005
From: rob at salmonella.org (Rob Edwards)
Date: Sun Mar 27 00:17:19 2005
Subject: [Bioperl-l] drawing correct orientations of subject strands
In-Reply-To: <3563.199.3.136.4.1111899441.squirrel@webmail.vbi.vt.edu>
References: <3563.199.3.136.4.1111899441.squirrel@webmail.vbi.vt.edu>
Message-ID: <79b0d47c658a258b60c75262561dd1f5@salmonella.org>

<snip>

>     my $strand;
>   while( my $hit = $result->next_hit ) {
>     next unless $hit->significance < $eval;
>     $i++;
>     my $feature = Bio::SeqFeature::Generic->new(-score     =>
> $hit->raw_score,
>                                                 -display_name => 
> $hit->name,
>                                                 -strand       => 
> $strand,
>

It looks like at this point $strand is not set to anything. Shouldn't 
you move the while (my $hsp = $hit->next_hsp){ loop above setting 
-strand?
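
One way to follow this advice is to collect the HSPs first and only then build the feature. A rough sketch, assuming each HSP object answers $hsp->hit->strand. subject_strand is a made-up helper, and returning 0 when the HSPs disagree is an arbitrary choice.

```perl
use strict;
use warnings;

# Hypothetical helper: return the subject strand shared by all HSPs,
# or 0 when there are no HSPs or they do not agree.
sub subject_strand {
    my @hsps = @_;
    my %seen;
    $seen{ $_->hit->strand }++ for @hsps;
    my @strands = keys %seen;
    return @strands == 1 ? $strands[0] : 0;
}

# Intended use inside the hit loop (not run here, and assuming that
# $hit->hsps returns the list of HSPs):
#   my @hsps    = $hit->hsps;
#   my $feature = Bio::SeqFeature::Generic->new(
#       -strand => subject_strand(@hsps),
#       ...
#   );
#   $feature->add_sub_SeqFeature($_, 'EXPAND') for @hsps;
```

With this ordering the strand is known before the feature is constructed, so the value passed to -strand is no longer undefined.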

Rob


>                                                 -tag     => {
>                                                              
> description
> =>
> $hit->description
>                                                             },
>                                                );
>     while( my $hsp = $hit->next_hsp ) {
>       $strand=$hsp->sbjct->strand;
>       print "strand is $strand";
>       $feature->add_sub_SeqFeature($hsp,'EXPAND');
>     }
>
>

From zhoujie at fudan.edu.cn  Sun Mar 27 10:07:40 2005
From: zhoujie at fudan.edu.cn (zhoujie@fudan.edu.cn)
Date: Sun Mar 27 10:17:03 2005
Subject: [Bioperl-l] A question about flatting taxonomy database
Message-ID: <d811c1d857ae.d857aed811c1@fudan.edu.cn>

Hi all,

When I'm using "bp_local_taxonomydb_query.pl" to build a local 
taxonomy database and query it, I always get an exception saying: "no 
such file or directory ***, STACK ***". It seems that the nodes file, 
id2names and names2id files are already created, so how does the 
error MSG arise? I have already installed the BerkeleyDB module by 
ppm. Is there anything else that I need to do?

Thanks very much for your help.

J Z
From jason.stajich at duke.edu  Sun Mar 27 17:58:16 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Sun Mar 27 17:52:38 2005
Subject: [Bioperl-l] A question about flatting taxonomy database
In-Reply-To: <d811c1d857ae.d857aed811c1@fudan.edu.cn>
References: <d811c1d857ae.d857aed811c1@fudan.edu.cn>
Message-ID: <1111964296.42473a88d6d90@webmail.duke.edu>

Can you show the command line arguments that you are passing in? You need to tell 
the script where to find these files.

-jason
-- 
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/


Quoting zhoujie@fudan.edu.cn:

> Hi all,
> 
> When I'm using "bp_local_taxonomydb_query.pl" to build a local 
> taxonomy database and query it, I always get an exception saying: "no 
> such file or directory ***, STACK ***". It seems that the nodes file, 
> id2names and names2id files are already created, so how does the 
> error MSG arise? I have already installed the BerkeleyDB module by 
> ppm. Is there anything else that I need to do?
> 
> Thanks very much for your help.
> 
> J Z
From Mark.Hoebeke at jouy.inra.fr  Thu Mar 24 23:52:27 2005
From: Mark.Hoebeke at jouy.inra.fr (Mark Hoebeke)
Date: Sun Mar 27 17:57:05 2005
Subject: [Bioperl-l] Hierarchical location parsing
In-Reply-To: <1ac3134a474c60b4b72dde9289c65f24@duke.edu>
References: <1111679962.18235.8.camel@homer>
	<1ac3134a474c60b4b72dde9289c65f24@duke.edu>
Message-ID: <1111726347.30799.19.camel@homer>

Sorry I messed up the example I gave, but an "in-nature" hierarchical
location can be found in the complete genome of Streptococcus pyogenes
strain MGAS315 (GenBank accession number AE014074):


source  join(1..749107,join(788646..977266,join(1018339..1137553,
                     join(1171973..1230114,join(1271911..1313193,
                     join(1351400..1410541,1450556..1900521))))))

In this case, it seems likely that the joins could be flattened out.
However, when feeding GenBank entries into a database in bulk, it could
be impractical to re-parse location strings to determine 1/ whether they
contain nested joins and 2/ whether they can be flattened out.
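
For the simple case where join is the only operator, flattening can be sketched with plain string handling. This is not the patch attached to the bug report and not what FTLocationFactory does; flatten_joins is a hypothetical helper. It leaves anything it does not recognise untouched and does not check that parentheses balance.

```perl
use strict;
use warnings;

# Hypothetical helper: collapse nested join() operators into one join().
# Anything that is not a pure join of simple ranges is returned unchanged.
sub flatten_joins {
    my ($location) = @_;
    (my $compact = $location) =~ s/\s+//g;
    return $location unless $compact =~ /^join\(/;

    # Drop every "join(" and ")" and check that only plain ranges remain.
    (my $ranges = $compact) =~ s/join\(|\)//g;
    return $location unless $ranges =~ /^[0-9.,<>^]+$/;
    return "join($ranges)";
}

print flatten_joins('join(1..10,join(20..30,join(40..50,60..70)))'), "\n";
```

Locations that involve complement() or order() fail the final check and come back as they were, so they would still need a real parser.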


I don't know to what extent the FTLocationFactory is tested when running
'make test' on a bioperl-live tree, but it yields the same results on
both patched and unpatched trees.

Mark

On Thursday, 24 March 2005 at 17:55 -0800, Jason Stajich wrote:
> Is there a real example where these types of locations exist - why  
> can't it be flattened without the nested joins?  At any rate - I don't  
> really care to parse these if they never exist "in-nature".  If your  
> bugfix soln works and doesn't slow things down we can use it I guess,  
> although I prefer a regexp.  I don't really have time to patch or test  
> in the near future so it will have to wait for someone to volunteer to  
> get to it.
> 
> -jason
> --
> Jason Stajich
> jason.stajich at duke.edu
> http://www.duke.edu/~jes12/

--------------------------Mark.Hoebeke@jouy.inra.fr----------------------
Unité Statistique & Génome                                     Unité MIG
+33 (0)1 60 87 38 03                  Tél.          +33 (0)1 34 65 28 85
+33 (0)1 60 87 38 09                  Fax.          +33 (0)1 34 65 29 01
Tour Evry 2, 523 pl. des Terrasses             INRA - Domaine de Vilvert
F - 91000 Evry                             F - 78352 Jouy-en-Josas CEDEX

From vaughn at cshl.org  Fri Mar 25 07:39:36 2005
From: vaughn at cshl.org (Matthew Vaughn)
Date: Sun Mar 27 17:57:08 2005
Subject: [Bioperl-l] Re: How to express 'histogram' data in GFF3
Message-ID: <F284B790-9D2A-11D9-9265-000A95A26D06@cshl.org>

I posted a question about this a few days ago and have worked out what 
appears to be a definitive answer, thanks to some advice from Scott 
Cain. I thought I'd share what appears to work with BioPerl 1.5 and 
Gbrowse 1.62.

For a given bit of histogram-type data, proper GFF2 formatting was as 
follows:

ChrII	fwd	chip1	0	100	45.4	+	.	chip1 ChrII:fwd

Contrast this with GFF3 format for the same data point

ChrII	fwd	chip1	0	100	45.4	+	.	ID=chip1:ChrII:fwd

Basically, I merged what used to be the group field into an ID tag. 
Technically, the ':' character should be URL-escaped (percent-encoded), 
leaving the ID tag like so

ChrII	fwd	chip1	0	100	45.4	+	.	ID=chip1%3AChrII%3Afwd
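
For readers who generate such lines from a script, here is a minimal
Python sketch (not BioPerl code) that builds the line above. The helper
name gff3_line is hypothetical. It uses urllib.parse.quote with safe=""
which percent-encodes more characters than the GFF3 specification
strictly requires; that is an assumption made for simplicity, not a
statement of what the spec demands.

```python
from urllib.parse import quote


def gff3_line(seqid, source, ftype, start, end, score, strand, phase, feature_id):
    """Return one tab-separated GFF3 line whose only attribute is an ID."""
    # Percent-encode the ID so that ':' becomes %3A.
    attributes = "ID=" + quote(feature_id, safe="")
    fields = [seqid, source, ftype, str(start), str(end),
              str(score), strand, phase, attributes]
    return "\t".join(fields)


line = gff3_line("ChrII", "fwd", "chip1", 0, 100, 45.4, "+", ".", "chip1:ChrII:fwd")
print(line)
```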

Does the fact that the ID is not unique violate the GFF3 spec? That's a 
tough question that I leave to the experts.

The gbrowse configuration file aggregators for GFF2 and GFF3 are the 
same, in this case:

aggregators = agg1{chip1:fwd}

Scott suggested that I might need to create a region feature, then 
assign my histogram data points to it as children using the new Parent 
attribute of GFF3. However, it appears that the custom aggregator takes 
care of this. Clicking on the histogram in my current genome browser 
yields a gbrowse_detail page with all the histogram data points within 
the currently displayed span of coordinates.

--
Matthew W. Vaughn, Ph.D.
Cold Spring Harbor Laboratory
Delbruck Laboratory / Martienssen Group
1 Bungtown Road
Cold Spring Harbor, NY 11724

phone: (516) 367-8469
From Mark.Hoebeke at jouy.inra.fr  Fri Mar 25 10:56:39 2005
From: Mark.Hoebeke at jouy.inra.fr (Mark Hoebeke)
Date: Sun Mar 27 17:57:11 2005
Subject: [Bioperl-l] Hierarchical location parsing
Message-ID: <1111766199.18772.13.camel@homer>

Hi Brian,

In fact, I filed a bug report (#1765) to which I attached a patch.

I checked that the patched FTLocationFactory.pm and the unpatched one in
the bioperl-live CVS repository exhibit the same behaviour when running
'make test'.

Of course, I don't know the variety of location descriptions found in
the test scripts...

Mark


> Mark,
> 
> I'm afraid I don't know the answer to your question but let me turn the
> question around: would you like to help us fix this?
> 
> Brian O.
> 
-- 
--------------------------Mark.Hoebeke@jouy.inra.fr----------------------
Unité Statistique & Génome                                     Unité MIG
+33 (0)1 60 87 38 03                  Tél.          +33 (0)1 34 65 28 85
+33 (0)1 60 87 38 09                  Fax.          +33 (0)1 34 65 29 01
Tour Evry 2, 523 pl. des Terrasses             INRA - Domaine de Vilvert
F - 91000 Evry                             F - 78352 Jouy-en-Josas CEDEX


From Mark.Hoebeke at jouy.inra.fr  Fri Mar 25 15:23:42 2005
From: Mark.Hoebeke at jouy.inra.fr (Mark Hoebeke)
Date: Sun Mar 27 17:57:14 2005
Subject: [Bioperl-l] Hierarchical location parsing
In-Reply-To: <GPENLDEIJJHJLHOAJBBPCEJLCDAA.brian_osborne@cognia.com>
References: <GPENLDEIJJHJLHOAJBBPCEJLCDAA.brian_osborne@cognia.com>
Message-ID: <1111782222.18772.37.camel@homer>

Brian,

an example of a nested location is found in the 'source' feature of the
Genbank entry having accession AE014074 (Streptococcus pyogenes MGAS315
complete genome). As the file is over 1 Meg in size once compressed it
might not be a good idea to attach it to this mail which is CC'ed to
bioperl-l ;D

Regarding the performance hit of my fix, I feared that replacing a
compiled regexp with a split and a loop over every character of the
string could have a significant impact. As it stands, I timed a simple
parsing script swallowing Genbank files and spitting out each feature
location as a GFF string, on 131 complete microbial genomes. There is no
difference in output between the bioperl-live FTLocationFactory and its
patched version (basically meaning that this test sample did not contain
nested locations). The times are comparable, with even a slight
advantage to the patched version (915.66user 19.53system 15:42.19elapsed
99%CPU vs. 938.06user 17.33system 16:04.15elapsed 99%CPU).

When comparing the outputs of the parser run on a file with a nested
location, it appears that without the bugfix, the nested location yields
an incorrect GFF string as shown by the diff below.

[mark@homer Loc]$ diff MGAS315 MGAS315_patched
1c1
<
join(1..749107,join(788646..977266,join(1018339..1137553,join(1171973..1230114,join(1271911..1313193,join(1351400..1410541,1450556..1900521),)
---
>
join(1..749107,join(788646..977266,join(1018339..1137553,join(1171973..1230114,join(1271911..1313193,join(1351400..1410541,1450556..1900521))))))
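
The difference between the two outputs is that the unpatched string has
unbalanced parentheses. A minimal Python sketch (not BioPerl code) of a
sanity check that a test could apply to a location string follows; the
function name parens_balanced is hypothetical.

```python
def parens_balanced(location):
    """Return True if every '(' in the string has a matching ')'."""
    depth = 0
    for ch in location:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:      # a ')' appeared before its '('
                return False
    return depth == 0


unpatched = ("join(1..749107,join(788646..977266,join(1018339..1137553,"
             "join(1171973..1230114,join(1271911..1313193,"
             "join(1351400..1410541,1450556..1900521),)")
patched = ("join(1..749107,join(788646..977266,join(1018339..1137553,"
           "join(1171973..1230114,join(1271911..1313193,"
           "join(1351400..1410541,1450556..1900521))))))")
print(parens_balanced(unpatched), parens_balanced(patched))
```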

I'm still cautious about the bugfix because I only produced the diffs
on microbial genomes, which probably have simpler location definitions
than those of higher eukaryotes.

Greetings,

Mark

On Friday, 25 March 2005 at 11:52 -0500, Brian Osborne wrote:
> Mark,
> 
> Can you also attach the sequence file that you used in order to test your
> code? That way I can write a test specifically for the parsing of
> hierarchical locations.
> 
> You wrote "I'm not sure the new patch won't slow down location parsing
> considerably..." Have you actually timed the parsing using the old and new
> code?
> 
> Thanks again,
> 
> Brian O.
> 

-- 
--------------------------Mark.Hoebeke@jouy.inra.fr----------------------
Unité Statistique & Génome                                     Unité MIG
+33 (0)1 60 87 38 03                  Tél.          +33 (0)1 34 65 28 85
+33 (0)1 60 87 38 09                  Fax.          +33 (0)1 34 65 29 01
Tour Evry 2, 523 pl. des Terrasses             INRA - Domaine de Vilvert
F - 91000 Evry                             F - 78352 Jouy-en-Josas CEDEX

From ymc at paxil.stanford.edu  Fri Mar 25 18:49:53 2005
From: ymc at paxil.stanford.edu (Yee Man Chan)
Date: Sun Mar 27 17:57:18 2005
Subject: [Bioperl-l] Hidden Markov Model in Bioperl?
Message-ID: <Pine.GSO.3.96.1050325152155.14520y-100000@beacon.stanford.edu>


Hi all

	I just wrote a C module to do Hidden Markov Model (HMM) related
calculations. I find that there is no HMM implementation anywhere in
Bioperl (there are parsers for HMMER output, however). Would it be a
good idea for me to add this module to Bioperl?

	I am thinking of an interface like this:

Bio::Tools::HMM->new("symbols", "states")
- instantiate an HMM object with a string of symbols (each character
corresponds to one symbol) and a string of states. The other parameters of
the model are generated randomly. Good for starting Baum-Welch training.

Bio::Tools::HMM->new("symbols", "states", array of initial state
probabilities, matrix of state transition probabilities, matrix of
emission probabilities)
- similar to the one before, but now we explicitly assign the HMM parameters.

Bio::Tools::HMM->ObsSeqProb("string of observed sequence")
- return the probability of an observed sequence.

Bio::Tools::HMM->Viterbi("string of observed sequence")
- return the string of hidden states that most probably produced the
observed sequence.

Bio::Tools::HMM->BaumWelchTraining(array of observed sequences)
- use an array of observed sequences to find the HMM parameters that
locally maximize the probability of these observed sequences. Optional
parameters can be passed to change the tolerance and the maximum number of
iterations.

Bio::Tools::HMM->StatisticalTraining(array of observed sequences, array of
hidden state sequences)
- when the hidden state sequences are also known, use them to determine the
parameters of an HMM by a statistical method.

Bio::Tools::HMM->getInitArray()
- return the array of initial state probabilities as an @array

Bio::Tools::HMM->getStateMatrix()
- return the matrix of state transition probabilities as MatrixI

Bio::Tools::HMM->getEmissionMatrix()
- return the matrix of emission probabilities as MatrixI

	This should cover most HMM applications. What do you think? Do
you have other functions in mind?

	I already contributed Bio::Tools::dpAlign before, so I am not a
newbie. If someone thinks it is a good idea to have this in Bioperl, I can
work on it as soon as possible.

Best Regards,
Yee Man

From hlapp at gmx.net  Sun Mar 27 18:18:01 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sun Mar 27 18:14:17 2005
Subject: [Bioperl-l] Hidden Markov Model in Bioperl?
In-Reply-To: <Pine.GSO.3.96.1050325152155.14520y-100000@beacon.stanford.edu>
Message-ID: <76E15569-9F16-11D9-86E3-000A959EB4C4@gmx.net>

Sounds like a cool thing to have in bioperl.

Just one minor comment on naming: in perl/bioperl we typically 
DontUseCapitalization to delineate words (as in Java) but use 
underscores instead. Otherwise, to my knowledge you're breaking new ground 
here, so there is no consistency check with the rest of bioperl to be passed, 
unless I'm missing something.

	-hilmar

-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


From amackey at pcbi.upenn.edu  Mon Mar 28 08:11:33 2005
From: amackey at pcbi.upenn.edu (Aaron J. Mackey)
Date: Mon Mar 28 08:07:47 2005
Subject: [Bioperl-l] Hidden Markov Model in Bioperl?
In-Reply-To: <Pine.GSO.3.96.1050325152155.14520y-100000@beacon.stanford.edu>
References: <Pine.GSO.3.96.1050325152155.14520y-100000@beacon.stanford.edu>
Message-ID: <24c3580c3fa75bee1d50f9c8b9b1c0b1@pcbi.upenn.edu>

Yes, in bioperl-ext, of course ...

On Mar 25, 2005, at 6:49 PM, Yee Man Chan wrote:

> 	I am thinking of an interface like this:
>
> Bio::Tools::HMM->new("symbols", "states")
> - instantiate an HMM object with a string of symbols (each character
> corresponds to one symbol) and a string of states. The other parameters of 
> the model are generated randomly. Good for starting Baum-Welch training.

Why not expand this to be two arrayrefs of symbols or states?  You can 
convert them into whatever encoded single-char alphabet you'd like.  
Think Perl, not C.  This is a feature request, not a requirement, of 
course.

> Bio::Tools::HMM->ObsSeqProb("string of observed sequence")
> - return the probability of an observed sequence.

This is the Forward algorithm P()?  Perhaps an alias to Forward(), and 
the ability to specify an offset/index at which you want the Forward 
value (see below)?  Or is this the product of viterbi factors?

> Bio::Tools::HMM->Viterbi("string of observed sequence")
> - return the string of hidden states that most probably produced the
> observed sequence.

this might also return the P() of the viterbi path; and again, instead 
of returning string of symbols, an arrayref of symbols.
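
To make this suggestion concrete, here is a minimal Python sketch (not
the proposed C module and not a BioPerl API) of a Viterbi routine that
returns both the state path as a list and the log-probability of that
path. The function name, argument order and the toy two-state model are
all assumptions made for illustration.

```python
import math


def _log(p):
    """Natural log that maps a zero probability to -infinity."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(obs, states, init, trans, emit):
    """Return (best state path as a list, log-probability of that path)."""
    # best[s]: log-probability of the best path so far that ends in state s
    best = {s: _log(init[s]) + _log(emit[s][obs[0]]) for s in states}
    back = []  # one dict of back-pointers per position after the first
    for symbol in obs[1:]:
        prev, best, pointers = best, {}, {}
        for s in states:
            p = max(states, key=lambda r: prev[r] + _log(trans[r][s]))
            pointers[s] = p
            best[s] = prev[p] + _log(trans[p][s]) + _log(emit[s][symbol])
        back.append(pointers)
    last = max(states, key=lambda s: best[s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[last]


# Toy model (hypothetical numbers): two sticky states X and Y.
states = ["X", "Y"]
init = {"X": 0.6, "Y": 0.4}
trans = {"X": {"X": 0.9, "Y": 0.1}, "Y": {"X": 0.1, "Y": 0.9}}
emit = {"X": {"a": 0.9, "b": 0.1}, "Y": {"a": 0.1, "b": 0.9}}

path, log_p = viterbi("aabb", states, init, trans, emit)
print(path, log_p)
```

Returning a list rather than a string means state names need not be
single characters, which is the point of the arrayref suggestion.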

> Bio::Tools::HMM->getInitArray()
> Bio::Tools::HMM->getStateMatrix()
> Bio::Tools::HMM->getEmissionMatrix()

Presumably these should be get/set methods?

What's missing is 1) posterior decoding and 2) partial path probability 
(i.e. F_{i}*v_{i+1}*v_{i+2}*...*v_{j-1}*B_{j}/F_{x}, where i < j, F and 
B are Forward and Backward values, v's are viterbi factors for each 
step in the partial path specified from i to j)

I'd also prefer lower case names (BaumWelch could just be called 
"train" or "learn_unsupervised" or somesuch)

Also, see the HMM functions available in Matlab that do the same ...

Good luck,

-Aaron

--
Aaron J. Mackey, Ph.D.
Dept. of Biology, Goddard 212
University of Pennsylvania       email:  amackey@pcbi.upenn.edu
415 S. University Avenue         office: 215-898-1205
Philadelphia, PA  19104-6017     fax:    215-746-6697

From ymc at paxil.stanford.edu  Mon Mar 28 12:53:03 2005
From: ymc at paxil.stanford.edu (Yee Man Chan)
Date: Mon Mar 28 13:28:05 2005
Subject: [Bioperl-l] Hidden Markov Model in Bioperl?
In-Reply-To: <76E15569-9F16-11D9-86E3-000A959EB4C4@gmx.net>
Message-ID: <Pine.GSO.3.96.1050328095228.18260A-100000@beacon.stanford.edu>


On Sun, 27 Mar 2005, Hilmar Lapp wrote:

> Sounds like a cool thing to have in bioperl.
> 
> Just one minor comment on naming: in perl/bioperl we typically 
> DontUseCapitalization to delineate words (as in Java) but use 
> underscores instead. 

That's fine with me. I can use underscores.

Regards,
Yee Man

> Otherwise to my knowledge you're breaking new ground here 
> so there is no consistency check with the rest of bioperl to be passed, 
> unless I'm missing something.
> 
> 	-hilmar

From ymc at paxil.stanford.edu  Mon Mar 28 13:14:56 2005
From: ymc at paxil.stanford.edu (Yee Man Chan)
Date: Mon Mar 28 13:28:09 2005
Subject: [Bioperl-l] Hidden Markov Model in Bioperl?
In-Reply-To: <24c3580c3fa75bee1d50f9c8b9b1c0b1@pcbi.upenn.edu>
Message-ID: <Pine.GSO.3.96.1050328095307.18260B-100000@beacon.stanford.edu>


On Mon, 28 Mar 2005, Aaron J. Mackey wrote:

> Yes, in bioperl-ext, of course ...

It was my intention to add it to bioperl-ext.

> 
> On Mar 25, 2005, at 6:49 PM, Yee Man Chan wrote:
> 
> > 	I am thinking of an interface like this:
> >
> > Bio::Tools::HMM->new("symbols", "states")
> > - instantiate an HMM object with a string of symbols (each character
> > corresponds to one symbol) and a string of states. The other parameters of 
> > the model are generated randomly. Good for starting Baum-Welch training.
> 
> Why not expand this to be two arrayrefs of symbols or states?  You can 
> convert them into whatever encoded single-char alphabet you'd like.  
> Think Perl, not C.  This is a feature request, not a requirement, of 
> course.

I thought about that too. But I suppose this is an HMM for Bioperl and I
don't see any usage outside DNA sequences and protein sequences. So maybe
strings are ok? It can be quite tedious if I need to convert a DNA string
to an array of DNA characters to use HMM. Can you give me some biological
examples that can justify this feature request?

> 
> > Bio::Tools::HMM->ObsSeqProb("string of observed sequence")
> > - return the probability of an observed sequence.
> 
> This is the Forward algorithm P()?  Perhaps an alias to Forward(), and 
> the ability to specify an offset/index at which you want the Forward 
> value (see below)?  Or is this the product of viterbi factors?
> 

This is P(O|lambda), i.e. given an HMM and an observed sequence, the
probability of seeing this observed sequence. It is equal to the sum over
i = 1..N of alpha_T(i), where alpha is the forward variable, T is the
length of the observed sequence and N is the number of hidden states.

Forward and Backward functions are hidden from this interface for now.

Oh. Should I return this as log(P)? For a sequence of just a couple hundred
symbols, P tends to be very close to zero, so maybe log(P) will make more
sense to users?
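
To illustrate the quantity described above, here is a minimal Python
sketch (not the C module itself) of the forward algorithm that returns
log P(O|lambda). It rescales alpha at every position and sums the logs of
the scale factors, which is one common way to avoid underflow; the
function name and the toy model are assumptions made for illustration.

```python
import math


def log_obs_seq_prob(obs, states, init, trans, emit):
    """Return log P(O | lambda) using the scaled forward algorithm."""
    # alpha_1(i) = pi_i * b_i(o_1)
    alpha = {s: init[s] * emit[s][obs[0]] for s in states}
    log_p = 0.0
    for t in range(len(obs)):
        if t > 0:
            # alpha_t(j) = [sum_i alpha_{t-1}(i) * a_ij] * b_j(o_t)
            alpha = {s: sum(alpha[r] * trans[r][s] for r in states)
                        * emit[s][obs[t]]
                     for s in states}
        scale = sum(alpha.values())
        if scale == 0.0:
            return float("-inf")   # the sequence is impossible under the model
        log_p += math.log(scale)
        alpha = {s: a / scale for s, a in alpha.items()}
    return log_p


# Toy model (hypothetical numbers): two sticky states X and Y.
states = ["X", "Y"]
init = {"X": 0.6, "Y": 0.4}
trans = {"X": {"X": 0.9, "Y": 0.1}, "Y": {"X": 0.1, "Y": 0.9}}
emit = {"X": {"a": 0.9, "b": 0.1}, "Y": {"a": 0.1, "b": 0.9}}

print(log_obs_seq_prob("ab", states, init, trans, emit))
print(log_obs_seq_prob("ab" * 1000, states, init, trans, emit))
```

The second call uses a 2000-symbol sequence; the log value stays finite
even where the plain probability would be too small to work with
comfortably.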

> > Bio::Tools::HMM->Viterbi("string of observed sequence")
> > - return the string of hidden states that most probably produced the
> > observed sequence.
> 
> this might also return the P() of the viterbi path; and again, instead 
> of returning string of symbols, an arrayref of symbols.
> 

Based on my understanding of the literature, I don't recall seeing any
effort to compute the probability of the hidden state sequence.

> > Bio::Tools::HMM->getInitArray()
> > Bio::Tools::HMM->getStateMatrix()
> > Bio::Tools::HMM->getEmissionMatrix()
> 
> Presumably these should be get/set methods?
> 

Yeah. I should do both get and set.

> What's missing is 1) posterior decoding and 2) partial path probability 
> (i.e. F_{i}*v_{i+1}*v_{i+2}*...*v_{j-1}*B_{j}/F_{x}, where i < j, F and 
> B are Forward and Backward values, v's are viterbi factors for each 
> step in the partial path specified from i to j)
> 

I can add posterior_decoding but I am not sure what
partial_path_probability is. Can you give me a link to some information 
about it?

> I'd also prefer lower case names (BaumWelch could just be called 
> "train" or "learn_unsupervised" or somesuch)

I have two ways to train the HMM: one without the hidden state sequence
supplied (i.e. BaumWelchTraining) and one with the hidden state sequence
(i.e. StatisticalTraining). Is the former learn_unsupervised and the latter
learn_supervised in AI speak?
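
For the case where the hidden state sequences are known, a minimal Python
sketch (not the C module) of training by counting follows. It assumes
that "statistical method" means maximum-likelihood estimation from
observed frequencies, with no pseudocounts; the function name
train_supervised is hypothetical.

```python
from collections import Counter


def _normalise(row):
    """Turn a dict of counts into probabilities (all zeros if no counts)."""
    total = sum(row.values())
    return {k: (v / total if total else 0.0) for k, v in row.items()}


def train_supervised(obs_seqs, state_seqs, symbols, states):
    """Estimate (init, trans, emit) from paired observation/state sequences."""
    init_c, trans_c, emit_c = Counter(), Counter(), Counter()
    for obs, path in zip(obs_seqs, state_seqs):
        if len(obs) != len(path):
            raise ValueError("observation and state sequences differ in length")
        if not obs:
            continue
        init_c[path[0]] += 1
        for a, b in zip(path, path[1:]):      # consecutive state pairs
            trans_c[(a, b)] += 1
        for s, o in zip(path, obs):           # state/symbol pairs
            emit_c[(s, o)] += 1
    init = _normalise({s: init_c[s] for s in states})
    trans = {a: _normalise({b: trans_c[(a, b)] for b in states}) for a in states}
    emit = {s: _normalise({o: emit_c[(s, o)] for o in symbols}) for s in states}
    return init, trans, emit


init, trans, emit = train_supervised(["aab", "bb"], ["XXY", "YY"], "ab", "XY")
print(init, trans, emit)
```

Without pseudocounts, any transition or emission never seen in the
training data gets probability zero, which may or may not be what a user
wants.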

Regards,
Yee Man

> 
> Also, see the HMM functions available in Matlab that do the same ...
> 
> Good luck,
> 
> -Aaron
> 
> --
> Aaron J. Mackey, Ph.D.
> Dept. of Biology, Goddard 212
> University of Pennsylvania       email:  amackey@pcbi.upenn.edu
> 415 S. University Avenue         office: 215-898-1205
> Philadelphia, PA  19104-6017     fax:    215-746-6697
> 

From zhoujie at fudan.edu.cn  Mon Mar 28 20:27:19 2005
From: zhoujie at fudan.edu.cn (zhoujie@fudan.edu.cn)
Date: Mon Mar 28 20:29:16 2005
Subject: [Bioperl-l] A question about flatting taxonomy database
Message-ID: <f1bcb9f1ee01.f1ee01f1bcb9@fudan.edu.cn>

Sorry, my first reply was probably lost, so I am sending it again 
here.

My command line is:

perl bp_local_taxonomydb_query.pl --nodes nodes.dmp --names names.dmp

I only changed one thing: the directory in the script, which I changed 
to './index', and it generates the right files in that directory. But 
when it finished, the script threw an exception: 

------------------- EXCEPTION  ----------------
MSG: No such file or directory ./index/nodes  
STACK Bio::DB::Taxonomy::flatfile::_db_connect 
C:/Perl/site/lib/Bio\DBTaxonomy\flatfile.pm:325
STACK Bio::DB::Taxonomy::flatfile::new 
C:/Perl/site/lib/Bio\Bio\DB\Taxonomy\flatfile.pm:138
STACK Bio::DB::Taxonomy::new C:/Perl/site/lib/Bio/DB/Taxonomy.pm:104
STACK toplevel bp_local_taxonomy_query.pl:22
-----------------------------------------------

I think I have already told the script the location, by the -directory 
parameter in the new method of Bio::DB::Taxonomy, at line 25 of the 
script.

Is there anything wrong with my process?

J Z

----- Original Message -----
From: Jason Stajich <jason.stajich@duke.edu>
Date: Monday, March 28, 2005 6:58
Subject: Re: [Bioperl-l] A question about flatting taxonomy database

> Can you show the command line argument that you passing in? You 
> need to tell 
> the script where to find these files.
> 
> -jason
> -- 
> Jason Stajich
> jason.stajich at duke.edu
> http://www.duke.edu/~jes12/
> 
> 
> Quoting zhoujie@fudan.edu.cn:
> 
> > Hi all,
> > 
> > When I'm using "bp_local_taxonomydb_query.pl" to build a local 
> > taxonomy database and query it, I always get a exception 
> saying:"no 
> > such file or directory ***, STACK ***", it seems that the nodes 
> file, 
> > id2names and names2id files are already created, but how does 
> the 
> > error MSG arise? I have already installed the BerkeleyDB module 
> by 
> > ppm. Is there anything else that I need to do?
> > 
> > Thanks very much for you help.
> > 
> > J Z
> 

From babyleo11 at yahoo.com.sg  Tue Mar 29 01:38:15 2005
From: babyleo11 at yahoo.com.sg (Minyi)
Date: Tue Mar 29 01:32:24 2005
Subject: [Bioperl-l] BLASTP
Message-ID: <20050329063815.52422.qmail@web40614.mail.yahoo.com>

hi all, 
        I'm writing a program to run blastp from CGI/Perl. However, no hits are found no matter which files I use. When I run standalone blast with the same files, hits are found. The same program also works for blastn, both from CGI/Perl and standalone. The only thing that doesn't work is blastp from CGI/Perl. Does anyone know what the problem is? Thank you!


Regards, 
Minyi
 
" This is the beginning of a new day. You have been given this day to use as you will. You can waste it or use it for good. What you do today is important because you are exchanging a day of your life for it. When tomorrow comes, this day will be gone forever; in its place is something that you have left behind...let it be something good. "


From muratem at eng.uah.edu  Tue Mar 29 07:57:59 2005
From: muratem at eng.uah.edu (Mike Muratet)
Date: Tue Mar 29 07:52:05 2005
Subject: [Bioperl-l] Primer3.pm
Message-ID: <Pine.GSO.4.05.10503290654340.21027-100000@ebs330>

Greetings

I know this has come up before, but I can't seem to track down the answer.
There is a Primer3.pm in Bio/Tools. The latest 1.4 module docs on the web
page place it there. It's also in Bio/Tools/Run through bioperl-run-1.4.
Which is the correct (or best) path/version to use?

thanks

Mike

From brian_osborne at cognia.com  Tue Mar 29 08:09:05 2005
From: brian_osborne at cognia.com (Brian Osborne)
Date: Tue Mar 29 08:04:31 2005
Subject: [Bioperl-l] Hierarchical location parsing
In-Reply-To: <1111782222.18772.37.camel@homer>
Message-ID: <GPENLDEIJJHJLHOAJBBPEEMECDAA.brian_osborne@cognia.com>

Mark,

I didn't see any "join(join..." statements in that Genbank entry, as part of
a source feature or anywhere else. I used this URL:

http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&val=21909536


Brian O.



From muratem at eng.uah.edu  Tue Mar 29 08:43:54 2005
From: muratem at eng.uah.edu (Mike Muratet)
Date: Tue Mar 29 08:40:11 2005
Subject: [Bioperl-l] Primer3.pm
In-Reply-To: <Pine.GSO.4.05.10503290654340.21027-100000@ebs330>
Message-ID: <Pine.GSO.4.05.10503290739500.22218-100000@ebs330>


On Tue, 29 Mar 2005, Mike Muratet wrote:

> Greetings
> 
> I know this has come up before, but I can't seem to track down the answer.
> There is a Primer3.pm in Bio/Tools. The latest 1.4 module docs on the web
> page place it there. It's also in Bio/Tools/Run through bioperl-run-1.4.
> Which is the correct (or best) path/version to use?
> 
> thanks
> 
> Mike
> 

Hello again

A careful reading of the documentation for each module would indicate that
the former above is called by the latter which is probably the answer to
the question unless someone knows otherwise. There is nothing in the docs
on the bioperl webpage for Bio/Tools/Run/Primer3.pm.

Mike

From palmeida at igc.gulbenkian.pt  Tue Mar 29 09:53:19 2005
From: palmeida at igc.gulbenkian.pt (Paulo Almeida)
Date: Tue Mar 29 09:47:01 2005
Subject: [Bioperl-l] BLASTP
In-Reply-To: <20050329063815.52422.qmail@web40614.mail.yahoo.com>
References: <20050329063815.52422.qmail@web40614.mail.yahoo.com>
Message-ID: <20050329145319.GA8773@bioinf.igc.gulbenkian.pt>

Hi Minyi,

Have you tried running blastp on that file with perl, from the command line? When you run it with cgi, can you check the webserver's log to see if there are any errors, or send the error output to the browser (I think you can do it with CGI::Carp, but I haven't done that in a long time)?

-Paulo

On Tue, Mar 29, 2005 at 07:38:15AM +0100, Minyi wrote:
> hi all, 
>         i'm doing a program to run blastp on cgi/perl. However, there's no hits found no matter what files i use. But when i run the program using standalone blast with the same files, there are hits found. Also, the same program can work for blastn on cgi/perl and standalone. The only thing it can't work is blastp on cgi/perl. Does anyone know what's the problem? Thank You!

-- 
Paulo Almeida
Instituto Gulbenkian de Ciencia
Apartado 14, 2781-901, Oeiras, PORTUGAL
tel  +351 21 446 46 35
fax  +351 21 440 79 70
http://www.igc.gulbenkian.pt
From rob at salmonella.org  Tue Mar 29 11:31:56 2005
From: rob at salmonella.org (Rob Edwards)
Date: Tue Mar 29 11:26:08 2005
Subject: [Bioperl-l] Primer3.pm
In-Reply-To: <Pine.GSO.4.05.10503290739500.22218-100000@ebs330>
References: <Pine.GSO.4.05.10503290739500.22218-100000@ebs330>
Message-ID: <67be66fa1184724340efba2b24dd71e0@salmonella.org>

Bio::Tools::Run::Primer3 is the interface to run the primer3 program. 
Bio::Tools::Primer3 is the interface to parse the output from primer3.

If you already have run primer3 (or do it outside bioperl) then you 
don't need the run module, you can use the parsing module and pass in 
the file. If you want to take a sequence object, design primers against 
it using primer3, and get sequence objects back for the primers and the 
products then you need both.
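
As a rough sketch of that split (not a definitive recipe): the first sub
only parses an existing primer3 output file, the second runs primer3 on a
sequence object. The sub names, the file names and the primer3_core path
are made up for illustration, and the constructor arguments follow the
module synopses as I remember them, so check them against your installed
version.

```perl
use strict;
use warnings;

# Case 1: primer3 was already run elsewhere; only the parser is needed.
# $file is a hypothetical primer3 output file.
sub report_primers {
    my ($file) = @_;
    require Bio::Tools::Primer3;        # parser, in the main bioperl package
    my $p3 = Bio::Tools::Primer3->new(-file => $file);
    printf "%d result(s) in %s\n", $p3->number_of_results, $file;
    return $p3;
}

# Case 2: design primers for a Bio::Seq object by running primer3 itself.
# Needs bioperl-run and a primer3_core binary; the path is an assumption.
sub design_primers {
    my ($seq) = @_;
    require Bio::Tools::Run::Primer3;   # runner, in bioperl-run
    my $primer3 = Bio::Tools::Run::Primer3->new(
        -seq     => $seq,
        -outfile => 'primer3.out',
        -path    => '/usr/bin/primer3_core',
    );
    $primer3->add_targets(PRIMER_MIN_TM => 56, PRIMER_MAX_TM => 90);
    return $primer3->run;               # results object (number_of_results, ...)
}
```

The modules are loaded with require inside each sub so that a script
which only parses output does not need bioperl-run installed.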

Rob


On Mar 29, 2005, at 5:43 AM, Mike Muratet wrote:

>
>
> On Tue, 29 Mar 2005, Mike Muratet wrote:
>
>> Greetings
>>
>> I know this has come up before, but I can't seem to track down the 
>> answer.
>> There is a Primer3.pm in Bio/Tools. The latest 1.4 module docs on the 
>> web
>> page place it there. It's also in Bio/Tools/Run through 
>> bioperl-run-1.4.
>> Which is the correct (or best) path/version to use?
>>
>> thanks
>>
>> Mike
>>
>
> Hello again
>
> A careful reading of the documentation for each module would indicate 
> that
> the former above is called by the latter which is probably the answer 
> to
> the question unless someone knows otherwise. There is nothing in the 
> docs
> on the bioperl webpage for Bio/Tools/Run/Primer3.pm.
>
> Mike
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>

From muratem at eng.uah.edu  Tue Mar 29 12:13:22 2005
From: muratem at eng.uah.edu (Mike Muratet)
Date: Tue Mar 29 12:08:23 2005
Subject: [Bioperl-l] More fun with primer3
Message-ID: <Pine.GSO.4.05.10503291106080.3044-100000@ebs330>

Greetings 

Having deduced (and heard from Rob) that Tools::Run::Primer3 produces a
Tools::Primer3 object, I tried the following:

        my $results = $primer3->run;
        print "n results ",$results->number_of_results(),"\n";
            
        my $primer = $results->next_primer();

and got

n results 4

------------- EXCEPTION  -------------
MSG: The target_sequence must be a Bio::Seq to create this object.
STACK Bio::Seq::PrimedSeq::new
/usr/local/lib/perl5/site_perl/5.8.0/Bio/Seq/PrimedSeq.pm:232
STACK Bio::Tools::Primer3::next_primer
/usr/local/lib/perl5/site_perl/5.8.0/Bio/Tools/Primer3.pm:331
STACK toplevel ./extractAlignments.pl:209

I'm at a loss, it's pretty much a cut and paste out of the docs.

Does anybody have any ideas?

Cheers

Mike

PS The perldoc (I have) of Bio::Tools::Run::Primer3 says it returns a
Bio::Tools::Run::Primer3 object and not Bio::Tools::Primer3. I think it's
the latest version.


From skirov at utk.edu  Tue Mar 29 16:21:35 2005
From: skirov at utk.edu (Stefan Kirov)
Date: Tue Mar 29 16:16:06 2005
Subject: [Bioperl-l] Possible memory leak in
	Bio::SeqFeature::Gene::GeneStructure?
Message-ID: <4249C6DF.3040100@utk.edu>

Forgot to mention: Devel::Cycle reports cycle references between 
GeneStructure and Transcript and perl has a known issue of not being 
able to destroy such objects.
So I guess my question is: Is this a feature or  a 'feature' :-) .
In any case
Thanks
Stefan
From jason.stajich at duke.edu  Tue Mar 29 16:55:04 2005
From: jason.stajich at duke.edu (Jason Stajich)
Date: Tue Mar 29 16:48:59 2005
Subject: [Bioperl-l] Possible memory leak in
	Bio::SeqFeature::Gene::GeneStructure?
In-Reply-To: <4249C6DF.3040100@utk.edu>
References: <4249C6DF.3040100@utk.edu>
Message-ID: <89d38db7d506ac1f04a15c6457b480f4@duke.edu>

I had problems with this myself too, and the memleak actually comes back to 
bite if you process a lot of genes.  I tried to track it down but 
didn't realize there was a cycle there.  We just need to put some code in 
the DESTROY block to take care of this.

Can you send the script which reports the cycle so I can re-test the 
changes?

-jason
--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

On Mar 29, 2005, at 1:21 PM, Stefan Kirov wrote:

> Forgot to mention: Devel::Cycle reports cycle references between 
> GeneStructure and Transcript and perl has a known issue of not being 
> able to destroy such objects.
> So I guess my question is: Is this a feature or  a 'feature' :-) .
> In any case
> Thanks
> Stefan
>

From skirov at utk.edu  Tue Mar 29 16:59:45 2005
From: skirov at utk.edu (Stefan Kirov)
Date: Tue Mar 29 16:53:54 2005
Subject: [Bioperl-l] Possible memory leak in
	Bio::SeqFeature::Gene::GeneStructure?
In-Reply-To: <89d38db7d506ac1f04a15c6457b480f4@duke.edu>
References: <4249C6DF.3040100@utk.edu>
	<89d38db7d506ac1f04a15c6457b480f4@duke.edu>
Message-ID: <4249CFD1.4090700@utk.edu>

Actually I did, but my message is considered suspicious  :-( .
I will send it directly to your e-mail.
Stefan

Jason Stajich wrote:

> I had problems with too myself and the memleak actually comes back to 
> bite if you process a lot of genes.  I tried to track it down but 
> didn't realize it was a cycle there.  We just need to put some code in 
> the DESTROY block to take care of this.
>
> Can you send the script which reports the cycle so I can re-test the 
> changes?
>
> -jason
> -- 
> Jason Stajich
> jason.stajich at duke.edu
> http://www.duke.edu/~jes12/
>
> On Mar 29, 2005, at 1:21 PM, Stefan Kirov wrote:
>
>> Forgot to mention: Devel::Cycle reports cycle references between 
>> GeneStructure and Transcript and perl has a known issue of not being 
>> able to destroy such objects.
>> So I guess my question is: Is this a feature or  a 'feature' :-) .
>> In any case
>> Thanks
>> Stefan
>>
>

From jcsanchez at cib.csic.es  Tue Mar 29 04:47:56 2005
From: jcsanchez at cib.csic.es (Juan Carlos Sanchez Ferrero)
Date: Tue Mar 29 16:55:50 2005
Subject: [Bioperl-l] Re:BLASTP
Message-ID: <4249244C.1050607@cib.csic.es>


Hello,
Maybe you only have a nucleotide db, but not a protein db accessible
by cgi-bin. Have you checked that?

regards

jc

From skirov at utk.edu  Tue Mar 29 16:14:21 2005
From: skirov at utk.edu (Stefan Kirov)
Date: Tue Mar 29 16:55:53 2005
Subject: [Bioperl-l] Possible memory leak in
	Bio::SeqFeature::Gene::GeneStructure?
Message-ID: <4249C52D.2050804@utk.edu>

I am working on the Entrezgene parser and tried to use 
Bio::SeqFeature::Gene::GeneStructure to describe NC/NT to NM and NP 
relationships. I am pretty much done with the parser (based on Mingyi 
Liu's low-level parser), but once I tried to parse a whole file (Homo 
sapiens) I ran out of memory. I think the problem might be 
Bio::SeqFeature::Gene::GeneStructure::add_transcript.
Here is the code which I used to simulate the problem and the resulting 
report file. It seems that adding Bio::SeqFeature::Gene::Exon to 
Bio::SeqFeature::Gene::Transcript does not contribute to the problem.
Any suggestions?
Stefan
-------------- next part --------------
Simulation 2	0MB
Simulation 3	1MB
Simulation 4	1MB
Simulation 5	1MB
Simulation 6	2MB
Simulation 7	2MB
Simulation 8	2MB
Simulation 9	2MB
Simulation 10	3MB
Simulation 11	3MB
Simulation 12	3MB
Simulation 13	4MB
Simulation 14	4MB
Simulation 15	4MB
Simulation 16	5MB
Simulation 17	5MB
Simulation 18	5MB
Simulation 19	5MB
Simulation 20	6MB
Simulation 21	6MB
Simulation 22	6MB
Simulation 23	7MB
Simulation 24	7MB
Simulation 25	7MB
Simulation 26	8MB
Simulation 27	8MB
Simulation 28	8MB
Simulation 29	9MB
Simulation 30	9MB
Simulation 31	9MB
Simulation 32	9MB
Simulation 33	10MB
Simulation 34	10MB
Simulation 35	10MB
Simulation 36	11MB
Simulation 37	11MB
Simulation 38	11MB
Simulation 39	12MB
Simulation 40	12MB
Simulation 41	12MB
Simulation 42	13MB
Simulation 43	13MB
Simulation 44	13MB
Simulation 45	13MB
Simulation 46	14MB
Simulation 47	14MB
Simulation 48	14MB
Simulation 49	15MB
Simulation 50	15MB
Simulation 51	15MB
Simulation 52	16MB
Simulation 53	16MB
Simulation 54	16MB
Simulation 55	16MB
Simulation 56	17MB
Simulation 57	17MB
Simulation 58	17MB
Simulation 59	18MB
Simulation 60	18MB
Simulation 61	18MB
Simulation 62	19MB
Simulation 63	19MB
Simulation 64	19MB
Simulation 65	19MB
Simulation 66	20MB
Simulation 67	20MB
Simulation 68	20MB
Simulation 69	21MB
Simulation 70	21MB
Simulation 71	21MB
Simulation 72	22MB
Simulation 73	22MB
Simulation 74	22MB
Simulation 75	23MB
Simulation 76	23MB
Simulation 77	23MB
Simulation 78	24MB
Simulation 79	24MB
Simulation 80	24MB
Simulation 81	24MB
Simulation 82	25MB
Simulation 83	25MB
Simulation 84	25MB
Simulation 85	26MB
Simulation 86	26MB
Simulation 87	26MB
Simulation 88	27MB
Simulation 89	27MB
Simulation 90	27MB
Simulation 91	27MB
Simulation 92	28MB
Simulation 93	28MB
Simulation 94	28MB
Simulation 95	29MB
Simulation 96	29MB
Simulation 97	29MB
Simulation 98	30MB
Simulation 99	30MB
Simulation 100	30MB
6620	6650
-------------- next part --------------
use strict;
use warnings;
use Bio::SeqFeature::Gene::Exon;
use Bio::SeqFeature::Gene::Transcript;
use Bio::SeqFeature::Gene::GeneStructure;
use Devel::Cycle;

my ($prevmem,$growth,$first);
for my $k (1..100) {
    # Take the used-memory column (MB) from the second line of `free -m`
    open(my $free, '-|', 'free', '-m') or die "Cannot run free: $!";
    my $buf=<$free>;    # header line, discarded
    $buf=<$free>;       # "Mem:" line
    close($free);
    my ($x1,$x2,$mem,$x3)=split(/\s+/,$buf,4);
    if ($prevmem) {
        $growth+= $mem-$prevmem;
        print "Simulation $k\t$growth","MB\n";
    }
    else { $first=$mem;}
    $prevmem=$mem;
    # Build and drop 20 gene structures with 4 transcripts of 10 exons each
    for my $i (1..20) {
        my $gstruct=Bio::SeqFeature::Gene::GeneStructure->new;
        for my $n (0..3) {
            # -strand had no value in the original; 1 is assumed, to match the exons
            my $transcript=Bio::SeqFeature::Gene::Transcript->new(-primary=>'memleak'.$n,
                -start=>1,-end=>2000,-strand=>1,-desc=>'test for memory leaks');
            foreach my $e (1..10) {
                my $exonobj=Bio::SeqFeature::Gene::Exon->new(-start=>$e*10,-end=>$e*10+9,-strand=>1);
                $transcript->add_exon($exonobj);
            }
            $gstruct->add_transcript($transcript);
        }
    }
}
print "$first\t$prevmem\n";
From babyleo11 at yahoo.com.sg  Wed Mar 30 01:09:42 2005
From: babyleo11 at yahoo.com.sg (Minyi)
Date: Wed Mar 30 01:04:46 2005
Subject: [Bioperl-l] BLASTP
In-Reply-To: 6667
Message-ID: <20050330060942.63088.qmail@web40609.mail.yahoo.com>

Hi all, 
         I've solved my problem. Thanks, Paulo, for asking me to check the webserver's log. The program didn't work because I didn't have a .ncbirc file. Thanks everyone! Cheers! 

Paulo Almeida <palmeida@igc.gulbenkian.pt> wrote:
Hi Minyi,

Have you tried running blastp on that file with perl, from the command line? When you run it with cgi, can you check the webserver's log to see if there are any errors, or send the error output to the browser (I think you can do it with CGI::Carp, but I haven't done that in a long time)?

-Paulo

On Tue, Mar 29, 2005 at 07:38:15AM +0100, Minyi wrote:
> hi all, 
> i'm doing a program to run blastp on cgi/perl. However, there's no hits found no matter what files i use. But when i run the program using standalone blast with the same files, there are hits found. Also, the same program can work for blastn on cgi/perl and standalone. The only thing it can't work is blastp on cgi/perl. Does anyone know what's the problem? Thank You!

-- 
Paulo Almeida
Instituto Gulbenkian de Ciencia
Apartado 14, 2781-901, Oeiras, PORTUGAL
tel +351 21 446 46 35
fax +351 21 440 79 70
http://www.igc.gulbenkian.pt


Regards, 
Minyi
 
" This is the beginning of a new day. You have been given this day to use as you will. You can waste it or use it for good. What you do today is important because you are exchanging a day of your life for it. When tomorrow comes, this day will be gone forever; in its place is something that you have left behind...let it be something good. "


From brian_osborne at cognia.com  Wed Mar 30 08:12:59 2005
From: brian_osborne at cognia.com (Brian Osborne)
Date: Wed Mar 30 08:08:33 2005
Subject: [Bioperl-l] BLASTP
In-Reply-To: <20050329145319.GA8773@bioinf.igc.gulbenkian.pt>
Message-ID: <GPENLDEIJJHJLHOAJBBPGENHCDAA.brian_osborne@cognia.com>

Paulo and Minyi,

>error output to the browser (I think you can do it with CGI::Carp

Yes:

use CGI::Carp qw(fatalsToBrowser);


Brian O.


-----Original Message-----
From: bioperl-l-bounces@portal.open-bio.org
[mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Paulo Almeida
Sent: Tuesday, March 29, 2005 9:53 AM
To: bioperl-l@portal.open-bio.org
Subject: Re: [Bioperl-l] BLASTP


Hi Minyi,

Have you tried running blastp on that file with perl, from the command line?
When you run it with cgi, can you check the webserver's log to see if there
are any errors, or send the error output to the browser (I think you can do
it with CGI::Carp, but I haven't done that in a long time)?

-Paulo

On Tue, Mar 29, 2005 at 07:38:15AM +0100, Minyi wrote:
> hi all,
>         i'm doing a program to run blastp on cgi/perl. However, there's no
hits found no matter what files i use. But when i run the program using
standalone blast with the same files, there are hits found. Also, the same
program can work for blastn on cgi/perl and standalone. The only thing it
can't work is blastp on cgi/perl. Does anyone know what's the problem? Thank
You!

--
Paulo Almeida
Instituto Gulbenkian de Ciencia
Apartado 14, 2781-901, Oeiras, PORTUGAL
tel  +351 21 446 46 35
fax  +351 21 440 79 70
http://www.igc.gulbenkian.pt


From babenko at ncbi.nlm.nih.gov  Wed Mar 30 12:02:59 2005
From: babenko at ncbi.nlm.nih.gov (Babenko, Vladimir (NIH/NLM/NCBI))
Date: Wed Mar 30 11:57:24 2005
Subject: [Bioperl-l] Turning the tree into bifurcating one
Message-ID: <69BA0F938FAC6A4CBEF49461720696F208DDE410@nihexchange16.nih.gov>

    Greetings, 
Is there any possible way to insert some pseudo-nodes into a tree to
make it bifurcating?
   Some programs can deal only with bifurcating ones...
  Thank you,
        Vladimir
From hlapp at gmx.net  Wed Mar 30 12:13:54 2005
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed Mar 30 12:09:12 2005
Subject: [Bioperl-l] Possible memory leak in
	Bio::SeqFeature::Gene::GeneStructure?
In-Reply-To: <4249C52D.2050804@utk.edu>
Message-ID: <1818EE1C-A13F-11D9-8431-000A959EB4C4@gmx.net>

Those modules probably can use some serious review. If there is a cycle 
then Jason should be on the right path with overriding DESTROY, but 
first one would need to know where the cycle is. I don't recall one 
being there on purpose ...

Sorry to not be of more help ...

	-hilmar

On Tuesday, March 29, 2005, at 01:14  PM, Stefan Kirov wrote:

> I am working on the Entrezgene parser and tried to use 
> Bio::SeqFeature::Gene::GeneStructure to describe NC/NT to NM and NP 
> relationships. I am pretty much done with the parser (based on Mingyi 
> Liu low lovel parser), but once I tried to parse a whole file (Homo 
> sapiens) I ran out of memory. I think the problem might be 
> Bio::SeqFeature::Gene::GeneStructure::add_Transcript.
> Here is the code which I used to simulate the problem and the 
> resulting report file. It seams adding Bio::SeqFeature::Gene::Exon to 
> Bio::SeqFeature::Gene::Transcript do not contribute to the problem.
> Any suggestions?
> Stefan
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


From skirov at utk.edu  Wed Mar 30 12:23:22 2005
From: skirov at utk.edu (Stefan Kirov)
Date: Wed Mar 30 12:18:26 2005
Subject: [Bioperl-l] Possible memory leak in
	Bio::SeqFeature::Gene::GeneStructure?
In-Reply-To: <1818EE1C-A13F-11D9-8431-000A959EB4C4@gmx.net>
References: <1818EE1C-A13F-11D9-8431-000A959EB4C4@gmx.net>
Message-ID: <424AE08A.3080201@utk.edu>

Hilmar,
Reported by Devel::Cycle:
Cycle (1):
        $Bio::SeqFeature::Gene::GeneStructure::HC->{'_transcripts'} => \@HD
                              $HD->[0] =>
\%Bio::SeqFeature::Gene::Transcript::HE
        $Bio::SeqFeature::Gene::Transcript::HE->{'parent'} =>
\%Bio::SeqFeature::Gene::GeneStructure::HC

The problem is that add_transcript sets $self (the GeneStructure object) 
as the parent of $fea: $fea->parent($self), thus creating the cycle. One 
could simply call $fea->parent() to clear it, I guess, but this may need 
to be in DESTROY.
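
For illustration only, here is a minimal stand-alone sketch of the same
kind of cycle and of one common remedy, a weak back-reference via
Scalar::Util::weaken. The Gene and Transcript packages are hypothetical
stand-ins, not the BioPerl classes, and weakening the parent link is a
technique swapped in for the example rather than what the modules do.

```perl
use strict;
use warnings;

package Gene;                      # hypothetical stand-in for GeneStructure
use Scalar::Util qw(weaken);
our $destroyed = 0;                # counts DESTROY calls

sub new {
    my ($class) = @_;
    return bless { _transcripts => [] }, $class;
}

sub add_transcript {
    my ($self, $t, %opt) = @_;
    push @{ $self->{_transcripts} }, $t;   # parent -> child, strong reference
    $t->{parent} = $self;                  # child -> parent closes the cycle
    weaken($t->{parent}) if $opt{weak};    # weak link no longer holds a count
    return;
}

sub DESTROY { $destroyed++ }

package Transcript;                # hypothetical stand-in for Transcript
sub new {
    my ($class) = @_;
    return bless {}, $class;
}

package main;

# Build one gene with one transcript, let it go out of scope, and report
# how many Gene objects were actually destroyed.
sub destroyed_after_scope {
    my (%opt) = @_;
    $Gene::destroyed = 0;
    {
        my $gene = Gene->new;
        $gene->add_transcript(Transcript->new, %opt);
    }
    return $Gene::destroyed;
}

print "strong parent: ", destroyed_after_scope(),          " gene(s) freed\n";
print "weak parent:   ", destroyed_after_scope(weak => 1), " gene(s) freed\n";
```

With the strong parent link nothing is freed when $gene leaves scope;
with the weak link the gene is freed at once. The trade-off is that a
transcript must not outlive its gene, because its parent slot becomes
undef once the gene is gone.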

Hilmar Lapp wrote:

> Those modules probably can use some serious review. If there is a 
> cycle then Jason should be on the right path with overriding DESTROY, 
> but first one would need to know where the cycle is. I don't recall 
> one being there on purpose ...
>
> Sorry to not be of more help ...
>
>     -hilmar
>
> On Tuesday, March 29, 2005, at 01:14  PM, Stefan Kirov wrote:
>
>> I am working on the Entrezgene parser and tried to use 
>> Bio::SeqFeature::Gene::GeneStructure to describe NC/NT to NM and NP 
>> relationships. I am pretty much done with the parser (based on Mingyi 
>> Liu low lovel parser), but once I tried to parse a whole file (Homo 
>> sapiens) I ran out of memory. I think the problem might be 
>> Bio::SeqFeature::Gene::GeneStructure::add_Transcript.
>> Here is the code which I used to simulate the problem and the 
>> resulting report file. It seams adding Bio::SeqFeature::Gene::Exon to 
>> Bio::SeqFeature::Gene::Transcript do not contribute to the problem.
>> Any suggestions?
>> Stefan
>

-- 
Stefan Kirov, Ph.D.
University of Tennessee/Oak Ridge National Laboratory
5700 bldg, PO BOX 2008 MS6164
Oak Ridge TN 37831-6164
USA
tel +865 576 5120
fax +865-576-5332
e-mail: skirov@utk.edu
sao@ornl.gov

"And the wars go on with brainwashed pride
For the love of God and our human rights
And all these things are swept aside"

From skirov at utk.edu  Wed Mar 30 12:25:26 2005
From: skirov at utk.edu (Stefan Kirov)
Date: Wed Mar 30 12:22:22 2005
Subject: [Bioperl-l] Possible memory leak in
	Bio::SeqFeature::Gene::GeneStructure?
In-Reply-To: <1818EE1C-A13F-11D9-8431-000A959EB4C4@gmx.net>
References: <1818EE1C-A13F-11D9-8431-000A959EB4C4@gmx.net>
Message-ID: <424AE106.2020203@utk.edu>

Oops, actually Bio::SeqFeature::Gene::Transcript::parent does not allow 
this, since the argument would be undef.... Either that should be fixed 
or DESTROY needs to directly undef $transcript->{parent}.
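
One caveat worth sketching: Perl only calls DESTROY once an object's
reference count reaches zero, so a DESTROY on the gene cannot by itself
break a cycle the gene is part of. The back-reference has to be cleared
by an explicit call (or from the DESTROY of some outer object that is not
in the cycle). The sketch below uses a hypothetical Gene class and a
hypothetical release method, not the BioPerl API.

```perl
use strict;
use warnings;

package Gene;                      # hypothetical stand-in for GeneStructure
our $live = 0;                     # number of Gene objects currently alive

sub new {
    my ($class) = @_;
    $live++;
    return bless { _transcripts => [] }, $class;
}

sub add_transcript {
    my ($self, $t) = @_;
    $t->{parent} = $self;                  # back-reference creates the cycle
    push @{ $self->{_transcripts} }, $t;
    return;
}

# Hypothetical cleanup hook: clear each child's parent slot directly.
sub release {
    my ($self) = @_;
    undef $_->{parent} for @{ $self->{_transcripts} };
    return;
}

sub DESTROY { $live-- }

package main;

# Returns how many genes are still alive after the block is left.
sub live_after {
    my ($cleanup) = @_;
    my $before = $Gene::live;
    {
        my $gene = Gene->new;
        $gene->add_transcript({});         # plain hash stands in for a Transcript
        $gene->release if $cleanup;
    }
    return $Gene::live - $before;
}

print "without release: ", live_after(0), " leaked\n";
print "with release:    ", live_after(1), " leaked\n";
```

Without the release call one gene stays alive after the block; with it
the count drops back to zero as soon as $gene goes out of scope.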

Hilmar Lapp wrote:

> Those modules probably can use some serious review. If there is a 
> cycle then Jason should be on the right path with overriding DESTROY, 
> but first one would need to know where the cycle is. I don't recall 
> one being there on purpose ...
>
> Sorry to not be of more help ...
>
>     -hilmar
>
> On Tuesday, March 29, 2005, at 01:14  PM, Stefan Kirov wrote:
>
>> I am working on the Entrezgene parser and tried to use 
>> Bio::SeqFeature::Gene::GeneStructure to describe NC/NT to NM and NP 
>> relationships. I am pretty much done with the parser (based on Mingyi 
>> Liu low lovel parser), but once I tried to parse a whole file (Homo 
>> sapiens) I ran out of memory. I think the problem might be 
>> Bio::SeqFeature::Gene::GeneStructure::add_Transcript.
>> Here is the code which I used to simulate the problem and the 
>> resulting report file. It seams adding Bio::SeqFeature::Gene::Exon to 
>> Bio::SeqFeature::Gene::Transcript do not contribute to the problem.
>> Any suggestions?
>> Stefan
>

-- 
Stefan Kirov, Ph.D.
University of Tennessee/Oak Ridge National Laboratory
5700 bldg, PO BOX 2008 MS6164
Oak Ridge TN 37831-6164
USA
tel +865 576 5120
fax +865-576-5332
e-mail: skirov@utk.edu
sao@ornl.gov

"And the wars go on with brainwashed pride
For the love of God and our human rights
And all these things are swept aside"

From mlemieux at bioinfo.ca  Wed Mar 30 12:26:58 2005
From: mlemieux at bioinfo.ca (Madeleine Lemieux)
Date: Wed Mar 30 12:26:48 2005
Subject: [Bioperl-l] BLASTP
Message-ID: <7565f93c3ae486b580f54b6a40dcf4bc@bioinfo.ca>

Minyi,

Are you sure both versions of Blast are getting the same e-val? The 
RemoteBlast.pm default is 1e-3 but Perl.pm sets it at 1e-10. I'm not 
sure what the standalone value is.

-Madeleine

From amtd9 at umr.edu  Wed Mar 30 15:07:54 2005
From: amtd9 at umr.edu (Mane, Ajay (UMR-Student))
Date: Wed Mar 30 15:06:02 2005
Subject: [Bioperl-l] BL2SEQ
Message-ID: <58AF0CF509606A49B1770AB5DFF811CE110839@UMR-CMAIL1.umr.edu>

I mailed this group earlier, but there was no response. I want to run Perl from the command line to get the 
results of the NCBI bl2seq tool, which aligns two sequences. Which modules should I use? A reply at least this time
would be great.
 
- Ajay

________________________________

From: bioperl-l-bounces@portal.open-bio.org on behalf of Madeleine Lemieux
Sent: Wed 3/30/2005 11:26 AM
To: bioperl-l@portal.open-bio.org
Subject: [Bioperl-l] BLASTP


Minyi,

Are you sure both versions of Blast are getting the same e-val? The
RemoteBlast.pm default is 1e-3 but Perl.pm sets it at 1e-10. I'm not
sure what the standalone value is.

-Madeleine


From skirov at utk.edu  Wed Mar 30 17:31:44 2005
From: skirov at utk.edu (Stefan Kirov)
Date: Wed Mar 30 17:26:40 2005
Subject: [Bioperl-l] EntrezGene ASN parser
Message-ID: <424B28D0.7030101@utk.edu>

I just finished a Bioperl EntrezGene parser based on Mingyi Liu's ASN 
Gene parser. It creates two main objects: a Bio::Seq object, which 
contains most of the data, such as references, description, map location, 
etc.; and a Bio::Cluster::SequenceFamily object, which contains the 
refseqs and the gene structure (through NT/NC annotation, represented as 
Bio::SeqFeature::Gene objects).

I also make the uncaptured data available. Each time some data is 
transferred from the hash that represents the parsed record, I delete 
the respective key. Everything left over is considered uncaptured. I am 
doing this because some records could be non-compliant, or NCBI may 
simply supply new data. Naturally, some data is not interesting and is 
therefore not captured (there is a lot of redundant data in 
EntrezGene). So the parser would be called like this:
my ($egene,$assoc_seq,$uncaptured)=$egparser->next_seq;
There are a few things I still need to add (Markers and GO are not yet in 
these objects), but most of the work is done. Unless somebody objects, I 
will commit the code (Bio::SeqIO::entrezgene?) once I have written the 
documentation to match the standard.
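
The delete-as-you-capture idea can be sketched in plain Perl as below. This is only an illustration under stated assumptions, not the parser's actual code: the helper name capture_fields and the record keys are hypothetical, and a real EntrezGene record is far more deeply nested.

```perl
use strict;
use warnings;

# Move known fields out of a parsed record; whatever is left is "uncaptured".
# (capture_fields is a hypothetical helper, not part of Bioperl.)
sub capture_fields {
    my ($record, @known) = @_;
    my %captured;
    for my $key (@known) {
        next unless exists $record->{$key};
        $captured{$key} = delete $record->{$key};   # delete returns the removed value
    }
    return (\%captured, $record);
}

# Hypothetical parsed record; the keys are invented for illustration.
my %parsed = (
    description    => 'example gene',
    map_location   => '1p36',
    some_new_field => 'not handled yet',
);

my ($captured, $uncaptured) = capture_fields(\%parsed, qw(description map_location));
print "captured: ",   join(', ', sort keys %$captured),   "\n";
print "uncaptured: ", join(', ', sort keys %$uncaptured), "\n";
```

Because the record hash is modified in place, anything a future NCBI release adds simply shows up among the leftover keys without further code changes.
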
A few notes:
1. It would be nice to have a Bio::Annotation::DBLink::url method. It 
makes sense (I think), since most DB links also refer to a web page.
2. It now takes 45 minutes to parse the whole human ASN file, which is 4 
times slower. Keeping uncaptured data slows things down a bit, so I will 
introduce a -debug option. Anyway, I don't think speed is going to be an 
issue.
3. Because of the cyclic reference in the GeneStructure object, I am 
removing the Transcript->{parent} in the parser. This code should be 
deleted once the Transcript object is fixed.
There are also some other minor issues, but I think I will be able to 
fix them by the end of the week.
Please let me know what you think.
Stefan

From kvddrift at earthlink.net  Thu Mar 31 08:53:33 2005
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Thu Mar 31 08:47:33 2005
Subject: [Bioperl-l] bioperl 1.5.0 on OS X
Message-ID: <f382b386d3866184f772cd040e98dd1b@earthlink.net>

Hi,

Finally had some time to test bioperl 1.5.0 on Mac OS X (10.3.8). Note 
that I use fink to create the package.

I get a few failing tests:

t/DB.........................FAILED tests 29-31
         Failed 3/78 tests, 96.15% okay

t/EMBL_DB....................ok 11/15
Use of uninitialized value in hash element at t/EMBL_DB.t line 99, <GEN3> line 1.
Use of uninitialized value in hash element at t/EMBL_DB.t line 99, <GEN3> line 1.
t/EMBL_DB....................FAILED tests 13-15
         Failed 3/15 tests, 80.00% okay

t/Perl.......................ok 10/14
-------------------- WARNING ---------------------
MSG: id (BUM) does not exist
---------------------------------------------------
t/Perl.......................ok 12/14
-------------------- WARNING ---------------------
MSG: acc (NM_006732) does not exist
---------------------------------------------------
t/Perl.......................FAILED tests 11, 13
         Failed 2/14 tests, 85.71% okay


Also, one test couldn't reach a server:

t/FeatureIO..................ok 1/22
-------------------- WARNING ---------------------
MSG: [1/5] tried to fetch 
http://umn.dl.sourceforge.net/sourceforge/song/sofa.definition, but 
server threw 500.  retrying...

If I disable the tests, the package installs fine.


BTW, what is meant by a developer release? Is this not an official 
release, meaning that it contains some experimental code?


cheers,

- Koen.

From awoolfe at hgmp.mrc.ac.uk  Thu Mar 31 09:54:41 2005
From: awoolfe at hgmp.mrc.ac.uk (Adam Woolfe)
Date: Thu Mar 31 09:50:06 2005
Subject: [Bioperl-l] Retrieving hits in order in SeqIO
Message-ID: <Pine.SOL.4.44.0503311554160.8258-100000@bohrium>

Hi
I'm trying to retrieve the top hit from a set of BLAST results in a
single file using SearchIO.

I assumed that SearchIO processed the hits in the same order as in the input
file (i.e. starting from the hit with the lowest e-value), but I've been getting
some strange results where the first hit returned is actually the last one in
the list.

E.g. the description lines of the hits in the input file are as follows:

                                                 Score      E
Sequences producing significant alignments:       (bits)  Value


EM:16 chromosome:NCBI35:16:49081023:50081022:1     277    1e-72
EM:20 chromosome:NCBI35:20:49832988:50832987:1      54    3e-05
EM:4 chromosome:NCBI35:4:103445867:104383982:1      40     0.52
EM:17 chromosome:NCBI35:17:32359244:33357400:1      40     0.52
EM:10 chromosome:NCBI35:10:106096982:107096981:1    40     0.52
EM:10 chromosome:NCBI35:10:77173138:78173137:1      40     0.52


As a test, here is a heavily stripped-down version of the Perl script:
-------------------------------------------------------------
use strict;
use Bio::SearchIO;

my $file = "/path/to/infile.blast";

my $in2 = new Bio::SearchIO( -format => 'blast',
                             -file   => $file );

while ( my $result = $in2->next_result ) {
    while ( my $hit = $result->next_hit ) {
        while ( my $hsp = $hit->next_hsp ) {
            # printed once per HSP, so a hit with several HSPs appears several times
            print "hit:" . $hit->name . " " . $hit->description . "\n";
        }
    }
}

------------------------------------------------------------
the output of this is:

hit:EM:10 chromosome:NCBI35:10:77173138:78173137:1
hit:EM:16 chromosome:NCBI35:16:49081023:50081022:1
hit:EM:20 chromosome:NCBI35:20:49832988:50832987:1
hit:EM:4 chromosome:NCBI35:4:103445867:104383982:1
hit:EM:17 chromosome:NCBI35:17:32359244:33357400:1
hit:EM:10 chromosome:NCBI35:10:106096982:107096981:1


Why is it not giving me the results in the correct order? In other
examples I've looked at, the top hit is not always the last (as in this
example), so it seems like something very random is going on.

Could anyone shed any light on this? I'd really appreciate it.

many thanks,

Adam

P.S. I'm using Bioperl 1.4 on Solaris 9.


From amtd9 at umr.edu  Thu Mar 31 10:15:28 2005
From: amtd9 at umr.edu (Mane, Ajay (UMR-Student))
Date: Thu Mar 31 10:10:22 2005
Subject: [Bioperl-l] BL2SEQ
Message-ID: <58AF0CF509606A49B1770AB5DFF811CE11083F@UMR-CMAIL1.umr.edu>

Thanks for the reply.
 
I have lots of sequence pairs that I want to align, two sequences at a time, and I want to compute some statistics on the alignment results, specifically about the coding regions in the nucleotide sequences. At the moment I have to look at the coding regions manually every time and do the analysis by hand. Instead, I want a Perl script that runs bl2seq on the BLAST server, gets the results, and formats them to provide the statistics I am interested in. I do not want to install BLAST locally on my machine. I just have lots of accession numbers in a file and want to write the statistics for all of them to one or more files using a Perl script. I have succeeded up to a point using plain Perl, by submitting accession numbers and getting results from the BLAST server using HTTP::Request methods. But I want to know how to use Bioperl for this job.
 
Thanks,
Ajay

 
________________________________

From: Barry Moore [mailto:barry.moore@genetics.utah.edu]
Sent: Wed 3/30/2005 2:48 PM
To: Mane, Ajay (UMR-Student)
Subject: Re: [Bioperl-l] BL2SEQ


Ajay-

Not sure what you want to do with your blast results, but I think you'd
be pretty limited in doing much analysis using bioperl from the perl
command line.  You might look at the SEALS package
http://www.ncbi.nlm.nih.gov/CBBresearch/Walker/SEALS/, or repost with
more detail about what it is that you are trying to do.

Barry

Mane, Ajay (UMR-Student) wrote:

>I have mailed earlier to this group, but there was no response. I want to run perl from a command line to get the
>results of NCBI bl2seq tool, which aligns two sequences. Which are the modules to be used ? A reply atleast this time
>would be great.
>
>- Ajay
>
>________________________________
>
>From: bioperl-l-bounces@portal.open-bio.org on behalf of Madeleine Lemieux
>Sent: Wed 3/30/2005 11:26 AM
>To: bioperl-l@portal.open-bio.org
>Subject: [Bioperl-l] BLASTP
>
>
>
>Minyi,
>
>Are you sure both versions of Blast are getting the same e-val? The
>RemoteBlast.pm default is 1e-3 but Perl.pm sets it at 1e-10. I'm not
>sure what the standalone value is.
>
>-Madeleine
>

--
Barry Moore
Dept. of Human Genetics
University of Utah
Salt Lake City, UT


From sdavis2 at mail.nih.gov  Thu Mar 31 10:41:38 2005
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Thu Mar 31 10:35:52 2005
Subject: [Bioperl-l] BL2SEQ
In-Reply-To: <58AF0CF509606A49B1770AB5DFF811CE11083F@UMR-CMAIL1.umr.edu>
References: <58AF0CF509606A49B1770AB5DFF811CE11083F@UMR-CMAIL1.umr.edu>
Message-ID: <54f578eea98dc5fc7d15e5a68d99fb00@mail.nih.gov>


On Mar 31, 2005, at 10:15 AM, Mane, Ajay ((UMR-Student)) wrote:

> Thanks for the reply.
>
> I have lots of sequence pairs which i want to align. Two sequences at 
> a time. I want to form some statistics on the alignment results. That 
> is about the coding regions in nucleotides. I need to manually look at 
> the coding regions everytime and do some analysis. Instead, i want to 
> run a perl file which runs the bl2seq on blast server, gets the 
> results, formats them to provide some statistics which i am interested 
> in. I do not want to install blast locally on my machine. I just have 
> lots of accession numbers in a file and want to display statistics of 
> all of them in some file/files using a perl file. I have succeeded to 
> a limit using the normal perl, by submitting accession numbers and 
> getting results from blast server using http::request methods. But i 
> want to know how to use Bioperl for this job.
>

I know you said you do not want to install blast on your machine, but 
if you have many accessions and want to blast all pairs, using local 
blast would be very convenient--just blast all sequences against all 
other sequences.  There are binaries for many platforms, so you 
probably wouldn't even have to build the blast executables.

Sean

From barry.moore at genetics.utah.edu  Thu Mar 31 15:04:05 2005
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Thu Mar 31 14:58:08 2005
Subject: [Bioperl-l] BL2SEQ
In-Reply-To: <54f578eea98dc5fc7d15e5a68d99fb00@mail.nih.gov>
References: <58AF0CF509606A49B1770AB5DFF811CE11083F@UMR-CMAIL1.umr.edu>
	<54f578eea98dc5fc7d15e5a68d99fb00@mail.nih.gov>
Message-ID: <424C57B5.3040309@genetics.utah.edu>

I agree with Sean.  Installing BLAST locally is really quite easy on 
most platforms (Unix and Windows, in my experience).  Download the 
binaries from ftp://ftp.ncbi.nlm.nih.gov/blast/executables/LATEST 
and get the documentation at http://www.ncbi.nlm.nih.gov/BLAST/docs/.  You 
would unpack and install the binaries, run formatdb on a FASTA file of 
all your sequences, and then use bioperl to loop over all of your 
sequences and blast each one against the database.  See the module 
documentation for StandAloneBlast to see how to actually run local 
BLAST with bioperl.  Using bioperl can also help a lot with parsing all 
the resulting BLAST output.  Read some of the HOWTOs at 
http://www.bioperl.org/Core/Latest/modules.html.  Specifically, have a 
look at the SearchIO HOWTO, and if you're new to bioperl, have a look at 
the Beginners HOWTO and the SeqIO HOWTO for starters.  If you really want 
to move forward with remote BLAST, see this module's documentation: 
http://doc.bioperl.org/releases/bioperl-1.4/Bio/Tools/Run/RemoteBlast.html.  
However, if you have a lot of sequences, this will become very slow.  You 
won't like it, and neither will NCBI.

Barry

Sean Davis wrote:

>
> On Mar 31, 2005, at 10:15 AM, Mane, Ajay ((UMR-Student)) wrote:
>
>> Thanks for the reply.
>>
>> I have lots of sequence pairs which i want to align. Two sequences at 
>> a time. I want to form some statistics on the alignment results. That 
>> is about the coding regions in nucleotides. I need to manually look 
>> at the coding regions everytime and do some analysis. Instead, i want 
>> to run a perl file which runs the bl2seq on blast server, gets the 
>> results, formats them to provide some statistics which i am 
>> interested in. I do not want to install blast locally on my machine. 
>> I just have lots of accession numbers in a file and want to display 
>> statistics of all of them in some file/files using a perl file. I 
>> have succeeded to a limit using the normal perl, by submitting 
>> accession numbers and getting results from blast server using 
>> http::request methods. But i want to know how to use Bioperl for this 
>> job.
>>
>
> I know you said you do not want to install blast on your machine, but 
> if you have many accessions and want to blast all pairs, using local 
> blast would be very convenient--just blast all sequences against all 
> other sequences.  There are binaries for many platforms, so you 
> probably wouldn't even have to build the blast executables.
>
> Sean
>


-- 
Barry Moore
Dept. of Human Genetics
University of Utah
Salt Lake City, UT

From Mark.Hoebeke at jouy.inra.fr  Wed Mar 30 01:20:18 2005
From: Mark.Hoebeke at jouy.inra.fr (Hoebeke Mark)
Date: Thu Mar 31 15:42:31 2005
Subject: [Bioperl-l] Hierarchical location parsing
In-Reply-To: <GPENLDEIJJHJLHOAJBBPEEMECDAA.brian_osborne@cognia.com>
References: <GPENLDEIJJHJLHOAJBBPEEMECDAA.brian_osborne@cognia.com>
Message-ID: <1112163618.5683.16.camel@hurd>

Hi Brian,

You are right. I reloaded the Genbank file from:

ftp://ftp.ncbi.nlm.nih.gov/genbank/genomes/Bacteria/Streptococcus_pyogenes_MGAS315/AE14074.gbk

and indeed, the source feature has changed to an ordinary simple
location. It seems they corrected the original submission : the
modification date now reads "mar 9", whereas the date on the release I
initially fetched read "18 jul 2002" (which happens to be the date
mentioned in the LOCUS descriptor).

I guess this makes parsing hierarchical location descriptors a moot
point until I come up with another example...
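
For anyone following the thread, the depth-counting idea mentioned in the quoted message (walk the string character by character and split only on commas that sit outside any parentheses) can be sketched in plain Perl as below. This is an illustration only, not the patch to FTLocationFactory; split_top_level and describe are hypothetical helper names, and the sample location is a shortened form of the one in the quoted diff.

```perl
use strict;
use warnings;

# Split a location operator's argument list on top-level commas only.
sub split_top_level {
    my ($args) = @_;
    my ($depth, $current, @parts) = (0, '');
    for my $char (split //, $args) {
        if ($char eq ',' && $depth == 0) {
            push @parts, $current;      # comma outside any parentheses: new part
            $current = '';
            next;
        }
        $depth++ if $char eq '(';
        $depth-- if $char eq ')';
        $current .= $char;
    }
    push @parts, $current if length $current;
    return @parts;
}

# Recursively print a nested operator(...) expression as an indented tree.
sub describe {
    my ($loc, $indent) = @_;
    $indent = '' unless defined $indent;
    if ($loc =~ /^(\w+)\((.*)\)$/s) {
        my ($op, $inner) = ($1, $2);
        print "$indent$op\n";
        describe($_, "$indent  ") for split_top_level($inner);
    }
    else {
        print "$indent$loc\n";
    }
}

describe('join(1..749107,join(788646..977266,1018339..1137553))');
```

The recursion handles any nesting depth, at the cost of rescanning inner strings once per level, which is the kind of overhead the timing comparison in the quoted message was meant to check.
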

Mark


On Tuesday 29 March 2005 at 08:09 -0500, Brian Osborne wrote:
> Mark,
> 
> I didn't see any "join(join..." statements in that Genbank entry, as part of
> a source feature or anywhere else. I used this URL:
> 
> http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&val=21909536
> 
> 
> Brian O.
> 
> 
> -----Original Message-----
> From: bioperl-l-bounces@portal.open-bio.org
> [mailto:bioperl-l-bounces@portal.open-bio.org]On Behalf Of Mark Hoebeke
> Sent: Friday, March 25, 2005 3:24 PM
> To: Brian Osborne
> Cc: bioperl-l@portal.open-bio.org
> Subject: RE: [Bioperl-l] Hierarchical location parsing
> 
> 
> Brian,
> 
> an example of a nested location is found in the 'source' feature of the
> Genbank entry having accession AE014074 (Streptococcus pyogenes MGAS315
> complete genome). As the file is over 1 Meg in size once compressed it
> might not be a good idea to attach it to this mail which is CC'ed to
> bioperl-l ;D
> 
> Regarding the performance hit of my fix, I feared that replacing a
> compiled regexp with a split and a loop over every character of the
> string could have a significant impact. As it stands, I timed a simple
> parsing script swallowing Genbank files and spitting out each feature
> location as a GFF string, on 131 complete microbial genomes. There is no
> difference in output between the bioperl-live FTLocationFactory and its
> patched version (basically meaning that this test sample did not contain
> nested locations). The times are comparable, with even a slight
> advantage to the patched version (915.66user 19.53system 15:42.19elapsed
> 99%CPU vs. 938.06user 17.33system 16:04.15elapsed 99%CPU).
> 
> When comparing the outputs of the parser run on a file with a nested
> location, it appears that without the bugfix, the nested location yields
> an incorrect GFF string as shown by the diff below.
> 
> [mark@homer Loc]$ diff MGAS315 MGAS315_patched
> 1c1
> < join(1..749107,join(788646..977266,join(1018339..1137553,join(1171973..1230114,join(1271911..1313193,join(1351400..1410541,1450556..1900521),)
> ---
> > join(1..749107,join(788646..977266,join(1018339..1137553,join(1171973..1230114,join(1271911..1313193,join(1351400..1410541,1450556..1900521))))))
> 
> I'm still cautious about the bugfix because I only produced the diffs
> on microbial genomes, which probably have simpler location definitions
> than higher eukaryotes.
> 
> Greetings,
> 
> Mark
> 
> On Friday 25 March 2005 at 11:52 -0500, Brian Osborne wrote:
> > Mark,
> >
> > Can you also attach the sequence file that you used in order to test your
> > code? That way I can write a test specifically for the parsing of
> > hierarchical locations.
> >
> > You wrote "I'm not sure the new patch won't slow down location parsing
> > considerably..." Have you actually timed the parsing using the old and new
> > code?
> >
> > Thanks again,
> >
> > Brian O.
> >
> 
> --
> --------------------------Mark.Hoebeke@jouy.inra.fr----------------------
> Unité Statistique & Génome                                     Unité MIG
> +33 (0)1 60 87 38 03                  Tél.          +33 (0)1 34 65 28 85
> +33 (0)1 60 87 38 09                  Fax.          +33 (0)1 34 65 29 01
> Tour Evry 2, 523 pl. des Terrasses             INRA - Domaine de Vilvert
> F - 91000 Evry                             F - 78352 Jouy-en-Josas CEDEX
> 
> 
> 
-- 
-------------------------Mark.Hoebeke@jouy.inra.fr---------------------
Unité Statistique & Génome                                    Unité MIG
+33 (0)1 60 87 38 03                   Tél.        +33 (0)1 34 65 28 85
+33 (0)1 60 87 38 09                   Fax.        +33 (0)1 34 65 29 01
Tour Evry 2, 523 pl. des Terrasses            INRA - Domaine de Vilvert
F - 91000 Evry                            F - 78352 Jouy-en-Josas CEDEX

From sallyli97 at yahoo.com  Wed Mar 30 10:44:06 2005
From: sallyli97 at yahoo.com (Sally Li)
Date: Thu Mar 31 15:42:35 2005
Subject: [Bioperl-l] $id && $self->display($id)
Message-ID: <20050330154407.19321.qmail@web53608.mail.yahoo.com>

Hi, there,

I wonder what the following statement means:

$id && $self->display($id)

It is from the constructor (new()) of PrimarySeq.pm.

Thanks!

Sally.


From schuh at farmdale.com  Thu Mar 31 16:14:24 2005
From: schuh at farmdale.com (Mike Schuh)
Date: Thu Mar 31 16:09:38 2005
Subject: [Bioperl-l] $id && $self->display($id)
In-Reply-To: <20050330154407.19321.qmail@web53608.mail.yahoo.com>
Message-ID: <Pine.LNX.4.44.0503311309570.11957-100000@drizzle.com>

Sally,

>I wonder what does it means in the following
>statement?
>
>$id && $self->display($id)

Pretty standard Perl construct.  The value of $id is checked, and if it is 
"true" (defined, and neither zero nor the empty string), then the display 
method of the current object is called.

This is shorthand for

 if(defined($id) && $id) {	# to be slightly pedantic
   $self->display($id);
 }

Similar patterns are used in shell scripts, etc.
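
To make the short-circuit behaviour concrete, here is a minimal sketch in plain Perl. The Demo class and its display method are hypothetical stand-ins, not the real PrimarySeq code; they only record the value they were called with, and set_if_true is an invented helper for the demonstration.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a sequence object; not the real PrimarySeq.
package Demo;
sub new {
    my $class = shift;
    return bless { display => undef }, $class;
}
sub display {
    my ($self, $value) = @_;
    $self->{display} = $value if defined $value;   # acts as a setter when given a value
    return $self->{display};
}

package main;

# Returns whatever display value is stored after the && idiom has run.
sub set_if_true {
    my ($id) = @_;
    my $self = Demo->new;
    $id && $self->display($id);    # right-hand side runs only when $id is true
    return $self->{display};
}

for my $id ('seq1', 0, '', undef) {
    my $stored = set_if_true($id);
    printf "%-7s => %s\n",
        defined $id ? "'$id'" : 'undef',
        defined $stored ? "display set to '$stored'" : 'display not called';
}
```

Only the first value reaches display; 0, the empty string, and undef are all false, so the method call is skipped for them.
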

--
Mike Schuh -- Seattle, Washington USA
http://www.farmdale.com

From skirov at utk.edu  Thu Mar 31 16:23:36 2005
From: skirov at utk.edu (Stefan Kirov)
Date: Thu Mar 31 16:18:14 2005
Subject: [Bioperl-l] $id && $self->display($id)
In-Reply-To: <20050330154407.19321.qmail@web53608.mail.yahoo.com>
References: <20050330154407.19321.qmail@web53608.mail.yahoo.com>
Message-ID: <424C6A58.2030501@utk.edu>

If $id is true, $self->display($id) is evaluated, which actually 
sets the display id of $self to $id.
Stefan

Sally Li wrote:

>Hi, there,
>
>I wonder what does it means in the following
>statement?
>
>$id && $self->display($id)
>
>This is from the constructor (new ()) of object
>PrimarySeq.pm.
>
>Thanks!
>
>Sally.
>
>
>		

From MEC at Stowers-Institute.org  Thu Mar 31 16:41:42 2005
From: MEC at Stowers-Institute.org (Cook, Malcolm)
Date: Thu Mar 31 16:36:07 2005
Subject: [Bioperl-l] patch to FeatureIO.pm for tied interface
Message-ID: <200503312135.j2VLZZfY020817@portal.open-bio.org>

bioperlers,

The following patch to bioperl-live makes up for what was probably a
copy and paste error and lets FeatureIO work with tied handle interface
too.

I would be happy to have write access to cvs repository for this and
other such patches as discovered....

Cheers,

Malcolm Cook


Index: FeatureIO.pm
===================================================================
RCS file: /home/repository/bioperl/bioperl-live/Bio/FeatureIO.pm,v
retrieving revision 1.8
diff -c -r1.8 FeatureIO.pm
*** FeatureIO.pm	18 Jan 2005 05:22:11 -0000	1.8
--- FeatureIO.pm	31 Mar 2005 21:34:33 -0000
***************
*** 507,526 ****
  
  sub TIEHANDLE {
      my ($class,$val) = @_;
!     return bless {'seqio' => $val}, $class;
  }
  
  sub READLINE {
    my $self = shift;
!   return $self->{'seqio'}->next_seq() unless wantarray;
    my (@list, $obj);
!   push @list, $obj while $obj = $self->{'seqio'}->next_seq();
    return @list;
  }
  
  sub PRINT {
    my $self = shift;
!   $self->{'seqio'}->write_seq(@_);
  }
  
  1;
--- 507,526 ----
  
  sub TIEHANDLE {
      my ($class,$val) = @_;
!     return bless {'featio' => $val}, $class;
  }
  
  sub READLINE {
    my $self = shift;
!   return $self->{'featio'}->next_feature() unless wantarray;
    my (@list, $obj);
!   push @list, $obj while $obj = $self->{'featio'}->next_feature();
    return @list;
  }
  
  sub PRINT {
    my $self = shift;
!   $self->{'featio'}->write_feature(@_);
  }
  
  1;
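
For readers unfamiliar with the tied-handle interface, the sketch below shows how methods of this shape are used once a handle is tied. It is a self-contained illustration, not FeatureIO itself: ToyFeatureStream and ToyFeatureHandle are hypothetical classes, and the "features" are plain strings rather than feature objects.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a feature stream: hands out strings, collects writes.
package ToyFeatureStream;
sub new {
    my ($class, @features) = @_;
    return bless { queue => [@features], written => [] }, $class;
}
sub next_feature  { my $self = shift; return shift @{ $self->{queue} } }
sub write_feature { my $self = shift; push @{ $self->{written} }, @_; return 1 }

# Tie class with the same method shapes as in the patch above.
package ToyFeatureHandle;
sub TIEHANDLE {
    my ($class, $val) = @_;
    return bless { featio => $val }, $class;
}
sub READLINE {
    my $self = shift;
    return $self->{featio}->next_feature() unless wantarray;
    my (@list, $obj);
    push @list, $obj while $obj = $self->{featio}->next_feature();
    return @list;
}
sub PRINT {
    my $self = shift;
    return $self->{featio}->write_feature(@_);
}

package main;

my $stream = ToyFeatureStream->new('exon_1', 'exon_2', 'exon_3');
tie *FEAT, 'ToyFeatureHandle', $stream;

my $first = <FEAT>;     # scalar context: one feature
my @rest  = <FEAT>;     # list context: everything that is left
print FEAT 'exon_4';    # forwarded to write_feature

print "first: $first\n";
print "rest: @rest\n";
print "written: @{ $stream->{written} }\n";
```

With the unpatched method names, the same reads and writes would fail on a feature stream, since it provides next_feature and write_feature rather than next_seq and write_seq.
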

From qfdong at iastate.edu  Thu Mar 31 18:15:23 2005
From: qfdong at iastate.edu (Qunfeng)
Date: Thu Mar 31 18:09:58 2005
Subject: [Bioperl-l] pubmed
Message-ID: <6.1.2.0.2.20050331171052.03830ba8@qfdong.mail.iastate.edu>

Hi there,

http://bioperl.org/HOWTOs/Feature-Annotation/anno_from_genbank.html

I am not very familiar with BioPerl. I tried to follow the example shown 
on the above page to retrieve the PubMed ID under each Reference tag, i.e. 
$value->pubmed(), but it doesn't work for me for the sequence gi#56961711. 
The authors() method does work for me.  I'd appreciate any suggestions.

Qunfeng 

From zhoujie at fudan.edu.cn  Thu Mar 31 22:44:49 2005
From: zhoujie at fudan.edu.cn (zhoujie@fudan.edu.cn)
Date: Thu Mar 31 23:30:13 2005
Subject: [Bioperl-l] Help with taxonomy db
Message-ID: <135a991135d544.135d544135a991@fudan.edu.cn>

Hi all,
Could you please help me with an error message I get when using a local 
taxonomy db?

My test code is here:
#-------------------------------------------------------
use Bio::DB::Taxonomy;
my $db = new Bio::DB::Taxonomy(-source    => 'flatfile',
			       -nodesfile => 'nodes.dmp',
			       -namesfile => 'names.dmp',
			       -directory => 'index');

my $id = $db->get_taxonid('Homo sapiens');
print "id is $id for Homo sapiens\n";
#-------------------------------------------------------

The code generates three files in the index 
directory: 'nodes', 'names2id', 'id2names'.

But after that I get an error message:

------------- EXCEPTION  -------------
MSG: No such file or directory index/nodes
STACK Bio::DB::Taxonomy::flatfile::_db_connect c:/Perl/site/lib/Bio\DB\Taxonomy\flatfile.pm:325
STACK Bio::DB::Taxonomy::flatfile::new c:/Perl/site/lib/Bio\DB\Taxonomy\flatfile.pm:138
STACK Bio::DB::Taxonomy::new c:/Perl/site/lib/Bio/DB/Taxonomy.pm:104
STACK toplevel local_taxonomy_query.pl:10
--------------------------------------

I'm quite confused by this error, because the nodes file is right there 
in the index directory, so why "No such file"?

Can anyone tell me what is happening? Any suggestions are appreciated.

J Z