From cuiw at ncbi.nlm.nih.gov  Thu Feb  1 09:47:38 2007
From: cuiw at ncbi.nlm.nih.gov (Cui, Wenwu (NIH/NLM/NCBI) [C])
Date: Thu, 1 Feb 2007 09:47:38 -0500
Subject: [Bioperl-l] How to get from gi/ref/gb to genomic coordinates ?
In-Reply-To: <45C1059D.1070100@tbi.univie.ac.at>
References: <45C1059D.1070100@tbi.univie.ac.at>
Message-ID: <18C407FD4FFB424292D769FBD68C1987020BB960@NIHCESMLBX8.nih.gov>

This is a simple test from gene ID 3632373 (protein is 46100068) to
contig coordinates: 

perl -MLWP::Simple -e 'print "$_\n" for grep { m{<(Gene-source_src[\w-]*)>.*?</\1>} }
  split /\n/, get(q{http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=gene&id=3632373&retmode=xml})'
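For anyone who prefers a script to the one-liner, here is a minimal sketch of the same extraction as a subroutine. It assumes the efetch XML has already been fetched (for example with LWP::Simple's get, as above) and that the Gene-source_src* elements are the ones of interest. The name gene_source_fields is hypothetical, and the demo input is a made-up fragment, not real NCBI output.

```perl
use strict;
use warnings;

# Collect the text of every <Gene-source_src...> element from an Entrez Gene
# efetch XML string. Returns a hash ref of tag name => value; if a tag occurs
# more than once, the last occurrence wins.
sub gene_source_fields {
    my ($xml) = @_;
    my %field;
    # \1 is a backreference, so the closing tag must match the opening one.
    while ($xml =~ m{<(Gene-source_src[\w-]*)>([^<]*)</\1>}g) {
        $field{$1} = $2;
    }
    return \%field;
}

# Made-up fragment for illustration only.
my $xml = "<Gene-source>\n"
        . "  <Gene-source_src>example-db</Gene-source_src>\n"
        . "  <Gene-source_src-str1>placeholder</Gene-source_src-str1>\n"
        . "</Gene-source>\n";
my $f = gene_source_fields($xml);
print "$_ => $f->{$_}\n" for sort keys %$f;
```

Unlike a line-by-line grep, this works even when several elements share one line of the XML.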

You will need to translate the protein ID to a gene ID first, though.

If the genome is available in Map Viewer, try the following (the contig name
NW_101115 comes from the previous step):
http://www.ncbi.nlm.nih.gov/mapview/maps.cgi?taxid=5270&gnl=NW_101115&MAPS=genes&cmd=txt

Wenwu Cui, PhD

-----Original Message-----
From: Rainer Machne [mailto:raim at tbi.univie.ac.at] 
Sent: Wednesday, January 31, 2007 4:10 PM
To: bioperl-l at lists.open-bio.org
Subject: [Bioperl-l] How to get from gi/ref/gb to genomic coordinates ?

Dear Bioperl list,

I hope this is not the wrong mailing list; I have a short question:

Is there a standard way, or are there good (BioPerl) tools, to get from a
GenInfo identifier (gi) or other IDs (see below) to the genomic coordinates
of the respective gene?

We have FASTA files retrieved from NCBI protein BLAST searches against fungal genomes:

 >gi|46100068|gb|EAK85301.1| hypothetical protein UM04252.1 [Ustilago maydis 521]
or
 >gi|50292953|ref|XP_448909.1| unnamed protein product [Candida glabrata]

(Our set only contains gi, ref and gb identifiers.)
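Since the set only uses gi, ref and gb identifiers, a first step could be splitting each defline into its database/identifier pairs. This is a minimal sketch that assumes NCBI's pipe-delimited db|id|db|id| layout shown above. parse_ncbi_defline is a hypothetical helper, not a BioPerl function, and deflines that do not alternate db and id (such as pdb|1ABC|A) would need special handling.

```perl
use strict;
use warnings;

# Split an NCBI-style FASTA defline such as
#   >gi|46100068|gb|EAK85301.1| hypothetical protein ...
# into a hash of database => identifier plus the free-text description.
sub parse_ncbi_defline {
    my ($defline) = @_;
    $defline =~ s/^>\s*//;                        # drop the leading '>'
    my ($ids, $desc) = split /\s+/, $defline, 2;  # first token holds the IDs
    my @fields = split /\|/, $ids;                # trailing empty field is dropped
    my %id;
    while (my ($db, $acc) = splice(@fields, 0, 2)) {
        $id{$db} = $acc;
    }
    return (\%id, defined $desc ? $desc : '');
}

for my $line ('>gi|46100068|gb|EAK85301.1| hypothetical protein UM04252.1 [Ustilago maydis 521]',
              '>gi|50292953|ref|XP_448909.1| unnamed protein product [Candida glabrata]') {
    my ($id, $desc) = parse_ncbi_defline($line);
    print join("\t", map { "$_=$id->{$_}" } sort keys %$id), "\t$desc\n";
}
```

The gi or accession values obtained this way are what would then be passed on to the Entrez lookups suggested in the replies.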

I retrieved all my FASTA files from whole fungal genomes with available
protein sequences at
http://www.ncbi.nlm.nih.gov/sutils/genom_table.cgi?organism=fungi

As I only searched whole finished genomes (not shotgun), I thought it
would be easy to get the genomic coordinates and retrieve upstream
sequences, but so far we have failed to find a consistent way to do this
automatically. Many of the gi entries refer to mRNAs or partial mRNAs,
and the route to the coordinates seems to differ for each case.

Any suggestions would be appreciated.

with kind regards,
Rainer Machne

University of Vienna
Department for Theoretical Chemistry
Theoretical Biochemistry Group
_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l


From raim at tbi.univie.ac.at  Thu Feb  1 07:54:21 2007
From: raim at tbi.univie.ac.at (Rainer Machne)
Date: Thu, 01 Feb 2007 13:54:21 +0100
Subject: [Bioperl-l] How to get from gi/ref/gb to genomic coordinates ?
In-Reply-To: <CB6F853A-B5BF-43FE-B26B-C203724677A3@genetics.utah.edu>
References: <45C1059D.1070100@tbi.univie.ac.at>
	<CB6F853A-B5BF-43FE-B26B-C203724677A3@genetics.utah.edu>
Message-ID: <45C1E2FD.3070709@tbi.univie.ac.at>

Barry and Jason,

thanks for your quick and very helpful replies.

I guess we should have done (or should repeat) our BLAST search at
http://fungal.genome.duke.edu/
to get a better mapping from proteins to genomes?

As I retrieved all my proteins via whole-genome BLAST searches, we should find
(most of) them in the GenBank files ... a good opportunity for me to
learn some BioPerl and the other packages you mentioned, in case we want
to do more complex analysis later :-)

Thank you very much!

Rainer


Barry Moore wrote:
> Rainer,
> 
> We use a Perl library called CGL, written by Mark Yandell and colleagues
> (which in turn uses Chris Mungall's BioChaos and Unflattener.pm, referred
> to by Jason), for this type of task. The basic pipeline is to convert
> GenBank files to Chaos XML, then use CGL with those XML files to get
> nice object-oriented access to the exons, transcripts, proteins,
> coordinates and more for those genes. I am currently using this
> with good success on most GenBank genomes (unfortunately I haven't been
> working with the fungal genomes, but it should work fine). The Ensembl
> API provides similar functionality for Ensembl genomes, but there are
> not very many fungi there.
> 
> http://www.yandell-lab.org/cgl/
> http://www.ensembl.org/info/software/core/core_tutorial.html
> 
> Feel free to contact Mark or me directly if you are interested in
> using CGL.
> 
> Barry
> 
> On Jan 31, 2007, at 2:09 PM, Rainer Machne wrote:
> 
>> Dear Bioperl list,
>>
>> hoping not be on the wrong email list, i would have a short question:
>>
>> Is there a standard way or are there nice (Bioperl) tools to come  from a
>> gene id (gi) other ids (see below) to the genomic coordinates of the
>> respective gene?
>>
>> We have Fasta files retrieved from NCBI protein Blast in fungal  genomes:
>>
>>> gi|46100068|gb|EAK85301.1| hypothetical protein UM04252.1 [Ustilago
>>
>> maydis 521]
>> or
>>
>>> gi|50292953|ref|XP_448909.1| unnamed protein product [Candida  glabrata]
>>
>>
>> (we only have gi, ref and gb in my set).
>>
>> I retrieved all my fasta files from whole fungal genomes with  available
>> protein sequences at
>> http://www.ncbi.nlm.nih.gov/sutils/genom_table.cgi?organism=fungi
>>
>> As I only searched whole finished genomes (not shotgun), I thought it
>> would then be easy to get the genomic coordinates and retrieve  upstream
>> sequences, but we have failed so far to find a consistent way to do  this
>> automatically. Many of the gi entries refer to mRNAs or partial mRNAs
>> and the way to the coordinates seems to differ for each case.
>>
>> Any suggestions would be appreciated.
>>
>> with kind regards,
>> Rainer Machne
>>
>> University of Vienna
>> Department for Theoretical Chemistry
>> Theoretical Biochemistry Group
> 
> 


From cjfields at uiuc.edu  Thu Feb  1 12:55:27 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 1 Feb 2007 11:55:27 -0600
Subject: [Bioperl-l] How to get from gi/ref/gb to genomic coordinates ?
In-Reply-To: <45C1E2FD.3070709@tbi.univie.ac.at>
References: <45C1059D.1070100@tbi.univie.ac.at>
	<CB6F853A-B5BF-43FE-B26B-C203724677A3@genetics.utah.edu>
	<45C1E2FD.3070709@tbi.univie.ac.at>
Message-ID: <E0AE71DF-DF6A-4E8A-866F-172AC5EF7B8E@uiuc.edu>


On Feb 1, 2007, at 6:54 AM, Rainer Machne wrote:

> Barry and Jason,
>
> thanks for your quick and very helpful replies.
>
> I guess we should have done (or repeat) our blast search at
> http://fungal.genome.duke.edu/
> to get better mapping from proteins to genomes ?
>
> As I retrieved all my proteins via whole genome blasts we should find
> (most of) them in the genbank files ... a good opportunity for me to
> learn some Bioperl and the other packages you mentioned in case we  
> want
> to do more complex analysis later :-)
>
> Thank you very much!
>
> Rainer

If the data is available in GenBank you could run the BLAST searches  
at NCBI and limit the search with an Entrez query:

http://www.ncbi.nlm.nih.gov/BLAST/blastcgihelp.shtml#entrez_query

Most (all?) genome files are tagged as complete

I'm not sure but there might be a way of doing this via  
Bio::Tools::Run::RemoteBlast.  Jason, any ideas?

chris

From cjfields at uiuc.edu  Thu Feb  1 13:09:16 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 1 Feb 2007 12:09:16 -0600
Subject: [Bioperl-l] How to get from gi/ref/gb to genomic coordinates ?
In-Reply-To: <E0AE71DF-DF6A-4E8A-866F-172AC5EF7B8E@uiuc.edu>
References: <45C1059D.1070100@tbi.univie.ac.at>
	<CB6F853A-B5BF-43FE-B26B-C203724677A3@genetics.utah.edu>
	<45C1E2FD.3070709@tbi.univie.ac.at>
	<E0AE71DF-DF6A-4E8A-866F-172AC5EF7B8E@uiuc.edu>
Message-ID: <748CC48E-D224-4234-A5C4-E33968F17418@uiuc.edu>

> If the data is available in GenBank you could run the BLAST searches
> at NCBI and limit the search with an Entrez query:
>
> http://www.ncbi.nlm.nih.gov/BLAST/blastcgihelp.shtml#entrez_query
>
> Most (all?) genome files are tagged as complete

sorry, didn't finish that...

"Most (all?) genome files are tagged as complete, wgs, in progress,  
etc. and can be limited by taxonomy using Fungi[ORGN] or similar."

chris

From jason at bioperl.org  Thu Feb  1 13:36:02 2007
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 1 Feb 2007 10:36:02 -0800
Subject: [Bioperl-l] How to get from gi/ref/gb to genomic coordinates ?
In-Reply-To: <E0AE71DF-DF6A-4E8A-866F-172AC5EF7B8E@uiuc.edu>
References: <45C1059D.1070100@tbi.univie.ac.at>
	<CB6F853A-B5BF-43FE-B26B-C203724677A3@genetics.utah.edu>
	<45C1E2FD.3070709@tbi.univie.ac.at>
	<E0AE71DF-DF6A-4E8A-866F-172AC5EF7B8E@uiuc.edu>
Message-ID: <D8E2FDBC-AA2E-4EB9-8CB1-F3610776B41C@bioperl.org>


On Feb 1, 2007, at 9:55 AM, Chris Fields wrote:

>
> On Feb 1, 2007, at 6:54 AM, Rainer Machne wrote:
>
>> Barry and Jason,
>>
>> thanks for your quick and very helpful replies.
>>
>> I guess we should have done (or repeat) our blast search at
>> http://fungal.genome.duke.edu/
>> to get better mapping from proteins to genomes ?
>>

Well, I'm not quite sure of your exact goals. Do you want to find upstream
regions of known genes, or look at upstream regions of orthologous
genes?

You can first figure out orthologs based on protein similarity,
then go in and extract upstream regions for the orthologous genes (I
provide a link to a big all-vs-all FASTA result at the bottom of the
page if you want those results, as well as some pairwise orthology
assignments, although you may want more or less stringent parameters).
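Once a gene's start, end and strand on a contig are known, taking its upstream region is mostly coordinate bookkeeping. A minimal sketch, assuming 1-based inclusive coordinates (as in GFF and GenBank) and the contig sequence already held in memory as a plain string; the helper names are hypothetical.

```perl
use strict;
use warnings;

# Upstream region of a gene as 1-based inclusive coordinates, clipped to the
# contig. Returns an empty list if nothing upstream remains after clipping.
sub upstream_region {
    my ($start, $end, $strand, $len, $seqlen) = @_;
    my ($from, $to) = $strand eq '-'
        ? ($end + 1, $end + $len)         # minus strand: upstream lies to the right
        : ($start - $len, $start - 1);    # plus strand: to the left of the start
    $from = 1       if $from < 1;
    $to   = $seqlen if $to > $seqlen;
    return if $from > $to;
    return ($from, $to);
}

# Reverse complement; characters other than ACGTN are reversed but not complemented.
sub revcomp {
    my ($seq) = @_;
    (my $rc = reverse $seq) =~ tr/ACGTNacgtn/TGCANtgcan/;
    return $rc;
}

# Upstream sequence read 5'->3' relative to the gene's own strand.
sub upstream_seq {
    my ($contig, $start, $end, $strand, $len) = @_;
    my ($from, $to) = upstream_region($start, $end, $strand, $len, length $contig)
        or return '';
    my $sub = substr($contig, $from - 1, $to - $from + 1);
    return $strand eq '-' ? revcomp($sub) : $sub;
}

my $contig = 'AAAACCCCGGGGTTTT';                     # toy sequence
print upstream_seq($contig, 9, 12, '+', 4), "\n";    # CCCC
print upstream_seq($contig, 1, 4,  '-', 4), "\n";    # GGGG (revcomp of CCCC)
```

Clipping at the contig ends matters in practice, since genes near a contig boundary will have shorter (or no) upstream sequence.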

All the GFF and AA data is freely available for download on the site  
for each genome we've annotated or for annotation we've re-formatted  
so you can do things locally and/or modify it to your liking.
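If you work from the downloaded GFF files, a gene's location is in columns 1, 4, 5 and 7 (seqid, start, end, strand). A minimal sketch that parses one feature line, assuming GFF3-style key=value attributes; the files on the site may use another GFF flavour, so check before relying on it. BioPerl's Bio::Tools::GFF is the more complete route; parse_gff3_line is hypothetical and does not URL-decode attribute values.

```perl
use strict;
use warnings;

# Parse one GFF3 feature line into a hash ref; returns nothing for comments,
# blank lines, or lines without nine tab-separated columns.
sub parse_gff3_line {
    my ($line) = @_;
    chomp $line;
    return if $line =~ /^#/ || $line !~ /\S/;
    my @col = split /\t/, $line;
    return unless @col == 9;
    my %attr;
    for my $pair (split /;/, $col[8]) {
        my ($key, $value) = split /=/, $pair, 2;
        $attr{$key} = $value if defined $key && length $key;
    }
    my %f;
    @f{qw(seqid source type start end score strand phase)} = @col[0 .. 7];
    $f{attr} = \%attr;
    return \%f;
}

# Made-up example line (tab-separated).
my $gff = join "\t", 'contig_1', 'example', 'gene', 1200, 2400, '.', '-', '.',
                     'ID=gene0001;Name=ABC1';
my $f = parse_gff3_line($gff);
printf "%s %s:%d-%d (%s)\n", $f->{attr}{ID}, $f->{seqid}, $f->{start}, $f->{end}, $f->{strand};
```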


>> As I retrieved all my proteins via whole genome blasts we should find
>> (most of) them in the genbank files ... a good opportunity for me to
>> learn some Bioperl and the other packages you mentioned in case we  
>> want
>> to do more complex analysis later :-)
>>
>> Thank you very much!
>>
>> Rainer
>
> If the data is available in GenBank you could run the BLAST  
> searches at NCBI and limit the search with an Entrez query:
>
> http://www.ncbi.nlm.nih.gov/BLAST/blastcgihelp.shtml#entrez_query
>
> Most (all?) genome files are tagged as complete
>
> I'm not sure but there might be a way of doing this via  
> Bio::Tools::Run::RemoteBlast.  Jason, any ideas?
>
> chris

--
Jason Stajich
Miller Research Fellow
University of California, Berkeley
lab: 510.642.8441
http://pmb.berkeley.edu/~taylor/people/js.html
http://fungalgenomes.org/


From reenayadav at gmail.com  Thu Feb  1 13:38:03 2007
From: reenayadav at gmail.com (Reena Yadav)
Date: Fri, 2 Feb 2007 00:08:03 +0530
Subject: [Bioperl-l] pdb parser
Message-ID: <76f897dd0702011038v7afe0207gb05465478e026205@mail.gmail.com>

Hi, I need to extract the atomic coordinates from a PDB file (1ake) and do
certain calculations.
I am going stepwise; the steps involved are:
(1) reading the atomic coordinates
(2) writing the result to a file.

I need to understand how to write each whole xyz line to another file.
Could someone help?
R.
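PDB ATOM/HETATM records are fixed-width, so the coordinates can be cut out by column (x in columns 31-38, y in 39-46, z in 47-54 of the PDB format). A minimal core-Perl sketch that copies chain, residue number, atom name and x/y/z of every atom record to another filehandle; the helper names and the 1ake.pdb / 1ake_xyz.txt file names are assumptions, and the demo record is made up. BioPerl's Bio::Structure::IO can also read PDB files if an object interface is preferred.

```perl
use strict;
use warnings;

# Fixed PDB columns (1-based): name 13-16, resName 18-20, chain 22,
# resSeq 23-26, x 31-38, y 39-46, z 47-54. substr offsets are 0-based.
sub parse_atom_line {
    my ($line) = @_;
    return unless $line =~ /^(?:ATOM  |HETATM)/;
    my %atom = (
        name    => substr($line, 12, 4),
        resname => substr($line, 17, 3),
        chain   => substr($line, 21, 1),
        resseq  => substr($line, 22, 4),
        x       => substr($line, 30, 8),
        y       => substr($line, 38, 8),
        z       => substr($line, 46, 8),
    );
    s/^\s+|\s+$//g for values %atom;    # trim the padding of each field
    return \%atom;
}

# Write one tab-separated line per atom; returns the number of atoms written.
sub write_xyz {
    my ($in, $out) = @_;
    my $n = 0;
    while (my $line = <$in>) {
        my $atom = parse_atom_line($line) or next;
        printf {$out} "%s\t%s\t%s\t%.3f\t%.3f\t%.3f\n",
            @{$atom}{qw(chain resseq name x y z)};
        $n++;
    }
    return $n;
}

# With real files (names are illustrative):
#   open my $in,  '<', '1ake.pdb'     or die "1ake.pdb: $!";
#   open my $out, '>', '1ake_xyz.txt' or die "1ake_xyz.txt: $!";
#   write_xyz($in, $out);
my $demo = "ATOM      1  N   MET A   1      26.981  53.977  40.085  1.00 40.83           N\n";
open my $in, '<', \$demo or die $!;
write_xyz($in, \*STDOUT);
```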

From jason at bioperl.org  Thu Feb  1 08:06:42 2007
From: jason at bioperl.org (sandhya khatal)
Date: Thu, 1 Feb 2007 13:06:42 +0000
Subject: [Bioperl-l] Regarding Bioperl program
Message-ID: <75899ED1-72C6-4272-8CAC-028CF133A0B4@gmail.com>

Respected Sir,
I want to write a program that produces a dendrogram, as the UPGMA clustering
method does, but I want the dendrogram built with the single-linkage or
centroid method. Can you help me with this? You have given the code for a
tree, but I want a dendrogram as output using either of the above methods.

Thanks in anticipation.

Regards,
Sandhya Khatal.

From jason at bioperl.org  Thu Feb  1 19:55:26 2007
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 1 Feb 2007 16:55:26 -0800
Subject: [Bioperl-l] Fwd: Regarding Bioperl program
References: <394d31ee0702010506j4bbd79dck41d5ac2162eaafdd@mail.gmail.com>
Message-ID: <40020502-3421-407D-85EB-24F420AB699C@bioperl.org>

re-forwarding Sandhya's email to the list so the email address is  
visible.

The approach that is coded in bioperl is for distance based data such  
as evolutionary distance of DNA or protein sequences - I assume you  
are talking about clustering expression data? You may want to focus  
on the available literature and toolkits that focus on expression  
data - something BioPerl doesn't deliberately focus on right now.
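For distance-based data, single-linkage clustering is easy to sketch without BioPerl support. The following is a naive O(n^3) agglomerative single-linkage clustering over a symmetric distance matrix that writes the dendrogram as a Newick string, which tree viewers can draw. It is not BioPerl code: the function name is hypothetical, merge heights are taken as half the linkage distance (the UPGMA-style convention), leaf labels are assumed to need no Newick quoting, and the centroid method is not covered.

```perl
use strict;
use warnings;

# Naive single-linkage agglomerative clustering.
# $names: array ref of leaf labels; $dist: symmetric matrix (array of arrays).
# Returns the dendrogram as a Newick string with branch lengths.
sub single_linkage_newick {
    my ($names, $dist) = @_;
    # Each active cluster keeps its Newick text, member leaf indices and height.
    my @clusters = map { +{ newick => $names->[$_], members => [$_], height => 0 } }
                   0 .. $#$names;
    while (@clusters > 1) {
        my ($bi, $bj, $best);
        for my $i (0 .. $#clusters - 1) {
            for my $j ($i + 1 .. $#clusters) {
                # Single linkage: the closest pair of members defines the distance.
                my $d;
                for my $p (@{ $clusters[$i]{members} }) {
                    for my $q (@{ $clusters[$j]{members} }) {
                        $d = $dist->[$p][$q] if !defined $d || $dist->[$p][$q] < $d;
                    }
                }
                ($bi, $bj, $best) = ($i, $j, $d) if !defined $best || $d < $best;
            }
        }
        my ($ci, $cj) = @clusters[$bi, $bj];
        my $h = $best / 2;
        my $merged = {
            newick  => sprintf('(%s:%g,%s:%g)', $ci->{newick}, $h - $ci->{height},
                               $cj->{newick}, $h - $cj->{height}),
            members => [ @{ $ci->{members} }, @{ $cj->{members} } ],
            height  => $h,
        };
        splice @clusters, $bj, 1;    # $bj > $bi, so remove it first
        splice @clusters, $bi, 1;
        push @clusters, $merged;
    }
    return $clusters[0]{newick} . ';';
}

print single_linkage_newick([qw(A B C)], [[0, 2, 6], [2, 0, 4], [6, 4, 0]]), "\n";
# (C:2,(A:1,B:1):1);
```

Swapping the inner minimum for a maximum would give complete linkage; centroid linkage needs the original feature vectors rather than just a distance matrix.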

-jason
Begin forwarded message:

> From: "sandhya khatal" <sandhya.khatal at gmail.com>
> Date: February 1, 2007 5:06:42 AM PST
> To: jason at bioperl.org
> Subject: Regarding Bioperl program
>
> Respected Sir,
>                      I want to do a program which gives dendrogram  
> like
> UPGMA a clustering method, but i want this dendrogram by using single
> linkage or centroid method.Can u help me for this.U have given the  
> code for
> tree but i want dendrogram as output by using above any method.
>
> Thanks for anticipating.
>
> Regards,
> Sandhya Khatal.

--
Jason Stajich
Miller Research Fellow
University of California, Berkeley
lab: 510.642.8441
http://pmb.berkeley.edu/~taylor/people/js.html
http://fungalgenomes.org/


From lzhtom at hotmail.com  Thu Feb  1 22:20:10 2007
From: lzhtom at hotmail.com (zhihua li)
Date: Fri, 02 Feb 2007 03:20:10 +0000
Subject: [Bioperl-l] Bio::index::Fasta- where's the indexed file?
Message-ID: <BAY110-F24A936E35D7C6B9059EE3CC79B0@phx.gbl>




From lzhtom at hotmail.com  Thu Feb  1 22:27:39 2007
From: lzhtom at hotmail.com (zhihua li)
Date: Fri, 02 Feb 2007 03:27:39 +0000
Subject: [Bioperl-l] Bio::index::Fasta- where's the indexed file?
Message-ID: <BAY110-F9A27ED904272F0D2D4BB6C79B0@phx.gbl>

Sorry guys, the former empty mail was sent out by mistake.

I'm using Bio::Index::Fasta to index a file containing lots of sequences in
FASTA format. All is fine except one thing.

According to the BioPerl tutorial and the documentation, the following code
will make an indexed file:

use Bio::Index::Fasta;
my $inx = Bio::Index::Fasta->new(-filename   => "test.fasta.idx",
                                 -write_flag => 1);
$inx->make_index("test.fasta");

And in another script I can access the indexed file by saying:

use Bio::Index::Fasta;
$ENV{BIOPERL_INDEX} = ".";   # find index in current directory
my $inx = Bio::Index::Fasta->new(-filename => "test.fasta.idx");
my $seq = $inx->fetch("ent1001");   # fetch the sequence named ent1001

However, after running the first script, I cannot find a new file 
test.fasta.idx in my current directory. And not surprisingly, when I ran 
the second script, perl told me it couldn't find "test.fasta.idx".

What's going on here?

Thanks a lot!



From jason at bioperl.org  Fri Feb  2 01:24:44 2007
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 1 Feb 2007 22:24:44 -0800
Subject: [Bioperl-l] Bio::index::Fasta- where's the indexed file?
In-Reply-To: <BAY110-F9A27ED904272F0D2D4BB6C79B0@phx.gbl>
References: <BAY110-F9A27ED904272F0D2D4BB6C79B0@phx.gbl>
Message-ID: <CFD213B9-5195-450F-80ED-E956EEF50F59@bioperl.org>

I don't think BIOPERL_INDEX does anything in the module, so that
documentation is not quite right. The environment variable is used in the
scripts/index/bp_index and bp_fetch scripts, so maybe a cut-and-paste job
went bad somewhere.

You need to specify the full path you want with -filename; you can
just prepend BIOPERL_INDEX to the filename, like this:
-filename => $ENV{BIOPERL_INDEX}."/$index"
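A slightly more defensive version of that path construction, as a minimal sketch: it uses File::Spec to join directory and file name, and falls back to the current directory when BIOPERL_INDEX is unset or empty. The fallback and the index_path name are assumptions of this sketch, not BioPerl behaviour.

```perl
use strict;
use warnings;
use File::Spec;

# Full path for an index file: directory from BIOPERL_INDEX, else the
# current directory (the fallback is an assumption of this sketch).
sub index_path {
    my ($index) = @_;
    my $dir = defined $ENV{BIOPERL_INDEX} && length $ENV{BIOPERL_INDEX}
            ? $ENV{BIOPERL_INDEX}
            : File::Spec->curdir;
    return File::Spec->catfile($dir, $index);
}

# The result would then be passed to the indexer and reader, e.g.
#   Bio::Index::Fasta->new(-filename => index_path('test.fasta.idx'), -write_flag => 1);
print index_path('test.fasta.idx'), "\n";
```

Using the same helper in both the indexing and the fetching script keeps the two from disagreeing about where the index lives.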

-jason
On Feb 1, 2007, at 7:27 PM, zhihua li wrote:

> Sorry guys, the former empty mail was sent out by mistake.
>
> I'm using Bio::index::Fasta to index a file containing lots of  
> sequences in fasta format. All is fine except one thing.
>
> According to the bioperl tutorial and the documents, the following  
> code will make a indexed file:
>
> my $inx = Bio::Index::Fasta->new(-filename => "test.fasta.idx",
>                                     -write_flag => 1);
>    $inx->make_index("test.fasta");
>
> And in another script I can access the indexed file by sayinig
>
> $ENV{BIOPERL_INDEX} = "."; # find index in current directory
> my $inx = Bio::Index::Fasta->new(-filename => "test.fasta.idx");
> my $seq=$inx->fetch("ent1001");        #fetch the sequence named  
> ent1001
>
> However, after running the first script, I cannot find a new file  
> test.fasta.idx in my current directory. And not surprisingly, when  
> I ran the second script, perl told me it couldn't find  
> "test.fasta.idx".
>
> What's going on here?
>
> Thanks a lot!
>

--
Jason Stajich
Miller Research Fellow
University of California, Berkeley
lab: 510.642.8441
http://pmb.berkeley.edu/~taylor/people/js.html
http://fungalgenomes.org/


From marian.thieme at lycos.de  Fri Feb  2 05:06:09 2007
From: marian.thieme at lycos.de (marian thieme)
Date: Fri, 2 Feb 2007 10:06:09 +0000
Subject: [Bioperl-l] seqDiff
Message-ID: <101051013116870@lycos-europe.com>

An HTML attachment was scrubbed...
URL: http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070202/cb3feed1/attachment.html 

From marian.thieme at lycos.de  Fri Feb  2 06:37:05 2007
From: marian.thieme at lycos.de (marian thieme)
Date: Fri, 2 Feb 2007 11:37:05 +0000
Subject: [Bioperl-l] susp. header
Message-ID: <188661178024725@lycos-europe.com>

An HTML attachment was scrubbed...
URL: http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070202/d3c3535c/attachment.html 

From lubapardo at gmail.com  Fri Feb  2 09:31:06 2007
From: lubapardo at gmail.com (Luba Pardo)
Date: Fri, 2 Feb 2007 15:31:06 +0100
Subject: [Bioperl-l] Problem using Bio::DB::Query::GenBank;
Message-ID: <58ff33550702020631l4e7bc59dmabcf8c72fa67a6d5@mail.gmail.com>

Hello (I am using bioperl-1.5.2_100 on a Linux machine),
I am trying to get the IDs for a list of genes using the module
Bio::DB::Query::GenBank. I have the following code:

use Bio::DB::Query::GenBank;
use strict;
use warnings;

open (READER_1,"list.txt") || die "\n I can't open the file READER_1 !!\n";
my @a1=<READER_1>;
close (READER_1);

for (my $i=0; $i<=$#a1;$i=$i+1 ) {
        my @a1_s=split/\s+/,$a1[$i];

my $query_string = ' Homo Sapiens[Organism] AND $a1[$i] ';
   my $query = Bio::DB::Query::GenBank->new(-db=>'Protein',

-query=>$query_string
                                            );
   my $count = $query->count;
   my @ids   = $query->ids;


print " gene: $a1[$i] first id is $ids[0]  o no? \n";

I want the program to take all the genes listed in the file list.txt
and retrieve their IDs from GenBank. However, the program gives me
the following error:

------------EXCEPTION: Bio::Root::Exception -------------
MSG: Id list has been truncated even after maxids requested
STACK: Error::throw
STACK: Bio::Root::Root::throw
/usr/lib/perl5/site_perl/5.8.1/Bio/Root/Root.pm:359
STACK: Bio::DB::Query::WebQuery::_fetch_ids
/usr/lib/perl5/site_perl/5.8.1/Bio/DB/Query/WebQuery.pm:236
STACK: Bio::DB::Query::WebQuery::ids
/usr/lib/perl5/site_perl/5.8.1/Bio/DB/Query/WebQuery.pm:200
STACK: query.pl:27
------------------
Is it a problem to use $a1[$i] in place of the gene name?
Thanks in advance for any attention you can give this message.
Regards,
Luba Pardo

From hlapp at gmx.net  Fri Feb  2 10:44:02 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Fri, 2 Feb 2007 10:44:02 -0500
Subject: [Bioperl-l] susp. header
In-Reply-To: <188661178024725@lycos-europe.com>
References: <188661178024725@lycos-europe.com>
Message-ID: <EE6A34C7-0579-487E-B529-1F82E714793D@gmx.net>

You are sending HTML emails. You should configure your mailer to
ideally send just plain text. If you really must have fancy formatted
emails (i.e., HTML-formatted emails), then configure it so that the
mailer sends both a plain-text and an HTML version.

(Many spam filters will flag emails whose body consists only of
an HTML attachment.)

	-hilmar

On Feb 2, 2007, at 6:37 AM, marian thieme wrote:

> Why is each message I send to this list considered to have a suspicious
> header?
>
> Marian
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cain.cshl at gmail.com  Fri Feb  2 11:03:16 2007
From: cain.cshl at gmail.com (Scott Cain)
Date: Fri, 02 Feb 2007 11:03:16 -0500
Subject: [Bioperl-l] [Gmod-schema] beginning work on SeqIO::chadoxml
In-Reply-To: <675F525B-D3EB-4C0A-A777-7683F2A8F823@duke.edu>
References: <1170359746.2706.622.camel@localhost.localdomain>
	<675F525B-D3EB-4C0A-A777-7683F2A8F823@duke.edu>
Message-ID: <1170432196.2706.661.camel@localhost.localdomain>

Hi Hilmar,

That is a good idea; when I started down this road, it felt like there
would only be a few things that I might want to allow to be different,
but I think you are right that having one standard implementation that
can be subclassed for legacy systems is a good thing.

Scott


On Fri, 2007-02-02 at 10:09 -0500, Hilmar Lapp wrote:
> On Feb 1, 2007, at 2:55 PM, Scott Cain wrote:
> 
> > The second main change was to introduce a -flybase_compat argument  
> > when
> > initializing the Bio::SeqIO writer, so that 'old style' cv and cvterms
> > (that are compatable with flybase) will be used, but now the default
> > will be to use current standards:
> 
> Just my $0.02 ... obviously, Flybase may be the only organization  
> that uses an 'old style' or any other way not compliant with 'current  
> standards' (presumably SO), but if it's not the only one then this  
> approach won't scale.
> 
> Also, an argument -flybase_compat suggests to the unsuspecting that  
> this is an endorsed flavor of the standard and fine to use for  
> everyone else too.
> 
> If Flybase is idiosyncratic in this way, why not make chadoxml.pm  
> compliant with the standard as we all want it, keep it free from  
> litter caused by usage of old versions of SO, and create a second  
> module fb-chadoxml.pm that inherits from the first and merely  
> overrides a few things so that it works for Flybase. This way, other  
> organizations with similar needs can follow the path and create their  
> own xyz-chadoxml.pm, rather than having to muck around in the  
> chadoxml.pm that comes with the distribution.
> 
> I'm not sure I fully grasp the underlying issue, so I may not make  
> much sense here. Apologies if that's the case ...
> 
> 	-hilmar
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   cain.cshl at gmail.com
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory


From bosborne11 at verizon.net  Fri Feb  2 10:27:44 2007
From: bosborne11 at verizon.net (Brian Osborne)
Date: Fri, 02 Feb 2007 10:27:44 -0500
Subject: [Bioperl-l] [Gmod-schema] beginning work on SeqIO::chadoxml
In-Reply-To: <675F525B-D3EB-4C0A-A777-7683F2A8F823@duke.edu>
Message-ID: <C1E8C2A0.C967%bosborne11@verizon.net>

Hilmar,

I second your motion, good idea. Let's keep the standard module nice and
clean.

Brian O.


On 2/2/07 10:09 AM, "Hilmar Lapp" <hlapp at duke.edu> wrote:

> and create a second
> module fb-chadoxml.pm that inherits from the first and merely
> overrides a few things so that it works for Flybase
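Hilmar's proposal is ordinary subclassing: the standard writer asks an overridable method which term to emit, and a legacy flavour overrides only that method. A minimal plain-Perl sketch of the pattern; the package names, the cvterm_for and feature_xml methods, and the mRNA-to-transcript mapping are all hypothetical and do not reflect how chadoxml.pm is actually written or what FlyBase actually uses.

```perl
use strict;
use warnings;

package Hypothetical::ChadoXML;            # stands in for the standard writer

sub new { my ($class) = @_; return bless {}, $class }

# Term emitted for a feature type; current-standard behaviour by default.
sub cvterm_for { my ($self, $type) = @_; return $type }

sub feature_xml {
    my ($self, $type, $name) = @_;
    return sprintf '<feature><type>%s</type><name>%s</name></feature>',
        $self->cvterm_for($type), $name;
}

package Hypothetical::ChadoXML::Legacy;    # e.g. a FlyBase-compatible flavour
our @ISA = ('Hypothetical::ChadoXML');

my %old_style = (mRNA => 'transcript');    # hypothetical legacy term mapping

# Override only the term lookup; everything else is inherited.
sub cvterm_for {
    my ($self, $type) = @_;
    return exists $old_style{$type} ? $old_style{$type}
                                    : $self->SUPER::cvterm_for($type);
}

package main;

for my $class ('Hypothetical::ChadoXML', 'Hypothetical::ChadoXML::Legacy') {
    print $class->new->feature_xml('mRNA', 'gene1-RA'), "\n";
}
```

The standard class never mentions the legacy mapping, which is the point of the proposal: other organizations can add their own subclass without touching the distributed module.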


From Kevin.M.Brown at asu.edu  Fri Feb  2 10:52:20 2007
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Fri, 2 Feb 2007 08:52:20 -0700
Subject: [Bioperl-l] Problem using Bio::DB::Query::GenBank;
References: <58ff33550702020631l4e7bc59dmabcf8c72fa67a6d5@mail.gmail.com>
Message-ID: <1A4207F8295607498283FE9E93B775B402AABA1C@EX02.asurite.ad.asu.edu>

It looks like you have some problems with the code you posted.

use Bio::DB::Query::GenBank;
use strict;
use warnings;

open (READER_1,"list.txt") || die "\n I can't open the file READER_1 !!\n";
my @a1=<READER_1>;
close (READER_1);

for (my $i=0; $i < @a1;$i++ ) {
    # remove the trailing newline read from list.txt
    chomp $a1[$i];

    # is this necessary? You don't seem to use it anywhere later in your code.
    my @a1_s=split/\s+/,$a1[$i];

    # you enclosed the variable in '' which means perl won't interpolate it;
    # changed the query so that perl can evaluate the variable
    my $query_string = ' Homo Sapiens[Organism] AND '.$a1[$i] .' ';
    my $query = Bio::DB::Query::GenBank->new(-db    => 'Protein',
                                             -query => $query_string);
    my $count = $query->count;
    my @ids   = $query->ids;

    print " gene: $a1[$i] first id is $ids[0]  o no? \n";
}
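The quoting problem Kevin points out, plus the newline that every line read from list.txt still ends with, can be checked without contacting GenBank at all. A minimal sketch; build_query is a hypothetical helper, and the gene names are only illustrative input standing in for the contents of list.txt.

```perl
use strict;
use warnings;

# Build an Entrez query for one gene name read from list.txt.
sub build_query {
    my ($gene) = @_;
    chomp $gene;                        # lines read from a file keep their "\n"
    $gene =~ s/^\s+|\s+$//g;            # trim stray whitespace
    return unless length $gene;         # skip blank lines
    return "Homo sapiens[Organism] AND $gene";   # double quotes interpolate
}

my @lines = ("BRCA1\n", "\n", "TP53\n");   # stands in for <READER_1>
for my $line (@lines) {
    my $q = build_query($line) or next;
    print "$q\n";
}

# Compare: single quotes leave the variable name in the string literally.
my $gene = 'TP53';
print 'Homo sapiens[Organism] AND $gene', "\n";
```

Printing the query string before sending it is a cheap way to catch this kind of problem.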

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Luba Pardo
Sent: Friday, February 02, 2007 7:31 AM
To: bioperl-l at lists.open-bio.org
Subject: [Bioperl-l] Problem using Bio::DB::Query::GenBank;

Hello, (I am using bioperl-1.5.2_100, linux machine) I am trying to get
the ids of a list of genes using the module Bio::DB::Query:GenBank. I
have the following code:

use Bio::DB::Query::GenBank;
use strict;
use warnings;

open (READER_1,"list.txt") || die "\n I can't open the file READER_1
!!\n"; my @a1=<READER_1>; close (READER_1);

for (my $i=0; $i<=$#a1;$i=$i+1 ) {
        my @a1_s=split/\s+/,$a1[$i];

my $query_string = ' Homo Sapiens[Organism] AND $a1[$i] ';
   my $query = Bio::DB::Query::GenBank->new(-db=>'Protein',

-query=>$query_string
                                            );
   my $count = $query->count;
   my @ids   = $query->ids;


print " gene: $a1[$i] first id is $ids[0]  o no? \n";

I want to tell the program to get all the genes contained in the file
list.txt and to retrieve the ids from GenBank. However the program gives
me the following error:

------------EXCEPTION: Bio::Root::Exception -------------
MSG: Id list has been truncated even after maxids requested
STACK: Error::throw
STACK: Bio::Root::Root::throw
/usr/lib/perl5/site_perl/5.8.1/Bio/Root/Root.pm:359
STACK: Bio::DB::Query::WebQuery::_fetch_ids
/usr/lib/perl5/site_perl/5.8.1/Bio/DB/Query/WebQuery.pm:236
STACK: Bio::DB::Query::WebQuery::ids
/usr/lib/perl5/site_perl/5.8.1/Bio/DB/Query/WebQuery.pm:200
STACK: query.pl:27
------------------
Is that a problem if I try to use the $a1[$i] to replace the name of the
gene?
I thank before hand for the attention you may pay to this message
Regards, Luba Pardo


From cjfields at uiuc.edu  Fri Feb  2 11:37:49 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 2 Feb 2007 10:37:49 -0600
Subject: [Bioperl-l] [Gmod-schema] beginning work on SeqIO::chadoxml
In-Reply-To: <1170432196.2706.661.camel@localhost.localdomain>
References: <1170359746.2706.622.camel@localhost.localdomain>
	<675F525B-D3EB-4C0A-A777-7683F2A8F823@duke.edu>
	<1170432196.2706.661.camel@localhost.localdomain>
Message-ID: <64E2727F-2052-417B-878E-0F7135A72FBC@uiuc.edu>

I was going to suggest allowing one to switch out XML handlers/writers
based on the style (à la XML::SAX), but I see that chadoxml currently
uses XML::Writer and has no next_seq() implemented.
Oh well...

chris

On Feb 2, 2007, at 10:03 AM, Scott Cain wrote:

> Hi Hilmar,
>
> That is a good idea; when I started down this road, it felt like there
> would only be a few things that I might want to allow to be different,
> but I think you are right that having one standard implementation that
> can be subclassed for legacy systems is a good thing.
>
> Scott
>
>
> On Fri, 2007-02-02 at 10:09 -0500, Hilmar Lapp wrote:
>> On Feb 1, 2007, at 2:55 PM, Scott Cain wrote:
>>
>>> The second main change was to introduce a -flybase_compat argument
>>> when
>>> initializing the Bio::SeqIO writer, so that 'old style' cv and  
>>> cvterms
>>> (that are compatable with flybase) will be used, but now the default
>>> will be to use current standards:
>>
>> Just my $0.02 ... obviously, Flybase may be the only organization
>> that uses an 'old style' or any other way not compliant with 'current
>> standards' (presumably SO), but if it's not the only one then this
>> approach won't scale.
>>
>> Also, an argument -flybase_compat suggests to the unsuspecting that
>> this is an endorsed flavor of the standard and fine to use for
>> everyone else too.
>>
>> If Flybase is idiosyncratic in this way, why not make chadoxml.pm
>> compliant with the standard as we all want it, keep it free from
>> litter caused by usage of old versions of SO, and create a second
>> module fb-chadoxml.pm that inherits from the first and merely
>> overrides a few things so that it works for Flybase. This way, other
>> organizations with similar needs can follow the path and create their
>> own xyz-chadoxml.pm, rather than having to muck around in the
>> chadoxml.pm that comes with the distribution.
>>
>> I'm not sure I fully grasp the underlying issue, so I may not make
>> much sense here. Apologies if that's the case ...
>>
>> 	-hilmar
> -- 
> ---------------------------------------------------------------------- 
> --
> Scott Cain, Ph. D.                                    
> cain.cshl at gmail.com
> GMOD Coordinator (http://www.gmod.org/)                      
> 216-392-3087
> Cold Spring Harbor Laboratory
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From hlapp at gmx.net  Fri Feb  2 11:45:30 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Fri, 2 Feb 2007 11:45:30 -0500
Subject: [Bioperl-l] [Gmod-schema] beginning work on SeqIO::chadoxml
In-Reply-To: <64E2727F-2052-417B-878E-0F7135A72FBC@uiuc.edu>
References: <1170359746.2706.622.camel@localhost.localdomain>
	<675F525B-D3EB-4C0A-A777-7683F2A8F823@duke.edu>
	<1170432196.2706.661.camel@localhost.localdomain>
	<64E2727F-2052-417B-878E-0F7135A72FBC@uiuc.edu>
Message-ID: <3A3FF1B0-129A-410D-9DE2-D1CB28015C9A@gmx.net>

There must be at least a stub for next_seq(). It may throw a
not-implemented exception, but it should not just be absent.
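A minimal sketch of what such a stub could look like, written in plain Perl so it stands alone; the package name is hypothetical. Inside BioPerl itself the conventional call would be $self->throw_not_implemented() from Bio::Root::RootI rather than croak.

```perl
package Hypothetical::SeqIO::chadoxml;   # illustrative name, not the real module
use strict;
use warnings;
use Carp qw(croak);

sub new {
    my ($class, %args) = @_;
    return bless {%args}, $class;
}

# The format is write-only for now, so the reader method exists but fails
# loudly instead of being absent.
sub next_seq {
    my ($self) = @_;
    croak(ref($self) . '::next_seq() is not implemented (write-only format)');
}

package main;

my $io = Hypothetical::SeqIO::chadoxml->new;
eval { $io->next_seq; 1 } or print "caught: $@";
```

With the stub in place, callers get a clear error message instead of Perl's generic "Can't locate object method" failure.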

	-hilmar

On Feb 2, 2007, at 11:37 AM, Chris Fields wrote:

> I was going to suggest maybe allowing one to switch out XML  
> handlers/writers based on the style (ala XML::SAX), but I see that  
> chadoxml currently uses XML::Writer and there is no next_seq()  
> implemented.  Oh well...
>
> chris
>
> On Feb 2, 2007, at 10:03 AM, Scott Cain wrote:
>
>> Hi Hilmar,
>>
>> That is a good idea; when I started down this road, it felt like  
>> there
>> would only be a few things that I might want to allow to be  
>> different,
>> but I think you are right that having one standard implementation  
>> that
>> can be subclassed for legacy systems is a good thing.
>>
>> Scott
>>
>>
>> On Fri, 2007-02-02 at 10:09 -0500, Hilmar Lapp wrote:
>>> On Feb 1, 2007, at 2:55 PM, Scott Cain wrote:
>>>
>>>> The second main change was to introduce a -flybase_compat argument
>>>> when initializing the Bio::SeqIO writer, so that 'old style' cv and
>>>> cvterms (that are compatible with flybase) will be used, but now the
>>>> default will be to use current standards:
>>>
>>> Just my $0.02 ... obviously, Flybase may be the only organization
>>> that uses an 'old style' or any other way not compliant with 'current
>>> standards' (presumably SO), but if it's not the only one then this
>>> approach won't scale.
>>>
>>> Also, an argument -flybase_compat suggests to the unsuspecting that
>>> this is an endorsed flavor of the standard and fine to use for
>>> everyone else too.
>>>
>>> If Flybase is idiosyncratic in this way, why not make chadoxml.pm
>>> compliant with the standard as we all want it, keep it free from
>>> litter caused by usage of old versions of SO, and create a second
>>> module fb-chadoxml.pm that inherits from the first and merely
>>> overrides a few things so that it works for Flybase. This way, other
>>> organizations with similar needs can follow the path and create their
>>> own xyz-chadoxml.pm, rather than having to muck around in the
>>> chadoxml.pm that comes with the distribution.
>>>
>>> I'm not sure I fully grasp the underlying issue, so I may not make
>>> much sense here. Apologies if that's the case ...
>>>
>>> 	-hilmar
>> -- 
>> ------------------------------------------------------------------------
>> Scott Cain, Ph. D.                                    
>> cain.cshl at gmail.com
>> GMOD Coordinator (http://www.gmod.org/)                      
>> 216-392-3087
>> Cold Spring Harbor Laboratory
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cain.cshl at gmail.com  Fri Feb  2 12:02:32 2007
From: cain.cshl at gmail.com (Scott Cain)
Date: Fri, 02 Feb 2007 12:02:32 -0500
Subject: [Bioperl-l] [Gmod-schema] beginning work on SeqIO::chadoxml
In-Reply-To: <3A3FF1B0-129A-410D-9DE2-D1CB28015C9A@gmx.net>
References: <1170359746.2706.622.camel@localhost.localdomain>
	<675F525B-D3EB-4C0A-A777-7683F2A8F823@duke.edu>
	<1170432196.2706.661.camel@localhost.localdomain>
	<64E2727F-2052-417B-878E-0F7135A72FBC@uiuc.edu>
	<3A3FF1B0-129A-410D-9DE2-D1CB28015C9A@gmx.net>
Message-ID: <1170435752.2706.676.camel@localhost.localdomain>

Ah, I'll go ahead and add one, though it will just throw an exception
because this is a write-only adapter.
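For readers following along, here is a minimal plain-Perl sketch of the kind of stub being discussed. In it, next_seq() exists but dies with an explicit message because the adapter only writes. The package name WriteOnlyAdapterSketch is hypothetical, and the sketch does not inherit from Bio::SeqIO. Inside BioPerl itself, one would more likely call the throw() or throw_not_implemented() helpers inherited from Bio::Root::RootI rather than die().

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical write-only adapter; NOT the real Bio::SeqIO::chadoxml.
package WriteOnlyAdapterSketch;

sub new {
    my ($class, %args) = @_;
    return bless { written => 0, %args }, $class;
}

# Writing is the only supported direction; here we just count sequences.
sub write_seq {
    my ($self, @seqs) = @_;
    $self->{written} += scalar @seqs;
    return 1;
}

# The method exists so callers get a clear "not implemented" error
# instead of Perl's generic "Can't locate object method" failure.
sub next_seq {
    my ($self) = @_;
    die ref($self) . "::next_seq is not implemented: this is a write-only adapter\n";
}

package main;

my $out = WriteOnlyAdapterSketch->new;
$out->write_seq('seq1', 'seq2');
eval { $out->next_seq };
print "caught: $@";
```

With this stub in place, a caller that tries to read from the writer gets an explicit, catchable error.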

Scott


On Fri, 2007-02-02 at 11:45 -0500, Hilmar Lapp wrote:
> There must be at least a stub for next_seq(). It may throw a not- 
> implemented exception, but it should not just be absent.
> 
> 	-hilmar
> 
> On Feb 2, 2007, at 11:37 AM, Chris Fields wrote:
> 
> > I was going to suggest maybe allowing one to switch out XML  
> > handlers/writers based on the style (ala XML::SAX), but I see that  
> > chadoxml currently uses XML::Writer and there is no next_seq()  
> > implemented.  Oh well...
> >
> > chris
> >
> > On Feb 2, 2007, at 10:03 AM, Scott Cain wrote:
> >
> >> Hi Hilmar,
> >>
> >> That is a good idea; when I started down this road, it felt like there
> >> would only be a few things that I might want to allow to be different,
> >> but I think you are right that having one standard implementation that
> >> can be subclassed for legacy systems is a good thing.
> >>
> >> Scott
> >>
> >>
> >> On Fri, 2007-02-02 at 10:09 -0500, Hilmar Lapp wrote:
> >>> On Feb 1, 2007, at 2:55 PM, Scott Cain wrote:
> >>>
> >>>> The second main change was to introduce a -flybase_compat argument
> >>>> when initializing the Bio::SeqIO writer, so that 'old style' cv and
> >>>> cvterms (that are compatible with flybase) will be used, but now the
> >>>> default will be to use current standards:
> >>>
> >>> Just my $0.02 ... obviously, Flybase may be the only organization
> >>> that uses an 'old style' or any other way not compliant with 'current
> >>> standards' (presumably SO), but if it's not the only one then this
> >>> approach won't scale.
> >>>
> >>> Also, an argument -flybase_compat suggests to the unsuspecting that
> >>> this is an endorsed flavor of the standard and fine to use for
> >>> everyone else too.
> >>>
> >>> If Flybase is idiosyncratic in this way, why not make chadoxml.pm
> >>> compliant with the standard as we all want it, keep it free from
> >>> litter caused by usage of old versions of SO, and create a second
> >>> module fb-chadoxml.pm that inherits from the first and merely
> >>> overrides a few things so that it works for Flybase. This way, other
> >>> organizations with similar needs can follow the path and create their
> >>> own xyz-chadoxml.pm, rather than having to muck around in the
> >>> chadoxml.pm that comes with the distribution.
> >>>
> >>> I'm not sure I fully grasp the underlying issue, so I may not make
> >>> much sense here. Apologies if that's the case ...
> >>>
> >>> 	-hilmar
> >> -- 
> >> ------------------------------------------------------------------------
> >> Scott Cain, Ph. D.                                    
> >> cain.cshl at gmail.com
> >> GMOD Coordinator (http://www.gmod.org/)                      
> >> 216-392-3087
> >> Cold Spring Harbor Laboratory
> >>
> >> _______________________________________________
> >> Bioperl-l mailing list
> >> Bioperl-l at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> > Christopher Fields
> > Postdoctoral Researcher
> > Lab of Dr. Robert Switzer
> > Dept of Biochemistry
> > University of Illinois Urbana-Champaign
> >
> >
> >
> 
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   cain.cshl at gmail.com
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory

-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070202/9acaa3c3/attachment.bin 

From peili at morgan.harvard.edu  Fri Feb  2 10:56:56 2007
From: peili at morgan.harvard.edu (Peili Zhang)
Date: Fri, 02 Feb 2007 10:56:56 -0500
Subject: [Bioperl-l] [Gmod-schema] beginning work on SeqIO::chadoxml
In-Reply-To: <C1E8C2A0.C967%bosborne11@verizon.net>
References: <C1E8C2A0.C967%bosborne11@verizon.net>
Message-ID: <1170431816.6583.47.camel@jacks>

i 'third' Hilmar's opinion. flybase's fingerprint is shown in the module
because i wrote it for fb's data loading task. no need to worry about
flybase compatibility in making the module generic. in fact, at flybase,
i tweak the module frequently to make it work for different scenarios.

cheers,
peili
 
On Fri, 2007-02-02 at 10:27, Brian Osborne wrote:
> Hilmar,
> 
> I second your motion, good idea. Let's keep the standard module nice and
> clean.
> 
> Brian O.
> 
> 
> On 2/2/07 10:09 AM, "Hilmar Lapp" <hlapp at duke.edu> wrote:
> 
> > and create a second
> > module fb-chadoxml.pm that inherits from the first and merely
> > overrides a few things so that it works for Flybase
> 
> 
> 
> _______________________________________________
> Gmod-schema mailing list
> Gmod-schema at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/gmod-schema
> 

From cain.cshl at gmail.com  Fri Feb  2 13:05:47 2007
From: cain.cshl at gmail.com (Scott Cain)
Date: Fri, 02 Feb 2007 13:05:47 -0500
Subject: [Bioperl-l] [Gmod-schema] beginning work on SeqIO::chadoxml
In-Reply-To: <1170431816.6583.47.camel@jacks>
References: <C1E8C2A0.C967%bosborne11@verizon.net>
	<1170431816.6583.47.camel@jacks>
Message-ID: <1170439549.2706.683.camel@localhost.localdomain>

Hi Peili,

A little bit ago I checked in Bio::SeqIO::flybase_chadoxml, which is
fairly simple.  My suggestion is that when you make tweaks for different
scenarios, you turn the things you are tweaking into methods in
BSIO::chadoxml and then override them in flybase_chadoxml (and commit at
least the chadoxml module), to make it more flexible when other people
have similar scenarios.
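Here is a minimal plain-Perl sketch of that suggestion. It assumes one site-specific choice (the name of the vocabulary holding feature types) is pulled out into its own method, which a subclass then overrides. The package names, the feature_cv_name() and type_ref() methods, and the 'SO' legacy value are all hypothetical placeholders. None of them is taken from Bio::SeqIO::chadoxml or Bio::SeqIO::flybase_chadoxml.

```perl
use strict;
use warnings;

# Hypothetical base writer: site-specific choices live in small hook methods.
package ChadoXMLSketch;

sub new { my ($class) = @_; return bless {}, $class }

# Hook: vocabulary name for feature types (assumed default).
sub feature_cv_name { return 'sequence' }

# Shared logic calls the hook instead of hard-coding the name.
# The "cv:term" string format is invented for illustration only.
sub type_ref {
    my ($self, $type) = @_;
    return $self->feature_cv_name . ':' . $type;
}

# Hypothetical site subclass: inherits everything, overrides one hook.
package LegacySiteChadoXMLSketch;
use parent -norequire, 'ChadoXMLSketch';

sub feature_cv_name { return 'SO' }    # placeholder legacy value

package main;

print ChadoXMLSketch->new->type_ref('gene'), "\n";             # standard
print LegacySiteChadoXMLSketch->new->type_ref('gene'), "\n";   # legacy
```

The base module keeps the standard behaviour. Each site carries only the handful of overrides it needs, which is the scaling property Hilmar was after.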

Scott


On Fri, 2007-02-02 at 10:56 -0500, Peili Zhang wrote:
> i 'third' Hilmar's opinion. flybase's fingerprint is shown in the module
> because i wrote it for fb's data loading task. no need to worry about
> flybase compatibility in making the module generic. in fact, at flybase,
> i tweak the module frequently to make it work for different scenarios.
> 
> cheers,
> peili
>  
> On Fri, 2007-02-02 at 10:27, Brian Osborne wrote:
> > Hilmar,
> > 
> > I second your motion, good idea. Let's keep the standard module nice and
> > clean.
> > 
> > Brian O.
> > 
> > 
> > On 2/2/07 10:09 AM, "Hilmar Lapp" <hlapp at duke.edu> wrote:
> > 
> > > and create a second
> > > module fb-chadoxml.pm that inherits from the first and merely
> > > overrides a few things so that it works for Flybase
> > 
> > 
> > 
> > _______________________________________________
> > Gmod-schema mailing list
> > Gmod-schema at lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/gmod-schema
> > 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   cain.cshl at gmail.com
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory

-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070202/a6d23204/attachment.bin 

From cjfields at uiuc.edu  Fri Feb  2 15:33:46 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 2 Feb 2007 14:33:46 -0600
Subject: [Bioperl-l] seqDiff
In-Reply-To: <101051013116870@lycos-europe.com>
References: <101051013116870@lycos-europe.com>
Message-ID: <C752CE9D-61A7-4DF2-958E-7162723D0BA9@uiuc.edu>

Judging by the code you'll have to recreate the SeqDiff while  
iterating through various alleles; there is no method to remove  
particular variants or purge them (at least I couldn't find one).

I also noticed SeqDiff doesn't support deletions/insertions either;  
using a null allele (no seq) or leaving out either the mutant or  
original allele leads to errors.  I'll look into the latter, and I  
may try to add a method to at least purge variants and reset dna_mut().

chris

On Feb 2, 2007, at 4:06 AM, marian thieme wrote:

> HI,
>
> Is there a way to output all mutated sequences of a SeqDiff object?
> Suppose I add some variants via:
>
> $dnamut->add_Allele($a2);
> $dnamut->add_Allele($a3);
> $seqDiff->add_Variant($dnamut);
>
> and afterwards want to access the alternative sequences via
> $seqDiff->dna_mut()
>
> Which allele is chosen when using dna_mut()? And can I control
> whether the first or the second alternate sequence is returned?
> If yes, how can I do this?
>
> Regards,
> Marian
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From MEC at stowers-institute.org  Fri Feb  2 16:47:08 2007
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Fri, 2 Feb 2007 15:47:08 -0600
Subject: [Bioperl-l] Bio::DB::SeqFeature treamtent of tags and
	annotations
Message-ID: <CED81D34E37D5043A1211565277A51E50768EDB3@exchkc02.stowers-institute.org>

Lincoln,
 
I don't think that adding this directive is a good idea after all
either.
 
But, I see that you remap the ID= to a load_id attribute, which is
preserved in the Bio::DB::SeqFeature::Store database.
 
And then it gets squelched during GFF production by
NormalizedFeature::format_attributes.
 
However, if ID is prone to clashes, then certainly simply renaming the
attribute to be load_id does not preclude clashes from happening, and
only courts disaster.  Don't you think?
 
I'm a little blurry on the GFF3Loader, but it looks like you're using
load_id to facilitate loading parent/child features out of order.  Is
that right?  If so, I suggest you delete all load_id attributes
immediately after performing a load.  It has no further use.
 
Or, instead of a `round-trip-ids` directive, you might consider giving
the GFF3Loader an IDAttribute option, which would let the loader
preserve the ID values under a named attribute.
 
In my case, munging flybase gff,  I would then use it like this:
 
bp_seqfeature_load.PLS --fast --IDAttribute flybaseID
 
which would preserve the ID values in the database, but under the
flybaseID attribute, for features so loaded.
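Until a loader option like that exists, roughly the same effect can perhaps be had by pre-processing the GFF3 before loading. This is a minimal sketch that assumes tab-delimited GFF3 lines with the trailing newline already removed. copy_id_to_attribute() is a hypothetical helper, not part of GFF3Loader, and it neither unescapes nor validates column 9. The seqid and Name value in the example line are placeholders.

```perl
use strict;
use warnings;

# Hypothetical helper: append "<attr_name>=<ID value>" to column 9 so the
# ID survives under a name the loader treats as an ordinary attribute.
# Expects a single GFF3 line with the trailing newline already removed.
sub copy_id_to_attribute {
    my ($line, $attr_name) = @_;
    return $line if $line =~ /^#/;             # directives and comments
    my @cols = split /\t/, $line, -1;
    return $line unless @cols == 9;            # not a feature line
    my ($id) = $cols[8] =~ /(?:^|;)ID=([^;]*)/;
    $cols[8] .= ";$attr_name=$id" if defined $id;
    return join "\t", @cols;
}

my $example = join "\t", '4', 'FlyBase', 'gene', 100, 400, '.', '+', '.',
    'ID=FBgn0017545;Name=example';
print copy_id_to_attribute($example, 'flybaseID'), "\n";
```

In practice, each line of the file would be chomped and passed through the helper (for example in a while (<>) loop) before the result is handed to bp_seqfeature_load.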
 
---------------------------------------------
 
On a related topic:
I just committed this patch to Bio::DB::SeqFeature::NormalizedFeature

_create_subfeatures : ensure that subfeatures get the `source` of their
parent

While doing this I wonder: what is the -class that subfeatures are
getting from their parent...??? I left it in place. Please advise! Fix
my thinking....

----------------------------------------------

Further, I observe that Bio::Graphics::FeatureBase::new handles the
-segments option by calling add_segment.  So, when I create a new
Bio::DB::SeqFeature with -segments [[100,200],[300,400]], the
-segments option gets handled by Bio::Graphics::FeatureBase::new, which,
as mentioned, calls add_segment. The surprising thing to me when trying
to trace through the class modules and understand what is going on is
that what gets run at this point is not
Bio::Graphics::FeatureBase::add_segment, but rather
Bio::DB::SeqFeature::add_segment, whose semantics are different in at
least one regard, namely, that it does not set the start and stop of the
parent feature from the min and max of the segments.
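To make the difference concrete, here is a minimal plain-Perl sketch of the behaviour attributed above to Bio::Graphics::FeatureBase::add_segment: the parent's start and end are reset to the minimum and maximum of its segments. SketchFeature is a hypothetical class. Its segments are plain [start, end] array references rather than feature objects, and strand and seq_id are ignored.

```perl
use strict;
use warnings;

# Hypothetical minimal feature, not a BioPerl class.
package SketchFeature;
use List::Util qw(min max);

sub new {
    my ($class, %args) = @_;
    my $self = bless { start => $args{-start}, end => $args{-end}, segments => [] }, $class;
    $self->add_segment(@{ $args{-segments} || [] });
    return $self;
}

# Append segments, then reset the parent's bounds to span all of them.
sub add_segment {
    my ($self, @segs) = @_;
    return unless @segs;
    push @{ $self->{segments} }, @segs;
    my @all = @{ $self->{segments} };
    $self->{start} = min(map { $_->[0] } @all);
    $self->{end}   = max(map { $_->[1] } @all);
    return;
}

package main;

my $f = SketchFeature->new(-segments => [[100, 200], [300, 400]]);
print "start=$f->{start} end=$f->{end}\n";    # start=100 end=400
```

An add_segment that skipped the last two assignments would leave the parent's bounds untouched. That is the divergence described for Bio::DB::SeqFeature::add_segment.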

I have committed a patch to Bio::Graphics::FeatureBase with a comment to
this effect, and have also patched its add_segment method to copy the
parent's source into the segment.

I hope my commits and suggestions further the cause.  Let me know if
not!
 
-- Malcolm
 

________________________________

	From: lincoln.stein at gmail.com [mailto:lincoln.stein at gmail.com] On Behalf Of Lincoln Stein
	Sent: Tuesday, January 30, 2007 4:46 PM
	To: Cook, Malcolm
	Cc: bioperl list; lstein at cshl.org
	Subject: Re: Bio::DB::SeqFeature treamtent of tags and annotations
	
	
	I've fixed the first issue in CVS. Sorry for the inconsistency.
	add_tag_value(), delete_tag_value() and get_Annotations() now all work
	as expected.
	
	The problem with the ID column is that it is supposed to be
	LOCAL to the GFF3 file and is not intended to be stored in the database.
	In contrast, Name can survive roundtripping. Perhaps the thing to do is
	to add a flag to the GFF3 file that turns on ID round-tripping, e.g.
	
	##round-trip-ids: 1
	
	If you like this idea, I can implement it.
	
	Lincoln
	
	
	On 1/29/07, Cook, Malcolm <MEC at stowers-institute.org> wrote: 

		Lincoln,
		 
		Thanks for your suggestions on approach to my problems
		augmenting Flybase annotation.  I am trying to follow them and am
		finding the following oddities:
		 
		The first issue relates to the intermix of 'annotations'
		and 'tag values'.  I find that Bio::DB::SeqFeature implements some of
		the 'tag' methods and some of the 'Annotation' methods.  Here is a perl
		one-liner that shows values stored using add_tag_value are not retrieved
		with get_tag_values, but rather with get_Annotations.
		 
		> perl -MBio::DB::SeqFeature -e 'my $f=Bio::DB::SeqFeature->new; $f->add_tag_value("x",666); print "get_tag_values:\t" . $f->get_tag_values("x") . "\nget_Annotations:\t" . $f->get_Annotations("x");'
		 
		whose output is:
		get_tag_values: 
		get_Annotations:    666
		 
		Tracing this shows me that this results from the fact that:
		 
		Bio::DB::SeqFeature uses Bio::Graphics::FeatureBase
		(via Bio::DB::SeqFeature::NormalizedFeature), which does not support
		-tags in ->new but rather -attributes, viz:
		 
		
		  -attributes   a hashref of tag value attributes, in which the key is the tag
		                and the value is an array reference of values
		 
		
		And though Bio::Graphics::FeatureBase purports to
		implement Bio::SeqFeatureI, it only partially implements the 'tag'
		methods (now deprecated and relegated to Bio::AnnotatableI).  In
		particular, the '*' methods of Bio::SeqFeatureI below are not
		implemented in Bio::Graphics::FeatureBase:

		  has_tag
		*  add_tag_value
		  get_tag_values
		  get_all_tags
		*  remove_tag
		  get_tagset_values
		  get_Annotations

		As a result, add_tag_value and remove_tag are inherited from a
		different module than the other tag methods, and the two modules'
		understanding of tags is not the same!

		This one-liner:

		> perl -MClass::ISA -MClass::Inspector -MBio::DB::SeqFeature -e 'my @c = Class::ISA::self_and_super_path("Bio::DB::SeqFeature"); foreach my $fn (qw(add_tag_value get_tag_values)) {print "\n$fn:\t", join "\t", (grep {Class::Inspector->function_exists($_, $fn)} @c)}'

		confirms that they are defined in different packages, namely:

		add_tag_value: Bio::AnnotatableI 
		get_tag_values: Bio::Graphics::FeatureBase	Bio::AnnotatableI

		
		Proposed solution...  hmmmm ..... I dunno.... maybe the
		following patch to Bio::Graphics::FeatureBase->add_tag_value:
		 
		sub add_tag_value {
		  my ($self,$tag,@vals) = @_;
		  # append to the same per-tag array used by -attributes
		  push @{$self->{attributes}{$tag}}, @vals;
		}
		
		
		It fixes my use case for now but I'm still concerned and
confused about this variety of methods.  
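Here is a minimal plain-Perl sketch of the consistency the patch is aiming for. Every tag method reads and writes the same $self->{attributes}{$tag} array reference, which is the layout documented above for the -attributes argument. TagStoreSketch is a hypothetical class, not Bio::Graphics::FeatureBase, and it leaves the Bio::AnnotatableI methods out entirely.

```perl
use strict;
use warnings;

# Hypothetical feature class: all tag methods share one storage location.
package TagStoreSketch;

sub new {
    my ($class, %args) = @_;
    return bless { attributes => $args{-attributes} || {} }, $class;
}

sub add_tag_value {
    my ($self, $tag, @vals) = @_;
    push @{ $self->{attributes}{$tag} }, @vals;
    return;
}

sub get_tag_values {
    my ($self, $tag) = @_;
    return @{ $self->{attributes}{$tag} || [] };
}

sub has_tag { my ($self, $tag) = @_; return exists $self->{attributes}{$tag} }

sub get_all_tags { my ($self) = @_; return sort keys %{ $self->{attributes} } }

# Remove a tag and return the values it held.
sub remove_tag {
    my ($self, $tag) = @_;
    my $old = delete $self->{attributes}{$tag};
    return $old ? @$old : ();
}

package main;

my $f = TagStoreSketch->new;
$f->add_tag_value('x', 666);
print "get_tag_values:\t", join(',', $f->get_tag_values('x')), "\n";
```

Because no method keeps a private copy of the tags, a value stored by add_tag_value is always the value get_tag_values returns.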
		 
		Suggestions?
		 

-------------------------------------------------------------------------

		Also, I think that any "ID" in column 9 of a GFF3 flat
		file should be preserved through a round-trip through a
		Bio::DB::SeqFeature store, but this is not yet possible, since any ID
		attribute in GFF3 column 9 is being lost by GFF3Loader. This caused me
		to locally patch the GFF3Loader::handle_feature method to add the following:

		  # mec at stowers-institute.org, wondering why not all attributes are
		  # carried forward, adds the ID tag, in particular in service of
		  # round-tripping ID, which, though present in the database as the
		  # load_id attribute, was getting lost as itself
		  $unreserved->{ID} = $reserved->{ID} if exists $reserved->{ID};

		Poised to patch.... what d'you think?

		Malcolm Cook
		Stowers Institute for Medical Research - Kansas City, Missouri
		  

________________________________

			From: lincoln.stein at gmail.com [mailto:lincoln.stein at gmail.com] On Behalf Of Lincoln Stein
			Sent: Tuesday, December 19, 2006 3:58 PM
			To: Cook, Malcolm
			Cc: bioperl list; lstein at cshl.org
			Subject: Re: bp_seqfeature_load / Bio::DB::SeqFeature::Store::GFF3Loader problems augmenting Flybase annotation
			
			
			Hi Malcolm,
			
			Your second guess was right. The use case of augmenting an
			existing gene with additional splice forms isn't provided for.
			You can get the functionality by making direct calls to
			Bio::DB::SeqFeature::Store methods:
			
			my @genes = $db->get_features_by_name('FBgn0017545');
			@genes == 1 or die "Didn't get exactly one gene";
			my $parent = $genes[0];
			
			my $chr    = $parent->seq_id;
			my $start  = $parent->start;
			my $end    = $parent->end;
			my $strand = $parent->strand;
			
			# new mRNA attached to the existing gene
			my $new_splice_form = $db->new_feature(-primary_tag => 'mRNA',
			                                       -source      => 'added',
			                                       -seq_id      => $chr,
			                                       -strand      => $strand,
			                                       -start       => $start+10,
			                                       -end         => $end,
			                                      );
			$parent->add_SeqFeature($new_splice_form);
			
			# two exons for the new splice form
			for my $pos ([$start+10,$start+100],[$start+200,$end]) {
			  my ($e_start,$e_end) = @$pos;
			  my $exon = Bio::DB::SeqFeature->new(-primary_tag => 'exon',
			                                      -store       => $db,
			                                      -seq_id      => $chr,
			                                      -strand      => $strand,
			                                      -start       => $e_start,
			                                      -end         => $e_end);
			  $new_splice_form->add_SeqFeature($exon);
			}
			
			I found a bug in updating the seqfeature database when I
			wrote this script, so you'll have to get the latest bioperl-live.
			I think you can use this to write a splice form updating script.
			
			In order to support the idea of adding new splice forms to an
			existing gene using the GFF3 format, I will have to either modify
			the loader, or write a separate script (probably better to do the
			latter). It shouldn't be hard if you'd like to give it a try.
			
			Lincoln
			
			
			On 12/19/06, Cook, Malcolm <MEC at stowers-institute.org> wrote: 

				Lincoln and fellow Bio::DB::SeqFeature travelers,
				
				I find that using bp_seqfeature_load.PLS to load subfeatures of
				genes already loaded using bp_seqfeature_load.PLS fails with
				
				------------- EXCEPTION  ------------- 
				MSG: FBgn0017545 doesn't have a primary id
				STACK Bio::DB::SeqFeature::Store::GFF3Loader::build_object_tree_in_tables /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:682
				STACK Bio::DB::SeqFeature::Store::GFF3Loader::build_object_tree /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:663
				STACK Bio::DB::SeqFeature::Store::GFF3Loader::finish_load /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:372
				STACK Bio::DB::SeqFeature::Store::GFF3Loader::load_fh /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:345
				STACK Bio::DB::SeqFeature::Store::GFF3Loader::load /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:242
				STACK toplevel /home/mec/cvs/bioperl-live/scripts/Bio-SeqFeature-Store/bp_seqfeature_load.PLS:76
				
				Where FBgn0017545 is the ID of a gene previously loaded.
				
				I am unsure how to remedy my situation and welcome any advice
				on a correct or improved approach to my problem.
				
				Here's some detail if it helps.  I am developing a pipeline to
				design microarray probes capable of distinguishing among splice
				variants in Drosophila (using the latest Flybase dmel_r5.1
				annotation).  So I
				
				1) load a filtered selection of Flybase annotation using
				bp_seqfeature_load.  (For testing purposes, I am using a single
				gene's worth of annotation, FBgn0017545.gff, attached.)  This is
				done as follows:
				
				        > bp_seqfeature_load.PLS --create FBgn0017545.gff
				
				2) analyze all the genes in the database, and create GFF3 output,
				each feature of which has a 'Parent' that is a previously loaded
				gene (i.e. FBgn0017545).  (These features represent the unique
				introns, splice sites, and exonic design targets. Output of this
				analysis, FBgn0017545_matd.gff, is also attached.)
				
				3) load these analysis results into the same database, as follows:
				
				        > bp_seqfeature_load.PLS FBgn0017545_matd.gff
				
				It is at this point that I get the above error.
				
				However, I don't get any error and the data loads fine if I load
				the two files together, as follows:
				
				        > bp_seqfeature_load.PLS --create <(cat FBgn0017545.gff FBgn0017545_matd.gff)
				
				So, I suspect that either I am misunderstanding when/how to use
				bp_seqfeature_load.PLS or else this use case has not yet arisen
				and must be provided for somehow.
				
				I am running against bioperl-live
				
				Thanks for your thoughts and assistance,
				
				Malcolm Cook
				Database Applications Manager - Bioinformatics
				Stowers Institute for Medical Research - Kansas City, Missouri 
				
				
			-- 
			Lincoln D. Stein
			Cold Spring Harbor Laboratory
			1 Bungtown Road
			Cold Spring Harbor, NY 11724
			(516) 367-8380 (voice)
			(516) 367-8389 (fax)
			FOR URGENT MESSAGES & SCHEDULING, 
			PLEASE CONTACT MY ASSISTANT, 
			SANDRA MICHELSEN, AT michelse at cshl.edu


	-- 
	Lincoln D. Stein
	Cold Spring Harbor Laboratory
	1 Bungtown Road
	Cold Spring Harbor, NY 11724
	(516) 367-8380 (voice)
	(516) 367-8389 (fax)
	FOR URGENT MESSAGES & SCHEDULING, 
	PLEASE CONTACT MY ASSISTANT, 
	SANDRA MICHELSEN, AT michelse at cshl.edu 


From neha_bafs at yahoo.co.in  Mon Feb  5 12:59:03 2007
From: neha_bafs at yahoo.co.in (Neha Nahar)
Date: Mon, 5 Feb 2007 17:59:03 +0000 (GMT)
Subject: [Bioperl-l] Convert newick to nexus format
Message-ID: <20070205175903.30379.qmail@web8701.mail.in.yahoo.com>

Hello everyone,

I am trying to convert newick tree to nexus format.
Using the script (referred to in an email from George dated Wed Sep 22 11:52:47 EDT 2004):

/*------------------------------------------------------------*/

$ cat nexus.pl
#!/usr/bin/perl -w

use Bio::TreeIO;

($NEWICKFILE, $NEXUSFILE) = @ARGV;
print "newickfile=$NEWICKFILE, nexusfile=$NEXUSFILE\n";
my $treeio = new Bio::TreeIO(-format => 'newick', -file   => "$NEWICKFILE");
my $treeout = new Bio::TreeIO(-format => 'nexus', -file   => ">$NEXUSFILE");
while(my $tree = $treeio->next_tree) {
        $treeout->write_tree($treeout);
    }

exit 0;


/*------------------------------------------------------------*/

Running the script from the command line gives the following error:

$ ./nexus.pl mrp-input.txt nexus.out
newickfile=mrp-input.txt, nexusfile=nexus.out

------------- EXCEPTION  -------------
MSG: Cannot call method write_tree on Bio::TreeIO object must use a subclass
STACK Bio::TreeIO::nexus::write_tree /usr/lib/perl5/vendor_perl/5.8.8/Bio/TreeIO/nexus.pm:170
STACK toplevel ./nexus.pl:23

--------------------------------------


Using  bioperl-1.5.2_101.tar.gz module from http://search.cpan.org/~sendu/bioperl/Bio/TreeIO.pm

Questions:-

1. Please let me know if I am using the correct version.
If not, please point me to the latest one.

2. Provided that the version I am using is the right one, please let me know what is wrong with the script.

Thank you.
Regards,
Neha.


-Neha Nahar
  " Work for cause and not for applause, live to express and not to impress !"
 				

From jason at bioperl.org  Mon Feb  5 13:10:42 2007
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 5 Feb 2007 10:10:42 -0800
Subject: [Bioperl-l] Convert newick to nexus format
In-Reply-To: <20070205175903.30379.qmail@web8701.mail.in.yahoo.com>
References: <20070205175903.30379.qmail@web8701.mail.in.yahoo.com>
Message-ID: <46219DCD-8C6E-4DBE-82F2-D4B58207AD54@bioperl.org>

you want to write the TREE out not the TREE WRITER.

$treeout->write_tree($tree)

not
$treeout->write_tree($treeout);

On Feb 5, 2007, at 9:59 AM, Neha Nahar wrote:

> Hello everyone,
>
> I am trying to convert newick tree to nexus format.
> Using the script (referred to in an email from George dated Wed Sep
> 22 11:52:47 EDT 2004):
>
> /*------------------------------------------------------------*/
>
> $ cat nexus.pl
> #!/usr/bin/perl -w
>
> use Bio::TreeIO;
>
> ($NEWICKFILE, $NEXUSFILE) = @ARGV;
> print "newickfile=$NEWICKFILE, nexusfile=$NEXUSFILE\n";
> my $treeio = new Bio::TreeIO(-format => 'newick', -file   => "$NEWICKFILE");
> my $treeout = new Bio::TreeIO(-format => 'nexus', -file   => ">$NEXUSFILE");
> while(my $tree = $treeio->next_tree) {
>         $treeout->write_tree($treeout);
>     }
>
> exit 0;
>
>
> /*------------------------------------------------------------*/
>
> Running the script from the command line gives the following error:
>
> $ ./nexus.pl mrp-input.txt nexus.out
> newickfile=mrp-input.txt, nexusfile=nexus.out
>
> ------------- EXCEPTION  -------------
> MSG: Cannot call method write_tree on Bio::TreeIO object must use a subclass
> STACK Bio::TreeIO::nexus::write_tree /usr/lib/perl5/vendor_perl/5.8.8/Bio/TreeIO/nexus.pm:170
> STACK toplevel ./nexus.pl:23
>
> --------------------------------------
>
>
> Using  bioperl-1.5.2_101.tar.gz module from http://search.cpan.org/~sendu/bioperl/Bio/TreeIO.pm
>
> Questions:-
>
> 1. Please let me know if I am using the correct version.
> If not, please point me to the latest one.
>
> 2. Provided that the version I am using is the right one, please  
> let me know what is wrong with the script.
>
> Thank you.
> Regards,
> Neha.
>
>
>
>
>
> -Neha Nahar
>   " Work for cause and not for applause, live to express and not to  
> impress !"
>  				

--
Jason Stajich
Miller Research Fellow
University of California, Berkeley
lab: 510.642.8441
http://pmb.berkeley.edu/~taylor/people/js.html
http://fungalgenomes.org/


From nehadnahar at yahoo.co.in  Mon Feb  5 13:05:26 2007
From: nehadnahar at yahoo.co.in (Neha Nahar)
Date: Mon, 5 Feb 2007 18:05:26 +0000 (GMT)
Subject: [Bioperl-l] Convert newick to nexus format
Message-ID: <288335.22352.qm@web8412.mail.in.yahoo.com>

Hello everyone,

I am trying to convert newick tree to nexus format.
Using the script (referred to in an email from George dated Wed Sep 22 11:52:47 EDT 2004):

/*------------------------------------------------------------*/

$ cat nexus.pl
#!/usr/bin/perl -w

use Bio::TreeIO;

($NEWICKFILE, $NEXUSFILE) = @ARGV;
print "newickfile=$NEWICKFILE, nexusfile=$NEXUSFILE\n";
my $treeio = new Bio::TreeIO(-format => 'newick', -file   => "$NEWICKFILE");
my $treeout = new Bio::TreeIO(-format => 'nexus', -file   => ">$NEXUSFILE");
while(my $tree = $treeio->next_tree) {
        $treeout->write_tree($treeout);
    }

exit 0;


/*------------------------------------------------------------*/

Running the script from the command line gives the following error:

$  ./nexus.pl mrp-input.txt nexus.out
newickfile=mrp-input.txt, nexusfile=nexus.out

------------- EXCEPTION  -------------
MSG: Cannot call method write_tree on Bio::TreeIO object must use a subclass
STACK Bio::TreeIO::nexus::write_tree /usr/lib/perl5/vendor_perl/5.8.8/Bio/TreeIO/nexus.pm:170
STACK toplevel ./nexus.pl:23

--------------------------------------


Using  bioperl-1.5.2_101.tar.gz module from http://search.cpan.org/~sendu/bioperl/Bio/TreeIO.pm

Questions:-

1. Please let me know if I am using the correct version.
If not, please point me to the latest one.

2. Provided that the version I am using is the right one, please let me know what is wrong with the script.

Thank you.
Regards,
Neha.


-Neha Nahar
  " Work for cause and not for applause, live to express and not to impress !"
 				

From hlapp at duke.edu  Fri Feb  2 10:09:57 2007
From: hlapp at duke.edu (Hilmar Lapp)
Date: Fri, 2 Feb 2007 10:09:57 -0500
Subject: [Bioperl-l] [Gmod-schema] beginning work on SeqIO::chadoxml
In-Reply-To: <1170359746.2706.622.camel@localhost.localdomain>
References: <1170359746.2706.622.camel@localhost.localdomain>
Message-ID: <675F525B-D3EB-4C0A-A777-7683F2A8F823@duke.edu>


On Feb 1, 2007, at 2:55 PM, Scott Cain wrote:

> The second main change was to introduce a -flybase_compat argument when
> initializing the Bio::SeqIO writer, so that 'old style' cv and cvterms
> (that are compatible with flybase) will be used, but now the default
> will be to use current standards:

Just my $0.02 ... obviously, Flybase may be the only organization  
that uses an 'old style' or any other way not compliant with 'current  
standards' (presumably SO), but if it's not the only one then this  
approach won't scale.

Also, an argument -flybase_compat suggests to the unsuspecting that  
this is an endorsed flavor of the standard and fine to use for  
everyone else too.

If Flybase is idiosyncratic in this way, why not make chadoxml.pm  
compliant with the standard as we all want it, keep it free from  
litter caused by usage of old versions of SO, and create a second  
module fb-chadoxml.pm that inherits from the first and merely  
overrides a few things so that it works for Flybase. This way, other  
organizations with similar needs can follow the path and create their  
own xyz-chadoxml.pm, rather than having to muck around in the  
chadoxml.pm that comes with the distribution?
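A minimal sketch of the subclassing approach described above, written in plain Perl so it runs without BioPerl. The package names ChadoXMLWriter and FlybaseChadoXMLWriter, the feature_type_cv() hook and the CV names 'sequence' and 'SO' are hypothetical placeholders, not the real chadoxml.pm API. (Perl package names cannot contain hyphens, so a module file called fb-chadoxml.pm would need a different name in practice.)

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical base writer standing in for a standards-compliant chadoxml.pm.
package ChadoXMLWriter;

sub new {
    my ($class, %args) = @_;
    return bless { %args }, $class;
}

# CV name used for feature types. Illustrative value only; the real
# module's controlled-vocabulary handling may differ.
sub feature_type_cv { return 'sequence' }

sub write_feature_type {
    my ($self, $type) = @_;
    return sprintf('<cvterm><cv>%s</cv><name>%s</name></cvterm>',
                   $self->feature_type_cv, $type);
}

# Hypothetical Flybase flavour: inherits everything and overrides only the
# piece that differs, instead of the base class checking a compat flag.
package FlybaseChadoXMLWriter;
our @ISA = ('ChadoXMLWriter');

sub feature_type_cv { return 'SO' }

package main;

for my $class (qw(ChadoXMLWriter FlybaseChadoXMLWriter)) {
    my $writer = $class->new;
    print $writer->write_feature_type('gene'), "\n";
}
```

The point of the design is that other organizations could add their own subclass overriding a few small methods, leaving the distributed base writer untouched.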

I'm not sure I fully grasp the underlying issue, so I may not make  
much sense here. Apologies if that's the case ...

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:- hlapp at duke dot edu :
===========================================================


From jason at bioperl.org  Mon Feb  5 14:43:09 2007
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 5 Feb 2007 11:43:09 -0800
Subject: [Bioperl-l] Convert newick to nexus format
In-Reply-To: <209988.63723.qm@web8715.mail.in.yahoo.com>
References: <209988.63723.qm@web8715.mail.in.yahoo.com>
Message-ID: <9E477447-67F5-46CA-BCC1-47BB4170EC76@bioperl.org>

Please cc the mailing list when asking a question or following up.

Sorry, I don't know what you are doing wrong - you didn't resend your
code, so I don't know if you still have a typo.

This code works fine for me

use Bio::TreeIO;
use strict;
my ($filein,$fileout) = @ARGV;
my ($format,$oformat) = qw(newick nexus);
my $in = Bio::TreeIO->new(-file => $filein, -format => $format);
my $out= Bio::TreeIO->new(-format => $oformat, -file => ">$fileout");

while( my $t = $in->next_tree ) {
  $out->write_tree($t);
}


On Feb 5, 2007, at 11:24 AM, Neha Nahar wrote:

> Thank you very much for the reply.
>
> I fixed the code as per your suggestion,but now am getting a  
> different error:
>
> $ ./nexus.pl mrp-input.txt nexus.out
> newickfile=mrp-input.txt, nexusfile=nexus.out
>
> ------------- EXCEPTION  -------------
> MSG: Cannot call method write_tree on Bio::TreeIO object must use a subclass
> STACK Bio::TreeIO::nexus::write_tree /usr/lib/perl5/vendor_perl/5.8.8/Bio/TreeIO/nexus.pm:170
> STACK toplevel ./nexus.pl:23
>
> --------------------------------------
>
> Please help me out with this script.
>
> Thank you.
> Regards,
> Neha
>
>
>
>
> Jason Stajich <jason at bioperl.org> wrote: you want to write the TREE  
> out not the TREE WRITER.
>
>
> $treeout->write_tree($tree)
>
> not
> $treeout->write_tree($treeout);

--
Jason Stajich
Miller Research Fellow
University of California, Berkeley
lab: 510.642.8441
http://pmb.berkeley.edu/~taylor/people/js.html
http://fungalgenomes.org/


From nehadnahar at yahoo.co.in  Mon Feb  5 14:58:08 2007
From: nehadnahar at yahoo.co.in (Neha Nahar)
Date: Mon, 5 Feb 2007 19:58:08 +0000 (GMT)
Subject: [Bioperl-l] Convert newick to nexus format
In-Reply-To: <99196.23114.qm@web8711.mail.in.yahoo.com>
Message-ID: <36024.1212.qm@web8405.mail.in.yahoo.com>


Hi,
Thank you for the code.
I tried it, but I still get the same exception.

------------- EXCEPTION  -------------
MSG: Cannot call method write_tree on Bio::TreeIO object must use a subclass
STACK Bio::TreeIO::nexus::write_tree /usr/lib/perl5/vendor_perl/5.8.8/Bio/TreeIO/nexus.pm:170
STACK toplevel ./nexus1.pl:18


Please find attached the Perl file (nexus.pl).


I am using the bioperl-1.5.2_101.tar.gz distribution from http://search.cpan.org/~sendu/bioperl/Bio/TreeIO.pm

Please let me know if I am using the correct version. If not, please point me to the latest one.

Thank you.
Regards,
nnahar


Jason Stajich <jason at bioperl.org> wrote:
please cc the mailing list when asking a question or followup.

Sorry I don't know what you are doing wrong - you didn't resend your code so I don't know if you still have a typo.

This code works fine for me

use Bio::TreeIO;
use strict;
my ($filein,$fileout) = @ARGV;
my ($format,$oformat) = qw(newick nexus);
my $in = Bio::TreeIO->new(-file  => $filein, -format => $format);
my $out= Bio::TreeIO->new(-format => $oformat, -file => ">$fileout");

while( my $t = $in->next_tree ) {
 $out->write_tree($t);
}



-Neha Nahar
  " Work for cause and not for applause, live to express and not to impress !"
 				
-------------- next part --------------
A non-text attachment was scrubbed...
Name: nexus.pl
Type: application/x-perl
Size: 811 bytes
Desc: 1389215665-nexus.pl
Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070205/c6453dcf/attachment.bin 

From jason at bioperl.org  Mon Feb  5 17:15:52 2007
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 5 Feb 2007 14:15:52 -0800
Subject: [Bioperl-l] Convert newick to nexus format
In-Reply-To: <36024.1212.qm@web8405.mail.in.yahoo.com>
References: <36024.1212.qm@web8405.mail.in.yahoo.com>
Message-ID: <84703383-600F-42F4-A860-DD0D1C43EE83@bioperl.org>

I am guessing something is wrong with your install. Can you run the
tests? Go to the bioperl directory and run:
$ perl t/TreeIO.t

Can you describe how you installed bioperl?
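The stack traces in this thread point into /usr/lib/perl5/vendor_perl/5.8.8, which is typically where distribution packages install Perl modules, so one plausible (unconfirmed) cause is an older packaged BioPerl being loaded instead of the CPAN tarball. The sketch below is plain Perl; it reports which file each module is actually loaded from and its $VERSION, if set. The module list and the module_location() helper name are just examples:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Load a module by name and return the file it came from, or undef if it
# cannot be loaded. module_location() is a hypothetical helper name.
sub module_location {
    my ($module) = @_;
    (my $file = $module) =~ s{::}{/}g;
    $file .= '.pm';
    return eval { require $file; 1 } ? $INC{$file} : undef;
}

# Check the modules named on the command line, or the TreeIO ones by default.
for my $module (@ARGV ? @ARGV : qw(Bio::TreeIO Bio::TreeIO::newick Bio::TreeIO::nexus)) {
    my $path = module_location($module);
    if (defined $path) {
        my $version = $module->VERSION;
        printf "%-22s %s (version %s)\n", $module, $path,
            defined $version ? $version : 'not set';
    }
    else {
        printf "%-22s could not be loaded\n", $module;
    }
}
```

If the reported paths mix vendor_perl and site_perl directories, that would suggest two BioPerl installations on the same system.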

On Feb 5, 2007, at 11:58 AM, Neha Nahar wrote:

>
> Hi,
> Thank you for the code.
> I tried it but I still get the same exception.
>
> ------------- EXCEPTION  -------------
> MSG: Cannot call method write_tree on Bio::TreeIO object must use a subclass
> STACK Bio::TreeIO::nexus::write_tree /usr/lib/perl5/vendor_perl/5.8.8/Bio/TreeIO/nexus.pm:170
> STACK toplevel ./nexus1.pl:18
>
>
> Please find attached the perl file (nexus.pl).
>
>
> Using bioperl-1.5.2_101.tar.gz module from http://search.cpan.org/~sendu/bioperl/Bio/TreeIO.pm
>
> Please let me know if I am using the correct version. If not, please
> point me to the latest one.
>
> Thank you.
> Regards,
> nnahar
> <nexus.pl>


From lzhtom at hotmail.com  Mon Feb  5 22:31:56 2007
From: lzhtom at hotmail.com (zhihua li)
Date: Tue, 06 Feb 2007 03:31:56 +0000
Subject: [Bioperl-l] Bio::index::Fasta- where's the indexed file?
In-Reply-To: <CFD213B9-5195-450F-80ED-E956EEF50F59@bioperl.org>
Message-ID: <BAY110-F28F9C9145AC24F2D0E0D34C79F0@phx.gbl>

Thanks a lot!

After checking out the script bp_index, I changed the syntax to:
 my $inx = Bio::Index::Fasta->new("test.fasta.idx", 'WRITE');
$inx->make_index("test.fasta");


Now I have an index file, test.fasta.idx, in my current directory, and I can
use it in a later script by saying
 my $inx = Bio::Index::Fasta->new(-filename => "test.fasta.idx");

So now everything is OK. But I don't understand why I have to use that
syntax, and why the syntax given in the documentation didn't work.
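Following Jason's suggestion (quoted below) of prepending $ENV{BIOPERL_INDEX} to the -filename value, a small helper can build the path portably with File::Spec. index_path() is a hypothetical name, and the sketch assumes that an unset or empty BIOPERL_INDEX should mean "use the name as given", i.e. the current directory:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Prepend $ENV{BIOPERL_INDEX} (if set and non-empty) to an index file name.
# index_path() is a hypothetical helper, not part of Bio::Index.
sub index_path {
    my ($name) = @_;
    my $dir = $ENV{BIOPERL_INDEX};
    return defined $dir && length $dir ? File::Spec->catfile($dir, $name) : $name;
}

# Example: with BIOPERL_INDEX=/data/indexes this prints
# /data/indexes/test.fasta.idx on Unix-like systems.
print index_path('test.fasta.idx'), "\n";

# The resulting path would then be passed to the constructor, e.g.
#   my $inx = Bio::Index::Fasta->new(-filename => index_path('test.fasta.idx'));
```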


>From: Jason Stajich <jason at bioperl.org>
>To: zhihua li <lzhtom at hotmail.com>
>CC: bioperl-l at lists.open-bio.org, arokfl at yahoo.com
>Subject: Re: [Bioperl-l] Bio::index::Fasta- where's the indexed file?
>Date: Thu, 1 Feb 2007 22:24:44 -0800
>
>I don't think BIOPERL_INDEX does anything in the module so that
>documentation is not quite right.  the env variable is used in the
>scripts/index/bp_index and bp_fetch scripts so maybe a cut+paste job
>went bad somewhere.
>
>you need to specify the full path you want with -filename - you can
>just prepend BIOPERL_INDEX to the filename, like:
>-filename => $ENV{BIOPERL_INDEX}."/$index"
>
>-jason
>On Feb 1, 2007, at 7:27 PM, zhihua li wrote:
>
> > Sorry guys, the former empty mail was sent out by mistake.
> >
> > I'm using Bio::index::Fasta to index a file containing lots of
> > sequences in fasta format. All is fine except one thing.
> >
> > According to the bioperl tutorial and the documents, the following
> > code will make an indexed file:
> >
> > my $inx = Bio::Index::Fasta->new(-filename => "test.fasta.idx",
> >                                     -write_flag => 1);
> >    $inx->make_index("test.fasta");
> >
> > And in another script I can access the indexed file by saying
> >
> > $ENV{BIOPERL_INDEX} = "."; # find index in current directory
> > my $inx = Bio::Index::Fasta->new(-filename => "test.fasta.idx");
> > my $seq=$inx->fetch("ent1001");        #fetch the sequence named ent1001
> >
> > However, after running the first script, I cannot find a new file
> > test.fasta.idx in my current directory. And not surprisingly, when
> > I ran the second script, perl told me it couldn't find
> > "test.fasta.idx".
> >
> > What's going on here?
> >
> > Thanks a lot!
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>--
>Jason Stajich
>Miller Research Fellow
>University of California, Berkeley
>lab: 510.642.8441
>http://pmb.berkeley.edu/~taylor/people/js.html
>http://fungalgenomes.org/
>



From johnston at biochem.ucl.ac.uk  Tue Feb  6 06:52:08 2007
From: johnston at biochem.ucl.ac.uk (Caroline Johnston)
Date: Tue, 6 Feb 2007 11:52:08 +0000 (GMT)
Subject: [Bioperl-l] RNA folding
Message-ID: <Pine.LNX.4.58.0702061143420.4383@localhost.localdomain>

Hello,

I've just joined the list - I'm a Bioinformatics PhD student at Essex
University doing transcriptomics-related things. Mainly microarray
analysis and more recently looking at RNA structure prediction.

I was thinking about having a go at writing a bioperl-run wrapper around
some of the structure prediction stuff, but according to the wiki this is
being done already (at least for the Vienna tools). I spoke to Albert
Vilella at the EBI the other day and he said Chris Fields was the man to
speak to. So could he (or anyone) let me know what the current state of
RNA structure prediction tools in bioperl is?

Cheers,
Cass xx

From marian.thieme at lycos.de  Tue Feb  6 08:52:10 2007
From: marian.thieme at lycos.de (Marian Thieme)
Date: Tue, 06 Feb 2007 14:52:10 +0100
Subject: [Bioperl-l] dbSNP
Message-ID: <45C8880A.7030702@lycos.de>

Hello all,

I looked in the documentation for a method/class/function/script that
provides the opportunity to generate a SNP assay suitable for submission to
dbSNP (http://www.ncbi.nlm.nih.gov/projects/SNP/
http://www.ncbi.nlm.nih.gov/projects/SNP/how_to_submit.html).
I didn't find such code, but I noticed that there is at least an XML
parser to read dbSNP reports.

Does anybody know if there is also an output class to generate dbSNP
reports? I could imagine that at least the SNP assay section would be worth
implementing.

This example is given by ncbi:


TYPE:SNPASSAY
HANDLE:WI
BATCH: 1.98
MOLTYPE:Genomic
METHOD:RESEQ
SYN NAMES:WI-SNP,DnaId,MapDna
COMMENT:
Here is where some public comment that applies to the entire
batch of SNPS could be put.
PRIVATE:
Here is where a note to NCBI regarding processing that would
not be seen by the outside, could be put.
Note that these are is not exactly real SNPs, as
the data were modified.
||
SNP:WI|WIAF-1234567
SYNONYM:EST4291092,EST8291092,EST7291092
ACCESSION:H30533
LENGTH:101
5'_ASSAY:GGCAGGGAAGGAAAATCCTAGGGNCAGCATTGGGGAGGGGGGGACTCTG
OBSERVED:C/T
3'_ASSAY:TAAATTTATTGGGCAACAGGCTGCAGGTGAGGGGGCTGACAGGAGGAGGGA
||
SNP:WI|WIAF-1722
SYNONYM:STS-T17494,STS-T17494,STS-T17494
ACCESSION:T17494
LENGTH:269
5'_FLANK:CTTTCCCTCATCCCCTCTTCCACCACACCATCCCGGAACAAGTGCTCCAGGATT
5'_ASSAY:CCCTGCCCACTGGCCATTTTGGAGTGTGTCC
OBSERVED:A/T
3'_ASSAY:GTGGGTAGCAATGTGGAAACCACCAGGGCCTTTGTGGAGAAAA
3'_FLANK:TGGAGGGGGTTGAGGGAGTCCCAGGAGGGGCTTATTTGAGGGCCTTTGCCACTT
    GCTCATAGGCGAGCTCGATCTCCTCATCATCTGGACAGGTGGAAGCGAATTCTT
    CCCGGGCGTAGGCATTGCTCAAGTACCGAT
||
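As a starting point for such an output class, here is a minimal plain-Perl sketch of a hypothetical write_snpassay() function that emits the layout above from hashes. Field order is taken from the NCBI example; wrapping long FLANK/ASSAY sequences at 60 characters with a 4-space continuation indent is an assumption, and NCBI's how_to_submit page remains the authority on the exact rules:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Field order follows the NCBI SNPASSAY example; $WRAP is an assumed width.
my @HEADER_FIELDS = ('TYPE', 'HANDLE', 'BATCH', 'MOLTYPE', 'METHOD', 'SYN NAMES');
my @SNP_FIELDS    = ('SNP', 'SYNONYM', 'ACCESSION', 'LENGTH', "5'_FLANK",
                     "5'_ASSAY", 'OBSERVED', "3'_ASSAY", "3'_FLANK");
my $WRAP = 60;

# Format one TAG:value line; only FLANK/ASSAY sequences are wrapped.
sub format_field {
    my ($tag, $value) = @_;
    return "$tag:$value\n" unless $tag =~ /_(?:FLANK|ASSAY)$/;
    my @chunks = $value =~ /(.{1,$WRAP})/g;
    my $text = "$tag:" . shift(@chunks) . "\n";
    $text .= "    $_\n" for @chunks;
    return $text;
}

# Header fields, optional free-text COMMENT/PRIVATE blocks, then one
# "||"-terminated record per SNP. Undefined or empty fields are skipped.
sub write_snpassay {
    my ($header, $snps) = @_;
    my $out = '';
    for my $tag (@HEADER_FIELDS) {
        $out .= "$tag:$header->{$tag}\n" if defined $header->{$tag};
    }
    for my $tag ('COMMENT', 'PRIVATE') {
        $out .= "$tag:\n$header->{$tag}\n" if defined $header->{$tag};
    }
    $out .= "||\n";
    for my $snp (@$snps) {
        for my $tag (@SNP_FIELDS) {
            my $value = $snp->{$tag};
            $out .= format_field($tag, $value) if defined $value && length $value;
        }
        $out .= "||\n";
    }
    return $out;
}

print write_snpassay(
    { TYPE => 'SNPASSAY', HANDLE => 'WI', BATCH => '1.98',
      MOLTYPE => 'Genomic', METHOD => 'RESEQ' },
    [ { SNP        => 'WI|WIAF-1234567',
        ACCESSION  => 'H30533',
        LENGTH     => 101,
        "5'_ASSAY" => 'GGCAGGGAAGGAAAATCCTAGGGNCAGCATTGGGGAGGGGGGGACTCTG',
        OBSERVED   => 'C/T',
        "3'_ASSAY" => 'TAAATTTATTGGGCAACAGGCTGCAGGTGAGGGGGCTGACAGGAGGAGGGA' } ],
);
```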


Regards,
Marian

P.S. this is not in contradiction to my first request about the brackets 
notation. We need both formats.


From cjfields at uiuc.edu  Tue Feb  6 11:45:36 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 6 Feb 2007 10:45:36 -0600
Subject: [Bioperl-l] RNA folding
In-Reply-To: <Pine.LNX.4.58.0702061143420.4383@localhost.localdomain>
References: <Pine.LNX.4.58.0702061143420.4383@localhost.localdomain>
Message-ID: <C7C6DC30-ACBB-4626-A0D7-CC0248AF847E@uiuc.edu>

Actually, the only RNA tool wrappers I have made are ones for ERPIN,  
RNAMotif, and Infernal (the only one in bioperl-run CVS at this time  
is RNAMotif).  I am planning on writing up wrappers for Vienna,  
UNAFold, and a few others but haven't really started in.  Here's  
where I'm at right now...

I am writing up a new set of AnnotationI classes which positionally  
describe data (Meta) which I hope will help deal with this stuff.   
These would be similar in nature to Heikki's Bio::Seq::Meta classes:

http://bioperl.org/pipermail/bioperl-l/2006-December/024414.html

I would use a regular Bio::SeqI and store the structural data and  
anything else (such as energy calculations, etc) as Annotation  
objects in an AnnotationCollection, and then write up a series of  
SeqIO modules to get data into/out of the designated structure  
formats, like UNAfold ct, RNAML, and so on.  Each sequence would then  
be capable of holding more than one structural Annotation (i.e. could  
represent different folding pathways, alternative RNA folds, and so on).

At this point I represent the data as an array of hashes where $array[0]
is nt 1 and the hash keys indicate the type of interaction, base
interacted with, etc.  The text representation would be the simple
Eddy WUSS (Rfam-like) format by default, which is capable of
representing some complex data (pseudoknots, for instance), is
compact, and is documented (via the Infernal manual).  Tags will
probably switch to more ontologically relevant terms (probably from
RNAML or RNA Ontology), but in general it is something like this:

[
  {'interaction' => 'WC',
    'base'  => 24},
  {'interaction' => 'WC',
    'base'  => 23},
  {'interaction' => 'SS'},
...
]

In this implementation every seq position would have some kind of  
interaction designation, though that's open for debate as it could  
just be simple text or undef for single-stranded regions.
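To make the proposed layout concrete, here is a small plain-Perl sketch that turns a dot-bracket string into that array of hashes, with element 0 describing nucleotide 1 and 'base' holding the 1-based partner position. It only handles '(', ')' and '.', so pseudoknot brackets and the other WUSS symbols are not covered, and every pair is tagged 'WC' as in the example above whether or not it is actually a canonical pair. dotbracket_to_structure() is a hypothetical name:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Convert a simple dot-bracket string into an array of hashes.
# Assumptions: only '(', ')' and '.' are accepted, and all pairs are 'WC'.
sub dotbracket_to_structure {
    my ($dotbracket) = @_;
    my @structure;
    my @open;    # stack of 1-based positions of unmatched '('
    my @chars = split //, $dotbracket;
    for my $i (0 .. $#chars) {
        my $pos = $i + 1;
        my $c   = $chars[$i];
        if ($c eq '(') {
            push @open, $pos;
        }
        elsif ($c eq ')') {
            die "Unbalanced ')' at position $pos\n" unless @open;
            my $partner = pop @open;
            $structure[$partner - 1] = { interaction => 'WC', base => $pos };
            $structure[$i]           = { interaction => 'WC', base => $partner };
        }
        elsif ($c eq '.') {
            $structure[$i] = { interaction => 'SS' };    # single-stranded
        }
        else {
            die "Unsupported character '$c' at position $pos\n";
        }
    }
    die "Unbalanced '(' at position(s) @open\n" if @open;
    return \@structure;
}

my $structure = dotbracket_to_structure('((..))');
for my $i (0 .. $#$structure) {
    my $entry = $structure->[$i];
    printf "nt %d: %s%s\n", $i + 1, $entry->{interaction},
        exists $entry->{base} ? " with $entry->{base}" : '';
}
```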

This is also scalable based on complexity of the data: if one wanted
to add tertiary/quaternary interactions, location, base modifications,
remote sequence interactions, etc., extra key/value pairs could be
used.  Conversely, if one only wanted secondary structure (for drawing RNA
structures, for example), then only that data would be parsed.

If you (or anyone listening) have any suggestions I would greatly  
appreciate them.

chris


From johnsonm at gmail.com  Tue Feb  6 18:53:49 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Tue, 6 Feb 2007 17:53:49 -0600
Subject: [Bioperl-l] Bio::Tools::Glimmer
In-Reply-To: <ebf5eb170609201247r12eb9475v5cb553e0727163c0@mail.gmail.com>
References: <ebf5eb170609190833r5fa138e9gc6c51346e5f7890c@mail.gmail.com>
	<451095C7.3020905@infotech.monash.edu.au>
	<0FBEEE2B-9641-4EFA-A5D2-0DA839556E4C@gmx.net>
	<ebf5eb170609201247r12eb9475v5cb553e0727163c0@mail.gmail.com>
Message-ID: <ebf5eb170702061553t29b0d1f2q402582abba032031@mail.gmail.com>

Okay, I need to get something going for a project I'm working on.  Options:

1) Stick it all in one module:  This can get a bit ugly, as Glimmer, as
opposed to GlimmerM and GlimmerHMM, does not explicitly identify itself in
the prediction report.  You can pick up on some unique things in the output
file, but you don't know what you've got until you're actually parsing it.
If you require a format argument up front, though, you can split the
parsing code up into different functions.
2) Two modules, one for GlimmerM/GlimmerHMM and one for Glimmer2/Glimmer3,
with or without an abstract dispatch front end.

I suppose at this point, after getting my hands dirty, I'd prefer 1), with
an explicit -format => Glimmer2/3/M/HMM arg required in the constructor.
Though I'm not opposed to 2) if that is what it takes to get it into
Bioperl.

If we can achieve some sort of consensus without too much bloodshed, I'll
shoot y'all some patches and we can consider this issue checked off the
list.
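Option 1 with a mandatory -format argument could look roughly like the following dispatch sketch. GlimmerFrontEnd and its _parse_* methods are hypothetical names, not the Bio::Tools::Glimmer API, and the per-format parsers are left as placeholders because each report layout needs its own code:

```perl
#!/usr/bin/perl
use strict;
use warnings;

package GlimmerFrontEnd;

# Map each supported flavour to the method that parses its report.
my %PARSER_FOR = (
    glimmer2   => '_parse_glimmer2',
    glimmer3   => '_parse_glimmer3',
    glimmerm   => '_parse_glimmerm',
    glimmerhmm => '_parse_glimmerhmm',
);

# The format must be given up front, since the reports do not all
# identify which program produced them.
sub new {
    my ($class, %args) = @_;
    my $format = lc($args{'-format'} || '');
    die "A -format argument is required (one of: "
        . join(', ', sort keys %PARSER_FOR) . ")\n"
        unless exists $PARSER_FOR{$format};
    return bless { format => $format, file => $args{'-file'} }, $class;
}

# Name of the method chosen for this report format.
sub parser_method { $PARSER_FOR{ $_[0]{format} } }

# Dispatch to the format-specific parser chosen in new().
sub next_prediction {
    my ($self) = @_;
    my $method = $self->parser_method;
    return $self->$method();
}

# Placeholders: each flavour's parsing would live in its own method.
sub _parse_glimmer2   { return }
sub _parse_glimmer3   { return }
sub _parse_glimmerm   { return }
sub _parse_glimmerhmm { return }

package main;

my $parser = GlimmerFrontEnd->new(-format => 'GlimmerHMM');
print $parser->parser_method, "\n";    # prints "_parse_glimmerhmm"
```

Format guessing could later be added in new() as a fallback without changing the per-format methods.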

On 9/20/06, Mark Johnson <johnsonm at gmail.com> wrote:
>
>     I think it's going to be at least two modules, one for the
> prokaryotic stuff and one for the eukaryotic.  And really, the
> prokaryotic stuff is different enough to warrant two modules. So three
> different parsers.  Could do it in one, but it would be ugly and
> nasty.  However, this does not preclude three parsers and one abstract
> interface, which is your excellent suggestion.
>     Oh, and excuse me, but I have a bit of a rant here, after dealing
> with parsers and pipelines for the last few months.  Parsers should
> not load the whole input file into RAM to parse it.  And Pipelines
> using the parsers (Ensembl / biopipe) should not stuff the whole
> result set from the parser into a single array.  When you're trying to
> annotate assemblies, it sucks to have to split up contigs/supercontigs
> because the whole result set won't fit into RAM on a 12 gig blade.
> Sheesh.  Though this doesn't matter for bacterial genomes, as they're
> tiny (by comparison to vertebrates).  There, sorry, been saving up
> that frustration for a while.  No offense meant, hope I didn't tick
> anybody off.  8)
>     Torsten:  You sound like you know what you're doing with respect
> to Bioperl more than I do, and I know I don't have CVS access, so I'll
> defer to you.  I'd be happy to help out, though.
>
>
> On 9/20/06, Hilmar Lapp <hlapp at gmx.net> wrote:
> >
> > On Sep 19, 2006, at 9:13 PM, Torsten Seemann wrote:
> >
> > > I'm not sure whether to
> > >
> > > 1. parse them all under the same module, perhaps with a
> > > -format=>'glimmerXXX' parameter
> > >
> > > 2. create a single new module  Glimmer2 and Glimmer3
> > >
> > > 3. create two new modules, one for Glimmer2 and one for Glimmer3,
> > > given
> > > they are different outputs both in syntax and number of output files
> > >
> > > Any advice from Bioperl 'old timers' appreciated ;-)
> > >
> >
> > If at all possible I'd favor 1), with e.g. Bio::Tools::GFF being an
> > example for how this can work.
> >
> > If this would amount to basically 4 modules stringed together into
> > one file (because the parsing code can't share much if anything
> > between the flavors), it'd still be advantageous to have a single
> > frontend module that would then dispatch.
> >
> >         -hilmar
> >
> > --
> > ===========================================================
> > : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> > ===========================================================
> >
> >
> >
> >
> >
> >
>

From jason at bioperl.org  Tue Feb  6 19:33:11 2007
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 6 Feb 2007 16:33:11 -0800
Subject: [Bioperl-l] Bio::Tools::Glimmer
In-Reply-To: <ebf5eb170702061553t29b0d1f2q402582abba032031@mail.gmail.com>
References: <ebf5eb170609190833r5fa138e9gc6c51346e5f7890c@mail.gmail.com>
	<451095C7.3020905@infotech.monash.edu.au>
	<0FBEEE2B-9641-4EFA-A5D2-0DA839556E4C@gmx.net>
	<ebf5eb170609201247r12eb9475v5cb553e0727163c0@mail.gmail.com>
	<ebf5eb170702061553t29b0d1f2q402582abba032031@mail.gmail.com>
Message-ID: <20F07AFE-BADF-4BA5-83B1-C99EEBE915C2@bioperl.org>

I definitely vote for 1) - worst case you have 4 separate methods if  
there is no good way to condense the parsing for each format and  
require the user to specify the format.

I have no problem with requiring user to specify what program she  
used - if we can be fancy and guess the format later (i.e. guess  
format in SeqIO) -then that's icing.

-jason
On Feb 6, 2007, at 3:53 PM, Mark Johnson wrote:

> Okay, I need to get something going for a project I'm working on.
> Options:
>
> 1) Stick it all in one module:  This can get a bit ugly, as Glimmer, as
> opposed to GlimmerM and GlimmerHMM, does not explicitly identify itself in
> the prediction report.  You can pick up on some unique things in the output
> file, but you don't know what you've got until you're actually parsing it.
> Unless you require a format argument up front, then you can split the
> parsing code up into different functions.
> 2) Two modules, one for GlimmerM/GlimmerHMM and one for Glimmer2/Glimmer3.
> With or without an abstract dispatch front end.
>
> I suppose at this point, after getting my hands dirty, I'd prefer 1), with
> an explicit -format => Glimmer2/3/M/HMM arg required in the constructor.
> Though I'm not opposed to 2) if that is what it takes to get it into
> Bioperl.
>
> If we can achieve some sort of consensus without too much bloodshed, I'll
> shoot y'all some patches and we can consider this issue checked off the
> list.
>
> On 9/20/06, Mark Johnson <johnsonm at gmail.com> wrote:
>>
>>     I think it's going to be at least two modules, one for the
>> prokaryotic stuff and one for the eukaryotic.  And really, the
>> prokaryotic stuff is different enough to warrant two modules. So three
>> different parsers.  Could do it in one, but it would be ugly and
>> nasty.  However, this does not preclude three parsers and one abstract
>> interface, which is your excellent suggestion.
>>     Oh, and excuse me, but I have a bit of a rant here, after dealing
>> with parsers and pipelines for the last few months.  Parsers should
>> not load the whole input file into RAM to parse it.  And Pipelines
>> using the parsers (Ensembl / biopipe) should not stuff the whole
>> result set from the parser into a single array.  When you're trying to
>> annotate assemblies, it sucks to have to split up contigs/supercontigs
>> because the whole result set won't fit into RAM on a 12 gig blade.
>> Sheesh.  Though this doesn't matter for bacterial genomes, as they're
>> tiny (by comparison to vertebrates).  There, sorry, been saving up
>> that frustration for a while.  No offense meant, hope I didn't tick
>> anybody off.  8)
>>     Torsten:  You sound like you know what you're doing with respect
>> to Bioperl more than I do, and I know I don't have CVS access, so I'll
>> defer to you.  I'd be happy to help out, though.
>>
>>
>> On 9/20/06, Hilmar Lapp <hlapp at gmx.net> wrote:
>>>
>>> On Sep 19, 2006, at 9:13 PM, Torsten Seemann wrote:
>>>
>>>> I'm not sure whether to
>>>>
>>>> 1. parse them all under the same module, perhaps with a
>>>> -format=>'glimmerXXX' parameter
>>>>
>>>> 2. create a single new module  Glimmer2 and Glimmer3
>>>>
>>>> 3. create two new modules, one for Glimmer2 and one for Glimmer3,
>>>> given they are different outputs both in syntax and number of output
>>>> files
>>>>
>>>> Any advice from Bioperl 'old timers' appreciated ;-)
>>>>
>>>
>>> If at all possible I'd favor 1), with e.g. Bio::Tools::GFF being an
>>> example for how this can work.
>>>
>>> If this would amount to basically 4 modules stringed together into
>>> one file (because the parsing code can't share much if anything
>>> between the flavors), it'd still be advantageous to have a single
>>> frontend module that would then dispatch.
>>>
>>>         -hilmar
>>>
>>> --
>>> ===========================================================
>>> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
>>> ===========================================================
>>>
>>>
>>>
>>>
>>>
>>>
>>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
Miller Research Fellow
University of California, Berkeley
lab: 510.642.8441
http://pmb.berkeley.edu/~taylor/people/js.html
http://fungalgenomes.org/


From torsten.seemann at infotech.monash.edu.au  Tue Feb  6 21:36:54 2007
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Wed, 7 Feb 2007 13:36:54 +1100
Subject: [Bioperl-l] Bio::Tools::Glimmer
In-Reply-To: <20F07AFE-BADF-4BA5-83B1-C99EEBE915C2@bioperl.org>
References: <ebf5eb170609190833r5fa138e9gc6c51346e5f7890c@mail.gmail.com>
	<451095C7.3020905@infotech.monash.edu.au>
	<0FBEEE2B-9641-4EFA-A5D2-0DA839556E4C@gmx.net>
	<ebf5eb170609201247r12eb9475v5cb553e0727163c0@mail.gmail.com>
	<ebf5eb170702061553t29b0d1f2q402582abba032031@mail.gmail.com>
	<20F07AFE-BADF-4BA5-83B1-C99EEBE915C2@bioperl.org>
Message-ID: <a79f6a4b0702061836l7e63933bs3f065b773054c9c4@mail.gmail.com>

> I definitely vote for 1) - worst case you have 4 separate methods if
> there is no good way to condense the parsing for each format and
> require the user to specify the format.

And make the default -format whatever it currently parses, i.e.
GlimmerM/GlimmerHMM.

> I have no problem with requiring user to specify what program she
> used - if we can be fancy and guess the format later (i.e. guess
> format in SeqIO) -then that's icing.

Agreed.

>> Okay, I need to get something going for a project I'm working on.

I would normally try to help but I am so swamped with work-work at the
moment. Just a reminder that last year I added examples of the
different Glimmer outputs to the CVS repository:

./t/data/Glimmer3.predict
./t/data/Glimmer3.detail
./t/data/GlimmerHMM.out
./t/data/Glimmer2.out
./t/data/GlimmerM.out
./t/data/glimmer.out (this was the original one)

Thanks for taking this on!

--Torsten

From mitch_skinner at berkeley.edu  Tue Feb  6 23:37:35 2007
From: mitch_skinner at berkeley.edu (Mitch Skinner)
Date: Tue, 06 Feb 2007 20:37:35 -0800
Subject: [Bioperl-l] Bio::Graphics::Panel gridlines and pixels
Message-ID: <45C9578F.2060802@berkeley.edu>

Hello,

I'm working on an AJAX version of GBrowse (http://genome.biowiki.org), 
where we're pre-rendering entire chromosomes by breaking them up into 
tiles.  One of the problems we have is that it takes a long time to 
render all those tiles.  One of the things that's slowing the process 
down (and using lots of RAM) is rendering the gridlines, and it would 
make things a lot easier (and faster) for us if we could assume that the 
gridlines were the same for each tile.  Since we're only rendering at a 
particular set of zoom levels (that we have control over), I think this 
is a reasonable assumption.

Given the right set of zoom levels, the assumption works almost all the 
time, except for one specific case.  It has to do with the way draw_grid 
and map_pt in Bio::Graphics::Panel work for the very first gridline.

Here's how draw_grid (in CVS HEAD) computes the first gridline:

    my $first_tick = $minor * int($self->start/$minor);

$first_tick, $minor and $self->start are in base-pair space, which is 
1-based.  However, if ($self->start < $minor) then $first_tick is 0.  
This might not be a problem, except that $first_tick is translated into 
pixel coordinates in map_pt, which expects 1-based bp coordinates.  Here 
are the relevant lines in map_pt:

    my $val = $flip 
      ? int (0.5 + $pr - ($length - ($_- 1)) * $scale)
      : int (0.5 + ($_-$offset-1) * $scale);

This style of rounding only works for positive numbers; rounding 0.6 by 
doing int(0.5 + 0.6) gives you 1 as expected, but rounding -0.6 by doing 
int(0.5 + -0.6) gives you 0.  So if the first three gridlines are at 0, 
10, and 20 bp, then (assuming $scale is 1, $offset is 0, $flip evaluates 
false, and pad left is 0) they're drawn at pixels 0, 9, and 19.

I think that there should be gridlines at pixels 0, 10, and 20.  The 
fact that currently the first interval is 9 pixels and the second is 10 
pixels is breaking my hopeful assumption about the gridlines.
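To make the off-by-one concrete, here is a minimal stand-alone sketch of the unflipped branch of map_pt quoted above, under the same assumptions ($scale = 1, $offset = 0, pad_left = 0). The sub name naive_map_pt is hypothetical; it is not the Bio::Graphics::Panel method itself, only the rounding expression lifted out of it.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the unflipped branch of map_pt quoted above.
# bp coordinates are 1-based, so bp 1 maps to pixel 0.
sub naive_map_pt {
    my ($bp, $offset, $scale) = @_;
    # int() truncates toward zero, so adding 0.5 only rounds correctly
    # for non-negative values: int(0.5 + -1) == int(-0.5) == 0.
    return int(0.5 + ($bp - $offset - 1) * $scale);
}

# Gridlines produced by the original draw_grid when start < minor.
my @ticks  = (0, 10, 20);
my @pixels = map { naive_map_pt($_, 0, 1) } @ticks;
print "@pixels\n";    # 0 9 19
```

The first interval comes out 9 pixels wide and the second 10, which is the mismatch described above.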

AFAICT my problems are solved if we make two changes:
change the above line from draw_grid to this:
    my $first_tick = 1 + $minor * int(($start - 1)/$minor);
and change the lines from map_pt to this:

    my $val = $flip 
      ? ($pr - ($length - ($_- 1)) * $scale)
      : (($_-$offset-1) * $scale);
    $val = int($val + .5 * ($val <=> 0));
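A minimal sketch of the two proposed changes side by side, again covering only the unflipped branch with pad_left = 0. The sub names first_tick and symmetric_map_pt are hypothetical; the expressions inside them are the ones proposed above, with start passed in as a parameter.

```perl
use strict;
use warnings;

# Proposed draw_grid change: align the first tick in 1-based bp space,
# so ticks land on 1, 11, 21, ... instead of 0, 10, 20, ...
sub first_tick {
    my ($start, $minor) = @_;
    return 1 + $minor * int(($start - 1) / $minor);
}

# Proposed map_pt change (unflipped branch only): round half away from
# zero, so negative values round symmetrically with positive ones.
sub symmetric_map_pt {
    my ($bp, $offset, $scale) = @_;
    my $val = ($bp - $offset - 1) * $scale;
    return int($val + .5 * ($val <=> 0));
}

my $minor  = 10;
my $tick   = first_tick(1, $minor);
my @pixels = map { symmetric_map_pt($tick + $_ * $minor, 0, 1) } 0 .. 2;
print "@pixels\n";    # 0 10 20
```

With $scale = 0.5 the same rounding maps 0 bp to -1 px and 2000 bp to 1000 px.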

Does this make sense?  If people agree that these changes are right then 
I can also produce a proper patch if y'all would prefer that.

Regards,
Mitch


From lstein at cshl.edu  Wed Feb  7 07:17:22 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Wed, 7 Feb 2007 07:17:22 -0500
Subject: [Bioperl-l] Bio::Graphics::Panel gridlines and pixels
In-Reply-To: <45C9578F.2060802@berkeley.edu>
References: <45C9578F.2060802@berkeley.edu>
Message-ID: <6dce9a0b0702070417k692fedcchfac5b1f0e01f72bd@mail.gmail.com>

Hi Mitch,

Zero is not a forbidden coordinate, since gbrowse also works on genetic maps
which have negative and floating point coordinates. You've simply picked up
a boundary case where the rounding isn't working properly. I will fix this
now.

Lincoln


On 2/6/07, Mitch Skinner <mitch_skinner at berkeley.edu> wrote:
>
> Hello,
>
> I'm working on an AJAX version of GBrowse (http://genome.biowiki.org),
> where we're pre-rendering entire chromosomes by breaking them up into
> tiles.  One of the problems we have is that it takes a long time to
> render all those tiles.  One of the things that's slowing the process
> down (and using lots of RAM) is rendering the gridlines, and it would
> make things a lot easier (and faster) for us if we could assume that the
> gridlines were the same for each tile.  Since we're only rendering at a
> particular set of zoom levels (that we have control over), I think this
> is a reasonable assumption.
>
> Given the right set of zoom levels, the assumption works almost all the
> time, except for one specific case.  It has to do with the way draw_grid
> and map_pt in Bio::Graphics::Panel work for the very first gridline.
>
> Here's how draw_grid (in CVS HEAD) computes the first gridline:
>
>     my $first_tick = $minor * int($self->start/$minor);
>
> $first_tick, $minor and $self->start are in base-pair space, which is
> 1-based.  However, if ($self->start < $minor) then $first_tick is 0.
> This might not be a problem, except that $first_tick is translated into
> pixel coordinates in map_pt, which expects 1-based bp coordinates.  Here
> are the relevant lines in map_pt:
>
>     my $val = $flip
>       ? int (0.5 + $pr - ($length - ($_- 1)) * $scale)
>       : int (0.5 + ($_-$offset-1) * $scale);
>
> This style of rounding only works for positive numbers; rounding 0.6 by
> doing int(0.5 + 0.6) gives you 1 as expected, but rounding -0.6 by doing
> int(0.5 + -0.6) gives you 0.  So if the first three gridlines are at 0,
> 10, and 20 bp, then (assuming $scale is 1, $offset is 0, $flip evaluates
> false, and pad left is 0) they're drawn at pixels 0, 9, and 19.
>
> I think that there should be gridlines at pixels 0, 10, and 20.  The
> fact that currently the first interval is 9 pixels and the second is 10
> pixels is breaking my hopeful assumption about the gridlines.
>
> AFAICT my problems are solved if we make two changes:
> change the above line from draw_grid to this:
>     my $first_tick = 1 + $minor * int(($start - 1)/$minor);
> and change the lines from map_pt to this:
>
>     my $val = $flip
>       ? ($pr - ($length - ($_- 1)) * $scale)
>       : (($_-$offset-1) * $scale);
>     $val = int($val + .5 * ($val <=> 0));
>
> Does this make sense?  If people agree that these changes are right then
> I can also produce a proper patch if y'all would prefer that.
>
> Regards,
> Mitch
>
>


-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu

From lstein at cshl.edu  Wed Feb  7 07:18:40 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Wed, 7 Feb 2007 07:18:40 -0500
Subject: [Bioperl-l] Bio::Graphics::Panel gridlines and pixels
In-Reply-To: <45C9578F.2060802@berkeley.edu>
References: <45C9578F.2060802@berkeley.edu>
Message-ID: <6dce9a0b0702070418x139b4e5cu396bb629760a860c@mail.gmail.com>

However, I'm also very interested in why grid-drawing takes so long. When
I've profiled drawing, neither grid drawing nor map_pt() consume any
significant amount of time.

Lincoln

On 2/6/07, Mitch Skinner <mitch_skinner at berkeley.edu> wrote:
>
> Hello,
>
> I'm working on an AJAX version of GBrowse (http://genome.biowiki.org),
> where we're pre-rendering entire chromosomes by breaking them up into
> tiles.  One of the problems we have is that it takes a long time to
> render all those tiles.  One of the things that's slowing the process
> down (and using lots of RAM) is rendering the gridlines, and it would
> make things a lot easier (and faster) for us if we could assume that the
> gridlines were the same for each tile.  Since we're only rendering at a
> particular set of zoom levels (that we have control over), I think this
> is a reasonable assumption.
>
> Given the right set of zoom levels, the assumption works almost all the
> time, except for one specific case.  It has to do with the way draw_grid
> and map_pt in Bio::Graphics::Panel work for the very first gridline.
>
> Here's how draw_grid (in CVS HEAD) computes the first gridline:
>
>     my $first_tick = $minor * int($self->start/$minor);
>
> $first_tick, $minor and $self->start are in base-pair space, which is
> 1-based.  However, if ($self->start < $minor) then $first_tick is 0.
> This might not be a problem, except that $first_tick is translated into
> pixel coordinates in map_pt, which expects 1-based bp coordinates.  Here
> are the relevant lines in map_pt:
>
>     my $val = $flip
>       ? int (0.5 + $pr - ($length - ($_- 1)) * $scale)
>       : int (0.5 + ($_-$offset-1) * $scale);
>
> This style of rounding only works for positive numbers; rounding 0.6 by
> doing int(0.5 + 0.6) gives you 1 as expected, but rounding -0.6 by doing
> int(0.5 + -0.6) gives you 0.  So if the first three gridlines are at 0,
> 10, and 20 bp, then (assuming $scale is 1, $offset is 0, $flip evaluates
> false, and pad left is 0) they're drawn at pixels 0, 9, and 19.
>
> I think that there should be gridlines at pixels 0, 10, and 20.  The
> fact that currently the first interval is 9 pixels and the second is 10
> pixels is breaking my hopeful assumption about the gridlines.
>
> AFAICT my problems are solved if we make two changes:
> change the above line from draw_grid to this:
>     my $first_tick = 1 + $minor * int(($start - 1)/$minor);
> and change the lines from map_pt to this:
>
>     my $val = $flip
>       ? ($pr - ($length - ($_- 1)) * $scale)
>       : (($_-$offset-1) * $scale);
>     $val = int($val + .5 * ($val <=> 0));
>
> Does this make sense?  If people agree that these changes are right then
> I can also produce a proper patch if y'all would prefer that.
>
> Regards,
> Mitch
>
>



From johnsonm at gmail.com  Wed Feb  7 11:50:05 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Wed, 7 Feb 2007 10:50:05 -0600
Subject: [Bioperl-l] Bio::Tools::Glimmer
In-Reply-To: <20F07AFE-BADF-4BA5-83B1-C99EEBE915C2@bioperl.org>
References: <ebf5eb170609190833r5fa138e9gc6c51346e5f7890c@mail.gmail.com>
	<451095C7.3020905@infotech.monash.edu.au>
	<0FBEEE2B-9641-4EFA-A5D2-0DA839556E4C@gmx.net>
	<ebf5eb170609201247r12eb9475v5cb553e0727163c0@mail.gmail.com>
	<ebf5eb170702061553t29b0d1f2q402582abba032031@mail.gmail.com>
	<20F07AFE-BADF-4BA5-83B1-C99EEBE915C2@bioperl.org>
Message-ID: <ebf5eb170702070850k10f1d24fr2d93d61534eeaaff@mail.gmail.com>

    Well, each format has some unique features.  If the user declines to
specify the format, I can figure it out, but it will probably involve
scanning the input file twice.  I'll take a look.
    I can do all the parsing in one function; in fact I have, just to see
how nasty it would end up being.  I just can't stomach having the code that
tightly coupled and hard to read.  In the end it'll probably be three
functions, since GlimmerM/HMM are pretty close.  Maybe two: Glimmer2 and
Glimmer3 aren't *that* different, either.

On 2/6/07, Jason Stajich <jason at bioperl.org> wrote:
>
> I definitely vote for 1) - worst case you have 4 separate methods if there
> is no good way to condense the parsing for each format and require the user
> to specify the format.
>
> I have no problem with requiring user to specify what program she used -
> if we can be fancy and guess the format later (i.e. guess format in SeqIO)
> -then that's icing.
>
> -jason
>
>

From adsj at novozymes.com  Wed Feb  7 12:11:32 2007
From: adsj at novozymes.com (Adam Sjøgren)
Date: Wed, 07 Feb 2007 18:11:32 +0100
Subject: [Bioperl-l] Bioperl 1.4 to 1.5.2,
	adding qualifiers to Bio::Seq-objects
Message-ID: <8764adoptn.fsf@topper.koldfront.dk>

  Hi.


I am transitioning from Bioperl 1.4 to 1.5.2, and the qualifiers I add
to features in Bio::Seq objects have stopped appearing when I output
them as EMBL or GenBank-files.

Below is a test-script that exercises the problem.

I guess I should be doing something else when adding qualifiers, now
with 1.5.2 (as reading an EMBL-file with Bio::SeqIO and outputting it
again of course works perfectly), but I can't deduce what from perldoc
Bio::SeqFeature::Generic - it still lists the add_tag_value method,
and calling it neither croaks nor warns.

I have found some comments on this in the release notes of 1.5.0[1] on
the Bioperl wiki, but I must admit I wasn't able to extract what
methods I should be calling instead.

If someone could point me to the relevant documentation or tell me
what method to use instead, I would be happy as a clam.


  Best regards,

    Adam

== =
use Test::More tests=>2;

use strict;
use warnings;

use Bio::Seq;
use Bio::SeqFeature::Generic;
use IO::String;
use Bio::SeqIO;

my $seq=Bio::Seq->new(
                      -seq=>'actgactgactg',
                     );

$seq->display_id('D27');
$seq->accession_number('DB:D27');

my $seq_feature=Bio::SeqFeature::Generic->new(
                                              -strand=>1,
                                              -primary=>'source',
                                             );
$seq_feature->set_attributes(-start=>2, -end=>8);
$seq_feature->add_tag_value(note=>'TEST');
$seq_feature->add_tag_value(db_xref=>'DB:D27');

$seq->add_SeqFeature($seq_feature);

my $raw='';
my $fh=IO::String->new($raw);
my $out=Bio::SeqIO->new(-format=>'EMBL', -fh=>$fh);
$out->write_seq($seq);

ok($raw=~m!/note!, 'Qualifier note found');
ok($raw=~m!/db_xref!, 'Qualifier db_xref found');
== =


[1] <http://www.bioperl.org/wiki/Core_1.4.0_1.5.0_delta>

-- 
                                                          Adam Sjøgren
                                                    adsj at novozymes.com

From cjfields at uiuc.edu  Wed Feb  7 12:50:13 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 7 Feb 2007 11:50:13 -0600
Subject: [Bioperl-l] Bioperl 1.4 to 1.5.2,
	adding qualifiers to Bio::Seq-objects
In-Reply-To: <8764adoptn.fsf@topper.koldfront.dk>
References: <8764adoptn.fsf@topper.koldfront.dk>
Message-ID: <C350729C-3964-4685-A89C-D3E5C24A5114@uiuc.edu>


On Feb 7, 2007, at 11:11 AM, Adam Sjøgren wrote:

>   Hi.
>
>
> I am transitioning from Bioperl 1.4 to 1.5.2, and the qualifiers I add
> to features in Bio::Seq objects have stopped appearing when I output
> them as EMBL or GenBank-files.
>
> Below is a test-script that exercises the problem.
>
> I guess I should be doing something else when adding qualifiers, now
> with 1.5.2 (as reading an EMBL-file with Bio::SeqIO and outputting it
> again of course works perfectly), but I can't deduce what from perldoc
> Bio::SeqFeature::Generic - it still lists the add_tag_value method,
> and calling it doesn't croak nor warn.
>
> I have found some comments on this in the release notes of 1.5.0[1] on
> the Bioperl wiki, but I must admit I wasn't able to extract what
> methods I should be calling instead.
>
> If someone could point me to the relevant documentation or tell me
> what method to use instead, I would be happy as a clam.
>
>
>   Best regards,
>
>     Adam

...

This works for me using bioperl-live (Mac OS X):

ok 1 - Qualifier note found
ok 2 - Qualifier db_xref found

If I print the string I get:

ID   DB:D27; SV 1; linear; unassigned DNA; STD; UNC; 12 BP.
XX
AC   DB:D27;
XX
XX
FH   Key             Location/Qualifiers
FH
FT   source          2..8
FT                   /db_xref="DB:D27"
FT                   /note="TEST"
XX
SQ   Sequence 12 BP; 3 A; 3 C; 3 G; 3 T; 0 other;
     actgactgac tg                                                            12
//

GenBank also works:

LOCUS       D27                       12 bp    dna     linear   UNK
ACCESSION   DB:D27
FEATURES             Location/Qualifiers
      source          2..8
                      /db_xref="DB:D27"
                      /note="TEST"
BASE COUNT        3 a      3 c      3 g      3 t
ORIGIN
         1 actgactgac tg
//

If you haven't uninstalled 1.4, make sure you aren't running 1.4 or  
mixing the two versions (you can check by using 'perldoc -l  
Bio::Root::Root').
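perldoc -l reports a single location. To check for mixed installs, a minimal sketch like the following lists every copy of a module found on @INC (Perl loads the first one). The sub name module_copies is hypothetical, and the script only inspects the file system; it does not load BioPerl.

```perl
use strict;
use warnings;

# List every @INC directory holding a copy of the given module.
# More than one hit can mean an old BioPerl 1.4 install is shadowing
# (or mixing with) 1.5.2; the first path listed is the one Perl loads.
sub module_copies {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;
    # Skip code refs that some tools push onto @INC.
    return map { "$_/$file" } grep { !ref($_) && -e "$_/$file" } @INC;
}

my $module = @ARGV ? $ARGV[0] : 'Bio::Root::Root';
my @copies = module_copies($module);
print @copies ? join("\n", @copies) . "\n" : "$module not found in \@INC\n";
```

Run it as, for example, `perl which_copies.pl Bio::SeqIO` (the script name is arbitrary).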

chris

From cjfields at uiuc.edu  Wed Feb  7 13:04:33 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 7 Feb 2007 12:04:33 -0600
Subject: [Bioperl-l] Bio::Tools::Glimmer
In-Reply-To: <ebf5eb170702070850k10f1d24fr2d93d61534eeaaff@mail.gmail.com>
References: <ebf5eb170609190833r5fa138e9gc6c51346e5f7890c@mail.gmail.com>
	<451095C7.3020905@infotech.monash.edu.au>
	<0FBEEE2B-9641-4EFA-A5D2-0DA839556E4C@gmx.net>
	<ebf5eb170609201247r12eb9475v5cb553e0727163c0@mail.gmail.com>
	<ebf5eb170702061553t29b0d1f2q402582abba032031@mail.gmail.com>
	<20F07AFE-BADF-4BA5-83B1-C99EEBE915C2@bioperl.org>
	<ebf5eb170702070850k10f1d24fr2d93d61534eeaaff@mail.gmail.com>
Message-ID: <91A3A651-C0D5-495F-941F-05B8AA0DDA60@uiuc.edu>


On Feb 7, 2007, at 10:50 AM, Mark Johnson wrote:

>     Well, each format has some unique features.  If the user declines to
> specify the format, I can figure it out, but it will probably involve
> scanning the input file twice.  I'll take a look.
>     I can do all the parsing in one function, in fact I have, just to see
> how nasty it would end up being.  I just can't stomach having the code that
> tightly coupled and hard to read.  In the end it'll probably be three
> functions.  GlimmerM/HMM are pretty close.  Maybe two, Glimmer2 and
> Glimmer3 aren't *that* different, either.

I don't see a problem with passing off the parse to a defined class  
method either right off or mid-parse.  I'm doing something like this  
with a revamped GenBank parser:

# declare local to module

my %GLIMMER_METHODS = (
     'GlimmerHMM' => '_parsehmm',
     'Glimmer'    => '_parsenormal',
     # ...others if needed
     '_DEFAULT_'  => '_parseabnormal',
);

...

Then either preparse part of file using _readline() to determine
format, or use -format and bypass preparsing:

sub next_thingy {
    # ...
    if (!$format) {
        while (my $line = $self->_readline()) {
            if ($line =~ m{(something)}) {
                $format = $1; $self->_pushback($line); last;
            }
        }
    }
    my $method = exists $GLIMMER_METHODS{$format}
               ? $GLIMMER_METHODS{$format}
               : $GLIMMER_METHODS{'_DEFAULT_'};  # fallback to this one

    return $self->$method();  # hand off parsing flow to the proper parser
    # ...
}

# all parser variants would have this structure:

sub _parsehmm {
    my $self = shift;
    # ... init stuff here
    while (my $line = $self->_readline()) {
        # ... do stuff until END of next prediction/report
    }
    # ... return data if any
}
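For anyone who wants to try the dispatch idea in isolation, here is a self-contained sketch that runs without BioPerl. Everything in it is hypothetical: the package name, the in-memory line buffer standing in for Bio::Root::IO's _readline/_pushback, the format-sniffing regex (real Glimmer outputs would need their own markers), and the toy parser bodies. Pushing the sniffed line back means the file is only read once, even when no -format is given.

```perl
use strict;
use warnings;

package GlimmerDispatchSketch;    # hypothetical, not a BioPerl class

# Format name => parser method; '_DEFAULT_' is the fallback.
my %GLIMMER_METHODS = (
    'GlimmerHMM' => '_parsehmm',
    'Glimmer'    => '_parsenormal',
    '_DEFAULT_'  => '_parseabnormal',
);

sub new {
    my ($class, %args) = @_;
    return bless { lines  => $args{-lines} || [],
                   format => $args{-format},
                   buffer => [] }, $class;
}

# In-memory stand-ins for Bio::Root::IO's _readline/_pushback.
sub _readline {
    my $self = shift;
    return shift @{ $self->{buffer} } if @{ $self->{buffer} };
    return shift @{ $self->{lines} };
}
sub _pushback { my ($self, $line) = @_; unshift @{ $self->{buffer} }, $line }

sub next_prediction {
    my $self   = shift;
    my $format = $self->{format};
    unless ($format) {
        # Sniff the format; this marker regex is an assumption.
        while (defined(my $line = $self->_readline)) {
            if ($line =~ /^(GlimmerHMM|Glimmer)\b/) {
                $format = $1;
                $self->_pushback($line);    # re-read by the real parser
                last;
            }
        }
    }
    my $method = (defined $format && exists $GLIMMER_METHODS{$format})
               ? $GLIMMER_METHODS{$format}
               : $GLIMMER_METHODS{'_DEFAULT_'};
    return $self->$method();    # hand off to the format-specific parser
}

# Toy parsers: each just consumes one line and tags it.
sub _parsehmm      { my $self = shift; return 'hmm:' . $self->_readline }
sub _parsenormal   { my $self = shift; return 'normal:' . $self->_readline }
sub _parseabnormal { return 'default' }

package main;

my $p = GlimmerDispatchSketch->new(-lines => ['# comment', 'GlimmerHMM run']);
print $p->next_prediction, "\n";    # hmm:GlimmerHMM run
```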

chris

> On 2/6/07, Jason Stajich <jason at bioperl.org> wrote:
>>
>> I definitely vote for 1) - worst case you have 4 separate methods  
>> if there
>> is no good way to condense the parsing for each format and require  
>> the user
>> to specify the format.
>>
>> I have no problem with requiring user to specify what program she  
>> used -
>> if we can be fancy and guess the format later (i.e. guess format  
>> in SeqIO)
>> -then that's icing.
>>
>> -jason
>>
>>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From johnston at biochem.ucl.ac.uk  Wed Feb  7 13:56:52 2007
From: johnston at biochem.ucl.ac.uk (Caroline Johnston)
Date: Wed, 7 Feb 2007 18:56:52 +0000 (GMT)
Subject: [Bioperl-l] RNA folding
In-Reply-To: <C7C6DC30-ACBB-4626-A0D7-CC0248AF847E@uiuc.edu>
References: <Pine.LNX.4.58.0702061143420.4383@localhost.localdomain>
	<C7C6DC30-ACBB-4626-A0D7-CC0248AF847E@uiuc.edu>
Message-ID: <Pine.LNX.4.58.0702071535320.30225@localhost.localdomain>

Thanks Chris.

Storing the interaction data as a hash according to an ontology and using
an extended bracket notation as the string representation seems to make
sense, but I'm still unsure how this is supposed to be
attached to the Seq objects. You reckon it should be an AnnotationI?

I'm not sure I understand the distinction between annotations and
features. From the docs I got the impression that Features were like
annotation on bits of sequences and had a reference to the sequence to
which they belong, whereas annotations don't. If that's the case though,
why would RNA structure be an annotation rather than a feature? If not,
what is the distinction between them? Are the positional Annotation
subclasses you're developing intended to replace features? Have I got the
wrong end of the stick entirely?

Cheers,
Cass


On Tue, 6 Feb 2007, Chris Fields wrote:

> Actually, the only RNA tool wrappers I have made are ones for ERPIN,
> RNAMotif, and Infernal (the only one in bioperl-run CVS at this time
> is RNAMotif).  I am planning on writing up wrappers for Vienna,
> UNAFold, and a few others but haven't really started in.  Here's
> where I'm at right now...
>
> I am writing up a new set of AnnotationI classes which positionally
> describe data (Meta) which I hope will help deal with this stuff.
> These would be similar in nature to Heikki's Bio::Seq::Meta classes:
>
> http://bioperl.org/pipermail/bioperl-l/2006-December/024414.html
>
> I would use a regular Bio::SeqI and store the structural data and
> anything else (such as energy calculations, etc) as Annotation
> objects in an AnnotationCollection, and then write up a series of
> SeqIO modules to get data into/out of the designated structure
> formats, like UNAfold ct, RNAML, and so on.  Each sequence would then
> be capable of holding more than one structural Annotation (i.e. could
> represent different folding pathways, alternative RNA folds, and so on).
>
> At this point I represent the data as an array of hashes where $array[0]
> is nt 1 and the hash keys indicate the type of interaction, base
> interacted with, etc.  The text representation would be the simple
> Eddy WUSS (Rfam-like) format by default, which is capable of
> representing some complex data (pseudoknots, for instance), is
> compact, and is documented (via the Infernal manual).  Tags will
> probably switch to more ontologically relevant terms (probably from
> RNAML or RNA Ontology), but in general it is something like this:
>
> [
>   {'interaction' => 'WC',
>     'base'  => 24},
>   {'interaction' => 'WC',
>     'base'  => 23},
>   {'interaction' => 'SS'},
> ...
> ]
>
> In this implementation every seq position would have some kind of
> interaction designation, though that's open for debate as it could
> just be simple text or undef for single-stranded regions.
>
> This is also scalable based on complexity of the data: if one wanted
> to add tert/quaternary interactions, location, base modifications,
> remote sequence interactions, etc., extra key/value pairs could be
> used.  Conversely, if one only wanted sec structure (for drawing RNA
> structures, for example), then only that data would be parsed.
>
> If you (or anyone listening) have any suggestions I would greatly
> appreciate them.
>
> chris
>
>
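As an illustration of how the array-of-hashes layout described above could map onto a bracket string, here is a minimal sketch. The sub name to_dot_bracket is hypothetical. It assumes 'base' holds the 1-based partner position, and it emits only plain dot-bracket notation. Full WUSS, including the extra bracket types for pseudoknots, is not attempted.

```perl
use strict;
use warnings;

# Convert per-position interaction records (element 0 = nt 1) into a
# plain dot-bracket string. Positions without a 'base' partner print
# as '.'; a partner downstream opens a pair, one upstream closes it.
sub to_dot_bracket {
    my ($structure) = @_;
    my $out = '';
    for my $i (0 .. $#$structure) {
        my $pos     = $i + 1;
        my $partner = $structure->[$i]{base};
        if    (!defined $partner) { $out .= '.' }
        elsif ($partner > $pos)   { $out .= '(' }
        else                      { $out .= ')' }
    }
    return $out;
}

# A 6-nt toy hairpin: 1 pairs with 6, 2 pairs with 5, 3-4 unpaired.
my @structure = (
    { interaction => 'WC', base => 6 },
    { interaction => 'WC', base => 5 },
    { interaction => 'SS' },
    { interaction => 'SS' },
    { interaction => 'WC', base => 2 },
    { interaction => 'WC', base => 1 },
);
print to_dot_bracket(\@structure), "\n";    # ((..))
```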

From cjfields at uiuc.edu  Wed Feb  7 17:15:44 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 7 Feb 2007 16:15:44 -0600
Subject: [Bioperl-l] RNA folding
In-Reply-To: <Pine.LNX.4.58.0702071535320.30225@localhost.localdomain>
References: <Pine.LNX.4.58.0702061143420.4383@localhost.localdomain>
	<C7C6DC30-ACBB-4626-A0D7-CC0248AF847E@uiuc.edu>
	<Pine.LNX.4.58.0702071535320.30225@localhost.localdomain>
Message-ID: <7360B66F-6AF3-4CB1-8343-0A19E42AD7F8@uiuc.edu>


On Feb 7, 2007, at 12:56 PM, Caroline Johnston wrote:

> Thanks Chris.
>
> Storing the interaction data as a hash according to an ontology and using
> an extended bracket notation as the string representation seems to make
> sense, but I'm still unsure how this is supposed to be
> attached to the Seq objects. You reckon it should be an AnnotationI?

As long as it describes everything in the object and that there is a  
reasonable way of textually representing the data, I think you can  
attach anything as annotation.  A recent example is the addition of  
trees as annotation.  Also, Annotation can be used to describe  
alignments (such as the structure consensus string in Rfam  
alignments), or added to SeqFeatures.  The class just needs to  
implement AnnotatableI.

> I'm not sure I understand the distinction between annotations and
> features. From the docs I got the impression that Features were like
> annotation on bits of sequences and had a reference to the sequence to
> which they belong, whereas annotations don't. If that's the case though,
> why would RNA structure be an annotation rather than a feature? If not,
> what is the distinction between them? Are the positional Annotation
> subclasses you're developing intended to replace features? Have I got the
> wrong end of the stick entirely?
>
> Cheers,
> Cass

The key distinction between seqfeatures and annotations is that  
annotations are normally associated with the entire sequence record,  
while seqfeatures normally describe a part of the sequence (and thus  
have a location on the sequence).  There are a few exceptions, but in
general that's the case.  The HOWTO gives a bit more background:

http://www.bioperl.org/wiki/HOWTO:Feature-Annotation

Using annotations or seqfeatures in a case like this may be
completely dependent on one's point of view.  For instance, one
implementation I had considered was adding an interface to Bio::Seq
which would allow Seq objects to also have Bio::Structure objects,
since my view is that any sequence could (optionally) have a
structure associated with it.  However, I reasoned that a sequence
could actually have multiple structures (RNA, ssDNA, and protein can
have several alternative folds or different folding pathways, for
instance).  Instead of splitting up each structure into individual
seqfeatures (each of which would have to be tagged with the
relevant structure and score info), I could have one class encompass
all of that data in a reasonable way.  Hence I used Annotation.

BTW, this isn't meant to replace features in any way.  It would be  
primarily used to describe (1) a sequence as a whole, such as a tRNA  
sequence, (2) a seqfeature, such as a tRNA, rRNA, riboswitch, etc in  
a genome sequence, or (3) a conserved structure in an alignment, such  
as Rfam stockholm output.

I'll add that the option of splitting the data into seqfeatures isn't  
ruled out.  It would be a matter of using a helper method, maybe in  
SeqUtils or directly in Annotation::Meta or whatever I end up calling  
it.  I plan on adding something along those lines at some point.

chris


From mitch_skinner at berkeley.edu  Wed Feb  7 18:26:53 2007
From: mitch_skinner at berkeley.edu (Mitch Skinner)
Date: Wed, 07 Feb 2007 15:26:53 -0800
Subject: [Bioperl-l] Bio::Graphics::Panel gridlines and pixels
In-Reply-To: <6dce9a0b0702070417k692fedcchfac5b1f0e01f72bd@mail.gmail.com>
References: <45C9578F.2060802@berkeley.edu>
	<6dce9a0b0702070417k692fedcchfac5b1f0e01f72bd@mail.gmail.com>
Message-ID: <45CA603D.1070901@berkeley.edu>

Lincoln Stein wrote:
> Zero is not a forbidden coordinate, since gbrowse also works on 
> genetic maps which have negative and floating point coordinates. 
> You've simply picked up a boundary case where the rounding isn't 
> working properly. I will fix this now.
Thanks for the fix.  What do you think of the following case?  This is 
something I actually ran into.  Suppose you have the original
draw_grid:

    my $first_tick = $minor * int($self->start/$minor);

and my version of map_pt:

    my $val = $flip
      ? ($pr - ($length - ($_- 1)) * $scale)
      : (($_-$offset-1) * $scale);
    $val = int($val + .5 * ($val <=> 0));

and scale=0.5, offset=0, pad_left=0, flip=0, and minor=10.
Our tiles are currently 1000px wide.  So the first gridline will be at 
0bp => -1px and the 200th gridline will be at 2000bp => 1000px.  So the 
first tile will not have a gridline at its 0th pixel but the second 
tile will have one there.  Last night I was thinking that this was an 
artifact of having gridlines start at 0bp but now I'm thinking this is 
just because rounding half-pixels leaves an extra space when crossing 
zero.  Which is not unreasonable; it just invalidates the assumption I 
was hoping to make that the gridlines are the same for each tile.  Maybe 
it's just unreasonable to think that floating point calculations will 
give pixel-exact results.
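The boundary effect can be reproduced outside Bio::Graphics. This is a minimal sketch, assuming the modified map_pt above with flip=0, pad_left=0, scale=0.5 and offset=0. px_away copies its round-half-away-from-zero rule. px_up is a hypothetical round-half-up alternative (via POSIX::floor), not something Bio::Graphics does:

```perl
use strict;
use warnings;
use POSIX qw(floor);

# Settings from the message (flip=0 and pad_left=0, so those terms drop out).
my ($scale, $offset) = (0.5, 0);

# Same arithmetic as the modified map_pt: round half away from zero.
sub px_away {
    my ($bp) = @_;
    my $v = ($bp - $offset - 1) * $scale;
    return int($v + .5 * ($v <=> 0));
}

# Hypothetical alternative: round half up, i.e. toward +infinity.
sub px_up {
    my ($bp) = @_;
    my $v = ($bp - $offset - 1) * $scale;
    return floor($v + .5);
}

for my $bp (0, 10, 2000, 2010) {
    printf "%4d bp -> away: %5d px, up: %5d px\n", $bp, px_away($bp), px_up($bp);
}
```

Under half-away-from-zero rounding, the half-pixel bias changes sign at zero. As a result, 0 bp lands on -1 px while 2000 bp lands on 1000 px. Rounding half up applies the same bias everywhere, so at this scale a 2000 bp shift is always exactly a 1000 px shift, and aligned 1000 px tiles would see identical gridline columns.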

Or I may just be barking up the wrong tree entirely.  Perhaps it's time 
to reconsider at a higher level (see my next message).

Mitch

From mitch_skinner at berkeley.edu  Wed Feb  7 18:28:11 2007
From: mitch_skinner at berkeley.edu (Mitch Skinner)
Date: Wed, 07 Feb 2007 15:28:11 -0800
Subject: [Bioperl-l] Bio::Graphics::Panel gridlines and pixels
In-Reply-To: <6dce9a0b0702070418x139b4e5cu396bb629760a860c@mail.gmail.com>
References: <45C9578F.2060802@berkeley.edu>
	<6dce9a0b0702070418x139b4e5cu396bb629760a860c@mail.gmail.com>
Message-ID: <45CA608B.80907@berkeley.edu>

Lincoln Stein wrote:
> However, I'm also very interested in why grid-drawing takes so long. 
> When I've profiled drawing, neither grid drawing nor map_pt() consume 
> any significant amount of time.
Well, the approach that we've been taking is to hand 
Bio::Graphics::Panel a fake GD object that stores all of the graphical 
primitives (line, rectangle, filledRectangle, etc. + their parameters) 
and then draws them later in chunks (for each tile, we draw all the 
primitives that overlap its pixel boundaries).  We're doing this because 
trying to create a real GD object that's hundreds of millions of pixels 
wide takes too much RAM.  But storing all the gridlines (for a whole 
chromosome, at a high zoom level) also takes a lot of RAM, and getting 
the gridlines for the current tile and translating their coordinates 
into the coordinate space of the tile also takes a fair amount of CPU.  
The gridline hack I've been experimenting with (that prompted these 
emails) was motivated by the hope that the gridlines were regular enough 
that we wouldn't have to store them explicitly, but just draw the same 
gridlines over and over again.  It runs almost twice as fast as the 
version that explicitly stores the gridlines.
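A stripped-down version of the recording approach could look like the sketch below. It is hypothetical: the class name RecordingCanvas and the method ops_for_tile are made up. It records only line, rectangle and filledRectangle calls, which in GD::Image take (x1, y1, x2, y2, color). A real stand-in for a GD object would also need colorAllocate, string and similar methods:

```perl
use strict;
use warnings;

{
    package RecordingCanvas;
    use List::Util qw(min max);

    sub new { my ($class) = @_; return bless { ops => [] }, $class }

    # Install recorders for primitives that take (x1, y1, x2, y2, color).
    for my $name (qw(line rectangle filledRectangle)) {
        no strict 'refs';
        *{"RecordingCanvas::$name"} = sub {
            my ($self, @args) = @_;
            push @{ $self->{ops} }, [ $name, @args ];
        };
    }

    # Return only the primitives overlapping pixel columns
    # [$x0, $x0 + $width), with x shifted into tile coordinates.
    sub ops_for_tile {
        my ($self, $x0, $width) = @_;
        my @out;
        for my $op (@{ $self->{ops} }) {
            my ($name, $x1, $y1, $x2, $y2, $color) = @$op;
            next if max($x1, $x2) < $x0 || min($x1, $x2) >= $x0 + $width;
            push @out, [ $name, $x1 - $x0, $y1, $x2 - $x0, $y2, $color ];
        }
        return @out;
    }
}
```

Each op returned for a tile could then be replayed on a real tile-sized GD::Image by calling the method named in $op->[0] with the remaining elements as arguments. This linear scan over every stored op is exactly the CPU cost described above, which is why a smarter index or on-demand drawing helps.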

So the main slowdown is not in draw_grid or map_pt, but in our code 
that's storing/retrieving and translating the gridlines.  Which we are 
also looking into speeding up.  But the memory usage is harder to 
reduce; I've experimented with trying to compress the gridline data but 
it seems easier to just have the panel draw the grid directly.

The more I read the Panel code, the more I think it would be nice to 
make more use of it.  One of the reasons that we're trying to fool it 
right now is that there seem to be a number of behaviors in it (and/or 
in the glyphs?) that take the current image boundaries into account 
(drawing an arrow where a feature runs off the edge of the image, 
etc.).  But in our browser each tile is supposed to mesh seamlessly with 
its neighbor, so if there's an easy way to turn off those edge-aware 
behaviors that would be pretty interesting.

Ian has also suggested that it might be better to store less information 
than the full set of graphics primitives.  For example, we could just 
store the Panel's glyph boxes and use their (pixel bound)->feature 
information to decide which features need to be drawn for each tile.
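Ian's suggestion is essentially an interval query: given the glyph boxes and a tile's pixel range, find the features that touch the tile. This is a minimal sketch, assuming each box is an array ref [feature, x1, y1, x2, y2] with x1 <= x2. Panel's boxes() method returns similar per-glyph records, but check its documentation for the exact layout. The names build_box_index and features_in_tile are hypothetical. Sorting by x1 and recording the widest box lets a binary search skip boxes that end before the tile:

```perl
use strict;
use warnings;

# Sort boxes ([feature, x1, y1, x2, y2], x1 <= x2) by x1 and remember the
# widest one, so a tile query can binary-search instead of scanning all.
sub build_box_index {
    my @boxes = sort { $a->[1] <=> $b->[1] } @_;
    my $max_w = 0;
    for my $box (@boxes) {
        my $w = $box->[3] - $box->[1];
        $max_w = $w if $w > $max_w;
    }
    return { boxes => \@boxes, max_w => $max_w };
}

# Features whose boxes overlap pixel columns [$x0, $x0 + $width).
sub features_in_tile {
    my ($index, $x0, $width) = @_;
    my $boxes = $index->{boxes};

    # A box starting left of $x0 - max_w cannot reach the tile, so find
    # the first box with x1 >= $x0 - max_w.
    my ($lo, $hi) = (0, scalar @$boxes);
    my $min_x1 = $x0 - $index->{max_w};
    while ($lo < $hi) {
        my $mid = int(($lo + $hi) / 2);
        if ($boxes->[$mid][1] < $min_x1) { $lo = $mid + 1 } else { $hi = $mid }
    }

    my @hits;
    for my $i ($lo .. $#$boxes) {
        my $box = $boxes->[$i];
        last if $box->[1] >= $x0 + $width;          # starts right of the tile
        push @hits, $box->[0] if $box->[3] >= $x0;  # reaches into the tile
    }
    return @hits;
}
```

One very wide box (a whole-chromosome feature, say) inflates max_w and weakens the binary search. A proper interval tree avoids that, at the cost of more code.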

I'm going to be spending some time reading the Bio::Graphics code in 
more depth.  I'd also welcome suggestions from you or anyone on the list.

Thanks,
Mitch

From sdbrown at annular.org  Wed Feb  7 18:41:13 2007
From: sdbrown at annular.org (Steven Brown)
Date: Wed, 7 Feb 2007 15:41:13 -0800
Subject: [Bioperl-l] Bio::Restriction::Analysis cut site problem in 1.5.2
Message-ID: <3828F5B9-BFF8-47F4-8228-3BEA036513A0@annular.org>

The module seems to have trouble handling the cut-site specifiers  
that surround the sequence that the enzyme is specific for.  The error:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Bad end parameter (22). End must be less than the total length  
of sequence (total=6)
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/local/lib/perl5/site_perl/5.8.6/ 
Bio/Root/Root.pm:328
STACK: Bio::PrimarySeq::subseq /usr/local/lib/perl5/site_perl/5.8.6/ 
Bio/PrimarySeq.pm:371
STACK: Bio::Restriction::Analysis::_enzyme_sites /usr/local/lib/perl5/ 
site_perl/5.8.6/Bio/Restriction/Analysis.pm:884
STACK: Bio::Restriction::Analysis::_cuts /usr/local/lib/perl5/ 
site_perl/5.8.6/Bio/Restriction/Analysis.pm:785
STACK: Bio::Restriction::Analysis::cut /usr/local/lib/perl5/site_perl/ 
5.8.6/Bio/Restriction/Analysis.pm:369
STACK: Bio::Restriction::Analysis::cutters /usr/local/lib/perl5/ 
site_perl/5.8.6/Bio/Restriction/Analysis.pm:678
---snip (my script line)---
-----------------------------------------------------------

The offending enzyme:

---snip---
<1>AcuI
<2>Eco57I,Bsp6II,BspD6II,BspKT5I,Eco112I,Eco125I,FsfI
<3>CTGAAG(16/14)
---snip---

If I get rid of the (16/14) the error disappears and the right  
sequence site is matched.  It seems like maybe a decision was made  
not to analyze enzymes with remote cut positions, or the code wouldn't  
throw the error...?  Any information on this would be helpful.
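The failing case can be illustrated in plain Perl. The sketch below is hypothetical and is not Bio::Restriction::Analysis code. parse_site splits a REBASE-style site such as CTGAAG(16/14) into the recognition sequence and cut offsets. It assumes, as in REBASE notation, that 16 counts bases 3' of the last base of the site on the top strand. find_cuts then reports, for each forward-strand match, whether that cut falls inside the sequence. For the 6 bp sequence CTGAAG alone, the site ends at base 6 and the cut would follow base 6 + 16 = 22, which matches the two numbers in the exception above:

```perl
use strict;
use warnings;

# Split "CTGAAG(16/14)" into ("CTGAAG", 16, 14); offsets are undef
# when the site has no parenthesised cut specification.
sub parse_site {
    my ($site) = @_;
    my ($rec, $top, $bottom) = $site =~ /^([A-Z]+)(?:\((-?\d+)\/(-?\d+)\))?$/
        or die "Unrecognised site: $site\n";
    return ($rec, $top, $bottom);
}

# Forward strand only: a non-palindromic site such as AcuI's would also
# need a scan of the reverse complement.
sub find_cuts {
    my ($dna, $site) = @_;
    my ($rec, $top) = parse_site($site);
    my @hits;
    while ($dna =~ /(?=\Q$rec\E)/g) {              # allows overlapping matches
        my $start = pos($dna) + 1;                  # 1-based first base of site
        my $end   = $start + length($rec) - 1;      # 1-based last base of site
        my $cut   = defined $top ? $end + $top : undef;   # cut after this base
        push @hits, {
            start    => $start,
            cut      => $cut,
            in_range => (!defined($cut) || $cut < length($dna)) ? 1 : 0,
        };
    }
    return @hits;
}

for my $hit (find_cuts('CTGAAG', 'CTGAAG(16/14)')) {
    printf "site at %d, top-strand cut after base %d, inside sequence: %s\n",
        $hit->{start}, $hit->{cut}, $hit->{in_range} ? 'yes' : 'no';
}
```

Whether an out-of-range hit should be reported, skipped, or treated as an error is the open design question raised in the reply below. The in_range flag simply makes that decision explicit instead of letting subseq throw.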

Thanks,
Steve

From adsj at novozymes.com  Thu Feb  8 03:55:50 2007
From: adsj at novozymes.com (Adam Sjøgren)
Date: Thu, 08 Feb 2007 09:55:50 +0100
Subject: [Bioperl-l] Bioperl 1.4 to 1.5.2,
	adding qualifiers to Bio::Seq-objects
References: <8764adoptn.fsf@topper.koldfront.dk>
	<C350729C-3964-4685-A89C-D3E5C24A5114@uiuc.edu>
Message-ID: <87fy9hqb8p.fsf@topper.koldfront.dk>

On Wed, 7 Feb 2007 11:50:13 -0600, Chris wrote:

> This works for me using bioperl-live (Mac OS X):

> ok 1 - Qualifier note found
> ok 2 - Qualifier db_xref found

*slaps forehead*

Thanks for the test - your diagnosis was spot on:

> If you haven't uninstalled 1.4, make sure you aren't running 1.4 or  
> mixing the two versions (you can check by using 'perldoc -l  
> Bio::Root::Root').

I had a modified version of Bio::Seq and Bio::SeqFeature::Generic in
my @INC (added, and promptly forgotten, writing the patch mentioned
here: <http://article.gmane.org/gmane.comp.lang.perl.bio.general/13349/>).

Removing those and patching 1.5.2 fixed my self-inflicted problem.


  Thanks again!

     Adam

-- 
                                                          Adam Sjøgren
                                                    adsj at novozymes.com

From heikki at sanbi.ac.za  Thu Feb  8 04:39:47 2007
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Thu, 8 Feb 2007 11:39:47 +0200
Subject: [Bioperl-l] Bio::Restriction::Analysis cut site problem in 1.5.2
In-Reply-To: <3828F5B9-BFF8-47F4-8228-3BEA036513A0@annular.org>
References: <3828F5B9-BFF8-47F4-8228-3BEA036513A0@annular.org>
Message-ID: <200702081139.48125.heikki@sanbi.ac.za>

The error comes from Bio::PrimarySeq::subseq when it tries to cut beyond an 
existing sequence. Maybe your sequence has a restriction site that is near 
the end of your sequence?

This is a special case which has not been taken into account in the
Bio::Restriction::Analysis::_cuts method. 

The question is: should the site be detected if its cut site is not within 
the studied sequence?

Please submit a bugzilla bug, so this gets solved. I probably do not have time 
to tweak the code myself.

	-Heikki


On Thursday 08 February 2007 01:41:13 Steven Brown wrote:
> The module seems to have trouble handling the cut-site specifiers
> that surround the sequence that the enzyme is specific for.  The error:
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: Bad end parameter (22). End must be less than the total length
> of sequence (total=6)
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /usr/local/lib/perl5/site_perl/5.8.6/
> Bio/Root/Root.pm:328
> STACK: Bio::PrimarySeq::subseq /usr/local/lib/perl5/site_perl/5.8.6/
> Bio/PrimarySeq.pm:371
> STACK: Bio::Restriction::Analysis::_enzyme_sites /usr/local/lib/perl5/
> site_perl/5.8.6/Bio/Restriction/Analysis.pm:884
> STACK: Bio::Restriction::Analysis::_cuts /usr/local/lib/perl5/
> site_perl/5.8.6/Bio/Restriction/Analysis.pm:785
> STACK: Bio::Restriction::Analysis::cut /usr/local/lib/perl5/site_perl/
> 5.8.6/Bio/Restriction/Analysis.pm:369
> STACK: Bio::Restriction::Analysis::cutters /usr/local/lib/perl5/
> site_perl/5.8.6/Bio/Restriction/Analysis.pm:678
> ---snip (my script line)---
> -----------------------------------------------------------
>
> The offending enzyme:
>
> ---snip---
> <1>AcuI
> <2>Eco57I,Bsp6II,BspD6II,BspKT5I,Eco112I,Eco125I,FsfI
> <3>CTGAAG(16/14)
> ---snip---
>
> If I get rid of the (16/14) the error disappears and the right
> sequence site is matched.  It seems like maybe a decision was made
> not analyze enzymes with remote cut positions, or the code wouldn't
> throw the error...?  Any information on this would be helpful.
>
> Thanks,
> Steve
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________

From cjfields at uiuc.edu  Thu Feb  8 09:20:26 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 8 Feb 2007 08:20:26 -0600
Subject: [Bioperl-l] BLASTXML changes (good this time!)
Message-ID: <EB130F09-5670-4F3E-B481-60A700AEA38F@uiuc.edu>

All,

BLAST XML parsing should now work for any CPAN-based XML::SAX parser!

XML::SAX::PurePerl (comes with XML::SAX, the slowest)
XML::SAX::Expat
XML::SAX::ExpatXS (the fastest)
XML::LibXML::SAX
XML::LibXML::SAX::Parser

Grant McLean has updated XML::SAX on CPAN to fix an XML::SAX::PurePerl  
bug, so using that parser will necessitate an XML::SAX upgrade.  I  
also found a bug in the SAX handler that chopped off a large chunk  
of the hit descriptions; that is now fixed in CVS.

If Sendu is out there, I think we can safely remove any dependencies  
beyond XML::SAX 0.15 for the next release.  Should I go ahead and  
modify Build.PL?

chris


From lstein at cshl.edu  Thu Feb  8 10:51:49 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Thu, 8 Feb 2007 10:51:49 -0500
Subject: [Bioperl-l] Bio::Graphics::Panel gridlines and pixels
In-Reply-To: <45CA608B.80907@berkeley.edu>
References: <45C9578F.2060802@berkeley.edu>
	<6dce9a0b0702070418x139b4e5cu396bb629760a860c@mail.gmail.com>
	<45CA608B.80907@berkeley.edu>
Message-ID: <6dce9a0b0702080751m210e4d44k3e5c38bfdd3ee9ea@mail.gmail.com>

Hi,

I like the approach you're taking (creating a fake GD object that stores the
graphics primitives). Perhaps the best thing to do is to subclass Panel
itself so that it doesn't draw the gridlines (or turn gridlines off
completely). Then you can draw gridlines after the fact in each tile as
needed.
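Drawing gridlines per tile after the fact also removes the need to store them, because each tile's gridline columns can be computed from its own base-pair range. This is a minimal sketch under stated assumptions: no flip or left padding, tiles of tile_w pixels starting at pixel 0, gridlines every minor bp, and a round-half-up mapping px = floor((bp - offset - 1) * scale + 0.5), which is not the rounding Bio::Graphics itself uses. The name tile_grid_px is hypothetical:

```perl
use strict;
use warnings;
use POSIX qw(ceil floor);

# Pixel columns (relative to the tile) that get a gridline in tile $n.
# Assumes scale > 0.
sub tile_grid_px {
    my (%arg) = @_;
    my ($n, $tile_w, $scale, $offset, $minor) =
        @arg{qw(tile tile_w scale offset minor)};
    my $px0 = $n * $tile_w;

    # Invert the mapping to get a bp range slightly wider than the tile;
    # the exact pixel test below discards anything outside it.
    my $bp_lo = floor(($px0 - 0.5) / $scale) + $offset + 1;
    my $bp_hi = ceil(($px0 + $tile_w) / $scale) + $offset + 1;

    my @px;
    for (my $bp = $minor * ceil($bp_lo / $minor); $bp <= $bp_hi; $bp += $minor) {
        my $x = floor(($bp - $offset - 1) * $scale + 0.5) - $px0;
        push @px, $x if $x >= 0 && $x < $tile_w;
    }
    return @px;
}

my %view = (tile_w => 1000, scale => 0.5, offset => 0, minor => 10);
my @t0 = tile_grid_px(%view, tile => 0);
my @t1 = tile_grid_px(%view, tile => 1);
printf "tile 0: %d lines from px %d; tile 1: %d lines from px %d\n",
    scalar @t0, $t0[0], scalar @t1, $t1[0];
```

The work per tile is proportional to the number of gridlines in that tile. Nothing is kept between tiles, which addresses both the memory and the lookup cost described above.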

Lincoln

On 2/7/07, Mitch Skinner <mitch_skinner at berkeley.edu> wrote:
>
> Lincoln Stein wrote:
> > However, I'm also very interested in why grid-drawing takes so long.
> > When I've profiled drawing, neither grid drawing nor map_pt() consume
> > any significant amount of time.
> Well, the approach that we've been taking is to hand
> Bio::Graphics::Panel a fake GD object that stores all of the graphical
> primitives (line, rectangle, filledRectangle, etc. + their parameters)
> and then draws them later in chunks (for each tile, we draw all the
> primitives that overlap its pixel boundaries).  We're doing this because
> trying to create a real GD object that's hundreds of millions of pixels
> wide takes too much RAM.  But storing all the gridlines (for a whole
> chromosome, at a high zoom level) also takes a lot of RAM, and getting
> the gridlines for the current tile and translating their coordinates
> into the coordinate space of the tile also takes a fair amount of CPU.
> The gridline hack I've been experimenting with (that prompted these
> emails) was motivated by the hope that the gridlines were regular enough
> that we wouldn't have to store them explicitly, but just draw the same
> gridlines over and over again.  It runs almost twice as fast as the
> version that explicitly stores the gridlines.
>
> So the main slowdown is not in draw_grid or map_pt, but in our code
> that's storing/retrieving and translating the gridlines.  Which we are
> also looking into speeding up.  But the memory usage is harder to
> reduce; I've experimented with trying to compress the gridline data but
> it seems easier to just have the panel draw the grid directly.
>
> The more I read the Panel code, the more I think it would be nice to
> make more use of it.  One of the reasons that we're trying to fool it
> right now is that there seem to be a number of behaviors in it (and/or
> in the glyphs?) that take the current image boundaries into account
> (drawing an arrow where a feature runs off the edge of the image,
> etc.).  But in our browser each tile is supposed to mesh seamlessly with
> its neighbor, so if there's an easy way to turn off those edge-aware
> behaviors that would be pretty interesting.
>
> Ian has also suggested that it might be better to store less information
> than the full set of graphics primitives.  For example, we could just
> store the Panel's glyph boxes and use their (pixel bound)->feature
> information to decide which features need to be drawn for each tile.
>
> I'm going to be spending some time reading the Bio::Graphics code in
> more depth.  I'd also welcome suggestions from you or anyone on the list.
>
> Thanks,
> Mitch
>


-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu

From Kevin.M.Brown at asu.edu  Thu Feb  8 10:28:30 2007
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Thu, 8 Feb 2007 08:28:30 -0700
Subject: [Bioperl-l] Bio::Graphics::Panel gridlines and pixels
References: <45C9578F.2060802@berkeley.edu><6dce9a0b0702070418x139b4e5cu396bb629760a860c@mail.gmail.com>
	<45CA608B.80907@berkeley.edu>
Message-ID: <1A4207F8295607498283FE9E93B775B402AAC1D0@EX02.asurite.ad.asu.edu>

> The more I read the Panel code, the more I think it would be 
> nice to make more use of it.  One of the reasons that we're 
> trying to fool it right now is that there seem to be a number 
> of behaviors in it (and/or in the glyphs?) that take the 
> current image boundaries into account (drawing an arrow where 
> a feature runs off the edge of the image, etc.).  But in our 
> browser each tile is supposed to mesh seamlessly with its 
> neighbor, so if there's an easy way to turn off those 
> edge-aware behaviors that would be pretty interesting.

I think the glyphs try to deal with edges because if they didn't, then
they would flow out into whatever right or left padding had been placed
around the image when the panel was created.  Something I've noticed is
that when I create tiles for the chromosomes I'm working on, the panels
don't line up because the bump position in one panel is not accounted
for when the next panel is drawn.


From sheris at eps.berkeley.edu  Thu Feb  8 12:42:27 2007
From: sheris at eps.berkeley.edu (Sheri Simmons)
Date: Thu, 08 Feb 2007 09:42:27 -0800
Subject: [Bioperl-l] bioperl newbie needs help with extracting cds info
Message-ID: <7.0.1.0.2.20070208093840.0202da90@eps.berkeley.edu>

Hi,
I'm a newbie to BioPerl so apologies if this is a very basic 
question. I am trying to parse GenBank files with the goal of 
creating concatenated gene lists in nucleic acid or amino acid 
format. It is working fine, except for one thing: I need to create 
gene labels incorporating information on whether the gene is on the 
complementary strand or not ("complement" in the CDS tag). How can I 
get Bioperl to tell me whether the CDS tag value includes the word 
"complement"?

Thanks
Sheri


From george.heller at yahoo.com  Thu Feb  8 13:54:41 2007
From: george.heller at yahoo.com (George Heller)
Date: Thu, 8 Feb 2007 10:54:41 -0800 (PST)
Subject: [Bioperl-l] Perl script to extract from ncbi
Message-ID: <178139.85769.qm@web56506.mail.re3.yahoo.com>

Hi all, 
   
  I have a question regarding extracting data from NCBI. I have a database to store the sequence data, but the files I have loaded into it don't have a proper description line. Based on the accession number, I need to find out the genus and species name (organism name) from NCBI. 
   
  I have about 1500 records for which I need to extract the names from NCBI. 
   
  Any ideas of how I can go about writing a Perl script for extracting this information from NCBI?
   
  Thanks!
  George.

 

From Kevin.M.Brown at asu.edu  Thu Feb  8 14:11:50 2007
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Thu, 8 Feb 2007 12:11:50 -0700
Subject: [Bioperl-l] bioperl newbie needs help with extracting cds info
References: <7.0.1.0.2.20070208093840.0202da90@eps.berkeley.edu>
Message-ID: <1A4207F8295607498283FE9E93B775B402AAC29A@EX02.asurite.ad.asu.edu>

When you extract the features, just call the strand method on each
returned feature: it gives 1 for the forward strand and -1 for a
location written as complement(...).

my @features = $seq->all_SeqFeatures;
# keep only the CDS features and print each one's strand
for my $f (@features)
{
	my $tag = $f->primary_tag;
	if ($tag eq 'CDS')
	{
		print $f->strand . "\n";
	}
}

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> Sheri Simmons
> Sent: Thursday, February 08, 2007 10:42 AM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] bioperl newbie needs help with 
> extracting cds info
> 
> Hi,
> I'm a newbie to BioPerl so apologies if this is a very basic 
> question. I am trying to parse GenBank files with the goal of 
> creating concatenated gene lists in nucleic acid or amino 
> acid format. It is working fine, except for one thing: I need 
> to create gene labels incorporating information on whether 
> the gene is on the complementary strand or not ("complement" 
> in the CDS tag). How can I get Bioperl to tell me whether the 
> CDS tag value includes the word "complement"?
> 
> Thanks
> Sheri


From barry.moore at genetics.utah.edu  Thu Feb  8 14:35:03 2007
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Thu, 8 Feb 2007 12:35:03 -0700
Subject: [Bioperl-l] bioperl newbie needs help with extracting cds info
In-Reply-To: <7.0.1.0.2.20070208093840.0202da90@eps.berkeley.edu>
References: <7.0.1.0.2.20070208093840.0202da90@eps.berkeley.edu>
Message-ID: <E6200600-30F2-4471-9107-29A355F543F9@genetics.utah.edu>

Sheri-

The Bio::SeqFeature::Generic object has a 'strand' method, so you can  
just call strand on the CDS (or any other) feature like this.

   my @features = grep { $_->primary_tag eq 'CDS' } $seq->get_SeqFeatures();
   for my $feature (@features) {
       my $strand = $feature->strand;   # 1 = forward, -1 = complement
   }
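Because strand returns a number (1 for forward, -1 for a complement(...) location), the "complement" information in a gene label can be derived from that value alone. This is a minimal sketch. The helper gene_label and the GFF-style +/- symbols are hypothetical choices for illustration, not BioPerl API:

```perl
use strict;
use warnings;

# Map BioPerl-style strand values to GFF-style symbols.
my %STRAND_SYMBOL = (1 => '+', -1 => '-', 0 => '.');

# Hypothetical helper: "name(+)" for forward, "name(-)" for complement.
sub gene_label {
    my ($name, $strand) = @_;
    $strand = 0 unless defined $strand;    # treat an unset strand as unknown
    my $symbol = $STRAND_SYMBOL{$strand};
    die "Unexpected strand value '$strand'\n" unless defined $symbol;
    return "$name($symbol)";
}

print gene_label('dnaA', -1), "\n";    # prints dnaA(-)
```

Inside the loop above it might be called as gene_label($name, $feature->strand). Here $name could come from $feature->get_tag_values('gene') when $feature->has_tag('gene') is true, since not every CDS carries that qualifier.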

Barry

On Feb 8, 2007, at 10:42 AM, Sheri Simmons wrote:

> Hi,
> I'm a newbie to BioPerl so apologies if this is a very basic
> question. I am trying to parse GenBank files with the goal of
> creating concatenated gene lists in nucleic acid or amino acid
> format. It is working fine, except for one thing: I need to create
> gene labels incorporating information on whether the gene is on the
> complementary strand or not ("complement" in the CDS tag). How can I
> get Bioperl to tell me whether the CDS tag value includes the word
> "complement"?
>
> Thanks
> Sheri


From torsten.seemann at infotech.monash.edu.au  Thu Feb  8 23:18:33 2007
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Fri, 9 Feb 2007 15:18:33 +1100
Subject: [Bioperl-l] BLASTXML changes (good this time!)
In-Reply-To: <EB130F09-5670-4F3E-B481-60A700AEA38F@uiuc.edu>
References: <EB130F09-5670-4F3E-B481-60A700AEA38F@uiuc.edu>
Message-ID: <a79f6a4b0702082018n564073fcyb1b886b8fe185b90@mail.gmail.com>

Chris,

> BLAST XML parsing should now work for any CPAN-based XML::SAX parser!
> XML::SAX::PurePerl (comes with XML::SAX, the slowest)
> XML::SAX::Expat
> XML::SAX::ExpatXS (the fastest)
> XML::LibXML::SAX
> XML::LibXML::SAX::Parser

That's excellent news - thanks for all the work you have put in on
this one. I'm impressed.

This is a good opportunity to encourage people who use Bio::SearchIO
for BLAST parsing to switch to 'blastxml' format over 'blast'.
Although the latter is more human readable, it perennially requires
parser source changes to cope with the variations and new formatting
introduced with each new NCBI BLAST release. Best to use "-m 7" XML
format, and convert as appropriate using one of the
Bio::SearchIO::Writer:: classes.

--Torsten

From cjfields at uiuc.edu  Fri Feb  9 08:58:24 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 9 Feb 2007 07:58:24 -0600
Subject: [Bioperl-l] BLASTXML changes (good this time!)
In-Reply-To: <a79f6a4b0702082018n564073fcyb1b886b8fe185b90@mail.gmail.com>
References: <EB130F09-5670-4F3E-B481-60A700AEA38F@uiuc.edu>
	<a79f6a4b0702082018n564073fcyb1b886b8fe185b90@mail.gmail.com>
Message-ID: <4FC966A7-7A03-41D9-ABF7-6ACD888720FB@uiuc.edu>

On Feb 8, 2007, at 10:18 PM, Torsten Seemann wrote:

> Chris,
>
>> BLAST XML parsing should now work for any CPAN-based XML::SAX parser!
>> XML::SAX::PurePerl (comes with XML::SAX, the slowest)
>> XML::SAX::Expat
>> XML::SAX::ExpatXS (the fastest)
>> XML::LibXML::SAX
>> XML::LibXML::SAX::Parser
>
> That's excellent news - thanks for all the work you have put in on
> this one. I'm impressed.

Jason did most of the hard work; I just tinkered with it until it  
worked (and pestered a few perl XML guys along the way).  Thanks  
Grant and Björn!

> This is a good opportunity to encourage people who use Bio::SearchIO
> for BLAST parsing to switch to 'blastxml' format over 'blast'.
> Although the latter is more human readable, it perenially requires
> parser source changes to cope with the variations and new formatting
> introduced with each new NCBI BLAST release. Best to use "-m 7" XML
> format, and convert as appropriate using one of the
> Bio::Search::Writer:: classes.
>
> --Torsten

I'll try getting some benchmarks for the different parsers up today  
on the wiki if I have time.

Strangely enough, NCBI changed a few things about BLAST XML a few  
releases back w/o mentioning it to anyone (it was a silent bug in  
BLAST XML parsing which I fixed recently).  If you sent in multiple  
queries in older versions of BLAST you would get all of the BLAST XML  
reports concatenated together, which required preparsing the reports  
to carve up the XML prior to parsing.  Now they treat it like PSI- 
BLAST where multiple queries = multiple iterations, so you get one  
long XML BLAST report where each iteration=Result.

The current parser should handle both as it just caches the other  
results and returns them one at a time prior to new parses, but I  
wouldn't recommend parsing a huge BLAST XML file with hundreds of  
queries as you'll quickly run out of memory!
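One workaround for very large multi-query reports is to cut the file into one <Iteration> element at a time before handing each piece to a parser, so that only one query's results are in memory. This is a minimal sketch, assuming (as in NCBI's pretty-printed output) that each <Iteration> and </Iteration> tag starts its own line. The name iteration_reader is hypothetical. The chunks it returns lack the <BlastOutput> header fields, so they are not complete BLAST XML documents on their own:

```perl
use strict;
use warnings;

# Return a closure that yields the next <Iteration>...</Iteration> block
# as a string, or undef at end of input; only one block is held at once.
sub iteration_reader {
    my ($fh) = @_;
    return sub {
        my ($chunk, $inside) = ('', 0);
        while (my $line = <$fh>) {
            $inside = 1 if $line =~ m{^\s*<Iteration>};
            next unless $inside;
            $chunk .= $line;
            return $chunk if $line =~ m{^\s*</Iteration>};
        }
        return;    # no more iterations
    };
}
```

A caller would open the report, create the reader with iteration_reader($fh), and loop while defined(my $iter = $next->()), parsing or saving each chunk before reading the next.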

If they get Perl SAX2 up to date with Expat they'll eventually add  
parse_chunk() and pause_parse() for each parser.  Until then...

chris

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cuiw at ncbi.nlm.nih.gov  Fri Feb  9 09:20:10 2007
From: cuiw at ncbi.nlm.nih.gov (Cui, Wenwu (NIH/NLM/NCBI) [C])
Date: Fri, 9 Feb 2007 09:20:10 -0500
Subject: [Bioperl-l] Perl script to extract from ncbi
In-Reply-To: <178139.85769.qm@web56506.mail.re3.yahoo.com>
References: <178139.85769.qm@web56506.mail.re3.yahoo.com>
Message-ID: <18C407FD4FFB424292D769FBD68C1987020BBC58@NIHCESMLBX8.nih.gov>

This is an example for fetching two GenBank records
(id=124504630,110665734) in XML format. Organism names like
'<GBSeq_organism>Rattus norvegicus</GBSeq_organism>' can be parsed from
the XML. 

 
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=124504630,110665734&retmode=xml&rettype=gb

Or you can get TaxIds and translate them into real names:

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=nucleotide&id=124504630,110665734&retmode=xml
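For about 1500 records, the efetch approach mostly amounts to batching the IDs into a few URLs and pulling GBSeq_organism out of each response. This is a minimal sketch with hypothetical helper names. It only builds URLs and parses XML text, leaving the HTTP fetch to you (for example with LWP::Simple's get). The batch size is an arbitrary assumption, and the regex-based parsing assumes the GBSeq layout returned by the efetch URL above; it is not a general XML parser. NCBI's E-utilities usage guidelines on request rates are worth checking before looping over many batches:

```perl
use strict;
use warnings;

my $EFETCH = 'http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi';

# One efetch URL per batch of at most $batch IDs.
sub efetch_urls {
    my ($batch, @ids) = @_;
    my @urls;
    while (my @chunk = splice(@ids, 0, $batch)) {
        push @urls, "$EFETCH?db=nucleotide&id=" . join(',', @chunk)
                  . '&retmode=xml&rettype=gb';
    }
    return @urls;
}

# Extract [accession, organism] pairs from a GBSeq XML response.
sub organisms_from_gbseq {
    my ($xml) = @_;
    my @pairs;
    while ($xml =~ m{<GBSeq>(.*?)</GBSeq>}gs) {
        my $record = $1;
        my ($acc) = $record =~ m{<GBSeq_primary-accession>([^<]*)<};
        my ($org) = $record =~ m{<GBSeq_organism>([^<]*)<};
        push @pairs, [ $acc, $org ];
    }
    return @pairs;
}
```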

 
Wenwu Cui, PhD

 
-----Original Message-----
From: George Heller [mailto:george.heller at yahoo.com] 
Sent: Thursday, February 08, 2007 1:55 PM
To: bioperl-l at lists.open-bio.org
Subject: [Bioperl-l] Perl script to extract from ncbi

 
Hi all,

  I have a question regarding extracting data from Ncbi. I have a
database to store the sequence data, but the files I have loaded into
it, dont have a proper description line specified. Based on the
accession number, I need to find out what is the genus and species name
(organism name) from ncbi.

  I have about 1500 records for which I need to extract the names from
ncbi.

  Any ideas of how I can go about writing a perl script for extracting
this information from ncbi?

  Thanks!
  George.

 


From bosborne11 at verizon.net  Fri Feb  9 12:51:19 2007
From: bosborne11 at verizon.net (Brian Osborne)
Date: Fri, 09 Feb 2007 12:51:19 -0500
Subject: [Bioperl-l] Perl script to extract from ncbi
In-Reply-To: <178139.85769.qm@web56506.mail.re3.yahoo.com>
Message-ID: <C1F21EC7.CBAA%bosborne11@verizon.net>

George,

http://www.bioperl.org/wiki/HOWTO:Beginners#Retrieving_a_sequence_from_a_database

Brian O.


On 2/8/07 1:54 PM, "George Heller" <george.heller at yahoo.com> wrote:

> Hi all, 
>    
>   I have a question regarding extracting data from Ncbi. I have a database to
> store the sequence data, but the files I have loaded into it, dont have a
> proper description line specified. Based on the accession number, I need to
> find out what is the genus and species name (organism name) from ncbi.
>    
>   I have about 1500 records for which I need to extract the names from ncbi.
>    
>   Any ideas of how I can go about writing a perl script for extracting this
> information from ncbi?
>    
>   Thanks!
>   George.


From johnston at biochem.ucl.ac.uk  Fri Feb  9 14:23:41 2007
From: johnston at biochem.ucl.ac.uk (Caroline Johnston)
Date: Fri, 9 Feb 2007 19:23:41 +0000 (GMT)
Subject: [Bioperl-l] WrapperBase
Message-ID: <Pine.LNX.4.58.0702091905300.3237@localhost.localdomain>

Hi,

Could WrapperBase::executable warn you if it doesn't find the exe in
program_path? At the moment it just silently goes ahead and uses one in
the system path if it exists.

Cass.

I've never used diff, so not sure if this is right, but:

305,308c305,314
<        if( $prog_path && -e $prog_path && -x $prog_path ) {
<            $self->{'_pathtoexe'} = $prog_path;
<        } else {
<            my $exe;
---
>        if($prog_path){
>        if(-e $prog_path && -x $prog_path){
>          $self->{'_pathtoexe'} = $prog_path;
>        }
>        else{
>          $self->warn("executable not found in $prog_path, trying system path...") if $warn;
>        }
>        }
>        unless ($self->{'_pathtoexe'}){
>        my $exe;
335a342


From bix at sendu.me.uk  Fri Feb  9 17:38:59 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 09 Feb 2007 22:38:59 +0000
Subject: [Bioperl-l] WrapperBase
In-Reply-To: <Pine.LNX.4.58.0702091905300.3237@localhost.localdomain>
References: <Pine.LNX.4.58.0702091905300.3237@localhost.localdomain>
Message-ID: <45CCF803.9030004@sendu.me.uk>

Caroline Johnston wrote:
> Hi,
> 
> Could WrapperBase::executable warn you if it doesn't find the exe in
> program_path? At the moment it just silently goes ahead and uses one in
> the system path if it exists.

No, I think not. That would be very annoying when using wrappers for 
programs that you just have in your system path.

What specific problem are you encountering with the current behaviour?

From bix at sendu.me.uk  Fri Feb  9 17:40:33 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 09 Feb 2007 22:40:33 +0000
Subject: [Bioperl-l] BLASTXML changes (good this time!)
In-Reply-To: <EB130F09-5670-4F3E-B481-60A700AEA38F@uiuc.edu>
References: <EB130F09-5670-4F3E-B481-60A700AEA38F@uiuc.edu>
Message-ID: <45CCF861.8030000@sendu.me.uk>

Chris Fields wrote:
> If Sendu is out there, I think we can safely remove any dependencies  
> beyond XML::SAX 0.15 for the next release.  Should I go ahead and  
> modify Build.PL?

Sure, good to hear.

From cjfields at uiuc.edu  Fri Feb  9 22:42:24 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 9 Feb 2007 21:42:24 -0600
Subject: [Bioperl-l] BLASTXML changes (good this time!)
In-Reply-To: <45CCF861.8030000@sendu.me.uk>
References: <EB130F09-5670-4F3E-B481-60A700AEA38F@uiuc.edu>
	<45CCF861.8030000@sendu.me.uk>
Message-ID: <DF989651-7CC0-45A0-BA5E-EEEE88FDD40E@uiuc.edu>


On Feb 9, 2007, at 4:40 PM, Sendu Bala wrote:

> Chris Fields wrote:
>> If Sendu is out there, I think we can safely remove any dependencies
>> beyond XML::SAX 0.15 for the next release.  Should I go ahead and
>> modify Build.PL?
>
> Sure, good to hear.

I added a version dependency for XML::SAX (v. 0.15) for the PurePerl  
fix.  That likely obviates the need for a Bundle for XML::Simple.   
Not too pressing; we can determine that before the next release.

chris

From johnston at biochem.ucl.ac.uk  Sat Feb 10 11:27:53 2007
From: johnston at biochem.ucl.ac.uk (Caroline Johnston)
Date: Sat, 10 Feb 2007 16:27:53 +0000 (GMT)
Subject: [Bioperl-l] WrapperBase
In-Reply-To: <45CCF803.9030004@sendu.me.uk>
References: <Pine.LNX.4.58.0702091905300.3237@localhost.localdomain>
	<45CCF803.9030004@sendu.me.uk>
Message-ID: <Pine.LNX.4.58.0702101558180.16119@localhost.localdomain>

> No, I think not. That would be very annoying when using wrappers for
> programs that you just have in your system path.
>

Hmm, maybe I misunderstood what program_path was for? The executable
method goes straight to the system path unless program_path is set, so I
assumed you would only set program_path if you specifically wanted it to
look somewhere else. You wouldn't get a warning if you didn't specify a
program_path and just left it to look in the system path.

> What specific problem are you encountering with the current behaviour?

I have one version of an executable in /usr/local and another version,
the one I wanted to use, in my home directory.
The program_path method gets its path from an environment variable, which
was set to ~/.
I didn't realise I had the wrong permissions on my copy of the
executable, though, so it was silently skipping my version and using
the one in /usr/local instead.


Cass

From george.heller at yahoo.com  Sat Feb 10 15:35:18 2007
From: george.heller at yahoo.com (George Heller)
Date: Sat, 10 Feb 2007 12:35:18 -0800 (PST)
Subject: [Bioperl-l] Error while parsing
Message-ID: <162150.76282.qm@web56511.mail.re3.yahoo.com>

Hi all,
   
  I am parsing a few files (BLAST results), but I get the following error:
   
  ------------- EXCEPTION  -------------
MSG: Can't get HSPs: data not collected.
STACK Bio::Search::Hit::GenericHit::hsp /usr/lib/perl5/site_perl/5.8.5/Bio/Search/Hit/GenericHit.pm:649
STACK toplevel parser.pl:31
  --------------------------------------

  I am not sure if this is a bug or if I am doing something wrong. Any pointers are appreciated.
   
  Thanks!
  George.

 

From cjfields at uiuc.edu  Sat Feb 10 17:56:19 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 10 Feb 2007 16:56:19 -0600
Subject: [Bioperl-l] Error while parsing
In-Reply-To: <162150.76282.qm@web56511.mail.re3.yahoo.com>
References: <162150.76282.qm@web56511.mail.re3.yahoo.com>
Message-ID: <AF76B50B-AF7A-41D1-A64E-D993D8CC4C86@uiuc.edu>

On Feb 10, 2007, at 2:35 PM, George Heller wrote:

> Hi all,
>
>   I am in the process of parsing a few files, actually blast  
> results, but happen to get the following error:
>
>   ------------- EXCEPTION  -------------
> MSG: Can't get HSPs: data not collected.
> STACK Bio::Search::Hit::GenericHit::hsp /usr/lib/perl5/site_perl/ 
> 5.8.5/Bio/Search/Hit/GenericHit.pm:649
> STACK toplevel parser.pl:31
>   --------------------------------------
>
>   I am not sure if this is a bug, or is there something I am doing  
> wrong. Any pointers are appreciated.
>
>   Thanks!
>   George.

We'll need more to go on than that.  If the bioperl version is  
v1.5.2, please file a bug via the bioperl bugzilla:

http://bugzilla.open-bio.org/

Don't forget to attach a test file which triggers the bug using the  
'Create a new attachment' link after the report has been filed.

chris

From sac at bioperl.org  Sat Feb 10 22:56:10 2007
From: sac at bioperl.org (Steve Chervitz)
Date: Sat, 10 Feb 2007 19:56:10 -0800
Subject: [Bioperl-l] Error while parsing
In-Reply-To: <162150.76282.qm@web56511.mail.re3.yahoo.com>
References: <162150.76282.qm@web56511.mail.re3.yahoo.com>
Message-ID: <8f200b4c0702101956h53fea96dm241126c680d64ab4@mail.gmail.com>

Your report may be lacking HSP alignments for the hit you are attempting to
process. Note that by default, blast will report twice as many one-line
descriptions as it will alignments:

  -v  Number of database sequences to show one-line descriptions for (V) [Integer]
    default = 500
  -b  Number of database sequence to show alignments for (B) [Integer]
    default = 250

Verify that this isn't the case for your error. If not, go ahead and file a
bug report. Attach the report (zipped if big) as well as the relevant
portion of your processing script.

Steve

On 2/10/07, George Heller <george.heller at yahoo.com> wrote:
>
> Hi all,
>
>   I am in the process of parsing a few files, actually blast results, but
> happen to get the following error:
>
>   ------------- EXCEPTION  -------------
> MSG: Can't get HSPs: data not collected.
> STACK Bio::Search::Hit::GenericHit::hsp
> /usr/lib/perl5/site_perl/5.8.5/Bio/Search/Hit/GenericHit.pm:649
> STACK toplevel parser.pl:31
>   --------------------------------------
>
>   I am not sure if this is a bug, or is there something I am doing wrong.
> Any pointers are appreciated.
>
>   Thanks!
>   George.
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From jay at jays.net  Sun Feb 11 09:24:55 2007
From: jay at jays.net (Jay Hannah)
Date: Sun, 11 Feb 2007 08:24:55 -0600
Subject: [Bioperl-l] wiki: Database error when attempting to change
	preferences (1054: Unknown column 'user_newpass_time')
Message-ID: <9CEA9E3C-25A6-4DEF-902D-3FD997B81C9C@jays.net>

Just a heads-up --

I wanted to check the "E-mail me when a page I'm watching is changed"  
box in my preferences

http://www.bioperl.org/wiki/Special:Preferences

But I can't. Even if I change nothing and hit the Save button I get  
this:

----------
Database error
A database query syntax error has occurred. This may indicate a bug  
in the software. The last attempted database query was:

     (SQL query hidden)

from within function "User::saveSettings". MySQL returned error  
"1054: Unknown column 'user_newpass_time' in 'field list' (localhost)".
----------

(Yes, it literally says "(SQL query hidden)". That wasn't me for the  
purposes of this email. -grin-)

Thanks,

j
seqlab.net
http://www.bioperl.org/wiki/User:Jhannah


Username:	Jhannah
User ID:	51


From jay at jays.net  Sun Feb 11 10:16:13 2007
From: jay at jays.net (Jay Hannah)
Date: Sun, 11 Feb 2007 09:16:13 -0600
Subject: [Bioperl-l] wiki: Database error when attempting to change
	preferences (1054: Unknown column 'user_newpass_time')
In-Reply-To: <9CEA9E3C-25A6-4DEF-902D-3FD997B81C9C@jays.net>
References: <9CEA9E3C-25A6-4DEF-902D-3FD997B81C9C@jays.net>
Message-ID: <9B38AD87-3EB4-4658-A24B-ACC75F7F8298@jays.net>

Hmm.... The error appears to not be limited to changing preferences.  
I tried to update a couple different pages and got errors like this:

------
Database error
A database query syntax error has occurred. This may indicate a bug  
in the software. The last attempted database query was:

     (SQL query hidden)

from within function "Article::updateRedirectOn". MySQL returned  
error "1146: Table 'perlwikidb.redirect' doesn't exist (localhost)".
------

So all changes to the wiki aren't working right now?

j
seqlab.net
http://www.bioperl.org/wiki/User:Jhannah


From jason at bioperl.org  Sun Feb 11 15:18:15 2007
From: jason at bioperl.org (Jason Stajich)
Date: Sun, 11 Feb 2007 12:18:15 -0800
Subject: [Bioperl-l] wiki: Database error when attempting to change
	preferences (1054: Unknown column 'user_newpass_time')
In-Reply-To: <9B38AD87-3EB4-4658-A24B-ACC75F7F8298@jays.net>
References: <9CEA9E3C-25A6-4DEF-902D-3FD997B81C9C@jays.net>
	<9B38AD87-3EB4-4658-A24B-ACC75F7F8298@jays.net>
Message-ID: <3B80F494-925F-454A-8249-1565A4D50EF6@bioperl.org>

Should be fine now. I did an upgrade to MediaWiki 1.9 this weekend
and I think the upgrade script didn't finish.

In the future system support requests should go to support - AT -  
open-bio.org so we can track them.

-jason
On Feb 11, 2007, at 7:16 AM, Jay Hannah wrote:

> Hmm.... The error appears to not be limited to changing preferences.
> I tried to update a couple different pages and got errors like this:
>
> ------
> Database error
> A database query syntax error has occurred. This may indicate a bug
> in the software. The last attempted database query was:
>
>      (SQL query hidden)
>
> from within function "Article::updateRedirectOn". MySQL returned
> error "1146: Table 'perlwikidb.redirect' doesn't exist (localhost)".
> ------
>
> So all changes to the wiki aren't working right now?
>
> j
> seqlab.net
> http://www.bioperl.org/wiki/User:Jhannah
>

--
Jason Stajich
Miller Research Fellow
University of California, Berkeley
lab: 510.642.8441
http://pmb.berkeley.edu/~taylor/people/js.html
http://fungalgenomes.org/


From cjfields at uiuc.edu  Sun Feb 11 15:51:53 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 11 Feb 2007 14:51:53 -0600
Subject: [Bioperl-l] wiki: Database error when attempting to change
	preferences (1054: Unknown column 'user_newpass_time')
In-Reply-To: <3B80F494-925F-454A-8249-1565A4D50EF6@bioperl.org>
References: <9CEA9E3C-25A6-4DEF-902D-3FD997B81C9C@jays.net>
	<9B38AD87-3EB4-4658-A24B-ACC75F7F8298@jays.net>
	<3B80F494-925F-454A-8249-1565A4D50EF6@bioperl.org>
Message-ID: <E54012AF-CA57-4FA3-B70D-E135D66107ED@uiuc.edu>

Is there a good place on the main wiki page to prominently display  
this?  I wanted to place something at the top of the main page but I  
didn't know if we wanted to post the support email address on the  
page itself.

chris

On Feb 11, 2007, at 2:18 PM, Jason Stajich wrote:

> Should be fine now. I did an upgrade to MediaWiki 1.9 this weekend
> and I think the upgrade script didn't finish.
>
> In the future system support requests should go to support - AT -
> open-bio.org so we can track them.
>
> -jason
> On Feb 11, 2007, at 7:16 AM, Jay Hannah wrote:
>
>> Hmm.... The error appears to not be limited to changing preferences.
>> I tried to update a couple different pages and got errors like this:
>>
>> ------
>> Database error
>> A database query syntax error has occurred. This may indicate a bug
>> in the software. The last attempted database query was:
>>
>>      (SQL query hidden)
>>
>> from within function "Article::updateRedirectOn". MySQL returned
>> error "1146: Table 'perlwikidb.redirect' doesn't exist (localhost)".
>> ------
>>
>> So all changes to the wiki aren't working right now?
>>
>> j
>> seqlab.net
>> http://www.bioperl.org/wiki/User:Jhannah
>>
>
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From jay at jays.net  Sun Feb 11 15:56:53 2007
From: jay at jays.net (Jay Hannah)
Date: Sun, 11 Feb 2007 14:56:53 -0600
Subject: [Bioperl-l] wiki: Database error when attempting to change
	preferences (1054: Unknown column 'user_newpass_time')
In-Reply-To: <E54012AF-CA57-4FA3-B70D-E135D66107ED@uiuc.edu>
References: <9CEA9E3C-25A6-4DEF-902D-3FD997B81C9C@jays.net>
	<9B38AD87-3EB4-4658-A24B-ACC75F7F8298@jays.net>
	<3B80F494-925F-454A-8249-1565A4D50EF6@bioperl.org>
	<E54012AF-CA57-4FA3-B70D-E135D66107ED@uiuc.edu>
Message-ID: <CAF40EBD-F0E2-434C-91F4-2B766B20E734@jays.net>

On Feb 11, 2007, at 2:51 PM, Chris Fields wrote:
> Is there a good place on the main wiki page to prominently display  
> this?  I wanted to place something at the top of the main page but  
> I didn't know if we wanted to post the support email address on the  
> page itself.

I added it here:

http://www.bioperl.org/wiki/About_site

Which is linked from all pages via the left-hand bar:  community |  
About this site

j
seqlab.net
http://www.bioperl.org/wiki/User:Jhannah


From agd27 at cornell.edu  Sun Feb 11 12:47:03 2007
From: agd27 at cornell.edu (Adam Diehl)
Date: Sun, 11 Feb 2007 12:47:03 -0500
Subject: [Bioperl-l] Getting GFF output in UCSC-specific format
Message-ID: <45CF5697.60703@cornell.edu>

Good morning folks,

I've got sort of a newbie question regarding how to get gff's out of 
Bio::Tools::GFF objects that are formatted according to the UCSC browser 
conventions, described here:

http://genome.ucsc.edu/goldenPath/help/customTrack.html#GFF
(Ignore the custom track headers and what-not. I just need the fields to 
be set up according to the descriptions in 1 - 9).

The write_feature($feature) method isn't doing it for me, as I get lines 
like the following (newlines excepted):

chr1    EMBL/GenBank/SwissProt  gene    1712    2848    .       +       
.       db_xref=GeneID:4063728;gene=dnaN;locus_tag=MGAS10270_Spy0002
chr1    EMBL/GenBank/SwissProt  CDS     1712    2848    .       +       
.       
EC_number=2.7.7.7;codon_start=1;db_xref=GI:94989511,GeneID:4063728;gene=dnaN;locus_tag=MGAS10270_Spy0002;product=DNA+polymerase+III%2C+beta+chain;protein_
id=YP_597611.1;transl_table=11;translation=MIQFSINRTLFIHALNATKRAISTKNAIPILSSIKIEVTSTGVTLTGSNGQISIENTIPVSNENAGLLITSPGAILLEASFFINIISSLPDISINVKEIEQHQVVLTSGKSEITLKGKDVDQYPRLQEVSTENPLILKTKLLKSIIAETAFAASLQESRPILTGVHIVLSNHKDFKAVATDSHRMSQRLIT
LENTSADFDVVIPSKSLREFSAVFTDDIETVEVFFSPSQILFRSEHISFYTRLLEGNYPDTDRLLMTEFETEVVFNTQSLRHAMERAFLISNATQNGTVKLEITQNHISAHVNSPEVGKVNEDLDIVSQSGSDLTISFNPTYLIESLKAIKSETVKIHFLSPVRPFTLTPGDEEESFIQLITPVRTN

As you can see, field 8, which should be frame according to UCSC 
conventions, is blank, and field 9, group according to UCSC, has frame, 
along with ID, etc. All this extra stuff causes the UCSC browser to 
choke. First off, it can't identify which features are the same (it does 
this by matching the group field), and second, it can't interpret the 
CDS's into translated proteins because it lacks frame data.

Basically what I need to do is, for CDS features, extract frame (or 
codon_start, as it were), from the last field, parse out the integer 
value and store that in field 8 (as frame), then parse out locus_tag 
from the last field, clear out everything else and store the locus_tag 
only in that field (preferably without the qualifier locus_tag=). For 
feature type gene, I just want to do the last step, so that the gene and 
CDS features for the same feature have matching group fields that are as 
simple as possible. Let me know if this is not clear.
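The transformation described above can be sketched in plain Perl working on the text lines that write_feature produces. This is only an illustration: `to_ucsc_gff` is a hypothetical helper, it assumes tab-separated input with `key=value` pairs joined by `;` in column 9, and it maps GenBank's 1-based codon_start to GFF's 0-based frame (codon_start=1 becomes frame 0).

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite one GFF line so that column 8 holds the
# frame (derived from codon_start) and column 9 holds only the locus_tag.
sub to_ucsc_gff {
    my ($line) = @_;
    chomp $line;
    my @col = split /\t/, $line, 9;
    die "expected 9 tab-separated columns\n" unless @col == 9;

    # Column 9: key=value pairs separated by ';'. Splitting on a plain ';'
    # and then on the first '=' needs no escaping of '+', '/' or '%'.
    my %attr;
    for my $pair (split /;/, $col[8]) {
        next unless length $pair;
        my ($key, $value) = split /=/, $pair, 2;
        $attr{$key} = $value;
    }

    # GenBank codon_start is 1-based; GFF frame is 0-based.
    $col[7] = defined $attr{codon_start} ? $attr{codon_start} - 1 : '.';
    $col[8] = defined $attr{locus_tag}   ? $attr{locus_tag}       : '.';
    return join("\t", @col);
}
```

Gene lines have no codon_start, so they keep '.' as frame while getting the same locus_tag group as their CDS. Working on text like this breaks if a value ever contains a literal ';', which is one reason to prefer editing the feature objects' tags before writing.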

The way I've been trying to do this is by stringifying each gff object, 
splitting it into an array, @tmp1, splitting $tmp1[8] into @tmp2 with the 
following code:  my @tmp2 = split /;/, $tmp1[8]; and finally, trying to 
parse out the bits I need with regular expressions and store them back in 
$tmp1[n].  This does not work, because perl wants to interpret every 
/ + etc. as a metacharacter!
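Two small Perl idioms address the metacharacter trouble. The sketch assumes the '+' and '%2C' in the output above are URL-style escapes for a space and a comma (which is what they look like); `unescape_value` and `has_attribute_value` are hypothetical helper names. To match text containing regex metacharacters literally, Perl's `\Q...\E` (or `quotemeta`) does the escaping for you.

```perl
use strict;
use warnings;

# Hypothetical helper: undo URL-style escaping in a column-9 value,
# assuming '+' stands for a space and %XX for a hex-encoded character.
sub unescape_value {
    my ($value) = @_;
    $value =~ tr/+/ /;
    $value =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $value;
}

# Hypothetical helper: does column 9 contain exactly key=value?
# \Q...\E quotes '+', '.', '/' etc. so they are matched literally.
sub has_attribute_value {
    my ($column9, $key, $value) = @_;
    return $column9 =~ /(?:\A|;)\Q$key=$value\E(?:;|\z)/ ? 1 : 0;
}
```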

I am assuming there's a simple way to get at each value in the last 
field of the gff object using methods supplied by Bio::Tools::GFF, but 
the API docs seem a bit lacking in this area. Could anyone steer me 
towards what I need to know to do this? Please let me know if I can 
clarify any details!

Cheers,
Adam Diehl

From jason at bioperl.org  Sun Feb 11 18:29:16 2007
From: jason at bioperl.org (Jason Stajich)
Date: Sun, 11 Feb 2007 15:29:16 -0800
Subject: [Bioperl-l] Getting GFF output in UCSC-specific format
In-Reply-To: <45CF5697.60703@cornell.edu>
References: <45CF5697.60703@cornell.edu>
Message-ID: <F6B017A7-E91F-4739-9688-F1212EC857C8@bioperl.org>

I assume you are getting your features from a Bio::SeqIO parse of a  
Genbank file?

You get back Bio::SeqFeature::Generic objects, so you want to look  
at the docs for that module to see what the API is.
You will need to set the frame via
$feature->frame($frame)
You are going to have to determine the frame yourself if that isn't  
part of the feature; we don't calculate it for you.

For the 9th column, this is available through the tag methods  
has_tag, add_tag_value, get_tag_values, get_all_tags, and remove_tag,
so you can remove the tags you don't want through remove_tag (or  
remove them all, saving the locus_tag first):
my $locus;
for my $tag ( $feature->get_all_tags ) {
  if( $tag eq 'locus_tag' ) { # save the locus_tag when we see it
   ($locus) = $feature->get_tag_values($tag);
  }
  $feature->remove_tag($tag);
}

You will also want to set the GFF version when you create the  
Bio::Tools::GFF object. I think the UCSC site only supports GFF1. I  
don't know exactly how the tag is written out when it isn't paired  
as key=>value, but you'll need to set the tag to 'group', so
$feature->add_tag_value('group', $locus);

If this is all unsatisfactory, you can easily write your own GFF  
writer for your flavor of the data, e.g.
print join("\t",
                  $feature->seq_id,
                  $feature->source_tag,
                  $feature->primary_tag,
                  $feature->start,
                  $feature->end,
                  defined $feature->score ? $feature->score : '.',
                  $feature->strand > 0 ? '+' : $feature->strand < 0 ? '-' : '.',
                  defined $feature->frame ? $feature->frame : '.',
                  $locus), "\n";


-jason
On Feb 11, 2007, at 9:47 AM, Adam Diehl wrote:

> Good morning folks,
>
> I've got sort of a newbie question regarding how to get gff's out of
> Bio::Tools::GFF objects that are formatted according to the UCSC  
> browser
> conventions, described here:
>
> http://genome.ucsc.edu/goldenPath/help/customTrack.html#GFF
> (Ignore the custom track headers and what-not. I just need the  
> fields to
> be set up according to the descriptions in 1 - 9).
>
> The write_feature($feature) method isn't doing it for me, as I get  
> lines
> like the following (newlines excepted):
>
> chr1    EMBL/GenBank/SwissProt  gene    1712    2848    .       +
> .       db_xref=GeneID:4063728;gene=dnaN;locus_tag=MGAS10270_Spy0002
> chr1    EMBL/GenBank/SwissProt  CDS     1712    2848    .       +
> .
> EC_number=2.7.7.7;codon_start=1;db_xref=GI:94989511,GeneID: 
> 4063728;gene=dnaN;locus_tag=MGAS10270_Spy0002;product=DNA+polymerase 
> +III%2C+beta+chain;protein_
> id=YP_597611.1;transl_table=11;translation=MIQFSINRTLFIHALNATKRAISTKNA 
> IPILSSIKIEVTSTGVTLTGSNGQISIENTIPVSNENAGLLITSPGAILLEASFFINIISSLPDISINVK 
> EIEQHQVVLTSGKSEITLKGKDVDQYPRLQEVSTENPLILKTKLLKSIIAETAFAASLQESRPILTGVHI 
> VLSNHKDFKAVATDSHRMSQRLIT
> LENTSADFDVVIPSKSLREFSAVFTDDIETVEVFFSPSQILFRSEHISFYTRLLEGNYPDTDRLLMTEFE 
> TEVVFNTQSLRHAMERAFLISNATQNGTVKLEITQNHISAHVNSPEVGKVNEDLDIVSQSGSDLTISFNP 
> TYLIESLKAIKSETVKIHFLSPVRPFTLTPGDEEESFIQLITPVRTN
>
> As you can see, field 8, which should be frame according to UCSC
> conventions is blank, and field 9, group according to UCSC, has frame,
> along with ID, etc. All this extra stuff causes the UCSC browser to
> choke. First off, it can't identify which features are the same (it  
> does
> this by matching the group field), and second, it can't interpret the
> CDS's into translated proteins because it lacks frame data.
>
> Basically what I need to do is, for CDS features, extract frame (or
> codon_start, as it were), from the last field, parse out the integer
> value and store that in field 8 (as frame), then parse out locus_tag
> from the last field, clear out everything else and store the locus_tag
> only in that field (preferably without the qualifier locus_tag=). For
> feature type gene, I just want to do the last step, so that the  
> gene and
> CDS features for the same feature have matching group fields that  
> are as
> simple as possible. Let me know if this is not clear.
>
> The way I've been trying to do this is by stringifying each gff  
> object,
> splitting into an array, @tmp1, splitting @tmp1[8] into @tmp2 with the
> following code:  my @tmp2 = split /\;\, $tmp1[8]; and finally,  
> trying to
> parse out the bits I need with regular expressions and store back to
> @tmp1[n].  -- This does not work, because perl wants to interpret  
> every
> / + etc. as a metacharacter!
>
> I am assuming there's a simple way to get at each value in the last
> field of the gff object using methods supplied by Bio::Tools::GFF, but
> the API docs seem a bit lacking in this area. Could anyone steer me
> towards what I need to know to do this? Please let me know if I can
> clarify any details!
>
> Cheers,
> Adam Diehl

--
Jason Stajich
Miller Research Fellow
University of California, Berkeley
lab: 510.642.8441
http://pmb.berkeley.edu/~taylor/people/js.html
http://fungalgenomes.org/


From bix at sendu.me.uk  Sun Feb 11 18:39:15 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Sun, 11 Feb 2007 23:39:15 +0000
Subject: [Bioperl-l] WrapperBase
In-Reply-To: <Pine.LNX.4.58.0702101558180.16119@localhost.localdomain>
References: <Pine.LNX.4.58.0702091905300.3237@localhost.localdomain>	<45CCF803.9030004@sendu.me.uk>
	<Pine.LNX.4.58.0702101558180.16119@localhost.localdomain>
Message-ID: <45CFA923.8010201@sendu.me.uk>

Caroline Johnston wrote:
>> No, I think not. That would be very annoying when using wrappers for
>> programs that you just have in your system path.
> 
> Hmm, maybe I misunderstood what program_path was for? The executable
> method goes straight to the system path unless program_path is set, so I
> assumed you would only set program_path if you specifically wanted it to
> look somewhere else. You wouldn't get a warning if you didn't specify a
> program_path and just left it to look in the system path.

Yes, sorry. Having now actually looked at your patch it seems fine. I'll 
commit it unless someone beats me to it.

From flope004 at hotmail.com  Sun Feb 11 21:40:08 2007
From: flope004 at hotmail.com (Wolverine Fran)
Date: Mon, 12 Feb 2007 03:40:08 +0100
Subject: [Bioperl-l] TreeIO, how it works?
Message-ID: <BAY125-F39E82B0D8378C73702B91C8B910@phx.gbl>

Hi,

I have a problem. I don't understand how TreeIO reads the trees:
my input: ((dog:0.04,cat:0.08):0.12,(human:0.15,mouse:0.2));

An unrooted tree with 4 tips and 2 internal nodes.
When I ask for:
print "Total number of nodes ",$tree->number_nodes;

I get 6 but when I ask for:
foreach my $node (@nodes) {
	print $node->internal_id,",";
}
I get 6,0,1,2,3,4,5. Total 7.

The root is number 6 and 2 and 5 are my internal nodes.
If I set the root to be number 5 this node 6 is still present.
Why? What is node 6?

when I try the following:
  $node5 = $tree->find_node(-internal_id => '5');
  $node6 = $tree->find_node(-internal_id => '6');
  $node2 = $tree->find_node(-internal_id => '2');
  $distance1 = $tree->distance(-nodes =>[$node5,$node2]);
  $distance2 = $tree->distance(-nodes =>[$node5,$node6]);
  $distance3 = $tree->distance(-nodes =>[$node2,$node6]);
  or any other distance I get 2 warnings:
  -------------------- WARNING ---------------------
MSG: Must provide a valid array reference for -nodes
---------------------------------------------------

-------------------- WARNING ---------------------
MSG: Could not find distance!
---------------------------------------------------
What am I doing incorrectly?

I am practicing with AlignIO and TreeIO to calculate the maximum likelihood 
of a given tree, so any other information about that would be a great help. 
I am doing this to see how BioPerl can help me with more complex problems.

Thank you very much for your help!



From jason at bioperl.org  Sun Feb 11 22:05:18 2007
From: jason at bioperl.org (Jason Stajich)
Date: Sun, 11 Feb 2007 19:05:18 -0800
Subject: [Bioperl-l] TreeIO, how (does) it work?
In-Reply-To: <BAY125-F39E82B0D8378C73702B91C8B910@phx.gbl>
References: <BAY125-F39E82B0D8378C73702B91C8B910@phx.gbl>
Message-ID: <60D2DD3B-2ED1-4A92-A9FA-2875FEAA28CC@bioperl.org>


On Feb 11, 2007, at 6:40 PM, Wolverine Fran wrote:

> Hi,
>
> I have a problem. I don't understand how TreeIO reads the trees:
> my input: ((dog:0.04,cat:0.08):0.12,(human:0.15,mouse:0.2));
>
> An unrooted tree with 4 tips and 2 internal nodes.
> when I asked for:
> print "Total number of nodes ",$tree->number_nodes;
>
> I get 6 but when I ask for:
> foreach my $node (@nodes) {
> 	print $node->internal_id,",";
> }
> I get 6,0,1,2,3,4,5. Total 7.
>
> The root is number 6 and 2 and 5 are my internal nodes.
> If I set the root to be number 5 this node 6 is still present.
> Why? what is the node 6?

Node 6 is to hold the root or a fake root with a trifurcation for  
unrooted trees.  Did you actually call the reroot method to set the  
root to node 5?
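Why seven nodes: in Newick the outermost pair of parentheses is itself a node, so ((dog,cat),(human,mouse)) has four tips, two clade nodes, and one top-level node, even when the tree is meant to be unrooted. A minimal Newick reader illustrates the count. It is a sketch, not Bio::TreeIO, and assumes labels without whitespace or quoting; `parse_newick`, `count_nodes` and `count_leaves` are hypothetical names.

```perl
use strict;
use warnings;

# Minimal Newick reader: each node becomes a hash
# { name => ..., length => ..., children => [...] }.
sub parse_newick {
    my ($str) = @_;
    $str =~ s/\s+//g;            # assumes labels contain no whitespace
    $str =~ s/;\z//;             # drop the terminating semicolon
    my $pos  = 0;
    my $root = _parse_node(\$str, \$pos);
    die "unexpected text at position $pos\n" if $pos != length $str;
    return $root;
}

sub _parse_node {
    my ($s, $p) = @_;
    my %node = (children => []);
    if (substr($$s, $$p, 1) eq '(') {
        do {
            $$p++;               # skip '(' or ','
            push @{ $node{children} }, _parse_node($s, $p);
        } while (substr($$s, $$p, 1) eq ',');
        die "expected ')' at position $$p\n" unless substr($$s, $$p, 1) eq ')';
        $$p++;
    }
    # optional label, then optional ':length'
    if (substr($$s, $$p) =~ /\A([^(),:;]+)/) {
        $node{name} = $1;
        $$p += length($1);
    }
    if (substr($$s, $$p) =~ /\A:([-+0-9.eE]+)/) {
        $node{length} = $1;
        $$p += 1 + length($1);
    }
    return \%node;
}

# Every node counts, including the top-level node.
sub count_nodes {
    my ($node) = @_;
    my $n = 1;
    $n += count_nodes($_) for @{ $node->{children} };
    return $n;
}

sub count_leaves {
    my ($node) = @_;
    my @kids = @{ $node->{children} };
    return 1 unless @kids;
    my $n = 0;
    $n += count_leaves($_) for @kids;
    return $n;
}
```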

>
> when I try the following:
>   $node5 = $tree->find_node(-internal_id => '5');
>   $node6 = $tree->find_node(-internal_id => '6');
>   $node2 = $tree->find_node(-internal_id => '2');
>   $distance1 = $tree->distance(-nodes =>[$node5,$node2]);
>   $distance2 = $tree->distance(-nodes =>[$node5,$node6]);
>   $distance3 = $tree->distance(-nodes =>[$node2,$node6]);
>   or any other distance I get 2 warnings:
>   -------------------- WARNING ---------------------
> MSG: Must provide a valid array reference for -nodes
> ---------------------------------------------------
>
> -------------------- WARNING ---------------------
> MSG: Could not find distance!
> ---------------------------------------------------
> What am I doing incorrectly?
>
The distance method just sums the branch lengths on the path  
between two nodes.  Is that what you are trying to do?
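To make "summing branch lengths on the path" concrete, here is a plain-Perl sketch on a hand-built copy of the tree from the question. It does not use Bio::Tree; the labels 'A', 'B' and 'root' for the unnamed internal nodes are made up, and the (human,mouse) clade, which has no length in the Newick string, is assumed to have length 0.

```perl
use strict;
use warnings;

# ((dog:0.04,cat:0.08):0.12,(human:0.15,mouse:0.2));
# Each node maps to [parent, branch length to parent].
my %tree = (
    root  => [undef,  0],
    A     => ['root', 0.12],
    B     => ['root', 0],     # assumption: missing length treated as 0
    dog   => ['A',    0.04],
    cat   => ['A',    0.08],
    human => ['B',    0.15],
    mouse => ['B',    0.2],
);

# Distance from $node up to each of its ancestors (itself at 0).
sub ancestor_distances {
    my ($tree, $node) = @_;
    my %dist;
    my $acc = 0;
    while (defined $node) {
        $dist{$node} = $acc;
        my ($parent, $len) = @{ $tree->{$node} };
        $acc += $len;
        $node = $parent;
    }
    return \%dist;
}

# Walk up from $y until reaching an ancestor of $x (their lowest common
# ancestor), then add the two partial sums.
sub path_distance {
    my ($tree, $x, $y) = @_;
    my $from_x = ancestor_distances($tree, $x);
    my $acc  = 0;
    my $node = $y;
    while (defined $node) {
        return $from_x->{$node} + $acc if exists $from_x->{$node};
        my ($parent, $len) = @{ $tree->{$node} };
        $acc += $len;
        $node = $parent;
    }
    die "nodes $x and $y are not in the same tree\n";
}
```

For example, dog to human is 0.04 + 0.12 + 0 + 0.15 = 0.31.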

The error message you report doesn't make sense as
"Must provide a valid array reference for -nodes"
is only printed when you call is_monophyletic or is_paraphyletic as  
far as I can tell.

What version of BioPerl are you using?

> I am practicing with AlignIO and TreeIO to calculate the maximum  
> likelihood
> for a given tree. So,other information about that would be of great  
> help. I am practicing with
> this to see how Bioperl can help me with more complex problems.
>
Are you trying to calculate the likelihood of a given tree, or are you  
trying to generate an ML tree from an alignment?

> Thank you very much for your help!
>

--
Jason Stajich
Miller Research Fellow
University of California, Berkeley
lab: 510.642.8441
http://pmb.berkeley.edu/~taylor/people/js.html
http://fungalgenomes.org/


From er at xs4all.nl  Mon Feb 12 08:03:06 2007
From: er at xs4all.nl (Erik)
Date: Mon, 12 Feb 2007 14:03:06 +0100 (CET)
Subject: [Bioperl-l] bioperl wiki changes rss / atom
In-Reply-To: <AF76B50B-AF7A-41D1-A64E-D993D8CC4C86@uiuc.edu>
References: <162150.76282.qm@web56511.mail.re3.yahoo.com>
	<AF76B50B-AF7A-41D1-A64E-D993D8CC4C86@uiuc.edu>
Message-ID: <20181.156.83.0.124.1171285386.squirrel@webmail.xs4all.nl>

Hi,


The bioperl wiki changes rss / atom feed has two leading empty lines which
invalidate the xml:

XML Parsing Error: xml declaration not at start of external entity
Location:
http://www.bioperl.org/w/index.php?title=Special:Recentchanges&feed=rss
Line Number 3, Column 1:<?xml version="1.0" encoding="utf-8"?>
^

Could those be removed? (I didn't see a way to do it myself). Might be a
useful feed :)
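Until the feed itself is fixed, a client could work around the stray lines, since XML only allows the <?xml ...?> declaration at the very start of the document. A minimal sketch; `strip_leading_whitespace_before_decl` is a hypothetical name and it only touches whitespace that precedes the declaration.

```perl
use strict;
use warnings;

# Hypothetical client-side workaround: remove whitespace before the XML
# declaration so a strict parser accepts the document.
sub strip_leading_whitespace_before_decl {
    my ($xml) = @_;
    $xml =~ s/\A\s+(?=<\?xml)//;
    return $xml;
}
```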


thanks,

Erik


From cjfields at uiuc.edu  Mon Feb 12 09:52:44 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 12 Feb 2007 08:52:44 -0600
Subject: [Bioperl-l] bioperl wiki changes rss / atom
In-Reply-To: <20181.156.83.0.124.1171285386.squirrel@webmail.xs4all.nl>
References: <162150.76282.qm@web56511.mail.re3.yahoo.com>
	<AF76B50B-AF7A-41D1-A64E-D993D8CC4C86@uiuc.edu>
	<20181.156.83.0.124.1171285386.squirrel@webmail.xs4all.nl>
Message-ID: <DA1A57C0-32B5-4095-AB80-318B5F529730@uiuc.edu>

I have forwarded this to support at open-bio.org, which should take  
care of it.

chris

On Feb 12, 2007, at 7:03 AM, Erik wrote:

> Hi,
>
>
> The bioperl wiki changes rss / atom feed has two leading empty  
> lines which
> invalidate the xml:
>
> XML Parsing Error: xml declaration not at start of external entity
> Location:
> http://www.bioperl.org/w/index.php? 
> title=Special:Recentchanges&feed=rss
> Line Number 3, Column 1:<?xml version="1.0" encoding="utf-8"?>
> ^
>
> Could those be removed? (I didn't see a way to do it myself). Might  
> be a
> useful feed :)
>
>
> thanks,
>
> Erik
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From sm8 at sanger.ac.uk  Mon Feb 12 12:12:00 2007
From: sm8 at sanger.ac.uk (Stephen Montgomery)
Date: Mon, 12 Feb 2007 17:12:00 -0000
Subject: [Bioperl-l] subtract for Bio::RangeI.pm
Message-ID: <A8AB69F227E96F4DBED773D3D70A295B02FCF830@exchsrv2.internal.sanger.ac.uk>

Hi -

Here is a subtract function for the Bio::RangeI class (to be added if
there is interest).

All the best!
Stephen Montgomery


//ADD TO BIO::RANGEI


=head2 subtract

  Title   : subtract
  Usage   : my $subtracted = $r1->subtract($r2);
  Function: Subtract range r2 from range r1
  Args    : arg #1 = a range to subtract from this one (mandatory)
            arg #2 = strand option ('strong', 'weak', 'ignore') (optional)
  Returns : undef if they do not overlap or r2 contains this RangeI,
            or an arrayref of Range objects (this is an array since some
            instances where the subtract range is enclosed within this range
            will result in the creation of two new disjoint ranges)

=cut

sub subtract {
    my ($self, $range, $so) = @_;
    $self->throw("missing arg: you need to pass in another feature")
      unless $range;
    return unless $self->_testStrand($range, $so);

    if ($self eq "Bio::RangeI") {
        $self = "Bio::Range";
        $self->warn("calling static methods of an interface is deprecated; use $self instead");
    }
    $range->throw("Input a Bio::RangeI object")
      unless $range->isa('Bio::RangeI');

    return undef unless $self->overlaps($range);

    ## Subtracts everything
    return undef if $range->contains($self);

    ## Subtract the intersection from the $self range
    my ($start, $end, $strand) = $self->intersection($range, $so);

    my @outranges = ();
    if ($self->start < $start) {
        push(@outranges,
             $self->new('-start'  => $self->start,
                        '-end'    => $start - 1,
                        '-strand' => $self->strand,
                       ));
    }
    if ($self->end > $end) {
        push(@outranges,
             $self->new('-start'  => $end + 1,
                        '-end'    => $self->end,
                        '-strand' => $self->strand,
                       ));
    }
    return \@outranges;
}


//UNIT TEST

#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqFeature::Generic;
use Test;

plan tests => 13;

my $feature1 = Bio::SeqFeature::Generic->new(-start => 1,   -end => 1000, -strand => 1);
my $feature2 = Bio::SeqFeature::Generic->new(-start => 100, -end => 900,  -strand => -1);

# feature2 lies inside feature1, so two flanking pieces remain
my $subtracted = $feature1->subtract($feature2);
ok(defined($subtracted));
ok(scalar(@$subtracted) == 2);
foreach my $range (@$subtracted) {
    ok($range->start == 1 || $range->start == 901);
    ok($range->end == 99 || $range->end == 1000);
}

# feature1 contains feature2, so nothing is left
$subtracted = $feature2->subtract($feature1);
ok(!defined($subtracted));
# opposite strands fail both the 'weak' and the 'strong' strand tests
$subtracted = $feature1->subtract($feature2, 'weak');
ok(!defined($subtracted));
$subtracted = $feature1->subtract($feature2, 'strong');
ok(!defined($subtracted));

# partial overlap on the right leaves a single piece
my $feature3 = Bio::SeqFeature::Generic->new(-start => 500, -end => 1500, -strand => 1);
$subtracted = $feature1->subtract($feature3);
ok(defined($subtracted));
ok(scalar(@$subtracted) == 1);
my $subtracted_i = $subtracted->[0];
ok($subtracted_i->start == 1);
ok($subtracted_i->end == 499);
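For readers without BioPerl at hand, the interval arithmetic at the heart of the patch can be sketched in plain Perl. This is a minimal illustration and not part of the submitted patch: `subtract_interval` is a hypothetical helper that works on bare 1-based inclusive start/end integers, ignores strand entirely, and mirrors the patch's choice of returning undef both when there is no overlap and when the second range covers the first.

```perl
use strict;
use warnings;

# Hypothetical helper: subtract [$s2, $e2] from [$s1, $e1] (1-based,
# inclusive, strand ignored). Returns undef when the ranges do not overlap
# or when the second range covers the first (mirroring the patch),
# otherwise an arrayref of [start, end] pairs for the pieces left over.
sub subtract_interval {
    my ($s1, $e1, $s2, $e2) = @_;
    return undef if $e2 < $s1 || $s2 > $e1;     # no overlap
    return undef if $s2 <= $s1 && $e2 >= $e1;   # fully covered
    # bounds of the intersection
    my $is = $s1 > $s2 ? $s1 : $s2;
    my $ie = $e1 < $e2 ? $e1 : $e2;
    my @out;
    push @out, [$s1, $is - 1] if $s1 < $is;     # left flank
    push @out, [$ie + 1, $e1] if $e1 > $ie;     # right flank
    return \@out;
}

# Same coordinates as feature1 minus feature2 and feature1 minus feature3
# in the unit test; prints "1-99, 901-1000" and then "1-499".
for my $case ([1, 1000, 100, 900], [1, 1000, 500, 1500]) {
    my $pieces = subtract_interval(@$case);
    print join(', ', map { "$_->[0]-$_->[1]" } @$pieces), "\n";
}
```

Returning undef for "no overlap" is a design choice carried over from the patch; a pure set difference would instead return the first range unchanged in that case.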


From sm8 at sanger.ac.uk  Mon Feb 12 11:04:41 2007
From: sm8 at sanger.ac.uk (Stephen Montgomery)
Date: Mon, 12 Feb 2007 16:04:41 -0000
Subject: [Bioperl-l] subtract for Bio::RangeI.pm
Message-ID: <A8AB69F227E96F4DBED773D3D70A295B02FCF754@exchsrv2.internal.sanger.ac.uk>


From flope004 at hotmail.com  Mon Feb 12 13:07:12 2007
From: flope004 at hotmail.com (Wolverine Fran)
Date: Mon, 12 Feb 2007 19:07:12 +0100
Subject: [Bioperl-l] TreeIO, how (does) it work?
In-Reply-To: <60D2DD3B-2ED1-4A92-A9FA-2875FEAA28CC@bioperl.org>
Message-ID: <BAY125-F384316569877D96C05FDC28B910@phx.gbl>

thanks for your reply!

I am using Bioperl 1.4.

>Node 6 is to hold the root or a fake root with a trifurcation for
>unrooted trees.  Did you actually call the reroot method to set the
>root to node 5?

Yes, I tried the following with the same result:
$tree->reroot($tree->find_node(-internal_id => '5'));
or
$tree->set_root_node($tree->find_node(-internal_id => '5'));

Even if I use a rooted tree: 
(((dog:0.04,cat:0.08):0.12,human:0.15):0.1,mouse:0.1);
I get the node #6. So, is it always present? Am I not representing a rooted
tree properly in Newick format?

>The distance method is just summing branch lengths on the path
>between two nodes.  Is that what are you trying to do?
>
>The error message you report doesn't make sense as
>"Must provide a valid array reference for -nodes"
>is only printed when you call is_monophyletic or is_paraphyletic as
>far as I can tell.

I do not know yet what I was doing incorrectly, but now it works. Yes, I was
using the distance method to find out where node 6 was located. For the
unrooted tree, node 6 was node 5 (an internal node), and for the rooted tree
node 6 was 0.1 from both the mouse leaf and the internal node, i.e. the root.
The error message: "Must provide a valid array reference for -nodes" is 
shown if I indicate a node which is not present in the tree.
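The description of `distance` quoted above (summing branch lengths on the path between two nodes) can be illustrated with a small plain-Perl sketch on the rooted example tree. It is not BioPerl's implementation: the parent-pointer hash is a hypothetical encoding, and the internal-node labels `n1`, `n2` and `root` are invented here because the Newick string leaves internal nodes unnamed.

```perl
use strict;
use warnings;

# Hypothetical parent-pointer encoding of the rooted example tree
# (((dog:0.04,cat:0.08):0.12,human:0.15):0.1,mouse:0.1);
# child => [parent, branch length]; 'n1', 'n2', 'root' are invented labels.
my %parent = (
    dog   => ['n1',   0.04],
    cat   => ['n1',   0.08],
    n1    => ['n2',   0.12],
    human => ['n2',   0.15],
    n2    => ['root', 0.1],
    mouse => ['root', 0.1],
);

# Map $node and each of its ancestors to the summed branch length
# from $node up to that ancestor.
sub path_to_root {
    my ($node) = @_;
    my %dist = ($node => 0);
    my $d = 0;
    while (exists $parent{$node}) {
        my ($up, $len) = @{ $parent{$node} };
        $d += $len;
        $dist{$up} = $d;
        $node = $up;
    }
    return \%dist;
}

# Walk up from $y until reaching an ancestor of $x (the lowest common
# ancestor); the distance is the sum of both partial paths.
sub distance {
    my ($x, $y) = @_;
    my $ax = path_to_root($x);
    my ($node, $dy) = ($y, 0);
    until (exists $ax->{$node}) {
        my ($up, $len) = @{ $parent{$node} };
        $dy += $len;
        $node = $up;
    }
    return $ax->{$node} + $dy;
}

printf "dog-cat:   %.2f\n", distance('dog', 'cat');     # 0.12
printf "dog-mouse: %.2f\n", distance('dog', 'mouse');   # 0.36
```

In this encoding the root sits 0.1 from both `mouse` and `n2`, which agrees with where node 6 turned up in the rooted tree.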

>You are trying to calculate the likelihood of a tree or are you
>trying to generate a ML tree from an alignment?

I am trying to calculate the likelihood of a tree, as an exercise. There are
probably other bioperl modules, besides AlignIO and TreeIO, that could help
me in the process but that I do not know about.

Again, thank you for your time!



From dmessina at wustl.edu  Mon Feb 12 12:49:49 2007
From: dmessina at wustl.edu (David Messina)
Date: Mon, 12 Feb 2007 11:49:49 -0600
Subject: [Bioperl-l] subtract for Bio::RangeI.pm
In-Reply-To: <A8AB69F227E96F4DBED773D3D70A295B02FCF754@exchsrv2.internal.sanger.ac.uk>
References: <A8AB69F227E96F4DBED773D3D70A295B02FCF754@exchsrv2.internal.sanger.ac.uk>
Message-ID: <1574ACCF-92D5-4DEC-AD04-14EB7767F22A@wustl.edu>

Stephen,

Great, thanks for this. Could you submit it to Bugzilla as an  
enhancement?

http://bugzilla.open-bio.org/


Thanks,
Dave


From jason at bioperl.org  Mon Feb 12 13:38:11 2007
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 12 Feb 2007 10:38:11 -0800
Subject: [Bioperl-l] TreeIO, how (does) it work?
In-Reply-To: <BAY125-F384316569877D96C05FDC28B910@phx.gbl>
References: <BAY125-F384316569877D96C05FDC28B910@phx.gbl>
Message-ID: <BD0EF8B4-69A9-468E-A722-1110B02D0EF7@bioperl.org>

I would definitely suggest getting ahold of bioperl 1.5.2 as I seem
to remember there are several fixes in the tree module code for
re-rooting a tree.
-jason

On Feb 12, 2007, at 10:07 AM, Wolverine Fran wrote:

> thanks for your reply!
>
> I am using Bioperl 1.4.
>
>> Node 6 is to hold the root or a fake root with a trifurcation for
>> unrooted trees.  Did you actually call the reroot method to set the
>> root to node 5?
>
> Yes, I tried the following with the same result:
> $tree->reroot($tree->find_node(-internal_id => '5'));
> or
> $tree->set_root_node($tree->find_node(-internal_id => '5'));
>
> Even if I use a rooted tree:
> (((dog:0.04,cat:0.08):0.12,human:0.15):0.1,mouse:0.1);
> I get the node #6. So, is it always present? Am I not representing  
> properly a rooted tree  in newick format?
>
>> The distance method is just summing branch lengths on the path
>> between two nodes.  Is that what are you trying to do?
>>
>> The error message you report doesn't make sense as
>> "Must provide a valid array reference for -nodes"
>> is only printed when you call is_monophyletic or is_paraphyletic as
>> far as I can tell.
>
> I do not know yet what I was doing incorrectly but now It works.  
> Yes, I was using the distance method to know where the node 6 was  
> located. For the unrooted tree, node 6 was node 5 (an internal  
> node) and for the rooted tree node 6 was 0.1 from the mouse leaf  
> and the internal node (root).
> The error message: "Must provide a valid array reference for -nodes"
> is shown if I indicate a node which is not present in the tree.
>
>> You are trying to calculate the likelihood of a tree or are you
>> trying to generate a ML tree from an alignment?
>
> I am trying to calculate the likelihood of a tree, as a practice.  
> Probably there are other  bioperl modules, besides AlignIO and  
> TreeIO, which can help me in the process and I do not know them.
>
> Again, thank you for your time!
>

--
Jason Stajich
Miller Research Fellow
University of California, Berkeley
lab: 510.642.8441
http://pmb.berkeley.edu/~taylor/people/js.html
http://fungalgenomes.org/


From johnsonm at gmail.com  Mon Feb 12 18:13:09 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Mon, 12 Feb 2007 17:13:09 -0600
Subject: [Bioperl-l] Bio::Tools::Glimmer
In-Reply-To: <ebf5eb170702070850k10f1d24fr2d93d61534eeaaff@mail.gmail.com>
References: <ebf5eb170609190833r5fa138e9gc6c51346e5f7890c@mail.gmail.com>
	<451095C7.3020905@infotech.monash.edu.au>
	<0FBEEE2B-9641-4EFA-A5D2-0DA839556E4C@gmx.net>
	<ebf5eb170609201247r12eb9475v5cb553e0727163c0@mail.gmail.com>
	<ebf5eb170702061553t29b0d1f2q402582abba032031@mail.gmail.com>
	<20F07AFE-BADF-4BA5-83B1-C99EEBE915C2@bioperl.org>
	<ebf5eb170702070850k10f1d24fr2d93d61534eeaaff@mail.gmail.com>
Message-ID: <ebf5eb170702121513r20759845i21e366f916020baa@mail.gmail.com>

On 2/7/07, Mark Johnson <johnsonm at gmail.com> wrote:
>
>     Well, each format has some unique features.  If the user declines to
> specify the format, I can figure it out, but it will probably involve
> scanning the input file twice.  I'll take a look.
>     I can do all the parsing in one function, in fact I have, just to see
> how nasty it would end up being.  I just can't stomach having the code that
> tightly coupled and hard to read.  In the end it'll probably be three
> functions.  GlimmerM/HMM are pretty close.  Maybe two, Glimmer2 and
> Glimmer3 aren't *that* different, either.


    I've got a 4-in-1 parser roughed in per Chris Fields' suggestion.   Two
actual parsing routines (prokaryotic and eukaryotic).  You can specify
-format as an arg to the constructor (Glimmer, GlimmerM, GlimmerHMM), or it
will look through the input until it can figure out what it is looking at.
    I've got one main issue to solve, the rest is just stuff like updating
the POD.  Torsten Seemann very helpfully added example output for all 4
formats to t/data.  Looking at GlimmerHMM.out, the first line is
'GlimmerHMM'.  However, I think there is a bug in the existing
_parse_predictions:

Shouldn't this:

    } elsif( /^(Glimmer\S*)$/ ) { # GlimmerHMM has no version
        $source = $1;
        next;
    }

be this instead:

    } elsif( /^(GlimmerHMM\S*)$/ ) { # GlimmerHMM has no version
        $source = $1;
        next;
    }


I lifted that bit of code to do format detection...we don't have GlimmerHMM
installed locally, so I'm assuming Torsten's output is correct and the above
is a bug.  Guess I'll go check bugzilla...

From torsten.seemann at infotech.monash.edu.au  Mon Feb 12 21:07:40 2007
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Tue, 13 Feb 2007 13:07:40 +1100
Subject: [Bioperl-l] Bio::Tools::Glimmer
In-Reply-To: <ebf5eb170702121513r20759845i21e366f916020baa@mail.gmail.com>
References: <ebf5eb170609190833r5fa138e9gc6c51346e5f7890c@mail.gmail.com>
	<451095C7.3020905@infotech.monash.edu.au>
	<0FBEEE2B-9641-4EFA-A5D2-0DA839556E4C@gmx.net>
	<ebf5eb170609201247r12eb9475v5cb553e0727163c0@mail.gmail.com>
	<ebf5eb170702061553t29b0d1f2q402582abba032031@mail.gmail.com>
	<20F07AFE-BADF-4BA5-83B1-C99EEBE915C2@bioperl.org>
	<ebf5eb170702070850k10f1d24fr2d93d61534eeaaff@mail.gmail.com>
	<ebf5eb170702121513r20759845i21e366f916020baa@mail.gmail.com>
Message-ID: <a79f6a4b0702121807i47d6360cwed1b0276865d6bd7@mail.gmail.com>

Mark,

>     I've got one main issue to solve, the rest is just stuff like updating
> the POD.  Torsten Seemann very helpfully added example output for all 4
> formats to t/data.  Looking at GlimmerHMM.out, the first line is
> 'GlimmerHMM'.  However, I think there is a bug in the existing
> _parse_predictions:
> Shouldn't this:
> } elsif( /^(Glimmer\S*)$/ ) { # GlimmerHMM has no version
> be this instead:
> } elsif( /^(GlimmerHMM\S*)$/ ) { # GlimmerHMM has no version

I think it should be what it says, or perhaps now /^(Glimmer(M|HMM))/.
Here's why:

I came onto the scene at Glimmer.pm rev 1.4. At that stage it only
parsed GlimmerM. I noted that GlimmerHMM was the same output format as
GlimmerM, except for the first line. So in rev 1.5 I modified the
regexp to match both, i.e. \S* . This would also hopefully match any
other Glimmer-clone formats that arose. I also fixed the pdocs to say
this, and added tests to t/Genpred.t.
% cvs diff -r 1.4 -r 1.5 Bio/Tools/Glimmer.pm
% cvs diff -r 1.15 -r 1.16 t/Genpred.t
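To see why the rev 1.5 `\S*` pattern accepts the GlimmerHMM header while a `\s*` typo would not, the patterns from this thread can be run against the header line `GlimmerHMM`. This is a minimal sketch: `header_source` and the pattern labels are made up for illustration, and only the `GlimmerHMM` header line comes from the thread's description of the test data.

```perl
use strict;
use warnings;

# Header regexes from this thread, plus the \s* variant behind the mix-up.
my %patterns = (
    'original \S*'    => qr/^(Glimmer\S*)$/,
    'proposed HMM\S*' => qr/^(GlimmerHMM\S*)$/,
    'typo \s*'        => qr/^(Glimmer\s*)$/,
    'suggested M|HMM' => qr/^(Glimmer(M|HMM))/,
);

# Hypothetical helper: return the captured source name, or undef on no match.
sub header_source {
    my ($re, $line) = @_;
    return $line =~ $re ? $1 : undef;
}

for my $name (sort keys %patterns) {
    my $src = header_source($patterns{$name}, 'GlimmerHMM');
    printf "%-16s %s\n", $name, defined $src ? "captures '$src'" : 'no match';
}
```

`\S*` matches any run of non-whitespace, so it swallows the `HMM` suffix; `\s*` only matches whitespace, leaving `HMM` before the `$` anchor and failing the match. Note that the `(M|HMM)` form has no `$` anchor, so it would also accept trailing text such as a version string.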

I then planned to extend support to Glimmer2 and Glimmer3. I added the
4 test files (t/Glimmer*.out) but never wrote the code. This is where
you have come in Mark :-)

> I lifted that bit of code to do format detection...we don't have GlimmerHMM
> installed locally, so I'm assuming Torsten's output is correct and the above
> is a bug.  Guess I'll go check bugzilla...

I'm pretty sure my 4 test files are correct - I spent a lot of time
ensuring they were consistent etc, as I was getting very confused with
the different "glimmer" versions!

Hope this all helps,

--Torsten

From avilella at gmail.com  Tue Feb 13 08:20:15 2007
From: avilella at gmail.com (Albert Vilella)
Date: Tue, 13 Feb 2007 13:20:15 +0000
Subject: [Bioperl-l] number of gaps for the other sequences in an alignment
Message-ID: <358f4d650702130520n269419cfkb9cb6dac8feaaa5c@mail.gmail.com>

Hi,

It would be great if we could have a method to count, given one
sequence in an alignment, the number of gaps present in the rest of
the sequences of the alignment. That is, for each nucleotide/amino acid
position of the sequence of interest, look at the alignment column,
count the gaps in the other sequences, and sum these counts over all
columns where the sequence of interest is not gapped.

Has anyone tried this before?

My idea is to end up having a coefficient of indel contribution for
each of the sequences in the alignment, with this coefficient being
high when one sequence forces a lot of gaps to be inserted in the
final alignment, in order to accommodate this given sequence.
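One way to compute the proposed count, sketched in plain Perl under the assumption that the aligned sequences are already available as equal-length strings with `-` as the gap character (in BioPerl they could be pulled from a Bio::SimpleAlign via `each_seq` and each sequence's `seq`). `indel_contribution` is a hypothetical name, not an existing SimpleAlign method, and the toy alignment is made up.

```perl
use strict;
use warnings;

# Hypothetical helper: for the sequence at index $focus, count the gaps
# that the OTHER sequences carry in columns where the focus sequence has
# a residue. Assumes equal-length aligned strings; gap character '-'.
sub indel_contribution {
    my ($aligned, $focus, $gap) = @_;
    $gap = '-' unless defined $gap;
    my $len   = length $aligned->[$focus];
    my $total = 0;
    for my $col (0 .. $len - 1) {
        # skip columns where the sequence of interest is itself gapped
        next if substr($aligned->[$focus], $col, 1) eq $gap;
        for my $i (0 .. $#$aligned) {
            next if $i == $focus;
            $total++ if substr($aligned->[$i], $col, 1) eq $gap;
        }
    }
    return $total;
}

# Toy alignment, made up for illustration.
my @aln = (
    'ACGTACGT',
    'AC--ACGT',
    'ACG-A--T',
);
for my $i (0 .. $#aln) {
    printf "seq %d: %d\n", $i, indel_contribution(\@aln, $i);
}
```

Here the first sequence scores highest (5, versus 2 and 1) because the other two carry gaps opposite its residues. Dividing by the number of residues in the sequence of interest would turn the raw count into a per-residue coefficient.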

I would say that the best place for this is either using methods
already available in SimpleAlign, or have something new added there.

Looking forward to your comments,

Cheers,

    Albert.

From bix at sendu.me.uk  Tue Feb 13 11:09:09 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 13 Feb 2007 16:09:09 +0000
Subject: [Bioperl-l] Bio::DB::SeqFeature::Store problem with transcripts
Message-ID: <45D1E2A5.6060104@sendu.me.uk>

I have some raw sequences in a Bio::DB::SeqFeature::Store mysql database 
and wanted to associate some basic information with them, like exon 
positions. I thought of creating Bio::SeqFeature::Gene::Transcript 
objects and storing them so I could later use features() to see what 
other features overlapped exons. I ran into a fatal error that can be 
replicated with the following simplified one-liner:

perl -MBio::DB::SeqFeature::Store -MBio::SeqFeature::Gene::Transcript -e 
'$db = Bio::DB::SeqFeature::Store->new(-adaptor => "DBI::mysql", -dsn => 
"dbi:mysql:test"); $trans = 
Bio::SeqFeature::Gene::Transcript->new(-start => 1, -end => 2, -seq_id 
=> "test"); $db->store($trans); @trans = $db->features(-seqid => $id, 
-type => "transcript"); print "@trans\n";'

code sub {
     package Bio::SeqFeature::Generic;
     use strict 'refs';
     my $self = shift @_;
     foreach my $f (@{[] unless $$self{'_gsf_sub_array'};}) {
         $f = undef;
     }
     $$self{'_gsf_seq'} = undef;
     foreach my $t (keys %{{} unless $$self{'_gsf_tag_hash'};}) {
         $$self{'_gsf_tag_hash'}{$t} = undef;
         delete $$self{'_gsf_tag_hash'}{$t};
     }
} did not evaluate to a subroutine reference, at 
/.../Bio/DB/SeqFeature/Store.pm line 2280


Is this a bug? Or am I taking the wrong approach?

From johnsonm at gmail.com  Tue Feb 13 15:10:23 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Tue, 13 Feb 2007 14:10:23 -0600
Subject: [Bioperl-l] Bio::Tools::Glimmer
In-Reply-To: <a79f6a4b0702121807i47d6360cwed1b0276865d6bd7@mail.gmail.com>
References: <ebf5eb170609190833r5fa138e9gc6c51346e5f7890c@mail.gmail.com>
	<451095C7.3020905@infotech.monash.edu.au>
	<0FBEEE2B-9641-4EFA-A5D2-0DA839556E4C@gmx.net>
	<ebf5eb170609201247r12eb9475v5cb553e0727163c0@mail.gmail.com>
	<ebf5eb170702061553t29b0d1f2q402582abba032031@mail.gmail.com>
	<20F07AFE-BADF-4BA5-83B1-C99EEBE915C2@bioperl.org>
	<ebf5eb170702070850k10f1d24fr2d93d61534eeaaff@mail.gmail.com>
	<ebf5eb170702121513r20759845i21e366f916020baa@mail.gmail.com>
	<a79f6a4b0702121807i47d6360cwed1b0276865d6bd7@mail.gmail.com>
Message-ID: <ebf5eb170702131210l1fa27bb5n31095e9ca2d4ecf2@mail.gmail.com>

    You're quite correct.  I wasn't paying enough attention.  That does work
just fine.  I fat-fingered something somewhere else, broke my version of the
module for GlimmerHMM, hallucinated and confused \S and \s.  8)
    All I have left now is to fix up the POD documentation and such, and then
I can send the module along and somebody can make whatever tweaks and check
it in.  Shall I open a ticket in Bugzilla for this and attach diffs, or just
send them along to somebody to take care of directly?
    Oh, one thing I have not mentioned.  I also added a -seqname argument.
Glimmer2 does not provide any kind of sequence identifier in the output, and
only processes the first sequence in a fasta file.  It would be tedious to
have to code around this by fixing up the predictions after they are
produced, so I added the option to provide this missing info up front,
hopefully sparing downstream code from needing a special case to fix up
Glimmer2 predictions.

On 2/12/07, Torsten Seemann <torsten.seemann at infotech.monash.edu.au> wrote:

> I think it should be what it says, or perhaps now /^(Glimmer(M|HMM))/.
> Here's why:
>
> I came onto the scene at Glimmer.pm rev 1.4. At that stage it only
> parse GlimmerM. I noted that GlimmerHMM was the same output format as
> GlimmerM, except for the first line. So in rev 1.5 I modified the
> regexp to match both ie. \S* . This would also hopefully match any
> other Glimmer-clone formats that arose. I also fixed the pdocs to say
> this, and added tests to t/Genpred.t.
> % cvs diff -r 1.4 -r 1.5 Bio/Tools/Glimmer.pm
> % cvs diff -r 1.15 -r 1.16 t/Genpred.t
>
> I then planned to extend support to Glimmer2 and Glimmer3. I added the
> 4 test files (t/Glimmer*.out) but never wrote the code. This is where
> you have come in Mark :-)
>
> > I lifted that bit of code to do format detection...we don't have GlimmerHMM
> > installed locally, so I'm assuming Torsten's output is correct and the above
> > is a bug.  Guess I'll go check bugzilla...
>
> I'm pretty sure my 4 test files are correct - I spent a lot of time
> ensuring they were consistent etc, as I was getting very confused with
> the different "glimmer" versions!
>
> Hope this all helps,
>
> --Torsten
>

From cjfields at uiuc.edu  Tue Feb 13 15:47:19 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 13 Feb 2007 14:47:19 -0600
Subject: [Bioperl-l] Bio::Tools::Glimmer
In-Reply-To: <ebf5eb170702131210l1fa27bb5n31095e9ca2d4ecf2@mail.gmail.com>
References: <ebf5eb170609190833r5fa138e9gc6c51346e5f7890c@mail.gmail.com>
	<451095C7.3020905@infotech.monash.edu.au>
	<0FBEEE2B-9641-4EFA-A5D2-0DA839556E4C@gmx.net>
	<ebf5eb170609201247r12eb9475v5cb553e0727163c0@mail.gmail.com>
	<ebf5eb170702061553t29b0d1f2q402582abba032031@mail.gmail.com>
	<20F07AFE-BADF-4BA5-83B1-C99EEBE915C2@bioperl.org>
	<ebf5eb170702070850k10f1d24fr2d93d61534eeaaff@mail.gmail.com>
	<ebf5eb170702121513r20759845i21e366f916020baa@mail.gmail.com>
	<a79f6a4b0702121807i47d6360cwed1b0276865d6bd7@mail.gmail.com>
	<ebf5eb170702131210l1fa27bb5n31095e9ca2d4ecf2@mail.gmail.com>
Message-ID: <DB4D80A7-1D29-4809-A7D2-FCE5DB2502B4@uiuc.edu>

You'll also want to update whatever relevant tests there are for
Glimmer; looks like they are in t/Genpred.t.

chris

On Feb 13, 2007, at 2:10 PM, Mark Johnson wrote:

>     You're quite correct.  I wasn't paying enough attention.  That does work
> just fine.  I fat-fingered something somewhere else, broke my version of the
> module for GlimmerHMM, hallucinated and confused \S and \s.  8)
>     All I have left now is to fixup the POD documentation and such and then
> I can send the module along and somebody can make whatever tweaks and check
> it in.  Shall I open a ticket in Bugzilla for this and attach diffs, or just
> send them along to somebody to take care of directly?
>     Oh, one thing I have not mentioned.  I also added a -seqname argument.
> Glimmer2 does not provide any kind of sequence identifier in the output, and
> only processes the first sequence in a fasta file.  It would be tedious to
> have to code around this by fixing up the predictions after they are
> produced, so I added the option to provide this missing info up front,
> hopefully allowing downstream code to not have to care as much and have a
> special case for fixing up Glimmer2 predictions.
>
> On 2/12/07, Torsten Seemann <torsten.seemann at infotech.monash.edu.au> wrote:
>
>> I think it should be what it says, or perhaps now /^(Glimmer(M|HMM))/.
>> Here's why:
>>
>> I came onto the scene at Glimmer.pm rev 1.4. At that stage it only
>> parse GlimmerM. I noted that GlimmerHMM was the same output format as
>> GlimmerM, except for the first line. So in rev 1.5 I modified the
>> regexp to match both ie. \S* . This would also hopefully match any
>> other Glimmer-clone formats that arose. I also fixed the pdocs to say
>> this, and added tests to t/Genpred.t.
>> % cvs diff -r 1.4 -r 1.5 Bio/Tools/Glimmer.pm
>> % cvs diff -r 1.15 -r 1.16 t/Genpred.t
>>
>> I then planned to extend support to Glimmer2 and Glimmer3. I added the
>> 4 test files (t/Glimmer*.out) but never wrote the code. This is where
>> you have come in Mark :-)
>>
>>> I lifted that bit of code to do format detection...we don't have GlimmerHMM
>>> installed locally, so I'm assuming Torsten's output is correct and the above
>>> is a bug.  Guess I'll go check bugzilla...
>>
>> I'm pretty sure my 4 test files are correct - I spent a lot of time
>> ensuring they were consistent etc, as I was getting very confused with
>> the different "glimmer" versions!
>>
>> Hope this all helps,
>>
>> --Torsten
>>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From thokeller at gmail.com  Tue Feb 13 17:00:06 2007
From: thokeller at gmail.com (Thomas Keller)
Date: Tue, 13 Feb 2007 14:00:06 -0800
Subject: [Bioperl-l] update/install problem
Message-ID: <baa466330702131400q30a65bb4ib5037756fc59a9b7@mail.gmail.com>

Could someone suggest a workaround or fix for this error?

$ sudo fink update bioperl-pm586
Information about 5850 packages read in 2 seconds.
The package 'bioperl-pm586' will be built and installed.
The package 'xml-sax-pm586' will be installed.
The package 'xml-sax-writer-pm586' will be built and installed.
The package 'xml-filter-buffertext-pm586' will be built and installed.
The following package will be installed or updated:
 bioperl-pm586
The following 3 additional packages will be installed:
 xml-filter-buffertext-pm586 xml-sax-pm586 xml-sax-writer-pm586
Do you want to continue? [Y/n] Y
/sw/bin/dpkg-lockwait -i
/sw/fink/dists/unstable/main/binary-darwin-powerpc/libs/perlmods/xml-sax-pm586_0.13-2_darwin-powerpc.deb
(Reading database ... 48029 files and directories currently installed.)
Preparing to replace xml-sax-pm586 0.13-2 (using
.../xml-sax-pm586_0.13-2_darwin-powerpc.deb) ...
Unpacking replacement xml-sax-pm586 ...
Setting up xml-sax-pm586 (0.13-2) ...
update-perl586-sax-parsers: adding Perl SAX parser module info file of
XML::SAX::PurePerl...
Can't locate object method "save_parsers_debian" via package "XML::SAX" at
/sw/sbin/update-perl586-sax-parsers line 96.
/sw/bin/dpkg: error processing xml-sax-pm586 (--install):
 subprocess post-installation script returned error exit status 22
Errors were encountered while processing:
 xml-sax-pm586
### execution of /sw/bin/dpkg-lockwait failed, exit code 1
Failed: can't install package xml-sax-pm586-0.13-2


-- 
Tom Keller
"Ecrasez l'Infame!" -- Voltaire

From sac at bioperl.org  Tue Feb 13 18:00:46 2007
From: sac at bioperl.org (Steve Chervitz)
Date: Tue, 13 Feb 2007 15:00:46 -0800
Subject: [Bioperl-l] Bio::Root::Utilities.pm
Message-ID: <8f200b4c0702131500m2f3790nceace3c3e3dffd4f@mail.gmail.com>

I noticed that Bio::Root::Utilities was purged from bioperl-live for the
1.5.2 release, but I'd like us to consider adding it back. I agree that the
other purged Root modules were ancient relics of the past, but
Bio::Root::Utilities.pm still has signs of life (at least I still find
occasion to use it, or refer to code in it).

I know that it's not currently used by any other modules in Bioperl, but
there are likely some legacy scripts out there that rely on it. Probably
most of those scripts are ones I've written, but there have been substantive
commits by others in the not-too-distant past (Dec 2005), so at least some
folks besides myself are using it and may hesitate to upgrade their bioperl
installation if it's absent.

I'm all for avoiding bloat in the codebase and am eager to see Bioperl be
more lean and mean, but I'd like to keep this module around. I'll agree to
add some tests for it as well as clean some things up (e.g., use
Bio::Root::IO to get temp file name).

Cheers,
Steve
--
Steve Chervitz
sac at bioperl.org

From cjfields at uiuc.edu  Tue Feb 13 20:29:03 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 13 Feb 2007 19:29:03 -0600
Subject: [Bioperl-l] update/install problem
In-Reply-To: <baa466330702131400q30a65bb4ib5037756fc59a9b7@mail.gmail.com>
References: <baa466330702131400q30a65bb4ib5037756fc59a9b7@mail.gmail.com>
Message-ID: <C8B7325D-0554-4E37-9B06-9A8344BF5F7E@uiuc.edu>

On Feb 13, 2007, at 4:00 PM, Thomas Keller wrote:

> Could someone suggest a workaround or fix for this error?
>
> $ sudo fink update bioperl-pm586
> Information about 5850 packages read in 2 seconds.
> The package 'bioperl-pm586' will be built and installed.
> The package 'xml-sax-pm586' will be installed.
> The package 'xml-sax-writer-pm586' will be built and installed.
> The package 'xml-filter-buffertext-pm586' will be built and installed.
> The following package will be installed or updated:
>  bioperl-pm586
> The following 3 additional packages will be installed:
>  xml-filter-buffertext-pm586 xml-sax-pm586 xml-sax-writer-pm586
> Do you want to continue? [Y/n] Y
> /sw/bin/dpkg-lockwait -i
> /sw/fink/dists/unstable/main/binary-darwin-powerpc/libs/perlmods/xml-sax-pm586_0.13-2_darwin-powerpc.deb
> (Reading database ... 48029 files and directories currently installed.)
> Preparing to replace xml-sax-pm586 0.13-2 (using
> .../xml-sax-pm586_0.13-2_darwin-powerpc.deb) ...
> Unpacking replacement xml-sax-pm586 ...
> Setting up xml-sax-pm586 (0.13-2) ...
> update-perl586-sax-parsers: adding Perl SAX parser module info file of
> XML::SAX::PurePerl...
> Can't locate object method "save_parsers_debian" via package "XML::SAX" at
> /sw/sbin/update-perl586-sax-parsers line 96.
> /sw/bin/dpkg: error processing xml-sax-pm586 (--install):
>  subprocess post-installation script returned error exit status 22
> Errors were encountered while processing:
>  xml-sax-pm586
> ### execution of /sw/bin/dpkg-lockwait failed, exit code 1
> Failed: can't install package xml-sax-pm586-0.13-2

The fink installation seems to be failing on XML::SAX, not bioperl.
You could try installing XML::SAX (now at v. 0.15) via CPAN using
'sudo cpan'; I updated just recently w/o problems.

As an aside, you could similarly install bioperl directly from CPAN  
(which I also haven't had any problems with).  The installation  
allows for installing optional modules.

chris


From cjfields at uiuc.edu  Tue Feb 13 22:41:31 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 13 Feb 2007 21:41:31 -0600
Subject: [Bioperl-l] Bio::Root::Utilities.pm
In-Reply-To: <8f200b4c0702131500m2f3790nceace3c3e3dffd4f@mail.gmail.com>
References: <8f200b4c0702131500m2f3790nceace3c3e3dffd4f@mail.gmail.com>
Message-ID: <1D04BA5E-359E-4AC5-B674-8ECDC4F81E83@uiuc.edu>


On Feb 13, 2007, at 5:00 PM, Steve Chervitz wrote:

> I noticed that Bio::Root::Utilities was purged from bioperl-live for the
> 1.5.2 release, but I'd like us to consider adding it back. I agree that the
> other purged Root modules were ancient relics of the past, but
> Bio::Root::Utilities.pm still has signs of life (at least I still find
> occasion to use it, or refer to code in it).
>
> I know that it's not currently used by any other modules in Bioperl, but
> there are likely some legacy scripts out there that rely on it. Probably
> most of those scripts are ones I've written, but there have been substantive
> commits by others in the not-to-distant past (Dec 2005), so at least some
> folks besides myself are using it and may hesitate to upgrade their bioperl
> installation if it's absent.
>
> I'm all for avoiding bloat in the codebase and am eager to see Bioperl be
> more lean and mean, but I'd like to keep this module around. I'll agree to
> add some tests for it as well as clean some things up (e.g., use
> Bio::Root::IO to get temp file name).
>
> Cheers,
> Steve
> --
> Steve Chervitz
> sac at bioperl.org

I don't have a problem with adding it back, esp. if tests are added.   
Everything in Bio::Root* not tied to a module was yanked out when no  
one spoke up about cleaning up Bio::Root* modules:

http://thread.gmane.org/gmane.comp.lang.perl.bio.general/12792/focus=12839

Maybe others disagree?

chris

From bix at sendu.me.uk  Wed Feb 14 03:00:35 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 14 Feb 2007 08:00:35 +0000
Subject: [Bioperl-l] update/install problem
In-Reply-To: <C8B7325D-0554-4E37-9B06-9A8344BF5F7E@uiuc.edu>
References: <baa466330702131400q30a65bb4ib5037756fc59a9b7@mail.gmail.com>
	<C8B7325D-0554-4E37-9B06-9A8344BF5F7E@uiuc.edu>
Message-ID: <45D2C1A3.9060300@sendu.me.uk>

Chris Fields wrote:
> As an aside, you could similarly install bioperl directly from CPAN  
> (which I also haven't had any problems with).

Indeed. If you follow the unix instructions at 
http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix you should have 
a problem-free complete install under Mac OS X.

From bix at sendu.me.uk  Wed Feb 14 09:08:22 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 14 Feb 2007 14:08:22 +0000
Subject: [Bioperl-l] BLASTXML changes (good this time!)
In-Reply-To: <DF989651-7CC0-45A0-BA5E-EEEE88FDD40E@uiuc.edu>
References: <EB130F09-5670-4F3E-B481-60A700AEA38F@uiuc.edu>
	<45CCF861.8030000@sendu.me.uk>
	<DF989651-7CC0-45A0-BA5E-EEEE88FDD40E@uiuc.edu>
Message-ID: <45D317D6.5070903@sendu.me.uk>

Chris Fields wrote:
> 
> On Feb 9, 2007, at 4:40 PM, Sendu Bala wrote:
> 
>> Chris Fields wrote:
>>> If Sendu is out there, I think we can safely remove any dependencies
>>> beyond XML::SAX 0.15 for the next release.  Should I go ahead and
>>> modify Build.PL?
>>
>> Sure, good to hear.
> 
> I added a version dependency for XML::SAX (v. 0.15) for the PurePerl 
> fix.  That likely obviates the need for a Bundle for XML::Simple.  Not 
> too pressing; we can determine that before the next release.

The bundle is now obsolete. Does anything in Bioperl, or any of its 
dependencies, now make use of the expat library? If not, I can remove 
mention of it from the install documentation.


From bix at sendu.me.uk  Wed Feb 14 09:02:39 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 14 Feb 2007 14:02:39 +0000
Subject: [Bioperl-l] DB.t failures
Message-ID: <45D3167F.2000608@sendu.me.uk>

DB.t is failing with BIOPERLDEBUG set. Apparently, we are no longer 
getting sequences back from NCBI in the order we requested them in batch 
mode.

Is this a change at NCBI? Is there some way we can make sure to return 
the sequences in the expected order? Or shouldn't the order be expected 
(should the test script be altered)?

From cjfields at uiuc.edu  Wed Feb 14 09:37:07 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 14 Feb 2007 08:37:07 -0600
Subject: [Bioperl-l] DB.t failures
In-Reply-To: <45D3167F.2000608@sendu.me.uk>
References: <45D3167F.2000608@sendu.me.uk>
Message-ID: <49A5C7D3-8D63-452C-B0EA-6F7144F85E35@uiuc.edu>

Confirmed on this end.

It's possible that the default sort order from eutils has changed,
though I haven't seen anything about it on the eutils mailing list. There
may be a way to set the sort order via the base URL. I'll check into
it later today; I'm still digging myself out from the Midwest blizzard.

chris

On Feb 14, 2007, at 8:02 AM, Sendu Bala wrote:

> DB.t is failing with BIOPERLDEBUG set. Apparently, we are no longer
> getting sequences back from NCBI in the order we requested them in
> batch mode.
>
> Is this a change at NCBI? Is there some way we can make sure to return
> the sequences in the expected order? Or shouldn't the order be
> expected (should the test script be altered)?
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Wed Feb 14 09:42:05 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 14 Feb 2007 08:42:05 -0600
Subject: [Bioperl-l] BLASTXML changes (good this time!)
In-Reply-To: <45D317D6.5070903@sendu.me.uk>
References: <EB130F09-5670-4F3E-B481-60A700AEA38F@uiuc.edu>
	<45CCF861.8030000@sendu.me.uk>
	<DF989651-7CC0-45A0-BA5E-EEEE88FDD40E@uiuc.edu>
	<45D317D6.5070903@sendu.me.uk>
Message-ID: <E9611B3C-658E-4CBC-A2ED-1990F929A130@uiuc.edu>


On Feb 14, 2007, at 8:08 AM, Sendu Bala wrote:

> The bundle is now obsolete. Does anything in Bioperl, or any of its
> dependencies, now make use of the expat library? If not, I can remove
> mention of it from the install documentation.

I'll try getting something up about XML::SAX on the wiki today.   
XML::Parser, though, still requires expat AFAIK:

http://www.bioperl.org/wiki/BioPerl_Dependencies

chris

From kellert at ohsu.edu  Tue Feb 13 17:43:24 2007
From: kellert at ohsu.edu (Thomas J Keller)
Date: Tue, 13 Feb 2007 14:43:24 -0800
Subject: [Bioperl-l] HowTo:SearchIO
Message-ID: <CF656232-DEF2-47C4-9855-DAF748AEE862@ohsu.edu>

Greetings,
I've been away from programming and informatics for many months.  
Hoping to get back into it, I thought it would be good to review the  
tutorials.
I tried the tutorial code on the sample BLAST report in the
tutorial and it worked fine. So I ran a blastx search, saved the
results, and tried to parse them: it gave the "... parsing" message,
but no other results were reported.

Any suggestions?

Thanks,
Tom

Tom Keller, Ph.D.
kellert at ohsu.edu
503-494-2442
6339b Basic Science Bldg
http://www.ohsu.edu/research/core


From mrouard at gmail.com  Wed Feb 14 06:23:47 2007
From: mrouard at gmail.com (Mathieu Rouard)
Date: Wed, 14 Feb 2007 12:23:47 +0100
Subject: [Bioperl-l] get the sequence of a column in a multiple alignment
Message-ID: <ab1628190702140323m2d6f1e00h9cc85ffaf84fcafd@mail.gmail.com>

Dear all,

I am starting to use the bioperl API to parse multiple alignments, and I am
wondering what the most efficient way is to extract all the columns from an
alignment (all the amino acids at position 1, position 2, etc.). I quickly
implemented this simple code, but it becomes quite slow as the length of the
sequences increases.

use strict;
use warnings;
use Bio::AlignIO;

my $inputfilename = shift @ARGV;
my $stream = Bio::AlignIO->new(-file   => $inputfilename,
                               -format => 'stockholm');

my $aln = $stream->next_aln();

my $length = $aln->length();
my %column;

for (my $i = 1; $i <= $length; $i++) {
    my $aa = '';
    foreach my $seq ($aln->each_seq()) {
        # trunc() builds a new sequence object for every residue,
        # which is what makes this loop slow
        my $obj = $seq->trunc($i, $i);
        $aa .= $obj->seq;
    }
    # track the column number and the sequence of the column
    $column{$i} = $aa;
}

Would you have any other suggestion?

thanks
Mathieu

From avilella at gmail.com  Wed Feb 14 10:29:02 2007
From: avilella at gmail.com (Albert Vilella)
Date: Wed, 14 Feb 2007 15:29:02 +0000
Subject: [Bioperl-l] get the sequence of a column in a multiple alignment
In-Reply-To: <ab1628190702140323m2d6f1e00h9cc85ffaf84fcafd@mail.gmail.com>
References: <ab1628190702140323m2d6f1e00h9cc85ffaf84fcafd@mail.gmail.com>
Message-ID: <358f4d650702140729u4dae2847qc8eeeb45b20faca4@mail.gmail.com>

there is a slice method:

  $mini_aln = $aln->slice(20,30);  # get a block of columns

 Title     : slice
 Usage     : $aln2 = $aln->slice(20,30)
 Function  : Creates a slice from the alignment inclusive of start and
             end columns, and the first column in the alignment is denoted 1.
             Sequences with no residues in the slice are excluded from the
             new alignment and a warning is printed. Slice beyond the length of
             the sequence does not do padding.
 Returns   : A Bio::SimpleAlign object
 Args      : Positive integer for start column, positive integer for end column,
             optional boolean which if true will keep gap-only columns in the
             newly created slice. Example:

             $aln2 = $aln->slice(20,30,1)

but I don't know how well it behaves for lots of sequences :)
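If the slowdown comes mostly from creating a new object with trunc() for every residue, one alternative is to pull each aligned row out as a plain string once (e.g. via $seq->seq for each sequence returned by $aln->each_seq) and then walk the columns with substr. This is a minimal sketch, assuming every row string is padded to the full alignment length; alignment_columns is a hypothetical helper, not a BioPerl method.

```perl
use strict;
use warnings;

# Hypothetical helper: given plain aligned sequence strings (all assumed to be
# the same length), return a hashref mapping 1-based column number => column.
sub alignment_columns {
    my @seqs = @_;
    my %column;
    return \%column unless @seqs;
    my $length = length $seqs[0];
    for my $i (1 .. $length) {
        # substr offsets are 0-based, alignment columns are 1-based
        $column{$i} = join '', map { substr($_, $i - 1, 1) } @seqs;
    }
    return \%column;
}

# With BioPerl, the strings could come from the alignment object, e.g.:
#   my @strings = map { $_->seq } $aln->each_seq;
#   my $cols    = alignment_columns(@strings);
```

This does one substr call per residue and never creates intermediate sequence objects; whether it is fast enough for very large alignments is untested here.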



From jason at bioperl.org  Wed Feb 14 11:59:49 2007
From: jason at bioperl.org (Jason Stajich)
Date: Wed, 14 Feb 2007 08:59:49 -0800
Subject: [Bioperl-l] HowTo:SearchIO
In-Reply-To: <CF656232-DEF2-47C4-9855-DAF748AEE862@ohsu.edu>
References: <CF656232-DEF2-47C4-9855-DAF748AEE862@ohsu.edu>
Message-ID: <FAAF2A93-97CA-4E84-9178-0561C9BC7B5B@bioperl.org>

As always, reporting the version of BLAST and Bioperl you have
installed will help someone diagnose whether this is a fixed problem or
not. If you trawl through the list archives you'll see that Chris and others
have been playing cat and mouse with the text output from
NCBI BLAST, which appears to be an ever-evolving beast.

So the best advice right now is to get the latest bioperl from CVS
to ensure you have all the patches that might parse this version. If
it still fails, then the standard response will be to submit the
report as an attachment to a new bug report on Bugzilla.

thanks,
-jason



--
Jason Stajich
Miller Research Fellow
University of California, Berkeley
lab: 510.642.8441
http://pmb.berkeley.edu/~taylor/people/js.html
http://fungalgenomes.org/


From dmessina at wustl.edu  Wed Feb 14 11:58:45 2007
From: dmessina at wustl.edu (David Messina)
Date: Wed, 14 Feb 2007 10:58:45 -0600
Subject: [Bioperl-l] HowTo:SearchIO
In-Reply-To: <CF656232-DEF2-47C4-9855-DAF748AEE862@ohsu.edu>
References: <CF656232-DEF2-47C4-9855-DAF748AEE862@ohsu.edu>
Message-ID: <6E3CAB6B-9F9E-46FD-9021-50D7FE011860@wustl.edu>

Hi Tom,

Could you tell us what version of BioPerl you are using, and which
specific example is failing for you? And could you post your code?

That would make it easier to diagnose the problem.

Thanks,
Dave

-- 
Dave Messina
Senior Programmer/Analyst, Assembly Group
WashU Genome Sequencing Center
dmessina a t  wustl.edu
314-286-1415


From cjfields at uiuc.edu  Wed Feb 14 12:28:24 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 14 Feb 2007 11:28:24 -0600
Subject: [Bioperl-l] HowTo:SearchIO
In-Reply-To: <FAAF2A93-97CA-4E84-9178-0561C9BC7B5B@bioperl.org>
References: <CF656232-DEF2-47C4-9855-DAF748AEE862@ohsu.edu>
	<FAAF2A93-97CA-4E84-9178-0561C9BC7B5B@bioperl.org>
Message-ID: <3A7FFE76-808B-4249-B588-8709567B870F@uiuc.edu>

I would also strongly encourage switching to using XML-based parsing,  
which is much more stable now.  Here's the link to the NCBI response  
re: BLAST report parsing:

http://bioperl.org/wiki/NCBI_Blast_email

chris (taking a break from shoveling snow...)



From sac at bioperl.org  Wed Feb 14 13:20:17 2007
From: sac at bioperl.org (Steve Chervitz)
Date: Wed, 14 Feb 2007 10:20:17 -0800
Subject: [Bioperl-l] Bio::Root::Utilities.pm
In-Reply-To: <1D04BA5E-359E-4AC5-B674-8ECDC4F81E83@uiuc.edu>
References: <8f200b4c0702131500m2f3790nceace3c3e3dffd4f@mail.gmail.com>
	<1D04BA5E-359E-4AC5-B674-8ECDC4F81E83@uiuc.edu>
Message-ID: <8f200b4c0702141020x5815018bvd69873222582d9d5@mail.gmail.com>


Sorry I missed out on that thread. I had some trouble with my bioperl-l
email delivery getting disabled due to excessive bounces, and it took me a
while to catch it.

Bio::Root::Utilities is quite a grab bag of miscellaneous general functions
that are occasionally useful for perl scripting (e.g., determining
end-of-line characters, sending email, etc.). The code could definitely use
a review, and maybe an example script to advertise it. I can look into this,
and suggestions are welcome.
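As an illustration of the kind of small helper described above, here is a minimal pure-Perl sketch of end-of-line detection. It is not the code in Bio::Root::Utilities; the function name guess_eol and its return labels are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: guess the line-ending convention of a chunk of text.
# Returns 'dos' (CR LF), 'mac' (bare CR), 'unix' (bare LF) or 'none'.
# Mixed files report the first category that matches, in that order.
sub guess_eol {
    my ($text) = @_;
    return 'dos'  if $text =~ /\015\012/;   # CR followed by LF
    return 'mac'  if $text =~ /\015/;       # lone CR (classic Mac OS)
    return 'unix' if $text =~ /\012/;       # lone LF
    return 'none';
}

# Example: read the first few KB of a file in raw mode and inspect it.
#   open my $fh, '<:raw', $file or die "Cannot open $file: $!";
#   read $fh, my $buf, 4096;
#   print guess_eol($buf), "\n";
```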

Steve

From dmessina at wustl.edu  Wed Feb 14 13:55:18 2007
From: dmessina at wustl.edu (David Messina)
Date: Wed, 14 Feb 2007 12:55:18 -0600
Subject: [Bioperl-l] HowTo:SearchIO
In-Reply-To: <3A7FFE76-808B-4249-B588-8709567B870F@uiuc.edu>
References: <CF656232-DEF2-47C4-9855-DAF748AEE862@ohsu.edu>
	<FAAF2A93-97CA-4E84-9178-0561C9BC7B5B@bioperl.org>
	<3A7FFE76-808B-4249-B588-8709567B870F@uiuc.edu>
Message-ID: <49A80E44-EB7F-4E2A-86E8-2379065F0FC3@wustl.edu>


On Feb 14, 2007, at 11:28 AM, Chris Fields wrote:

> I would also strongly encourage switching to using XML-based parsing,

Unless anyone objects, I would be happy to update the HOWTO to  
suggest people make the switch and give an example of XML parsing.

The Bio::SearchIO synopsis is already an XML example. However,
there's no warning about text-based parsing nor a suggestion to use
XML that I can see, so perhaps one should be added?

Dave


From cjfields at uiuc.edu  Wed Feb 14 15:12:21 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 14 Feb 2007 14:12:21 -0600
Subject: [Bioperl-l] HowTo:SearchIO
In-Reply-To: <49A80E44-EB7F-4E2A-86E8-2379065F0FC3@wustl.edu>
References: <CF656232-DEF2-47C4-9855-DAF748AEE862@ohsu.edu>
	<FAAF2A93-97CA-4E84-9178-0561C9BC7B5B@bioperl.org>
	<3A7FFE76-808B-4249-B588-8709567B870F@uiuc.edu>
	<49A80E44-EB7F-4E2A-86E8-2379065F0FC3@wustl.edu>
Message-ID: <C9FDCBA5-5BDE-41FE-9281-F5C73B0A700B@uiuc.edu>


On Feb 14, 2007, at 12:55 PM, David Messina wrote:

>
> On Feb 14, 2007, at 11:28 AM, Chris Fields wrote:
>
>> I would also strongly encourage switching to using XML-based parsing,
>
> Unless anyone objects, I would be happy to update the HOWTO to
> suggest people make the switch and give an example of XML parsing.
>
> The Bio::SearchIO synopsis is already an XML example. However,
> there's no warning about text-based parsing nor a suggestion to use
> XML that I can see -- perhaps should be added?
>
> Dave

We should probably add something specifically for BLAST, yes.  Other  
text parsers should be fine.

Personally, I use XML or tabular output parsing simply b/c they are  
faster and do what I need.  I think we'll need to retain the  
capability for text-based BLAST parsing, but it will become extremely  
bloated long-term if we plan on continuing support for parsing all  
versions and flavors of BLAST, particularly if NCBI continues to  
change the output.
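For anyone weighing the tabular route, here is a minimal sketch of reading NCBI BLAST tabular output (blastall -m 8, or -m 9 with comment lines) in plain Perl. It assumes the standard 12 tab-separated columns: query id, subject id, % identity, alignment length, mismatches, gap opens, query start/end, subject start/end, e-value and bit score. Within BioPerl itself, Bio::SearchIO with -format => 'blasttable' covers this format; parse_blast_tab below is a hypothetical stand-alone helper, not part of BioPerl.

```perl
use strict;
use warnings;

# Column names for the 12-column tabular format (blastall -m 8 / -m 9).
my @BLAST_TAB_FIELDS = qw(query subject pct_identity aln_length mismatches
                          gap_opens q_start q_end s_start s_end evalue bit_score);

# Hypothetical helper: parse tabular BLAST lines into a list of hashrefs.
sub parse_blast_tab {
    my @lines = @_;
    my @hits;
    for my $line (@lines) {
        chomp $line;
        next if $line =~ /^\s*(#|$)/;          # skip -m 9 comments and blank lines
        my @f = split /\t/, $line;
        next unless @f == @BLAST_TAB_FIELDS;   # ignore malformed lines
        my %hit;
        @hit{@BLAST_TAB_FIELDS} = @f;
        push @hits, \%hit;
    }
    return @hits;
}

# Usage sketch:
#   open my $fh, '<', 'report.tab' or die "Cannot open report.tab: $!";
#   for my $hit (parse_blast_tab(<$fh>)) {
#       print "$hit->{subject}\t$hit->{evalue}\n";
#   }
```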

chris

From dmessina at wustl.edu  Wed Feb 14 15:46:31 2007
From: dmessina at wustl.edu (David Messina)
Date: Wed, 14 Feb 2007 14:46:31 -0600
Subject: [Bioperl-l] HowTo:SearchIO
In-Reply-To: <C9FDCBA5-5BDE-41FE-9281-F5C73B0A700B@uiuc.edu>
References: <CF656232-DEF2-47C4-9855-DAF748AEE862@ohsu.edu>
	<FAAF2A93-97CA-4E84-9178-0561C9BC7B5B@bioperl.org>
	<3A7FFE76-808B-4249-B588-8709567B870F@uiuc.edu>
	<49A80E44-EB7F-4E2A-86E8-2379065F0FC3@wustl.edu>
	<C9FDCBA5-5BDE-41FE-9281-F5C73B0A700B@uiuc.edu>
Message-ID: <136DA052-B9FD-4547-B262-EC6E38B47392@wustl.edu>

On Feb 14, 2007, at 2:12 PM, Chris Fields wrote:

> We should probably add something specifically for BLAST, yes.   
> Other text parsers should be fine.

Good point -- I'll make it clear it's only pertinent to BLAST.


> I think we'll need to retain the capability for text-based BLAST  
> parsing,

Agreed. Through the 1.6 release at least, I would think.


> particularly if NCBI continues to change the output.

Well, clearly the solution is not to use the NCBI flavor of BLAST. :)


Dave
(look at my email address)


From jay at jays.net  Thu Feb 15 08:08:56 2007
From: jay at jays.net (Jay Hannah)
Date: Thu, 15 Feb 2007 07:08:56 -0600
Subject: [Bioperl-l] DB.t failures
In-Reply-To: <45D3167F.2000608@sendu.me.uk>
References: <45D3167F.2000608@sendu.me.uk>
Message-ID: <AEABC2CF-AABF-4BC9-9E66-17F2E3A51B37@jays.net>

On Feb 14, 2007, at 8:02 AM, Sendu Bala wrote:
> DB.t is failing with BIOPERLDEBUG set. Apparently, we are no longer
> getting sequences back from NCBI in the order we requested them in
> batch mode.

Is this the same result you get?


DIED. FAILED tests 59-60, 63-64, 67-68, 71-72
         Failed 8/113 tests, 92.92% okay (less 8 skipped tests: 97 okay, 85.84%)
Failed Test Stat Wstat Total Fail  Failed  List of Failed
-------------------------------------------------------------------------------
DB.t           8  2048   113    8   7.08%  59-60 63-64 67-68 71-72
8 subtests skipped.


Thanks,

j
seqlab.net
http://www.bioperl.org/wiki/User:Jhannah


From bix at sendu.me.uk  Thu Feb 15 08:37:32 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 15 Feb 2007 13:37:32 +0000
Subject: [Bioperl-l] DB.t failures
In-Reply-To: <AEABC2CF-AABF-4BC9-9E66-17F2E3A51B37@jays.net>
References: <45D3167F.2000608@sendu.me.uk>
	<AEABC2CF-AABF-4BC9-9E66-17F2E3A51B37@jays.net>
Message-ID: <45D4621C.6040309@sendu.me.uk>

Jay Hannah wrote:
> On Feb 14, 2007, at 8:02 AM, Sendu Bala wrote:
>> DB.t is failing with BIOPERLDEBUG set. Apparently, we are no longer
>> getting sequences back from NCBI in the order we requested them in
>> batch mode.
>
> Is this the same result you get?
>
> DIED. FAILED tests 59-60, 63-64, 67-68, 71-72
>          Failed 8/113 tests, 92.92% okay (less 8 skipped tests: 97 okay, 85.84%)
> Failed Test Stat Wstat Total Fail  Failed  List of Failed
> -------------------------------------------------------------------------------
> DB.t           8  2048   113    8   7.08%  59-60 63-64 67-68 71-72
> 8 subtests skipped.

Yes, those fails are all caused by results in the wrong order (I believe).

From cjfields at uiuc.edu  Thu Feb 15 09:22:09 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 15 Feb 2007 08:22:09 -0600
Subject: [Bioperl-l] DB.t failures
In-Reply-To: <45D4621C.6040309@sendu.me.uk>
References: <45D3167F.2000608@sendu.me.uk>
	<AEABC2CF-AABF-4BC9-9E66-17F2E3A51B37@jays.net>
	<45D4621C.6040309@sendu.me.uk>
Message-ID: <CF92D281-CAC2-415C-91A9-CBA0893336B9@uiuc.edu>


On Feb 15, 2007, at 7:37 AM, Sendu Bala wrote:

> Yes, those fails are all caused by results in the wrong order (I believe).

I'm fixing those now so it doesn't depend on order and will commit in  
the next few minutes.
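An order-independent check can be as simple as comparing sorted lists. This is a minimal sketch, assuming the test only needs the set of returned IDs to match the requested set; same_ids is a hypothetical helper and not the code that was committed to DB.t.

```perl
use strict;
use warnings;

# Hypothetical helper: true if two lists contain the same IDs regardless of
# order (duplicates are counted, so qw(a a) does not match qw(a b)).
sub same_ids {
    my ($got, $expected) = @_;
    return 0 unless @$got == @$expected;
    my @g = sort @$got;
    my @e = sort @$expected;
    for my $i (0 .. $#g) {
        return 0 unless $g[$i] eq $e[$i];
    }
    return 1;
}
```

In a Test::More script the same idea is usually written as is_deeply([sort @got], [sort @expected]).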

chris

From bix at sendu.me.uk  Thu Feb 15 09:37:00 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 15 Feb 2007 14:37:00 +0000
Subject: [Bioperl-l] DB.t failures
In-Reply-To: <586AAA5B-DFA6-4746-BA70-F3C16CA97EE3@uiuc.edu>
References: <45D3167F.2000608@sendu.me.uk>
	<AEABC2CF-AABF-4BC9-9E66-17F2E3A51B37@jays.net>
	<45D4621C.6040309@sendu.me.uk>
	<586AAA5B-DFA6-4746-BA70-F3C16CA97EE3@uiuc.edu>
Message-ID: <45D4700C.8020305@sendu.me.uk>

Chris Fields wrote:
> 
> On Feb 15, 2007, at 7:37 AM, Sendu Bala wrote:
> 
>> Jay Hannah wrote:
>>> On Feb 14, 2007, at 8:02 AM, Sendu Bala wrote:
>>>> DB.t is failing with BIOPERLDEBUG set. Apparently, we are no longer
>>>> getting sequences back from NCBI in the order we requested them in
>>>> batch mode.
>
> Okay, I committed a fix for that.  I hope there are many users who 
> depend on the returned sequence order for anything!

s/are/aren't/ ?

I suspect there might be, and it's certainly a reasonable assumption to
make. Did you not see an easy way of maintaining the order?
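One way a caller could restore the requested order is to rank the returned records by their position in the request list. This is a minimal sketch, assuming each returned record carries the identifier it was requested by (for a Bio::Seq that might be accession_number or display_id, depending on which ID type was requested); reorder_by_request is a hypothetical helper, not an existing Bio::DB method.

```perl
use strict;
use warnings;

# Hypothetical helper: put fetched records back into the order in which their
# IDs were requested. $get_id is a code ref that extracts the ID from a record.
# Records whose ID was not requested are appended at the end in arrival order.
sub reorder_by_request {
    my ($requested, $records, $get_id) = @_;
    my %rank;
    @rank{@$requested} = (0 .. $#$requested);   # requested ID => position
    my (@known, @unknown);
    for my $rec (@$records) {
        my $id = $get_id->($rec);
        if (exists $rank{$id}) { push @known, $rec } else { push @unknown, $rec }
    }
    # Sort the recognised records by their position in the request list.
    my @sorted = sort { $rank{ $get_id->($a) } <=> $rank{ $get_id->($b) } } @known;
    return (@sorted, @unknown);
}

# With Bio::Seq objects this might be called as:
#   my @ordered = reorder_by_request(\@ids, \@seqs, sub { $_[0]->accession_number });
```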

From cjfields at uiuc.edu  Thu Feb 15 09:28:46 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 15 Feb 2007 08:28:46 -0600
Subject: [Bioperl-l] DB.t failures
In-Reply-To: <45D4621C.6040309@sendu.me.uk>
References: <45D3167F.2000608@sendu.me.uk>
	<AEABC2CF-AABF-4BC9-9E66-17F2E3A51B37@jays.net>
	<45D4621C.6040309@sendu.me.uk>
Message-ID: <586AAA5B-DFA6-4746-BA70-F3C16CA97EE3@uiuc.edu>


On Feb 15, 2007, at 7:37 AM, Sendu Bala wrote:

> Yes, those fails are all caused by results in the wrong order (I believe).

Okay, I committed a fix for that.  I hope there are many users who  
depend on the returned sequence order for anything!

chris


From michael.watson at bbsrc.ac.uk  Thu Feb 15 09:44:27 2007
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Thu, 15 Feb 2007 14:44:27 -0000
Subject: [Bioperl-l] The axis of GC content in Bio::Graphics::glyph:dna
Message-ID: <8975119BCD0AC5419D61A9CF1A923E9503E2EB70@iahce2ksrv1.iah.bbsrc.ac.uk>

Hi

OK, I have some great images out of this glyph, but I can't see the axis,
nor is it labelled (i.e. does it go from 0-100%?), so it isn't great for
publication.  The docs say:

"NOTE: -gc_window=>'auto' gives nice results and is recommended for
drawing GC content. The GC content axes draw slightly outside the
panel, so you may wish to add some extra padding on the right and
left. "

Any idea how to do this?

Basically, I want a nice GC graph with the axis quite clearly labelled,
and a nice "%GC" title next to it :)

Thanks

Mick



From nehadnahar at yahoo.co.in  Thu Feb 15 10:28:42 2007
From: nehadnahar at yahoo.co.in (Neha Nahar)
Date: Thu, 15 Feb 2007 15:28:42 +0000 (GMT)
Subject: [Bioperl-l] Convert newick to nexus format
In-Reply-To: <84703383-600F-42F4-A860-DD0D1C43EE83@bioperl.org>
Message-ID: <777943.33252.qm@web8404.mail.in.yahoo.com>

Thank you Jason. I ran the tests and they failed, so I re-installed the bioperl module and now it works fine.

Regards,
Neha.

Jason Stajich <jason at bioperl.org> wrote:
Something is wrong with your install, I am guessing. Can you run the
tests? Go to the bioperl directory and run:
$ perl t/TreeIO.t

Can you describe how you installed bioperl?

On Feb 5, 2007, at 11:58 AM, Neha Nahar wrote:

>
> Hi,
> Thank you for the code.
> I tried it, but I still get the same exception.
>
> ------------- EXCEPTION  -------------
> MSG: Cannot call method write_tree on Bio::TreeIO object must use a subclass
> STACK Bio::TreeIO::nexus::write_tree /usr/lib/perl5/vendor_perl/5.8.8/Bio/TreeIO/nexus.pm:170
> STACK toplevel ./nexus1.pl:18
>
> Please find attached the perl file (nexus.pl).
>
> Using bioperl-1.5.2_101.tar.gz module from http://search.cpan.org/~sendu/bioperl/Bio/TreeIO.pm
>
> Please let me know if I am using the correct version. If not, please
> point me to the latest one.
>
> Thank you.
> Regards,
> nnahar
>
> Jason Stajich wrote: Please cc the mailing list when asking a question or
> following up.
>
> Sorry, I don't know what you are doing wrong; you didn't resend your code,
> so I don't know if you still have a typo.
>
> This code works fine for me:
>
> use strict;
> use Bio::TreeIO;
>
> my ($filein, $fileout) = @ARGV;
> my ($format, $oformat) = qw(newick nexus);
> my $in  = Bio::TreeIO->new(-file => $filein, -format => $format);
> my $out = Bio::TreeIO->new(-format => $oformat, -file => ">$fileout");
>
> while (my $t = $in->next_tree) {
>     $out->write_tree($t);
> }
>
> On Feb 5, 2007, at 11:24 AM, Neha Nahar wrote:
>
> Thank you very much for the reply.
>
> I fixed the code as per your suggestion, but now I am getting a
> different error:
>
> $ ./nexus.pl mrp-input.txt nexus.out
> newickfile=mrp-input.txt, nexusfile=nexus.out
>
> -------------  EXCEPTION  -------------
> MSG: Cannot call method write_tree on Bio::TreeIO object must use a subclass
> STACK Bio::TreeIO::nexus::write_tree /usr/lib/perl5/vendor_perl/5.8.8/Bio/TreeIO/nexus.pm:170
> STACK toplevel ./nexus.pl:23
> --------------------------------------
>
> Please help me out with this script.
>
> Thank you.
> Regards,
> Neha
>
> Jason Stajich wrote: You want to write the TREE out, not the TREE WRITER:
>
> $treeout->write_tree($tree)
>
> not
> $treeout->write_tree($treeout);
>
> On Feb 5, 2007, at 9:59 AM, Neha Nahar wrote:
>
> Hello everyone,
>
> I am trying to convert a newick tree to nexus format, using this script
> (referred from an email from George dated Wed Sep 22 11:52:47 EDT 2004):
>
> /*------------------------------------------------------------*/
>
> $ cat nexus.pl
> #!/usr/bin/perl -w
>
> use Bio::TreeIO;
>
> ($NEWICKFILE, $NEXUSFILE) = @ARGV;
> print "newickfile=$NEWICKFILE,  nexusfile=$NEXUSFILE\n";
> my $treeio = new Bio::TreeIO(-format => 'newick', -file   => "$NEWICKFILE");
> my $treeout = new Bio::TreeIO(-format => 'nexus', -file   => ">$NEXUSFILE");
> while(my $tree = $treeio->next_tree) {
>         $treeout->write_tree($treeout);
>     }
>
> exit 0;
>
> /*------------------------------------------------------------*/
>
> Running the script from the command line
> gives the following error:
>
> $ ./nexus.pl mrp-input.txt nexus.out
> newickfile=mrp-input.txt, nexusfile=nexus.out
>
> ------------- EXCEPTION  -------------
> MSG: Cannot call method write_tree on Bio::TreeIO object must use a subclass
> STACK  Bio::TreeIO::nexus::write_tree /usr/lib/perl5/vendor_perl/5.8.8/Bio/TreeIO/nexus.pm:170
> STACK toplevel ./nexus.pl:23
> --------------------------------------
>
> Using bioperl-1.5.2_101.tar.gz module from http://search.cpan.org/~sendu/bioperl/Bio/TreeIO.pm
>
> Questions:
>
> 1. Please let me know if I am using the correct version.
> If not, please point me to the latest one.
>
> 2. Provided that the version I am using is the right one, please
> let me know what is wrong with the script.
>
> Thank you.
> Regards,
> Neha.
>
> -Neha Nahar
>   " Work for cause and not for applause, live to express and not to impress !"
>
> ---------------------------------
>  Here's a new way to find what you're looking for - Yahoo! Answers
>
>  --
> Jason Stajich
> Miller Research Fellow
> University of California, Berkeley
> lab: 510.642.8441
>
> http://pmb.berkeley.edu/~taylor/people/js.html
> http://fungalgenomes.org/
>
> 


_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l



From cjfields at uiuc.edu  Thu Feb 15 10:44:23 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 15 Feb 2007 09:44:23 -0600
Subject: [Bioperl-l] DB.t failures
In-Reply-To: <45D4700C.8020305@sendu.me.uk>
References: <45D3167F.2000608@sendu.me.uk>
	<AEABC2CF-AABF-4BC9-9E66-17F2E3A51B37@jays.net>
	<45D4621C.6040309@sendu.me.uk>
	<586AAA5B-DFA6-4746-BA70-F3C16CA97EE3@uiuc.edu>
	<45D4700C.8020305@sendu.me.uk>
Message-ID: <809A1A9D-0344-412D-98D7-EA852EFED63D@uiuc.edu>


On Feb 15, 2007, at 8:37 AM, Sendu Bala wrote:

> Chris Fields wrote:
>>
>> On Feb 15, 2007, at 7:37 AM, Sendu Bala wrote:
>>
>>> Jay Hannah wrote:
>>>> On Feb 14, 2007, at 8:02 AM, Sendu Bala wrote:
>>>>> DB.t is failing with BIOPERLDEBUG set. Apparently, we are no
>>>>> longer getting sequences back from NCBI in the order we requested
>>>>> them in batch mode.
>>
>> Okay, I committed a fix for that.  I hope there are many users who
>> depend on the returned sequence order for anything!
>
> s/are/aren't/ ?

Yes, my oops.

> I suspect there might be, and it's certainly a reasonable assumption to
> make. Did you not see an easy way of maintaining the order?

I haven't looked (been busy the last few days), but I think there is  
a way via efetch.

We could add in something to the default base URL if there is  
something or (probably better) add a sort_order() method to designate  
a particular sort order, defaulting to the old order if not set.
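Whatever the server does, the old order can also be restored on the client side: index the returned records by ID, then walk the list of requested IDs. This is a minimal sketch under assumptions not stated in the thread. The helper name restore_request_order is hypothetical and not part of Bio::DB::GenBank, and with real Bio::Seq objects the ID extractor (for example sub { $_[0]->accession_number }) has to return the same kind of identifier that was requested, which is not guaranteed when GIs and accessions are mixed.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Bio::DB::GenBank: return records in the
# order their IDs were requested, whatever order the server sent them in.
#   $requested - array ref of IDs in request order
#   $records   - array ref of records in the order they came back
#   $id_of     - code ref that extracts the ID from one record
sub restore_request_order {
    my ($requested, $records, $id_of) = @_;
    my %by_id;
    for my $rec (@$records) {
        $by_id{ $id_of->($rec) } = $rec;
    }
    # IDs with no matching record are skipped instead of returned as undef
    return map { exists $by_id{$_} ? $by_id{$_} : () } @$requested;
}

# Plain hashes stand in for sequence objects here
my @requested = qw(ID001 ID002 ID003);
my @returned  = ({ id => 'ID003' }, { id => 'ID001' }, { id => 'ID002' });
my @ordered   = restore_request_order(\@requested, \@returned, sub { $_[0]{id} });
print join(',', map { $_->{id} } @ordered), "\n";    # ID001,ID002,ID003
```

Doing it this way keeps the fix independent of any sort parameter the server may or may not honour, at the cost of holding the whole batch in memory before returning it.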

chris

From lstein at cshl.edu  Thu Feb 15 13:53:13 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Thu, 15 Feb 2007 13:53:13 -0500
Subject: [Bioperl-l] The axis of GC content in Bio::Graphics::glyph:dna
In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E9503E2EB70@iahce2ksrv1.iah.bbsrc.ac.uk>
References: <8975119BCD0AC5419D61A9CF1A923E9503E2EB70@iahce2ksrv1.iah.bbsrc.ac.uk>
Message-ID: <6dce9a0b0702151053v19ab190fh1f752fea3b2ed722@mail.gmail.com>

Hi Michael,

When you set up the panel, do this:

 Bio::Graphics::Panel->new(-blah -blah,
                           -pad_left  => 20,
                           -pad_right => 20);

This will leave enough room on the left and right for you to see the Y axis.
Otherwise it runs off the edge of the image (OK, this is a misdesign, but
it was the only way to solve a chicken-and-egg problem about who gets to say
how wide the panel is).

Lincoln

On 2/15/07, michael watson (IAH-C) <michael.watson at bbsrc.ac.uk> wrote:
>
> Hi
>
> OK I have some great images out of this glyph, but I can't see the axis,
> and nor is it labelled (ie does it go from 0 - 100%?) so isn't great for
> publication.  The docs say:
>
> "NOTE: -gc_window=>'auto' gives nice results and is recommended for
> drawing GC content. The GC content axes draw slightly outside the
> panel, so you may wish to add some extra padding on the right and
> left. "
>
> Any idea how to do this?
>
> Basically, I want a nice GC graph with the axis quite clearly labelled,
> and a nice "%GC" title next to it :)
>
> Thanks
>
> Mick
>
> The information contained in this message may be confidential or legally
> privileged and is intended solely for the addressee. If you have
> received this message in error please delete it & notify the originator
> immediately.
> Unauthorised use, disclosure, copying or alteration of this message is
> forbidden & may be unlawful.
> The contents of this e-mail are the views of the sender and do not
> necessarily represent the views of the Institute.
> This email and associated attachments has been checked locally for
> viruses but we can accept no responsibility once it has left our
> systems.
> Communications on Institute computers are monitored to secure the
> effective operation of the systems and for other lawful purposes.
>


-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu

From johnsonm at gmail.com  Thu Feb 15 14:24:08 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Thu, 15 Feb 2007 13:24:08 -0600
Subject: [Bioperl-l] Bio::Tools::Glimmer
In-Reply-To: <DB4D80A7-1D29-4809-A7D2-FCE5DB2502B4@uiuc.edu>
References: <ebf5eb170609190833r5fa138e9gc6c51346e5f7890c@mail.gmail.com>
	<0FBEEE2B-9641-4EFA-A5D2-0DA839556E4C@gmx.net>
	<ebf5eb170609201247r12eb9475v5cb553e0727163c0@mail.gmail.com>
	<ebf5eb170702061553t29b0d1f2q402582abba032031@mail.gmail.com>
	<20F07AFE-BADF-4BA5-83B1-C99EEBE915C2@bioperl.org>
	<ebf5eb170702070850k10f1d24fr2d93d61534eeaaff@mail.gmail.com>
	<ebf5eb170702121513r20759845i21e366f916020baa@mail.gmail.com>
	<a79f6a4b0702121807i47d6360cwed1b0276865d6bd7@mail.gmail.com>
	<ebf5eb170702131210l1fa27bb5n31095e9ca2d4ecf2@mail.gmail.com>
	<DB4D80A7-1D29-4809-A7D2-FCE5DB2502B4@uiuc.edu>
Message-ID: <ebf5eb170702151124l52492a51mbbca2b2e346d5ee8@mail.gmail.com>

Done.  Bug opened in Bugzilla, diffs attached including new/updated tests:

http://bugzilla.open-bio.org/show_bug.cgi?id=2206

Can somebody grab that, take a look, tweak to taste, test and commit?  Tests
pass on my end presently.

On 2/13/07, Chris Fields <cjfields at uiuc.edu> wrote:
>
> You'll also want to update whatever relevant tests there are for
> Glimmer; looks like they are in GenPred.t.
>
> chris
>

From cjfields at uiuc.edu  Thu Feb 15 14:37:22 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 15 Feb 2007 13:37:22 -0600
Subject: [Bioperl-l] Bio::Tools::Glimmer
In-Reply-To: <ebf5eb170702151124l52492a51mbbca2b2e346d5ee8@mail.gmail.com>
References: <ebf5eb170609190833r5fa138e9gc6c51346e5f7890c@mail.gmail.com>
	<0FBEEE2B-9641-4EFA-A5D2-0DA839556E4C@gmx.net>
	<ebf5eb170609201247r12eb9475v5cb553e0727163c0@mail.gmail.com>
	<ebf5eb170702061553t29b0d1f2q402582abba032031@mail.gmail.com>
	<20F07AFE-BADF-4BA5-83B1-C99EEBE915C2@bioperl.org>
	<ebf5eb170702070850k10f1d24fr2d93d61534eeaaff@mail.gmail.com>
	<ebf5eb170702121513r20759845i21e366f916020baa@mail.gmail.com>
	<a79f6a4b0702121807i47d6360cwed1b0276865d6bd7@mail.gmail.com>
	<ebf5eb170702131210l1fa27bb5n31095e9ca2d4ecf2@mail.gmail.com>
	<DB4D80A7-1D29-4809-A7D2-FCE5DB2502B4@uiuc.edu>
	<ebf5eb170702151124l52492a51mbbca2b2e346d5ee8@mail.gmail.com>
Message-ID: <4C15214E-AE4B-4D85-A710-60536B08BE86@uiuc.edu>


On Feb 15, 2007, at 1:24 PM, Mark Johnson wrote:

> Done.  Bug opened in Bugzilla, diffs attached including new/updated  
> tests:
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=2206
>
> Can somebody grab that, take a look, tweak to taste, test and  
> commit?  Tests
> pass on my end presently.
>
> On 2/13/07, Chris Fields <cjfields at uiuc.edu> wrote:
>>
>> You'll also want to update whatever relevant tests there are for
>> Glimmer; looks like they are in GenPred.t.
>>
>> chris

Done; everything passed on this end as well, no tweaking necessary.   
If there are problems we'll definitely hear about it down the road  
(Glimmer is a popular tool), but I think you'll be fine.

Thanks Mark!

chris

From cjfields at uiuc.edu  Thu Feb 15 14:46:07 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 15 Feb 2007 13:46:07 -0600
Subject: [Bioperl-l] DB.t failures
In-Reply-To: <809A1A9D-0344-412D-98D7-EA852EFED63D@uiuc.edu>
References: <45D3167F.2000608@sendu.me.uk>
	<AEABC2CF-AABF-4BC9-9E66-17F2E3A51B37@jays.net>
	<45D4621C.6040309@sendu.me.uk>
	<586AAA5B-DFA6-4746-BA70-F3C16CA97EE3@uiuc.edu>
	<45D4700C.8020305@sendu.me.uk>
	<809A1A9D-0344-412D-98D7-EA852EFED63D@uiuc.edu>
Message-ID: <FA9F2E96-064B-4C8F-87BB-D72A7D6F6910@uiuc.edu>


On Feb 15, 2007, at 9:44 AM, Chris Fields wrote:

>
> On Feb 15, 2007, at 8:37 AM, Sendu Bala wrote:
>
>> Chris Fields wrote:
>>>
>>> On Feb 15, 2007, at 7:37 AM, Sendu Bala wrote:
>>>
>>>> Jay Hannah wrote:
>>>>> On Feb 14, 2007, at 8:02 AM, Sendu Bala wrote:
>>>>>> DB.t is failing with BIOPERLDEBUG set. Apparently, we are no
>>>>>> longer getting sequences back from NCBI in the order we
>>>>>> requested them in batch mode.
>>>
>>> Okay, I committed a fix for that.  I hope there are many users who
>>> depend on the returned sequence order for anything!
>>
>> s/are/aren't/ ?
>
> Yes, my oops.
>
>> I suspect there might be, and it's certainly a reasonable
>> assumption to make. Did you not see an easy way of maintaining
>> the order?
>
> I haven't looked (been busy the last few days), but I think there is
> a way via efetch.
>
> We could add in something to the default base URL if there is
> something or (probably better) add a sort_order() method to designate
> a particular sort order, defaulting to the old order if not set.
>
> chris

Delving into it further, the problem only occurs when using
get_seq_stream() directly in batch mode, which is likely only used by  
developers for testing.  The sort issue only pops up when eposting  
IDs using that mode; retrieved seqs are returned in a different order  
than through a direct efetch query (the default with get_Stream* or  
get_Seq* methods).  No use of the 'sort' parameter works to get  
around that problem, not a complete surprise since it is supposed to  
only work for PubMed, but since the method is rarely used I'll just  
leave the bullet-proofed tests alone.

chris


From letondal at pasteur.fr  Thu Feb 15 15:23:55 2007
From: letondal at pasteur.fr (Catherine Letondal)
Date: Thu, 15 Feb 2007 21:23:55 +0100
Subject: [Bioperl-l] Problem parsing clustalw format in AlignIO
Message-ID: <ab16cb62f58278aaf565a0fd73ba4009@pasteur.fr>

Hi bioperlers,

I have a script called protal2dna
(http://bioweb.pasteur.fr/seqanal/interfaces/protal2dna.html, see
attachment #1) that realigns DNA sequences given the DNA sequences plus
the corresponding protein alignment (sequences have to be in the same
order or have matching names). Users have reported a parsing error from
the AlignIO class when they supply certain clustalw files (see
attachment #2 for an example):

% protal2dna alig-protal2dna.dat dna-protal2dna.data
no alignment available in 'clustalw' format from file 
'alig-protal2dna.dat'
%

I have tried with bioperl 1.4. I have looked in the archive and in the
BUGS, but found nothing.
Is there a bug fix for this? I have also attached the DNA sequences file
in case you want to test.

Thanks a lot in advance,

--
Catherine Letondal -- Institut Pasteur
www.pasteur.fr/~letondal

-------------- next part --------------
A non-text attachment was scrubbed...
Name: protal2dna
Type: application/octet-stream
Size: 11093 bytes
Desc: not available
Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070215/42a0ba43/attachment.obj 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: alig-protal2dna.dat
Type: application/octet-stream
Size: 12022 bytes
Desc: not available
Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070215/42a0ba43/attachment-0001.obj 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: dna-protal2dna.data
Type: application/octet-stream
Size: 7739 bytes
Desc: not available
Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070215/42a0ba43/attachment-0002.obj 

From Kevin.M.Brown at asu.edu  Thu Feb 15 16:38:25 2007
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Thu, 15 Feb 2007 14:38:25 -0700
Subject: [Bioperl-l] Problem parsing clustalw format in AlignIO
In-Reply-To: <ab16cb62f58278aaf565a0fd73ba4009@pasteur.fr>
References: <ab16cb62f58278aaf565a0fd73ba4009@pasteur.fr>
Message-ID: <1A4207F8295607498283FE9E93B775B402BA7764@EX02.asurite.ad.asu.edu>

Did you try Bioperl 1.5.2 to see if updates to it might fix the issue?
IIRC 1.4 is nearly 2 years old now.  1.5.2 was released within the last
few months.

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> Catherine Letondal
> Sent: Thursday, February 15, 2007 1:24 PM
> To: bioperl-l
> Cc: Catherine Letondal; Katja Schuerer
> Subject: [Bioperl-l] Problem parsing clustalw format in AlignIO
> 
> Hi bioperlers,
> 
> I have a script called protal2dna
> (http://bioweb.pasteur.fr/seqanal/interfaces/protal2dna.html,
> see attachment #1) that realigns DNA sequences given the DNA
> sequences plus the corresponding protein alignment (sequences
> have to be in the same order or have matching names). Users have
> reported a parsing error from the AlignIO class when they
> supply certain clustalw files (see attachment #2 for an example):
> 
> % protal2dna alig-protal2dna.dat dna-protal2dna.data
> no alignment available in 'clustalw' format from file
> 'alig-protal2dna.dat'
> %
> 
> I have tried with bioperl 1.4. I have looked in the archive
> and in the BUGS, but found nothing.
> Is there a bug fix for this? I have also attached the DNA
> sequences file in case you want to test.
> 
> Thanks a lot in advance,
> 
> --
> Catherine Letondal -- Institut Pasteur
> www.pasteur.fr/~letondal
> 
> 


From cjfields at uiuc.edu  Thu Feb 15 16:50:54 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 15 Feb 2007 15:50:54 -0600
Subject: [Bioperl-l] Bio::Root::Utilities.pm
In-Reply-To: <8f200b4c0702141020x5815018bvd69873222582d9d5@mail.gmail.com>
References: <8f200b4c0702131500m2f3790nceace3c3e3dffd4f@mail.gmail.com>
	<1D04BA5E-359E-4AC5-B674-8ECDC4F81E83@uiuc.edu>
	<8f200b4c0702141020x5815018bvd69873222582d9d5@mail.gmail.com>
Message-ID: <C53B465C-8BBA-4DE7-92BC-FFC5DDBEB4AA@uiuc.edu>


On Feb 14, 2007, at 12:20 PM, Steve Chervitz wrote:
...

>>
>> I don't have a problem with adding it back, esp. if tests are added.
>> Everything in Bio::Root* not tied to a module was yanked out when no
>> one spoke up about cleaning up Bio::Root* modules:
>>
>> http://thread.gmane.org/gmane.comp.lang.perl.bio.general/12792/focus=12839
>>
>> Maybe others disagree?
>>
>> chris
>>
>
> Sorry I missed out on that thread. I had some trouble with my
> bioperl-l email delivery getting disabled due to excessive bounces,
> and it took me a while to catch it.
>
> Bio::Root::Utilities is quite a grab bag of miscellaneous general
> functions that are occasionally useful for perl scripting (e.g.,
> determining end-of-line characters, sending email, etc.). The code
> could definitely use a review, and maybe an example script to
> advertise it. I can look into this, and suggestions are welcome.
>
> Steve

Steve,

I have added Root::Utilities back to CVS but I didn't know if I  
should add back the other related Root modules (didn't know what your  
future plans were for them).  Could the Bio::Root::Global and  
Bio::Root::Object stuff be consolidated into Bio::Root::Utilities or  
would that be too problematic?  None of the other Bio* modules  
currently use them.

Personally, I use Date::Manip for anything that requires date/time  
manipulation (updating seq records based on dates, for instance).   
Some of the other utilities could come in handy, though.  Don't know  
if that helps...

chris

From cjfields at uiuc.edu  Thu Feb 15 16:51:58 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 15 Feb 2007 15:51:58 -0600
Subject: [Bioperl-l] XEMBL deprecation
Message-ID: <72AD3CD9-4D42-4E65-A54E-56D01741CDCD@uiuc.edu>

I have gone ahead and marked Bio::DB::XEMBL and Bio::DB::XEMBLService  
both for deprecation in the wiki and in CVS (though I haven't set any  
timeline):

http://www.bioperl.org/wiki/Deprecated_modules

The XEMBL web services are no longer available, and it looks like  
everything is running through DBFetch now.  The XEMBL tests are  
skipped if no server is detected, so they shouldn't cause any  
problems with Bioperl installations.

Lincoln, was there anything to salvage from these?  I noticed they  
used SOAP::Lite, so maybe we could convert these over to a SOAP-based  
interface to DBFetch web services?

chris

From johnsonm at gmail.com  Thu Feb 15 17:29:37 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Thu, 15 Feb 2007 16:29:37 -0600
Subject: [Bioperl-l] Bio::Tools::Run::Glimmer?
Message-ID: <ebf5eb170702151429w233ec66dkfb89743a4b8e687e@mail.gmail.com>

    Now that I've got Bio::Tools::Glimmer parsing Glimmer2 and Glimmer3
output, I suppose I might as well go and write Bio::Tools::Run::Glimmer.  I
suspect another 4-in-1 module may be possible.  Now that I think about it,
I'll need one for GeneMark, too.
    Comments?  Suggestions on a good module to use as a template?

From hlapp at gmx.net  Thu Feb 15 20:18:56 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 15 Feb 2007 20:18:56 -0500
Subject: [Bioperl-l] XEMBL deprecation
In-Reply-To: <72AD3CD9-4D42-4E65-A54E-56D01741CDCD@uiuc.edu>
References: <72AD3CD9-4D42-4E65-A54E-56D01741CDCD@uiuc.edu>
Message-ID: <989C9C17-9D85-4B3F-A9FF-C9B9372D38AD@gmx.net>


On Feb 15, 2007, at 4:51 PM, Chris Fields wrote:

> The XEMBL web services are no longer available

What happens if someone invokes the module? Should it maybe return  
nothing and warn()? I don't think it's a good idea if the module just  
silently does not function because its backend is no more.

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cjfields at uiuc.edu  Thu Feb 15 20:48:12 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 15 Feb 2007 19:48:12 -0600
Subject: [Bioperl-l] XEMBL deprecation
In-Reply-To: <989C9C17-9D85-4B3F-A9FF-C9B9372D38AD@gmx.net>
References: <72AD3CD9-4D42-4E65-A54E-56D01741CDCD@uiuc.edu>
	<989C9C17-9D85-4B3F-A9FF-C9B9372D38AD@gmx.net>
Message-ID: <00A97EC9-A066-4848-A2A4-8E0773797783@uiuc.edu>

On Feb 15, 2007, at 7:18 PM, Hilmar Lapp wrote:

> On Feb 15, 2007, at 4:51 PM, Chris Fields wrote:
>
>> The XEMBL web services are no longer available
>
> What happens if someone invokes the module? Should it maybe return  
> nothing and warn()? I don't think it's a good idea if the module  
> just silently does not function because its backend is no more.
>
> 	-hilmar

Yes, I thought the same.  I have added a warn() noting the  
deprecation to the XEMBL constructor and removed XEMBL tests from  
CVS.  The modules are still there for the time being.
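A minimal sketch of the warn-and-return-undef pattern discussed in this thread, using a hypothetical package name. The change actually committed to Bio::DB::XEMBL is not shown here and may well differ (a BioPerl class would more likely call the warn() method it inherits through Bio::Root::Root than Carp). Returning nothing from new() gives callers an explicit undef to test, rather than an object that fails silently later.

```perl
use strict;
use warnings;

package My::Deprecated::Client;    # hypothetical stand-in for a deprecated module
use Carp qw(carp);

sub new {
    my ($class, %args) = @_;
    # carp reports the warning at the caller's line, which is where the fix belongs
    carp "$class is deprecated: its remote web service is no longer available";
    return;    # undef in scalar context, empty list in list context
}

package main;

my $client = My::Deprecated::Client->new;
print defined $client ? "got an object\n" : "constructor returned undef\n";
```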

I actually worry more about the internals; it would be a shame to  
toss them altogether.  Would it be worth it to shift this towards a  
SOAP-based interface to DBFetch?  Or, more precisely, how much  
trouble would it be to do so?

chris

From hlapp at gmx.net  Thu Feb 15 20:54:29 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 15 Feb 2007 20:54:29 -0500
Subject: [Bioperl-l] XEMBL deprecation
In-Reply-To: <00A97EC9-A066-4848-A2A4-8E0773797783@uiuc.edu>
References: <72AD3CD9-4D42-4E65-A54E-56D01741CDCD@uiuc.edu>
	<989C9C17-9D85-4B3F-A9FF-C9B9372D38AD@gmx.net>
	<00A97EC9-A066-4848-A2A4-8E0773797783@uiuc.edu>
Message-ID: <FB0CC994-F2C2-407C-9EFB-6B8083B9545C@gmx.net>

Well, if dbFetch doesn't have a SOAP based interface, how would you  
want to do this?

	-hilmar

On Feb 15, 2007, at 8:48 PM, Chris Fields wrote:

> On Feb 15, 2007, at 7:18 PM, Hilmar Lapp wrote:
>
>> On Feb 15, 2007, at 4:51 PM, Chris Fields wrote:
>>
>>> The XEMBL web services are no longer available
>>
>> What happens if someone invokes the module? Should it maybe return  
>> nothing and warn()? I don't think it's a good idea if the module  
>> just silently does not function because its backend is no more.
>>
>> 	-hilmar
>
> Yes, I thought the same.  I have added a warn() noting the  
> deprecation to the XEMBL constructor and removed XEMBL tests from  
> CVS.  The modules are still there for the time being.
>
> I actually worry more about the internals; it would be a shame to  
> toss them altogether.  Would it be worth it to shift this towards a  
> SOAP-based interface to DBFetch?  Or, more precisely, how much  
> trouble would it be to do so?
>
> chris



From cjfields at uiuc.edu  Thu Feb 15 20:59:46 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 15 Feb 2007 19:59:46 -0600
Subject: [Bioperl-l] XEMBL deprecation
In-Reply-To: <FB0CC994-F2C2-407C-9EFB-6B8083B9545C@gmx.net>
References: <72AD3CD9-4D42-4E65-A54E-56D01741CDCD@uiuc.edu>
	<989C9C17-9D85-4B3F-A9FF-C9B9372D38AD@gmx.net>
	<00A97EC9-A066-4848-A2A4-8E0773797783@uiuc.edu>
	<FB0CC994-F2C2-407C-9EFB-6B8083B9545C@gmx.net>
Message-ID: <8C7E18C6-B38D-4E15-BE9C-84256B09C312@uiuc.edu>


On Feb 15, 2007, at 7:54 PM, Hilmar Lapp wrote:

> Well, if dbFetch doesn't have a SOAP based interface, how would you  
> want to do this?
>
> 	-hilmar

DBfetch has a SOAP-based interface:

http://www.ebi.ac.uk/Tools/webservices/services/dbfetch

Just not sure how easy it would be to switch XEMBL code over to using  
it.  We already have Bio::DB::DBFetch so it may be redundant, but I  
don't recall any other SOAP-based tools in BioPerl beyond some stuff  
in bioperl-run (and I'm not sure how up-to-date the DBFetch module is).

chris


From jimhu at tamu.edu  Fri Feb 16 00:20:09 2007
From: jimhu at tamu.edu (Jim Hu)
Date: Thu, 15 Feb 2007 23:20:09 -0600
Subject: [Bioperl-l] Pathway tools output parser
In-Reply-To: <Pine.LNX.4.44.0702062205510.13338-100000@sos.lbl.gov>
References: <Pine.LNX.4.44.0702062205510.13338-100000@sos.lbl.gov>
Message-ID: <1632E2BF-4402-47DE-B750-9763E02711D2@tamu.edu>

Hi Chris,

I need to check the list more often!  I never got an answer here, but  
Eric Just pointed out a perl api at TAIR that's linked from the  
BioCyc site.  I've used the lisp parser functions from that to move  
the data to a perl array of arrays, and I'm working on creating  
object classes for BioCyc objects, starting with genes and products.
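The step of moving Lisp data into a Perl array of arrays can be pictured with a small s-expression reader. This is a hypothetical sketch, not the TAIR/BioCyc Perl API mentioned above: it handles only parentheses, bare atoms and double-quoted strings, and the sample record is made up rather than taken from an EcoCyc release.

```perl
use strict;
use warnings;

# Hypothetical s-expression reader (not the TAIR/BioCyc API): turns Lisp-style
# text into nested array refs. Handles only parentheses, bare atoms and
# double-quoted strings; Lisp quoting, dotted pairs and comments are ignored.
sub parse_sexp {
    my ($text) = @_;
    # Tokens: "(", ")", a double-quoted string (with \-escapes), or a bare atom
    my @tokens = $text =~ /(\(|\)|"(?:\\.|[^"\\])*"|[^\s()"]+)/gs;
    my @stack  = ([]);    # $stack[0] collects the top-level forms
    for my $tok (@tokens) {
        if ($tok eq '(') {
            push @stack, [];                   # start a new nested list
        }
        elsif ($tok eq ')') {
            die "unbalanced ')'\n" if @stack < 2;
            my $list = pop @stack;
            push @{ $stack[-1] }, $list;       # attach finished list to its parent
        }
        else {
            my $atom = $tok;
            if ($atom =~ /^"(.*)"$/s) {        # strip quotes, undo backslash escapes
                ($atom = $1) =~ s/\\(.)/$1/gs;
            }
            push @{ $stack[-1] }, $atom;
        }
    }
    die "unbalanced '('\n" if @stack != 1;
    return @{ $stack[0] };
}

# Made-up record, for illustration only
my ($gene) = parse_sexp('(EG99999 (COMMON-NAME "abc gene") (PRODUCT P1 P2))');
print $gene->[0], "\n";       # EG99999
print $gene->[1][1], "\n";    # abc gene
```

Object classes for genes and products could then be built by walking these nested lists, though the real Pathway Tools files may need more of the Lisp reader than this sketch supports.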

I need to look at the appropriate ways to link this up to the  
existing codebase for interconverting to Chado and other BioPerl data  
types.

Jim
=====================================
Jim Hu
Associate Professor
Dept. of Biochemistry and Biophysics
2128 TAMU
Texas A&M Univ.
College Station, TX 77843-2128
979-862-4054


On Feb 7, 2007, at 12:07 AM, Chris Mungall wrote:

>
> Hi Jim
>
> Did you ever get an answer to this? I'm interested in storing
> pathway data in Chado, and I remember enough Lisp to get it into
> something Perl-manageable like XML.
>
> On Thu, 25 Jan 2007, Jim Hu wrote:
>
>> Is there a module to parse the lisp object files from Peter Karp's
>> Pathway Tools?   I need a parser to convert the gene and protein
>> objects in EcoCyc releases into something that can be imported into
>> Chado.
>


From lstein at cshl.edu  Fri Feb 16 08:35:19 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Fri, 16 Feb 2007 08:35:19 -0500
Subject: [Bioperl-l] Bio::DB::SeqFeature::Store problem with transcripts
In-Reply-To: <45D1E2A5.6060104@sendu.me.uk>
References: <45D1E2A5.6060104@sendu.me.uk>
Message-ID: <6dce9a0b0702160535x946d6e5q4eda94ad174f1bac@mail.gmail.com>

Hi,

Older versions of Storable can't deal with features that contain subroutine
refs. You should get the current version from CPAN. Note that there is a
slight security problem here if you don't trust the objects stored in the
database. If they contain code refs, the code will be evaluated during
deserialization.

Lincoln

On 2/13/07, Sendu Bala <bix at sendu.me.uk> wrote:
>
> I have some raw sequences in a Bio::DB::SeqFeature::Store mysql database
> and wanted to associate some basic information with them, like exon
> positions. I thought of creating Bio::SeqFeature::Gene::Transcript
> objects and storing them so I could later use features() to see what
> other features overlapped exons. I ran into a fatal error that can be
> replicated with the following simplified one-liner:
>
> perl -MBio::DB::SeqFeature::Store -MBio::SeqFeature::Gene::Transcript -e
> '$db = Bio::DB::SeqFeature::Store->new(-adaptor => "DBI::mysql", -dsn =>
> "dbi:mysql:test"); $trans =
> Bio::SeqFeature::Gene::Transcript->new(-start => 1, -end => 2, -seq_id
> => "test"); $db->store($trans); @trans = $db->features(-seqid => $id,
> -type => "transcript"); print "@trans\n";'
>
> code sub {
>      package Bio::SeqFeature::Generic;
>      use strict 'refs';
>      my $self = shift @_;
>      foreach my $f (@{[] unless $$self{'_gsf_sub_array'};}) {
>          $f = undef;
>      }
>      $$self{'_gsf_seq'} = undef;
>      foreach my $t (keys %{{} unless $$self{'_gsf_tag_hash'};}) {
>          $$self{'_gsf_tag_hash'}{$t} = undef;
>          delete $$self{'_gsf_tag_hash'}{$t};
>      }
> } did not evaluate to a subroutine reference, at
> /.../Bio/DB/SeqFeature/Store.pm line 2280
>
>
> Is this a bug? Or am I taking the wrong approach?
>



From lstein at cshl.edu  Fri Feb 16 08:47:29 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Fri, 16 Feb 2007 08:47:29 -0500
Subject: [Bioperl-l] Bio::DB::SeqFeature::Store problem with transcripts
In-Reply-To: <45D5B42A.1080303@sendu.me.uk>
References: <45D1E2A5.6060104@sendu.me.uk>
	<6dce9a0b0702160535x946d6e5q4eda94ad174f1bac@mail.gmail.com>
	<45D5B42A.1080303@sendu.me.uk>
Message-ID: <6dce9a0b0702160547s5873cd2bg2c5cf09779138249@mail.gmail.com>

Hi Sendu,

I'll do a little digging and let you know.

Lincoln

On 2/16/07, Sendu Bala <bix at sendu.me.uk> wrote:
>
> Lincoln Stein wrote:
> > Hi,
> >
> > Older versions of Storable can't deal with features that contain
> > subroutine refs. You should get the current version from CPAN.
>
> Do you have any idea which version of Storable first supported this? I
> can specify that version in Bioperl's Build.PL.
>
> (else I'll just specify the latest version)
>



From lstein at cshl.edu  Fri Feb 16 08:52:30 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Fri, 16 Feb 2007 08:52:30 -0500
Subject: [Bioperl-l] Bio::DB::SeqFeature::Store problem with transcripts
In-Reply-To: <45D5B42A.1080303@sendu.me.uk>
References: <45D1E2A5.6060104@sendu.me.uk>
	<6dce9a0b0702160535x946d6e5q4eda94ad174f1bac@mail.gmail.com>
	<45D5B42A.1080303@sendu.me.uk>
Message-ID: <6dce9a0b0702160552l11417675kec7b5b4537cbfa70@mail.gmail.com>

It looks like 2.05 or higher is the Storable version to use. It requires
B::Deparse, which is (I think) standard on perl 5.6 or higher.
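The two Storable settings involved can be seen in isolation below. This is a standalone sketch, not how Bio::DB::SeqFeature::Store configures Storable (the thread doesn't say). $Storable::Deparse makes freeze() serialize code refs as B::Deparse source text, and $Storable::Eval makes thaw() eval that text back into a sub. That eval is the security concern Lincoln raised earlier in the thread, so only enable it for data you trust. The `use Storable 2.05` line enforces the minimum version suggested here.

```perl
use strict;
use warnings;
no warnings 'once';                  # the Storable:: package variables appear only once here
use Storable 2.05 qw(freeze thaw);   # 2.05 is the minimum version suggested in this thread

# Made-up feature-like record holding a code ref, similar in spirit to the
# code ref in the error quoted earlier in this thread.
my $feature = {
    start   => 1,
    end     => 2,
    cleanup => sub { return 'cleaned' },
};

# Serialize: $Storable::Deparse turns code refs into B::Deparse source text.
# Without it, freeze() refuses to store CODE refs.
my $frozen = do {
    local $Storable::Deparse = 1;
    freeze($feature);
};

# Deserialize: $Storable::Eval makes thaw() eval that source text back into a
# sub. The stored code is executed, so enable this only for trusted data.
my $copy = do {
    local $Storable::Eval = 1;
    thaw($frozen);
};

print $copy->{cleanup}->(), "\n";    # cleaned
```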

Lincoln

On 2/16/07, Sendu Bala <bix at sendu.me.uk> wrote:
>
> Lincoln Stein wrote:
> > Hi,
> >
> > Older versions of Storable can't deal with features that contain
> > subroutine refs. You should get the current version from CPAN.
>
> Do you have any idea which version of Storable first supported this? I
> can specify that version in Bioperl's Build.PL.
>
> (else I'll just specify the latest version)
>



From lstein at cshl.edu  Fri Feb 16 08:55:06 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Fri, 16 Feb 2007 08:55:06 -0500
Subject: [Bioperl-l] XEMBL deprecation
In-Reply-To: <72AD3CD9-4D42-4E65-A54E-56D01741CDCD@uiuc.edu>
References: <72AD3CD9-4D42-4E65-A54E-56D01741CDCD@uiuc.edu>
Message-ID: <6dce9a0b0702160555i7eae057dt1c3cd46b1f732bd1@mail.gmail.com>

I like the idea of converting these over to use DBFetch's SOAP services. On
the other hand, it isn't likely that I'm going to have time to do this
anytime soon.

Probably the best thing to do is to issue a warning and return undef if
someone tries to use the XEMBL module. I'll make that change.

Lincoln

On 2/15/07, Chris Fields <cjfields at uiuc.edu> wrote:
>
> I have gone ahead and marked Bio::DB::XEMBL and Bio::DB::XEMBLService
> both for deprecation in the wiki and in CVS (though I haven't set any
> timeline):
>
> http://www.bioperl.org/wiki/Deprecated_modules
>
> The XEMBL web services are no longer available, and it looks like
> everything is running through DBFetch now.  The XEMBL tests are
> skipped if no server is detected, so they shouldn't cause any
> problems with Bioperl installations.
>
> Lincoln, was there anything to salvage from these?  I noticed they
> used SOAP::Lite, so maybe we could convert these over to a SOAP-based
> interface to DBFetch web services?
>
> chris
>



From lstein at cshl.edu  Fri Feb 16 08:55:47 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Fri, 16 Feb 2007 08:55:47 -0500
Subject: [Bioperl-l] XEMBL deprecation
In-Reply-To: <6dce9a0b0702160555i7eae057dt1c3cd46b1f732bd1@mail.gmail.com>
References: <72AD3CD9-4D42-4E65-A54E-56D01741CDCD@uiuc.edu>
	<6dce9a0b0702160555i7eae057dt1c3cd46b1f732bd1@mail.gmail.com>
Message-ID: <6dce9a0b0702160555s5b4d7d70n936c603bc79dd40d@mail.gmail.com>

Oh, looks like someone has inserted the warnings already. Good.

Lincoln

On 2/16/07, Lincoln Stein <lstein at cshl.edu> wrote:
>
> I like the idea of converting these over to use DBFetch's SOAP services.
> ...


From bix at sendu.me.uk  Fri Feb 16 08:56:50 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 16 Feb 2007 13:56:50 +0000
Subject: [Bioperl-l] Bio::DB::SeqFeature::Store problem with transcripts
In-Reply-To: <6dce9a0b0702160552l11417675kec7b5b4537cbfa70@mail.gmail.com>
References: <45D1E2A5.6060104@sendu.me.uk>	
	<6dce9a0b0702160535x946d6e5q4eda94ad174f1bac@mail.gmail.com>	
	<45D5B42A.1080303@sendu.me.uk>
	<6dce9a0b0702160552l11417675kec7b5b4537cbfa70@mail.gmail.com>
Message-ID: <45D5B822.6080908@sendu.me.uk>

Lincoln Stein wrote:
> It looks like 2.05 or higher is the Storable version to use. It requires 
> B::Deparse, which is (I think) standard on perl 5.6 or higher.

Thanks, now recommended in Build.PL
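For reference, a minimal sketch of how Storable handles subroutine references inside a data structure. It relies on the `$Storable::Deparse` and `$Storable::Eval` package variables from Storable's documentation; whether Bio::DB::SeqFeature::Store sets them in exactly this way is an assumption that has not been checked here.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Storable qw(freeze thaw);

# A feature-like record that carries a callback (a CODE ref)
my $feature = { name => 'tx1', scaler => sub { return $_[0] * 2 } };

my $copy;
{
    # Without these flags Storable refuses to serialize CODE refs.
    local $Storable::Deparse = 1;   # store subs as source text via B::Deparse
    local $Storable::Eval    = 1;   # eval that source text again when thawing
    $copy = thaw(freeze($feature));
}

print $copy->{name}, ' ', $copy->{scaler}->(21), "\n";   # tx1 42
```

Because `$Storable::Eval` runs code taken from the frozen data, it should only be enabled for data you trust.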

From cjfields at uiuc.edu  Fri Feb 16 09:05:08 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 16 Feb 2007 08:05:08 -0600
Subject: [Bioperl-l] XEMBL deprecation
In-Reply-To: <6dce9a0b0702160555s5b4d7d70n936c603bc79dd40d@mail.gmail.com>
References: <72AD3CD9-4D42-4E65-A54E-56D01741CDCD@uiuc.edu>
	<6dce9a0b0702160555i7eae057dt1c3cd46b1f732bd1@mail.gmail.com>
	<6dce9a0b0702160555s5b4d7d70n936c603bc79dd40d@mail.gmail.com>
Message-ID: <ACAF9E26-CBDD-43AC-8D3E-0CADFF5B9576@uiuc.edu>

I added the warning yesterday.

We can add something to the project priority list on modifying XEMBL  
to use DBFetch instead; I like the SOAP-based interface.  I am  
thinking of a similar interface for NCBI eutils but I haven't had  
time to work on it.

chris

On Feb 16, 2007, at 7:55 AM, Lincoln Stein wrote:

> Oh, looks like someone has inserted the warnings already. Good.
> ...

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From bix at sendu.me.uk  Fri Feb 16 08:39:54 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 16 Feb 2007 13:39:54 +0000
Subject: [Bioperl-l] Bio::DB::SeqFeature::Store problem with transcripts
In-Reply-To: <6dce9a0b0702160535x946d6e5q4eda94ad174f1bac@mail.gmail.com>
References: <45D1E2A5.6060104@sendu.me.uk>
	<6dce9a0b0702160535x946d6e5q4eda94ad174f1bac@mail.gmail.com>
Message-ID: <45D5B42A.1080303@sendu.me.uk>

Lincoln Stein wrote:
> Hi,
> 
> Older versions of Storable can't deal with features that contain 
> subroutine refs. You should get the current version from CPAN.

Do you have any idea which version of Storable first supported this? I 
can specify that version in Bioperl's Build.PL.

(otherwise I'll just specify the latest version)
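A run-time check, which does not fail outright, of whether the installed Storable meets the minimum discussed in this thread (2.05, per Lincoln's reply) can use the standard `VERSION` method that every Perl package inherits from UNIVERSAL. A minimal sketch:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Storable ();

# Minimum Storable version mentioned on the list for CODE-ref support
my $min_version = '2.05';

# UNIVERSAL::VERSION dies if the installed version is lower than requested
my $ok = eval { Storable->VERSION($min_version); 1 } ? 1 : 0;

printf "Storable %s is %s\n", $Storable::VERSION,
    $ok ? "new enough" : "too old (need $min_version)";
```

In Build.PL the same minimum would normally go in the `recommends` (or `requires`) hash passed to Module::Build.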

From eu at otelo-online.de  Sat Feb 17 07:55:08 2007
From: eu at otelo-online.de (eu at otelo-online.de)
Date: Sat, 17 Feb 2007 13:55:08 +0100 (CET)
Subject: [Bioperl-l] Bioperl Module OddCodes(help)
Message-ID: <29037001.1171716908969.JavaMail.ngmail@webmail18>

Hello all,

I want to translate a sequence in FASTA format into only acidic, basic and polar classes, depending on the pH.
The OddCodes module can only translate to acidic, basic, polar and hydrophobic, and I think only at the default pH.

Can somebody help me? I don't know whether this is possible.
I need to classify each amino acid as positively charged, negatively charged or uncharged.

thx


From The_Polymorph at rocketmail.com  Sun Feb 18 14:08:34 2007
From: The_Polymorph at rocketmail.com (Caitlin)
Date: Sun, 18 Feb 2007 11:08:34 -0800 (PST)
Subject: [Bioperl-l] Missing ppm for Bioperl 1.5.2_100(?)
Message-ID: <148421.50501.qm@web50801.mail.yahoo.com>

Hi.

In an attempt to upgrade my Bioperl install from 1.5.2 RC5 to
1.5.2_100, I noticed the ppm was not found on the activestate
repositories. 

Thanks,

~Caitlin


From bix at sendu.me.uk  Sun Feb 18 15:36:03 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Sun, 18 Feb 2007 20:36:03 +0000
Subject: [Bioperl-l] Missing ppm for Bioperl 1.5.2_100(?)
In-Reply-To: <148421.50501.qm@web50801.mail.yahoo.com>
References: <148421.50501.qm@web50801.mail.yahoo.com>
Message-ID: <45D8B8B3.4000408@sendu.me.uk>

Caitlin wrote:
> Hi.
> 
> In an attempt to upgrade my Bioperl install from 1.5.2 RC5 to
> 1.5.2_100, I noticed the ppm was not found on the activestate
> repositories. 

Follow the install instructions:
http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows

It's not in the normal ActiveState repository, but on bioperl.org.


From t.nugent at cs.ucl.ac.uk  Mon Feb 19 12:29:48 2007
From: t.nugent at cs.ucl.ac.uk (Tim Nugent)
Date: Mon, 19 Feb 2007 17:29:48 +0000
Subject: [Bioperl-l] Module to draw transmembrane protein topology
Message-ID: <45D9DE8C.2010301@cs.ucl.ac.uk>

Hi everyone,

I've written a perl module to display transmembrane protein topology 
using GD. There are various options, including labels, helix/loop 
dimensions, colour schemes etc but it only requires a string or array 
containing the protein topology (e.g. transmembrane helix start/stop 
points). It produces output like this:

http://www.cs.ucl.ac.uk/staff/T.Nugent/images/blue.png
http://www.cs.ucl.ac.uk/staff/T.Nugent/images/yellow.png

using the code at the bottom.

Here is the module:
http://www.cs.ucl.ac.uk/staff/T.Nugent/source/DrawTransmembrane.pm

I've never submitted anything to Bioperl before - is this sort of thing 
likely to be of use to others? I imagine it would sit alongside some of 
the Bio::Graphics stuff.

Best wishes,

Tim

#!/usr/bin/perl

use strict;
use warnings;
use lib '/scratch0/NOT_BACKED_UP/tnugent/perl5lib'; ## path to module
use DrawTransmembrane;

my @topology = (20,45,59,70,86,109,145,168,194,220);

my %labels = ('5' => '5 - Sulphation Site',
               '21' => '1st Helix',
               '47' => '40 - Mutation',
               '60' => 'Voltage Sensor',
               '72' => '72 - Mutation 2',
               '73' => '73 - Mutation 3',
               '138' => '138 - Glycosylation Site',
               '170' => '170 - Phosphorylation Site',
               '200' => 'Last Helix');

my $im = DrawTransmembrane->draw_transmembrane(-title => 'This is a cartoon displaying transmembrane helices.',
                                                -topology => \@topology,
                                                -n_terminal => 'out',
                                                -helix_width => 48,
                                                -helix_height => 125,
                                                -short_loop_limit => 10,
                                                -long_loop_limit => 35,
                                                -loop_width => 25,
                                                -colour_scheme => 'yellow',
                                                -labels => \%labels,
                                                -text_offset => -10);

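## write the PNG image to a file (binmode matters on Windows)
my $output = 'test.png';
open(my $out_fh, '>', $output) or die "Cannot open $output: $!";
binmode $out_fh;
print {$out_fh} $im->png;
close $out_fh or die "Cannot close $output: $!";

## view it with ImageMagick's 'display' command, if installed
system('display', $output) == 0 or warn "Could not run display: $?\n";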
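The `-topology` array in the script above is a flat list of helix boundaries (20-45, 59-70, ...). As an illustration only, here is a pure-Perl sketch of how such a list can be paired into helices and how `-n_terminal => 'out'` implies alternating loop sides. It assumes the numbers are inclusive residue positions; the `describe_topology` helper is hypothetical and is not taken from DrawTransmembrane.pm, whose internals may differ.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of DrawTransmembrane.pm): turn a flat
# start/stop list into helices and infer which side each loop lies on.
sub describe_topology {
    my ($n_terminal, @bounds) = @_;    # $n_terminal is 'in' or 'out'
    die "topology needs an even number of start/stop points\n" if @bounds % 2;

    my @helices;
    while (my ($start, $stop) = splice(@bounds, 0, 2)) {
        push @helices, [ $start, $stop ];
    }

    my $side = $n_terminal;
    my @loops;
    for my $i (0 .. $#helices - 1) {
        # every helix crosses the membrane, so the following loop flips side
        $side = $side eq 'out' ? 'in' : 'out';
        # residues strictly between the end of one helix and the start of the next
        my $length = $helices[$i + 1][0] - $helices[$i][1] - 1;
        push @loops, { side => $side, length => $length };
    }
    return (\@helices, \@loops);
}

my ($helices, $loops) =
    describe_topology('out', 20, 45, 59, 70, 86, 109, 145, 168, 194, 220);

printf "%d helices\n", scalar @$helices;
printf "loop %d: %-3s %d residues\n", $_ + 1, $loops->[$_]{side}, $loops->[$_]{length}
    for 0 .. $#$loops;
```

With the example topology this gives five helices and four inter-helix loops of 13, 15, 35 and 25 residues, on alternating sides starting from the inside.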

-- 
Tim Nugent (MRes)
Research Student
Bioinformatics Unit
Department of Computer Science
University College London
Gower Street
London WC1E 6BT
Tel: 020-7679-0410
t.nugent at ucl.ac.uk

From bix at sendu.me.uk  Mon Feb 19 12:42:23 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 19 Feb 2007 17:42:23 +0000
Subject: [Bioperl-l] t/FeatureHolder.x
Message-ID: <45D9E17F.4030302@sendu.me.uk>

Is this supposed to work? It doesn't get run in the test suite normally 
because of its name.

With a live checkout I get:
./Build test --test_files t/FeatureHolder.x --verbose
t/FeatureHolder....1..6
ok 1
ok 2
Set group tag to: locus_tag
GROUPS:
   GROUP [?]:source

[snip]

   resolved pair Bio::SeqFeature::Generic=HASH(0x1375dc0) Bio::SeqFeature::Generic=HASH(0x1362830)
UNFLATTENING GROUP:
   GROUP [?]:gene
UNFLATTENING GROUP:
   GROUP [?]:repeat_region
UNFLATTENING GROUP:
   GROUP [?]:gene
UNFLATTENING GROUP:
   GROUP [?]:repeat_region
UNFLATTENING GROUP:
   GROUP [BG:DS07721.3]:gene mRNA CDS
UNFLATTENING GROUP:
   GROUP [BG:DS07721.6]:gene mRNA CDS

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: DUPLICATE ID: AAF53399.1
STACK: Error::throw
STACK: Bio::Root::Root::throw /home/sendu/src/bioperl/core/blib/lib/Bio/Root/Root.pm:359
STACK: Bio::SeqFeature::Tools::IDHandler::create_hierarchy_from_ParentIDs /home/sendu/src/bioperl/core/blib/lib/Bio/SeqFeature/Tools/IDHandler.pm:175
STACK: Bio::FeatureHolderI::create_hierarchy_from_ParentIDs /home/sendu/src/bioperl/core/blib/lib/Bio/FeatureHolderI.pm:245
STACK: t/FeatureHolder.x:68
-----------------------------------------------------------
dubious
         Test returned status 255 (wstat 65280, 0xff00)
DIED. FAILED tests 3-6
         Failed 4/6 tests, 33.33% okay
Failed Test       Stat Wstat Total Fail  List of Failed
-------------------------------------------------------------------------------
t/FeatureHolder.x  255 65280     6    8  3-6
Failed 1/1 test scripts. 4/6 subtests failed.
Files=1, Tests=6,  1 wallclock secs ( 0.55 cusr +  0.04 csys =  0.59 CPU)
Failed 1/1 test programs. 4/6 subtests failed.


It also fails quite differently with 1.5.2.

From cjfields at uiuc.edu  Mon Feb 19 15:04:20 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 19 Feb 2007 14:04:20 -0600
Subject: [Bioperl-l] t/FeatureHolder.x
In-Reply-To: <45D9E17F.4030302@sendu.me.uk>
References: <45D9E17F.4030302@sendu.me.uk>
Message-ID: <534215C5-11BB-48FA-B6BF-D8B3E2BFE20A@uiuc.edu>

Looks like that's some of Chris Mungall's stuff for GFF3.  Don't know  
if he's stalking the mail list.

Wonder if this has anything to do with the feature/annotation changes  
around rel 1.5.

(the other) chris

On Feb 19, 2007, at 11:42 AM, Sendu Bala wrote:

> Is this supposed to work? It doesn't get run in the test suite  
> normally
> because of its name.
>
> With a live checkout I get:
> ./Build test --test_files t/FeatureHolder.x --verbose
> t/FeatureHolder....1..6
...

From cjfields at uiuc.edu  Mon Feb 19 16:24:04 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 19 Feb 2007 15:24:04 -0600
Subject: [Bioperl-l] Module to draw transmembrane protein topology
In-Reply-To: <45D9DE8C.2010301@cs.ucl.ac.uk>
References: <45D9DE8C.2010301@cs.ucl.ac.uk>
Message-ID: <29861210-1C15-4CFD-BA6F-1FF8EAAF8A62@uiuc.edu>

I think this is pretty nice!  We can add the code and test script to  
bugzilla and (if someone has time) try to see where it might fit in,  
though Bio::Graphics sounds like a good spot.

Anyone else have ideas on where this could go?

chris

On Feb 19, 2007, at 11:29 AM, Tim Nugent wrote:

> Hi everyone,
>
> I've written a perl module to display transmembrane protein topology
> ...
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From cjm at fruitfly.org  Mon Feb 19 17:23:56 2007
From: cjm at fruitfly.org (Chris Mungall)
Date: Mon, 19 Feb 2007 14:23:56 -0800
Subject: [Bioperl-l] t/FeatureHolder.x
In-Reply-To: <534215C5-11BB-48FA-B6BF-D8B3E2BFE20A@uiuc.edu>
References: <45D9E17F.4030302@sendu.me.uk>
	<534215C5-11BB-48FA-B6BF-D8B3E2BFE20A@uiuc.edu>
Message-ID: <F54A07D7-B546-46E5-AFB2-54251ACE9182@fruitfly.org>


On Feb 19, 2007, at 12:04 PM, Chris Fields wrote:

> Looks like that's some of Chris Mungall's stuff for GFF3.  Don't know
> if he's stalking the mail list.

occasionally..

> Wonder if this has anything to do the feature/annotation changes
> around rel 1.5.

possibly even before then.

there was a reason for the .x extension... I think it was intended to  
denote requirements: tests that don't pass yet but should in the future

anyway, this file can go

> (the other) chris
>
> On Feb 19, 2007, at 11:42 AM, Sendu Bala wrote:
>
>> Is this supposed to work? It doesn't get run in the test suite
>> normally
>> because of its name.
>>
>> With a live checkout I get:
>> ./Build test --test_files t/FeatureHolder.x --verbose
>> t/FeatureHolder....1..6
> ...


From torsten.seemann at infotech.monash.edu.au  Mon Feb 19 18:20:48 2007
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Tue, 20 Feb 2007 10:20:48 +1100
Subject: [Bioperl-l] Bioperl Module OddCodes(help)
In-Reply-To: <29037001.1171716908969.JavaMail.ngmail@webmail18>
References: <29037001.1171716908969.JavaMail.ngmail@webmail18>
Message-ID: <a79f6a4b0702191520l55625d6dif027df04b9841587@mail.gmail.com>

> i want translate a Sequence in Fasta Format  only to acidic,basic and polar dependent on the pH.
> OddCodes Module can ony to acidic,basic, polar and hydrophobic. And i think on default pH.
> Can somebody help me? I dont know  whether it is  possible?
> Because i need for each amino acid a positive, negative charge and unchargedly.

The latest released Bioperl 1.5.x has a charge() function which does
what you want:

http://doc.bioperl.org/releases/bioperl-1.5.2/Bio/Tools/OddCodes.html

It returns A, N, C for the charges.
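If pH dependence matters, a rough alternative is to classify each side chain by comparing the pH with an approximate side-chain pKa. The sketch below is not how OddCodes works: the pKa values are textbook approximations that vary by source and by protein environment, termini are ignored, and the letter meanings (A = negative, C = positive, N = neutral) are an assumption chosen to match the three letters mentioned above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Approximate side-chain pKa values (assumed; published values differ slightly).
# Keys are amino-acid codes, so 'C' here means cysteine, not the output letter.
my %acidic_pka = (D => 3.9, E => 4.1, C => 8.3, Y => 10.1);
my %basic_pka  = (H => 6.0, K => 10.5, R => 12.5);

# Return one letter per residue: A = negative, C = positive, N = neutral
# (letter meanings are an assumption for this sketch).
sub charge_code {
    my ($seq, $ph) = @_;
    my $out = '';
    for my $aa (split //, uc $seq) {
        if (exists $acidic_pka{$aa} && $ph > $acidic_pka{$aa}) {
            $out .= 'A';    # acidic group mostly deprotonated: negative
        }
        elsif (exists $basic_pka{$aa} && $ph < $basic_pka{$aa}) {
            $out .= 'C';    # basic group mostly protonated: positive
        }
        else {
            $out .= 'N';    # treated as uncharged at this pH
        }
    }
    return $out;
}

print charge_code('DEKRHCY', 7.0), "\n";   # AACCNNN
print charge_code('DEKRHCY', 5.0), "\n";   # AACCCNN
```

A simple threshold at the pKa is a crude cut-off (at pH = pKa a group is only half charged), but it is enough to show how the classification shifts with pH, e.g. histidine becoming positive below about pH 6.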

--Torsten

From bix at sendu.me.uk  Tue Feb 20 06:18:14 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 20 Feb 2007 11:18:14 +0000
Subject: [Bioperl-l] Bio::Graphics::FeatureBase seq_id question
Message-ID: <45DAD8F6.1030409@sendu.me.uk>

Bio::Graphics::FeatureBase::seq_id is currently implemented as a 
read-only alias to ref():
sub seq_id          { shift->ref()         }


What is the reasoning behind this? Can it be made to handle setting of 
the value as well?:
sub seq_id          { shift->ref(@_)       }
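The difference between the two one-liners is whether the caller's arguments are forwarded. A self-contained sketch, using a hypothetical `Toy::Feature` class whose `ref()` accessor is assumed to work as a getter/setter (whether FeatureBase's own `ref()` accepts a new value in exactly this way has not been checked here):

```perl
#!/usr/bin/perl
use strict;
use warnings;

package Toy::Feature;    # hypothetical stand-in for Bio::Graphics::FeatureBase

sub new {
    my ($class, %args) = @_;
    return bless { ref => $args{-seq_id} }, $class;
}

# getter/setter: store a new value if one is given, always return the current one
sub ref {
    my $self = shift;
    $self->{ref} = shift if @_;
    return $self->{ref};
}

# current form: a read-only alias, any argument is silently dropped
sub seq_id_readonly { shift->ref() }

# proposed form: forward the caller's arguments so the alias can also set
sub seq_id { shift->ref(@_) }

package main;

my $feature = Toy::Feature->new(-seq_id => 'chr1');
$feature->seq_id_readonly('chr2');    # ignored
print $feature->seq_id, "\n";         # chr1
$feature->seq_id('chr2');             # stored via ref()
print $feature->seq_id, "\n";         # chr2
```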


Cheers,
Sendu.

From cjfields at uiuc.edu  Tue Feb 20 08:39:11 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 20 Feb 2007 07:39:11 -0600
Subject: [Bioperl-l] t/FeatureHolder.x
In-Reply-To: <F54A07D7-B546-46E5-AFB2-54251ACE9182@fruitfly.org>
References: <45D9E17F.4030302@sendu.me.uk>
	<534215C5-11BB-48FA-B6BF-D8B3E2BFE20A@uiuc.edu>
	<F54A07D7-B546-46E5-AFB2-54251ACE9182@fruitfly.org>
Message-ID: <67E26F10-67D5-405E-A00E-826EF51C476F@uiuc.edu>


On Feb 19, 2007, at 4:23 PM, Chris Mungall wrote:

> On Feb 19, 2007, at 12:04 PM, Chris Fields wrote:
>
>> Looks like that's some of Chris Mungall's stuff for GFF3.  Don't know
>> if he's stalking the mail list.
>
> occasionally..
>
>> Wonder if this has anything to do the feature/annotation changes
>> around rel 1.5.
>
> possibly even before then.
>
> there was a reason for the .x prefix... I think it was intended to
> denote requirements; tests that don't pass yet but should in the  
> future
>
> anyway, this file can go

Chris,

I removed it from CVS.  Thanks!

(the other) chris besides chris D.

P.S. I may have some Data::Stag questions for you at some point.  I'm  
guessing you're still at fruitfly.org?

From cjfields at uiuc.edu  Tue Feb 20 08:29:20 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 20 Feb 2007 07:29:20 -0600
Subject: [Bioperl-l] Fwd: help on remote blast
References: <20070220073200.M42567@bic.boseinst.ernet.in>
Message-ID: <6CC54E14-0581-45AF-8F12-E500A2FFDE86@uiuc.edu>

Sanjib,

You shouldn't email the developers directly.  Questions like this  
should go to the bioperl mail list in case I (or others) can't answer  
them immediately.

chris

Begin forwarded message:

> From: "Sanjib Kumar Gupta" <sanjib at bic.boseinst.ernet.in>
> Date: February 20, 2007 1:32:00 AM CST
> To: cjfields at uiuc.edu
> Subject: help on remote blast
>
> Dear Dr. Chris,
> I am a very new user of BioPerl and have been using the script for
> retrieving some BLAST sequences, but it has suddenly stopped
> retrieving:
> #perl n9.pl
> te.pep
> waiting........
> (it stays like this for a long time)
>
> I am attaching the file. Can you please tell me what I should do so
> that it runs again?
>
>
> --
> Sanjib Kumar Gupta
> Bioinformatics Centre
> Bose Institute
> Kolkata 700054, INDIA
> Phone  : +91-33-2355 6626, 2816, 2355 4766
> Fax    : +91-33-2355 3886
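The attachment was scrubbed, so it is not clear what n9.pl does. If it follows the usual Bio::Tools::Run::RemoteBlast pattern of submitting a job and then polling `retrieve_blast($rid)` in a `sleep` loop, one defensive change is to cap the total waiting time so the script gives up instead of printing dots forever. A generic pure-Perl sketch; the `poll_with_timeout` helper and its defaults are hypothetical:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(sleep time);

# Call $check until it returns a defined value or $max_wait seconds pass.
# Returns that value, or undef on timeout.
sub poll_with_timeout {
    my ($check, %opt) = @_;
    my $max_wait = defined $opt{max_wait} ? $opt{max_wait} : 600;
    my $interval = defined $opt{interval} ? $opt{interval} : 5;
    my $deadline = time() + $max_wait;

    while (time() < $deadline) {
        my $result = $check->();
        return $result if defined $result;
        print STDERR '.';       # progress indicator, like "waiting...."
        sleep $interval;
    }
    return;    # timed out
}

# Simulated job that becomes ready on the third poll
my $polls  = 0;
my $result = poll_with_timeout(sub { ++$polls >= 3 ? 'report' : undef },
                               max_wait => 5, interval => 0.01);
print "\n", defined $result ? "got $result after $polls polls\n" : "gave up\n";
```

In a RemoteBlast script the `$check` closure would wrap the call to `retrieve_blast($rid)` and return undef while the job is still pending; check the RemoteBlast documentation for the exact return-value conventions of the installed version.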
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: n9.pl
Url: http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070220/02f96eab/attachment.pl 


From t.nugent at cs.ucl.ac.uk  Tue Feb 20 09:31:20 2007
From: t.nugent at cs.ucl.ac.uk (Tim Nugent)
Date: Tue, 20 Feb 2007 14:31:20 +0000
Subject: [Bioperl-l] Module to draw transmembrane protein topology
In-Reply-To: <29861210-1C15-4CFD-BA6F-1FF8EAAF8A62@uiuc.edu>
References: <45D9DE8C.2010301@cs.ucl.ac.uk>
	<29861210-1C15-4CFD-BA6F-1FF8EAAF8A62@uiuc.edu>
Message-ID: <45DB0638.1030001@cs.ucl.ac.uk>

Thanks Chris, glad it's appreciated.

Is there anything else I can do? If anyone has any requests/suggestions 
please let me know too.

Best wishes,

Tim

Chris Fields wrote:
> I think this is pretty nice!  We can add the code and test script to  
> bugzilla and (if someone has time) try to see where it might fit in,  
> though Bio::Graphics sounds like a good spot.
> 
> Anyone else have ideas on where this could go?
> 
> chris
> 
> On Feb 19, 2007, at 11:29 AM, Tim Nugent wrote:
> 
>> Hi everyone,
>> ...

From marian.thieme at lycos.de  Tue Feb 20 08:34:24 2007
From: marian.thieme at lycos.de (marian thieme)
Date: Tue, 20 Feb 2007 13:34:24 +0000
Subject: [Bioperl-l] Alignment
Message-ID: <188661178021328@lycos-europe.com>

Hi all,

Perhaps somebody can comment on the following:

I have a series of sequences which should be aligned against a reference sequence.
In this particular case we don't need to calculate anything; we only need to represent the sequences and, for instance, extract some columns of interest.
The problem is that some sequences have gaps, and we need to represent gaps in the reference sequence as well as in some individual sequences.

Question: Can I use LocatableSeq to describe sequences with gaps and add them to the alignment?
If so, how should I understand the example in the documentation:
use Bio::LocatableSeq;
my $seq = Bio::LocatableSeq->new(-seq => "CAGT-GGT", -id => "seq1", -start => 1, -end => 7);

Does the "-" sign represent a gap? If this sequence starts at position 1,
why does it end at position 7, when there are 8 positions counting the gap?
Can the SimpleAlign object handle the gap?
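In LocatableSeq, `-start` and `-end` are coordinates on the ungapped sequence: `CAGT-GGT` has 8 alignment columns but only 7 residues, so a sequence starting at 1 ends at 7. LocatableSeq has methods such as `column_from_residue_number` for mapping between the two; the pure-Perl sketch below only illustrates the arithmetic, and treating '-' and '.' as the gap characters is an assumption.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Gap characters assumed for this sketch
my $GAPS = qr/[-.]/;

# Number of residues in an aligned string, ignoring gaps
sub ungapped_length {
    my ($aligned) = @_;
    return scalar grep { !/$GAPS/ } split //, $aligned;
}

# Residue coordinate at a 1-based alignment column, or undef for a gap column
sub residue_at_column {
    my ($aligned, $start, $column) = @_;
    return if substr($aligned, $column - 1, 1) =~ $GAPS;
    return $start + ungapped_length(substr($aligned, 0, $column)) - 1;
}

my $aligned = 'CAGT-GGT';
my $start   = 1;
my $end     = $start + ungapped_length($aligned) - 1;
print "end = $end\n";                                   # end = 7
for my $col (4, 5, 6) {
    my $pos = residue_at_column($aligned, $start, $col);
    printf "column %d -> %s\n", $col, defined $pos ? $pos : 'gap';
}
```

Column 5 holds the gap and has no residue coordinate; column 6 holds the fifth residue.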


Thanks for your attention,
Marian

From cjfields at uiuc.edu  Tue Feb 20 09:40:38 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 20 Feb 2007 08:40:38 -0600
Subject: [Bioperl-l] Module to draw transmembrane protein topology
In-Reply-To: <45DB0638.1030001@cs.ucl.ac.uk>
References: <45D9DE8C.2010301@cs.ucl.ac.uk>
	<29861210-1C15-4CFD-BA6F-1FF8EAAF8A62@uiuc.edu>
	<45DB0638.1030001@cs.ucl.ac.uk>
Message-ID: <E1D718F1-E0FA-496B-9798-7EC84E2D4439@uiuc.edu>

You can add the module and test code (the script) to bugzilla:

http://www.bioperl.org/wiki/Bugs
http://bugzilla.open-bio.org/

Basically file a new bug report but note that it is an enhancement  
request when filling it out.  Attach the code and test script to the  
report after it is generated (note that it may be easier to add all  
of the files together as a zipped archive).  I think you could also  
add the graphical output as a binary file if they are huge files.

chris

On Feb 20, 2007, at 8:31 AM, Tim Nugent wrote:

> Thanks Chris, glad it's appreciated.
>
> Is there anything else I can do? If anyone has any requests/ 
> suggestions please let me know too.
>
> Best wishes,
>
> Tim
>
> Chris Fields wrote:
>> I think this is pretty nice!  We can add the code and test script  
>> to  bugzilla and (if someone has time) try to see where it might  
>> fit in,  though Bio::Graphics sounds like a good spot.
>> Anyone else have ideas on where this could go?
>> chris
>> On Feb 19, 2007, at 11:29 AM, Tim Nugent wrote:
>>> Hi everyone,
>>>
>>> I've written a perl module to display transmembrane protein topology
>>> using GD. There are various options, including labels, helix/loop
>>> dimensions, colour schemes etc but it only requires a string or  
>>> array
>>> containing the protein topology (e.g. transmembrane helix start/stop
>>> points). It produces output like this:
>>>
>>> http://www.cs.ucl.ac.uk/staff/T.Nugent/images/blue.png
>>> http://www.cs.ucl.ac.uk/staff/T.Nugent/images/yellow.png
>>>
>>> using the code at the bottom.
>>>
>>> Here is a the module:
>>> http://www.cs.ucl.ac.uk/staff/T.Nugent/source/DrawTransmembrane.pm
>>>
>>> I've never submitted anything to Bioperl before - is this sort  
>>> of  thing
>>> likely to be of use to others? I imagine it would sit alongside   
>>> some of
>>> the Bio::Graphics stuff.
>>>
>>> Best wishes,
>>>
>>> Tim
>>>
>>> #!/usr/bin/perl
>>>
>>> use strict;
>>> use warnings;
>>> use lib '/scratch0/NOT_BACKED_UP/tnugent/perl5lib'; ## path to  
>>> module
>>> use DrawTransmembrane;
>>>
>>> my @topology = (20,45,59,70,86,109,145,168,194,220);
>>>
>>> my %labels = ('5' => '5 - Sulphation Site',
>>>                '21' => '1st Helix',
>>>                '47' => '40 - Mutation',
>>>                '60' => 'Voltage Sensor',
>>>                '72' => '72 - Mutation 2',
>>>                '73' => '73 - Mutation 3',
>>>                '138' => '138 - Glycosylation Site',
>>>                '170' => '170 - Phosphorylation Site',
>>>                '200' => 'Last Helix');
>>>
>>> my $im = DrawTransmembrane->draw_transmembrane(
>>>     -title            => 'This is a cartoon displaying transmembrane helices.',
>>>     -topology         => \@topology,
>>>     -n_terminal       => 'out',
>>>     -helix_width      => 48,
>>>     -helix_height     => 125,
>>>     -short_loop_limit => 10,
>>>     -long_loop_limit  => 35,
>>>     -loop_width       => 25,
>>>     -colour_scheme    => 'yellow',
>>>     -labels           => \%labels,
>>>     -text_offset      => -10);
>>>
>>> ## print the .png file
>>> my $output = 'test.png';
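>>> open(my $out_fh, '>', $output) or die "Cannot open $output: $!";
>>> binmode $out_fh;
>>> print $out_fh $im->png;
>>> close $out_fh or die "Cannot close $output: $!";
>>>
>>> ## view the image with ImageMagick's display
>>> system('display', $output) == 0 or warn "display failed: $?";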
>>>
>>> -- 
>>> Tim Nugent (MRes)
>>> Research Student
>>> Bioinformatics Unit
>>> Department of Computer Science
>>> University College London
>>> Gower Street
>>> London WC1E 6BT
>>> Tel: 020-7679-0410
>>> t.nugent at ucl.ac.uk
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> Christopher Fields
>> Postdoctoral Researcher
>> Lab of Dr. Robert Switzer
>> Dept of Biochemistry
>> University of Illinois Urbana-Champaign
>
> -- 
> Tim Nugent (MRes)
> Research Student
> Bioinformatics Unit
> Department of Computer Science
> University College London
> Gower Street
> London WC1E 6BT
> Tel: 020-7679-0410
> t.nugent at ucl.ac.uk

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From avilella at gmail.com  Tue Feb 20 10:30:17 2007
From: avilella at gmail.com (Albert Vilella)
Date: Tue, 20 Feb 2007 15:30:17 +0000
Subject: [Bioperl-l] Alignment
In-Reply-To: <188661178021328@lycos-europe.com>
References: <188661178021328@lycos-europe.com>
Message-ID: <358f4d650702200730x5fd09bb8p7dec8ce740553378@mail.gmail.com>

I think the SimpleAlign object contains a set of sequences, each of
which is a LocatableSeq object.

These LocatableSeq objects will have gaps, represented by '-' or
whatever other symbol is specified (I think there are methods for it),
and then one can use methods like column_from_residue_number to map
the coordinates between the primary sequence and the aligned sequence.
The perldoc for LocatableSeq has some examples on how to use these
methods.
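To make the coordinate mapping concrete, here is a minimal plain-Perl sketch (not BioPerl's own implementation) of what a method like column_from_residue_number computes for the gapped sequence from the LocatableSeq docs, "CAGT-GGT" with -start 1 and -end 7. Gaps occupy alignment columns but are not residues, which is why the string has 8 columns yet ends at residue 7. The helper names are hypothetical, and '-' and '.' are assumed to be the gap symbols.

```perl
use strict;
use warnings;

# Hypothetical helper: map a residue number (counted from $start, gaps
# excluded) to a 1-based column in the gapped string.
sub residue_to_column {
    my ($gapped, $start, $resnum) = @_;
    my $residue = $start - 1;
    for my $col (1 .. length $gapped) {
        my $char = substr($gapped, $col - 1, 1);
        next if $char eq '-' || $char eq '.';    # assumed gap symbols
        $residue++;
        return $col if $residue == $resnum;
    }
    return;    # residue number lies outside this sequence
}

# Hypothetical helper: the end coordinate implied by $start and the number
# of non-gap characters.
sub end_coordinate {
    my ($gapped, $start) = @_;
    (my $ungapped = $gapped) =~ tr/-.//d;    # delete gap characters
    return $start + length($ungapped) - 1;
}

my $seq = 'CAGT-GGT';
printf "columns: %d, end: %d\n", length $seq, end_coordinate($seq, 1);
printf "residue 5 is in column %d\n", residue_to_column($seq, 1, 5);
```

Running it prints "columns: 8, end: 7" and "residue 5 is in column 6": the residue after the gap is shifted one column to the right.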

[Hopefully I haven't written any lies in this message],

Cheers,

    Albert.

On 2/20/07, marian thieme <marian.thieme at lycos.de> wrote:
> Hi all,
>
> Perhaps somebody can comment on the following matter:
>
> I have a series of sequences which should be aligned against a reference sequence.
> In this special case we don't need to calculate anything; we only need to represent the sequences and get, for instance, some columns of interest.
> The problem now is that some sequences have gaps, and we need to represent gaps in the reference sequence as well as in some individual sequences.
>
> Question: Can I use LocatableSeq to describe sequences with gaps and add them to the alignment?
> If so, how should I understand the example in the doc:
> use Bio::LocatableSeq;
> my $seq = new Bio::LocatableSeq(-seq => "CAGT-GGT",-id  => "seq1", -start => 1,-end   => 7);
>
> Does the "-" sign represent a gap? If this sequence starts at position 1,
> why does it end at position 7, when there are 8 positions counting the gap?
> Can the SimpleAlign object handle the gap?
>
>
> Thanks for your attention,
> Marian
>
>

From cjfields at uiuc.edu  Tue Feb 20 10:30:15 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 20 Feb 2007 09:30:15 -0600
Subject: [Bioperl-l] Module to draw transmembrane protein topology
In-Reply-To: <45DB0638.1030001@cs.ucl.ac.uk>
References: <45D9DE8C.2010301@cs.ucl.ac.uk>
	<29861210-1C15-4CFD-BA6F-1FF8EAAF8A62@uiuc.edu>
	<45DB0638.1030001@cs.ucl.ac.uk>
Message-ID: <4BC43404-FFB2-4350-81DF-812E31AD7CB3@uiuc.edu>

Sorry, I sent that last one off prematurely.

I could see this being used as a very useful utility if a Bioperl  
object had SeqFeatures which described transmembrane regions, or if  
output from something like TMHMM were parsed and used for input.   
Don't know if it's included, but if not you probably should allow  
labeling of the intracellular/extracellular space to designate  
periplasmic space, mitochondrial matrix, thylakoid, etc.

I think Bio::Graphics namespace is definitely the place to go.  If I  
ever get around to writing up the RNA structural stuff I may put  
something there myself.

chris

On Feb 20, 2007, at 8:31 AM, Tim Nugent wrote:

> Thanks Chris, glad it's appreciated.
>
> Is there anything else I can do? If anyone has any requests/suggestions
> please let me know too.
> please let me know too.
>
> Best wishes,
>
> Tim


From cjfields at uiuc.edu  Tue Feb 20 10:49:56 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 20 Feb 2007 09:49:56 -0600
Subject: [Bioperl-l] Alignment
In-Reply-To: <358f4d650702200730x5fd09bb8p7dec8ce740553378@mail.gmail.com>
References: <188661178021328@lycos-europe.com>
	<358f4d650702200730x5fd09bb8p7dec8ce740553378@mail.gmail.com>
Message-ID: <97E36074-1CF4-4348-85AB-DF23F1048727@uiuc.edu>


On Feb 20, 2007, at 9:30 AM, Albert Vilella wrote:

> I think the SimpleAlign object contains a set of sequences, each of
> which is a LocatableSeq object.
>
> These LocatableSeq objects will have gaps, represented by '-' or
> whatever other symbol is specified (I think there are methods for it),
> and then one can use methods like column_from_residue_number to map
> the coordinates between the primary sequence and the aligned sequence.
> The perldoc for LocatableSeq has some examples on how to use these
> methods.
>
> [Hopefully I haven't written any lies in this message],
>
> Cheers,
>
>     Albert.

No lies.  The comparison methods are in SimpleAlign; if you look in
SimpleAlign.t you'll see several demos of how to go about adding
LocatableSeqs to a SimpleAlign object and then using SimpleAlign
methods on them.

chris

PS (to marian): I'm a bit behind this week, so the bracket_strings  
stuff is lagging behind; I'm writing up some stuff on a deadline.

From t.nugent at cs.ucl.ac.uk  Tue Feb 20 10:50:10 2007
From: t.nugent at cs.ucl.ac.uk (Tim Nugent)
Date: Tue, 20 Feb 2007 15:50:10 +0000
Subject: [Bioperl-l] Module to draw transmembrane protein topology
In-Reply-To: <4BC43404-FFB2-4350-81DF-812E31AD7CB3@uiuc.edu>
References: <45D9DE8C.2010301@cs.ucl.ac.uk>
	<29861210-1C15-4CFD-BA6F-1FF8EAAF8A62@uiuc.edu>
	<45DB0638.1030001@cs.ucl.ac.uk>
	<4BC43404-FFB2-4350-81DF-812E31AD7CB3@uiuc.edu>
Message-ID: <45DB18B2.8070004@cs.ucl.ac.uk>

Labeling of inside/outside and membrane is already possible via the
-inside_label, -outside_label and -membrane_label tags; the defaults are
intracellular, extracellular and plasma membrane.

I was definitely going to add an input parser for MEMSAT, developed here
at UCL, and probably for a few other popular TM predictors too, e.g.
PHOBIUS, TMHMM etc. It can already accept topology in the string format
used by OPM (http://opm.phar.umich.edu/).

Tim


Chris Fields wrote:
> Sorry, I sent that last one off prematurely.
> 
> I could see this being used as a very useful utility if a Bioperl object 
> had SeqFeatures which described transmembrane regions, or if output from 
> something like TMHMM were parsed and used for input.  Don't know if it's 
> included, but if not you probably should allow labeling of the 
> intracellular/extracellular space to designate periplasmic space, 
> mitochondrial matrix, thylakoid, etc.
> 
> I think Bio::Graphics namespace is definitely the place to go.  If I 
> ever get around to writing up the RNA structural stuff I may put 
> something there myself.
> 
> chris
> 
> On Feb 20, 2007, at 8:31 AM, Tim Nugent wrote:
> 
>> Thanks Chris, glad it's appreciated.
>>
>> Is there anything else I can do? If anyone has any requests/suggestions
>> please let me know too.
>>
>> Best wishes,
>>
>> Tim
> 
> 

-- 
Tim Nugent (MRes)
Research Student
Bioinformatics Unit
Department of Computer Science
University College London
Gower Street
London WC1E 6BT
Tel: 020-7679-0410
t.nugent at ucl.ac.uk

From cjfields at uiuc.edu  Tue Feb 20 11:09:00 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 20 Feb 2007 10:09:00 -0600
Subject: [Bioperl-l] Module to draw transmembrane protein topology
In-Reply-To: <45DB18B2.8070004@cs.ucl.ac.uk>
References: <45D9DE8C.2010301@cs.ucl.ac.uk>
	<29861210-1C15-4CFD-BA6F-1FF8EAAF8A62@uiuc.edu>
	<45DB0638.1030001@cs.ucl.ac.uk>
	<4BC43404-FFB2-4350-81DF-812E31AD7CB3@uiuc.edu>
	<45DB18B2.8070004@cs.ucl.ac.uk>
Message-ID: <FF7B4076-FA5A-4F44-ADE7-A44D2FCF4599@uiuc.edu>


On Feb 20, 2007, at 9:50 AM, Tim Nugent wrote:

> Labeling of inside/outside and membrane is already possible via
> -inside_label, -outside_label and -membrane_label tags, defaults are
> intracellular, extracellular and plasma membrane.
>
> Was definitely going to add an input/parser for MEMSAT, developed  
> here at UCL, and probably a few other popular TM predictors too,  
> e.g. PHOBIUS, TMHMM etc. Can already accept topology in the string  
> format used by OPM (http://opm.phar.umich.edu/).
>
> Tim

I'll definitely have to take a closer look at it when I have time.
My guess is the best fit for the data would be SeqFeatures, either in a
collection or attached to a Bio::Seq.  As for the parsers, you can look at the
Bio::Tools::Tmhmm module, which scans TMHMM output and converts
everything to SeqFeatures.
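As a rough sketch of the mapping described above, the flat start/stop list from Tim's example script can be paired into one record per helix. This assumes the list alternates helix start and end positions. The records are plain hashes whose keys deliberately mirror the -start, -end and -primary_tag arguments of Bio::SeqFeature::Generic->new; the subroutine name and the 'transmembrane' tag are illustrative choices, not part of DrawTransmembrane or BioPerl.

```perl
use strict;
use warnings;

# Hypothetical helper: pair a flat list of helix boundaries
# (start1, end1, start2, end2, ...) into one hashref per helix.
# Dies if the list is odd-length, a helix is inverted, or helices overlap.
sub topology_to_features {
    my @points = @_;
    die "Topology list must contain an even number of points\n" if @points % 2;

    my @features;
    my $prev_end = 0;
    while (my ($start, $end) = splice(@points, 0, 2)) {
        die "Helix start $start is not before its end $end\n" unless $start < $end;
        die "Helix starting at $start overlaps the previous helix\n" unless $start > $prev_end;
        push @features, {
            -start       => $start,
            -end         => $end,
            -primary_tag => 'transmembrane',    # illustrative tag name
        };
        $prev_end = $end;
    }
    return @features;
}

my @topology = (20,45,59,70,86,109,145,168,194,220);
my @helices  = topology_to_features(@topology);
printf "%d helices; the first spans %d-%d\n",
    scalar @helices, $helices[0]{-start}, $helices[0]{-end};

# With BioPerl installed, each record could become a real feature:
#   use Bio::SeqFeature::Generic;
#   my @feats = map { Bio::SeqFeature::Generic->new(%$_) } @helices;
```

For Tim's topology this prints "5 helices; the first spans 20-45". Validating the list up front means a malformed topology string fails loudly instead of producing a misleading cartoon.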

chris

From lstein at cshl.edu  Tue Feb 20 12:25:24 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Tue, 20 Feb 2007 12:25:24 -0500
Subject: [Bioperl-l] Bio::Graphics::FeatureBase seq_id question
In-Reply-To: <45DAD8F6.1030409@sendu.me.uk>
References: <45DAD8F6.1030409@sendu.me.uk>
Message-ID: <6dce9a0b0702200925g74d2db53j3252cca8a41765b@mail.gmail.com>

Just an oversight. I'll fix it.

Lincoln

On 2/20/07, Sendu Bala <bix at sendu.me.uk> wrote:
>
> Bio::Graphics::FeatureBase::seq_id is currently implemented as a
> read-only alias to ref():
> sub seq_id          { shift->ref()         }
>
>
> What is the reasoning behind this? Can it be made to handle setting of
> the value as well?:
> sub seq_id          { shift->ref(@_)       }
>
>
> Cheers,
> Sendu.
>
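The only difference between the two one-liners quoted above is whether extra arguments are forwarded to ref(). A minimal sketch with a hypothetical class (not Bio::Graphics::FeatureBase itself), assuming ref() is a combined getter/setter:

```perl
use strict;
use warnings;

package My::Feature;    # hypothetical class for illustration

sub new {
    my ($class, %args) = @_;
    return bless { ref => $args{-ref} }, $class;
}

# Combined getter/setter: stores a new value when one is passed.
sub ref {
    my $self = shift;
    $self->{ref} = shift if @_;
    return $self->{ref};
}

# Read-only alias: any argument is silently dropped.
sub seq_id_readonly { shift->ref() }

# Forwarding alias: arguments reach ref(), so it also works as a setter.
sub seq_id { shift->ref(@_) }

package main;

my $f = My::Feature->new(-ref => 'chr1');
$f->seq_id_readonly('chr2');    # no effect on the stored value
print $f->seq_id, "\n";         # chr1
$f->seq_id('chr3');
print $f->ref, "\n";            # chr3
```

Forwarding @_ keeps the alias and the original accessor interchangeable, which is what the proposed fix achieves.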


-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu

From khan at cshl.edu  Tue Feb 20 15:42:12 2007
From: khan at cshl.edu (Khan, Sohail)
Date: Tue, 20 Feb 2007 15:42:12 -0500
Subject: [Bioperl-l] parsing a list of ids to a fasta file.
Message-ID: <C8696843AE995F4EA4CDC3E2B83482A9018791C1@mailbox02.cshl.edu>

Dear list,

I am new to BioPerl.  I have the following question:
I have a list of IDs that I would like to look up in a large FASTA file, retrieving the sequence for each ID.  I would appreciate any suggestions.
Thanks.

Khan


From michael.watson at bbsrc.ac.uk  Tue Feb 20 16:33:19 2007
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Tue, 20 Feb 2007 21:33:19 -0000
Subject: [Bioperl-l] parsing a list of ids to a fasta file.
References: <C8696843AE995F4EA4CDC3E2B83482A9018791C1@mailbox02.cshl.edu>
Message-ID: <8975119BCD0AC5419D61A9CF1A923E95020680FD@iahce2ksrv1.iah.bbsrc.ac.uk>

Suggest you use Bio::Index::Fasta to create an index for the fasta file and then a simple script to retrieve sequences using that index.  Or just use the pre-written bp_index.PLS and bp_fetch.PLS scripts.
 
http://www.bioperl.org/wiki/Module:Bio::Index::Fasta
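For a one-off extraction where building an index is not worthwhile, a plain-Perl streaming filter is an alternative to the Bio::Index::Fasta approach suggested above. This sketch assumes the ID is the first whitespace-delimited word after '>' on each header line; the subroutine names are hypothetical and the file names in the usage comment are placeholders.

```perl
use strict;
use warnings;

# Copy FASTA records whose ID is a key of %$wanted from $in_fh to $out_fh.
# The ID is assumed to be the first word after '>' (adjust the regex for
# headers such as gi|...|ref|...|).
sub filter_fasta_by_id {
    my ($in_fh, $out_fh, $wanted) = @_;
    my $keep  = 0;
    my $count = 0;
    while (my $line = <$in_fh>) {
        if ($line =~ /^>(\S+)/) {
            $keep = exists $wanted->{$1};
            $count++ if $keep;
        }
        print {$out_fh} $line if $keep;
    }
    return $count;    # number of records written
}

# Read one ID per line into a hash for constant-time lookups.
sub read_ids {
    my ($fh) = @_;
    my %ids;
    while (my $id = <$fh>) {
        $id =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
        $ids{$id} = 1 if length $id;
    }
    return \%ids;
}

# Example usage (file names are placeholders):
#   open my $ids, '<', 'ids.txt'   or die "ids.txt: $!";
#   open my $fa,  '<', 'big.fasta' or die "big.fasta: $!";
#   filter_fasta_by_id($fa, \*STDOUT, read_ids($ids));
```

This reads the FASTA file once per run; if the same file will be queried repeatedly, the index-based approach avoids rescanning it each time.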

________________________________

From: bioperl-l-bounces at lists.open-bio.org on behalf of Khan, Sohail
Sent: Tue 20/02/2007 8:42 PM
To: Bioperl-l at lists.open-bio.org
Subject: [Bioperl-l] parsing a list of ids to a fasta file.


Dear list,

I am new to Bio-Perl.  I have the following question:
I have a list of ids, which I would like to parse against a large fasta file to retrieve the Seq for parsed ids.  I appreciate any suggestions.
Thanks.

Khan




From neetisomaiya at gmail.com  Wed Feb 21 03:19:14 2007
From: neetisomaiya at gmail.com (neeti somaiya)
Date: Wed, 21 Feb 2007 13:49:14 +0530
Subject: [Bioperl-l] need help in Bio-SCF
Message-ID: <764978cf0702210019ncfa2b17l8d57088eb5ae278a@mail.gmail.com>

Hi All,

I downloaded the module
Bio-SCF-1.01 <http://search.cpan.org/%7Elds/Bio-SCF-1.01/> from CPAN.
When I tried to install it, I got the following error. Can someone
please guide me?

[root at ps2288 Bio-SCF-1.01]# perl Makefile.PL
Checking if your kit is complete...
Looks good
Note (probably harmless): No library found for -lread
Writing Makefile for Bio::SCF

[root at ps2288 Bio-SCF-1.01]# make
cp SCF.pm blib/lib/Bio/SCF.pm
cp SCF/Arrays.pm blib/lib/Bio/SCF/Arrays.pm
/usr/bin/perl /usr/lib/perl5/5.8.5/ExtUtils/xsubpp  -typemap
/usr/lib/perl5/5.8.5/ExtUtils/typemap  SCF.xs > SCF.xsc && mv SCF.xsc SCF.c
Please specify prototyping behavior for SCF.xs (see perlxs manual)
gcc -c   -D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBUGGING
-fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE
-D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm -O2 -g -pipe -m32 -march=i386
-mtune=pentium4   -DVERSION=\"1.01\" -DXS_VERSION=\"1.01\" -fPIC
"-I/usr/lib/perl5/5.8.5/i386-linux-thread-multi/CORE"  -DLITTLE_ENDIAN SCF.c
SCF.xs:12:24: io_lib/scf.h: No such file or directory
SCF.xs:13:26: io_lib/mFILE.h: No such file or directory
SCF.xs: In function `XS_Bio__SCF_get_scf_pointer':
SCF.xs:27: error: `Scf' undeclared (first use in this function)
SCF.xs:27: error: (Each undeclared identifier is reported only once
SCF.xs:27: error: for each function it appears in.)
SCF.xs:27: error: `scf_data' undeclared (first use in this function)
SCF.xs: In function `XS_Bio__SCF_get_scf_fpointer':
SCF.xs:66: error: `Scf' undeclared (first use in this function)
SCF.xs:66: error: `scf_data' undeclared (first use in this function)
SCF.xs:68: error: `mFILE' undeclared (first use in this function)
SCF.xs:68: error: `mf' undeclared (first use in this function)
SCF.xs: In function `XS_Bio__SCF_scf_free':
SCF.xs:89: error: `Scf' undeclared (first use in this function)
SCF.xs:89: error: syntax error before ')' token
SCF.xs: In function `XS_Bio__SCF_get_comments':
SCF.xs:95: error: `Scf' undeclared (first use in this function)
SCF.xs:95: error: `scf_data' undeclared (first use in this function)
SCF.xs:95: error: syntax error before ')' token
SCF.xs: In function `XS_Bio__SCF_set_comments':
SCF.xs:108: error: `Scf' undeclared (first use in this function)
SCF.xs:108: error: `scf_data' undeclared (first use in this function)
SCF.xs:108: error: syntax error before ')' token
SCF.xs: In function `XS_Bio__SCF_scf_write':
SCF.xs:121: error: `Scf' undeclared (first use in this function)
SCF.xs:121: error: `scf_data' undeclared (first use in this function)
SCF.xs:121: error: syntax error before ')' token
SCF.xs: In function `XS_Bio__SCF_scf_fwrite':
SCF.xs:135: error: `mFILE' undeclared (first use in this function)
SCF.xs:135: error: `mf' undeclared (first use in this function)
SCF.xs:137: error: `Scf' undeclared (first use in this function)
SCF.xs:137: error: `scf_data' undeclared (first use in this function)
SCF.xs:137: error: syntax error before ')' token
SCF.xs: In function `XS_Bio__SCF_get_from_header':
SCF.xs:159: error: `Scf' undeclared (first use in this function)
SCF.xs:159: error: `scf_data' undeclared (first use in this function)
SCF.xs:159: error: syntax error before ')' token
SCF.xs: In function `XS_Bio__SCF_get_at':
SCF.xs:186: error: `Scf' undeclared (first use in this function)
SCF.xs:186: error: `scf_data' undeclared (first use in this function)
SCF.xs:186: error: syntax error before ')' token
SCF.xs: In function `XS_Bio__SCF_set_base_at':
SCF.xs:242: error: `Scf' undeclared (first use in this function)
SCF.xs:242: error: `scf_data' undeclared (first use in this function)
SCF.xs:242: error: syntax error before ')' token
SCF.xs: In function `XS_Bio__SCF_set_at':
SCF.xs:255: error: `Scf' undeclared (first use in this function)
SCF.xs:255: error: `scf_data' undeclared (first use in this function)
SCF.xs:255: error: syntax error before ')' token
make: *** [SCF.o] Error 1


-- 
-Neeti
Even my blood says, B positive

From sdavis2 at mail.nih.gov  Wed Feb 21 06:17:50 2007
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Wed, 21 Feb 2007 06:17:50 -0500
Subject: [Bioperl-l] need help in Bio-SCF
In-Reply-To: <764978cf0702210019ncfa2b17l8d57088eb5ae278a@mail.gmail.com>
References: <764978cf0702210019ncfa2b17l8d57088eb5ae278a@mail.gmail.com>
Message-ID: <200702210617.50616.sdavis2@mail.nih.gov>

On Wednesday 21 February 2007 03:19, neeti somaiya wrote:
> Hi All,
>
> I downloaded module
> Bio-SCF-1.01<http://search.cpan.org/%7Elds/Bio-SCF-1.01/>from CPAN.
> And I am trying to install it when I got the following error. Can someone
> please guide me.

You will probably need to read the INSTALL document.  You need to install a
couple of libraries first.  It looks like you don't have the Staden io_lib
installed.
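Since the build stops at the missing io_lib/scf.h and io_lib/mFILE.h headers, one quick check is whether those headers are visible in the usual include directories. This is a hypothetical diagnostic sketch; the directory list is an assumption, and a non-standard io_lib install prefix would need to be added to it.

```perl
use strict;
use warnings;
use File::Spec;

# Headers that SCF.xs includes, per the compiler errors above.
my @headers = ('io_lib/scf.h', 'io_lib/mFILE.h');

# Return a hashref mapping each header to the first directory that contains
# it, or undef when it is found in none of the given include directories.
sub find_headers {
    my ($dirs, $headers) = @_;
    my %found;
    for my $h (@$headers) {
        ($found{$h}) = grep { -e File::Spec->catfile($_, split(m{/}, $h)) } @$dirs;
    }
    return \%found;
}

my $found = find_headers(['/usr/include', '/usr/local/include'], \@headers);
for my $h (@headers) {
    printf "%-16s %s\n", $h,
        defined $found->{$h} ? "found in $found->{$h}" : 'NOT FOUND';
}
```

If both headers report NOT FOUND, io_lib (or its development headers) still needs to be installed before rerunning perl Makefile.PL and make.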


> [root at ps2288 Bio-SCF-1.01]# perl Makefile.PL
> Checking if your kit is complete...
> Looks good
> Note (probably harmless): No library found for -lread
> Writing Makefile for Bio::SCF
>
> [root at ps2288 Bio-SCF-1.01]# make
> cp SCF.pm blib/lib/Bio/SCF.pm
> cp SCF/Arrays.pm blib/lib/Bio/SCF/Arrays.pm
> /usr/bin/perl /usr/lib/perl5/5.8.5/ExtUtils/xsubpp  -typemap
> /usr/lib/perl5/5.8.5/ExtUtils/typemap  SCF.xs > SCF.xsc && mv SCF.xsc SCF.c
> Please specify prototyping behavior for SCF.xs (see perlxs manual)
> gcc -c   -D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBUGGING
> -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE
> -D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm -O2 -g -pipe -m32 -march=i386
> -mtune=pentium4   -DVERSION=\"1.01\" -DXS_VERSION=\"1.01\" -fPIC
> "-I/usr/lib/perl5/5.8.5/i386-linux-thread-multi/CORE"  -DLITTLE_ENDIAN SCF.c
> SCF.xs:12:24: io_lib/scf.h: No such file or directory
> SCF.xs:13:26: io_lib/mFILE.h: No such file or directory
> SCF.xs: In function `XS_Bio__SCF_get_scf_pointer':
> SCF.xs:27: error: `Scf' undeclared (first use in this function)
> SCF.xs:27: error: (Each undeclared identifier is reported only once
> SCF.xs:27: error: for each function it appears in.)
> SCF.xs:27: error: `scf_data' undeclared (first use in this function)
> SCF.xs: In function `XS_Bio__SCF_get_scf_fpointer':
> SCF.xs:66: error: `Scf' undeclared (first use in this function)
> SCF.xs:66: error: `scf_data' undeclared (first use in this function)
> SCF.xs:68: error: `mFILE' undeclared (first use in this function)
> SCF.xs:68: error: `mf' undeclared (first use in this function)
> SCF.xs: In function `XS_Bio__SCF_scf_free':
> SCF.xs:89: error: `Scf' undeclared (first use in this function)
> SCF.xs:89: error: syntax error before ')' token
> SCF.xs: In function `XS_Bio__SCF_get_comments':
> SCF.xs:95: error: `Scf' undeclared (first use in this function)
> SCF.xs:95: error: `scf_data' undeclared (first use in this function)
> SCF.xs:95: error: syntax error before ')' token
> SCF.xs: In function `XS_Bio__SCF_set_comments':
> SCF.xs:108: error: `Scf' undeclared (first use in this function)
> SCF.xs:108: error: `scf_data' undeclared (first use in this function)
> SCF.xs:108: error: syntax error before ')' token
> SCF.xs: In function `XS_Bio__SCF_scf_write':
> SCF.xs:121: error: `Scf' undeclared (first use in this function)
> SCF.xs:121: error: `scf_data' undeclared (first use in this function)
> SCF.xs:121: error: syntax error before ')' token
> SCF.xs: In function `XS_Bio__SCF_scf_fwrite':
> SCF.xs:135: error: `mFILE' undeclared (first use in this function)
> SCF.xs:135: error: `mf' undeclared (first use in this function)
> SCF.xs:137: error: `Scf' undeclared (first use in this function)
> SCF.xs:137: error: `scf_data' undeclared (first use in this function)
> SCF.xs:137: error: syntax error before ')' token
> SCF.xs: In function `XS_Bio__SCF_get_from_header':
> SCF.xs:159: error: `Scf' undeclared (first use in this function)
> SCF.xs:159: error: `scf_data' undeclared (first use in this function)
> SCF.xs:159: error: syntax error before ')' token
> SCF.xs: In function `XS_Bio__SCF_get_at':
> SCF.xs:186: error: `Scf' undeclared (first use in this function)
> SCF.xs:186: error: `scf_data' undeclared (first use in this function)
> SCF.xs:186: error: syntax error before ')' token
> SCF.xs: In function `XS_Bio__SCF_set_base_at':
> SCF.xs:242: error: `Scf' undeclared (first use in this function)
> SCF.xs:242: error: `scf_data' undeclared (first use in this function)
> SCF.xs:242: error: syntax error before ')' token
> SCF.xs: In function `XS_Bio__SCF_set_at':
> SCF.xs:255: error: `Scf' undeclared (first use in this function)
> SCF.xs:255: error: `scf_data' undeclared (first use in this function)
> SCF.xs:255: error: syntax error before ')' token
> make: *** [SCF.o] Error 1

From cjfields at uiuc.edu  Wed Feb 21 07:08:57 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 21 Feb 2007 06:08:57 -0600
Subject: [Bioperl-l] need help in Bio-SCF
In-Reply-To: <200702210617.50616.sdavis2@mail.nih.gov>
References: <764978cf0702210019ncfa2b17l8d57088eb5ae278a@mail.gmail.com>
	<200702210617.50616.sdavis2@mail.nih.gov>
Message-ID: <40C288FE-C74C-4B3F-A835-1A5C563B2B8E@uiuc.edu>


On Feb 21, 2007, at 5:17 AM, Sean Davis wrote:

> On Wednesday 21 February 2007 03:19, neeti somaiya wrote:
>> Hi All,
>>
>> I downloaded module
>> Bio-SCF-1.01<http://search.cpan.org/%7Elds/Bio-SCF-1.01/>from CPAN.
>> And I am trying to install it when I got the following error. Can someone
>> please guide me.
>
> You will probably need to read the INSTALL document.  You need to install a
> couple of libraries first.  Looks like you don't have the staden io-lib
> installed.

Just to note, this module isn't part of BioPerl (I don't even think  
it has a Bioperl interface).  You'll probably need to contact Lincoln  
for details on using this module.

One thing you may run into is errors with the version of io_lib  
installed (a problem I've encountered with bioperl-ext), probably  
from API changes.  If you run into problems with newer versions of  
io_lib you should try downgrading to io_lib 1.8.11 or 1.8.12.

From neetisomaiya at gmail.com  Wed Feb 21 07:25:26 2007
From: neetisomaiya at gmail.com (neeti somaiya)
Date: Wed, 21 Feb 2007 17:55:26 +0530
Subject: [Bioperl-l] need help in Bio-SCF
In-Reply-To: <200702210617.50616.sdavis2@mail.nih.gov>
References: <764978cf0702210019ncfa2b17l8d57088eb5ae278a@mail.gmail.com>
	<200702210617.50616.sdavis2@mail.nih.gov>
Message-ID: <764978cf0702210425j544330bbr12c86a89960dbb66@mail.gmail.com>

Thanks. It resolved my problem.

On 2/21/07, Sean Davis <sdavis2 at mail.nih.gov> wrote:
>
> On Wednesday 21 February 2007 03:19, neeti somaiya wrote:
> > Hi All,
> >
> > I downloaded module
> > Bio-SCF-1.01 <http://search.cpan.org/%7Elds/Bio-SCF-1.01/> from CPAN.
> > And I am trying to install it when I got the following error. Can
> > someone please guide me.
>
> You will probably need to read the INSTALL document.  You need to install
> a couple of libraries first.  Looks like you don't have the staden io-lib
> installed.
>
>
> > [root at ps2288 Bio-SCF-1.01]# perl Makefile.PL
> > Checking if your kit is complete...
> > Looks good
> > Note (probably harmless): No library found for -lread
> > Writing Makefile for Bio::SCF
> >
> > [root at ps2288 Bio-SCF-1.01]# make
> > cp SCF.pm blib/lib/Bio/SCF.pm
> > cp SCF/Arrays.pm blib/lib/Bio/SCF/Arrays.pm
> > /usr/bin/perl /usr/lib/perl5/5.8.5/ExtUtils/xsubpp  -typemap
> > /usr/lib/perl5/5.8.5/ExtUtils/typemap  SCF.xs > SCF.xsc && mv SCF.xsc SCF.c
> > Please specify prototyping behavior for SCF.xs (see perlxs manual)
> > gcc -c   -D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBUGGING
> > -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE
> > -D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm -O2 -g -pipe -m32 -march=i386
> > -mtune=pentium4   -DVERSION=\"1.01\" -DXS_VERSION=\"1.01\" -fPIC
> > "-I/usr/lib/perl5/5.8.5/i386-linux-thread-multi/CORE"  -DLITTLE_ENDIAN SCF.c
> > SCF.xs:12:24: io_lib/scf.h: No such file or directory
> > SCF.xs:13:26: io_lib/mFILE.h: No such file or directory
> > SCF.xs: In function `XS_Bio__SCF_get_scf_pointer':
> > SCF.xs:27: error: `Scf' undeclared (first use in this function)
> > SCF.xs:27: error: (Each undeclared identifier is reported only once
> > SCF.xs:27: error: for each function it appears in.)
> > SCF.xs:27: error: `scf_data' undeclared (first use in this function)
> > SCF.xs: In function `XS_Bio__SCF_get_scf_fpointer':
> > SCF.xs:66: error: `Scf' undeclared (first use in this function)
> > SCF.xs:66: error: `scf_data' undeclared (first use in this function)
> > SCF.xs:68: error: `mFILE' undeclared (first use in this function)
> > SCF.xs:68: error: `mf' undeclared (first use in this function)
> > SCF.xs: In function `XS_Bio__SCF_scf_free':
> > SCF.xs:89: error: `Scf' undeclared (first use in this function)
> > SCF.xs:89: error: syntax error before ')' token
> > SCF.xs: In function `XS_Bio__SCF_get_comments':
> > SCF.xs:95: error: `Scf' undeclared (first use in this function)
> > SCF.xs:95: error: `scf_data' undeclared (first use in this function)
> > SCF.xs:95: error: syntax error before ')' token
> > SCF.xs: In function `XS_Bio__SCF_set_comments':
> > SCF.xs:108: error: `Scf' undeclared (first use in this function)
> > SCF.xs:108: error: `scf_data' undeclared (first use in this function)
> > SCF.xs:108: error: syntax error before ')' token
> > SCF.xs: In function `XS_Bio__SCF_scf_write':
> > SCF.xs:121: error: `Scf' undeclared (first use in this function)
> > SCF.xs:121: error: `scf_data' undeclared (first use in this function)
> > SCF.xs:121: error: syntax error before ')' token
> > SCF.xs: In function `XS_Bio__SCF_scf_fwrite':
> > SCF.xs:135: error: `mFILE' undeclared (first use in this function)
> > SCF.xs:135: error: `mf' undeclared (first use in this function)
> > SCF.xs:137: error: `Scf' undeclared (first use in this function)
> > SCF.xs:137: error: `scf_data' undeclared (first use in this function)
> > SCF.xs:137: error: syntax error before ')' token
> > SCF.xs: In function `XS_Bio__SCF_get_from_header':
> > SCF.xs:159: error: `Scf' undeclared (first use in this function)
> > SCF.xs:159: error: `scf_data' undeclared (first use in this function)
> > SCF.xs:159: error: syntax error before ')' token
> > SCF.xs: In function `XS_Bio__SCF_get_at':
> > SCF.xs:186: error: `Scf' undeclared (first use in this function)
> > SCF.xs:186: error: `scf_data' undeclared (first use in this function)
> > SCF.xs:186: error: syntax error before ')' token
> > SCF.xs: In function `XS_Bio__SCF_set_base_at':
> > SCF.xs:242: error: `Scf' undeclared (first use in this function)
> > SCF.xs:242: error: `scf_data' undeclared (first use in this function)
> > SCF.xs:242: error: syntax error before ')' token
> > SCF.xs: In function `XS_Bio__SCF_set_at':
> > SCF.xs:255: error: `Scf' undeclared (first use in this function)
> > SCF.xs:255: error: `scf_data' undeclared (first use in this function)
> > SCF.xs:255: error: syntax error before ')' token
> > make: *** [SCF.o] Error 1
>


-- 
-Neeti
Even my blood says, B positive

From jay at jays.net  Tue Feb 20 19:27:01 2007
From: jay at jays.net (Jay Hannah)
Date: Tue, 20 Feb 2007 18:27:01 -0600
Subject: [Bioperl-l] Alignment
In-Reply-To: <358f4d650702200730x5fd09bb8p7dec8ce740553378@mail.gmail.com>
References: <188661178021328@lycos-europe.com>
	<358f4d650702200730x5fd09bb8p7dec8ce740553378@mail.gmail.com>
Message-ID: <cdcd3b0549caf3cf9818bdf3d14f0796@jays.net>

> On 2/20/07, marian thieme <marian.thieme at lycos.de> wrote:
>> I have a series of sequences which should be aligned against a
>> reference sequence.
>> In this special case we don't need to calculate anything, we only need
>> to represent the sequences and get for instance some columns of
>> interest.
>> The problem now is that some sequences have gaps and we need to
>> represent gaps in the reference sequence as well as in some
>> individual sequences.

On Feb 20, 2007, at 9:30 AM, Albert Vilella wrote:
> I think the SimpleAlign object contains a set of sequences, each of
> which is a LocatableSeq object.

Fascinating. In my BLAST-centric universe I went and rolled my own 
solution for SeqLab where I hold onto the Bio::Seq from the reference 
sequences and then hold onto the Bio::Search::HSP::GenericHSP objects 
for all my BLAST hits. From that dataset I can write whatever reports I 
want and/or perform any subsequent actions. I wonder if I should have 
done that differently...

What typically creates .pfam files?

j
seqlab.net
http://www.bioperl.org/wiki/User:Jhannah


From cjfields at uiuc.edu  Wed Feb 21 08:36:02 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 21 Feb 2007 07:36:02 -0600
Subject: [Bioperl-l] Alignment
In-Reply-To: <cdcd3b0549caf3cf9818bdf3d14f0796@jays.net>
References: <188661178021328@lycos-europe.com>
	<358f4d650702200730x5fd09bb8p7dec8ce740553378@mail.gmail.com>
	<cdcd3b0549caf3cf9818bdf3d14f0796@jays.net>
Message-ID: <2233F0EE-94FE-42F0-B8E5-1BE14A25C0D4@uiuc.edu>


On Feb 20, 2007, at 6:27 PM, Jay Hannah wrote:
...
>
> On Feb 20, 2007, at 9:30 AM, Albert Vilella wrote:
>> I think the SimpleAlign object contains a set of sequences, each of
>> which is a LocatableSeq object.
>
> Fascinating. In my BLAST-centric universe I went and rolled my own
> solution for SeqLab where I hold onto the Bio::Seq from the reference
> sequences and then hold onto the Bio::Search::HSP::GenericHSP objects
> for all my BLAST hits. From that dataset I can write whatever  
> reports I
> want and/or perform any subsequent actions. I wonder if I should have
> done that differently...
>
> What typically creates .pfam files?
>
> j
> seqlab.net
> http://www.bioperl.org/wiki/User:Jhannah

Pfam alignments come in two formats (pfam and stockholm) that can  
both be parsed into SimpleAlign objects via Bio::AlignIO:

use Bio::AlignIO;

my $alnin = Bio::AlignIO->new(-format => 'stockholm',
                               -file   => 'dho.sto');

while (my $aln = $alnin->next_aln) {
    # do stuff to $aln (a Bio::SimpleAlign object)
}

Personally I stick with Stockholm as it's a richer format (with  
annotations and so on), but the parser was rewritten recently (by  
moi!) so may have some bugs still.
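
For the original question (sequences aligned against a reference, with gaps on both sides, and some columns of interest to pull out), the core step is translating a residue number in the ungapped reference into a column of the gapped alignment. Bio::SimpleAlign provides methods along these lines (column_from_residue_number(), slice()); check its documentation for their exact behaviour. The plain-Perl sketch below only illustrates the idea on gapped strings held in a hash. It assumes '-' and '.' are the gap characters and that residue and column numbers are 1-based. The toy sequence names and data are made up:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Map a 1-based residue number of the ungapped sequence in $gapped onto
# the 1-based alignment column holding it ('-' and '.' count as gaps).
sub column_for_residue {
    my ($gapped, $residue) = @_;
    my $count = 0;
    for my $col (1 .. length $gapped) {
        my $char = substr($gapped, $col - 1, 1);
        next if $char eq '-' || $char eq '.';
        return $col if ++$count == $residue;
    }
    return;    # residue number lies beyond the end of the sequence
}

# Return name => character pairs for one 1-based alignment column.
sub column_chars {
    my ($aln, $col) = @_;
    return map { ($_ => substr($aln->{$_}, $col - 1, 1)) } keys %$aln;
}

# Hypothetical toy alignment: 'ref' has a gap where 'seq2' has an insertion.
my %aln = (
    ref  => 'ATG-CCA',
    seq1 => 'ATG-CTA',
    seq2 => 'A-GTCCA',
);

my $col   = column_for_residue($aln{ref}, 4);    # 4th reference residue
my %chars = column_chars(\%aln, $col);
print "reference residue 4 is alignment column $col\n";
print "  $_: $chars{$_}\n" for sort keys %chars;
```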

I'm a bit confused as to what you do with BLAST files.  You can  
generate a SimpleAlign right from the HSP for most SearchIO parsers:

http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_the_methods

chris

From sanjib at bic.boseinst.ernet.in  Wed Feb 21 01:12:06 2007
From: sanjib at bic.boseinst.ernet.in (Sanjib Kumar Gupta)
Date: Wed, 21 Feb 2007 11:42:06 +0530
Subject: [Bioperl-l] help on remote blast
In-Reply-To: <20070220073200.M42567@bic.boseinst.ernet.in>
References: <20070220073200.M42567@bic.boseinst.ernet.in>
Message-ID: <20070221061206.M37845@bic.boseinst.ernet.in>

Hi,
I have been running this script for some time and it was running fine. I am
using this Linux machine with a live IP (no proxy), but suddenly it has stopped
working with these errors:


waiting...waiting...
-------------------- WARNING ---------------------
MSG: <HTML>
<HEAD><TITLE>An Error Occurred</TITLE></HEAD>
<BODY>
<H1>An Error Occurred</H1>
500 Can't connect to www.ncbi.nlm.nih.gov:80 (Bad 
hostname 'www.ncbi.nlm.nih.gov')
</BODY>
</HTML>
 
---------------------------------------------------
xx.pep
 
-------------------- WARNING ---------------------
MSG: req was POST http://www.ncbi.nlm.nih.gov/blast/Blast.cgi
User-Agent: bioperl-Bio_Tools_Run_RemoteBlast/1.5
Content-Length: 497
Content-Type: application/x-www-form-urlencoded
 
DATABASE=nr&QUERY=%3ENM_005277.3%3B_1414961_1+837%0A
MEENMEEGQTQKGCFECCIKCLGGIPYASLIATILLYAGVALFCGCGHEALSGTVNILQTYFEMARTAGDTLDVF
TMIDIFKYVIYGIAAAFFVYGILLMVEGFFTTGAIKDLYGDFKITTCGRCVSAWFIMLTYLFMLAWLGVTAFTSLPV
YMYFNLWTICRNTTLVEGANLCLDLRQFGIVTIGEEKKICTVSENFLRMCESTELNMTFHLFIVALAGAGAAVIAMV
HYLMVLSANWAYVKDACRMQKYEDIKSKEEQELHDIHSTRSKERLNAYT*
&COMPOSITION_BASED_STATISTICS=off&EXPECT=1e-10&SERVICE=plain
&FORMAT_OBJECT=Alignment&CMD=Put&MATRIX_NAME=BLOSUM62
&ENTREZ_QUERY=Xenopus+laevis[Organism]&PROGRAM=blastp
 
<HTML>
<HEAD><TITLE>An Error Occurred</TITLE></HEAD>
<BODY>
<H1>An Error Occurred</H1>
500 Can't connect to www.ncbi.nlm.nih.gov:80 (Bad 
hostname 'www.ncbi.nlm.nih.gov')
</BODY>
</HTML>
 
---------------------------------------------------
waiting...waiting...
-------------------- WARNING ---------------------
MSG: <HTML>
<HEAD><TITLE>An Error Occurred</TITLE></HEAD>
<BODY>
<H1>An Error Occurred</H1>
500 Internal Server Error
</BODY>
</HTML>
 
---------------------------------------------------

I am able to see the NCBI page from a browser, but I am unable to ping or
traceroute to the server.
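
The "Bad hostname 'www.ncbi.nlm.nih.gov'" text inside the 500 errors is most likely produced on the client side when the host name cannot be resolved. That points at name resolution on the machine running the script rather than at the BLAST server. Note that ping and traceroute can also fail simply because ICMP is filtered. A minimal Perl sketch to test the lookup from Perl itself, using the core Socket module's inet_aton (the script name check_host.pl is hypothetical):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Socket qw(inet_aton inet_ntoa);

# Resolve a host name through the system resolver via inet_aton;
# returns a dotted-quad address string, or undef if the lookup fails.
sub resolve_host {
    my ($host) = @_;
    my $packed = inet_aton($host);
    return defined $packed ? inet_ntoa($packed) : undef;
}

# Only look anything up when host names are given, e.g.
#   perl check_host.pl www.ncbi.nlm.nih.gov
for my $host (@ARGV) {
    my $addr = resolve_host($host);
    print defined $addr
        ? "$host resolves to $addr\n"
        : "$host could NOT be resolved; check /etc/resolv.conf / DNS\n";
}
```

If the lookup fails here while the browser still works, the browser may be going through a proxy or a cached address that the Perl process does not use. This is a guess to verify, not a confirmed cause.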

Please help me.
--
Sanjib Kumar Gupta
Bioinformatics Centre
Bose Institute
Kolkata 700054, INDIA
Phone  : +91-33-2355 6626, 2816, 2355 4766
Fax    : +91-33-2355 3886
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: n9.pl
Url: http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070221/5a3382d6/attachment.pl 

From granjeau at tagc.univ-mrs.fr  Wed Feb 21 08:50:39 2007
From: granjeau at tagc.univ-mrs.fr (Samuel GRANJEAUD - IR/IFR137)
Date: Wed, 21 Feb 2007 14:50:39 +0100
Subject: [Bioperl-l] Adding empty member list in Bio::Cluster::SequenceFamily
Message-ID: <45DC4E2F.4060804@tagc.univ-mrs.fr>

Hello!

Not clear to me, but I found a workaround by checking for an empty list
before adding. Here is what I noticed: adding an empty list () as members
is not the same as adding a reference to an empty list [], of course, but
the two could be thought to be the same. Calling get_members, in the second
case I got a list of 0 members, but in the first case I got 1 member,
which is not an object at all. I am warned now, but maybe the
documentation should emphasize using the array-reference form.

Best regards,
--Samuel


use strict;
use warnings;
use Bio::Cluster::SequenceFamily;

my $f = Bio::Cluster::SequenceFamily->new( -id => 'aa' );
$f->add_members( () );
print scalar( $f->get_members() ), "\n";
# 1
my $g = Bio::Cluster::SequenceFamily->new( -id => 'aa' );
$g->add_members( [] );
print scalar( $g->get_members() ), "\n";
# 0
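
The difference starts in Perl itself. An empty list () inside an argument list flattens away, so add_members( () ) is the same call as add_members() with no arguments at all. By contrast, [] is one argument (a reference to an empty array). Exactly how the affected add_members turned a no-argument call into one bogus member is not shown in this thread. The sketch below uses a hypothetical add_members_to() function, not the BioPerl implementation, that accepts either a list or a single array reference and ignores empty input:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Number of arguments a sub actually receives.
sub count_args { return scalar @_ }

print count_args( () ), "\n";    # prints 0: the empty list flattens away
print count_args( [] ), "\n";    # prints 1: one array reference

# Hypothetical defensive add_members()-style function: accepts a list of
# members or a single array reference, and never stores undef.
sub add_members_to {
    my ($members, @args) = @_;
    my @new = (@args == 1 && ref $args[0] eq 'ARRAY') ? @{ $args[0] } : @args;
    push @$members, grep { defined } @new;
    return scalar @$members;
}

my @family;
add_members_to(\@family, ());            # no-op
add_members_to(\@family, []);            # no-op
add_members_to(\@family, 'seqA', 'seqB');
add_members_to(\@family, ['seqC']);
print scalar(@family), "\n";             # prints 3
```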


From stephen.marshall at novartis.com  Wed Feb 21 12:01:00 2007
From: stephen.marshall at novartis.com (stephen.marshall at novartis.com)
Date: Wed, 21 Feb 2007 12:01:00 -0500
Subject: [Bioperl-l] Parsing kegg files
Message-ID: <OFA3726097.8019A09E-ON85257289.005D64E3-85257289.005D7997@ah.novartis.com>

Hello
I'm trying to parse a KEGG file, but I can't seem to get at the pathway
information. Here's a snippet of my code; I only see dblink and
description as annotations.

use strict;
use warnings;
use Bio::SeqIO;

my $filename = shift @ARGV;    # KEGG flat file given on the command line
my $stream = Bio::SeqIO->new(-file => $filename, -format => 'KEGG');

while ( my $seq = $stream->next_seq() ) {
        # do something with $seq
        my $id = $seq->display_id();
        print "$id:\n";
        my $ann = $seq->annotation();
        foreach my $key ( $ann->get_all_annotation_keys() ) {
                my @values = $ann->get_Annotations($key);
                foreach my $value ( @values ) {
                        print "Annotation: ", $key, " value: ",
                              $value->as_text, "\n";
                }
        }
}
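
This thread does not establish where the kegg SeqIO module stores PATHWAY data. One fallback is to read those lines straight from the flat file. The plain-Perl sketch below rests on some layout assumptions. A field name starts in column 1. Continuation lines start with whitespace. Each record ends with ///. The sample record, its ENTRY id and the "PATH: ..." value format are illustrative assumptions about the KEGG release in use:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Collect PATHWAY values per ENTRY from KEGG-style flat-file text.
# Layout assumptions: field names start in column 1, continuation lines
# start with whitespace, and each record ends with a '///' line.
sub kegg_pathways {
    my ($text) = @_;
    my (%pathways, $entry, $field);
    for my $line (split /\n/, $text) {
        if ($line =~ m{^///}) {                    # end of record
            ($entry, $field) = (undef, undef);
            next;
        }
        my $value;
        if ($line =~ /^(\S+)\s+(.*?)\s*$/) {       # new field
            ($field, $value) = ($1, $2);
        }
        elsif (defined $field && $line =~ /^\s+(.*?)\s*$/) {
            $value = $1;                           # continuation of $field
        }
        else {
            next;
        }
        if ($field eq 'ENTRY') {
            ($entry) = split ' ', $value;
            $pathways{$entry} = [] if defined $entry;
        }
        elsif ($field eq 'PATHWAY' && defined $entry && length $value) {
            push @{ $pathways{$entry} }, $value;
        }
    }
    return \%pathways;
}

# Illustrative record (made-up ENTRY id; layout as assumed above).
my $sample = <<'KEGG';
ENTRY       0001              CDS       Example organism
NAME        EXA1
PATHWAY     PATH: map00010  Glycolysis / Gluconeogenesis
            PATH: map00051  Fructose and mannose metabolism
///
KEGG

my $pathways = kegg_pathways($sample);
for my $id (sort keys %$pathways) {
    print "$id\t$_\n" for @{ $pathways->{$id} };
}
```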
_________________________

CONFIDENTIALITY NOTICE

The information contained in this e-mail message is intended only for the 
exclusive use of the individual or entity named above and may contain 
information that is privileged, confidential or exempt from disclosure 
under applicable law. If the reader of this message is not the intended 
recipient, or the employee or agent responsible for delivery of the 
message to the intended recipient, you are hereby notified that any 
dissemination, distribution or copying of this communication is strictly 
prohibited. If you have received this communication in error, please 
notify the sender immediately by e-mail and delete the material from any 
computer.  Thank you.

From prateek.vit at gmail.com  Wed Feb 21 12:40:25 2007
From: prateek.vit at gmail.com (prateek singh yadav)
Date: Wed, 21 Feb 2007 23:10:25 +0530
Subject: [Bioperl-l] Problem in BioPerl Installation
Message-ID: <e7e0a2da0702210940j60a67f37t9bfddda7d760a277@mail.gmail.com>

Hello all,

I was trying to install BioPerl on my Red Hat Linux (EL) machine using CPAN,
but CPAN shows this problem:


[root at HX342SBC054 Desktop]# cpan
Terminal does not support AddHistory.

cpan shell -- CPAN exploration and modules installation (v1.7601)
ReadLine support available (try 'install Bundle::CPAN')

cpan> get bioperl
CPAN: Storable loaded ok
Going to read /root/.cpan/Metadata
Warning: Found only 25 objects in /root/.cpan/Metadata
Going to read /root/.cpan/sources/authors/01mailrc.txt.gz
Going to read /root/.cpan/sources/modules/02packages.details.txt.gz
Warning: Your /root/.cpan/sources/modules/02packages.details.txt.gz does not
contain a Line-Count header.
Please check the validity of the index file by comparing it to more
than one CPAN mirror. I'll continue but problems seem likely to
happen.
Warning: Your /root/.cpan/sources/modules/02packages.details.txt.gz does not
contain a Last-Updated header.
Please check the validity of the index file by comparing it to more
than one CPAN mirror. I'll continue but problems seem likely to
happen.
Going to read /root/.cpan/sources/modules/03modlist.data.gz
Can't locate object method "data" via package "CPAN::Modulelist" (perhaps
you forgot to load "CPAN::Modulelist"?) at (eval 13) line 1.
 at /usr/lib/perl5/5.8.5/CPAN.pm line 3406
        CPAN::Index::rd_modlist('CPAN::Index',
'/root/.cpan/sources/modules/03modlist.data.gz') called at
/usr/lib/perl5/5.8.5/CPAN.pm line 3129
        CPAN::Index::reload('CPAN::Index') called at
/usr/lib/perl5/5.8.5/CPAN.pm line 675
        CPAN::exists('CPAN=HASH(0x8548f20)', 'CPAN::Module', 'bioperl')
called at /usr/lib/perl5/5.8.5/CPAN.pm line 1842
        CPAN::Shell::expandany('CPAN::Shell', 'bioperl') called at
/usr/lib/perl5/5.8.5/CPAN.pm line 2078
        CPAN::Shell::rematein('CPAN::Shell', 'get', 'bioperl') called at
/usr/lib/perl5/5.8.5/CPAN.pm line 2157
        CPAN::Shell::get('CPAN::Shell', 'bioperl') called at
/usr/lib/perl5/5.8.5/CPAN.pm line 201
        eval {...} called at /usr/lib/perl5/5.8.5/CPAN.pm line 201
        CPAN::shell() called at /usr/bin/cpan line 193

cpan>

Can anyone tell me how to configure CPAN again, or how to install BioPerl on
Linux with all its dependencies? I think I have a problem in my CPAN
configuration.

Regards,
Prateek

-- 
Prateek Singh
3rd year Bioinformatics(BTech)
Vellore Institute Of Technology
Vellore-632014
prateek.vit at gmail.com

From bosborne11 at verizon.net  Wed Feb 21 12:29:40 2007
From: bosborne11 at verizon.net (Brian Osborne)
Date: Wed, 21 Feb 2007 12:29:40 -0500
Subject: [Bioperl-l] Parsing kegg files
In-Reply-To: <OFA3726097.8019A09E-ON85257289.005D64E3-85257289.005D7997@ah.novartis.com>
Message-ID: <C201EBB4.CEE7%bosborne11@verizon.net>

Stephen,

I don't know what your eventual goals are but you might want to take a look
at bioperl-network. However, there are problems with this package. One, it
only parses DIP tab-delimited and PSI-MI and it does this last one only
partially (you will get the graph though). Two, it seems to have only a
single developer interested in it, that's me, and few users. In my Bioperl
experience projects like this tend to fade away.

http://www.bioperl.org/wiki/Network_package


Brian O.


On 2/21/07 12:01 PM, "stephen.marshall at novartis.com"
<stephen.marshall at novartis.com> wrote:

> Hello
> I"m trying to parse a Kegg file and I can't seem to get at the pathway
> information... Here's a snippet of my code. I only see dblink and
> description as annotation
> 
> use Bio::SeqIO;
> 
> my $stream = Bio::SeqIO->new(-file => $filename, -format => 'KEGG');
> 
> while ( my $seq = $stream->next_seq() ) {
>         # do something with $seq
>         my $id = $seq->display_id();
>         print "$id:";
>         my $ann = $seq->annotation();
>         foreach my $key ( $ann->get_all_annotation_keys() ) {
>                 my @values = $ann->get_Annotations($key);
>                 foreach my $value ( @values ) {
>                         print "Annotation: ",$key," value:
> ",$value->as_text,"\n";
>                 }
>         }
> 
> }
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From arareko at campus.iztacala.unam.mx  Wed Feb 21 13:18:37 2007
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Wed, 21 Feb 2007 12:18:37 -0600
Subject: [Bioperl-l] Problem in BioPerl Installation
In-Reply-To: <e7e0a2da0702210940j60a67f37t9bfddda7d760a277@mail.gmail.com>
References: <e7e0a2da0702210940j60a67f37t9bfddda7d760a277@mail.gmail.com>
Message-ID: <45DC8CFD.1060108@campus.iztacala.unam.mx>

You can always rebuild your CPAN configuration by deleting the existing 
.cpan/ directory in root's $HOME dir (quick & dirty trick), then invoke 
CPAN again from root's shell to rebuild the config:

# perl -MCPAN -e shell
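
The warnings about missing Line-Count and Last-Updated headers, followed by the CPAN::Modulelist failure, suggest that the downloaded index files are damaged or are not real index files at all. That is an inference from the messages, not a confirmed diagnosis. Before deleting anything, a quick Perl check is to read the header block of 02packages.details.txt.gz. The sketch below assumes an external gzip program is on the PATH, which avoids depending on IO::Compress modules that an old Perl such as 5.8.5 may lack. The script name check_index.pl is hypothetical:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read the header block (the lines before the first blank line) of a
# gzipped CPAN index file such as 02packages.details.txt.gz.
# Assumption: an external 'gzip' program is available on the PATH.
sub index_headers {
    my ($file) = @_;
    open my $fh, '-|', 'gzip', '-dc', $file
        or die "Cannot run gzip on $file: $!\n";
    my %header;
    while (my $line = <$fh>) {
        last if $line =~ /^\s*$/;
        $header{$1} = $2 if $line =~ /^([\w-]+):\s*(.*?)\s*$/;
    }
    close $fh;
    return \%header;
}

# Usage: perl check_index.pl /root/.cpan/sources/modules/02packages.details.txt.gz
for my $file (@ARGV) {
    my $header = index_headers($file);
    for my $key (qw(Line-Count Last-Updated)) {
        printf "%-13s %s\n", "$key:",
            exists $header->{$key} ? $header->{$key}
                                   : 'MISSING (index looks damaged)';
    }
}
```

Besides deleting ~/.cpan, the cpan shell's own commands "o conf init" (re-run the configuration dialogue) and "reload index" (fetch the index files again) may be enough to recover.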

Hope this helps.

Regards,
Mauricio.

prateek singh yadav wrote:
> Hello all,
> 
> I was trying to install Bioperl on my redhat linux (EL) using CPAN. but CPAN
> shows this problem.
> 
> 
> [root at HX342SBC054 Desktop]# cpan
> Terminal does not support AddHistory.
> 
> cpan shell -- CPAN exploration and modules installation (v1.7601)
> ReadLine support available (try 'install Bundle::CPAN')
> 
> cpan> get bioperl
> CPAN: Storable loaded ok
> Going to read /root/.cpan/Metadata
> Warning: Found only 25 objects in /root/.cpan/Metadata
> Going to read /root/.cpan/sources/authors/01mailrc.txt.gz
> Going to read /root/.cpan/sources/modules/02packages.details.txt.gz
> Warning: Your /root/.cpan/sources/modules/02packages.details.txt.gz does not
> contain a Line-Count header.
> Please check the validity of the index file by comparing it to more
> than one CPAN mirror. I'll continue but problems seem likely to
> happen.
> Warning: Your /root/.cpan/sources/modules/02packages.details.txt.gz does not
> contain a Last-Updated header.
> Please check the validity of the index file by comparing it to more
> than one CPAN mirror. I'll continue but problems seem likely to
> happen.
> Going to read /root/.cpan/sources/modules/03modlist.data.gz
> Can't locate object method "data" via package "CPAN::Modulelist" (perhaps
> you forgot to load "CPAN::Modulelist"?) at (eval 13) line 1.
>  at /usr/lib/perl5/5.8.5/CPAN.pm line 3406
>         CPAN::Index::rd_modlist('CPAN::Index',
> '/root/.cpan/sources/modules/03modlist.data.gz') called at
> /usr/lib/perl5/5.8.5/CPAN.pm line 3129
>         CPAN::Index::reload('CPAN::Index') called at
> /usr/lib/perl5/5.8.5/CPAN.pm line 675
>         CPAN::exists('CPAN=HASH(0x8548f20)', 'CPAN::Module', 'bioperl')
> called at /usr/lib/perl5/5.8.5/CPAN.pm line 1842
>         CPAN::Shell::expandany('CPAN::Shell', 'bioperl') called at
> /usr/lib/perl5/5.8.5/CPAN.pm line 2078
>         CPAN::Shell::rematein('CPAN::Shell', 'get', 'bioperl') called at
> /usr/lib/perl5/5.8.5/CPAN.pm line 2157
>         CPAN::Shell::get('CPAN::Shell', 'bioperl') called at
> /usr/lib/perl5/5.8.5/CPAN.pm line 201
>         eval {...} called at /usr/lib/perl5/5.8.5/CPAN.pm line 201
>         CPAN::shell() called at /usr/bin/cpan line 193
> 
> cpan>
> 
> Can anyone give me direction  how to configure cpan again or how to install
> BioPerl on linux with its complete dependencies. Because I think I have a
> problem in CPAN configuration.
> 
> Regards,
> Prateek
> 

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM


From hlapp at gmx.net  Wed Feb 21 13:33:17 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 21 Feb 2007 13:33:17 -0500
Subject: [Bioperl-l] Adding empty member list in
	Bio::Cluster::SequenceFamily
In-Reply-To: <45DC4E2F.4060804@tagc.univ-mrs.fr>
References: <45DC4E2F.4060804@tagc.univ-mrs.fr>
Message-ID: <5B31EEBD-FFE5-4A0F-BB05-DF7297103BBD@gmx.net>

Fixed in CVS HEAD. -hilmar

On Feb 21, 2007, at 8:50 AM, Samuel GRANJEAUD - IR/IFR137 wrote:

> Hello!
>
> Not clear to me, but I find a work around by checking for empty list
> before adding, here is what I noticed. Adding as members an empty list
> () is not the same as adding a reference to an empty list [], of course,
> but could be thought to be the same. Calling get_members, for the second
> case, I got a list of 0 member, but in the first case I got of 1 member,
> which is not an object at all. I am warned now, but may be the
> documentation should emphasize on using by the reference call.
>
> Best regards,
> --Samuel
>
>
> use Bio::Cluster::SequenceFamily;
>
> $f = new Bio::Cluster::SequenceFamily( -id => 'aa' );
> $f->add_members( () );
> print scalar $f->get_members();
> # 1
> $g = new Bio::Cluster::SequenceFamily( -id => 'aa' );
> $g->add_members( [] );
> print scalar $g->get_members();
> # 0
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cjfields at uiuc.edu  Wed Feb 21 14:12:57 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 21 Feb 2007 13:12:57 -0600
Subject: [Bioperl-l] GenBank accession bug?
Message-ID: <11B83C2C-7BFE-48B5-A20C-7EAF282F39BD@uiuc.edu>

Dmitry,

I'm forwarding this to the mailing list.  In the future please post/respond
to the regular mailing list so other BioPerl developers/users can comment.
You'll get feedback much faster here (and maybe even some support!).

The issue at hand is whether we can support GenBank accessions, display_id
and version with your naming scheme.  My feeling is that support for
nonalphanumerics was removed to be compliant with the GenBank standard for
accessions, though I may be wrong.  Maybe someone who was around during
bioperl 1.2 can elaborate more?

 From http://bugzilla.open-bio.org/show_bug.cgi?id=2214
--------------------------------------------------
....
Thanks for the detailed explanation. It seems that I would need to apply
my local patches to the BioPerl module(s). With BioPerl-1.2 there was
no problem with '-' in sequence names.

The problem is that in the project we participate in (the Vizier project)
the following sequence naming convention was adopted:

VZ##<virus_ICTV>-(<GenBank LOCUS ID>or<strain designation>)-<$$>

VZ stands for Vizier

## Your 2-digit Partner ID within the VIZIER consortium

<virus_ICTV> Virus name according to the ICTV nomenclature;

<GenBank LOCUS ID>,
<strain designation> If the sequence has not been assigned a GenBank LOCUS ID,
the available strain designation, as short as possible, should be used

<$$> Unique 2-digit number, at your discretion, to label the sequence variant
--------------------------------------------------
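
The sketch below splits a Vizier-style name into its fields, to make clear which parts carry the '-' characters that an accession-style check would reject. It rests on several assumptions. The ICTV virus field is taken to contain no '-'. The locus/strain field may contain '-', because the last '-' followed by two digits is read as the variant number. The sample name VZ07DENV2-AB123456-01 is hypothetical. The accession shape (one letter plus five digits, or two letters plus six digits) is a simplification and not the rule BioPerl's genbank parser actually applies:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split VZ##<virus_ICTV>-(<GenBank LOCUS ID>|<strain>)-<$$> into parts.
# Assumption: <virus_ICTV> has no '-', the middle part may contain '-'.
sub parse_vizier_name {
    my ($name) = @_;
    return unless $name =~ /^VZ(\d{2})([^-]+)-(.+)-(\d{2})$/;
    return { partner => $1, virus => $2, locus_or_strain => $3, variant => $4 };
}

# Simplified accession shape: 1 letter + 5 digits or 2 letters + 6 digits.
sub looks_like_accession {
    my ($id) = @_;
    return $id =~ /^(?:[A-Z]\d{5}|[A-Z]{2}\d{6})$/ ? 1 : 0;
}

for my $name (@ARGV ? @ARGV : ('VZ07DENV2-AB123456-01')) {
    my $parts = parse_vizier_name($name);
    unless ($parts) {
        print "$name: does not follow the assumed convention\n";
        next;
    }
    printf "%s: partner %s, virus %s, locus/strain %s, variant %s\n",
        $name, @{$parts}{qw(partner virus locus_or_strain variant)};
    printf "  whole name looks like an accession: %s\n",
        looks_like_accession($name) ? 'yes' : 'no';
}
```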

chris

From gabriel.cardona at uib.es  Thu Feb 22 04:33:14 2007
From: gabriel.cardona at uib.es (gcardona)
Date: Thu, 22 Feb 2007 01:33:14 -0800 (PST)
Subject: [Bioperl-l] bioperl-1.5.2_100 ... not found
Message-ID: <9096740.post@talk.nabble.com>


Hello,

I am trying to install Bioperl on a Windows system, following the
installation notes in 
http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows
When the Perl Package Manager tries to download bioperl-1.5.2_100, it cannot
find the package and answers:
Downloading bioperl-1.5.2_100 ... not found

I've looked at the contents of
http://bioperl.org/DIST
and in package.xml the version for bioperl is bioperl-1.5.2_100, but in that
folder the available version is bioperl-1.5.2_102.
Is this a bug, or should I download and install it manually?

Thank you in advance,

Gabriel Cardona
-- 
View this message in context: http://www.nabble.com/bioperl-1.5.2_100-...-not-found-tf3271747.html#a9096740
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From bix at sendu.me.uk  Thu Feb 22 07:35:14 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 22 Feb 2007 12:35:14 +0000
Subject: [Bioperl-l] bioperl-1.5.2_100 ... not found
In-Reply-To: <9096740.post@talk.nabble.com>
References: <9096740.post@talk.nabble.com>
Message-ID: <45DD8E02.1070404@sendu.me.uk>

gcardona wrote:
> Hello,
> 
> I am trying to install Bioperl on a Windows system, following the
> installation notes in 
> http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows
> When the Perl Package Manager tries to download bioperl-1.5.2_100, it cannot
> find the package and answers:
> Downloading bioperl-1.5.2_100 ... not found
> 
> I've looked the contents of
> http://bioperl.org/DIST
> and in package.xml the version for bioperl is bioperl-1.5.2_100, but in that
> folder the available version is bioperl-1.5.2_102
> Is this a bug? or should I download and install manually?

Sorry, my mistake. I accidentally moved the ppm to a different folder. 
It should work now though.

I may make a 1.5.2_102 ppm at some point, but there are no relevant 
differences between _102 and _100 as far as Windows users are concerned.

From enrique_rulz at yahoo.com  Thu Feb 22 15:41:37 2007
From: enrique_rulz at yahoo.com (Kurt Gobain)
Date: Thu, 22 Feb 2007 12:41:37 -0800 (PST)
Subject: [Bioperl-l] Sequence matching problem!
Message-ID: <9107936.post@talk.nabble.com>


Hi everyone,
I'm having a lot of trouble with simple pattern matching between a
sequence and a pattern. The program should be able to do two things:
1) normal matching: for example, in GATCAAT, if TC is entered the output
should be 2;
2) matching with a special (wildcard) character: in the same example, if
C*T is entered the output should be 3 and the sequence displayed should
be CAAT.
The first part is easy, but the second gives me trouble: the output I get
is 1 instead of 3. The code is really simple:

#!/usr/bin/perl
$alphabet = "GATCAAT";
$pattern=  "C*T ";

$alphabet =~ /($pattern)/i;

print "The entire '$pattern' match began at $-[0] and ended at $+[0]\n";

====================
OUTPUT!
The entire C*T match began at 1 and ended at 2
====================

but the output should be 3!
And is there any chance I can get the sequence too? I mean, instead of
'C*T' I need 'CAAT'.

Well, I don't have to use a regex, but I find it quite simple. Is there
any other method?

Thanks in advance!
Kurt!
 


From cjfields at uiuc.edu  Thu Feb 22 16:01:03 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 22 Feb 2007 15:01:03 -0600
Subject: [Bioperl-l] GenBank accession bug?
In-Reply-To: <51879.10.0.7.57.1172176272.squirrel@gscmail.wustl.edu>
References: <11B83C2C-7BFE-48B5-A20C-7EAF282F39BD@uiuc.edu>
	<51879.10.0.7.57.1172176272.squirrel@gscmail.wustl.edu>
Message-ID: <028E16D7-036A-44DA-BECD-F910BEA58E53@uiuc.edu>


On Feb 22, 2007, at 2:31 PM, dmessina at watson.wustl.edu wrote:

>> The issue at hand is whether we can support GenBank accessions/
>> display_id/version with your naming scheme.
>
> Chris, I'm a little unsure of what you're saying here (which might mean
> that you're already saying what I'm about to...say). Do you mean it might
> be tricky to support both the Genbank standard and Dmitry's
> simultaneously?
>
> I would argue any arbitrary ID should be supported as long as that ID is a
> contiguous non-space word (\S+).
>
> Actually the existing accession regex looks like it already supports IDs
> with '-':
>
> /^ACCESSION\s+(\S.*\S)/
>
> It's only the version regex which doesn't (\w doesn't include '-'):
>
> /^\w+\.(\d+)/
>
>
> Anyone else have thoughts or comments on this? Off the top of my head, I
> can't think of any issues that might arise from doing so (apart from
> having to modify all of the SeqIO modules to support it).
>
> Dave

You're right; the argument comes down simply to whether we would  
support \S+ or just \w+.  I'm neutral on this myself, but I wonder  
how allowing \S+ would affect other modules (for instance, indexing  
for a flat db), where one might just use \w+ for accessions,  
expecting them to be GenBank- or EMBL-like alphanumerics.  The fact  
that \S+ was supported in the past (as indicated in the bug report)  
and then wasn't post 1.2 makes me think there was a reason for  
someone going in and modifying it, but that was before my time on the  
group.

I'll have a look at the CVS history when I have time to see what I  
can dig up.

chris

From mkiwala at watson.wustl.edu  Thu Feb 22 15:36:33 2007
From: mkiwala at watson.wustl.edu (Michael Kiwala)
Date: Thu, 22 Feb 2007 14:36:33 -0600
Subject: [Bioperl-l] Bio::AnnotatableI vs Bio::FeatureHolderI
Message-ID: <45DDFED1.1090503@watson.wustl.edu>

Are Bio::AnnotatableI and Bio::FeatureHolderI competing interfaces?

I get the impression they are designed to do similar things.  If so, is
one deprecated and the other preferred?

If their responsibilities are orthogonal to each other, what sorts of 
tasks are suited to each?

Thanks,
Michael

From dmessina at wustl.edu  Thu Feb 22 15:53:01 2007
From: dmessina at wustl.edu (Dave Messina)
Date: Thu, 22 Feb 2007 14:53:01 -0600 (CST)
Subject: [Bioperl-l] GenBank accession bug?
Message-ID: <51923.10.0.7.57.1172177581.squirrel@gscmail.wustl.edu>

> The issue at hand is whether we can support GenBank accessions/
> display_id/version with your naming scheme.

Chris, I'm a little unsure of what you're saying here (which might mean
that you're already saying what I'm about to...say). Do you mean it might
be tricky to support both the Genbank standard and Dmitry's
simultaneously?

I would argue any arbitrary ID should be supported as long as that ID is a
contiguous non-space word (\S+).

Actually the existing accession regex looks like it already supports IDs
with '-':

/^ACCESSION\s+(\S.*\S)/

It's only the version regex which doesn't (\w doesn't include '-'):

/^\w+\.(\d+)/
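A quick stand-alone check of the two version patterns under discussion. The IDs are made-up examples, and the \S+ variant is only one possible relaxation, not what the SeqIO modules currently do:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Version parsers for an "accession.version" string. Each returns the
# version number, or undef if the pattern does not match.
sub version_w { my ($id) = @_; return $id =~ /^\w+\.(\d+)/ ? $1 : undef }  # current \w+ style
sub version_s { my ($id) = @_; return $id =~ /^\S+\.(\d+)/ ? $1 : undef }  # relaxed \S+ style

# Hypothetical IDs: a GenBank-like accession and one containing '-'.
for my $id ('AF123456.2', 'ABC-123.1') {
    printf "%-10s  \\w+: %-5s  \\S+: %s\n", $id,
        version_w($id) // 'undef', version_s($id) // 'undef';
}
```

With \S+ the greedy match backtracks to the last '.', so a hyphenated ID keeps its version number; whether that is safe for code (such as flat-db indexing) that expects \w+ accessions is the open question.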


Anyone else have thoughts or comments on this? Off the top of my head, I
can't think of any issues that might arise from doing so (apart from
having to modify all of the SeqIO modules to support it).

Dave


From heikki at sanbi.ac.za  Fri Feb 23 03:25:39 2007
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Fri, 23 Feb 2007 10:25:39 +0200
Subject: [Bioperl-l] Sequence matching problem!
In-Reply-To: <9107936.post@talk.nabble.com>
References: <9107936.post@talk.nabble.com>
Message-ID: <200702231025.39416.heikki@sanbi.ac.za>

Kurt,

There are a few things in your code to note:

- the regexp /C*T/ matches any T preceded by zero or more Cs,
  which is not what you meant
- $- and $+ are among the "expensive" Perl features best
  avoided unless you have to. Using them even once in your
  code slows execution down considerably. There is always
  another way.
- Keep in mind what you want to use the match positions for:
  human-readable locations usually start counting from 1, but
  Perl uses 0 as the first position. The code below assumes
  you want to print the locations out.

Study my example code below.

Yours,
	-Heikki

###################################################################
#!/usr/bin/perl
$seq = "GATCAAT";
#$pattern=  'C*T';
$pattern=  'C.*T';

while ($seq =~ m/($pattern)/gi) {

    $match = $1;
    $end = pos($seq);
    $start = $end - length($match) +1;

    print "$match : $start - $end\n";
}

###################################################################
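A sketch of the wildcard behaviour Kurt describes, assuming '*' should mean "any run of residues" (wildcard_to_regex and find_matches are hypothetical helper names). Positions are reported 0-based, which gives the 2 and 3 he expected, and are computed from pos() as in the code above:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn a user pattern in which '*' means "any run of
# residues" into a regex. Everything else is matched literally (quotemeta),
# so "C*T" becomes C.*T instead of the regex "zero or more Cs, then T".
sub wildcard_to_regex {
    my ($pattern) = @_;
    $pattern =~ s/\s+//g;                          # drop stray spaces, e.g. "C*T "
    my $re = join '.*', map { quotemeta($_) } split /\*/, $pattern, -1;
    return qr/$re/i;
}

# Return [0-based start, matched text] for each non-overlapping match.
sub find_matches {
    my ($seq, $pattern) = @_;
    my $re = wildcard_to_regex($pattern);
    my @hits;
    while ($seq =~ /($re)/g) {
        my $start = pos($seq) - length $1;         # 0-based, as Perl counts
        push @hits, [ $start, $1 ];
    }
    return @hits;
}

for my $pattern ('TC', 'C*T') {
    for my $hit (find_matches('GATCAAT', $pattern)) {
        print "$pattern: start $hit->[0], sequence $hit->[1]\n";
    }
}
```

Note that .* is greedy, so on a longer sequence C*T would run to the last T; joining with '.*?' instead gives the shortest match.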


On Thursday 22 February 2007 22:41:37 Kurt Gobain wrote:
> Hi every1..
> I m facing a great deal of problem in simple pattern matching between
> sequence & a pattern ..Program shod be designed such a way that it shod be
> able do two things 1) normal matching...For eg: GATCAAT....if TC is
> entered... output shod be 2...2) matching using spl character..In same
> example if C*T value is entered It shod give o/p as 3 & seq to b displayed
> is CAAT..I m easily getting 1st part...But in 2nd part Its giving sum
> problem..output I m gettin as 1 instead of 3...Code is really simple!
>
> #!/usr/bin/perl
> $alphabet = "GATCAAT";
> $pattern=  "C*T ";
>
> $alphabet =~ /($pattern)/i;
>
> print "The entire '$pattern' match began at $-[0] and ended at $+[0]\n";
>
> ====================
> OUTPUT!
> The entire C*T match began at 1 and ended at 2
> ====================
>
> but the o/p shod be 3????
> & Is there n e chance I can get seq too..I mean instead of C*T'' i need
> 'CAAT'...????
>
> Well..Its not compulsion to use regex....But I find it quite simple..can
> there be n e other method??
>
> Thanx in advance!
> Kurt!


-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________

From avilella at gmail.com  Fri Feb 23 04:59:49 2007
From: avilella at gmail.com (Albert Vilella)
Date: Fri, 23 Feb 2007 09:59:49 +0000
Subject: [Bioperl-l] Sequence matching problem!
In-Reply-To: <200702231025.39416.heikki@sanbi.ac.za>
References: <9107936.post@talk.nabble.com>
	<200702231025.39416.heikki@sanbi.ac.za>
Message-ID: <358f4d650702230159n3a28fb03k6ecd850fccfed796@mail.gmail.com>

Now that we are on this pattern-matching thread, I was wondering if
any Perl guru could enlighten me on matching exact sequence patterns
against a gapped target sequence. E.g.:

my $seq = "CGATCAACGAATCGTACGTACTC";
my $gapped_seq =
"GGGGGGCG-------A---TC---AACGA-----ATC---GTA---CGTACTCTACTCGGGGG";

and one would like to get as a result:

"CG-------A---TC---AACGA-----ATC---GTA---CGTACTCTACTC"

which is the match of $seq but in $gapped_seq.

Cheers,

    Albert.


On 2/23/07, Heikki Lehvaslaiho <heikki at sanbi.ac.za> wrote:
> Kurt,
>
> There are  few things in your code to note:
>
> - regexp /C*T/ matches any T preceded by zero or more Cs,
>   not what you meant
> - $- and $+ are among the "expensive" perl functions worth
>   not using unless you have to. Using them once in your
>   code slows execution down considerable. There is always
>   an other way.
> - Keep in mind what you want to use the match positions for:
>   Human readable locations usually start counting with 1 but
>   perl code uses 0 as the first location. The code below assumes
>   you want to print the locations out.
>
> Study my example code below.
>
> Yours,
>         -Heikki
>
> ###################################################################
> #!/usr/bin/perl
> $seq = "GATCAAT";
> #$pattern=  'C*T';
> $pattern=  'C.*T';
>
> while ($seq =~ m/($pattern)/gi) {
>
>     $match = $1;
>     $end = pos($seq);
>     $start = $end - length($match) +1;
>
>     print "$match : $start - $end\n";
> }
>
> ###################################################################
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From js5 at sanger.ac.uk  Fri Feb 23 06:34:37 2007
From: js5 at sanger.ac.uk (James Smith)
Date: Fri, 23 Feb 2007 11:34:37 +0000 (GMT)
Subject: [Bioperl-l] Sequence matching problem!
In-Reply-To: <358f4d650702230159n3a28fb03k6ecd850fccfed796@mail.gmail.com>
References: <9107936.post@talk.nabble.com>
	<200702231025.39416.heikki@sanbi.ac.za>
	<358f4d650702230159n3a28fb03k6ecd850fccfed796@mail.gmail.com>
Message-ID: <Pine.OSF.4.58.0702231131230.1583589@cbi1c.internal.sanger.ac.uk>

On Fri, 23 Feb 2007, Albert Vilella wrote:

> now that we are at this pattern matching thread, I was wondering if
> any perl guru could enlighten me on the issue of matching exact
> sequence patterns on a gapped target sequence. E.g.:
>
> my $seq = "CGATCAACGAATCGTACGTACTC";
> my $gapped_seq =
> "GGGGGGCG-------A---TC---AACGA-----ATC---GTA---CGTACTCTACTCGGGGG";
>
> and one would like to get as a result:
>
> "CG-------A---TC---AACGA-----ATC---GTA---CGTACTCTACTC"
>
> which is the match of $seq but in $gapped_seq.

Try...

 my $seq = "CGATCAACGAATCGTACGTACTC";
 my $gapped_seq =
   "GGGGGGCG-------A---TC---AACGA-----ATC---GTA---CGTACTCTACTCGGGGG";

 my $regexp = '('.join('-*?',split//,$seq).')';

 if( $gapped_seq =~ /$regexp/ ) {
   print "Match is $1\n";
 } else {
   print "No match\n";
 }

 (not sure about the efficiency if $seq is long, though)
James

>
> Cheers,

From khoueiry at ibdm.univ-mrs.fr  Fri Feb 23 08:09:33 2007
From: khoueiry at ibdm.univ-mrs.fr (pierre)
Date: Fri, 23 Feb 2007 14:09:33 +0100
Subject: [Bioperl-l] Sequence matching problem!
In-Reply-To: <358f4d650702230159n3a28fb03k6ecd850fccfed796@mail.gmail.com>
References: <9107936.post@talk.nabble.com>
	<200702231025.39416.heikki@sanbi.ac.za>
	<358f4d650702230159n3a28fb03k6ecd850fccfed796@mail.gmail.com>
Message-ID: <1172236173.4309.6.camel@ciona-pierre>

An embedded and charset-unspecified text was scrubbed...
Name: not available
Url: http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070223/0e08ebe6/attachment.ksh 

From neetisomaiya at gmail.com  Fri Feb 23 07:27:28 2007
From: neetisomaiya at gmail.com (neeti somaiya)
Date: Fri, 23 Feb 2007 17:57:28 +0530
Subject: [Bioperl-l] need help urgently - needle output parsing
Message-ID: <764978cf0702230427x5b5acf73y6538527ade3fd453@mail.gmail.com>

Hi,

I am using the needle alignment tool (standalone, on a Linux machine) and
then parsing its output with Bioperl.
All the data (sequence files and alignment outputs) is attached to this mail.

I have 2 small sequences :- 693.seq and revcomp693.seq
I have 2 big sequences :- 80768-4291-5639.84809_84810_84809_1.scf.seq and
80768-4291-5639.84809_84810_84810_1.scf.seq
All these are in fasta format

Now I am doing the following :-
1) Aligning 80768-4291-5639.84809_84810_84809_1.scf.seq and 693.seq - output
file is 80768-4291-5639.84809_84810_84809_1.scf.out
parsing the output gives me the alignment start in 'traceseq' as 97
2) Aligning 80768-4291-5639.84809_84810_84809_1.scf.seq and revcomp693.seq -
output file is 80768-4291-5639.84809_84810_84809_1.scf.comp.out
parsing the output gives me the alignment start in 'traceseq' as 91

All this is correct.

Now I am doing the following :-
1) Aligning 80768-4291-5639.84809_84810_84810_1.scf.seq and 693.seq - output
file is 80768-4291-5639.84809_84810_84810_1.scf.out
parsing the output gives me the alignment start in 'traceseq' as 341 (this
is correct)
2) Aligning 80768-4291-5639.84809_84810_84810_1.scf.seq and revcomp693.seq -
output file is 80768-4291-5639.84809_84810_84810_1.scf.comp.out
parsing the output gives me the alignment start in 'traceseq' as 341 (this
is incorrect, correct position is 330)


Part of my code is as follows :-
---------------------------------------------

# running needle
`$needle_path./needle $trace.seq $snp_position_on_con.seq -gapopen 10.0 -gapextend 0.5 $output`;

# parsing needle output
my $str = Bio::AlignIO->new(-format => 'emboss',-file => $output);
my $aln = $str->next_aln();
my $pos = $aln->column_from_residue_number('original',1);

$logger->info("Alignment pos is $pos");

####################################

# running needle
`$needle_path./needle $trace.seq revcomp$snp_position_on_con.seq -gapopen 10.0 -gapextend 0.5 $comp_output`;

# parsing needle output
my $comp_str = Bio::AlignIO->new(-format => 'emboss',-file => $comp_output);
my $comp_aln = $comp_str->next_aln();
my $comp_pos = $comp_aln->column_from_residue_number('revcomp',1);

$logger->info("Alignment pos is $comp_pos");


Can someone please tell me what is going wrong here?
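One thing worth ruling out, offered as an assumption rather than a diagnosis of the attached files: column_from_residue_number returns an alignment column, which equals a position in the trace sequence only if the trace row has no gaps before that column. If the reverse-complement alignment opens a gap in the trace upstream of the match, the two numbers diverge. Bio::LocatableSeq provides location_from_column for this conversion; the plain-Perl sketch below (residue_at_column is a hypothetical helper, and the rows are made up) only illustrates the arithmetic:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: for one row of a pairwise alignment (gaps written as
# '-' or '.'), return the 1-based residue number sitting in a 1-based
# alignment column, or undef if that column is a gap in this row.
sub residue_at_column {
    my ($row, $column) = @_;
    return if $column < 1 || $column > length $row;
    return if substr($row, $column - 1, 1) =~ /[-.]/;
    (my $prefix = substr($row, 0, $column)) =~ s/[-.]//g;
    return length $prefix;
}

# Made-up rows: the trace row has a two-column gap before the query begins.
my $trace_row = 'ACG--TACGTT';
my $query_row = '-----TACG--';

# The query's first residue sits in column 6 (what a column lookup reports),
# but that column holds residue 4 of the trace, not residue 6.
my ($lead) = $query_row =~ /^([-.]*)/;
my $col = length($lead) + 1;
print "column $col -> trace residue ", residue_at_column($trace_row, $col), "\n";
```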


-- 
-Neeti
Even my blood says, B positive
-------------- next part --------------
A non-text attachment was scrubbed...
Name: data.zip
Type: application/zip
Size: 4456 bytes
Desc: not available
Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070223/21658b7d/attachment-0001.zip 

From bix at sendu.me.uk  Fri Feb 23 08:55:24 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 23 Feb 2007 13:55:24 +0000
Subject: [Bioperl-l] Sequence matching problem!
In-Reply-To: <Pine.OSF.4.58.0702231131230.1583589@cbi1c.internal.sanger.ac.uk>
References: <9107936.post@talk.nabble.com>	<200702231025.39416.heikki@sanbi.ac.za>	<358f4d650702230159n3a28fb03k6ecd850fccfed796@mail.gmail.com>
	<Pine.OSF.4.58.0702231131230.1583589@cbi1c.internal.sanger.ac.uk>
Message-ID: <45DEF24C.1010303@sendu.me.uk>

James Smith wrote:
> On Fri, 23 Feb 2007, Albert Vilella wrote:
> 
>> now that we are at this pattern matching thread, I was wondering if
>> any perl guru could enlighten me on the issue of matching exact
>> sequence patterns on a gapped target sequence. E.g.:
>>
>> my $seq = "CGATCAACGAATCGTACGTACTC";
>> my $gapped_seq =
>> "GGGGGGCG-------A---TC---AACGA-----ATC---GTA---CGTACTCTACTCGGGGG";
>>
>> and one would like to get as a result:
>>
>> "CG-------A---TC---AACGA-----ATC---GTA---CGTACTCTACTC"
>>
>> which is the match of $seq but in $gapped_seq.
> 
> Try...
> 
>  my $seq = "CGATCAACGAATCGTACGTACTC";
>  my $gapped_seq =
>    "GGGGGGCG-------A---TC---AACGA-----ATC---GTA---CGTACTCTACTCGGGGG";
> 
>  my $regexp = '('.join('-*?',split//,$seq).')';
> 
>  if( $gapped_seq =~ /$regexp/ ) {
>    print "Match is $1\n";
>  } else {
>    print "No match\n";
>  }

That's great stuff. If you were matching thousands of different $seq 
against the same very large $gapped_seq, and only needed the first match 
of $seq in $gapped_seq, the alternative to the above approach (remove 
the gaps from $gapped_seq and do index() matching) will be faster.

Here's one (overly long-winded) way of implementing it, that I found to 
take ~2s vs ~22s for the above regex approach when doing the job on 
999999 copies of $seq:

#!/usr/bin/perl -w
use strict;
use warnings;

my $gapped_seq = 
"GGGGGGCG-------A---TC---AACGA-----ATC---GTA---CGTACTCTACTCGGGGG";

# note the total gap-length at position in gapless 0-based coords
my @gap_lengths;
my $gap_length = 0;
while ($gapped_seq =~ /(-+)/g) {
   my $match = $1;
   my $prev_length = $gap_length;
   $gap_length += length($match);
   my $end = pos($gapped_seq) - $gap_length - 1;
   push(@gap_lengths, $prev_length) for (1..$end-$#gap_lengths);
}
push(@gap_lengths, $gap_length) for (1..(length($gapped_seq) - 
@gap_lengths - $gap_length));

# remove the gaps
my $gapless_seq = $gapped_seq;
$gapless_seq =~ s/-//g;

# now for each of thousands of seqs...
my $seq = 'CGATCAACGAATCGTACGTACTC';
my @seqs;
for (1..999999) {
   push(@seqs, $seq);
}
foreach my $seq (@seqs) {
   my $start = index($gapless_seq, $seq);
   if ($start == -1) {
     print "No match found for seq '$seq'\n";
     next;
   }
   my $end = $start + length($seq) - 1;

   # calculate the coords in $gapped_seq
   $start = $start + $gap_lengths[$start];
   $end = $end + $gap_lengths[$end];

   my $result = substr($gapped_seq, $start, ($end - $start + 1));
   #print $result, "\n";
}

exit;
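For comparison, a more compact sketch of the same strip-gaps-then-index() idea. Instead of cumulative gap lengths it stores, for every ungapped residue, its column in the gapped sequence, so the coordinate mapping is a plain array lookup. It has not been benchmarked against the timings above, and build_index and gapped_match are illustrative names:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Strip gaps ('-') once, remembering for every residue of the ungapped
# sequence the 0-based column it occupies in the gapped sequence.
sub build_index {
    my ($gapped) = @_;
    my $gapless = '';
    my @col_of;
    for my $i (0 .. length($gapped) - 1) {
        my $c = substr($gapped, $i, 1);
        next if $c eq '-';
        $gapless .= $c;
        push @col_of, $i;
    }
    return ($gapless, \@col_of);
}

# Return the gapped substring spanning the first exact match of $query,
# or undef if $query does not occur in the ungapped sequence.
sub gapped_match {
    my ($gapped, $gapless, $col_of, $query) = @_;
    my $start = index($gapless, $query);
    return if $start < 0;
    my $end  = $start + length($query) - 1;
    my $from = $col_of->[$start];
    my $to   = $col_of->[$end];
    return substr($gapped, $from, $to - $from + 1);
}

my $gapped = 'GGGGGGCG-------A---TC---AACGA-----ATC---GTA---CGTACTCTACTCGGGGG';
my ($gapless, $col_of) = build_index($gapped);   # done once
for my $query ('CGATCAACGAATCGTACGTACTC') {       # then once per query
    my $hit = gapped_match($gapped, $gapless, $col_of, $query);
    print defined $hit ? "$hit\n" : "No match for '$query'\n";
}
```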


From MEC at stowers-institute.org  Fri Feb 23 10:54:57 2007
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Fri, 23 Feb 2007 09:54:57 -0600
Subject: [Bioperl-l] Bio::DB::SeqFeature to GFF mishandles attributes with
	multiple values
In-Reply-To: <6dce9a0b0701301446w7fc31d6eufe27442fecd0f20e@mail.gmail.com>
Message-ID: <CED81D34E37D5043A1211565277A51E50768F501@exchkc02.stowers-institute.org>

Lincoln, and other Bio::DB::SeqFeature wanderers:

I find that generating GFF from a Bio::DB::SeqFeature using gff3_string
does not respect the following:
 
"Multiple attributes of the same type are indicated by separating the
values with the comma "," character"  (c.f.
http://www.sequenceontology.org/gff3.shtml)
 
This one-liner demonstrates the problem:
 
perl -MBio::DB::SeqFeature -e 'print Bio::DB::SeqFeature->new(-seq_id =>
"J", -start => 1, -end => 2, -primary_tag => 'PH', -source => 'A',
-name => 'mec', -attributes => {foo =>  [qw(bar blat)]})->gff3_string'
J	A	PH	1	2	.	.	.	foo=bar;foo=blat;Name=mec

Do you agree this is a problem? 
 
The fix is in the post-sig patch to
/Bio/DB/SeqFeature/NormalizedFeature.pm, in which I also took the
stylistic privilege of promoting any ID, Parent, or Name attribute to
the front of column 9, so output is now:

J	A	PH	1	2	.	.	.	Name=mec;foo=bar,blat

Do you agree this is better?

I am poised to commit it, as well as the functionally same patch to the
equivalent function in Bio/Graphics/FeatureBase.pm

All clear?

-- Malcolm Cook

  
*** NormalizedFeature.pm 2 Feb 2007 21:05:42 -0000 1.25
--- NormalizedFeature.pm 23 Feb 2007 15:37:01 -0000
***************
*** 481,494 ****
      next if $t eq 'load_id';
      next if $t eq 'parent_id';
      foreach (@values) { s/\s+$// } # get rid of trailing whitespace
! 
!     push @result,join '=',$self->escape($t),$self->escape($_) foreach @values;
    }
    my $id   = $self->primary_id;
    my $name = $self->display_name;
!   push @result,"ID=".$self->escape($id)                     if defined $id;
!   push @result,"Parent=".$self->escape($parent->primary_id) if defined $parent;
!   push @result,"Name=".$self->escape($name)                   if defined $name;
    return join ';',@result;
  }
  
--- 481,498 ----
      next if $t eq 'load_id';
      next if $t eq 'parent_id';
      foreach (@values) { s/\s+$// } # get rid of trailing whitespace
!     
!      push @result,join '=',$self->escape($t),$self->escape($_) foreach @values; 
!     # NO! Multiple attributes of the same type are indicated by
!     # separating the values with the comma "," character - per
!     # http://www.sequenceontology.org/gff3.shtml.  Do it this way:
!     #push @result,join '=',$self->escape($t),join(',', map {$self->escape($_)} @values);
    }
    my $id   = $self->primary_id;
    my $name = $self->display_name;
!   unshift @result,"ID=".$self->escape($id)                     if defined $id;
!   unshift @result,"Parent=".$self->escape($parent->primary_id) if defined $parent;
!   unshift @result,"Name=".$self->escape($name)                   if defined $name;
    return join ';',@result;
  }
 

From MEC at stowers-institute.org  Fri Feb 23 12:08:11 2007
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Fri, 23 Feb 2007 11:08:11 -0600
Subject: [Bioperl-l] Bio::DB::SeqFeature to GFF mishandles attributes
	withmultiple values
In-Reply-To: <CED81D34E37D5043A1211565277A51E50768F501@exchkc02.stowers-institute.org>
Message-ID: <CED81D34E37D5043A1211565277A51E50768F509@exchkc02.stowers-institute.org>

Oy,

I hit send too soon.  The patch I sent had my new attribute encoder
commented out.  It should've been: 


*** NormalizedFeature.pm	2 Feb 2007 21:05:42 -0000	1.25
--- NormalizedFeature.pm	23 Feb 2007 17:06:37 -0000
***************
*** 481,494 ****
      next if $t eq 'load_id';
      next if $t eq 'parent_id';
      foreach (@values) { s/\s+$// } # get rid of trailing whitespace
! 
!     push @result,join '=',$self->escape($t),$self->escape($_) foreach @values;
    }
    my $id   = $self->primary_id;
    my $name = $self->display_name;
!   push @result,"ID=".$self->escape($id)                     if defined $id;
!   push @result,"Parent=".$self->escape($parent->primary_id) if defined $parent;
!   push @result,"Name=".$self->escape($name)                   if defined $name;
    return join ';',@result;
  }
  
--- 481,497 ----
      next if $t eq 'load_id';
      next if $t eq 'parent_id';
      foreach (@values) { s/\s+$// } # get rid of trailing whitespace
!     # push @result,join '=',$self->escape($t),$self->escape($_) foreach @values; 
!     # NO! Multiple attributes of the same type are indicated by
!     # separating the values with the comma "," character - per
!     # http://www.sequenceontology.org/gff3.shtml.  Do it this way:
!     push @result,join '=',$self->escape($t),join(',', map {$self->escape($_)} @values);
    }
    my $id   = $self->primary_id;
    my $name = $self->display_name;
!   unshift @result,"ID=".$self->escape($id)                     if defined $id;
!   unshift @result,"Parent=".$self->escape($parent->primary_id) if defined $parent;
!   unshift @result,"Name=".$self->escape($name)                   if defined $name;
    return join ';',@result;
  }
  

Malcolm
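The encoding rule the corrected patch implements can be sketched independently in plain Perl (gff3_escape and format_attributes are hypothetical names, not the NormalizedFeature.pm code): one tag=value pair per tag, multiple values joined with ',', and reserved characters percent-encoded. This sketch puts ID, Parent and Name first in that order, whereas the patch's successive unshift calls leave Name first; the ordering here is only a stylistic choice:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Percent-encode characters reserved in GFF3 column 9
# (tab, newline, carriage return, '%', ';', '=', '&', ',') and other controls.
sub gff3_escape {
    my ($s) = @_;
    $s =~ s/([\t\n\r%;=&,[:cntrl:]])/sprintf('%%%02X', ord $1)/ge;
    return $s;
}

# Format tag => value or tag => [values] as one column-9 string: each tag
# appears once, multiple values joined by ','. ID, Parent, Name come first.
sub format_attributes {
    my (%attr) = @_;
    my @order = grep { exists $attr{$_} } qw(ID Parent Name);
    my %seen  = map { $_ => 1 } @order;
    push @order, sort { $a cmp $b } grep { !$seen{$_} } keys %attr;
    return join ';', map {
        my $tag = $_;
        my @v   = ref $attr{$tag} ? @{ $attr{$tag} } : ($attr{$tag});
        gff3_escape($tag) . '=' . join(',', map { gff3_escape($_) } @v);
    } @order;
}

print format_attributes(foo => [qw(bar blat)], Name => 'mec'), "\n";
```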


From lstein at cshl.edu  Fri Feb 23 12:16:01 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Fri, 23 Feb 2007 12:16:01 -0500
Subject: [Bioperl-l] Bio::DB::SeqFeature to GFF mishandles attributes
	with multiple values
In-Reply-To: <CED81D34E37D5043A1211565277A51E50768F501@exchkc02.stowers-institute.org>
References: <6dce9a0b0701301446w7fc31d6eufe27442fecd0f20e@mail.gmail.com>
	<CED81D34E37D5043A1211565277A51E50768F501@exchkc02.stowers-institute.org>
Message-ID: <6dce9a0b0702230916o684e490cu9d01d3be3e2ae061@mail.gmail.com>

Hi Malcolm,

You're quite right, and I appreciate your work in tracking down and fixing
it. Before you commit the patch, can you confirm that the loader is working
correctly so that comma-separated values are read back into the data
structure as multiple attributes?

Lincoln



-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu

From aaron.j.mackey at gsk.com  Fri Feb 23 09:36:18 2007
From: aaron.j.mackey at gsk.com (aaron.j.mackey at gsk.com)
Date: Fri, 23 Feb 2007 09:36:18 -0500
Subject: [Bioperl-l] Bio::AnnotatableI vs Bio::FeatureHolderI
In-Reply-To: <45DDFED1.1090503@watson.wustl.edu>
Message-ID: <OFC6F4AA9E.AE02C572-ON8525728B.004FF987-8525728B.00503B29@gsk.com>

The fundamental difference (in my mind) between a feature and an 
annotation is that a feature has a location/range, and thus the 
information represented in the feature is applicable only to that 
location/range.  An annotation, on the other hand, is "global", or at 
least non-localizable (note: a feature with a "fuzzy" location of 
"somewhere along this sequence, but I'm not sure where" is still not 
global - if you did/could know the location, you'd describe it as a 
feature, so it shouldn't be represented with an annotation).

-Aaron

bioperl-l-bounces at lists.open-bio.org wrote on 02/22/2007 03:36:33 PM:

> Are Bio::AnnotatableI and Bio::FeatureHolderI competing interfaces?
> 
> I get the impression they are designed to do similar things.  If so is 
> one deprecated and the other preferred?
> 
> If their responsibilities are orthogonal to each other, what sorts of 
> tasks are suited to each?
> 
> Thanks,
> Michael
> 


From MEC at stowers-institute.org  Fri Feb 23 13:46:00 2007
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Fri, 23 Feb 2007 12:46:00 -0600
Subject: [Bioperl-l] Bio::DB::SeqFeature to GFF mishandles attributes
	with multiple values
In-Reply-To: <6dce9a0b0702230916o684e490cu9d01d3be3e2ae061@mail.gmail.com>
Message-ID: <CED81D34E37D5043A1211565277A51E50768F50F@exchkc02.stowers-institute.org>

Lincoln,
 
OK.  I'll do that...
 
...let's see, a quick squiz at Bio/DB/SeqFeature/Store/ .... 
 
...ok - parse_attributes _looks_ right to me
 
...so, let's try it
 
#load a feature into a new database:
 
bp_seqfeature_load.PLS -dsn 'dbi:mysql:database=test;host=mysql-dev' \
  -create -user test -pass test \
  <(echo -e "J\tA\tPH\t1\t2\t.\t.\t.\tfoo=bar,blat;Name=mec\n")
 
#It loaded ok.  Now, let's print it out in GFF3:
 
perl -MBio::DB::SeqFeature::Store -e 'foreach
(Bio::DB::SeqFeature::Store->new(-dsn =>
"dbi:mysql:database=test;host=mysql-dev;user=test;password=test")->features(-type => "PH:A")) {print $_->gff3_string . "\n"}'
J A PH 1 2 . . . Name=mec;ID=1;foo=bar,blat

#output looks good to me

Note, I tried loading attributes foo=bar;foo=blat and it came back
foo=bar,blat.  So, you can load either way.
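Lincoln's concern is that both encodings read back to the same structure. Independently of Bio::DB::SeqFeature::Store's own parser, a minimal plain-Perl sketch of that reading (parse_col9 and gff3_unescape are hypothetical names) shows both inputs collapsing to the same tag => [values] hash:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Decode %XX escapes back to characters.
sub gff3_unescape {
    my ($s) = @_;
    $s =~ s/%([0-9A-Fa-f]{2})/chr hex $1/ge;
    return $s;
}

# Parse a column-9 string into { tag => [values] }. Accepts both the
# comma-separated form (foo=bar,blat) and repeated tags (foo=bar;foo=blat).
# Splitting happens before unescaping, so an encoded comma (%2C) stays inside
# its value.
sub parse_col9 {
    my ($col9) = @_;
    my %attr;
    for my $pair (split /;/, $col9) {
        next unless length $pair;
        my ($tag, $value) = split /=/, $pair, 2;
        $value = '' unless defined $value;
        push @{ $attr{ gff3_unescape($tag) } },
            map { gff3_unescape($_) } split /,/, $value;
    }
    return \%attr;
}

for my $col9 ('foo=bar,blat;Name=mec', 'foo=bar;foo=blat;Name=mec') {
    my $attr = parse_col9($col9);
    print join(';', map { "$_=" . join(',', @{ $attr->{$_} }) } sort keys %$attr), "\n";
}
```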

I'll commit later today.

--Malcolm  

 
________________________________

	From: lincoln.stein at gmail.com [mailto:lincoln.stein at gmail.com] On Behalf Of Lincoln Stein
	Sent: Friday, February 23, 2007 11:16 AM
	To: Cook, Malcolm
	Cc: bioperl list; lstein at cshl.org
	Subject: Re: Bio::DB::SeqFeature to GFF mishandles attributes with multiple values
	
	
	Hi Malcolm,
	
	You're quite right, and I appreciate your work in tracking down
	and fixing it. Before you commit the patch, can you confirm that the
	loader is working correctly so that comma-separated values are read back
	into the data structure as multiple attributes?
	
	Lincoln
	
	
	On 2/23/07, Cook, Malcolm <MEC at stowers-institute.org> wrote: 

		Lincoln, and other Bio::DB::SeqFeature wanderers:
		
		I find that generating GFF from a Bio::DB::SeqFeature using gff3_string
		does not respect the following:
		
		"Multiple attributes of the same type are indicated by separating the
		values with the comma "," character"  (c.f.
		http://www.sequenceontology.org/gff3.shtml)
		
		This one-liner demonstrates the problem:
		
		perl -MBio::DB::SeqFeature -e 'print Bio::DB::SeqFeature->new(-seq_id =>
		"J", -start => 1, -end => 2, -primary_tag => 'PH', -source => 'A',
		-name => 'mec', -attributes => {foo =>  [qw(bar blat)]})->gff3_string'
		J       A       PH      1       2       .       .       .       foo=bar;foo=blat;Name=mec
		
		Do you agree this is a problem?
		
		The fix is in the post-sig patch to
		/Bio/DB/SeqFeature/NormalizedFeature.pm, in which I also took the
		stylistic privilege of promoting any ID, Parent, or Name attribute to
		the front of column 9, so output is now:
		
		J       A       PH      1       2       .       .       .       Name=mec;foo=bar,blat
		
		Do you agree this is better? 
		
		I am poised to commit it, as well as the functionally same patch to the
		equivalent function in Bio/Graphics/FeatureBase.pm
		
		All clear?
		
		-- Malcolm Cook
		
		
		*** NormalizedFeature.pm 2 Feb 2007 21:05:42 -0000 1.25
		--- NormalizedFeature.pm 23 Feb 2007 15:37:01 -0000
		***************
		*** 481,494 ****
		      next if $t eq 'load_id';
		      next if $t eq 'parent_id';
		      foreach (@values) { s/\s+$// } # get rid of trailing whitespace
		!
		!     push @result,join '=',$self->escape($t),$self->escape($_) foreach @values;
		    }
		    my $id   = $self->primary_id;
		    my $name = $self->display_name;
		!   push @result,"ID=".$self->escape($id)                     if defined $id;
		!   push @result,"Parent=".$self->escape($parent->primary_id) if defined $parent;
		!   push @result,"Name=".$self->escape($name)                   if defined $name;
		    return join ';',@result;
		  }
		
		--- 481,498 ----
		      next if $t eq 'load_id';
		      next if $t eq 'parent_id';
		      foreach (@values) { s/\s+$// } # get rid of trailing whitespace
		!
		!      push @result,join '=',$self->escape($t),$self->escape($_) foreach @values;
		!     # NO! Multiple attributes of the same type are indicated by
		!     # separating the values with the comma "," character - per
		!     # http://www.sequenceontology.org/gff3.shtml.  Do it this way:
		!     #push @result,join '=',$self->escape($t),join(',', map {$self->escape($_)} @values);
		    }
		    my $id   = $self->primary_id;
		    my $name = $self->display_name;
		!   unshift @result,"ID=".$self->escape($id)                     if defined $id;
		!   unshift @result,"Parent=".$self->escape($parent->primary_id) if defined $parent;
		!   unshift @result,"Name=".$self->escape($name)                   if defined $name;
		    return join ';',@result;
		  }
		
		
	-- 
	Lincoln D. Stein
	Cold Spring Harbor Laboratory
	1 Bungtown Road
	Cold Spring Harbor, NY 11724
	(516) 367-8380 (voice)
	(516) 367-8389 (fax)
	FOR URGENT MESSAGES & SCHEDULING, 
	PLEASE CONTACT MY ASSISTANT, 
	SANDRA MICHELSEN, AT michelse at cshl.edu 


From cjfields at uiuc.edu  Fri Feb 23 13:49:44 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 23 Feb 2007 12:49:44 -0600
Subject: [Bioperl-l] Bio::AnnotatableI vs Bio::FeatureHolderI
In-Reply-To: <OFC6F4AA9E.AE02C572-ON8525728B.004FF987-8525728B.00503B29@gsk.com>
References: <OFC6F4AA9E.AE02C572-ON8525728B.004FF987-8525728B.00503B29@gsk.com>
Message-ID: <FEDC420E-AE3A-4AD4-A30B-54F8DF904D84@uiuc.edu>

To add to that, there's a HOWTO describing the differences:

http://www.bioperl.org/wiki/HOWTO:Feature-Annotation

I agree w/ Aaron; if it has a location it's a feature,  otherwise  
it's an annotation.
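To make the distinction concrete, here is a small sketch that assumes a BioPerl 1.5-era installation. The located note goes on a Bio::SeqFeature::Generic attached with add_SeqFeature, which is the FeatureHolderI side. The note about the whole sequence goes into the Bio::Annotation::Collection returned by annotation(), which is the AnnotatableI side. The sequence, coordinates and text are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::Seq;
use Bio::SeqFeature::Generic;
use Bio::Annotation::Comment;

my $seq = Bio::Seq->new(-seq => 'ATGGCCATTGTAATGGGCCGC', -display_id => 'demo');

# Located information -> a feature (Bio::FeatureHolderI side).
my $feat = Bio::SeqFeature::Generic->new(
    -start       => 4,
    -end         => 12,
    -strand      => 1,
    -primary_tag => 'misc_feature',
    -tag         => { note => 'only meaningful for bases 4..12' },
);
$seq->add_SeqFeature($feat);

# Global, non-localizable information -> an annotation (Bio::AnnotatableI side).
my $comment = Bio::Annotation::Comment->new(-text => 'describes the whole sequence');
$seq->annotation->add_Annotation('comment', $comment);

for my $f ($seq->get_SeqFeatures) {
    printf "feature %s at %d..%d\n", $f->primary_tag, $f->start, $f->end;
}
for my $c ($seq->annotation->get_Annotations('comment')) {
    print 'annotation: ', $c->text, "\n";
}
```

If the note's location later becomes known, Aaron's rule says it should move from the annotation collection to a feature.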

chris

On Feb 23, 2007, at 8:36 AM, aaron.j.mackey at gsk.com wrote:

> The fundamental difference (in my mind) between a feature and an
> annotation, is that a feature has a location/range, and thus the
> information represented in the feature is applicable only to that
> location/range.  An annotation, on the other hand, is "global", or at
> least non-localizable (note: a feature with a "fuzzy" location of
> "somewhere along this sequence, but I'm not sure where" is still not
> global - if you did/could know the location, you'd describe it as a
> feature, so it shouldn't be represented with an annotation).
>
> -Aaron
>
> bioperl-l-bounces at lists.open-bio.org wrote on 02/22/2007 03:36:33 PM:
>
>> Are Bio::AnnotatableI and Bio::FeatureHolderI competing interfaces?
>>
>> I get the impression they are designed to do similar things.  If  
>> so is
>> one deprecated and the other preferred?
>>
>> If their responsibilities are orthogonal to each other, what sorts of
>> tasks are suited to each?
>>
>> Thanks,
>> Michael

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From lstein at cshl.edu  Fri Feb 23 16:20:26 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Fri, 23 Feb 2007 16:20:26 -0500
Subject: [Bioperl-l] Bio::DB::SeqFeature to GFF mishandles attributes
	with multiple values
In-Reply-To: <CED81D34E37D5043A1211565277A51E50768F50F@exchkc02.stowers-institute.org>
References: <6dce9a0b0702230916o684e490cu9d01d3be3e2ae061@mail.gmail.com>
	<CED81D34E37D5043A1211565277A51E50768F50F@exchkc02.stowers-institute.org>
Message-ID: <6dce9a0b0702231320j1f24d4b4oe33bce6d2da96db7@mail.gmail.com>

Excellent!

Lincoln

On 2/23/07, Cook, Malcolm <MEC at stowers-institute.org> wrote:
>
>  Lincoln,
>
> OK.  I'll do that...
>
> ...let's see, a quick squiz at Bio/DB/SeqFeature/Store/ ....
>
> ...ok - parse_attributes _looks_ right to me
>
> ...so, let's try it
>
> #load a feature into a new database:
>
> bp_seqfeature_load.PLS -dsn 'dbi:mysql:database=test;host=mysql-dev' \
>   -create -user test -pass test \
>   <(echo -e "J\tA\tPH\t1\t2\t.\t.\t.\tfoo=bar,blat;Name=mec\n")
>
> #It loaded ok.  Now, let's print it out in GFF3:
>
> perl -MBio::DB::SeqFeature::Store -e 'foreach
> (Bio::DB::SeqFeature::Store->new(-dsn =>
> "dbi:mysql:database=test;host=mysql-dev;user=test;password=test")->features(-type
> => "PH:A")) {print $_->gff3_string . "\n"}'
> J A PH 1 2 . . . Name=mec;ID=1;foo=bar,blat
>
> #output looks good to me
>
> Note, I tried loading attributes foo=bar;foo=blat and it came back
> foo=bar,blat.  So, you can load either way.
>
> I'll commit later today.
>
> --Malcolm
>
>
>  ------------------------------
> *From:* lincoln.stein at gmail.com [mailto:lincoln.stein at gmail.com] *On Behalf Of *Lincoln Stein
> *Sent:* Friday, February 23, 2007 11:16 AM
> *To:* Cook, Malcolm
> *Cc:* bioperl list; lstein at cshl.org
> *Subject:* Re: Bio::DB::SeqFeature to GFF mishandles attributes with multiple values
>
> Hi Malcolm,
>
> You're quite right, and I appreciate your work in tracking down and fixing
> it. Before you commit the patch, can you confirm that the loader is working
> correctly so that comma-separated values are read back into the data
> structure as multiple attributes?
>
> Lincoln
>
> On 2/23/07, Cook, Malcolm <MEC at stowers-institute.org> wrote:
> >
> > Lincoln, and other Bio::DB::SeqFeature wanderers:
> >
> > I find that generating GFF from a Bio::DB::SeqFeature using gff3_string
> > does not respect the following:
> >
> > "Multiple attributes of the same type are indicated by separating the
> > values with the comma "," character"  (c.f.
> > http://www.sequenceontology.org/gff3.shtml)
> >
> > This one-liner demonstrates the problem:
> >
> > perl -MBio::DB::SeqFeature -e 'print Bio::DB::SeqFeature->new(-seq_id =>
> > "J", -start => 1, -end => 2, -primary_tag => 'PH', -source => 'A',
> > -name => 'mec', -attributes => {foo =>  [qw(bar blat)]})->gff3_string'
> > J       A       PH      1       2       .       .       .       foo=bar;foo=blat;Name=mec
> >
> > Do you agree this is a problem?
> >
> > The fix is in the post-sig patch to
> > /Bio/DB/SeqFeature/NormalizedFeature.pm, in which I also took the
> > stylistic privilege of promoting any ID, Parent, or Name attribute to
> > the front of column 9, so output is now:
> >
> > J       A       PH      1       2       .       .       .       Name=mec;foo=bar,blat
> >
> > Do you agree this is better?
> >
> > I am poised to commit it, as well as the functionally same patch to the
> > equivalent function in Bio/Graphics/FeatureBase.pm
> >
> > All clear?
> >
> > -- Malcolm Cook
> >
> >
> >
> > *** NormalizedFeature.pm 2 Feb 2007 21:05:42 -0000 1.25
> > --- NormalizedFeature.pm 23 Feb 2007 15:37:01 -0000
> > ***************
> > *** 481,494 ****
> >       next if $t eq 'load_id';
> >       next if $t eq 'parent_id';
> >       foreach (@values) { s/\s+$// } # get rid of trailing whitespace
> > !
> > !     push @result,join '=',$self->escape($t),$self->escape($_) foreach @values;
> >     }
> >     my $id   = $self->primary_id;
> >     my $name = $self->display_name;
> > !   push @result,"ID=".$self->escape($id)                     if defined $id;
> > !   push @result,"Parent=".$self->escape($parent->primary_id) if defined $parent;
> > !   push @result,"Name=".$self->escape($name)                   if defined $name;
> >     return join ';',@result;
> >   }
> >
> > --- 481,498 ----
> >       next if $t eq 'load_id';
> >       next if $t eq 'parent_id';
> >       foreach (@values) { s/\s+$// } # get rid of trailing whitespace
> > !
> > !      push @result,join '=',$self->escape($t),$self->escape($_) foreach @values;
> > !     # NO! Multiple attributes of the same type are indicated by
> > !     # separating the values with the comma "," character - per
> > !     # http://www.sequenceontology.org/gff3.shtml.  Do it this way:
> > !     #push @result,join '=',$self->escape($t),join(',', map {$self->escape($_)} @values);
> >     }
> >     my $id   = $self->primary_id;
> >     my $name = $self->display_name;
> > !   unshift @result,"ID=".$self->escape($id)                     if defined $id;
> > !   unshift @result,"Parent=".$self->escape($parent->primary_id) if defined $parent;
> > !   unshift @result,"Name=".$self->escape($name)                   if defined $name;
> >     return join ';',@result;
> >   }
> >
> >
> >
> >
>
>
> --
> Lincoln D. Stein
> Cold Spring Harbor Laboratory
> 1 Bungtown Road
> Cold Spring Harbor, NY 11724
> (516) 367-8380 (voice)
> (516) 367-8389 (fax)
> FOR URGENT MESSAGES & SCHEDULING,
> PLEASE CONTACT MY ASSISTANT,
> SANDRA MICHELSEN, AT michelse at cshl.edu
>
>


-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu

From enrique_rulz at yahoo.com  Sat Feb 24 16:23:59 2007
From: enrique_rulz at yahoo.com (Kurt Gobain)
Date: Sat, 24 Feb 2007 13:23:59 -0800 (PST)
Subject: [Bioperl-l] Sequence matching problem!
In-Reply-To: <200702231025.39416.heikki@sanbi.ac.za>
References: <9107936.post@talk.nabble.com>
	<200702231025.39416.heikki@sanbi.ac.za>
Message-ID: <9137941.post@talk.nabble.com>


Heikki Lehvaslaiho wrote:
> 
> Kurt,
> 
> There are a few things in your code to note:
> 
> - regexp /C*T/ matches any T preceded by zero or more Cs,
>   not what you meant
> - $- and $+ are among the "expensive" perl functions, best
>   avoided unless you have to. Using them once in your
>   code slows execution down considerably. There is always
>   another way.
> - Keep in mind what you want to use the match positions for: 
>   Human readable locations usually start counting with 1 but
>   perl code uses 0 as the first location. The code below assumes
>   you want to print the locations out.
> 
> Study my example code below.
> 
> Yours,
> 	-Heikki
> 
> ###################################################################
> #!/usr/bin/perl
> $seq = "GATCAAT";
> #$pattern=  'C*T';
> $pattern=  'C.*T';
> 
> while ($seq =~ m/($pattern)/gi) {
> 
>     $match = $1;
>     $end = pos($seq);
>     $start = $end - length($match) +1;
> 
>     print "$match : $start - $end\n";
> }
> 
> ###################################################################
> 
> 


Thanx for the instant reply!...Sorry cudn reply earlier..

Code works perfectly fine...but...sum time its not givin reqd o/p..For eg.
If I type sequence as "GATCAAGTCAGGAT" & pattern to be matched as T.*A..then
o/p which I am getting frm above prog is TCAAGTCAGGA instead of TCA...
& 1 more thing Is there n e chance by which I can replace T*A to T.*A cos
the code which I need to write says T*A shod be only the input not T.*A..So
Can we use replacment reg ex...sumthing like 
$pattern =~  s/.*/*/...or sumthing else...
But its kinda givin sum error again...Dam! Regex is really hairy!!...:P

N e ways thanx a lot again for the code...Hope to listen frm you soon!

Kurt!




From biology0046 at hotmail.com  Sat Feb 24 23:14:51 2007
From: biology0046 at hotmail.com (=?gb2312?B?va0gzsTi/Q==?=)
Date: Sun, 25 Feb 2007 04:14:51 +0000
Subject: [Bioperl-l] how to change align output format
Message-ID: <BAY109-F2409DB6CAA116F289F8F17B48C0@phx.gbl>

Dear all:

I have problems changing the output format of a clustal alignment.
I use the Bio::Tools::Run::Alignment::Clustalw module to carry out a
multiple sequence alignment, then I use the Bio::AlignIO module to write
out the alignment. The script looks like this:
my $aln_out=Bio::AlignIO->new(-file=>">./clustal/${outfilename}.aln",-format=>'clustalw');

The output :
dana_GLEANR_16071      MSKMKMLPVQLSLNQLNPGIWSDVLWRCPPAPSSQLAELKTQLPPSLPSDPRLWSREDVL
dere_GLEANR_9270       ..............S.............................................
FBgn0000097            ..............S.............................................
dsec_GLEANR_671        ..............S.............................................
dsim_GLEANR_6613       ..............S.............................................
dyak_GLEANR_1669       ..............S.............................................
                                     .


dana_GLEANR_16071      VFLRFCVREFDLPKLDFDLFQMNGKALCLLTRADFGHRCPGAGDVLHNVLQMLIIESHMM
dere_GLEANR_9270       ............................................................
FBgn0000097            ............................................................
dsec_GLEANR_671        ............................................................
dsim_GLEANR_6613       ............................................................
dyak_GLEANR_1669       ............................................................

But I want to change the output format to the one below, which does not
change the identical residues into "." characters.
dere_GLEANR_9270       MSKMKMLPVQLSLNSLNPGIWSDVLWRCPPAPSSQLAELKTQLPPSLPSDPRLWSREDVL
dyak_GLEANR_1669       MSKMKMLPVQLSLNSLNPGIWSDVLWRCPPAPSSQLAELKTQLPPSLPSDPRLWSREDVL
dsec_GLEANR_671        MSKMKMLPVQLSLNSLNPGIWSDVLWRCPPAPSSQLAELKTQLPPSLPSDPRLWSREDVL
dsim_GLEANR_6613       MSKMKMLPVQLSLNSLNPGIWSDVLWRCPPAPSSQLAELKTQLPPSLPSDPRLWSREDVL
FBgn0000097            MSKMKMLPVQLSLNSLNPGIWSDVLWRCPPAPSSQLAELKTQLPPSLPSDPRLWSREDVL
dana_GLEANR_16071      MSKMKMLPVQLSLNQLNPGIWSDVLWRCPPAPSSQLAELKTQLPPSLPSDPRLWSREDVL
                       **************.*********************************************

dere_GLEANR_9270       VFLRFCVREFDLPKLDFDLFQMNGKALCLLTRADFGHRCPGAGDVLHNVLQMLIIESHMM
dyak_GLEANR_1669       VFLRFCVREFDLPKLDFDLFQMNGKALCLLTRADFGHRCPGAGDVLHNVLQMLIIESHMM
dsec_GLEANR_671        VFLRFCVREFDLPKLDFDLFQMNGKALCLLTRADFGHRCPGAGDVLHNVLQMLIIESHMM
dsim_GLEANR_6613       VFLRFCVREFDLPKLDFDLFQMNGKALCLLTRADFGHRCPGAGDVLHNVLQMLIIESHMM
FBgn0000097            VFLRFCVREFDLPKLDFDLFQMNGKALCLLTRADFGHRCPGAGDVLHNVLQMLIIESHMM
dana_GLEANR_16071      VFLRFCVREFDLPKLDFDLFQMNGKALCLLTRADFGHRCPGAGDVLHNVLQMLIIESHMM
                       ************************************************************

Are there any parameters in the package that I can change so that I
get the second output format? Thank you sincerely!

Jiang



From bix at sendu.me.uk  Sun Feb 25 05:53:48 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Sun, 25 Feb 2007 10:53:48 +0000
Subject: [Bioperl-l] [Fwd: Bio::Graph::SimpleGraph]
Message-ID: <45E16ABC.3060405@sendu.me.uk>

Tels,

I've forwarded this to the author of the module, Nat Goodman, and to the 
Bioperl mailing list 
(http://www.bioperl.org/wiki/Mailing_lists#Main_BioPerl_list).

But actually we have Bio::Graph::* as tentatively deprecated:
http://www.bioperl.org/wiki/Deprecated_modules#Bio::Graph_modules
so any further work on it doesn't seem worthwhile.


-------- Original Message --------
Subject: Bio::Graph::SimpleGraph
Date: Sat, 24 Feb 2007 12:07:31 +0100
From: Tels <nospam-abuse at bloodgate.com>

Moin,

I just stumbled over Bio::Graph::SimpleGraph and read this comment:

"This is a simple, hopefully fast undirected graph package. The only reason
this exists is that the standard CPAN Graph pacakge, Graph::Base, is
seriously broken."

Really sad to see people always reinventing the wheel :/

Anyway, I wonder if you would like to make your module support Graph::Easy
(http://search.cpan.org/~tels/Graph-Easy/)? I would be willing to submit
patches and do testing/documentation for that.

All the best,

Tels

From bix at sendu.me.uk  Sun Feb 25 05:45:21 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Sun, 25 Feb 2007 10:45:21 +0000
Subject: [Bioperl-l] Sequence matching problem!
In-Reply-To: <9137941.post@talk.nabble.com>
References: <9107936.post@talk.nabble.com>	<200702231025.39416.heikki@sanbi.ac.za>
	<9137941.post@talk.nabble.com>
Message-ID: <45E168C1.80306@sendu.me.uk>

Kurt Gobain wrote:
> Code works perfectly fine...but...sum time its not givin reqd o/p..For eg.
> If I type sequence as "GATCAAGTCAGGAT" & pattern to be matched as T.*A..then
> o/p which I am getting frm above prog is TCAAGTCAGGA instead of TCA...
> & 1 more thing Is there n e chance by which I can replace T*A to T.*A cos
> the code which I need to write says T*A shod be only the input not T.*A..So
> Can we use replacment reg ex...sumthing like 
> $pattern =~  s/.*/*/...or sumthing else...
> But its kinda givin sum error again...Dam! Regex is really hairy!!...:P

These aren't Bioperl questions. For regular expression help see:
http://perldoc.perl.org/perlretut.html

Basically, you want a non-greedy match, so T.*?A

You can convert T*A by doing s/\*/.*?/
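Here is a sketch combining both suggestions in plain Perl. The user's T*A becomes the non-greedy T.*?A, and matches are reported with 1-based coordinates as in Heikki's example. The code splits on '*' and applies quotemeta to each piece. This is a variation on s/\*/.*?/ (which, without /g, only rewrites the first '*') that also keeps any other regex metacharacters literal. The helper names are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn the user's notation (e.g. "T*A", where '*' is
# assumed to mean "any run of bases") into a non-greedy regex (T.*?A).
# Every other character goes through quotemeta so it matches literally.
sub user_pattern_to_regex {
    my ($user) = @_;
    my $re = join '.*?', map { quotemeta($_) } split /\*/, $user, -1;
    return qr/$re/i;
}

# Return [match, start, end] for each non-overlapping match, using
# 1-based inclusive coordinates as in Heikki's example.
sub find_matches {
    my ($seq, $user) = @_;
    my $re = user_pattern_to_regex($user);
    my @hits;
    while ($seq =~ /($re)/g) {
        my $match = $1;
        my $end   = pos($seq);                 # offset just past the match
        my $start = $end - length($match) + 1; # convert to 1-based start
        push @hits, [ $match, $start, $end ];
    }
    return @hits;
}

for my $hit (find_matches('GATCAAGTCAGGAT', 'T*A')) {
    print "$hit->[0] : $hit->[1] - $hit->[2]\n";
}
```

For Kurt's sequence GATCAAGTCAGGAT, the lazy .*? stops at the first A after each T. The loop therefore prints TCA : 3 - 5 and TCA : 8 - 10 instead of the single greedy TCAAGTCAGGA.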

Here are some more regexes for you:
s/sum/some/g
s/frm/from/g
s/n e/any/g
etc...

From biology0046 at hotmail.com  Sun Feb 25 08:28:34 2007
From: biology0046 at hotmail.com (=?gb2312?B?va0gzsTi/Q==?=)
Date: Sun, 25 Feb 2007 13:28:34 +0000
Subject: [Bioperl-l] AlignIO problems
Message-ID: <BAY109-F38C3F2A7B77BE0A42F9ADEB48C0@phx.gbl>

Hi all,
I use the AlignIO module to convert an alignment file.
My original file is:
CLUSTAL W(1.81) multiple sequence alignment


dana_GLEANR_11249      MEAIAKHDFSATADDELSFRKTQTLKILNMEDDSNWYRAELDGKEGLIPSNYIEMKNHDW
dere_GLEANR_7213       ...V...................I....................................
dgri_GLEANR_6962       .......................I....................................
FBgn0004638            .......................I....................................
dmoj_GLEANR_6118       ...........N...........I....................................
dper_GLEANR_18885      ...V...................I....................................
dpse_GLEANR_14384      ...V...................I....................................
dsec_GLEANR_3096       .................N.....I....................................
dsim_GLEANR_9744       -----------------------------...............................
dvir_GLEANR_4811       .......................I....................................
dwil_GLEANR_10869      .......................I....................................
dyak_GLEANR_13576      .......................I....................................


dana_GLEANR_11249      YYGRITRADAEKLLSNKHEGAFLIRISESSPGDFSLSVKCPDGVQHFKVLRDAQSKFFLW
dere_GLEANR_7213       ............................................................
dgri_GLEANR_6962       ............................................................
FBgn0004638            ............................................................
dmoj_GLEANR_6118       .................L..........................................
dper_GLEANR_18885      ............................................................
dpse_GLEANR_14384      ............................................................
dsec_GLEANR_3096       ............................................................
dsim_GLEANR_9744       ............................................................
dvir_GLEANR_4811       ............................................................
dwil_GLEANR_10869      ............................................................
dyak_GLEANR_13576      ............................................................


dana_GLEANR_11249      VVKFNSLNELVEYHRTASVSRSQDVKLRDMIPEEMLVQALYDFVPQESGELDFRRGDVIT
dere_GLEANR_7213       ............................................................
dgri_GLEANR_6962       ............................................................
FBgn0004638            ............................................................
dmoj_GLEANR_6118       ..............................V.D...........................
dper_GLEANR_18885      .......................E....................................
dpse_GLEANR_14384      .......................E....................................
dsec_GLEANR_3096       ............................................................
dsim_GLEANR_9744       ............................................................
dvir_GLEANR_4811       ............................................................
dwil_GLEANR_10869      ............................................................
dyak_GLEANR_13576      ............................................................


dana_GLEANR_11249      VTDRSDENWWNGEIGNRKGIFPATYVTPYHS
dere_GLEANR_7213       ...............................
dgri_GLEANR_6962       ...............................
FBgn0004638            ...............................
dmoj_GLEANR_6118       ............Q..................
dper_GLEANR_18885      ...............................
dpse_GLEANR_14384      ...............................
dsec_GLEANR_3096       ...............................
dsim_GLEANR_9744       ...............................
dvir_GLEANR_4811       ...............................
dwil_GLEANR_10869      ...............................
dyak_GLEANR_13576      ...............................


I want to change those "." characters back to the residue letters, so I
wrote this code:
use Bio::AlignIO;
my $in=Bio::AlignIO->new(-file =>"FBgn0000097.aln",
                      -format => 'clustalw');
my $out=Bio::AlignIO->new(-file=>">../clustalw/0097.aln",
                       -format =>'clustalw');
while (my $aln=$in->next_aln() ){
    $aln->unmatch();
    $aln->set_displayname_flat();
    $out->write_aln($aln);
}

But when I run the code, I get error messages like this:

-------------------- WARNING ---------------------
MSG: Got a sequence with no letters in it cannot guess alphabet []
---------------------------------------------------

------------- EXCEPTION  -------------
MSG: No sequence with name [dsim_GLEANR_9744/1-182]
STACK Bio::SimpleAlign::displayname /home/src/bioperl/bioperl-live/Bio/SimpleAlign.pm:2307
STACK Bio::SimpleAlign::set_displayname_flat /home/src/bioperl/bioperl-live/Bio/SimpleAlign.pm:2374
STACK toplevel aligntest.pl:11

--------------------------------------

I don't know where the problem is.

Jiang



From cjfields at uiuc.edu  Sun Feb 25 14:58:23 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 25 Feb 2007 13:58:23 -0600
Subject: [Bioperl-l] AlignIO problems
In-Reply-To: <BAY109-F38C3F2A7B77BE0A42F9ADEB48C0@phx.gbl>
References: <BAY109-F38C3F2A7B77BE0A42F9ADEB48C0@phx.gbl>
Message-ID: <19EA5F46-D1A4-45B5-B2DB-55194F79215C@uiuc.edu>

Bio::AlignIO::clustalw doesn't work with masked sequences; it parses  
the output quite literally as is, so any [.-] are treated as gaps.   
If the seqs are 100% identical then you will have a seq with 100%  
gaps and no sequence, thus giving you the warnings you see.

The best way to accomplish what you want is to not mask the sequence  
alignment to begin with when running clustalw/muscle/whatever.   
Exactly how are you generating these?  When I use clustalw no  
identity masking occurs by default.
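If the masked file is all you have, one workaround is to restore the residues before handing the text to Bio::AlignIO. The sketch below is plain Perl and rests on assumptions that should be checked against the real file. It assumes the first row of each block is the unmasked reference, each row has its name and residues on one line, and blocks are separated by blank lines. unmask_clustal_lines is a hypothetical helper, not a BioPerl method.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: undo identity masking in CLUSTAL-style text, where
# residues identical to the first row of a block are printed as '.'.
# Assumptions: the first row of each block is the unmasked reference, name and
# residues share one line, and blocks are separated by blank lines.
sub unmask_clustal_lines {
    my @out;
    my $ref;    # residues of the current block's reference row
    for my $line (@_) {
        if ($line =~ /^(\S+)(\s+)([A-Za-z*.-]+)\s*$/) {
            my ($name, $pad, $res) = ($1, $2, $3);
            if (!defined $ref) {
                $ref = $res;    # first sequence row of the block
            }
            else {
                my @r = split //, $res;
                for my $i (0 .. $#r) {
                    # copy the reference residue wherever a '.' stands in
                    $r[$i] = substr($ref, $i, 1)
                        if $r[$i] eq '.' && $i < length $ref;
                }
                $res = join '', @r;
            }
            push @out, "$name$pad$res";
        }
        else {
            # header, conservation and blank lines pass through unchanged;
            # a blank line closes the current block
            undef $ref if $line =~ /^\s*$/;
            push @out, $line;
        }
    }
    return @out;
}

my @masked = (
    'CLUSTAL W(1.81) multiple sequence alignment',
    '',
    'ref_seq      MEAIAKHDFS',
    'seq_b        ...V......',
    'seq_c        -----.....',
);
print "$_\n" for unmask_clustal_lines(@masked);
```

Gaps ('-') are left alone, so a row like dsim_GLEANR_9744 keeps its leading gap run. The restored lines could then be written to a file and read back with Bio::AlignIO and -format => 'clustalw', as in the original script.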

chris

On Feb 25, 2007, at 7:28 AM, ? ?? wrote:

> hi, all,
> I use the AlignIO module to convert the alignment file.
> my original file is :
> CLUSTAL W(1.81) multiple sequence alignment
>
>
> dana_GLEANR_11249      MEAIAKHDFSATADDELSFRKTQTLKILNMEDDSNWYRAELDGKEGLIPSNYIEMKNHDW
> dere_GLEANR_7213       ...V...................I....................................
> dgri_GLEANR_6962       .......................I....................................
> FBgn0004638            .......................I....................................
> dmoj_GLEANR_6118       ...........N...........I....................................
> dper_GLEANR_18885      ...V...................I....................................
> dpse_GLEANR_14384      ...V...................I....................................
> dsec_GLEANR_3096       .................N.....I....................................
> dsim_GLEANR_9744       -----------------------------...............................
> dvir_GLEANR_4811       .......................I....................................
> dwil_GLEANR_10869      .......................I....................................
> dyak_GLEANR_13576      .......................I....................................
>
>
>
> dana_GLEANR_11249      YYGRITRADAEKLLSNKHEGAFLIRISESSPGDFSLSVKCPDGVQHFKVLRDAQSKFFLW
> dere_GLEANR_7213       ............................................................
> dgri_GLEANR_6962       ............................................................
> FBgn0004638            ............................................................
> dmoj_GLEANR_6118       .................L..........................................
> dper_GLEANR_18885      ............................................................
> dpse_GLEANR_14384      ............................................................
> dsec_GLEANR_3096       ............................................................
> dsim_GLEANR_9744       ............................................................
> dvir_GLEANR_4811       ............................................................
> dwil_GLEANR_10869      ............................................................
> dyak_GLEANR_13576      ............................................................
>
>
>
> dana_GLEANR_11249      VVKFNSLNELVEYHRTASVSRSQDVKLRDMIPEEMLVQALYDFVPQESGELDFRRGDVIT
> dere_GLEANR_7213       ............................................................
> dgri_GLEANR_6962       ............................................................
> FBgn0004638            ............................................................
> dmoj_GLEANR_6118       ..............................V.D...........................
> dper_GLEANR_18885      .......................E....................................
> dpse_GLEANR_14384      .......................E....................................
> dsec_GLEANR_3096       ............................................................
> dsim_GLEANR_9744       ............................................................
> dvir_GLEANR_4811       ............................................................
> dwil_GLEANR_10869      ............................................................
> dyak_GLEANR_13576      ............................................................
>
>
>
> dana_GLEANR_11249      VTDRSDENWWNGEIGNRKGIFPATYVTPYHS
> dere_GLEANR_7213       ...............................
> dgri_GLEANR_6962       ...............................
> FBgn0004638            ...............................
> dmoj_GLEANR_6118       ............Q..................
> dper_GLEANR_18885      ...............................
> dpse_GLEANR_14384      ...............................
> dsec_GLEANR_3096       ...............................
> dsim_GLEANR_9744       ...............................
> dvir_GLEANR_4811       ...............................
> dwil_GLEANR_10869      ...............................
> dyak_GLEANR_13576      ...............................
>
>
> I want to change those "." characters back to alphabetic  
> expression, then i write the code like this:
> use Bio::AlignIO;
> my $in=Bio::AlignIO->new(-file =>"FBgn0000097.aln",
>                      -format => 'clustalw');
> my $out=Bio::AlignIO->new(-file=>">../clustalw/0097.aln",
>                       -format =>'clustalw');
> while (my $aln=$in->next_aln() ){
>    $aln->unmatch();
>    $aln->set_displayname_flat();
>    $out->write_aln($aln);
> }
>
> but when i run the code, there are error message like:
>
> -------------------- WARNING ---------------------
> MSG: Got a sequence with no letters in it cannot guess alphabet []
> ---------------------------------------------------
>
> ------------- EXCEPTION  -------------
> MSG: No sequence with name [dsim_GLEANR_9744/1-182]
> STACK Bio::SimpleAlign::displayname /home/src/bioperl/bioperl-live/Bio/SimpleAlign.pm:2307
> STACK Bio::SimpleAlign::set_displayname_flat /home/src/bioperl/bioperl-live/Bio/SimpleAlign.pm:2374
> STACK toplevel aligntest.pl:11
>
> --------------------------------------
>
> I don't know where is the problem.
>
> Jiang
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cristiangary at gmail.com  Sun Feb 25 16:04:57 2007
From: cristiangary at gmail.com (Cristian Gary)
Date: Sun, 25 Feb 2007 18:04:57 -0300
Subject: [Bioperl-l] problem with blast report to ncbi webpage
Message-ID: <95ef8cd0702251304o45bea6a0tcedc59156cb0cfe4@mail.gmail.com>

I have a problem getting BLAST reports from the NCBI server: after waiting
on the RIDs, I don't get any results.
Is the problem the NCBI server or the BioPerl version?
P.S.: the same code worked very well three weeks ago.


-- 
"Knowledge belongs to humanity"

"Gnu/linux   -------- free your mind......
www.kubuntu.org

From granjeau at tagc.univ-mrs.fr  Mon Feb 26 04:17:15 2007
From: granjeau at tagc.univ-mrs.fr (Samuel GRANJEAUD - IR/IFR137)
Date: Mon, 26 Feb 2007 10:17:15 +0100
Subject: [Bioperl-l] Reading a XML sequence (UniParc) into a BioSeq object
Message-ID: <45E2A59B.6080300@tagc.univ-mrs.fr>

Hello !

I would like to fill a BioSeq object with the output from a dbfetch
request at EBI on the UniParc database (which replies only in XML, as I am
interested in references). If somebody could tell me which BioPerl object
to use, or a way to convert it to Swiss-Prot format, or has a piece of
code, I would appreciate it a lot. (Is
http://doc.bioperl.org/bioperl-live/Bio/SeqIO/interpro.html a good
starting point?)

Best regards,
--Samuel

<entry accession="UPI00004A0D4A">
<dbReferenceList>
    <dbReference db="EMBL" id="CAI39485" version="1" version_i="1" 
active="Y" created="04-Jan-2005" last="15-Dec-2006"/>
    <dbReference db="UniProtKB/TrEMBL" id="Q5JVT0" version="1" 
version_i="1" active="N" created="15-Feb-2005" last="06-Feb-2007"/>
    <dbReference db="ENSEMBL" id="ENSP00000352958" version_i="2" 
active="Y" created="03-Apr-2006" last="27-Nov-2006"/>
    <dbReference db="IPI" id="IPI00418471" version="4" version_i="4" 
active="N" created="07-Mar-2005" last="07-Mar-2005"/>
    <dbReference db="IPI" id="IPI00646867" version="1" version_i="1" 
active="N" created="06-Sep-2005" last="06-Oct-2006"/>
    <dbReference db="VEGA" id="OTTHUMP00000019225" version_i="1" 
active="N" created="15-Aug-2005" last="02-Dec-2005"/>
</dbReferenceList>
<sequence length="431" crc64="8913D1F04A71CCFB">
MSTRSVSSSSYRRMFGGPGTASRPSSSRSYVTTSTRTYSLGSALRPSTSRSLYASSPGGV
YATRSSAVRLRSSVPGVRLLQDSVDFSLADAINTEFKNTRTNEKVELQELNDRFANYIDK
VRFLEQQNKILLAELEQLKGQGKSRLGDLYEEEMRELRRQVDQLTNDKARVEVERDNLAE
DIMRLREKLQEEMLQREEAENTLQSFRQDVDNASLARLDLERKVESLQEEIAFLKKLHEE
EIQELQAQIQEQHVQIDVDVSKPDLTAALRDVRQQYESVAAKNLQEAEEWYKSKFADLSE
AANRNNDALRQAKQESTEYRRQVQSLTCEVDALKGTNESLERQMREMEENFAVEAANYQD
TIGRLQDEIQNMKEEMARHLREYQDLLNVKMALDIEIATYRKLLEGEESRISLPLPNFSS
LNLRGKHFISL
</sequence>
</entry>
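One way to get from this XML to a sequence object is to pull out the accession, the dbReference attributes and the sequence first. The sketch below is a minimal, regex-based illustration that assumes exactly the flat layout shown above (a single <entry>, double-quoted attributes); the sub name parse_uniparc_entry is hypothetical, and a proper XML parser would be more robust for anything beyond this shape.

```perl
use strict;
use warnings;

# Pull the accession, the <dbReference> attributes and the sequence out
# of one UniParc <entry>. Regex-based sketch: assumes the flat layout
# shown above, not arbitrary XML.
sub parse_uniparc_entry {
    my ($xml) = @_;
    my %entry = (dbrefs => []);
    ($entry{accession}) = $xml =~ /<entry\s+accession="([^"]+)"/;

    # Each <dbReference .../> becomes a hash of its attributes.
    while ($xml =~ /<dbReference\s+([^>]*?)\/?>/g) {
        my $attrs = $1;    # copy before running another match
        my %attr  = $attrs =~ /(\w+)="([^"]*)"/g;
        push @{ $entry{dbrefs} }, \%attr;
    }

    # Sequence text with the line breaks removed.
    if ($xml =~ /<sequence[^>]*>(.*?)<\/sequence>/s) {
        ($entry{seq} = $1) =~ s/\s+//g;
    }
    return \%entry;
}
```

The extracted accession and sequence could then be handed to Bio::Seq->new(-seq => $e->{seq}, -display_id => $e->{accession}); turning the cross-references into annotations is not covered by this sketch.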


From bix at sendu.me.uk  Mon Feb 26 06:46:39 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 26 Feb 2007 11:46:39 +0000
Subject: [Bioperl-l] [Fwd: Bio::Graph::SimpleGraph]
In-Reply-To: <45E16ABC.3060405@sendu.me.uk>
References: <45E16ABC.3060405@sendu.me.uk>
Message-ID: <45E2C89F.1020402@sendu.me.uk>

Nat replied, but I messed up the To: fields, so his reply didn't make it to
the list. Here's what he said:


Nathan (Nat) Goodman wrote:
Hi Tels

I agree it's sad to reinvent the wheel, but I don't think that's what
happened here. Your module seems to be focused on rendering graphs while
my module is concerned with computations on graphs.

In any case, as Sendu notes, SimpleGraph is in the process of being
deprecated. I fully support this move. It was intended to be a stopgap
until the main Perl Graph module was fixed.  Since that has now
happened, it's time for SimpleGraph to retire.

For the benefit of anyone using Graph: last I checked (six months or
more ago), it had serious performance problems on large graphs (probably
not too much of a surprise), and also was inexplicably slow on graphs
with edge attributes.  I see that the latter bug is marked "resolved" in
CPAN, but there's no indication of when or how.  We've moved to Boost
for graphs as large as the human protein interaction network.

Best,
Nat

From sanjib at bic.boseinst.ernet.in  Mon Feb 26 00:23:36 2007
From: sanjib at bic.boseinst.ernet.in (Sanjib Kumar Gupta)
Date: Mon, 26 Feb 2007 10:53:36 +0530
Subject: [Bioperl-l] Remote blast
In-Reply-To: <20070221064743.M54123@bic.boseinst.ernet.in>
References: <mailman.0.1172037646.4756.bioperl-l@lists.open-bio.org>
	<20070221064743.M54123@bic.boseinst.ernet.in>
Message-ID: <20070226052336.M74918@bic.boseinst.ernet.in>

Hi
I have been running this script for some time and it was running fine. I am
using this Linux machine with a live IP (no proxy). But suddenly it has stopped
working with these errors:

waiting...waiting...
-------------------- WARNING ---------------------
MSG:  <HTML>
<HEAD><TITLE>An Error Occurred</TITLE></HEAD>
<BODY>
<H1>An Error Occurred</H1>
500 Can't connect to www.ncbi.nlm.nih.gov:80 (Bad 
hostname 'www.ncbi.nlm.nih.gov')
</BODY>
</HTML>

---------------------------------------------------
xx.pep

-------------------- WARNING ---------------------
MSG: req was POST http://www.ncbi.nlm.nih.gov/blast/Blast.cgi
User-Agent: bioperl-Bio_Tools_Run_RemoteBlast/1.5
Content-Length: 497
Content-Type: application/x-www-form-urlencoded

DATABASE=nr&QUERY=%3ENM_005277.3%3B_1414961_1+837%
0AMEENMEEGQTQKGCFECCIKCLGGIPYASLIATILLYAGVALFCGCGHEALSGTVNILQTYFEMARTAGDTLDVF
TMIDIFKYVIYGIAAAFFVYGILLMVEGFFTTGAIKDLYGDFKITTCGRCVSAWFIMLTYLFMLAWLGVTAFTSLPV
YMYFNLWTICRNTTLVEGANLCLDLRQFGIVTIGEEKKICTVSENFLRMCESTELNMTFHLFIVALAGAGAAVIAMV
HYLMVLSANWAYVKDACRMQKYEDIKSKEEQELHDIHSTRSKERLNAYT*&COMPOSITION_BASED_STATISTI
CS=off&EXPECT=1e-
10&SERVICE=plain&FORMAT_OBJECT=Alignment&CMD=Put&MATRIX_NAME=BLOSUM62&ENTREZ_
QUERY=Xenopus+laevis[Organism]&PROGRAM=blastp

<HTML>
<HEAD><TITLE>An Error Occurred</TITLE></HEAD>
<BODY>
<H1>An Error Occurred</H1>
500 Can't connect to www.ncbi.nlm.nih.gov:80 (Bad 
hostname 'www.ncbi.nlm.nih.gov')
</BODY>
</HTML>

---------------------------------------------------
waiting...waiting...
-------------------- WARNING ---------------------
MSG:  <HTML>
<HEAD><TITLE>An Error Occurred</TITLE></HEAD>
<BODY>
<H1>An Error Occurred</H1>
500 Internal Server Error
</BODY>
</HTML>

---------------------------------------------------

Although I am able to see the NCBI page from a browser, I am unable to ping or
traceroute to the server.

Please help me.

--
Sanjib Kumar Gupta
Bioinformatics Centre
Bose Institute
Kolkata 700054, INDIA
Phone  : +91-33-2355 6626, 2816, 2355 4766
Fax    : +91-33-2355 3886
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: n9.pl
Url: http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070226/86a0137c/attachment.pl 

From cjfields at uiuc.edu  Mon Feb 26 09:59:21 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 26 Feb 2007 08:59:21 -0600
Subject: [Bioperl-l] Remote blast
In-Reply-To: <20070226052336.M74918@bic.boseinst.ernet.in>
References: <mailman.0.1172037646.4756.bioperl-l@lists.open-bio.org>
	<20070221064743.M54123@bic.boseinst.ernet.in>
	<20070226052336.M74918@bic.boseinst.ernet.in>
Message-ID: <C668C555-39ED-43A9-8B49-C7D0376D971F@uiuc.edu>

I tested this out and got BLAST to work for my test case (single  
fasta seq, since you didn't send any seqs for testing).  It keeps  
querying for the RID in what appears to be an infinite loop (i.e. it  
doesn't get rid of the RID properly); you can see this if you add
'-verbose => 1' to your parameters.  I don't have time to delve into it  
but from a quick glance it may be due to your looping structure and  
how you are saving your rids.

As for your particular error, could it be something as simple as the  
server was overloaded or down?  It does happen from time to time...
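The "Bad hostname" text in the warnings generally means the name www.ncbi.nlm.nih.gov could not be resolved on the client machine, which would point at local DNS rather than at the BLAST server itself. A quick way to check from the same machine is a core-Perl name lookup; this is a minimal diagnostic sketch (the helper name resolves is hypothetical), not part of RemoteBlast.

```perl
use strict;
use warnings;
use Socket qw(inet_aton inet_ntoa);

# Return the dotted-quad IPv4 address for $host, or undef when the name
# cannot be resolved from this machine.
sub resolves {
    my ($host) = @_;
    my $packed = inet_aton($host);
    return defined $packed ? inet_ntoa($packed) : undef;
}

my $host = 'www.ncbi.nlm.nih.gov';
my $addr = resolves($host);
print defined $addr
    ? "$host resolves to $addr\n"
    : "Cannot resolve $host: check this machine's DNS settings\n";
```

If this fails while a browser still works, the browser is probably resolving names some other way (for example through a proxy or a different resolver) than the Perl process does.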

Beyond that I can't make heads or tails of your script.  Was it  
cobbled together from a bunch of others?  If you are doing that you  
can probably expect some bugs to occur.

chris



From cjfields at uiuc.edu  Mon Feb 26 10:05:50 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 26 Feb 2007 09:05:50 -0600
Subject: [Bioperl-l] AlignIO problems
In-Reply-To: <BAY109-F30DD48142FC0984AF9A284B4830@phx.gbl>
References: <BAY109-F30DD48142FC0984AF9A284B4830@phx.gbl>
Message-ID: <082E0708-6B1C-45CE-B387-429B8B6A8D7A@uiuc.edu>

Make sure to keep this on the list, others may have some input.

You should be able to test the various sequence objects you're  
retrieving from Bio::DB::Fasta via Bio::SeqIO to see if they are what  
you're expecting, then track down the problematic sequences.  My  
guess is the odd seqs are due to the way you are using Bio::DB::Fasta  
for each of the files.  I'm wondering if you are having problems with  
indices overwriting one another and are thus getting back blank seq  
objects.

You should probably consider just indexing all of your files  
together; according to the POD you can use a single Bio::DB::Fasta to  
index all of the files in one go (indicate the path and use '-glob')  
and retrieve what you need that way.  Either that or separating them  
into separate directories so the indices are also separate.

chris

On Feb 25, 2007, at 9:50 PM, Jiang wrote:

> Thank you for your help!
> Maybe you are right. I use the following code to create my arrays of
> seq objects:
>          my $outfilename=$dmel;
>          my $ana_pep_db=Bio::DB::Fasta->new("dana.translation.fasta");
>          my $ana_cdna_db=Bio::DB::Fasta->new("dana.cds.fasta");
>          my $ere_pep_db=Bio::DB::Fasta->new("dere.translation.fasta");
>          my $ere_cdna_db=Bio::DB::Fasta->new("dere.cds.fasta");
>          my $mel_pep_db=Bio::DB::Fasta->new("dmel.translation.fasta");
>          my $mel_cdna_db=Bio::DB::Fasta->new("dmel.cds.fasta");
>          my $sec_pep_db=Bio::DB::Fasta->new("dsec.translation.fasta");
>          my $sec_cdna_db=Bio::DB::Fasta->new("dsec.cds.fasta");
>          my $sim_pep_db=Bio::DB::Fasta->new("dsim.translation.fasta");
>          my $sim_cdna_db=Bio::DB::Fasta->new("dsim.cds.fasta");
>          my $yak_pep_db=Bio::DB::Fasta->new("dyak.translation.fasta");
>          my $yak_cdna_db=Bio::DB::Fasta->new("dyak.cds.fasta");
>          my $ana_pep_obj=$ana_pep_db->get_Seq_by_id($dana);
>          my $ana_nuc_obj=$ana_cdna_db->get_Seq_by_id($dana);
>          my $ere_pep_obj=$ere_pep_db->get_Seq_by_id($dere);
>          my $ere_nuc_obj=$ere_cdna_db->get_Seq_by_id($dere);
>          my $mel_pep_obj=$mel_pep_db->get_Seq_by_id($dmel);
>          my $mel_nuc_obj=$mel_cdna_db->get_Seq_by_id($dmel);
>          my $sec_pep_obj=$sec_pep_db->get_Seq_by_id($dsec);
>          my $sec_nuc_obj=$sec_cdna_db->get_Seq_by_id($dsec);
>          my $sim_pep_obj=$sim_pep_db->get_Seq_by_id($dsim);
>          my $sim_nuc_obj=$sim_cdna_db->get_Seq_by_id($dsim);
>          my $yak_pep_obj=$yak_pep_db->get_Seq_by_id($ddyak);
>          my $yak_nuc_obj=$yak_cdna_db->get_Seq_by_id($ddyak);
>          push @prots, $ana_pep_obj;
>          push @cdna, $ana_nuc_obj;
>          push @prots, $ere_pep_obj;
>          push @cdna, $ere_nuc_obj;
>          push @prots, $mel_pep_obj;
>          push @cdna, $mel_nuc_obj;
>          push @prots, $sec_pep_obj;
>          push @cdna, $sec_nuc_obj;
>          push @prots, $sim_pep_obj;
>          push @cdna, $sim_nuc_obj;
>          push @prots, $yak_pep_obj;
>          push @cdna, $yak_nuc_obj;
>
> Then I use @prots as input for my $aln=$aln_factory->align(\@prots);
> This method creates alignment files with the sequences masked.
>
> But if I use a FASTA file (not an object) containing the protein
> sequences as input:
> $inputfile='FBgn0000097.pep';
> @params=('outorder'=>'INPUT');
> $factory=Bio::Tools::Run::Alignment::Clustalw->new(@params);
> $aln=$factory->align($inputfile);
> #$aln->gap_char('-');
> $aln->map_chars('\.','-');
> $aln_out=Bio::AlignIO->new(-file=>">0097.aln",-format=>'clustalw');
> $aln_out->write_aln($aln);
>
> This method creates files without masking.
> I think sequence objects created by "get_Seq_by_id" directly from the
> sequence databases are not appropriate.
>
> Thank you for your suggestion again!
>
> Jiang.
>
>> From: Chris Fields <cjfields at uiuc.edu>
>> To: Jiang <biology0046 at hotmail.com>
>> Subject: Re: [Bioperl-l] AlignIO problems
>> Date: Sun, 25 Feb 2007 21:26:34 -0600
>>
>> I ran the same using a local fasta formatted file on my system  
>> which  works (no masking).
>>
>> Of note, the gaps were all marked as '.'.  Your gaps were both
>> '.' and '-', which may mean that something is wrong with the seq
>> objects themselves.  Maybe SeqIO is misreading them?
>>
>> chris
>>
>> On Feb 25, 2007, at 7:34 PM, Jiang wrote:
>>
>>> I use the Bio::Tools::Run::Alignment::Clustalw module to carry out
>>> a multiple alignment.
>>> My code is:
>>>         my @clustal_param=('outorder'=>'INPUT');
>>>         my $aln_factory=Bio::Tools::Run::Alignment::Clustalw->new(@clustal_param);
>>>         my $aln=$aln_factory->align(\@prots); ### @prots is an array of protein sequence objects
>>>         my $aln_out=Bio::AlignIO->new(-file=>">./dmel_group/clustal/${outfilename}.aln",-format=>'clustalw');
>>>
>>>         $aln_out->write_aln($aln);
>>> This code produces an alignment in which identical residues are masked.
>>> But if I use ClustalW directly, the output is normal.
>>> Thank you for your help!
>>>
>>> Jiang
>>
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From michael.watson at bbsrc.ac.uk  Mon Feb 26 11:00:31 2007
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Mon, 26 Feb 2007 16:00:31 -0000
Subject: [Bioperl-l] The axis of GC content in Bio::Graphics::glyph:dna
In-Reply-To: <6dce9a0b0702151053v19ab190fh1f752fea3b2ed722@mail.gmail.com>
References: <8975119BCD0AC5419D61A9CF1A923E9503E2EB70@iahce2ksrv1.iah.bbsrc.ac.uk>
	<6dce9a0b0702151053v19ab190fh1f752fea3b2ed722@mail.gmail.com>
Message-ID: <8975119BCD0AC5419D61A9CF1A923E9503E2EBD3@iahce2ksrv1.iah.bbsrc.ac.uk>

Hi Lincoln/List
 
That's great, the axis now appears, but there are no labels.  This in
itself isn't a problem, as long as we can assume that the tick marks are
at 0, 50% and 100%?  If that's true, we can go with what we have,
otherwise I'm going to have to figure out a way to label the y-axis
 
Thanks
Mick

________________________________

From: lincoln.stein at gmail.com [mailto:lincoln.stein at gmail.com] On Behalf
Of Lincoln Stein
Sent: 15 February 2007 18:53
To: michael watson (IAH-C)
Cc: BioPerl-List
Subject: Re: [Bioperl-l] The axis of GC content in
Bio::Graphics::glyph:dna


Hi Michael,

When you set up the panel, do this:

 Bio::Graphics::Panel->new(
     # ... your usual panel options here ...
     -pad_left  => 20,
     -pad_right => 20,
 );

This will leave enough room on the left and right for you to see the Y
axis. Otherwise it runs off the edge of the image (ok, this is a
mis-design, but it was the only way to solve a chicken-and-egg problem
about who gets to say how wide the panel is) 

Lincoln


On 2/15/07, michael watson (IAH-C) <michael.watson at bbsrc.ac.uk> wrote: 

	Hi
	
	OK I have some great images out of this glyph, but I can't see
the axis,
	and nor is it labelled (ie does it go from 0 - 100%?) so isn't
great for
	publication.  The docs say:
	
	"NOTE: -gc_window=>'auto' gives nice results and is recommended
for 
	drawing GC content. The GC content axes draw slightly outside
the
	panel, so you may wish to add some extra padding on the right
and
	left. "
	
	Any idea how to do this?
	
	Basically, I want a nice GC graph with the axis quite clearly
labelled, 
	and a nice "%GC" title next to it :)
	
	Thanks
	
	Mick
	
	

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory 
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse at cshl.edu 


From cjfields at uiuc.edu  Mon Feb 26 12:18:38 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 26 Feb 2007 11:18:38 -0600
Subject: [Bioperl-l] AlignIO problems
In-Reply-To: <BAY109-F1391C0C6FAEEA3B83565BFB4830@phx.gbl>
References: <BAY109-F1391C0C6FAEEA3B83565BFB4830@phx.gbl>
Message-ID: <7DF958E6-E233-427F-8901-3FE571CD99BD@uiuc.edu>


On Feb 26, 2007, at 9:59 AM, Jiang wrote:

> Thank you!
> I have checked the sequences retrieved through many Bio::DB objects
> working simultaneously. The problem you mentioned does not occur; the
> sequences are not overwritten.

Again, keep this on the list.  I have my hands full this month so I  
will be checking the list only very sporadically; someone else may be  
able to help you.

The only explanation for the clustalw output you get is that you are  
not retrieving the correct sequence in some fundamental way,  
which to me indicates the bug originates either in the way the  
sequences are retrieved (i.e. somehow via Bio::DB::Fasta, hence my  
thought about conflicting indices) or in the way they are converted  
via Bio::SeqIO, which is used in Bio::Tools::Run::Alignment::Clustalw.

When I have used Bio::DB::Fasta in the past I have never had a  
problem when indexing multiple files and retrieving sequences, so  
short of running tests with your data I can't help you much beyond the  
above conjecturing.

chris


From jason at bioperl.org  Mon Feb 26 13:45:34 2007
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 26 Feb 2007 10:45:34 -0800
Subject: [Bioperl-l] Question to Bio::Tools::Run RemoteBlast
In-Reply-To: <20070226095515.68810@gmx.net>
References: <20070226095515.68810@gmx.net>
Message-ID: <2D2DF6D9-6DAE-4BB7-B31B-8C19CCCA7301@bioperl.org>

Alex -
I am glad to see your interest in the module, but I don't  
currently have any time to maintain it and so queries should be sent  
to the BioPerl mailing list.  In general we prefer you don't contact  
developers directly, but use the mailing list so that others can  
learn from questions.

Please note there are several tutorials and other documentation on the  
website, you will get a better response from people if you can show  
you have at least tried to use the existing example code to construct  
your program.

-jason
On Feb 26, 2007, at 1:55 AM, Alexander Auner wrote:

> Dear Jason Stajich,
> I hope you can help me.
>
> I am inspired by your module and would like to work with it.
> I am a student at the TFH Wildau.
> I have problems understanding the module.
>
> Could you send me an example?
>
> The example should process a text file (FASTA) with NCBI BLAST (web).
>
> Parameter:
> Choose database -> Others -> nr
> Limit by entrez query -> Campylobacter -> or select from: ->  
> Bacteria [ORGN]
> Expect -> 10
> Other advanced -> -q-1
>
> output format
> plain text without Graphical Overview
> Number of: -> Descriptions -> 10000
> Alignment view -> query-anchored with identities
>
> All other parameters remain undef.
>
> Thank you for your help.
>
> Yours faithfully, Alexander Auner


From jason at bioperl.org  Mon Feb 26 14:13:00 2007
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 26 Feb 2007 11:13:00 -0800
Subject: [Bioperl-l] BioPerl leadership additions
Message-ID: <41922044-0BDA-4F29-B045-5E544A4F679F@bioperl.org>

Dear BioPerl Users and Developers,

I want to announce an addition to the leadership of BioPerl.   
Christopher Fields and Sendu Bala are now members of the BioPerl  
Core developer group to recognize their ongoing leadership in the  
project.  Chris and Sendu were instrumental in the 1.5.2 Developer  
release and have made a significant commitment and contribution to  
the quality of the code and the documentation of the project.  We  
have invited them to be part of the core to recognize their work and  
to feel comfortable to ask them to do more. ;-)

The Core group was established to ensure that someone was responsible  
for making code releases, vetting new developers for CVS write  
accounts, and generally dealing with things that might otherwise slip  
through the cracks.  We are very excited to have more people  
contributing to and maintaining the toolkit.  We look forward to  
their help along with all the other developers, as we work towards a  
1.6 release this year.

As always, while there is a need for some individuals to lead the  
project, we encourage contributions from all levels of expertise to  
improve the code, documentation, and tutorials of the project.

We plan to discuss the progress of the toolkit at this year's  
Bioinformatics Open Source Conference held in Vienna, Austria in  
conjunction with the SIG meetings at ISMB.   We are trying to use  
BOSC 2007 as a chance for the developers of Open Bioinformatics  
Foundation sponsored and related projects to coordinate future  
development and release cycles.

Jason Stajich on behalf of the Core developers


From khan at cshl.edu  Mon Feb 26 15:29:19 2007
From: khan at cshl.edu (Khan, Sohail)
Date: Mon, 26 Feb 2007 15:29:19 -0500
Subject: [Bioperl-l] parsing a list of ids to a fasta file.
Message-ID: <C8696843AE995F4EA4CDC3E2B83482A9018791CA@mailbox02.cshl.edu>

Thanks Michael.  I have the scripts installed.  I can pass an id to the indexed FASTA file and retrieve the sequence.  However, is it possible to pass a list of ids from a file and get the sequences for all of them?
Thanks.

-Sohail

-----Original Message-----
From: michael watson (IAH-C) [mailto:michael.watson at bbsrc.ac.uk]
Sent: Tuesday, February 20, 2007 4:33 PM
To: Khan, Sohail; Bioperl-l at lists.open-bio.org
Subject: RE: [Bioperl-l] parsing a list of ids to a fasta file.


Suggest you use Bio::Index::Fasta to create an index for the fasta file and then a simple script to retrieve sequences using that index.  Or just use the pre-written bp_index.PLS and bp_fetch.PLS scripts.
 
http://www.bioperl.org/wiki/Module:Bio::Index::Fasta
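With an index already built, the usual route for a list of ids is simply to loop over them and call fetch($id) on the Bio::Index::Fasta object. For a one-off job, a minimal core-Perl alternative (no index; the sub and script names are hypothetical) is to load the ids into a hash and stream the FASTA file once. It assumes each id matches the first whitespace-delimited word after '>' in the FASTA headers.

```perl
use strict;
use warnings;

# Read one id per line from $id_fh into a hash for constant-time lookups.
sub read_ids {
    my ($id_fh) = @_;
    my %wanted;
    while (my $line = <$id_fh>) {
        $line =~ s/^\s+|\s+$//g;    # trim whitespace, including the newline
        $wanted{$line} = 1 if length $line;
    }
    return \%wanted;
}

# Stream a FASTA file and copy records whose id (first word after '>')
# is in %$wanted. Returns the number of records written.
sub filter_fasta {
    my ($wanted, $in_fh, $out_fh) = @_;
    my ($keep, $count) = (0, 0);
    while (my $line = <$in_fh>) {
        if ($line =~ /^>(\S+)/) {
            $keep = exists $wanted->{$1} ? 1 : 0;
            $count++ if $keep;
        }
        print {$out_fh} $line if $keep;
    }
    return $count;
}

# Usage: perl filter_ids.pl ids.txt big.fasta > subset.fasta
if (@ARGV == 2) {
    my ($id_file, $fasta_file) = @ARGV;
    open my $ids, '<', $id_file    or die "Cannot open $id_file: $!";
    open my $in,  '<', $fasta_file or die "Cannot open $fasta_file: $!";
    my $n = filter_fasta(read_ids($ids), $in, \*STDOUT);
    warn "wrote $n records\n";
}
```

This reads the large file once and keeps only the id list in memory; the index-based approach is the better fit when the same file will be queried repeatedly.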

________________________________

From: bioperl-l-bounces at lists.open-bio.org on behalf of Khan, Sohail
Sent: Tue 20/02/2007 8:42 PM
To: Bioperl-l at lists.open-bio.org
Subject: [Bioperl-l] parsing a list of ids to a fasta file.


Dear list,

I am new to Bio-Perl.  I have the following question:
I have a list of ids, which I would like to parse against a large fasta file to retrieve the Seq for parsed ids.  I appreciate any suggestions.
Thanks.

Khan


_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l


From arareko at campus.iztacala.unam.mx  Mon Feb 26 16:44:49 2007
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Mon, 26 Feb 2007 15:44:49 -0600
Subject: [Bioperl-l] BioPerl leadership additions
In-Reply-To: <41922044-0BDA-4F29-B045-5E544A4F679F@bioperl.org>
References: <41922044-0BDA-4F29-B045-5E544A4F679F@bioperl.org>
Message-ID: <45E354D1.4070600@campus.iztacala.unam.mx>

Congrats Chris & Sendu! Very well-deserved. Keep up the great work.

Cheers!
Mauricio.

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM


From lubapardo at gmail.com  Tue Feb 27 08:26:30 2007
From: lubapardo at gmail.com (Luba Pardo)
Date: Tue, 27 Feb 2007 14:26:30 +0100
Subject: [Bioperl-l] parsing blast results
Message-ID: <58ff33550702270526q43ab584ci7baf6a09e2584e38@mail.gmail.com>

Hi,
I am using the module Bio::SearchIO to parse some blast results. I would
like to store the ids of the results into an array but I am not sure if this
is possible to do it with an existing subroutine. Does anyone have an idea
whether there is a method included within the module Bio::SearchIO to do so?
Thanks in advance,
L.Pardo

From cjfields at uiuc.edu  Tue Feb 27 09:11:37 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 27 Feb 2007 08:11:37 -0600
Subject: [Bioperl-l] parsing blast results
In-Reply-To: <58ff33550702270526q43ab584ci7baf6a09e2584e38@mail.gmail.com>
References: <58ff33550702270526q43ab584ci7baf6a09e2584e38@mail.gmail.com>
Message-ID: <E1B6ED22-1120-4333-AA73-19B57D102EA9@uiuc.edu>


Bio::SearchIO doesn't currently have a method to retrieve all the  
accessions in a BLAST result.  The best way to do this is to iterate  
through the objects:

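use Bio::SearchIO;

# 'blast_report.txt' is a placeholder for your BLAST output file
my $searchio = Bio::SearchIO->new(-format => 'blast',
                                  -file   => 'blast_report.txt');
my @accs;

while (my $result = $searchio->next_result) {
     while (my $hit = $result->next_hit) {
         push @accs, $hit->accession;
         # do whatever else here...
     }
}

print join(',', @accs), "\n";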

I don't think all accessions in the description are parsed out at the  
moment, just the first one (or the one in the hit table).  If you  
want all of them or if you want the NCBI GI you'll need to parse them  
out of the description heading ($hit->description).
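If you need the GI numbers, a minimal sketch for pulling them out of a hit name or description, assuming NCBI-style gi|NUMBER|db|ACCESSION| identifiers (as in the FASTA headers posted to this list), could look like this; gi_numbers and accessions are hypothetical helper names, not Bio::SearchIO methods.

```perl
use strict;
use warnings;

# Every NCBI GI number in a hit name or description string.
# Assumes identifiers of the form gi|NUMBER|db|ACCESSION|.
sub gi_numbers {
    my ($text) = @_;
    return $text =~ /\bgi\|(\d+)/g;
}

# Every accession (with version) following a gi|NUMBER|db| prefix.
sub accessions {
    my ($text) = @_;
    return $text =~ /\bgi\|\d+\|\w+\|([^|\s]+)/g;
}

my $desc = 'gi|46100068|gb|EAK85301.1| hypothetical protein '
         . '>gi|50292953|ref|XP_448909.1| unnamed protein product';
print join(',', gi_numbers($desc)), "\n";    # prints "46100068,50292953"
print join(',', accessions($desc)), "\n";    # prints "EAK85301.1,XP_448909.1"
```

Inside the hit loop above this could be called as push @gis, gi_numbers($hit->name . ' ' . $hit->description); whether the GI survives into either field depends on how the report was generated.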

chris

From sac at bioperl.org  Tue Feb 27 12:59:22 2007
From: sac at bioperl.org (Steve Chervitz)
Date: Tue, 27 Feb 2007 09:59:22 -0800
Subject: [Bioperl-l] [Bioperl-announce-l] BioPerl leadership additions
In-Reply-To: <41922044-0BDA-4F29-B045-5E544A4F679F@bioperl.org>
References: <41922044-0BDA-4F29-B045-5E544A4F679F@bioperl.org>
Message-ID: <8f200b4c0702270959n7521f693l915bfabe7ccb7ef7@mail.gmail.com>

Welcome to the club, Chris & Sendu. Always good to have an infusion of new
blood and capable, motivated hands.

Steve

On 2/26/07, Jason Stajich <jason at bioperl.org> wrote:
>
> Dear BioPerl Users and Developers,
>
> I want to announce a addition in the leadership of BioPerl.
> Christopher Fields and and Sendu Bala are now members of the BioPerl
> Core developer group to recognize their ongoing leadership in the
> project.  Chris and Sendu were instrumental in the 1.5.2 Developer
> release and have made a significant commitment and contribution to
> the quality of the code and the documentation of the project.  We
> have invited them to be part of the core to recognize their work and
> to feel comfortable to ask them to do more. ;-)
>
> The Core group was established to ensure that someone was responsible
> for making code releases, vetting new developers for CVS write
> accounts, and generally dealing with things that might otherwise slip
> through the cracks.  We are very excited to have more people
> contributing to and maintaining the toolkit.  We look forward to
> their help along with all the other developers, as we work towards a
> 1.6 release this year.
>
> As always, while there is a need for some individuals to lead the
> project, we encourage contributions from all levels of expertise to
> improve the code, documentation, and tutorials of the project.
>
> We plan to discuss the progress of the toolkit at this year's
> Bioinformatics Open Source Conference held in Vienna, Austria in
> conjunction with the SIG meetings at ISMB.   We are trying to use
> BOSC 2007 as a chance for the developers of Open Bioinformatics
> Foundation sponsored and related projects to coordinate future
> development and release cycles.
>
> Jason Stajich on behalf of the Core developers
>
> _______________________________________________
> Bioperl-announce-l mailing list
> Bioperl-announce-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-announce-l
>

From cjfields at uiuc.edu  Tue Feb 27 15:57:40 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 27 Feb 2007 14:57:40 -0600
Subject: [Bioperl-l] Bio::SeqIO::FTHelper
Message-ID: <D6922F04-A349-41C4-B4DC-6763E3195B05@uiuc.edu>

Could anyone tell me what FTHelper is used for?  From what I gather  
it rolls up seqfeature data into a lightweight object but then  
creates a SeqFeature::Generic anyway (at least for GenBank/EMBL/ 
Swiss), which seems to be a waste of memory and time.  Is there  
something I'm missing (besides my sanity of course)?

chris

From Jay at jays.net  Wed Feb 28 04:39:55 2007
From: Jay at jays.net (Jay Hannah)
Date: Wed, 28 Feb 2007 03:39:55 -0600
Subject: [Bioperl-l] "Command-Line Bioinformatics"
Message-ID: <F7C1E903-1712-40A5-B817-8CDAADECEBF4@jays.net>

Reading this article:
http://www.linuxjournal.com/article/6977
Sequencing the SARS Virus - Linux Journal, Nov 2003

This guy needs Perl and/or BioPerl.  :)

> The sequence file is in FASTA format consisting of a header line  
> and the sequence, split into fixed-width lines. The following  
> counts the number of Gs and Cs in the sequence and presents the  
> total as a fraction of the total number of bases:
>
> > grep -v "^>" AY274119.fa | fold -w 1 |
> tr "ATGC" "..xx" | sort | uniq -c |
> sed 's/[^0-9]//g' | tr -s "\012" " " |
> sed 's/\([0-9]*\) \([0-9]*\)/scale = 3;
> ?\2 \/ (\1+\2)/' |
> bc -i
> scale = 3; 12127 / (17624+12127)
> .407
>
> Out of the 29,751 bases in our sequence, 12,127 are either G or C,  
> giving a GC content of 41%.

BioPerl version:

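use strict;
use warnings;
use Bio::SeqIO;
my $io = Bio::SeqIO->new(
   -file   => 'AY274119.fa',
   -format => 'Fasta'
);
my $seq = $io->next_seq->seq;
# tr/// with an empty replacement list counts matches without changing $seq
print( ($seq =~ tr/GCgc//) / length($seq), "\n" );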

Command-line Perl:

perl -e '$/ = undef; $_ = <>; s/>.*//; s/\n//g; print tr/GC/GC/ / length($_)' AY274119.fa

I'm sure you can Perl Golf my stabs at it.  :)
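For files with lowercase (soft-masked) bases or several records, a slightly more defensive core-Perl sketch: it skips header lines, counts G and C in either case, and, as an assumption of this sketch, divides by the number of A/C/G/T bases only, so ambiguity codes such as N do not dilute the figure. The sub name gc_fraction is hypothetical.

```perl
use strict;
use warnings;

# GC fraction over all sequence lines of a FASTA stream.
# Only A/C/G/T (either case) count toward the denominator.
sub gc_fraction {
    my ($fh) = @_;
    my ($gc, $acgt) = (0, 0);
    while (my $line = <$fh>) {
        next if $line =~ /^>/;             # skip header lines
        $gc   += ($line =~ tr/GCgc//);     # count, do not modify
        $acgt += ($line =~ tr/ACGTacgt//);
    }
    return $acgt ? $gc / $acgt : 0;
}

# In-memory demo; for real data use: open my $fh, '<', 'AY274119.fa'
my $fasta = ">test\nGGCCAT\natNN\n";
open my $fh, '<', \$fasta or die "Cannot open in-memory file: $!";
printf "%.3f\n", gc_fraction($fh);    # 0.500
```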

j
seqlab.net
http://www.bioperl.org/wiki/User:Jhannah


From n.saunders at uq.edu.au  Wed Feb 28 05:25:08 2007
From: n.saunders at uq.edu.au (Neil Saunders)
Date: Wed, 28 Feb 2007 20:25:08 +1000
Subject: [Bioperl-l] Bio::Factory::EMBOSS, CGI and taint
Message-ID: <45E55884.9010908@uq.edu.au>

Dear Bioperlers,

I'm trying to understand an error that occurs when Bio::Factory::EMBOSS is used 
in a CGI script.  Using BioPerl 1.5.2 on Ubuntu Dapper, Apache 2.0.55, Perl 5.8.7.

If I load this test CGI script (cgi.pl) in a browser:

BEGIN CODE
----------
#!/usr/bin/perl -Tw
use strict;
use CGI;
use Bio::Factory::EMBOSS;

my $cgi = new CGI;
my $f   = new Bio::Factory::EMBOSS;

print $cgi->header,
       $cgi->start_html,
       $cgi->end_html;
--------
END CODE

I get a 500 server error and the Apache error log reads:
[error] [client 192.168.0.3] Premature end of script headers: cgi.pl

I can fix this in 2 ways:

(1) Move the "my $f = new Bio::Factory::EMBOSS" line to the end of the script, 
which isn't a very useful fix.
(2) Remove the -T switch from the shebang line

There seem to be a few old posts on the list regarding "taint-safe" modules.  It 
seems that the new Bio::Factory::EMBOSS object is interfering with the headers 
in some way, but I'm no CGI.pm guru and wondered if anyone could shed light on this.

thanks,
Neil
-- 
  School of Molecular and Microbial Sciences
  University of Queensland
  Brisbane 4072 Australia

http://nsaunders.wordpress.com

From n.saunders at uq.edu.au  Wed Feb 28 05:30:31 2007
From: n.saunders at uq.edu.au (Neil Saunders)
Date: Wed, 28 Feb 2007 20:30:31 +1000
Subject: [Bioperl-l] more on Bio::Factory::EMBOSS, CGI and taint
Message-ID: <45E559C7.1090308@uq.edu.au>

Further to my previous email, adding:

BEGIN {
     $|=1;
     print "Content-type: text/html\n\n";
     use CGI::Carp('fatalsToBrowser');
}

to my CGI script generates:

Insecure $ENV{PATH} while running with -T switch at 
/usr/local/share/perl/5.8.7/Bio/Factory/EMBOSS.pm line 251.


Neil
-- 
  School of Molecular and Microbial Sciences
  University of Queensland
  Brisbane 4072 Australia

http://nsaunders.wordpress.com

From n.saunders at uq.edu.au  Wed Feb 28 05:50:58 2007
From: n.saunders at uq.edu.au (Neil Saunders)
Date: Wed, 28 Feb 2007 20:50:58 +1000
Subject: [Bioperl-l] CGI taint solved
Message-ID: <45E55E92.10608@uq.edu.au>

Apologies for running a one-man thread, but I realised that I've now answered my 
own question regarding errors with CGI, Bio::Factory::EMBOSS and taint.

Given that the EMBOSS binaries are in /usr/local/bin, adding:

$ENV{'PATH'} = '/usr/local/bin'

near the top of the script does the trick.


Neil
-- 
  School of Molecular and Microbial Sciences
  University of Queensland
  Brisbane 4072 Australia

http://nsaunders.wordpress.com


From cjfields at uiuc.edu  Wed Feb 28 08:39:24 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 28 Feb 2007 07:39:24 -0600
Subject: [Bioperl-l] CGI taint solved
In-Reply-To: <45E55E92.10608@uq.edu.au>
References: <45E55E92.10608@uq.edu.au>
Message-ID: <E4959877-011C-4F2C-ABA2-20D3876F6B4C@uiuc.edu>

That could possibly clobber any other program calls from within the  
same script (unless they reside in /usr/local/bin) since you're  
explicitly assigning PATH, not appending:

$ENV{"PATH"} = '/usr/local/bin';

gets me (printing $ENV{"PATH"}):

/usr/local/bin

whereas this:

$ENV{"PATH"} = '/usr/local/bin:' . $ENV{"PATH"};

gets me:

/usr/local/bin:/Users/cjfields/bin:/Users/cjfields/dart/bin:/usr/ 
local/mysql/bin:/usr/local/sbin:/bin:/sbin:/usr/bin:/usr/sbin

There's probably a File::* module that does this safely per OS flavor.
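A minimal sketch of the taint-mode pattern from the perlsec documentation. One caveat: under -T every %ENV value starts out tainted, and concatenation propagates taint, so a PATH built by prepending to the existing $ENV{PATH} stays tainted and would still trigger the "Insecure $ENV{PATH}" error. The PATH has to be assembled from trusted literals instead. The directory list is an assumption; $Config{path_sep} supplies the separator for the current OS.

```perl
use strict;
use warnings;
use Config;

# Intended for scripts run under taint mode (perl -T). The directories
# are assumptions; list wherever your EMBOSS binaries actually live.
my @trusted_dirs = ('/usr/local/bin', '/usr/bin', '/bin');

# Build PATH only from literal, untainted strings; anything derived
# from the incoming $ENV{PATH} would remain tainted under -T.
$ENV{PATH} = join $Config{path_sep}, grep { -d $_ } @trusted_dirs;

# perlsec also recommends removing variables that change how a
# subshell behaves.
delete @ENV{qw(IFS CDPATH ENV BASH_ENV)};

print "PATH is now: $ENV{PATH}\n";
```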

chris

On Feb 28, 2007, at 4:50 AM, Neil Saunders wrote:

> Apologies for running a one-man thread, but I realised that I've  
> now answered my
> own question regarding errors with CGI, Bio::Factory::EMBOSS and  
> taint.
>
> Given that the EMBOSS binaries are in /usr/local/bin, adding:
>
> $ENV{'PATH'} = '/usr/local/bin'
>
> near the top of the script does the trick.
>
>
> Neil
> -- 
>   School of Molecular and Microbial Sciences
>   University of Queensland
>   Brisbane 4072 Australia
>
> http://nsaunders.wordpress.com
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From stefan.kirov at bms.com  Wed Feb 28 10:35:31 2007
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Wed, 28 Feb 2007 10:35:31 -0500
Subject: [Bioperl-l] CGI taint solved
In-Reply-To: <E4959877-011C-4F2C-ABA2-20D3876F6B4C@uiuc.edu>
References: <45E55E92.10608@uq.edu.au>
	<E4959877-011C-4F2C-ABA2-20D3876F6B4C@uiuc.edu>
Message-ID: <45E5A143.3080303@bms.com>

Neil, I believe this is your situation:
http://wn.cyberwerks.com/2000/0411.html
My advice: any commands executed from within a CGI script should have a
path hardcoded whenever possible.
If those commands require a different PATH, try writing a wrapper shell
script that sets the environment (which should be reset to the default
once the shell script terminates). It also depends on the type of
environment you have: if it is not secure, you may wish to think hard about how
to eliminate all security loopholes with CGI. I am definitely not an
expert on this.
Stefan

Chris Fields wrote:
> That could possibly clobber any other program calls from within the  
> same script (unless they reside in /usr/local/bin) since you're  
> explicitly assigning PATH, not appending:
>
> $ENV{"PATH"} = '/usr/local/bin';
>
> gets me (printing $ENV{"PATH"}):
>
> /usr/local/bin
>
> whereas this:
>
> $ENV{"PATH"} = '/usr/local/bin:' . $ENV{"PATH"};
>
> gets me:
>
> /usr/local/bin:/Users/cjfields/bin:/Users/cjfields/dart/bin:/usr/ 
> local/mysql/bin:/usr/local/sbin:/bin:/sbin:/usr/bin:/usr/sbin
>
> There's probably a File::* module that does this safely per OS flavor.
>
> chris
>
> On Feb 28, 2007, at 4:50 AM, Neil Saunders wrote:
>
>   
>> Apologies for running a one-man thread, but I realised that I've  
>> now answered my
>> own question regarding errors with CGI, Bio::Factory::EMBOSS and  
>> taint.
>>
>> Given that the EMBOSS binaries are in /usr/local/bin, adding:
>>
>> $ENV{'PATH'} = '/usr/local/bin'
>>
>> near the top of the script does the trick.
>>
>>
>> Neil
>> -- 
>>   School of Molecular and Microbial Sciences
>>   University of Queensland
>>   Brisbane 4072 Australia
>>
>> http://nsaunders.wordpress.com
>>
>>     
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>
>
>   


From lubapardo at gmail.com  Wed Feb 28 12:21:07 2007
From: lubapardo at gmail.com (Luba Pardo)
Date: Wed, 28 Feb 2007 18:21:07 +0100
Subject: [Bioperl-l] retrieven ids
Message-ID: <58ff33550702280921s25f749aagd6e62b1a5cc34edb@mail.gmail.com>

Hi everyone,
I wonder if someone could give some advice on the following:
I want to retrieve the DNA coding sequence of a RefSeq protein ID. I do not
want to back-translate the protein to DNA, but rather get the ID of the DNA
coding sequence and then retrieve the DNA sequence from GenBank. Is there any
module that allows me to get all possible IDs for a sequence, given a protein
GI?

Thank you very much in advance,
L. Pardo

From johnston at biochem.ucl.ac.uk  Wed Feb 28 12:05:49 2007
From: johnston at biochem.ucl.ac.uk (Caroline Johnston)
Date: Wed, 28 Feb 2007 17:05:49 +0000 (GMT)
Subject: [Bioperl-l] _rearrange
Message-ID: <Pine.LNX.4.58.0702281624380.8207@localhost.localdomain>

hi,

Is there a discussion of the rationale behind the _rearrange method
somewhere? I'm probably just being gormless, but I think I'm missing the
point a bit.

Is it okay for a method just to expect named params like
->foo(arg1=>'stuff', arg2=>'things'); ?

Cxx


From ckuanglim at yahoo.com  Wed Feb 28 10:51:50 2007
From: ckuanglim at yahoo.com (Chan Kuang Lim)
Date: Wed, 28 Feb 2007 07:51:50 -0800 (PST)
Subject: [Bioperl-l] Problem of Installing Bioperl
Message-ID: <459942.77644.qm@web60518.mail.yahoo.com>

I have a problem installing bioperl on Windows using the command-line installation.
In the cmd window, after running
ppm-shell
search bioperl
install 2

many downloads completed, but the next line is:
Unpacking bioperl-1.5.2_100...ppm install failed: Can't extract files from C:.............../Bioperl-1.5.2_100.tgz


Hope you can answer my question. Thank you.

Regards,
Chan Kuang Lim
Malaysia

 

From cjfields at uiuc.edu  Wed Feb 28 13:30:45 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 28 Feb 2007 12:30:45 -0600
Subject: [Bioperl-l] _rearrange
In-Reply-To: <Pine.LNX.4.58.0702281624380.8207@localhost.localdomain>
References: <Pine.LNX.4.58.0702281624380.8207@localhost.localdomain>
Message-ID: <25C736A2-2DCA-413B-8F92-D799F583515B@uiuc.edu>

From what I gather it's a convenient utility method that is used for
consistent and enforced parameter checking/setting for any method,
including the constructor.

There are a few modules that don't use _rearrange (Bio::WebAgent::new()
comes to mind).  It's not required that you use it but the naming
conventions for parameters outlined in _rearrange (in  
Bio::Root::RootI POD) are generally enforced for consistency across  
classes.

As a note, Sendu has committed a related method (_set_from_args) to  
CVS which works rather well, but I don't think it is in the last  
release.

chris

On Feb 28, 2007, at 11:05 AM, Caroline Johnston wrote:

> hi,
>
> Is there a discussion of the rationale behind the _rearrange method
> somewhere? I'm probably just being gormless, but I think I'm  
> missing the
> point a bit.
>
> Is it okay for a method just to expect named params like
> ->foo(arg1=>'stuff', arg2=>'things'); ?
>
> Cxx
>


From dmessina at wustl.edu  Wed Feb 28 14:31:29 2007
From: dmessina at wustl.edu (Dave Messina)
Date: Wed, 28 Feb 2007 13:31:29 -0600 (CST)
Subject: [Bioperl-l] retrieven ids
In-Reply-To: <58ff33550702280921s25f749aagd6e62b1a5cc34edb@mail.gmail.com>
References: <58ff33550702280921s25f749aagd6e62b1a5cc34edb@mail.gmail.com>
Message-ID: <57419.10.0.7.57.1172691089.squirrel@gscmail.wustl.edu>

Whenever I'm unsure of how to do something, I first look to see if one of
the HOWTOs on bioperl.org covers it. In this case, the Features HOWTO has
example code which I think will do what you want.

GenBank records typically have the coding sequence of a protein as a
feature, so I would do something like:

- use the RefSeq protein IDs to query Entrez and get back the GenBank
records.

- read the Features HOWTO to refresh my memory on the syntax for grabbing
features.

That HOWTO is at:
http://www.bioperl.org/wiki/HOWTO:Feature-Annotation

- whip up a little script to loop through the GenBank records one at a
time with SeqIO and pull out the cDNA sequence features.
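A minimal sketch of the last step, assuming the records fetched are GenPept (protein) entries whose CDS feature carries a /coded_by qualifier pointing at the nucleotide record, as RefSeq protein records usually do. In BioPerl that value should be reachable through the CDS feature's get_tag_values('coded_by'); the parsing below is plain Perl. The helper name parse_coded_by and the accession in the example are hypothetical, and only simple ranges are handled.

```perl
use strict;
use warnings;

# Hypothetical helper: split a /coded_by value such as
# "NM_000000.1:101..400" or "complement(NM_000000.1:101..400)" into
# (accession, start, end, strand). Joined (multi-segment) locations
# are not handled and return an empty list.
sub parse_coded_by {
    my ($loc) = @_;
    my $strand = 1;
    if ($loc =~ /^complement\((.*)\)$/) {
        ($loc, $strand) = ($1, -1);
    }
    # "<" and ">" mark partial CDS ends; they are accepted and dropped
    return unless $loc =~ /^([^:(]+):<?(\d+)\.\.>?(\d+)$/;
    return ($1, $2, $3, $strand);
}

my ($acc, $start, $end, $strand) = parse_coded_by('NM_000000.1:101..400');
print "$acc $start-$end strand $strand\n";   # NM_000000.1 101-400 strand 1
```

The accession and coordinates could then be used to fetch the nucleotide record and extract the coding region.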


Dave


From bix at sendu.me.uk  Wed Feb 28 14:38:46 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 28 Feb 2007 19:38:46 +0000
Subject: [Bioperl-l] _rearrange
In-Reply-To: <Pine.LNX.4.58.0702281624380.8207@localhost.localdomain>
References: <Pine.LNX.4.58.0702281624380.8207@localhost.localdomain>
Message-ID: <45E5DA46.3020503@sendu.me.uk>

Caroline Johnston wrote:
> hi,
> 
> Is there a discussion of the rationale behind the _rearrange method
> somewhere? I'm probably just being gormless, but I think I'm missing the
> point a bit.
> 
> Is it okay for a method just to expect named params like
> ->foo(arg1=>'stuff', arg2=>'things'); ?

The Bioperl style for named args is -arg1, and wrong case is allowed as 
well. So, make use of _rearrange; it won't do you any harm.
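To make the convention concrete, here is a simplified, hypothetical re-implementation of the idea, not BioPerl's actual code: dash-prefixed named arguments are matched case-insensitively and returned in the order the caller asks for, while arguments that do not start with a dash are passed through as positional. In BioPerl itself the call typically looks like my ($file, $format) = $self->_rearrange([qw(FILE FORMAT)], @args);

```perl
use strict;
use warnings;

# Simplified sketch of the _rearrange idea (hypothetical name).
# $order is an array ref of parameter names in the desired order.
sub rearrange {
    my ($order, @args) = @_;
    # Arguments that do not start with a dash are treated as positional
    return @args unless @args && $args[0] =~ /^-/;
    my %given;
    while (my ($key, $val) = splice @args, 0, 2) {
        $key =~ s/^-//;
        $given{ uc $key } = $val;    # case-insensitive matching
    }
    return @given{ map { uc($_) } @$order };
}

my ($file, $format) = rearrange([qw(FILE FORMAT)], -format => 'fasta', -File => 'seqs.fa');
print "$file $format\n";    # seqs.fa fasta
```

This also illustrates the answer to the dashless question: ->foo(arg1 => 'stuff') would be treated as positional arguments rather than named ones under this convention.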

From johnsonm at gmail.com  Wed Feb 28 14:59:09 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Wed, 28 Feb 2007 13:59:09 -0600
Subject: [Bioperl-l] Issues with new Bio::Tools::Run modules for Genemark
	and Glimmer
Message-ID: <ebf5eb170702281159y7dcedd88n7bdef2c8bb9d3288@mail.gmail.com>

    I happen to need something like Bio::Tools::Run::Genemark, so I'm coding
one up.  When I started on the tests for it, I realized I have a problem.  I
can distribute a fasta file downloaded from GenBank to use as input, but I
can't distribute the model file needed to actually run Genemark
(Genemark.hmm for prokaryotes, gmhmmp, in my case).
    It took *forever* to get a license, and I'm not thrilled with the
prospect of talking them out of a redistributable model file.  I'd love to
distribute the test, but I don't see how I'm going to be able to.
Suggestions?
    Also, I've settled on IPC::Run instead of system().  The docs indicate
the bits of it I'm using should be OK on Windows, except maybe for Win9X.
I don't want to clutter up the console, I don't like embedding stdout/stderr
redirection in command strings, and I don't want to have to worry about
signal handling (What if the child catches a ctrl-c halfway through
parsing?  What if the parent does?).  Anybody object to that?
   One final thing.  I'm lazy, I don't want to deal with parsing arguments
to the constructor, so I'm just calling _rearrange() to deal with it.  The
Bio::Tools:: parsers all take dash options, but it looks like a bunch of the
stuff in Bio::Tools::Run:: takes dashless args.  Objections?

From dmessina at wustl.edu  Wed Feb 28 15:14:56 2007
From: dmessina at wustl.edu (Dave Messina)
Date: Wed, 28 Feb 2007 14:14:56 -0600 (CST)
Subject: [Bioperl-l] Issues with new Bio::Tools::Run modules for
 Genemark and Glimmer
In-Reply-To: <ebf5eb170702281159y7dcedd88n7bdef2c8bb9d3288@mail.gmail.com>
References: <ebf5eb170702281159y7dcedd88n7bdef2c8bb9d3288@mail.gmail.com>
Message-ID: <57538.10.0.7.57.1172693696.squirrel@gscmail.wustl.edu>

> I'm not thrilled with the prospect of talking them out of a
> redistributable model file.

I suppose it's not possible to fake your own, or at least the parts of it
you're testing for?

If not, I'd put the tests in a skip block while waiting to hear from the
Genemark folks.


> The Bio::Tools:: parsers all take dash options, but it looks like a
> bunch of the stuff in Bio::Tools::Run:: takes dashless args.  Objections?

Sendu will chime in I'm sure, but I think he was planning to switch
everything in Bio::Tools::Run over to dashed args anyway...


Dave


From bix at sendu.me.uk  Wed Feb 28 15:52:23 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 28 Feb 2007 20:52:23 +0000
Subject: [Bioperl-l] Issues with new Bio::Tools::Run modules for
 Genemark and Glimmer
In-Reply-To: <ebf5eb170702281159y7dcedd88n7bdef2c8bb9d3288@mail.gmail.com>
References: <ebf5eb170702281159y7dcedd88n7bdef2c8bb9d3288@mail.gmail.com>
Message-ID: <45E5EB87.9020106@sendu.me.uk>

Mark Johnson wrote:
>    One final thing.  I'm lazy, I don't want to deal with parsing arguments
> to the constructor, so I'm just calling _rearrange() to deal with it.  The
> Bio::Tools:: parsers all take dash options, but it looks like a bunch of the
> stuff in Bio::Tools::Run:: takes dashless args.  Objections?

You can make use of _set_from_args(). See Bio::Tools::Run::Phylo::Gumby 
for an example.


From bix at sendu.me.uk  Wed Feb 28 16:29:32 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 28 Feb 2007 21:29:32 +0000
Subject: [Bioperl-l] SVG output with Bio::Graphics::Panel fails
Message-ID: <45E5F43C.9080902@sendu.me.uk>

I have GD 2.35 and GD::SVG 2.33 installed.

I have a working script in which a Bio::Graphics::Panel object is made 
and output with:

print $panel->png;

This is fine. Changing it to:

print $panel->svg;

Gives the error:

Can't locate object method "svg" via package "GD:Image" at 
/.../Bio/Graphics/Panel.pm line 971, <DATA> line 192.


Am I supposed to do something else to get this to work?


Cheers,
Sendu.

From crabtree at tigr.ORG  Wed Feb 28 16:40:52 2007
From: crabtree at tigr.ORG (Jonathan Crabtree)
Date: Wed, 28 Feb 2007 16:40:52 -0500
Subject: [Bioperl-l] SVG output with Bio::Graphics::Panel fails
In-Reply-To: <45E5F43C.9080902@sendu.me.uk>
References: <45E5F43C.9080902@sendu.me.uk>
Message-ID: <45E5F6E4.80003@tigr.org>


Sendu-

I believe you must set 'image_class' to 'GD::SVG' when you create the 
Panel (and note that older versions of Bio::Graphics::Panel don't know 
anything about this parameter.)  Here's the relevant part of the Panel 
perldoc:

   -image_class To create output in scalable vector
                graphics (SVG), optionally pass the image
                class parameter 'GD::SVG'. Defaults to
                using vanilla GD. See the corresponding
                image_class() method below for details.

Jonathan


Sendu Bala wrote:
> I have GD 2.35 and GD::SVG 2.33 installed.
> 
> I have a working script in which a Bio::Graphics::Panel object is made 
> and output with:
> 
> print $panel->png;
> 
> This is fine. Changing it to:
> 
> print $panel->svg;
> 
> Gives the error:
> 
> Can't locate object method "svg" via package "GD:Image" at 
> /.../Bio/Graphics/Panel.pm line 971, <DATA> line 192.
> 
> 
> Am I supposed to do something else to get this to work?
> 
> 
> Cheers,
> Sendu.

From bix at sendu.me.uk  Wed Feb 28 17:01:17 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 28 Feb 2007 22:01:17 +0000
Subject: [Bioperl-l] SVG output with Bio::Graphics::Panel fails
In-Reply-To: <45E5F6E4.80003@tigr.org>
References: <45E5F43C.9080902@sendu.me.uk> <45E5F6E4.80003@tigr.org>
Message-ID: <45E5FBAD.3030404@sendu.me.uk>

Jonathan Crabtree wrote:
> 
> Sendu-
> 
> I believe you must set 'image_class' to 'GD::SVG' when you create the 
> Panel (and note that older versions of Bio::Graphics::Panel don't know 
> anything about this parameter.)  Here's the relevant part of the Panel 
> perldoc:

... Oh! I had no idea there was any perldoc for these modules, hiding 
down there at the bottom. Does anyone want to intersperse the docs?...

From cjfields at uiuc.edu  Wed Feb 28 17:10:54 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 28 Feb 2007 16:10:54 -0600
Subject: [Bioperl-l] Issues with new Bio::Tools::Run modules for
	Genemark and Glimmer
In-Reply-To: <ebf5eb170702281159y7dcedd88n7bdef2c8bb9d3288@mail.gmail.com>
References: <ebf5eb170702281159y7dcedd88n7bdef2c8bb9d3288@mail.gmail.com>
Message-ID: <894754FB-2B20-415F-A6FC-7ACE839C6540@uiuc.edu>

On Feb 28, 2007, at 1:59 PM, Mark Johnson wrote:

>     I happen to need something like Bio::Tools::Run::Genemark, so  
> I'm coding
> one up.  When I started on the tests for it, I realized I have a  
> problem.  I
> can distribute a fasta file downloaded from GenBank to use as  
> input, but I
> can't distribute the model file needed to actually run Genemark (
> Genemark.hmm for prokaryotes, gmhmmp, in my case).
>     It took *forever* to get a license, and I'm not thrilled with the
> prospect of talking them out of a redistributable model file.  I'd  
> love to
> distribute the test, but I don't see how I'm going to be able to.
> Suggestions?

For bioperl-run tests you have to have the program installed for  
tests to work (otherwise they are passed over).  Therefore one would  
assume if you had the GeneMark program you would have the models as  
well.

You could set up your module to require an env. variable be set (like  
the HMMER module, for instance) which contains the executables and/or  
the models, so that if it isn't set the tests are skipped.
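A minimal sketch of that pattern using Test::More's SKIP block. The environment variable GENEMARK_DIR and the file name model.hmm are hypothetical names chosen for this sketch; the real wrapper would document its own.

```perl
use strict;
use warnings;
use File::Spec;
use Test::More tests => 1;

# Return the path to a readable model file, or undef if the
# (hypothetical) GENEMARK_DIR variable is unset or the file is missing.
sub model_path {
    my $dir = $ENV{GENEMARK_DIR};
    return undef unless defined $dir;
    my $path = File::Spec->catfile($dir, 'model.hmm');
    return -r $path ? $path : undef;
}

my $model = model_path();

SKIP: {
    # Without a licensed model file, skip rather than fail
    skip 'GENEMARK_DIR not set or model file not readable', 1
        unless defined $model;
    ok(-s $model, 'GeneMark model file is present and non-empty');
}
```

Users without a GeneMark licence then see the test reported as skipped, which matches how the other bioperl-run tests treat missing programs.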

>     Also, I've settled on IPC::Run instead of system().  The docs  
> indicate
> the bits of it I'm using should be OK on Windows, except maybe for  
> Win9X.
> I don't want to clutter up the console, I don't like embedding  
> stdout/stderr
> redirection in command strings, and I don't want to have to worry  
> about
> signal handling (What if the child catches a ctrl-c halfway through
> parsing?  What if the parent does?).  Anybody object to that?

I wouldn't worry too much about Win9x.  Is IPC::Run in perl core?   
Otherwise we'll need to add it to the optional dependencies for  
bioperl-run.

>    One final thing.  I'm lazy, I don't want to deal with parsing  
> arguments
> to the constructor, so I'm just calling _rearrange() to deal with  
> it.  The
> Bio::Tools:: parsers all take dash options, but it looks like a  
> bunch of the
> stuff in Bio::Tools::Run:: takes dashless args.  Objections?

Sendu's suggestion (_set_from_args() ) is the best.  As mentioned in  
another thread _rearrange() works as well.

chris

From johnsonm at gmail.com  Wed Feb 28 17:29:36 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Wed, 28 Feb 2007 16:29:36 -0600
Subject: [Bioperl-l] Issues with new Bio::Tools::Run modules for
	Genemark and Glimmer
In-Reply-To: <57538.10.0.7.57.1172693696.squirrel@gscmail.wustl.edu>
References: <ebf5eb170702281159y7dcedd88n7bdef2c8bb9d3288@mail.gmail.com>
	<57538.10.0.7.57.1172693696.squirrel@gscmail.wustl.edu>
Message-ID: <ebf5eb170702281429u51e8f7fgb9c0591a410500f8@mail.gmail.com>

On 2/28/07, Dave Messina <dmessina at wustl.edu> wrote:
>
> > I'm not thrilled with the prospect of talking them out of a
> redistributable model file.
>
> I suppose it's not possible to fake your own, or at least the parts of it
> you're testing for?


We got a gzipped tarball with some model files and a precompiled executable
(gmhmmp).  As far as building a model file goes, I don't even have two
sticks to rub together.  I suppose it's possible that it's not actually some
weird proprietary format, I'll go dig for some docs...but I don't hold out a
lot of hope.

From sukhinder.sandhu at osumc.edu  Wed Feb 28 16:49:31 2007
From: sukhinder.sandhu at osumc.edu (Sukhinder Sandhu)
Date: Wed, 28 Feb 2007 16:49:31 -0500
Subject: [Bioperl-l] Problem installing bioperl: plz reply soon. thx
Message-ID: <C20B631B.1E0%sukhinder.sandhu@osumc.edu>

Hi
I am having trouble installing Bundle::BioPerl through CPAN. I don't know if
this has something to do with my having root privileges. Can you please
suggest how I may proceed to get past this? I would really appreciate any
help. I am pasting part of the error it keeps giving after trying to install
every module.
######################
CPAN.pm: Going to build G/GA/GAAS/HTML-Parser-3.56.tar.gz

make: *** No rule to make target
`/System/Library/Perl/5.8.6/darwin-thread-multi-2level/CORE/config.h',
needed by `Makefile'.  Stop.
  /usr/bin/make  -- NOT OK
Running make test
  Can't test without successful make
Running make install
  make had returned bad status, install seems impossible

###############################
Thanks

sukhinder


From sukhinder.sandhu at osumc.edu  Tue Feb 27 23:41:43 2007
From: sukhinder.sandhu at osumc.edu (Sukhinder Sandhu)
Date: Tue, 27 Feb 2007 23:41:43 -0500
Subject: [Bioperl-l] Problem installing bioperl-1.5.2_102
Message-ID: <C20A7237.1DB%sukhinder.sandhu@osumc.edu>

Hi
I am trying to install bioperl on my Mac OS X machine and am having problems. I tried
following the instructions both at www.tc.umn.edu..... and in the README
and INSTALL files in the bioperl folder that I downloaded.
The error I get is the following: (end part of the output is copied)
####################
t/versions........ok
t/xs..............skipped
        all skipped: C_support not enabled
Failed Test Stat Wstat Total Fail  Failed  List of Failed
-------------------------------------------------------------------------------
t/compat.t     5  1280    60    5   8.33%  25-28 31
4 tests and 31 subtests skipped.
Failed 1/22 test scripts, 95.45% okay. 5/683 subtests failed, 99.27% okay.
make: *** [test] Error 2
  /usr/bin/make test -- NOT OK
Running make install
  make test had returned bad status, won't install without force
Couldn't install Module::Build, giving up.
BEGIN failed--compilation aborted at ModuleBuildBioperl.pm line 51.
Compilation failed in require at Build.PL line 14.
BEGIN failed--compilation aborted at Build.PL line 14.
###########################################################################
I am not able to figure out what's going wrong.
And when I try to run CPAN, I get the following error. I have no idea how
to fix these. Any help is greatly appreciated.
############################################################################
[Sukhinders-Computer:~/Desktop/bioperl-1.5.2_102] sand60% perl -MCPAN -e shell
Terminal does not support AddHistory.

There seems to be running another CPAN process (pid 7207).  Contacting...
Lockfile /Users/sand60/.cpan/.lock not writeable by you. Cannot proceed.
    On UNIX try:
    rm /Users/sand60/.cpan/.lock
  and then rerun us.
 at -e line 1
###################################################
And doing what it says, removing the lock file, doesn't help. I am wondering
if all this has something to do with having root privileges on the system
and if so, is there an alternative? Thanks


sukhinder


From stefan.kirov at bms.com  Wed Feb 28 16:44:05 2007
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Wed, 28 Feb 2007 16:44:05 -0500
Subject: [Bioperl-l] SVG output with Bio::Graphics::Panel fails
In-Reply-To: <45E5F43C.9080902@sendu.me.uk>
References: <45E5F43C.9080902@sendu.me.uk>
Message-ID: <45E5F7A5.3090805@bms.com>

I think you should create the object with -image_class='svg'. Can you 
post the code with which you create the object?
Stefan

Sendu Bala wrote:
> I have GD 2.35 and GD::SVG 2.33 installed.
>
> I have a working script in which a Bio::Graphics::Panel object is made 
> and output with:
>
> print $panel->png;
>
> This is fine. Changing it to:
>
> print $panel->svg;
>
> Gives the error:
>
> Can't locate object method "svg" via package "GD:Image" at 
> /.../Bio/Graphics/Panel.pm line 971, <DATA> line 192.
>
>
> Am I supposed to do something else to get this to work?
>
>
> Cheers,
> Sendu.
>
>   


From johnsonm at gmail.com  Wed Feb 28 17:54:02 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Wed, 28 Feb 2007 16:54:02 -0600
Subject: [Bioperl-l] Issues with new Bio::Tools::Run modules for
	Genemark and Glimmer
In-Reply-To: <894754FB-2B20-415F-A6FC-7ACE839C6540@uiuc.edu>
References: <ebf5eb170702281159y7dcedd88n7bdef2c8bb9d3288@mail.gmail.com>
	<894754FB-2B20-415F-A6FC-7ACE839C6540@uiuc.edu>
Message-ID: <ebf5eb170702281454i52cea9eeqbcaebee5e4ec5e0e@mail.gmail.com>

On 2/28/07, Chris Fields <cjfields at uiuc.edu> wrote:

> For bioperl-run tests you have to have the program installed for
> tests to work (otherwise they are passed over).  Therefore one would
> assume if you had the GeneMark program you would have the models as
> well.
>
> You could set up your module to require an env. variable be set (like
> the HMMER module, for instance) which contains the executables and/or
> the models, so that if it isn't set the tests are skipped.


Sounds like a plan.

> I wouldn't worry too much about Win9x.  Is IPC::Run in perl core?
> Otherwise we'll need to add it to the optional dependencies for
> bioperl-run.


I'd test it, but I don't have access to any Win9x boxes anymore.  IPC::Run
is not a core module, but I think it's worth the dependency.  I considered
IPC::Open3, but it can't be made reliable on Win32, something about not
being able to select() on filehandles, only sockets.  I also looked at
IPC::Run3, but under the hood, it's just got STDOUT/STDERR redirection
layered on top of system().  I don't like using system() due to issues with
signals (Such as the user hitting ctrl-c and taking out the child).  I feel
better knowing the wrapped executable is in another process disconnected
from the console.

> Sendu's suggestion (_set_from_args() ) is the best.  As mentioned in
> another thread _rearrange() works as well.


I'm using _rearrange() now.  I'll look at _set_from_args().  Is either one
preferred to the other?

From bix at sendu.me.uk  Wed Feb 28 19:13:29 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 01 Mar 2007 00:13:29 +0000
Subject: [Bioperl-l] Issues with new Bio::Tools::Run modules
 for	Genemark and Glimmer
In-Reply-To: <ebf5eb170702281454i52cea9eeqbcaebee5e4ec5e0e@mail.gmail.com>
References: <ebf5eb170702281159y7dcedd88n7bdef2c8bb9d3288@mail.gmail.com>	<894754FB-2B20-415F-A6FC-7ACE839C6540@uiuc.edu>
	<ebf5eb170702281454i52cea9eeqbcaebee5e4ec5e0e@mail.gmail.com>
Message-ID: <45E61AA9.9030906@sendu.me.uk>

Mark Johnson wrote:
> I'm using _rearrange() now.  I'll look at _set_from_args().  Is either one
> preferred to the other?

_set_from_args() is implemented using _rearrange() iirc. In any case, 
they do different things but _set_from_args() just makes creating 
wrapper modules a lot simpler. Another example: compare revisions 1.15 
and 1.16 of Bio::Tools::Run::Alignment::Lagan where I reimplemented it 
to use _set_from_args() and _setparams().

http://code.open-bio.org/cgi/viewcvs.cgi/bioperl-run/Bio/Tools/Run/Alignment/Lagan.pm.diff?r1=text&tr1=1.15&r2=text&tr2=1.16&diff_format=h

So, it's new, but I'd recommend new modules, especially wrappers, make 
use of it.

From bix at sendu.me.uk  Wed Feb 28 19:19:29 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 01 Mar 2007 00:19:29 +0000
Subject: [Bioperl-l] Problem of Installing Bioperl
In-Reply-To: <459942.77644.qm@web60518.mail.yahoo.com>
References: <459942.77644.qm@web60518.mail.yahoo.com>
Message-ID: <45E61C11.90806@sendu.me.uk>

Chan Kuang Lim wrote:
> I have problem of installing bioperl in windows using command-line installation.
> In the cmd windows, after 
> ppm-shell
> search bioperl
> install 2
> 
> many downloading had done, but the next line is:
> Unpacking bioperl-1.5.2_100...ppm install failed: Can't extract files from C:.............../Bioperl-1.5.2_100.tgz

Does that file exist on your system? Is it larger than 0kb? Can you open 
it yourself?

From cjfields at uiuc.edu  Wed Feb 28 20:19:31 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 28 Feb 2007 19:19:31 -0600
Subject: [Bioperl-l] Issues with new Bio::Tools::Run modules
	for	Genemark and Glimmer
In-Reply-To: <45E61AA9.9030906@sendu.me.uk>
References: <ebf5eb170702281159y7dcedd88n7bdef2c8bb9d3288@mail.gmail.com>	<894754FB-2B20-415F-A6FC-7ACE839C6540@uiuc.edu>
	<ebf5eb170702281454i52cea9eeqbcaebee5e4ec5e0e@mail.gmail.com>
	<45E61AA9.9030906@sendu.me.uk>
Message-ID: <93734147-BDDE-4D73-B8F1-FB4A9D073F9B@uiuc.edu>


On Feb 28, 2007, at 6:13 PM, Sendu Bala wrote:

> Mark Johnson wrote:
>> I'm using _rearrange() now.  I'll look at _set_from_args().  Is  
>> either one
>> preferred to the other?
>
> _set_from_args() is implemented using _rearrange() iirc. In any case,
> they do different things but _set_from_args() just makes creating
> wrapper modules a lot simpler. Another example: compare revisions 1.15
> and 1.16 of Bio::Tools::Run::Alignment::Lagan where I reimplemented it
> to use _set_from_args() and _setparams().
>
> http://code.open-bio.org/cgi/viewcvs.cgi/bioperl-run/Bio/Tools/Run/Alignment/Lagan.pm.diff?r1=text&tr1=1.15&r2=text&tr2=1.16&diff_format=h
>
> So, its new, but I'd recommend new modules, especially wrappers, make
> use of it.

Agreed; I think it allows for parameter variations (dashed, dashless,  
etc) and can create on-the-fly simple get/setters, so is particularly  
suited for wrappers.

_rearrange() will always have use in situations where using named  
parameters helps (long arg lists) but you don't want get/setters,  
just values.
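To make the distinction concrete, here is a simplified imitation of what _rearrange() does, written in plain Perl: it accepts "-name => value" pairs (dash optional, any case) and hands back just the values in the order requested. rearrange_args is a hypothetical helper for illustration, not BioPerl's implementation.

```perl
use strict;
use warnings;

# Hypothetical imitation of the idea behind _rearrange(): normalise the
# argument names (strip a leading dash, uppercase) and return the values
# in the caller's requested order; unknown names come back as undef.
sub rearrange_args {
    my ($order, @args) = @_;
    die "odd number of named arguments\n" if @args % 2;
    my %value_for;
    for (my $i = 0; $i < @args; $i += 2) {
        (my $key = uc $args[$i]) =~ s/^-//;
        $value_for{$key} = $args[$i + 1];
    }
    return map { $value_for{ uc $_ } } @$order;
}
```

Inside a BioPerl module the real call looks like my ($program, $quiet) = $self->_rearrange([qw(PROGRAM QUIET)], @args); which yields values only, whereas _set_from_args() goes further and can create the get/set methods for you.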

From dmessina at wustl.edu  Wed Feb 28 20:40:39 2007
From: dmessina at wustl.edu (Dave Messina)
Date: Wed, 28 Feb 2007 19:40:39 -0600 (CST)
Subject: [Bioperl-l] Problem installing bioperl-1.5.2_102
In-Reply-To: <C20A7237.1DB%sukhinder.sandhu@osumc.edu>
References: <C20A7237.1DB%sukhinder.sandhu@osumc.edu>
Message-ID: <58485.75.33.119.169.1172713239.squirrel@gscmail.wustl.edu>

> t/compat.t     5  1280    60    5   8.33%  25-28 31

This is the test that failed. I think you snipped the part above where the
actual errors causing the failure were printed.


> There seems to be running another CPAN process (pid 7207). Contacting...
> Lockfile /Users/sand60/.cpan/.lock not writeable by you. Cannot proceed.
>     On UNIX try:
>     rm /Users/sand60/.cpan/.lock
>   and then rerun us.
>  at -e line 1
> ###################################################
> And doing what it says, removing some lock file doesn't help.

Are you sure the lock file is really being removed? If so, what was the
error you got when running it after doing that?


Also, this line is important:
>  /usr/bin/make test -- NOT OK

It looks like you're trying to install on OS X. By default, OS X has perl
but not make. So /usr/bin/make probably doesn't exist on your system,
along with lots of other UNIX tools you'll want. To verify this, type:

which /usr/bin/make

on the command line. If you get:
/usr/bin/make: Command not found.

you'll need to install the OS X developer tools, called Xcode. You'll need
to register first, but you can get the latest version at:
http://developer.apple.com/tools/download/

After you do that, reread the BioPerl install docs and try to install
again. Since you don't have root on your machine, be sure to read the part
of the install instructions that describes what to do in that case.


Dave


From hlapp at gmx.net  Wed Feb 28 23:16:38 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 28 Feb 2007 23:16:38 -0500
Subject: [Bioperl-l] Issues with new Bio::Tools::Run modules for
	Genemark and Glimmer
In-Reply-To: <ebf5eb170702281454i52cea9eeqbcaebee5e4ec5e0e@mail.gmail.com>
References: <ebf5eb170702281159y7dcedd88n7bdef2c8bb9d3288@mail.gmail.com>
	<894754FB-2B20-415F-A6FC-7ACE839C6540@uiuc.edu>
	<ebf5eb170702281454i52cea9eeqbcaebee5e4ec5e0e@mail.gmail.com>
Message-ID: <EE9CB4BA-3C6C-4F38-85DB-E0A21FCD8B07@gmx.net>


On Feb 28, 2007, at 5:54 PM, Mark Johnson wrote:

> I don't like using system() due to issues with
> signals (Such as the user hitting ctrl-c and taking out the  
> child).  I feel
> better knowing the wrapped executable is in another process  
> disconnected
> from the console.

I'm not sure how the user would be able to take out the child hitting  
ctrl-c if you run it through system() (except if the parent  
terminates first - but maybe then terminating a run-away child is in  
good order).

I haven't read the IPC::Run POD in full detail but you will want to  
make sure that if the parent gets killed the child does get killed  
too, or otherwise you'll have a run-away process that novices will  
have trouble with understanding or terminating.

Other than that, though, IPC::Run seems like a useful module, so  
incurring this as a dependency should be fine.
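For comparison, a minimal sketch with core Perl's system(), assuming a Unix-like system. perlfunc notes that system() blocks SIGINT and SIGQUIT in the calling program while the child runs, so a ctrl-c typically reaches the wrapped program; the caller can then tell from $? whether the child exited normally or died from a signal and react accordingly. run_program is a hypothetical helper name, and this is not how IPC::Run works internally.

```perl
use strict;
use warnings;

# Run a command through the list form of system() (no shell involved)
# and decode $? as documented for system in perlfunc.
sub run_program {
    my @cmd = @_;
    my $rc = system { $cmd[0] } @cmd;
    return { started => 0, error => "$!" } if $rc == -1;
    return {
        started => 1,
        exit    => $? >> 8,           # program's exit value
        signal  => $? & 127,          # signal that killed it, 0 if none
        core    => $? & 128 ? 1 : 0,  # whether a core dump was produced
    };
}
```

A wrapper could, for example, die when signal is non-zero instead of silently parsing a half-written output file.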

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cuiw at ncbi.nlm.nih.gov  Thu Feb  1 09:47:38 2007
From: cuiw at ncbi.nlm.nih.gov (Cui, Wenwu (NIH/NLM/NCBI) [C])
Date: Thu, 1 Feb 2007 09:47:38 -0500
Subject: [Bioperl-l] How to get from gi/ref/gb to genomic coordinates ?
In-Reply-To: <45C1059D.1070100@tbi.univie.ac.at>
References: <45C1059D.1070100@tbi.univie.ac.at>
Message-ID: <18C407FD4FFB424292D769FBD68C1987020BB960@NIHCESMLBX8.nih.gov>

This is a simple test from gene ID 3632373 (protein is 46100068) to
contig coordinates: 

perl -MLWP::Simple -e 'for (split /\n/,
  get(q{http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=gene&id=3632373&retmode=xml}))
  { print "$_\n" if m{<(Gene-source_src[^>]*>)(.*)</\1} }'

You need to translate protein id to gene id though. 
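The same lookup as a small, reusable sketch. The element names follow the one-liner above; the pattern uses a backreference (\1), because $1 written inside a pattern interpolates the capture from the previous successful match rather than the current one. The helper names are hypothetical, and the parser only handles elements that open and close on the same line.

```perl
use strict;
use warnings;

# Build the EFetch URL used above for an Entrez Gene ID.
sub efetch_gene_url {
    my ($gene_id) = @_;
    return 'http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi'
         . "?db=gene&id=$gene_id&retmode=xml";
}

# Return [element, text] pairs for single-line <Gene-source_src...> elements;
# \1 forces the closing tag to match the opening one.
sub gene_source_fields {
    my ($xml) = @_;
    my @fields;
    for my $line (split /\n/, $xml) {
        push @fields, [ $1, $2 ]
            if $line =~ m{<(Gene-source_src[^>]*)>([^<]*)</\1>};
    }
    return @fields;
}
```

With LWP::Simple loaded, print "$_->[0]: $_->[1]\n" for gene_source_fields(get(efetch_gene_url(3632373))); should list the source fields, including the contig accession used in the Map Viewer step.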

If the genome is available at Map Viewer, try (the contig name is
NW_101115 from last step)
http://www.ncbi.nlm.nih.gov/mapview/maps.cgi?taxid=5270&gnl=NW_101115&MAPS=genes&cmd=txt

Wenwu Cui, PhD

-----Original Message-----
From: Rainer Machne [mailto:raim at tbi.univie.ac.at] 
Sent: Wednesday, January 31, 2007 4:10 PM
To: bioperl-l at lists.open-bio.org
Subject: [Bioperl-l] How to get from gi/ref/gb to genomic coordinates ?

Dear Bioperl list,

Hoping I'm not on the wrong email list, I have a short question:

Is there a standard way, or are there nice (Bioperl) tools, to get from a
gene id (gi) or other ids (see below) to the genomic coordinates of the
respective gene?

We have Fasta files retrieved from NCBI protein Blast in fungal genomes:

 >gi|46100068|gb|EAK85301.1| hypothetical protein UM04252.1 [Ustilago maydis 521]
or
 >gi|50292953|ref|XP_448909.1| unnamed protein product [Candida glabrata]

(we only have gi, ref and gb in my set).

I retrieved all my fasta files from whole fungal genomes with available 
protein sequences at
http://www.ncbi.nlm.nih.gov/sutils/genom_table.cgi?organism=fungi

As I only searched whole finished genomes (not shotgun), I thought it 
would then be easy to get the genomic coordinates and retrieve upstream 
sequences, but we have failed so far to find a consistent way to do this
automatically. Many of the gi entries refer to mRNAs or partial mRNAs 
and the way to the coordinates seems to differ for each case.

Any suggestions would be appreciated.

with kind regards,
Rainer Machne

University of Vienna
Department for Theoretical Chemistry
Theoretical Biochemistry Group
_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l


From raim at tbi.univie.ac.at  Thu Feb  1 07:54:21 2007
From: raim at tbi.univie.ac.at (Rainer Machne)
Date: Thu, 01 Feb 2007 13:54:21 +0100
Subject: [Bioperl-l] How to get from gi/ref/gb to genomic coordinates ?
In-Reply-To: <CB6F853A-B5BF-43FE-B26B-C203724677A3@genetics.utah.edu>
References: <45C1059D.1070100@tbi.univie.ac.at>
	<CB6F853A-B5BF-43FE-B26B-C203724677A3@genetics.utah.edu>
Message-ID: <45C1E2FD.3070709@tbi.univie.ac.at>

Barry and Jason,

thanks for your quick and very helpful replies.

I guess we should have done (or should repeat) our blast search at 
http://fungal.genome.duke.edu/
to get better mapping from proteins to genomes ?

As I retrieved all my proteins via whole genome blasts we should find 
(most of) them in the genbank files ... a good opportunity for me to 
learn some Bioperl and the other packages you mentioned in case we want 
to do more complex analysis later :-)

Thank you very much!

Rainer


Barry Moore wrote:
> Rainer,
> 
> We use a perl library called CGL written by Mark Yandell and colleagues
> (which in turn uses Chris Mungall's BioChaos and Unflattener.pm referred
> to by Jason) for this type of task.  The basic pipeline is: convert
> GenBank files to Chaos XML, then use CGL with those XML files to get
> nice object-oriented access to exons, transcripts, proteins,
> coordinates and more for those genes.  I am currently using this
> with good success on most GenBank genomes (unfortunately I haven't been
> working with the fungal genomes, but it should work fine).  The Ensembl
> API provides similar functionality for Ensembl genomes - but not very
> many fungi there.
> 
> http://www.yandell-lab.org/cgl/
> http://www.ensembl.org/info/software/core/core_tutorial.html
> 
> Feel free to contact Mark or myself directly if you are interested in
> using CGL.
> 
> Barry
> 
> On Jan 31, 2007, at 2:09 PM, Rainer Machne wrote:
> 
>> Dear Bioperl list,
>>
>> hoping not be on the wrong email list, i would have a short question:
>>
>> Is there a standard way or are there nice (Bioperl) tools to come from a
>> gene id (gi) other ids (see below) to the genomic coordinates of the
>> respective gene?
>>
>> We have Fasta files retrieved from NCBI protein Blast in fungal genomes:
>>
>>  >gi|46100068|gb|EAK85301.1| hypothetical protein UM04252.1 [Ustilago maydis 521]
>> or
>>  >gi|50292953|ref|XP_448909.1| unnamed protein product [Candida glabrata]
>>
>> (we only have gi, ref and gb in my set).
>>
>> I retrieved all my fasta files from whole fungal genomes with available
>> protein sequences at
>> http://www.ncbi.nlm.nih.gov/sutils/genom_table.cgi?organism=fungi
>>
>> As I only searched whole finished genomes (not shotgun), I thought it
>> would then be easy to get the genomic coordinates and retrieve upstream
>> sequences, but we have failed so far to find a consistent way to do this
>> automatically. Many of the gi entries refer to mRNAs or partial mRNAs
>> and the way to the coordinates seems to differ for each case.
>>
>> Any suggestions would be appreciated.
>>
>> with kind regards,
>> Rainer Machne
>>
>> University of Vienna
>> Department for Theoretical Chemistry
>> Theoretical Biochemistry Group
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> 


From cjfields at uiuc.edu  Thu Feb  1 12:55:27 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 1 Feb 2007 11:55:27 -0600
Subject: [Bioperl-l] How to get from gi/ref/gb to genomic coordinates ?
In-Reply-To: <45C1E2FD.3070709@tbi.univie.ac.at>
References: <45C1059D.1070100@tbi.univie.ac.at>
	<CB6F853A-B5BF-43FE-B26B-C203724677A3@genetics.utah.edu>
	<45C1E2FD.3070709@tbi.univie.ac.at>
Message-ID: <E0AE71DF-DF6A-4E8A-866F-172AC5EF7B8E@uiuc.edu>


On Feb 1, 2007, at 6:54 AM, Rainer Machne wrote:

> Barry and Jason,
>
> thanks for your quick and very helpful replies.
>
> I guess we should have done (or repeat) our blast search at
> http://fungal.genome.duke.edu/
> to get better mapping from proteins to genomes ?
>
> As I retrieved all my proteins via whole genome blasts we should find
> (most of) them in the genbank files ... a good opportunity for me to
> learn some Bioperl and the other packages you mentioned in case we  
> want
> to do more complex analysis later :-)
>
> Thank you very much!
>
> Rainer

If the data is available in GenBank you could run the BLAST searches  
at NCBI and limit the search with an Entrez query:

http://www.ncbi.nlm.nih.gov/BLAST/blastcgihelp.shtml#entrez_query

Most (all?) genome files are tagged as complete

I'm not sure but there might be a way of doing this via  
Bio::Tools::Run::RemoteBlast.  Jason, any ideas?

chris


From cjfields at uiuc.edu  Thu Feb  1 13:09:16 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 1 Feb 2007 12:09:16 -0600
Subject: [Bioperl-l] How to get from gi/ref/gb to genomic coordinates ?
In-Reply-To: <E0AE71DF-DF6A-4E8A-866F-172AC5EF7B8E@uiuc.edu>
References: <45C1059D.1070100@tbi.univie.ac.at>
	<CB6F853A-B5BF-43FE-B26B-C203724677A3@genetics.utah.edu>
	<45C1E2FD.3070709@tbi.univie.ac.at>
	<E0AE71DF-DF6A-4E8A-866F-172AC5EF7B8E@uiuc.edu>
Message-ID: <748CC48E-D224-4234-A5C4-E33968F17418@uiuc.edu>

> If the data is available in GenBank you could run the BLAST searches
> at NCBI and limit the search with an Entrez query:
>
> http://www.ncbi.nlm.nih.gov/BLAST/blastcgihelp.shtml#entrez_query
>
> Most (all?) genome files are tagged as complete

sorry, didn't finish that...

"Most (all?) genome files are tagged as complete, wgs, in progress,  
etc. and can be limited by taxonomy using Fungi[ORGN] or similar."

chris


From jason at bioperl.org  Thu Feb  1 13:36:02 2007
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 1 Feb 2007 10:36:02 -0800
Subject: [Bioperl-l] How to get from gi/ref/gb to genomic coordinates ?
In-Reply-To: <E0AE71DF-DF6A-4E8A-866F-172AC5EF7B8E@uiuc.edu>
References: <45C1059D.1070100@tbi.univie.ac.at>
	<CB6F853A-B5BF-43FE-B26B-C203724677A3@genetics.utah.edu>
	<45C1E2FD.3070709@tbi.univie.ac.at>
	<E0AE71DF-DF6A-4E8A-866F-172AC5EF7B8E@uiuc.edu>
Message-ID: <D8E2FDBC-AA2E-4EB9-8CB1-F3610776B41C@bioperl.org>


On Feb 1, 2007, at 9:55 AM, Chris Fields wrote:

>
> On Feb 1, 2007, at 6:54 AM, Rainer Machne wrote:
>
>> Barry and Jason,
>>
>> thanks for your quick and very helpful replies.
>>
>> I guess we should have done (or repeat) our blast search at
>> http://fungal.genome.duke.edu/
>> to get better mapping from proteins to genomes ?
>>

Well I'm not quite sure of your exact goals.  To find upstream  
regions of known genes, or look at upstream regions of orthologous  
genes?

You can first figure out orthologs based on protein similarities,  
then go in and extract upstream regions for the orthologous genes (I  
provide a link to a big all-vs-all FASTA result at the bottom of the  
page if you want those results, as well as some pairwise orthology  
assignments, although you may want more or less stringent parameters).

All the GFF and AA data is freely available for download on the site  
for each genome we've annotated or for annotation we've re-formatted  
so you can do things locally and/or modify it to your liking.


>> As I retrieved all my proteins via whole genome blasts we should find
>> (most of) them in the genbank files ... a good opportunity for me to
>> learn some Bioperl and the other packages you mentioned in case we  
>> want
>> to do more complex analysis later :-)
>>
>> Thank you very much!
>>
>> Rainer
>
> If the data is available in GenBank you could run the BLAST  
> searches at NCBI and limit the search with an Entrez query:
>
> http://www.ncbi.nlm.nih.gov/BLAST/blastcgihelp.shtml#entrez_query
>
> Most (all?) genome files are tagged as complete
>
> I'm not sure but there might be a way of doing this via  
> Bio::Tools::Run::RemoteBlast.  Jason, any ideas?
>
> chris

--
Jason Stajich
Miller Research Fellow
University of California, Berkeley
lab: 510.642.8441
http://pmb.berkeley.edu/~taylor/people/js.html
http://fungalgenomes.org/


From reenayadav at gmail.com  Thu Feb  1 13:38:03 2007
From: reenayadav at gmail.com (Reena Yadav)
Date: Fri, 2 Feb 2007 00:08:03 +0530
Subject: [Bioperl-l] pdb parser
Message-ID: <76f897dd0702011038v7afe0207gb05465478e026205@mail.gmail.com>

Hi, I need to extract the atomic coordinates from a PDB file (1ake) and do
certain calculations.
I am going stepwise; the steps involved are:
(1) reading the atomic coordinates
(2) writing the result to a file.

I need to understand how to write the whole xyz line to another file.
Could someone help?
R.
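A minimal sketch of both steps in plain Perl, relying only on the fixed-column layout of PDB ATOM/HETATM records (x in columns 31-38, y in 39-46, z in 47-54, counting from 1). The function names and the tab-separated output layout are assumptions for illustration; BioPerl's Bio::Structure::IO (-format => 'pdb') is the fuller alternative.

```perl
use strict;
use warnings;

# Step 1: read ATOM/HETATM records from a PDB file handle using the
# fixed PDB columns (substr offsets are 0-based).
sub read_pdb_coords {
    my ($in) = @_;
    my @atoms;
    while (my $line = <$in>) {
        next unless $line =~ /^(ATOM  |HETATM)/;
        (my $name = substr($line, 12, 4)) =~ s/\s+//g;
        push @atoms, {
            serial => 0 + substr($line, 6, 5),
            name   => $name,
            x      => 0 + substr($line, 30, 8),
            y      => 0 + substr($line, 38, 8),
            z      => 0 + substr($line, 46, 8),
        };
    }
    return @atoms;
}

# Step 2: write one tab-separated "serial name x y z" line per atom.
sub write_coords {
    my ($out, @atoms) = @_;
    printf {$out} "%d\t%s\t%.3f\t%.3f\t%.3f\n", @{$_}{qw(serial name x y z)}
        for @atoms;
}
```

Typical use: open my $in, '<', '1ake.pdb' and open my $out, '>', 'coords.txt', then write_coords($out, read_pdb_coords($in)).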


From jason at bioperl.org  Thu Feb  1 08:06:42 2007
From: jason at bioperl.org (sandhya khatal)
Date: Thu, 1 Feb 2007 13:06:42 +0000
Subject: [Bioperl-l] Regarding Bioperl program
Message-ID: <75899ED1-72C6-4272-8CAC-028CF133A0B4@gmail.com>

Respected Sir,
                      I want to do a program which gives dendrogram like
UPGMA a clustering method, but i want this dendrogram by using single
linkage or centroid method.Can u help me for this.U have given the  
code for
tree but i want dendrogram as output by using above any method.

Thanks for anticipating.

Regards,
Sandhya Khatal.


From jason at bioperl.org  Thu Feb  1 19:55:26 2007
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 1 Feb 2007 16:55:26 -0800
Subject: [Bioperl-l] Fwd: Regarding Bioperl program
References: <394d31ee0702010506j4bbd79dck41d5ac2162eaafdd@mail.gmail.com>
Message-ID: <40020502-3421-407D-85EB-24F420AB699C@bioperl.org>

re-forwarding Sandhya's email to the list so the email address is  
visible.

The approach that is coded in bioperl is for distance based data such  
as evolutionary distance of DNA or protein sequences - I assume you  
are talking about clustering expression data? You may want to focus  
on the available literature and toolkits that focus on expression  
data - something BioPerl doesn't deliberately focus on right now.
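For the single-linkage part of the question, a minimal agglomerative sketch in plain Perl (not a BioPerl API), assuming a precomputed symmetric distance matrix: at each step the two clusters with the smallest minimum pairwise distance are merged. It records the merge heights, which is what a dendrogram plots, and returns the tree as a Newick string. Centroid linkage would need a different cluster_dist and is not shown.

```perl
use strict;
use warnings;

# Single-linkage distance: the smallest distance between any member of
# one cluster and any member of the other.
sub cluster_dist {
    my ($c1, $c2, $dist) = @_;
    my $min;
    for my $i (@{ $c1->{members} }) {
        for my $j (@{ $c2->{members} }) {
            $min = $dist->[$i][$j] if !defined $min || $dist->[$i][$j] < $min;
        }
    }
    return $min;
}

# Repeatedly merge the two closest clusters. Returns a Newick string and
# an array ref of merges [left, right, height].
sub single_linkage {
    my ($labels, $dist) = @_;
    my @clusters = map { +{ members => [$_], newick => $labels->[$_] } } 0 .. $#$labels;
    my @merges;
    while (@clusters > 1) {
        my ($bi, $bj, $best);
        for my $i (0 .. $#clusters - 1) {
            for my $j ($i + 1 .. $#clusters) {
                my $d = cluster_dist($clusters[$i], $clusters[$j], $dist);
                ($bi, $bj, $best) = ($i, $j, $d) if !defined $best || $d < $best;
            }
        }
        my ($left, $right) = @clusters[$bi, $bj];
        push @merges, [ $left->{newick}, $right->{newick}, $best ];
        my $merged = {
            members => [ @{ $left->{members} }, @{ $right->{members} } ],
            newick  => '(' . $left->{newick} . ',' . $right->{newick} . ')',
        };
        splice @clusters, $bj, 1;          # $bj > $bi, so remove it first
        splice @clusters, $bi, 1, $merged; # then replace the left cluster
    }
    return ($clusters[0]{newick} . ';', \@merges);
}
```

The Newick string can then be read with Bio::TreeIO (-format => 'newick') or any tree viewer to draw the dendrogram.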

-jason
Begin forwarded message:

> From: "sandhya khatal" <sandhya.khatal at gmail.com>
> Date: February 1, 2007 5:06:42 AM PST
> To: jason at bioperl.org
> Subject: Regarding Bioperl program
>
> Respected Sir,
>                      I want to do a program which gives dendrogram  
> like
> UPGMA a clustering method, but i want this dendrogram by using single
> linkage or centroid method.Can u help me for this.U have given the  
> code for
> tree but i want dendrogram as output by using above any method.
>
> Thanks for anticipating.
>
> Regards,
> Sandhya Khatal.

--
Jason Stajich
Miller Research Fellow
University of California, Berkeley
lab: 510.642.8441
http://pmb.berkeley.edu/~taylor/people/js.html
http://fungalgenomes.org/


From lzhtom at hotmail.com  Thu Feb  1 22:20:10 2007
From: lzhtom at hotmail.com (zhihua li)
Date: Fri, 02 Feb 2007 03:20:10 +0000
Subject: [Bioperl-l] Bio::index::Fasta- where's the indexed file?
Message-ID: <BAY110-F24A936E35D7C6B9059EE3CC79B0@phx.gbl>




From lzhtom at hotmail.com  Thu Feb  1 22:27:39 2007
From: lzhtom at hotmail.com (zhihua li)
Date: Fri, 02 Feb 2007 03:27:39 +0000
Subject: [Bioperl-l] Bio::index::Fasta- where's the indexed file?
Message-ID: <BAY110-F9A27ED904272F0D2D4BB6C79B0@phx.gbl>

Sorry guys, the former empty mail was sent out by mistake.

I'm using Bio::Index::Fasta to index a file containing lots of sequences in 
fasta format. All is fine except one thing.

According to the bioperl tutorial and the documents, the following code 
will make an index file:

use Bio::Index::Fasta;
my $inx = Bio::Index::Fasta->new(-filename   => "test.fasta.idx",
                                 -write_flag => 1);
$inx->make_index("test.fasta");

And in another script I can access the index file by saying

use Bio::Index::Fasta;
$ENV{BIOPERL_INDEX} = "."; # find index in current directory
my $inx = Bio::Index::Fasta->new(-filename => "test.fasta.idx");
my $seq = $inx->fetch("ent1001");        # fetch the sequence named ent1001

However, after running the first script, I cannot find a new file 
test.fasta.idx in my current directory. And not surprisingly, when I ran 
the second script, perl told me it couldn't find "test.fasta.idx".

What's going on here?

Thanks a lot!



From jason at bioperl.org  Fri Feb  2 01:24:44 2007
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 1 Feb 2007 22:24:44 -0800
Subject: [Bioperl-l] Bio::index::Fasta- where's the indexed file?
In-Reply-To: <BAY110-F9A27ED904272F0D2D4BB6C79B0@phx.gbl>
References: <BAY110-F9A27ED904272F0D2D4BB6C79B0@phx.gbl>
Message-ID: <CFD213B9-5195-450F-80ED-E956EEF50F59@bioperl.org>

I don't think BIOPERL_INDEX does anything in the module, so that  
documentation is not quite right.  The env variable is used in the  
scripts/index/bp_index and bp_fetch scripts, so maybe a cut+paste job  
went bad somewhere.

You need to specify the full path you want with -filename - you can  
just prepend BIOPERL_INDEX to the filename, like:
-filename => $ENV{BIOPERL_INDEX}."/$index"
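A slightly more defensive version of that idea, as a sketch: fall back to the current directory when BIOPERL_INDEX is unset or empty, and let the core File::Spec module join the path portably. index_path is a hypothetical helper name.

```perl
use strict;
use warnings;
use File::Spec;

# Resolve an index file name against $ENV{BIOPERL_INDEX}, falling back to
# the current directory when the variable is unset or empty.
sub index_path {
    my ($name) = @_;
    my $dir = (defined $ENV{BIOPERL_INDEX} && length $ENV{BIOPERL_INDEX})
            ? $ENV{BIOPERL_INDEX}
            : File::Spec->curdir;
    return File::Spec->catfile($dir, $name);
}
```

Use the same call in both scripts, e.g. Bio::Index::Fasta->new(-filename => index_path('test.fasta.idx'), -write_flag => 1) when building the index and without -write_flag when fetching, so both point at the same file.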

-jason
On Feb 1, 2007, at 7:27 PM, zhihua li wrote:

> Sorry guys, the former empty mail was sent out by mistake.
>
> I'm using Bio::index::Fasta to index a file containing lots of  
> sequences in fasta format. All is fine except one thing.
>
> According to the bioperl tutorial and the documents, the following  
> code will make a indexed file:
>
> my $inx = Bio::Index::Fasta->new(-filename => "test.fasta.idx",
>                                     -write_flag => 1);
>    $inx->make_index("test.fasta");
>
> And in another script I can access the indexed file by sayinig
>
> $ENV{BIOPERL_INDEX} = "."; # find index in current directory
> my $inx = Bio::Index::Fasta->new(-filename => "test.fasta.idx");
> my $seq=$inx->fetch("ent1001");        #fetch the sequence named  
> ent1001
>
> However, after running the first script, I cannot find a new file  
> test.fasta.idx in my current directory. And not surprisingly, when  
> I ran the second script, perl told me it couldn't find  
> "test.fasta.idx".
>
> What's going on here?
>
> Thanks a lot!
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
Miller Research Fellow
University of California, Berkeley
lab: 510.642.8441
http://pmb.berkeley.edu/~taylor/people/js.html
http://fungalgenomes.org/


From marian.thieme at lycos.de  Fri Feb  2 05:06:09 2007
From: marian.thieme at lycos.de (marian thieme)
Date: Fri, 2 Feb 2007 10:06:09 +0000
Subject: [Bioperl-l] seqDiff
Message-ID: <101051013116870@lycos-europe.com>

An HTML attachment was scrubbed...
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070202/cb3feed1/attachment-0002.html>

From marian.thieme at lycos.de  Fri Feb  2 06:37:05 2007
From: marian.thieme at lycos.de (marian thieme)
Date: Fri, 2 Feb 2007 11:37:05 +0000
Subject: [Bioperl-l] susp. header
Message-ID: <188661178024725@lycos-europe.com>

An HTML attachment was scrubbed...
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070202/d3c3535c/attachment-0002.html>

From lubapardo at gmail.com  Fri Feb  2 09:31:06 2007
From: lubapardo at gmail.com (Luba Pardo)
Date: Fri, 2 Feb 2007 15:31:06 +0100
Subject: [Bioperl-l] Problem using Bio::DB::Query::GenBank;
Message-ID: <58ff33550702020631l4e7bc59dmabcf8c72fa67a6d5@mail.gmail.com>

Hello, (I am using bioperl-1.5.2_100, linux machine)
I am trying to get the ids of a list of genes using the module
Bio::DB::Query::GenBank. I have the following code:

use Bio::DB::Query::GenBank;
use strict;
use warnings;

open (READER_1,"list.txt") || die "\n I can't open the file READER_1 !!\n";
my @a1=<READER_1>;
close (READER_1);

for (my $i=0; $i<=$#a1;$i=$i+1 ) {
        my @a1_s=split/\s+/,$a1[$i];

my $query_string = ' Homo Sapiens[Organism] AND $a1[$i] ';
   my $query = Bio::DB::Query::GenBank->new(-db=>'Protein',
                                            -query=>$query_string
                                            );
   my $count = $query->count;
   my @ids   = $query->ids;


print " gene: $a1[$i] first id is $ids[0]  o no? \n";

I want to tell the program to get all the genes contained in the file
list.txt and to retrieve the ids from GenBank. However the program gives me
the following error:

------------EXCEPTION: Bio::Root::Exception -------------
MSG: Id list has been truncated even after maxids requested
STACK: Error::throw
STACK: Bio::Root::Root::throw
/usr/lib/perl5/site_perl/5.8.1/Bio/Root/Root.pm:359
STACK: Bio::DB::Query::WebQuery::_fetch_ids
/usr/lib/perl5/site_perl/5.8.1/Bio/DB/Query/WebQuery.pm:236
STACK: Bio::DB::Query::WebQuery::ids
/usr/lib/perl5/site_perl/5.8.1/Bio/DB/Query/WebQuery.pm:200
STACK: query.pl:27
------------------
Is it a problem if I use $a1[$i] to stand in for the name of the
gene?
Thanks in advance for any attention you can give this message.
Regards,
Luba Pardo


From hlapp at gmx.net  Fri Feb  2 10:44:02 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Fri, 2 Feb 2007 10:44:02 -0500
Subject: [Bioperl-l] susp. header
In-Reply-To: <188661178024725@lycos-europe.com>
References: <188661178024725@lycos-europe.com>
Message-ID: <EE6A34C7-0579-487E-B529-1F82E714793D@gmx.net>

You are sending HTML emails. You should configure your mailer to  
ideally just send plain text. If you really must have fancy formatted  
emails (i.e., HTML-formatted emails), then configure it such that the  
mailer will send a plain text and a HTML version.

(Many spam filters will flag email the body of which consists of only  
an HTML attachment.)

	-hilmar

On Feb 2, 2007, at 6:37 AM, marian thieme wrote:

> why each message I sent to this list is considered to have a susp.  
> header ?
>
> Marian
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cain.cshl at gmail.com  Fri Feb  2 11:03:16 2007
From: cain.cshl at gmail.com (Scott Cain)
Date: Fri, 02 Feb 2007 11:03:16 -0500
Subject: [Bioperl-l] [Gmod-schema] beginning work on SeqIO::chadoxml
In-Reply-To: <675F525B-D3EB-4C0A-A777-7683F2A8F823@duke.edu>
References: <1170359746.2706.622.camel@localhost.localdomain>
	<675F525B-D3EB-4C0A-A777-7683F2A8F823@duke.edu>
Message-ID: <1170432196.2706.661.camel@localhost.localdomain>

Hi Hilmar,

That is a good idea; when I started down this road, it felt like there
would only be a few things that I might want to allow to be different,
but I think you are right that having one standard implementation that
can be subclassed for legacy systems is a good thing.

Scott


On Fri, 2007-02-02 at 10:09 -0500, Hilmar Lapp wrote:
> On Feb 1, 2007, at 2:55 PM, Scott Cain wrote:
> 
> > The second main change was to introduce a -flybase_compat argument  
> > when
> > initializing the Bio::SeqIO writer, so that 'old style' cv and cvterms
> > (that are compatible with flybase) will be used, but now the default
> > will be to use current standards:
> 
> Just my $0.02 ... obviously, Flybase may be the only organization  
> that uses an 'old style' or any other way not compliant with 'current  
> standards' (presumably SO), but if it's not the only one then this  
> approach won't scale.
> 
> Also, an argument -flybase_compat suggests to the unsuspecting that  
> this is an endorsed flavor of the standard and fine to use for  
> everyone else too.
> 
> If Flybase is idiosyncratic in this way, why not make chadoxml.pm  
> compliant with the standard as we all want it, keep it free from  
> litter caused by usage of old versions of SO, and create a second  
> module fb-chadoxml.pm that inherits from the first and merely  
> overrides a few things so that it works for Flybase. This way, other  
> organizations with similar needs can follow the path and create their  
> own xyz-chadoxml.pm, rather than having to muck around in the  
> chadoxml.pm that comes with the distribution.
> 
> I'm not sure I fully grasp the underlying issue, so I may not make  
> much sense here. Apologies if that's the case ...
> 
> 	-hilmar
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   cain.cshl at gmail.com
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory

-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070202/2488afc4/attachment-0002.bin>

From bosborne11 at verizon.net  Fri Feb  2 10:27:44 2007
From: bosborne11 at verizon.net (Brian Osborne)
Date: Fri, 02 Feb 2007 10:27:44 -0500
Subject: [Bioperl-l] [Gmod-schema] beginning work on SeqIO::chadoxml
In-Reply-To: <675F525B-D3EB-4C0A-A777-7683F2A8F823@duke.edu>
Message-ID: <C1E8C2A0.C967%bosborne11@verizon.net>

Hilmar,

I second your motion, good idea. Let's keep the standard module nice and
clean.

Brian O.


On 2/2/07 10:09 AM, "Hilmar Lapp" <hlapp at duke.edu> wrote:

> and create a second
> module fb-chadoxml.pm that inherits from the first and merely
> overrides a few things so that it works for Flybase
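Hilmar's proposal is ordinary Perl subclassing. A minimal sketch of the pattern, assuming hypothetical package names and placeholder vocabulary names (this is not the actual Bio::SeqIO::chadoxml API): the standard class exposes the site-dependent choice as a method, and a site-specific subclass overrides only that method.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the standard-compliant writer.
package Example::ChadoXML;

sub new {
    my ($class, %args) = @_;
    return bless {%args}, $class;
}

# Which controlled vocabulary feature types are recorded under
# (placeholder name, not a real Chado cv).
sub type_cv { return 'standard_cv' }

sub feature_type_term {
    my ($self, $type) = @_;
    return $self->type_cv . ':' . $type;
}

# Site-specific subclass: inherits everything, overrides only what differs.
package Example::FBChadoXML;
our @ISA = ('Example::ChadoXML');

sub type_cv { return 'legacy_cv' }

package main;
```

Other sites with legacy data would add their own small subclass the same way, leaving the distributed module untouched.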


From Kevin.M.Brown at asu.edu  Fri Feb  2 10:52:20 2007
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Fri, 2 Feb 2007 08:52:20 -0700
Subject: [Bioperl-l] Problem using Bio::DB::Query::GenBank;
References: <58ff33550702020631l4e7bc59dmabcf8c72fa67a6d5@mail.gmail.com>
Message-ID: <1A4207F8295607498283FE9E93B775B402AABA1C@EX02.asurite.ad.asu.edu>

It looks like you have some problems with the code you posted.

use Bio::DB::Query::GenBank;
use strict;
use warnings;

open (READER_1,"list.txt") || die "\n I can't open the file READER_1 !!\n";
my @a1=<READER_1>;
close (READER_1);

for (my $i=0; $i < @a1;$i++ ) {

    # is this necessary as you don't seem to use it anywhere later in your
    # code.
    my @a1_s=split/\s+/,$a1[$i];

    # you enclosed the variable in '' which means perl won't evaluate it
    # changed the query so that perl can evaluate the variable
    my $query_string = ' Homo Sapiens[Organism] AND '.$a1[$i] .' ';
    my $query = Bio::DB::Query::GenBank->new(-db    => 'Protein',
                                             -query => $query_string);
    my $count = $query->count;
    my @ids   = $query->ids;

    print " gene: $a1[$i] first id is $ids[0]  o no? \n";
}
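Kevin's fix turns on Perl's quoting rules: single-quoted strings never interpolate variables, double-quoted ones do. Each element of `@a1` also still ends with the newline that `<READER_1>` read. A minimal sketch isolating the query-building step, so it can be checked without contacting NCBI; `build_query` is a hypothetical helper, not part of BioPerl, and its return value would be passed as `-query` to `Bio::DB::Query::GenBank->new` in the loop above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): build an Entrez query string
# from one line of list.txt.
sub build_query {
    my ($line) = @_;
    chomp $line;                   # drop the newline left by <READER_1>
    $line =~ s/^\s+|\s+$//g;       # trim any remaining leading/trailing whitespace
    # Double quotes: Perl interpolates the value of $line into the string.
    return "Homo Sapiens[Organism] AND $line";
}

my $gene = "BRCA1\n";
# Single quotes: Perl keeps the literal characters '$gene', not its value.
print 'Homo Sapiens[Organism] AND $gene', "\n";
print build_query($gene), "\n";
```

Stripping the newline also keeps the ` gene: ... first id is ...` message on a single line.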

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Luba Pardo
Sent: Friday, February 02, 2007 7:31 AM
To: bioperl-l at lists.open-bio.org
Subject: [Bioperl-l] Problem using Bio::DB::Query::GenBank;

Hello, (I am using bioperl-1.5.2_100, linux machine) I am trying to get
the ids of a list of genes using the module Bio::DB::Query::GenBank. I
have the following code:

use Bio::DB::Query::GenBank;
use strict;
use warnings;

open (READER_1,"list.txt") || die "\n I can't open the file READER_1
!!\n"; my @a1=<READER_1>; close (READER_1);

for (my $i=0; $i<=$#a1;$i=$i+1 ) {
        my @a1_s=split/\s+/,$a1[$i];

my $query_string = ' Homo Sapiens[Organism] AND $a1[$i] ';
   my $query = Bio::DB::Query::GenBank->new(-db=>'Protein',

-query=>$query_string
                                            );
   my $count = $query->count;
   my @ids   = $query->ids;


print " gene: $a1[$i] first id is $ids[0]  o no? \n";

I want to tell the program to get all the genes contained in the file
list.txt and to retrieve the ids from GenBank. However the program gives
me the following error:

------------EXCEPTION: Bio::Root::Exception -------------
MSG: Id list has been truncated even after maxids requested
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.1/Bio/Root/Root.pm:359
STACK: Bio::DB::Query::WebQuery::_fetch_ids /usr/lib/perl5/site_perl/5.8.1/Bio/DB/Query/WebQuery.pm:236
STACK: Bio::DB::Query::WebQuery::ids /usr/lib/perl5/site_perl/5.8.1/Bio/DB/Query/WebQuery.pm:200
STACK: query.pl:27
------------------
Is it a problem if I use $a1[$i] to supply the name of the gene?
Thank you beforehand for any attention you may pay to this message.
Regards, Luba Pardo
_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l


From cjfields at uiuc.edu  Fri Feb  2 11:37:49 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 2 Feb 2007 10:37:49 -0600
Subject: [Bioperl-l] [Gmod-schema] beginning work on SeqIO::chadoxml
In-Reply-To: <1170432196.2706.661.camel@localhost.localdomain>
References: <1170359746.2706.622.camel@localhost.localdomain>
	<675F525B-D3EB-4C0A-A777-7683F2A8F823@duke.edu>
	<1170432196.2706.661.camel@localhost.localdomain>
Message-ID: <64E2727F-2052-417B-878E-0F7135A72FBC@uiuc.edu>

I was going to suggest maybe allowing one to switch out XML handlers/writers
based on the style (à la XML::SAX), but I see that chadoxml currently uses
XML::Writer and there is no next_seq() implemented. Oh well...

chris

On Feb 2, 2007, at 10:03 AM, Scott Cain wrote:

> Hi Hilmar,
>
> That is a good idea; when I started down this road, it felt like there
> would only be a few things that I might want to allow to be different,
> but I think you are right that having one standard implementation that
> can be subclassed for legacy systems is a good thing.
>
> Scott
>
>
> On Fri, 2007-02-02 at 10:09 -0500, Hilmar Lapp wrote:
>> On Feb 1, 2007, at 2:55 PM, Scott Cain wrote:
>>
>>> The second main change was to introduce a -flybase_compat argument
>>> when initializing the Bio::SeqIO writer, so that 'old style' cv and
>>> cvterms (that are compatible with flybase) will be used, but now the
>>> default will be to use current standards:
>>
>> Just my $0.02 ... obviously, Flybase may be the only organization
>> that uses an 'old style' or any other way not compliant with 'current
>> standards' (presumably SO), but if it's not the only one then this
>> approach won't scale.
>>
>> Also, an argument -flybase_compat suggests to the unsuspecting that
>> this is an endorsed flavor of the standard and fine to use for
>> everyone else too.
>>
>> If Flybase is idiosyncratic in this way, why not make chadoxml.pm
>> compliant with the standard as we all want it, keep it free from
>> litter caused by usage of old versions of SO, and create a second
>> module fb-chadoxml.pm that inherits from the first and merely
>> overrides a few things so that it works for Flybase. This way, other
>> organizations with similar needs can follow the path and create their
>> own xyz-chadoxml.pm, rather than having to muck around in the
>> chadoxml.pm that comes with the distribution.
>>
>> I'm not sure I fully grasp the underlying issue, so I may not make
>> much sense here. Apologies if that's the case ...
>>
>> 	-hilmar

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From hlapp at gmx.net  Fri Feb  2 11:45:30 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Fri, 2 Feb 2007 11:45:30 -0500
Subject: [Bioperl-l] [Gmod-schema] beginning work on SeqIO::chadoxml
In-Reply-To: <64E2727F-2052-417B-878E-0F7135A72FBC@uiuc.edu>
References: <1170359746.2706.622.camel@localhost.localdomain>
	<675F525B-D3EB-4C0A-A777-7683F2A8F823@duke.edu>
	<1170432196.2706.661.camel@localhost.localdomain>
	<64E2727F-2052-417B-878E-0F7135A72FBC@uiuc.edu>
Message-ID: <3A3FF1B0-129A-410D-9DE2-D1CB28015C9A@gmx.net>

There must be at least a stub for next_seq(). It may throw a not-implemented
exception, but it should not just be absent.

	-hilmar

On Feb 2, 2007, at 11:37 AM, Chris Fields wrote:

> I was going to suggest maybe allowing one to switch out XML  
> handlers/writers based on the style (ala XML::SAX), but I see that  
> chadoxml currently uses XML::Writer and there is no next_seq()  
> implemented.  Oh well...
>
> chris

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cain.cshl at gmail.com  Fri Feb  2 12:02:32 2007
From: cain.cshl at gmail.com (Scott Cain)
Date: Fri, 02 Feb 2007 12:02:32 -0500
Subject: [Bioperl-l] [Gmod-schema] beginning work on SeqIO::chadoxml
In-Reply-To: <3A3FF1B0-129A-410D-9DE2-D1CB28015C9A@gmx.net>
References: <1170359746.2706.622.camel@localhost.localdomain>
	<675F525B-D3EB-4C0A-A777-7683F2A8F823@duke.edu>
	<1170432196.2706.661.camel@localhost.localdomain>
	<64E2727F-2052-417B-878E-0F7135A72FBC@uiuc.edu>
	<3A3FF1B0-129A-410D-9DE2-D1CB28015C9A@gmx.net>
Message-ID: <1170435752.2706.676.camel@localhost.localdomain>

Ah, I'll go ahead and add one, though it will just throw an exception
because this is a write-only adapter.
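A minimal sketch of the kind of stub Hilmar asks for, assuming a write-only adapter. Inside BioPerl the usual idiom would be `$self->throw_not_implemented()`, which Bio::Root::RootI provides; the sketch uses plain `die` so it runs without BioPerl installed. `My::WriteOnlyIO` and its methods are hypothetical stand-ins, not the actual Bio::SeqIO::chadoxml code.

```perl
use strict;
use warnings;

package My::WriteOnlyIO;    # hypothetical stand-in for a write-only SeqIO driver

sub new {
    my ($class, %args) = @_;
    my $self = { fh => $args{-fh} || \*STDOUT };
    return bless $self, $class;
}

# The writing side is the real job of the adapter.
sub write_seq {
    my ($self, $id) = @_;
    print { $self->{fh} } "wrote $id\n";
    return 1;
}

# The reading side exists but only reports that it is unsupported, so
# callers get a clear message instead of "Can't locate object method".
sub next_seq {
    my ($self) = @_;
    die ref($self) . "::next_seq is not implemented: this adapter is write-only\n";
}

package main;

my $out = My::WriteOnlyIO->new;
$out->write_seq('gene1');
eval { $out->next_seq };
print "caught: $@";
```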

Scott


On Fri, 2007-02-02 at 11:45 -0500, Hilmar Lapp wrote:
> There must be at least a stub for next_seq(). It may throw a not- 
> implemented exception, but it should not just be absent.
> 
> 	-hilmar
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   cain.cshl at gmail.com
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory

-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070202/9acaa3c3/attachment-0002.bin>

From peili at morgan.harvard.edu  Fri Feb  2 10:56:56 2007
From: peili at morgan.harvard.edu (Peili Zhang)
Date: Fri, 02 Feb 2007 10:56:56 -0500
Subject: [Bioperl-l] [Gmod-schema] beginning work on SeqIO::chadoxml
In-Reply-To: <C1E8C2A0.C967%bosborne11@verizon.net>
References: <C1E8C2A0.C967%bosborne11@verizon.net>
Message-ID: <1170431816.6583.47.camel@jacks>

i 'third' Hilmar's opinion. flybase's fingerprint is shown in the module
because i wrote it for fb's data loading task. no need to worry about
flybase compatibility in making the module generic. in fact, at flybase,
i tweak the module frequently to make it work for different scenarios.

cheers,
peili
 
On Fri, 2007-02-02 at 10:27, Brian Osborne wrote:
> Hilmar,
> 
> I second your motion, good idea. Let's keep the standard module nice and
> clean.
> 
> Brian O.
> 
> 
> On 2/2/07 10:09 AM, "Hilmar Lapp" <hlapp at duke.edu> wrote:
> 
> > and create a second
> > module fb-chadoxml.pm that inherits from the first and merely
> > overrides a few things so that it works for Flybase
> 
> 
> 
> -------------------------------------------------------------------------
> Using Tomcat but need to do more? Need to support web services, security?
> Get stuff done quickly with pre-integrated technology to make your job easier.
> Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo
> http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
> _______________________________________________
> Gmod-schema mailing list
> Gmod-schema at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/gmod-schema
> 


From cain.cshl at gmail.com  Fri Feb  2 13:05:47 2007
From: cain.cshl at gmail.com (Scott Cain)
Date: Fri, 02 Feb 2007 13:05:47 -0500
Subject: [Bioperl-l] [Gmod-schema] beginning work on SeqIO::chadoxml
In-Reply-To: <1170431816.6583.47.camel@jacks>
References: <C1E8C2A0.C967%bosborne11@verizon.net>
	<1170431816.6583.47.camel@jacks>
Message-ID: <1170439549.2706.683.camel@localhost.localdomain>

Hi Peili,

A little bit ago I checked in Bio::SeqIO::flybase_chadoxml that is
fairly simple.  My suggestion is that when you make tweaks for different
scenarios, that you turn the things you are tweaking into methods in
BSIO::chadoxml and then override them in flybase_chadoxml (and commit at
least the chadoxml module) to make it more flexible when other people
have similar scenarios.
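Scott's suggestion is the template-method pattern: factor each site-specific choice into its own method in the base class, then override only those methods in the subclass. A minimal plain-Perl sketch; the package names, the `cv_name` hook and the cv values are hypothetical placeholders, not code from Bio::SeqIO::chadoxml or Bio::SeqIO::flybase_chadoxml.

```perl
use strict;
use warnings;

package My::ChadoXML;    # hypothetical stand-in for the standard writer

sub new { my ($class) = @_; return bless {}, $class }

# Hook method: the site-specific choice, factored out of the main logic.
# The cv names returned here are placeholders, not real Chado cv names.
sub cv_name { return 'current_cv' }

# Main logic: written once, calls the hook instead of hard-coding a value.
sub type_label {
    my ($self, $type) = @_;
    return $self->cv_name . ':' . $type;
}

package My::FlybaseChadoXML;    # hypothetical stand-in for the FlyBase subclass
our @ISA = ('My::ChadoXML');

# Override only the hook; new() and type_label() are inherited unchanged.
sub cv_name { return 'legacy_cv' }

package main;

print My::ChadoXML->new->type_label('gene'), "\n";
print My::FlybaseChadoXML->new->type_label('gene'), "\n";
```

Another site would add its own subclass overriding `cv_name`, leaving the base module untouched.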

Scott


On Fri, 2007-02-02 at 10:56 -0500, Peili Zhang wrote:
> i 'third' Hilmar's opinion. flybase's fingerprint is shown in the module
> because i wrote it for fb's data loading task. no need to worry about
> flybase compatibility in making the module generic. in fact, at flybase,
> i tweak the module frequently to make it work for different scenarios.
> 
> cheers,
> peili
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   cain.cshl at gmail.com
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory

-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070202/a6d23204/attachment-0002.bin>

From cjfields at uiuc.edu  Fri Feb  2 15:33:46 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 2 Feb 2007 14:33:46 -0600
Subject: [Bioperl-l] seqDiff
In-Reply-To: <101051013116870@lycos-europe.com>
References: <101051013116870@lycos-europe.com>
Message-ID: <C752CE9D-61A7-4DF2-958E-7162723D0BA9@uiuc.edu>

Judging by the code you'll have to recreate the SeqDiff while  
iterating through various alleles; there is no method to remove  
particular variants or purge them (at least I couldn't find one).

I also noticed SeqDiff doesn't support deletions/insertions either;  
using a null allele (no seq) or leaving out either the mutant or  
original allele leads to errors.  I'll look into the latter, and I  
may try to add a method to at least purge variants and reset dna_mut().
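Following the idea of handling one allele at a time rather than relying on which allele dna_mut() picks, a minimal sketch of the underlying operation in plain Perl. It does not use the Bio::Variation classes or their API; `mutated_sequences` is a hypothetical helper, and the example sequence is made up.

```perl
use strict;
use warnings;

# Hypothetical helper, independent of Bio::Variation: apply each
# alternative allele, one at a time, to a fresh copy of the original
# sequence. $start is 1-based; $ref is the original allele at that spot.
# An empty $alt models a deletion; an empty $ref inserts $alt before $start.
sub mutated_sequences {
    my ($orig, $start, $ref, @alts) = @_;
    my $offset = $start - 1;
    die "reference allele does not match the original sequence\n"
        unless substr($orig, $offset, length $ref) eq $ref;
    my @out;
    for my $alt (@alts) {
        my $seq = $orig;                              # fresh copy per allele
        substr($seq, $offset, length $ref) = $alt;    # swap ref for alt
        push @out, $seq;
    }
    return @out;
}

print "$_\n" for mutated_sequences('ACGTACGT', 3, 'G', 'T', 'CC', '');
```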

chris

On Feb 2, 2007, at 4:06 AM, marian thieme wrote:

> Hi,
>
> Is there a way to output all mutated sequences of a SeqDiff object?
> Suppose I add some variants via:
>
> $dnamut->add_Allele($a2);
> $dnamut->add_Allele($a3);
> $seqDiff->add_Variant($dnamut);
>
> and afterwards want to access the alternative sequences via
> $seqDiff->dna_mut()
>
> Which allele is chosen when using dna_mut(), and can I control whether
> the first or the second alternative sequence is returned?
> If yes, how can I do this?
>
> Regards,
> Marian
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From MEC at stowers-institute.org  Fri Feb  2 16:47:08 2007
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Fri, 2 Feb 2007 15:47:08 -0600
Subject: [Bioperl-l] Bio::DB::SeqFeature treamtent of tags and
	annotations
Message-ID: <CED81D34E37D5043A1211565277A51E50768EDB3@exchkc02.stowers-institute.org>

Lincoln,
 
I don't think that adding this directive is a good idea after all
either.
 
But, I see that you remap the ID= to a load_id attribute which is
preserved in the Bio::DB::SeqFeatureStore database.
 
And then it gets squelched during GFF production by
NormalizedFeature::format_attributes.
 
However, if ID is prone to clashes, then certainly simply renaming the
attribute to be load_id does not preclude clashes from happening, and
only courts disaster.  Don't you think?
 
I'm a little blurry on the GFF3Loader, but it looks like you're using
load_id to facilitate loading parent/child features out of order.  Is
that right?  If so, I suggest you delete all load_id attributes
immediately after performing a load.  They have no further use.
 
Or, instead of a `round-trip-ids` directive, you might consider giving
the GFF3Loader an IDAttribute option, which would let the loader
preserve the ID values under a named attribute.
In my case, munging flybase gff,  I would then use it like this:
 
bp_seqfeature_load.PLS --fast --IDAttribute flybaseID
 
which would preserve the ID values in the database but under the
FlybaseID attribute for features so loaded.
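Until a loader option like the proposed IDAttribute exists, roughly the same effect could be had by rewriting the GFF3 file before loading. A minimal sketch of such a pre-processing step; `copy_id_attribute` is hypothetical and not part of bp_seqfeature_load or GFF3Loader, the feature line is made up, and it assumes ID values contain no unescaped semicolons, as GFF3 requires.

```perl
use strict;
use warnings;

# Hypothetical helper: copy the ID of a GFF3 feature line into a second
# attribute named $attr_name. Returns the line without its trailing newline.
sub copy_id_attribute {
    my ($line, $attr_name) = @_;
    chomp $line;
    return $line if $line =~ /^#/;              # directives and comments
    my @cols = split /\t/, $line, -1;
    return $line unless @cols == 9;             # blank, FASTA or malformed lines
    my ($id) = $cols[8] =~ /(?:^|;)ID=([^;]*)/;
    return $line unless defined $id;            # features without an ID
    $cols[8] .= ";$attr_name=$id";
    return join "\t", @cols;
}

my $gff = join "\t", '4', 'FlyBase', 'gene', 100, 200, '.', '+', '.',
                     'ID=FBgn0017545;Name=example';
print copy_id_attribute("$gff\n", 'flybaseID'), "\n";
```

Wrapped in a `while (<>)` filter, this could feed the rewritten file to bp_seqfeature_load.PLS; how the loader then stores the extra attribute was not checked here.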
 
---------------------------------------------
 
On a related topic:
I just committed this patch to Bio::DB::SeqFeature::NormalizedFeature

_create_subfeatures : ensure that subfeatures get the `source` of their
parent

While doing this I wondered: what -class are subfeatures getting from
their parent? I left it in place. Please advise, and fix my thinking if
needed.

----------------------------------------------

Further, I observe that Bio::Graphics::FeatureBase::new handles the
-segments option by calling add_segment.  So, when I create a new
Bio::DB::SeqFeature with -segments [[100,200],[300,400]], the
-segments option gets handled by Bio::Graphics::FeatureBase::new, which,
as mentioned, calls add_segment. The surprising thing to me when trying
to trace through the class modules and understand what is going on is
that what gets run at this point is not
Bio::Graphics::FeatureBase::add_segment, but rather
Bio::DB::SeqFeature::add_segment, whose semantics are different in at
least one regard, namely, that it does not set the start and stop of the
parent feature from the min and max of the segments.

I have committed a patch to Bio::Graphics::FeatureBase with a comment to
this effect, and have also patched its add_segment method to copy the
parent's source into the segment.

I hope my commits and suggestions further the cause.  Let me know if
not!
 
-- Malcolm
 

________________________________

	From: lincoln.stein at gmail.com [mailto:lincoln.stein at gmail.com] On Behalf Of Lincoln Stein
	Sent: Tuesday, January 30, 2007 4:46 PM
	To: Cook, Malcolm
	Cc: bioperl list; lstein at cshl.org
	Subject: Re: Bio::DB::SeqFeature treamtent of tags and annotations
	
	
	I've fixed the first issue in CVS. Sorry for the inconsistency.
	add_tag_value(), delete_tag_value() and get_Annotations() now all work
	as expected.
	
	The problem with the ID column is that it is supposed to be LOCAL to
	the GFF3 file and is not intended to be stored in the database. In
	contrast, Name can survive roundtripping. Perhaps the thing to do is
	to add a flag to the GFF3 file that turns on ID round-tripping, e.g.
	
	##round-trip-ids: 1
	
	If you like this idea, I can implement it.
	
	Lincoln
	
	
	On 1/29/07, Cook, Malcolm <MEC at stowers-institute.org> wrote:

		Lincoln,
		 
		Thanks for your suggestions on how to approach my problems
		augmenting Flybase annotation.  I am trying to follow them and
		finding the following oddities.
		 
		The first issue relates to the intermix of 'annotations' and
		'tag values'.  I find that Bio::DB::SeqFeature implements some of
		the 'tag' methods and some of the 'Annotation' methods.  Here is a
		perl one-liner that shows values stored using add_tag_value are not
		retrieved with get_tag_values, but rather with get_Annotations.
		 
		> perl -MBio::DB::SeqFeature -e 'my $f=Bio::DB::SeqFeature->new;
		    $f->add_tag_value("x",666);
		    print "get_tag_values:\t" . $f->get_tag_values("x")
		        . "\nget_Annotations:\t" . $f->get_Annotations("x");'
		 
		whose output is:
		get_tag_values: 
		get_Annotations:    666
		 
		Tracing this shows me that this results from the fact that:
		 
		
		Bio::DB::SeqFeature uses Bio::Graphics::FeatureBase
		(via Bio::DB::SeqFeature::NormalizedFeature), which does not support
		-tags in ->new but rather -attributes, viz:
		 
		
		  -attributes   a hashref of tag value attributes, in which the key is the tag
		                  and the value is an array reference of values
		 
		
		And though Bio::Graphics::FeatureBase purports to implement
		Bio::SeqFeatureI, it only partially implements the 'tag' methods
		(now deprecated and relegated to Bio::AnnotatableI).  In particular,
		the methods of Bio::SeqFeatureI marked '*' below are not implemented
		in Bio::Graphics::FeatureBase:

		  has_tag
		*  add_tag_value
		  get_tag_values
		  get_all_tags
		*  remove_tag
		  get_tagset_values
		  get_Annotations

		As a result, add_tag_value and remove_tag are inherited from
		different modules whose understanding of tags is not the same!

		This one-liner :

		>perl -MClass::ISA -MClass::Inspector -MBio::DB::SeqFeature -e '
		    my @c = Class::ISA::self_and_super_path("Bio::DB::SeqFeature");
		    foreach my $fn (qw(add_tag_value get_tag_values)) {
		        print "\n$fn:\t", join "\t",
		            (grep {Class::Inspector->function_exists($_, $fn)} @c)
		    }'

		confirms that they are defined in different packages, namely:

		add_tag_value: Bio::AnnotatableI 
		get_tag_values: Bio::Graphics::FeatureBase
Bio::AnnotatableI
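Malcolm's one-liner walks the class hierarchy with Class::ISA and asks Class::Inspector which packages define each method. The same idea in plain Perl, on a hypothetical toy hierarchy, shows why the two tag methods end up coming from different packages: Perl runs the first definition it finds along @ISA, so a method missing from the first parent is silently taken from the next one.

```perl
use strict;
use warnings;

# Toy hierarchy (hypothetical) mirroring the situation above: one class
# inherits from two parents, and the tag methods live in different ones.
package Toy::Annotatable;
sub add_tag_value  { return 'Toy::Annotatable::add_tag_value' }
sub get_tag_values { return 'Toy::Annotatable::get_tag_values' }

package Toy::FeatureBase;
sub get_tag_values { return 'Toy::FeatureBase::get_tag_values' }

package Toy::Feature;
our @ISA = ('Toy::FeatureBase', 'Toy::Annotatable');

package main;

# List every class in $class's hierarchy that defines $method, walking
# depth-first and left to right through @ISA (Perl's default search order).
sub defining_classes {
    my ($class, $method, $seen) = @_;
    $seen ||= {};
    return () if $seen->{$class}++;
    no strict 'refs';
    my @found = defined &{"${class}::$method"} ? ($class) : ();
    push @found, defining_classes($_, $method, $seen) for @{"${class}::ISA"};
    return @found;
}

for my $m (qw(add_tag_value get_tag_values)) {
    print "$m:\t", join("\t", defining_classes('Toy::Feature', $m)), "\n";
}
print 'Toy::Feature->get_tag_values runs ', Toy::Feature->get_tag_values, "\n";
```

With the real modules, Bio::DB::SeqFeature would take the place of Toy::Feature; that substitution is untested here.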

		
		Proposed solution...  hmmmm ..... I dunno.... maybe the following
		patch to Bio::Graphics::FeatureBase->add_tag_value :
		 
		sub add_tag_value {
		  my ($self,$tag,@vals) = @_;
		  push @{$self->{attributes}{$tag}}, @vals;
		}
		
		
		It fixes my use case for now but I'm still concerned and
		confused about this variety of methods.
		 
		Suggestions?
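The patch above makes `add_tag_value` write into the same `{attributes}` hash that `get_tag_values` reads. A minimal sketch extending that idea so that all the tag methods listed above share one store; `My::TaggedFeature` is hypothetical, and this is not the fix that was later committed to CVS.

```perl
use strict;
use warnings;

package My::TaggedFeature;    # hypothetical, not Bio::Graphics::FeatureBase

sub new {
    my ($class, %args) = @_;
    # -attributes: hashref of tag => arrayref of values, as described above
    my $self = { attributes => $args{-attributes} || {} };
    return bless $self, $class;
}

# Every tag method below reads or writes the same {attributes} hash.
sub add_tag_value {
    my ($self, $tag, @vals) = @_;
    push @{ $self->{attributes}{$tag} }, @vals;
    return 1;
}

sub get_tag_values {
    my ($self, $tag) = @_;
    return @{ $self->{attributes}{$tag} || [] };
}

sub has_tag {
    my ($self, $tag) = @_;
    return exists $self->{attributes}{$tag};
}

sub get_all_tags {
    my ($self) = @_;
    return sort keys %{ $self->{attributes} };
}

# Delete a tag and return the values it held (empty list if it was absent).
sub remove_tag {
    my ($self, $tag) = @_;
    my $old = delete $self->{attributes}{$tag};
    return $old ? @$old : ();
}

package main;

my $f = My::TaggedFeature->new;
$f->add_tag_value('x', 666);
print "get_tag_values:\t", join(',', $f->get_tag_values('x')), "\n";
```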
		 

-------------------------------------------------------------------------

		Also, I think that any "ID" in column 9 of a GFF3 flat file should
		be preserved through a round-trip through a Bio::DB::SeqFeature
		store, but this is not yet possible since any ID attribute in GFF3
		column 9 is being lost by GFF3Loader, causing me to locally patch
		the GFF3Loader::handle_feature method to add the following:

		  # mec at stowers-institute.org, wondering why not all attributes are
		  # carried forward, adds ID tag in particular service of
		  # round-tripping ID, which, though present in database as load_id
		  # attribute, was getting lost as itself
		  $unreserved->{ID}= $reserved->{ID}     if exists $reserved->{ID};

		Poised to patch.... what d'you think?

		Malcolm Cook
		Stowers Institute for Medical Research - Kansas City, Missouri
		  

________________________________

			From: lincoln.stein at gmail.com [mailto:lincoln.stein at gmail.com] On Behalf Of Lincoln Stein
			Sent: Tuesday, December 19, 2006 3:58 PM
			To: Cook, Malcolm
			Cc: bioperl list; lstein at cshl.org
			Subject: Re: bp_seqfeature_load / Bio::DB::SeqFeature::Store::GFF3Loader problems augmenting Flybase annotation
			
			
			Hi Malcolm,
			
			Your second guess was right. The use case of augmenting an
			existing gene with additional splice forms isn't provided for.
			You can get the functionality by making direct calls to
			Bio::DB::SeqFeature::Store methods:
			
			my @genes = $db->get_features_by_name('FBgn0017545');
			@genes == 1 or die "Didn't get exactly one gene";
			
			my $parent = $genes[0];
			my $chr    = $parent->seq_id;
			my $start  = $parent->start;
			my $end    = $parent->end;
			my $strand = $parent->strand;
			
			my $new_splice_form = $db->new_feature(-primary_tag => 'mRNA',
			                                       -source      => 'added',
			                                       -seq_id      => $chr,
			                                       -strand      => $strand,
			                                       -start       => $start+10,
			                                       -end         => $end,
			                                      );
			$parent->add_SeqFeature($new_splice_form);
			
			for my $pos ([$start+10,$start+100],[$start+200,$end]) {
			  my ($e_start,$e_end) = @$pos;
			  my $exon = Bio::DB::SeqFeature->new(-primary_tag => 'exon',
			                                      -store       => $db,
			                                      -seq_id      => $chr,
			                                      -strand      => $strand,
			                                      -start       => $e_start,
			                                      -end         => $e_end);
			  $new_splice_form->add_SeqFeature($exon);
			}
			
			I found a bug in updating the seqfeature database when I wrote
			this script, so you'll have to get the latest bioperl-live. I
			think you can use this to write a splice form updating script.
			
			In order to support the idea of adding new splice forms to an
			existing gene using the GFF3 format, I will have to either modify
			the loader, or write a separate script (probably better to do the
			latter). It shouldn't be hard if you'd like to give it a try.
			
			Lincoln
			
			
			On 12/19/06, Cook, Malcolm <MEC at stowers-institute.org> wrote:

				Lincoln and fellow Bio::DB::SeqFeature travelers,
				
				I find that using bp_seqfeature_load.PLS to load subfeatures of
				genes already loaded using bp_seqfeature_load.PLS fails with
				
				------------- EXCEPTION  -------------
				MSG: FBgn0017545 doesn't have a primary id
				STACK Bio::DB::SeqFeature::Store::GFF3Loader::build_object_tree_in_tables /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:682
				STACK Bio::DB::SeqFeature::Store::GFF3Loader::build_object_tree /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:663
				STACK Bio::DB::SeqFeature::Store::GFF3Loader::finish_load /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:372
				STACK Bio::DB::SeqFeature::Store::GFF3Loader::load_fh /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:345
				STACK Bio::DB::SeqFeature::Store::GFF3Loader::load /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:242
				STACK toplevel /home/mec/cvs/bioperl-live/scripts/Bio-SeqFeature-Store/bp_seqfeature_load.PLS:76
				
				Where FBgn0017545 is the ID of a gene previously loaded.
				
				I am unsure how to remedy my situation and welcome any advice
				on a correct or improved approach to my problem.
				
				Here's some detail if it helps.  I am developing a pipeline to
				design microarray probes capable of distinguishing among splice
				variants in drosophila (using the latest Flybase dmel_r5.1
				annotation).  So I
				
				1) load a filtered selection of Flybase annotation using
				bp_seqfeature_load.  (For testing purposes, I am using a single
				gene's worth of annotation, FBgn0017545.gff, attached.)  This is
				done as follows:
				
				        > bp_seqfeature_load.PLS --create FBgn0017545.gff
				
				2) analyze all the genes in the database, and create GFF3 output,
				each feature of which has a 'Parent' that is a previously loaded
				gene (i.e. FBgn0017545).  (These features represent the unique
				introns, splice sites, and exonic design targets. Output of this
				analysis, FBgn0017545_matd.gff, is also attached.)
				
				3) load these analysis results into the same database, as follows:
				
				        > bp_seqfeature_load.PLS FBgn0017545_matd.gff
				
				It is at this point that I get the above error.
				
				However, I don't get any error and the data loads fine if I load
				the two files together, as follows:
				
				        > bp_seqfeature_load.PLS --create <(cat FBgn0017545.gff FBgn0017545_matd.gff)
				
				So, I suspect that either I am misunderstanding when/how to use
				bp_seqfeature_load.PLS or else this use case has not yet arisen
				and must be provided for somehow.
				
				I am running against bioperl-live
				
				Thanks for your thoughts and assistance,
				
				Malcolm Cook
				Database Applications Manager - Bioinformatics
				Stowers Institute for Medical Research - Kansas City, Missouri
				
				
			-- 
			Lincoln D. Stein
			Cold Spring Harbor Laboratory
			1 Bungtown Road
			Cold Spring Harbor, NY 11724
			(516) 367-8380 (voice)
			(516) 367-8389 (fax)
			FOR URGENT MESSAGES & SCHEDULING, 
			PLEASE CONTACT MY ASSISTANT, 
			SANDRA MICHELSEN, AT michelse at cshl.edu




From neha_bafs at yahoo.co.in  Mon Feb  5 12:59:03 2007
From: neha_bafs at yahoo.co.in (Neha Nahar)
Date: Mon, 5 Feb 2007 17:59:03 +0000 (GMT)
Subject: [Bioperl-l] Convert newick to nexus format
Message-ID: <20070205175903.30379.qmail@web8701.mail.in.yahoo.com>

Hello everyone,

I am trying to convert a newick tree to nexus format.
I am using this script (taken from an email from George dated Wed Sep 22 11:52:47 EDT 2004):

/*------------------------------------------------------------*/

$ cat nexus.pl
#!/usr/bin/perl -w

use Bio::TreeIO;

($NEWICKFILE, $NEXUSFILE) = @ARGV;
print "newickfile=$NEWICKFILE, nexusfile=$NEXUSFILE\n";
my $treeio = new Bio::TreeIO(-format => 'newick', -file   => "$NEWICKFILE");
my $treeout = new Bio::TreeIO(-format => 'nexus', -file   => ">$NEXUSFILE");
while(my $tree = $treeio->next_tree) {
        $treeout->write_tree($treeout);
    }

exit 0;


/*------------------------------------------------------------*/

Running the script from the command line gives the following error:

$ ./nexus.pl mrp-input.txt nexus.out
newickfile=mrp-input.txt, nexusfile=nexus.out

------------- EXCEPTION  -------------
MSG: Cannot call method write_tree on Bio::TreeIO object must use a subclass
STACK Bio::TreeIO::nexus::write_tree /usr/lib/perl5/vendor_perl/5.8.8/Bio/TreeIO/nexus.pm:170
STACK toplevel ./nexus.pl:23

--------------------------------------


I am using the bioperl-1.5.2_101.tar.gz module from http://search.cpan.org/~sendu/bioperl/Bio/TreeIO.pm

Questions:-

1. Please let me know if I am using the correct version.
If not, please point me to the latest one.

2. Provided that the version I am using is the right one, please let me know what is wrong with the script.

Thank you.
Regards,
Neha.


-Neha Nahar
  " Work for cause and not for applause, live to express and not to impress !"
 				
---------------------------------
 Here?s a new way to find what you're looking for - Yahoo! Answers 


From jason at bioperl.org  Mon Feb  5 13:10:42 2007
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 5 Feb 2007 10:10:42 -0800
Subject: [Bioperl-l] Convert newick to nexus format
In-Reply-To: <20070205175903.30379.qmail@web8701.mail.in.yahoo.com>
References: <20070205175903.30379.qmail@web8701.mail.in.yahoo.com>
Message-ID: <46219DCD-8C6E-4DBE-82F2-D4B58207AD54@bioperl.org>

You want to write the TREE out, not the TREE WRITER:

$treeout->write_tree($tree)

not
$treeout->write_tree($treeout);


--
Jason Stajich
Miller Research Fellow
University of California, Berkeley
lab: 510.642.8441
http://pmb.berkeley.edu/~taylor/people/js.html
http://fungalgenomes.org/


From nehadnahar at yahoo.co.in  Mon Feb  5 13:05:26 2007
From: nehadnahar at yahoo.co.in (Neha Nahar)
Date: Mon, 5 Feb 2007 18:05:26 +0000 (GMT)
Subject: [Bioperl-l] Convert newick to nexus format
Message-ID: <288335.22352.qm@web8412.mail.in.yahoo.com>

Hello everyone,

I am trying to convert a newick tree to nexus format.
I am using this script (taken from an email from George dated Wed Sep 22 11:52:47 EDT 2004):

/*------------------------------------------------------------*/

$ cat nexus.pl
#!/usr/bin/perl -w

use Bio::TreeIO;

($NEWICKFILE, $NEXUSFILE) = @ARGV;
print "newickfile=$NEWICKFILE, nexusfile=$NEXUSFILE\n";
my $treeio = new Bio::TreeIO(-format => 'newick', -file   => "$NEWICKFILE");
my $treeout = new Bio::TreeIO(-format => 'nexus', -file   => ">$NEXUSFILE");
while(my $tree = $treeio->next_tree) {
        $treeout->write_tree($treeout);
    }

exit 0;


/*------------------------------------------------------------*/

Running the script from the command line gives the following error:

$ ./nexus.pl mrp-input.txt nexus.out
newickfile=mrp-input.txt, nexusfile=nexus.out

------------- EXCEPTION  -------------
MSG: Cannot call method write_tree on Bio::TreeIO object must use a subclass
STACK Bio::TreeIO::nexus::write_tree /usr/lib/perl5/vendor_perl/5.8.8/Bio/TreeIO/nexus.pm:170
STACK toplevel ./nexus.pl:23

--------------------------------------


I am using the bioperl-1.5.2_101.tar.gz module from http://search.cpan.org/~sendu/bioperl/Bio/TreeIO.pm

Questions:-

1. Please let me know if I am using the correct version.
If not, please point me to the latest one.

2. Provided that the version I am using is the right one, please let me know what is wrong with the script.

Thank you.
Regards,
Neha.


-Neha Nahar
  " Work for cause and not for applause, live to express and not to impress !"
 				


From hlapp at duke.edu  Fri Feb  2 10:09:57 2007
From: hlapp at duke.edu (Hilmar Lapp)
Date: Fri, 2 Feb 2007 10:09:57 -0500
Subject: [Bioperl-l] [Gmod-schema] beginning work on SeqIO::chadoxml
In-Reply-To: <1170359746.2706.622.camel@localhost.localdomain>
References: <1170359746.2706.622.camel@localhost.localdomain>
Message-ID: <675F525B-D3EB-4C0A-A777-7683F2A8F823@duke.edu>


On Feb 1, 2007, at 2:55 PM, Scott Cain wrote:

> The second main change was to introduce a -flybase_compat argument  
> when
> initializing the Bio::SeqIO writer, so that 'old style' cv and cvterms
> (that are compatible with flybase) will be used, but now the default
> will be to use current standards:

Just my $0.02 ... obviously, Flybase may be the only organization  
that uses an 'old style' or any other way not compliant with 'current  
standards' (presumably SO), but if it's not the only one then this  
approach won't scale.

Also, an argument -flybase_compat suggests to the unsuspecting that  
this is an endorsed flavor of the standard and fine to use for  
everyone else too.

If Flybase is idiosyncratic in this way, why not make chadoxml.pm  
compliant with the standard as we all want it, keep it free from  
litter caused by usage of old versions of SO, and create a second  
module fb-chadoxml.pm that inherits from the first and merely  
overrides a few things so that it works for Flybase. This way, other  
organizations with similar needs can follow the path and create their  
own xyz-chadoxml.pm, rather than having to muck around in the  
chadoxml.pm that comes with the distribution.

I'm not sure I fully grasp the underlying issue, so I may not make  
much sense here. Apologies if that's the case ...

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:- hlapp at duke dot edu :
===========================================================


From jason at bioperl.org  Mon Feb  5 14:43:09 2007
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 5 Feb 2007 11:43:09 -0800
Subject: [Bioperl-l] Convert newick to nexus format
In-Reply-To: <209988.63723.qm@web8715.mail.in.yahoo.com>
References: <209988.63723.qm@web8715.mail.in.yahoo.com>
Message-ID: <9E477447-67F5-46CA-BCC1-47BB4170EC76@bioperl.org>

Please CC the mailing list when asking a question or following up.

Sorry, I don't know what you are doing wrong; you didn't resend your
code, so I don't know if you still have a typo.

This code works fine for me:

use strict;
use warnings;
use Bio::TreeIO;

my ($filein, $fileout) = @ARGV;
my ($format, $oformat) = qw(newick nexus);
my $in  = Bio::TreeIO->new(-file => $filein, -format => $format);
my $out = Bio::TreeIO->new(-format => $oformat, -file => ">$fileout");

# pass each tree object to write_tree, not the writer itself
while ( my $t = $in->next_tree ) {
  $out->write_tree($t);
}


On Feb 5, 2007, at 11:24 AM, Neha Nahar wrote:

> Thank you very much for the reply.
>
> I fixed the code as per your suggestion, but I am now getting a
> different error:
>
> $ ./nexus.pl mrp-input.txt nexus.out
> newickfile=mrp-input.txt, nexusfile=nexus.out
>
> ------------- EXCEPTION  -------------
> MSG: Cannot call method write_tree on Bio::TreeIO object must use a subclass
> STACK Bio::TreeIO::nexus::write_tree /usr/lib/perl5/vendor_perl/5.8.8/Bio/TreeIO/nexus.pm:170
> STACK toplevel ./nexus.pl:23
>
> --------------------------------------
>
> Please help me out with this script.
>
> Thank you.
> Regards,
> Neha

--
Jason Stajich
Miller Research Fellow
University of California, Berkeley
lab: 510.642.8441
http://pmb.berkeley.edu/~taylor/people/js.html
http://fungalgenomes.org/


From nehadnahar at yahoo.co.in  Mon Feb  5 14:58:08 2007
From: nehadnahar at yahoo.co.in (Neha Nahar)
Date: Mon, 5 Feb 2007 19:58:08 +0000 (GMT)
Subject: [Bioperl-l] Convert newick to nexus format
In-Reply-To: <99196.23114.qm@web8711.mail.in.yahoo.com>
Message-ID: <36024.1212.qm@web8405.mail.in.yahoo.com>


Hi,
Thank you for the code.
I tried it but I still get the same exception.

------------- EXCEPTION  -------------
MSG: Cannot call method write_tree on Bio::TreeIO object must use a subclass
STACK Bio::TreeIO::nexus::write_tree /usr/lib/perl5/vendor_perl/5.8.8/Bio/TreeIO/nexus.pm:170
STACK toplevel ./nexus1.pl:18


Please find attached the Perl file (nexus.pl).


I am using the bioperl-1.5.2_101.tar.gz module from http://search.cpan.org/~sendu/bioperl/Bio/TreeIO.pm

Please let me know if I am using the correct version. If not, please point me to the latest one.

Thank you.
Regards,
nnahar


-Neha Nahar
  " Work for cause and not for applause, live to express and not to impress !"
 				
-------------- next part --------------
A non-text attachment was scrubbed...
Name: nexus.pl
Type: application/x-perl
Size: 811 bytes
Desc: 1389215665-nexus.pl
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070205/c6453dcf/attachment-0002.bin>

From jason at bioperl.org  Mon Feb  5 17:15:52 2007
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 5 Feb 2007 14:15:52 -0800
Subject: [Bioperl-l] Convert newick to nexus format
In-Reply-To: <36024.1212.qm@web8405.mail.in.yahoo.com>
References: <36024.1212.qm@web8405.mail.in.yahoo.com>
Message-ID: <84703383-600F-42F4-A860-DD0D1C43EE83@bioperl.org>

Something is wrong with your install, I am guessing. Can you run the
tests? Go to the bioperl directory and run:
$ perl t/TreeIO.t

Can you describe how you installed bioperl?
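Jason's guess of an install problem can be narrowed down by checking which copy of each module Perl actually loads: the trace points at a file under /usr/lib/perl5/vendor_perl/5.8.8/, so it may be worth confirming that this really is the copy installed from the bioperl-1.5.2_101 tarball. A minimal pure-Perl sketch that assumes nothing about the install; the helper name module_location is hypothetical and not part of BioPerl:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: load a module by name and report which file Perl
# actually used (from %INC) plus its $VERSION, if it defines one.
# Returns an empty list if the module cannot be loaded.
sub module_location {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;   # Bio::TreeIO -> Bio/TreeIO.pm
    return unless eval { require $file; 1 };
    no strict 'refs';
    my $version = ${"${module}::VERSION"};
    return ($INC{$file}, defined $version ? $version : 'undef');
}

# Report on the modules named on the command line (default: the two
# modules involved in the newick -> nexus conversion in this thread).
my @modules = @ARGV ? @ARGV : qw(Bio::TreeIO Bio::TreeIO::nexus);
for my $module (@modules) {
    my ($path, $version) = module_location($module);
    if (defined $path) {
        print "$module loaded from $path (VERSION $version)\n";
    }
    else {
        print "$module could not be loaded\n";
    }
}
```

Run it as, for example, perl check_modules.pl Bio::TreeIO::nexus (the script name is arbitrary). If the reported path is not where the tarball was installed, an older copy earlier in @INC may be shadowing it; that is one possible explanation, not a confirmed one.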

On Feb 5, 2007, at 11:58 AM, Neha Nahar wrote:

>
> Hi,
> Thank you for the code.
> I tried it but I still get the same exception.
>
> ------------- EXCEPTION  -------------
> MSG: Cannot call method write_tree on Bio::TreeIO object must use a subclass
> STACK Bio::TreeIO::nexus::write_tree /usr/lib/perl5/vendor_perl/5.8.8/Bio/TreeIO/nexus.pm:170
> STACK toplevel ./nexus1.pl:18


From lzhtom at hotmail.com  Mon Feb  5 22:31:56 2007
From: lzhtom at hotmail.com (zhihua li)
Date: Tue, 06 Feb 2007 03:31:56 +0000
Subject: [Bioperl-l] Bio::index::Fasta- where's the indexed file?
In-Reply-To: <CFD213B9-5195-450F-80ED-E956EEF50F59@bioperl.org>
Message-ID: <BAY110-F28F9C9145AC24F2D0E0D34C79F0@phx.gbl>

Thanks a lot!

After checking out the script bp_index, I changed the syntax to:
 my $inx = Bio::Index::Fasta->new("test.fasta.idx", 'WRITE');
$inx->make_index("test.fasta");


Now I have an index file test.fasta.idx in my current directory, and I can
use it in a later script by saying:
 my $inx = Bio::Index::Fasta->new(-filename => "test.fasta.idx");

So now everything is OK. But I don't understand why I have to use that
syntax, and why the syntax given in the documentation didn't work.
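Jason's suggestion quoted below, building the full path from $ENV{BIOPERL_INDEX} yourself rather than relying on the module to read it, can be wrapped in a small helper so the writing and reading scripts agree on the location. A minimal sketch; the name index_path is hypothetical, and falling back to the current directory when BIOPERL_INDEX is unset is an assumption:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: full path of an index file inside $ENV{BIOPERL_INDEX},
# falling back to the current directory when the variable is unset or empty.
sub index_path {
    my ($name) = @_;
    my $dir = (defined $ENV{BIOPERL_INDEX} && length $ENV{BIOPERL_INDEX})
            ? $ENV{BIOPERL_INDEX}
            : File::Spec->curdir;
    return File::Spec->catfile($dir, $name);
}

# Use the same path when writing and when reading the index, e.g. with the
# two calls reported to work in this thread:
#   Bio::Index::Fasta->new(index_path('test.fasta.idx'), 'WRITE');
#   Bio::Index::Fasta->new(-filename => index_path('test.fasta.idx'));
print index_path('test.fasta.idx'), "\n";
```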


>From: Jason Stajich <jason at bioperl.org>
>To: zhihua li <lzhtom at hotmail.com>
>CC: bioperl-l at lists.open-bio.org, arokfl at yahoo.com
>Subject: Re: [Bioperl-l] Bio::index::Fasta- where's the indexed file?
>Date: Thu, 1 Feb 2007 22:24:44 -0800
>
>I don't think BIOPERL_INDEX does anything in the module, so that
>documentation is not quite right.  The env variable is used in the
>scripts/index/bp_index and bp_fetch scripts, so maybe a cut+paste job
>went bad somewhere.
>
>You need to specify the full path you want with -filename; you can
>just prepend BIOPERL_INDEX to the filename, like:
>-filename => $ENV{BIOPERL_INDEX}."/$index"
>
>-jason
>On Feb 1, 2007, at 7:27 PM, zhihua li wrote:
>
> > Sorry guys, the former empty mail was sent out by mistake.
> >
> > I'm using Bio::index::Fasta to index a file containing lots of
> > sequences in fasta format. All is fine except one thing.
> >
> > According to the bioperl tutorial and the documentation, the following
> > code will make an indexed file:
> >
> > my $inx = Bio::Index::Fasta->new(-filename => "test.fasta.idx",
> >                                     -write_flag => 1);
> >    $inx->make_index("test.fasta");
> >
> > And in another script I can access the indexed file by saying
> >
> > $ENV{BIOPERL_INDEX} = "."; # find index in current directory
> > my $inx = Bio::Index::Fasta->new(-filename => "test.fasta.idx");
> > my $seq = $inx->fetch("ent1001");   # fetch the sequence named ent1001
> >
> > However, after running the first script, I cannot find a new file
> > test.fasta.idx in my current directory. And not surprisingly, when
> > I ran the second script, perl told me it couldn't find
> > "test.fasta.idx".
> >
> > What's going on here?
> >
> > Thanks a lot!
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>--
>Jason Stajich
>Miller Research Fellow
>University of California, Berkeley
>lab: 510.642.8441
>http://pmb.berkeley.edu/~taylor/people/js.html
>http://fungalgenomes.org/
>
>



From johnston at biochem.ucl.ac.uk  Tue Feb  6 06:52:08 2007
From: johnston at biochem.ucl.ac.uk (Caroline Johnston)
Date: Tue, 6 Feb 2007 11:52:08 +0000 (GMT)
Subject: [Bioperl-l] RNA folding
Message-ID: <Pine.LNX.4.58.0702061143420.4383@localhost.localdomain>

Hello,

I've just joined the list - I'm a Bioinformatics PhD student at Essex
University doing transcriptomics-related things. Mainly microarray
analysis and more recently looking at RNA structure prediction.

I was thinking about having a go at writing a bioperl-run wrapper around
some of the structure prediction stuff, but according to the wiki this is
being done already (at least for the Vienna tools). I spoke to Albert
Vilella at the EBI the other day and he said Chris Fields was the man to
speak to. So could he (or anyone) let me know what the current state of
RNA structure prediction tools in bioperl is?

Cheers,
Cass xx


From marian.thieme at lycos.de  Tue Feb  6 08:52:10 2007
From: marian.thieme at lycos.de (Marian Thieme)
Date: Tue, 06 Feb 2007 14:52:10 +0100
Subject: [Bioperl-l] dbSNP
Message-ID: <45C8880A.7030702@lycos.de>

Hello all,

I looked for a method/class/function/script in the documentation which
provides the ability to generate a SNP assay suitable for submission to
dbSNP (http://www.ncbi.nlm.nih.gov/projects/SNP/
http://www.ncbi.nlm.nih.gov/projects/SNP/how_to_submit.html).
I didn't find such code, but I noticed that there is at least an XML
parser to read dbSNP reports.

Does anybody know if there is also an output class to generate dbSNP
reports? I imagine that at least the SNP assay section would be worth
implementing.

This example is given by NCBI:


TYPE:SNPASSAY
HANDLE:WI
BATCH: 1.98
MOLTYPE:Genomic
METHOD:RESEQ
SYN NAMES:WI-SNP,DnaId,MapDna
COMMENT:
Here is where some public comment that applies to the entire
batch of SNPS could be put.
PRIVATE:
Here is where a note to NCBI regarding processing that would
not be seen by the outside, could be put.
Note that these are is not exactly real SNPs, as
the data were modified.
||
SNP:WI|WIAF-1234567
SYNONYM:EST4291092,EST8291092,EST7291092
ACCESSION:H30533
LENGTH:101
5'_ASSAY:GGCAGGGAAGGAAAATCCTAGGGNCAGCATTGGGGAGGGGGGGACTCTG
OBSERVED:C/T
3'_ASSAY:TAAATTTATTGGGCAACAGGCTGCAGGTGAGGGGGCTGACAGGAGGAGGGA
||
SNP:WI|WIAF-1722
SYNONYM:STS-T17494,STS-T17494,STS-T17494
ACCESSION:T17494
LENGTH:269
5'_FLANK:CTTTCCCTCATCCCCTCTTCCACCACACCATCCCGGAACAAGTGCTCCAGGATT
5'_ASSAY:CCCTGCCCACTGGCCATTTTGGAGTGTGTCC
OBSERVED:A/T
3'_ASSAY:GTGGGTAGCAATGTGGAAACCACCAGGGCCTTTGTGGAGAAAA
3'_FLANK:TGGAGGGGGTTGAGGGAGTCCCAGGAGGGGCTTATTTGAGGGCCTTTGCCACTT
    GCTCATAGGCGAGCTCGATCTCCTCATCATCTGGACAGGTGGAAGCGAATTCTT
    CCCGGGCGTAGGCATTGCTCAAGTACCGAT
||
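Nothing in the thread points to an existing BioPerl writer for this format, so the following is only a minimal sketch of what one could produce, following the layout of the NCBI example above. The sub name write_snpassay and its hash keys are hypothetical; the optional header fields (SYN NAMES, COMMENT, PRIVATE) and wrapping of long flank sequences are deliberately left out, and the output has not been validated by dbSNP:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical writer for the dbSNP SNPASSAY batch layout shown above.
# $header: hashref with handle, batch, moltype, method.
# @snps:   hashrefs with id, accession, length, five_assay, observed,
#          three_assay and, optionally, synonym, five_flank, three_flank.
sub write_snpassay {
    my ($header, @snps) = @_;
    my $out = "TYPE:SNPASSAY\n";
    $out .= "HANDLE:$header->{handle}\n";
    $out .= "BATCH:$header->{batch}\n";
    $out .= "MOLTYPE:$header->{moltype}\n";
    $out .= "METHOD:$header->{method}\n";
    $out .= "||\n";    # '||' ends the header and each SNP record

    # Field order follows the NCBI example; absent optional fields are skipped.
    my @fields = (
        [ SNP        => 'id' ],
        [ SYNONYM    => 'synonym' ],
        [ ACCESSION  => 'accession' ],
        [ LENGTH     => 'length' ],
        [ "5'_FLANK" => 'five_flank' ],
        [ "5'_ASSAY" => 'five_assay' ],
        [ OBSERVED   => 'observed' ],
        [ "3'_ASSAY" => 'three_assay' ],
        [ "3'_FLANK" => 'three_flank' ],
    );
    for my $snp (@snps) {
        for my $f (@fields) {
            my ($tag, $key) = @$f;
            next unless defined $snp->{$key};
            $out .= "$tag:$snp->{$key}\n";
        }
        $out .= "||\n";
    }
    return $out;
}

# First SNP of the NCBI example above.
print write_snpassay(
    { handle => 'WI', batch => '1.98', moltype => 'Genomic', method => 'RESEQ' },
    { id          => 'WI|WIAF-1234567',
      synonym     => 'EST4291092,EST8291092,EST7291092',
      accession   => 'H30533',
      length      => 101,
      five_assay  => 'GGCAGGGAAGGAAAATCCTAGGGNCAGCATTGGGGAGGGGGGGACTCTG',
      observed    => 'C/T',
      three_assay => 'TAAATTTATTGGGCAACAGGCTGCAGGTGAGGGGGCTGACAGGAGGAGGGA' },
);
```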


Regards,
Marian

P.S. This does not contradict my first request about the bracket
notation. We need both formats.


From cjfields at uiuc.edu  Tue Feb  6 11:45:36 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 6 Feb 2007 10:45:36 -0600
Subject: [Bioperl-l] RNA folding
In-Reply-To: <Pine.LNX.4.58.0702061143420.4383@localhost.localdomain>
References: <Pine.LNX.4.58.0702061143420.4383@localhost.localdomain>
Message-ID: <C7C6DC30-ACBB-4626-A0D7-CC0248AF847E@uiuc.edu>

On Feb 6, 2007, at 5:52 AM, Caroline Johnston wrote:

> Hello,
>
> I've just joined the list - I'm a Bioinformatics PhD student at Essex
> University doing transcriptomics-related things. Mainly microarray
> analysis and more recently looking at RNA structure prediction.
>
> I was thinking about having a go at writing a bioperl-run wrapper around
> some of the structure prediction stuff, but according to the wiki this is
> being done already (at least for the Vienna tools). I spoke to Albert
> Vilella at the EBI the other day and he said Chris Fields was the man to
> speak to. So could he (or anyone) let me know what the current state of
> RNA structure prediction tools in bioperl is?
>
> Cheers,
> Cass xx

Actually, the only RNA tool wrappers I have made are ones for ERPIN,  
RNAMotif, and Infernal (the only one in bioperl-run CVS at this time  
is RNAMotif).  I am planning on writing up wrappers for Vienna,  
UNAFold, and a few others but haven't really started in.  Here's  
where I'm at right now...

I am writing up a new set of AnnotationI classes which positionally  
describe data (Meta) which I hope will help deal with this stuff.   
These would be similar in nature to Heikki's Bio::Seq::Meta classes:

http://bioperl.org/pipermail/bioperl-l/2006-December/024414.html

I would use a regular Bio::SeqI and store the structural data and  
anything else (such as energy calculations, etc) as Annotation  
objects in an AnnotationCollection, and then write up a series of  
SeqIO modules to get data into/out of the designated structure  
formats, like UNAfold ct, RNAML, and so on.  Each sequence would then  
be capable of holding more than one structural Annotation (i.e. could  
represent different folding pathways, alternative RNA folds, and so on).

At this point I represent the data as an array of hashes where $array[0]
is nt 1 and the hash keys indicate the type of interaction, the base
interacted with, etc.  The text representation would be the simple
Eddy WUSS (Rfam-like) format by default, which is capable of
representing some complex data (pseudoknots, for instance), is
compact, and is documented (via the Infernal manual).  Tags will
probably switch to more ontologically relevant terms (probably from
RNAML or RNA Ontology), but in general it is something like this:

[
  {'interaction' => 'WC',
    'base'  => 24},
  {'interaction' => 'WC',
    'base'  => 23},
  {'interaction' => 'SS'},
...
]

In this implementation every seq position would have some kind of  
interaction designation, though that's open for debate as it could  
just be simple text or undef for single-stranded regions.
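
To make the array-of-hashes idea concrete, here is a minimal sketch that
flattens such a structure into a Vienna-style dot-bracket string (a
simplification of WUSS). It assumes 'base' holds the 1-based position of
the pairing partner (matching $array[0] being nt 1), treats only 'WC'
entries as pairs, maps everything else (including undef entries) to '.',
and ignores pseudoknots. The sub name structure_to_brackets is
hypothetical and not part of BioPerl.

```perl
use strict;
use warnings;

# Hypothetical helper: convert the per-nucleotide array of hashes into a
# dot-bracket string. $structure->[0] is nt 1; 'base' is assumed to be the
# 1-based position of the partner nucleotide.
sub structure_to_brackets {
    my ($structure) = @_;
    my $brackets = '';
    for my $i (0 .. $#$structure) {
        my $pos  = $i + 1;                       # 1-based nucleotide position
        my $info = $structure->[$i] || {};       # undef => single-stranded
        my $type = defined $info->{interaction} ? $info->{interaction} : '';
        if ($type eq 'WC' && defined $info->{base}) {
            # Open if the partner lies downstream, close if it lies upstream
            $brackets .= $info->{base} > $pos ? '(' : ')';
        }
        else {
            $brackets .= '.';                    # unpaired or unrecognized
        }
    }
    return $brackets;
}

# A 6-nt hairpin: 1 pairs with 6, 2 pairs with 5, 3 and 4 are unpaired
my $hairpin = [
    { interaction => 'WC', base => 6 },
    { interaction => 'WC', base => 5 },
    { interaction => 'SS' },
    { interaction => 'SS' },
    { interaction => 'WC', base => 2 },
    { interaction => 'WC', base => 1 },
];
print structure_to_brackets($hairpin), "\n";    # prints ((..))
```

Handling pseudoknots would need a second bracket type chosen per helix,
which is exactly where the richer WUSS notation earns its keep.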

This is also scalable based on the complexity of the data: if one wanted
to add tertiary/quaternary interactions, location, base modifications,
remote sequence interactions, etc., extra key/value pairs could be
used.  Conversely, if one only wanted secondary structure (for drawing RNA
structures, for example), then only that data would be parsed.

If you (or anyone listening) have any suggestions I would greatly  
appreciate them.

chris


From johnsonm at gmail.com  Tue Feb  6 18:53:49 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Tue, 6 Feb 2007 17:53:49 -0600
Subject: [Bioperl-l] Bio::Tools::Glimmer
In-Reply-To: <ebf5eb170609201247r12eb9475v5cb553e0727163c0@mail.gmail.com>
References: <ebf5eb170609190833r5fa138e9gc6c51346e5f7890c@mail.gmail.com>
	<451095C7.3020905@infotech.monash.edu.au>
	<0FBEEE2B-9641-4EFA-A5D2-0DA839556E4C@gmx.net>
	<ebf5eb170609201247r12eb9475v5cb553e0727163c0@mail.gmail.com>
Message-ID: <ebf5eb170702061553t29b0d1f2q402582abba032031@mail.gmail.com>

Okay, I need to get something going for a project I'm working on.  Options:

1) Stick it all in one module:  This can get a bit ugly, as Glimmer, as
opposed to GlimmerM and GlimmerHMM, does not explicitly identify itself in
the prediction report.  You can pick up on some unique things in the output
file, but you don't know what you've got until you're actually parsing it.
That is, unless you require a format argument up front, in which case you
can split the parsing code up into different functions.
2) Two modules, one for GlimmerM/GlimmerHMM and one for Glimmer2/Glimmer3.
With or without an abstract dispatch front end.

I suppose at this point, after getting my hands dirty, I'd prefer 1), with
an explicit -format => Glimmer2/3/M/HMM arg required in the constructor.
Though I'm not opposed to 2) if that is what it takes to get it into
Bioperl.

If we can achieve some sort of consensus without too much bloodshed, I'll
shoot y'all some patches and we can consider this issue checked off the
list.

On 9/20/06, Mark Johnson <johnsonm at gmail.com> wrote:
>
>     I think it's going to be at least two modules, one for the
> prokaryotic stuff and one for the eukaryotic.  And really, the
> prokaryotic stuff is different enough to warrant two modules. So three
> different parsers.  Could do it in one, but it would be ugly and
> nasty.  However, this does not preclude three parsers and one abstract
> interface, which is your excellent suggestion.
>     Oh, and excuse me, but I have a bit of a rant here, after dealing
> with parsers and pipelines for the last few months.  Parsers should
> not load the whole input file into RAM to parse it.  And Pipelines
> using the parsers (Ensembl / biopipe) should not stuff the whole
> result set from the parser into a single array.  When you're trying to
> annotate assemblies, it sucks to have to split up contigs/supercontigs
> because the whole result set won't fit into RAM on a 12 gig blade.
> Sheesh.  Though this doesn't matter for bacterial genomes, as they're
> tiny (by comparison to vertebrates).  There, sorry, been saving up
> that frustration for a while.  No offense meant, hope I didn't tick
> anybody off.  8)
>     Torsten:  You sound like you know what you're doing with respect
> to Bioperl more than I do, and I know I don't have CVS access, so I'll
> defer to you.  I'd be happy to help out, though.
>
>
> On 9/20/06, Hilmar Lapp <hlapp at gmx.net> wrote:
> >
> > On Sep 19, 2006, at 9:13 PM, Torsten Seemann wrote:
> >
> > > I'm not sure whether to
> > >
> > > 1. parse them all under the same module, perhaps with a
> > > -format=>'glimmerXXX' parameter
> > >
> > > 2. create a single new module  Glimmer2 and Glimmer3
> > >
> > > 3. create two new modules, one for Glimmer2 and one for Glimmer3,
> > > given
> > > they are different outputs both in syntax and number of output files
> > >
> > > Any advice from Bioperl 'old timers' appreciated ;-)
> > >
> >
> > If at all possible I'd favor 1), with e.g. Bio::Tools::GFF being an
> > example for how this can work.
> >
> > If this would amount to basically 4 modules stringed together into
> > one file (because the parsing code can't share much if anything
> > between the flavors), it'd still be advantageous to have a single
> > frontend module that would then dispatch.
> >
> >         -hilmar
> >
> > --
> > ===========================================================
> > : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> > ===========================================================
> >
> >
> >
> >
> >
> >
>


From jason at bioperl.org  Tue Feb  6 19:33:11 2007
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 6 Feb 2007 16:33:11 -0800
Subject: [Bioperl-l] Bio::Tools::Glimmer
In-Reply-To: <ebf5eb170702061553t29b0d1f2q402582abba032031@mail.gmail.com>
References: <ebf5eb170609190833r5fa138e9gc6c51346e5f7890c@mail.gmail.com>
	<451095C7.3020905@infotech.monash.edu.au>
	<0FBEEE2B-9641-4EFA-A5D2-0DA839556E4C@gmx.net>
	<ebf5eb170609201247r12eb9475v5cb553e0727163c0@mail.gmail.com>
	<ebf5eb170702061553t29b0d1f2q402582abba032031@mail.gmail.com>
Message-ID: <20F07AFE-BADF-4BA5-83B1-C99EEBE915C2@bioperl.org>

I definitely vote for 1) - worst case you have 4 separate methods if  
there is no good way to condense the parsing for each format and  
require the user to specify the format.

I have no problem with requiring the user to specify what program she
used - if we can be fancy and guess the format later (like the format
guessing in SeqIO), then that's icing.

-jason

--
Jason Stajich
Miller Research Fellow
University of California, Berkeley
lab: 510.642.8441
http://pmb.berkeley.edu/~taylor/people/js.html
http://fungalgenomes.org/


From torsten.seemann at infotech.monash.edu.au  Tue Feb  6 21:36:54 2007
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Wed, 7 Feb 2007 13:36:54 +1100
Subject: [Bioperl-l] Bio::Tools::Glimmer
In-Reply-To: <20F07AFE-BADF-4BA5-83B1-C99EEBE915C2@bioperl.org>
References: <ebf5eb170609190833r5fa138e9gc6c51346e5f7890c@mail.gmail.com>
	<451095C7.3020905@infotech.monash.edu.au>
	<0FBEEE2B-9641-4EFA-A5D2-0DA839556E4C@gmx.net>
	<ebf5eb170609201247r12eb9475v5cb553e0727163c0@mail.gmail.com>
	<ebf5eb170702061553t29b0d1f2q402582abba032031@mail.gmail.com>
	<20F07AFE-BADF-4BA5-83B1-C99EEBE915C2@bioperl.org>
Message-ID: <a79f6a4b0702061836l7e63933bs3f065b773054c9c4@mail.gmail.com>

> I definitely vote for 1) - worst case you have 4 separate methods if
> there is no good way to condense the parsing for each format and
> require the user to specify the format.

And make the default -format what it currently parses, i.e.
GlimmerM/GlimmerHMM.

> I have no problem with requiring user to specify what program she
> used - if we can be fancy and guess the format later (i.e. guess
> format in SeqIO) -then that's icing.

Agreed.
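
A minimal sketch of the constructor handling being agreed on here: an
explicit -format argument, validated against the known Glimmer flavours,
with a fallback to the GlimmerM/GlimmerHMM behaviour of the current parser.
The package name My::GlimmerSketch, the accepted format spellings and the
'GlimmerM' default value are assumptions for illustration, not the
eventual Bio::Tools::Glimmer interface.

```perl
use strict;
use warnings;

package My::GlimmerSketch;    # hypothetical package, not Bio::Tools::Glimmer

# Format names accepted by this sketch (assumed spellings), matched case-insensitively
my %KNOWN_FORMAT = map { lc($_) => $_ } qw(Glimmer2 Glimmer3 GlimmerM GlimmerHMM);

sub new {
    my ($class, %args) = @_;
    # Default to the flavour the existing parser handles (assumed 'GlimmerM')
    my $requested = defined $args{-format} ? $args{-format} : 'GlimmerM';
    my $format = $KNOWN_FORMAT{ lc $requested }
        or die "Unknown Glimmer format '$requested'\n";
    return bless { format => $format }, $class;
}

sub output_format { return $_[0]{format} }

package main;

my $default = My::GlimmerSketch->new();
my $g3      = My::GlimmerSketch->new(-format => 'glimmer3');
print $default->output_format, ' ', $g3->output_format, "\n";   # prints GlimmerM Glimmer3
```

Dying on an unknown -format keeps a typo from silently falling through to
the wrong parser; format guessing could later replace the fixed default.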

>> Okay, I need to get something going for a project I'm working on.

I would normally try to help but I am so swamped with work-work at the
moment. Just a reminder that last year I added examples of the
different Glimmer outputs to the CVS repository:

./t/data/Glimmer3.predict
./t/data/Glimmer3.detail
./t/data/GlimmerHMM.out
./t/data/Glimmer2.out
./t/data/GlimmerM.out
./t/data/glimmer.out (this was the original one)

Thanks for taking this on!

--Torsten


From mitch_skinner at berkeley.edu  Tue Feb  6 23:37:35 2007
From: mitch_skinner at berkeley.edu (Mitch Skinner)
Date: Tue, 06 Feb 2007 20:37:35 -0800
Subject: [Bioperl-l] Bio::Graphics::Panel gridlines and pixels
Message-ID: <45C9578F.2060802@berkeley.edu>

Hello,

I'm working on an AJAX version of GBrowse (http://genome.biowiki.org), 
where we're pre-rendering entire chromosomes by breaking them up into 
tiles.  One of the problems we have is that it takes a long time to 
render all those tiles.  One of the things that's slowing the process 
down (and using lots of RAM) is rendering the gridlines, and it would 
make things a lot easier (and faster) for us if we could assume that the 
gridlines were the same for each tile.  Since we're only rendering at a 
particular set of zoom levels (that we have control over), I think this 
is a reasonable assumption.

Given the right set of zoom levels, the assumption works almost all the 
time, except for one specific case.  It has to do with the way draw_grid 
and map_pt in Bio::Graphics::Panel work for the very first gridline.

Here's how draw_grid (in CVS HEAD) computes the first gridline:

    my $first_tick = $minor * int($self->start/$minor);

$first_tick, $minor and $self->start are in base-pair space, which is 
1-based.  However, if ($self->start < $minor) then $first_tick is 0.  
This might not be a problem, except that $first_tick is translated into 
pixel coordinates in map_pt, which expects 1-based bp coordinates.  Here 
are the relevant lines in map_pt:

    my $val = $flip 
      ? int (0.5 + $pr - ($length - ($_- 1)) * $scale)
      : int (0.5 + ($_-$offset-1) * $scale);

This style of rounding only works for positive numbers; rounding 0.6 by 
doing int(0.5 + 0.6) gives you 1 as expected, but rounding -0.6 by doing 
int(0.5 + -0.6) gives you 0.  So if the first three gridlines are at 0, 
10, and 20 bp, then (assuming $scale is 1, $offset is 0, $flip evaluates 
false, and pad left is 0) they're drawn at pixels 0, 9, and 19.
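
The truncation behaviour described above is easy to reproduce outside
Bio::Graphics. This is a standalone sketch, not the Panel code itself:
round_half_up_naive is a hypothetical name for the int(0.5 + $x) idiom,
and the gridline example assumes the same simplified setting ($scale = 1,
$offset = 0, no flip, no left padding).

```perl
use strict;
use warnings;

# The rounding idiom quoted from map_pt: add 0.5, then truncate.
# Perl's int() truncates toward zero, so this only rounds correctly for x >= 0.
sub round_half_up_naive { my ($x) = @_; return int(0.5 + $x) }

# Pixel positions for gridlines at 0, 10 and 20 bp in the simplified setting
my ($scale, $offset) = (1, 0);
my @pixels = map { round_half_up_naive(($_ - $offset - 1) * $scale) } (0, 10, 20);
print "@pixels\n";                        # prints 0 9 19

print round_half_up_naive(0.6), "\n";     # prints 1
print round_half_up_naive(-0.6), "\n";    # prints 0 (correct rounding gives -1)
```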

I think that there should be gridlines at pixels 0, 10, and 20.  The 
fact that currently the first interval is 9 pixels and the second is 10 
pixels is breaking my hopeful assumption about the gridlines.

AFAICT my problems are solved if we make two changes:
change the above line from draw_grid to this:
    my $first_tick = 1 + $minor * int(($start - 1)/$minor);
and change the lines from map_pt to this:

    my $val = $flip 
      ? ($pr - ($length - ($_- 1)) * $scale)
      : (($_-$offset-1) * $scale);
    $val = int($val + .5 * ($val <=> 0));

Does this make sense?  If people agree that these changes are right then 
I can also produce a proper patch if y'all would prefer that.

Regards,
Mitch


From lstein at cshl.edu  Wed Feb  7 07:17:22 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Wed, 7 Feb 2007 07:17:22 -0500
Subject: [Bioperl-l] Bio::Graphics::Panel gridlines and pixels
In-Reply-To: <45C9578F.2060802@berkeley.edu>
References: <45C9578F.2060802@berkeley.edu>
Message-ID: <6dce9a0b0702070417k692fedcchfac5b1f0e01f72bd@mail.gmail.com>

Hi Mitch,

Zero is not a forbidden coordinate, since gbrowse also works on genetic maps
which have negative and floating point coordinates. You've simply picked up
a boundary case where the rounding isn't working properly. I will fix this
now.

Lincoln




-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu


From lstein at cshl.edu  Wed Feb  7 07:18:40 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Wed, 7 Feb 2007 07:18:40 -0500
Subject: [Bioperl-l] Bio::Graphics::Panel gridlines and pixels
In-Reply-To: <45C9578F.2060802@berkeley.edu>
References: <45C9578F.2060802@berkeley.edu>
Message-ID: <6dce9a0b0702070418x139b4e5cu396bb629760a860c@mail.gmail.com>

However, I'm also very interested in why grid-drawing takes so long. When
I've profiled drawing, neither grid drawing nor map_pt() consume any
significant amount of time.

Lincoln



From johnsonm at gmail.com  Wed Feb  7 11:50:05 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Wed, 7 Feb 2007 10:50:05 -0600
Subject: [Bioperl-l] Bio::Tools::Glimmer
In-Reply-To: <20F07AFE-BADF-4BA5-83B1-C99EEBE915C2@bioperl.org>
References: <ebf5eb170609190833r5fa138e9gc6c51346e5f7890c@mail.gmail.com>
	<451095C7.3020905@infotech.monash.edu.au>
	<0FBEEE2B-9641-4EFA-A5D2-0DA839556E4C@gmx.net>
	<ebf5eb170609201247r12eb9475v5cb553e0727163c0@mail.gmail.com>
	<ebf5eb170702061553t29b0d1f2q402582abba032031@mail.gmail.com>
	<20F07AFE-BADF-4BA5-83B1-C99EEBE915C2@bioperl.org>
Message-ID: <ebf5eb170702070850k10f1d24fr2d93d61534eeaaff@mail.gmail.com>

    Well, each format has some unique features.  If the user declines to
specify the format, I can figure it out, but it will probably involve
scanning the input file twice.  I'll take a look.
    I can do all the parsing in one function; in fact I have, just to see
how nasty it would end up being.  I just can't stomach having the code that
tightly coupled and hard to read.  In the end it'll probably be three
functions; GlimmerM/HMM are pretty close.  Maybe two, since Glimmer2 and
Glimmer3 aren't *that* different either.



From adsj at novozymes.com  Wed Feb  7 12:11:32 2007
From: adsj at novozymes.com (Adam =?iso-8859-1?Q?Sj=F8gren?=)
Date: Wed, 07 Feb 2007 18:11:32 +0100
Subject: [Bioperl-l] Bioperl 1.4 to 1.5.2,
	adding qualifiers to Bio::Seq-objects
Message-ID: <8764adoptn.fsf@topper.koldfront.dk>

  Hi.


I am transitioning from Bioperl 1.4 to 1.5.2, and the qualifiers I add
to features in Bio::Seq objects have stopped appearing when I output
them as EMBL or GenBank-files.

Below is a test-script that exercises the problem.

I guess I should be doing something else when adding qualifiers, now
with 1.5.2 (as reading an EMBL-file with Bio::SeqIO and outputting it
again of course works perfectly), but I can't deduce what from perldoc
Bio::SeqFeature::Generic - it still lists the add_tag_value method,
and calling it doesn't croak nor warn.

I have found some comments on this in the release notes of 1.5.0 [1] on
the Bioperl wiki, but I must admit I wasn't able to work out which
methods I should be calling instead.

If someone could point me to the relevant documentation or tell me
what method to use instead, I would be happy as a clam.


  Best regards,

    Adam

== =
use Test::More tests=>2;

use strict;
use warnings;

use Bio::Seq;
use Bio::SeqFeature::Generic;
use IO::String;
use Bio::SeqIO;

my $seq=Bio::Seq->new(
                      -seq=>'actgactgactg',
                     );

$seq->display_id('D27');
$seq->accession_number('DB:D27');

my $seq_feature=Bio::SeqFeature::Generic->new(
                                              -strand=>1,
                                              -primary=>'source',
                                             );
$seq_feature->set_attributes(-start=>2, -end=>8);
$seq_feature->add_tag_value(note=>'TEST');
$seq_feature->add_tag_value(db_xref=>'DB:D27');

$seq->add_SeqFeature($seq_feature);

my $raw='';
my $fh=IO::String->new($raw);
my $out=Bio::SeqIO->new(-format=>'EMBL', -fh=>$fh);
$out->write_seq($seq);

ok($raw=~m!/note!, 'Qualifier note found');
ok($raw=~m!/db_xref!, 'Qualifier db_xref found');
== =


[1] <http://www.bioperl.org/wiki/Core_1.4.0_1.5.0_delta>

-- 
                                                          Adam Sjøgren
                                                    adsj at novozymes.com


From cjfields at uiuc.edu  Wed Feb  7 12:50:13 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 7 Feb 2007 11:50:13 -0600
Subject: [Bioperl-l] Bioperl 1.4 to 1.5.2,
	adding qualifiers to Bio::Seq-objects
In-Reply-To: <8764adoptn.fsf@topper.koldfront.dk>
References: <8764adoptn.fsf@topper.koldfront.dk>
Message-ID: <C350729C-3964-4685-A89C-D3E5C24A5114@uiuc.edu>



This works for me using bioperl-live (Mac OS X):

ok 1 - Qualifier note found
ok 2 - Qualifier db_xref found

If I print the string I get:

ID   DB:D27; SV 1; linear; unassigned DNA; STD; UNC; 12 BP.
XX
AC   DB:D27;
XX
XX
FH   Key             Location/Qualifiers
FH
FT   source          2..8
FT                   /db_xref="DB:D27"
FT                   /note="TEST"
XX
SQ   Sequence 12 BP; 3 A; 3 C; 3 G; 3 T; 0 other;
     actgactgac tg                                                            12
//

GenBank also works:

LOCUS       D27                       12 bp    dna     linear   UNK
ACCESSION   DB:D27
FEATURES             Location/Qualifiers
      source          2..8
                      /db_xref="DB:D27"
                      /note="TEST"
BASE COUNT        3 a      3 c      3 g      3 t
ORIGIN
         1 actgactgac tg
//

If you haven't uninstalled 1.4, make sure you aren't running 1.4 or  
mixing the two versions (you can check by using 'perldoc -l  
Bio::Root::Root').
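
perldoc -l reports the file perldoc finds; a complementary check from
inside Perl is %INC, which records the file that require actually loaded.
A minimal sketch: loaded_from is a hypothetical helper, and List::Util
stands in as a core module so the snippet runs without BioPerl installed.

```perl
use strict;
use warnings;

# Report which file Perl actually loaded for a module, using %INC.
# Returns undef if the module cannot be loaded at all.
sub loaded_from {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;   # Foo::Bar -> Foo/Bar.pm
    eval { require $file; 1 } or return;
    return $INC{$file};
}

for my $module ('List::Util', 'Bio::Root::Root') {
    my $path = loaded_from($module);
    print "$module: ", (defined $path ? $path : 'not installed'), "\n";
}
```

If the Bio::Root::Root path points into an old 1.4 tree, that copy is
shadowing the 1.5.2 install.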

chris


From cjfields at uiuc.edu  Wed Feb  7 13:04:33 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 7 Feb 2007 12:04:33 -0600
Subject: [Bioperl-l] Bio::Tools::Glimmer
In-Reply-To: <ebf5eb170702070850k10f1d24fr2d93d61534eeaaff@mail.gmail.com>
References: <ebf5eb170609190833r5fa138e9gc6c51346e5f7890c@mail.gmail.com>
	<451095C7.3020905@infotech.monash.edu.au>
	<0FBEEE2B-9641-4EFA-A5D2-0DA839556E4C@gmx.net>
	<ebf5eb170609201247r12eb9475v5cb553e0727163c0@mail.gmail.com>
	<ebf5eb170702061553t29b0d1f2q402582abba032031@mail.gmail.com>
	<20F07AFE-BADF-4BA5-83B1-C99EEBE915C2@bioperl.org>
	<ebf5eb170702070850k10f1d24fr2d93d61534eeaaff@mail.gmail.com>
Message-ID: <91A3A651-C0D5-495F-941F-05B8AA0DDA60@uiuc.edu>


On Feb 7, 2007, at 10:50 AM, Mark Johnson wrote:

>     Well, each format has some unique features.  If the user declines to
> specify the format, I can figure it out, but it will probably involve
> scanning the input file twice.  I'll take a look.
>     I can do all the parsing in one function, in fact I have, just to see
> how nasty it would end up being.  I just can't stomach having the code that
> tightly coupled and hard to read.  In the end it'll probably be three
> functions.  GlimmerM/HMM are pretty close.  Maybe two, Glimmer2 and
> Glimmer3 aren't *that* different, either.

I don't see a problem with passing off the parse to a defined class  
method either right off or mid-parse.  I'm doing something like this  
with a revamped GenBank parser:

# declare local to module

my %GLIMMER_METHODS = (
     'GlimmerHMM' => '_parsehmm',
     'Glimmer'  => '_parsenormal',
     # ... others if needed
     '_DEFAULT_' => '_parseabnormal'
);

...

Then either preparse part of file using _readline() to determine  
format, or use -format and bypass preparsing:

sub next_thingy {
    ...
    if (!$format) {
        while (my $line = $self->_readline()) {
            if ($line =~ m{(something)}) {
                $format = $1; $self->_pushback($line); last;
            }
        }
    }
    my $method = exists $GLIMMER_METHODS{$format}
               ? $GLIMMER_METHODS{$format}
               : $GLIMMER_METHODS{'_DEFAULT_'}; # fallback to this one

    return $self->$method(); # hand off parsing flow to the proper parser
    ...
}

# all parser variants would have this structure:

sub _parsehmm {
    my $self = shift;
    ... init stuff here
    while (my $line = $self->_readline()) {
        ... do stuff until END of next prediction/report
    }
    ... return data if any
}
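
A self-contained sketch of the same dispatch-table idea that actually
runs. To stay independent of Bio::Root::IO it reads from an in-memory
array instead of _readline()/_pushback(); the package name, the method
names and the format-sniffing patterns are all hypothetical placeholders,
not the real Glimmer output signatures.

```perl
use strict;
use warnings;

package My::GlimmerDispatch;    # hypothetical package

# Map format names to parser methods; '_DEFAULT_' is the fallback
my %GLIMMER_METHODS = (
    'GlimmerHMM' => '_parse_hmm',
    'Glimmer3'   => '_parse_glimmer3',
    '_DEFAULT_'  => '_parse_default',
);

sub new {
    my ($class, %args) = @_;
    return bless { format => $args{-format}, lines => $args{-lines} || [] }, $class;
}

# Guess the format from the first line matching a signature.
# These patterns are placeholders, NOT real Glimmer signatures.
sub _sniff_format {
    my ($self) = @_;
    for my $line (@{ $self->{lines} }) {
        return 'GlimmerHMM' if $line =~ /GlimmerHMM/;
        return 'Glimmer3'   if $line =~ /^>/;
    }
    return '_DEFAULT_';
}

sub next_prediction {
    my ($self) = @_;
    my $format = defined $self->{format} ? $self->{format} : $self->_sniff_format;
    my $method = exists $GLIMMER_METHODS{$format}
               ? $GLIMMER_METHODS{$format}
               : $GLIMMER_METHODS{'_DEFAULT_'};
    return $self->$method();    # hand parsing off to the chosen method
}

# Stub parsers that just report which branch was taken
sub _parse_hmm      { 'hmm' }
sub _parse_glimmer3 { 'glimmer3' }
sub _parse_default  { 'default' }

package main;

my $explicit = My::GlimmerDispatch->new(-format => 'GlimmerHMM');
my $sniffed  = My::GlimmerDispatch->new(-lines => ['>contig1', '  0001 ...']);
my $unknown  = My::GlimmerDispatch->new(-format => 'GlimmerX');
print join(' ', map { $_->next_prediction } $explicit, $sniffed, $unknown), "\n";
# prints: hmm glimmer3 default
```

An explicit -format skips the sniffing entirely, so the double scan Mark
mentions only happens when the user leaves the format unspecified.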

chris


Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From johnston at biochem.ucl.ac.uk  Wed Feb  7 13:56:52 2007
From: johnston at biochem.ucl.ac.uk (Caroline Johnston)
Date: Wed, 7 Feb 2007 18:56:52 +0000 (GMT)
Subject: [Bioperl-l] RNA folding
In-Reply-To: <C7C6DC30-ACBB-4626-A0D7-CC0248AF847E@uiuc.edu>
References: <Pine.LNX.4.58.0702061143420.4383@localhost.localdomain>
	<C7C6DC30-ACBB-4626-A0D7-CC0248AF847E@uiuc.edu>
Message-ID: <Pine.LNX.4.58.0702071535320.30225@localhost.localdomain>

Thanks Chris.

Storing the interaction data as a hash according to an ontology and using
an extended bracket notation as the string representation seems to make
sense, but I'm still unsure how this is supposed to be
attached to the Seq objects. You reckon it should be an AnnotationI?

I'm not sure I understand the distinction between annotations and
features. From the docs I got the impression that Features were like
annotation on bits of sequences and had a reference to the sequence to
which they belong, whereas annotations don't. If that's the case though,
why would RNA structure be an annotation rather than a feature? If not,
what is the distinction between them? Are the positional Annotation
subclasses you're developing intended to replace features? Have I got the
wrong end of the stick entirely?

Cheers,
Cass


On Tue, 6 Feb 2007, Chris Fields wrote:

> Actually, the only RNA tool wrappers I have made are ones for ERPIN,
> RNAMotif, and Infernal (the only one in bioperl-run CVS at this time
> is RNAMotif).  I am planning on writing up wrappers for Vienna,
> UNAFold, and a few others but haven't really started in.  Here's
> where I'm at right now...
>
> I am writing up a new set of AnnotationI classes which positionally
> describe data (Meta) which I hope will help deal with this stuff.
> These would be similar in nature to Heikki's Bio::Seq::Meta classes:
>
> http://bioperl.org/pipermail/bioperl-l/2006-December/024414.html
>
> I would use a regular Bio::SeqI and store the structural data and
> anything else (such as energy calculations, etc) as Annotation
> objects in an AnnotationCollection, and then write up a series of
> SeqIO modules to get data into/out of the designated structure
> formats, like UNAfold ct, RNAML, and so on.  Each sequence would then
> be capable of holding more than one structural Annotation (i.e. could
> represent different folding pathways, alternative RNA folds, and so on).
>
> At this point I represent the data as an array of hashes where $array[0]
> is nt 1 and the hash keys indicate the type of interaction, base
> interacted with, etc.  The text representation would be the simple
> Eddy WUSS (Rfam-like) format by default, which is capable of
> representing some complex data (pseudoknots, for instance), is
> compact, and is documented (via the Infernal manual).  Tags will
> probably switch to more ontologically relevant terms (probably from
> RNAML or RNA Ontology), but in general it is something like this:
>
> [
>   {'interaction' => 'WC',
>     'base'  => 24},
>   {'interaction' => 'WC',
>     'base'  => 23},
>   {'interaction' => 'SS'},
> ...
> ]
>
> In this implementation every seq position would have some kind of
> interaction designation, though that's open for debate as it could
> just be simple text or undef for single-stranded regions.
>
> This is also scalable based on complexity of the data: if one wanted
> to add tert/quaternary interactions, location, base modifications,
> remote sequence interactions, etc., extra key/value pairs could be
> used.  Conversely, if one only wanted sec structure (for drawing RNA
> structures, for example), then only that data would be parsed.
>
> If you (or anyone listening) have any suggestions I would greatly
> appreciate them.
>
> chris
>
>
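To make the proposed per-position layout concrete, here is a core-Perl sketch that turns a plain, pseudoknot-free dot-bracket string into the array of hashes described above (`$pairs->[0]` is nt 1; paired bases get `interaction => 'WC'` plus the 1-based partner in `base`, unpaired bases get `'SS'`). The function name `brackets_to_pairs`, handling only `()`, and labelling every pair 'WC' are simplifying assumptions; full WUSS strings use more bracket types and this does not distinguish non-canonical pairs.

```perl
use strict;
use warnings;

# Convert a simple dot-bracket string such as "((..))" into an array of
# hashes, one per nucleotide ($pairs->[0] describes nt 1).
sub brackets_to_pairs {
    my ($structure) = @_;
    my (@pairs, @open);
    my @chars = split //, $structure;
    for my $i (0 .. $#chars) {
        my $c = $chars[$i];
        if ($c eq '(') {
            push @open, $i;                        # remember the opening position
        }
        elsif ($c eq ')') {
            die "Unbalanced ')' at position ", $i + 1, "\n" unless @open;
            my $j = pop @open;                     # matching '('
            # store 1-based partner positions, as in the proposal
            $pairs[$j] = { interaction => 'WC', base => $i + 1 };
            $pairs[$i] = { interaction => 'WC', base => $j + 1 };
        }
        else {
            $pairs[$i] = { interaction => 'SS' };  # unpaired
        }
    }
    die "Unbalanced '(' in structure\n" if @open;
    return \@pairs;
}

my $pairs = brackets_to_pairs('((..))');
for my $n (1 .. @$pairs) {
    my $p = $pairs->[$n - 1];
    printf "%d %s %s\n", $n, $p->{interaction}, $p->{base} // '-';
}
```

The stack-based matching is the usual way to pair brackets; pseudoknots would need one stack per bracket type.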


From cjfields at uiuc.edu  Wed Feb  7 17:15:44 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 7 Feb 2007 16:15:44 -0600
Subject: [Bioperl-l] RNA folding
In-Reply-To: <Pine.LNX.4.58.0702071535320.30225@localhost.localdomain>
References: <Pine.LNX.4.58.0702061143420.4383@localhost.localdomain>
	<C7C6DC30-ACBB-4626-A0D7-CC0248AF847E@uiuc.edu>
	<Pine.LNX.4.58.0702071535320.30225@localhost.localdomain>
Message-ID: <7360B66F-6AF3-4CB1-8343-0A19E42AD7F8@uiuc.edu>


On Feb 7, 2007, at 12:56 PM, Caroline Johnston wrote:

> Thanks Chris.
>
> Storing the interaction data as a hash according to an ontology and using
> an extended bracket notation as the string representation seems to make
> sense, but I'm still unsure how this is supposed to be
> attached to the Seq objects. You reckon it should be an AnnotationI?

As long as it describes everything in the object and that there is a  
reasonable way of textually representing the data, I think you can  
attach anything as annotation.  A recent example is the addition of  
trees as annotation.  Also, Annotation can be used to describe  
alignments (such as the structure consensus string in Rfam  
alignments), or added to SeqFeatures.  The class just needs to  
implement AnnotatableI.

> I'm not sure I understand the distinction between annotations and
> features. From the docs I got the impression that Features were like
> annotation on bits of sequences and had a reference to the sequence to
> which they belong, whereas annotations don't. If that's the case though,
> why would RNA structure be an annotation rather than a feature? If not,
> what is the distinction between them? Are the positional Annotation
> subclasses you're developing intended to replace features? Have I got the
> wrong end of the stick entirely?
>
> Cheers,
> Cass

The key distinction between seqfeatures and annotations is that  
annotations are normally associated with the entire sequence record,  
while seqfeatures normally describe a part of the sequence (and thus  
have a location on the sequence).  There are a few exceptions, but in
general that's the case.  The HOWTO gives a bit more background:

http://www.bioperl.org/wiki/HOWTO:Feature-Annotation
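A minimal sketch of the distinction, assuming a standard BioPerl install: a comment annotation is attached to the record as a whole (no location), while a Bio::SeqFeature::Generic carries start/end on the sequence. The sequence, the comment text, and the `misc_structure` feature coordinates are made up for illustration.

```perl
use strict;
use warnings;
use Bio::Seq;
use Bio::SeqFeature::Generic;
use Bio::Annotation::Comment;

my $seq = Bio::Seq->new(-display_id => 'demo',
                        -seq        => 'GGGAAACCCUUUGGG',
                        -alphabet   => 'rna');

# Annotation: describes the whole record and has no location
my $note = Bio::Annotation::Comment->new(-text => 'hypothetical structured RNA');
$seq->annotation->add_Annotation('comment', $note);

# SeqFeature: describes part of the sequence and carries a location
my $region = Bio::SeqFeature::Generic->new(
    -start       => 1,
    -end         => 9,
    -strand      => 1,
    -primary_tag => 'misc_structure',
);
$seq->add_SeqFeature($region);

for my $ann ($seq->annotation->get_Annotations('comment')) {
    print "annotation: ", $ann->text, "\n";
}
for my $f ($seq->get_SeqFeatures) {
    printf "feature: %s %d..%d\n", $f->primary_tag, $f->start, $f->end;
}
```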

Using annotations or seqfeatures in a case like this may be  
completely dependent on one's point of view.  For instance, one  
implementation I had considered was adding an interface to Bio::Seq
which would allow Seq objects to also have Bio::Structure objects,
since my view is that any sequence could (optionally) have a
structure associated with it.  However, I reasoned that a sequence
could actually have multiple structures (RNA, ssDNA, and protein can
have several alternative folds or different folding pathways, for
instance).  Instead of splitting up each structure into individual
seqfeatures (each of which would have to be tagged with the
relevant structure and score info), I could have one class encompass
all of that data in a reasonable way.  Hence I used Annotation.

BTW, this isn't meant to replace features in any way.  It would be  
primarily used to describe (1) a sequence as a whole, such as a tRNA  
sequence, (2) a seqfeature, such as a tRNA, rRNA, riboswitch, etc in  
a genome sequence, or (3) a conserved structure in an alignment, such  
as Rfam stockholm output.

I'll add that the option of splitting the data into seqfeatures isn't  
ruled out.  It would be a matter of using a helper method, maybe in  
SeqUtils or directly in Annotation::Meta or whatever I end up calling  
it.  I plan on adding something along those lines at some point.

chris


From mitch_skinner at berkeley.edu  Wed Feb  7 18:26:53 2007
From: mitch_skinner at berkeley.edu (Mitch Skinner)
Date: Wed, 07 Feb 2007 15:26:53 -0800
Subject: [Bioperl-l] Bio::Graphics::Panel gridlines and pixels
In-Reply-To: <6dce9a0b0702070417k692fedcchfac5b1f0e01f72bd@mail.gmail.com>
References: <45C9578F.2060802@berkeley.edu>
	<6dce9a0b0702070417k692fedcchfac5b1f0e01f72bd@mail.gmail.com>
Message-ID: <45CA603D.1070901@berkeley.edu>

Lincoln Stein wrote:
> Zero is not a forbidden coordinate, since gbrowse also works on 
> genetic maps which have negative and floating point coordinates. 
> You've simply picked up a boundary case where the rounding isn't 
> working properly. I will fix this now.
Thanks for the fix.  What do you think of the following case?  This is 
something I actually ran into.  Suppose you have:
the original draw_grid:

    my $first_tick = $minor * int($self->start/$minor);

and my version of map_pt:

    my $val = $flip
      ? ($pr - ($length - ($_- 1)) * $scale)
      : (($_-$offset-1) * $scale);
    $val = int($val + .5 * ($val <=> 0));

and scale=0.5, offset=0, pad_left=0, flip=0, and minor=10.
Our tiles are currently 1000px wide.  So the first gridline will be at 
0bp => -1px and the 200th gridline will be at 2000bp => 1000px.  So the 
first tile will not have a gridline at its 0th pixel but the second 
tile will have one there.  Last night I was thinking that this was an 
artifact of having gridlines start at 0bp but now I'm thinking this is 
just because rounding half-pixels leaves an extra space when crossing 
zero.  Which is not unreasonable; it just invalidates the assumption I 
was hoping to make that the gridlines are the same for each tile.  Maybe 
it's just unreasonable to think that floating point calculations will 
give pixel-exact results.
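The numbers above can be reproduced with a short core-Perl sketch. `map_pt` restates the quoted formula for flip=0 and pad_left=0; `tile_gridlines` is a hypothetical helper that assumes the tile width divided by the scale is a multiple of the minor tick spacing. It shows the gap between the first two gridlines is 6px instead of 5px around 0bp, which is why tile 0 and tile 1 get different columns.

```perl
use strict;
use warnings;

# The map_pt rounding quoted above (flip=0, pad_left=0): bp -> pixel
my ($scale, $offset) = (0.5, 0);

sub map_pt {
    my ($bp) = @_;
    my $val = ($bp - $offset - 1) * $scale;
    return int($val + .5 * ($val <=> 0));    # round half away from zero
}

my ($minor, $tile_width) = (10, 1000);

# Hypothetical helper: tile-relative pixel columns of the first $count
# gridlines in a tile (assumes $tile_width / $scale is a multiple of $minor)
sub tile_gridlines {
    my ($tile, $count) = @_;
    my $first_bp = $tile * $tile_width / $scale;    # gridline-aligned start
    my @cols;
    for my $k (0 .. $count - 1) {
        push @cols, map_pt($first_bp + $k * $minor) - $tile * $tile_width;
    }
    return @cols;
}

print "tile 0: @{[ tile_gridlines(0, 4) ]}\n";    # tile 0: -1 5 10 15
print "tile 1: @{[ tile_gridlines(1, 4) ]}\n";    # tile 1: 0 5 10 15
```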

Or I may just be barking up the wrong tree entirely.  Perhaps it's time 
to reconsider at a higher level (see my next message).

Mitch


From mitch_skinner at berkeley.edu  Wed Feb  7 18:28:11 2007
From: mitch_skinner at berkeley.edu (Mitch Skinner)
Date: Wed, 07 Feb 2007 15:28:11 -0800
Subject: [Bioperl-l] Bio::Graphics::Panel gridlines and pixels
In-Reply-To: <6dce9a0b0702070418x139b4e5cu396bb629760a860c@mail.gmail.com>
References: <45C9578F.2060802@berkeley.edu>
	<6dce9a0b0702070418x139b4e5cu396bb629760a860c@mail.gmail.com>
Message-ID: <45CA608B.80907@berkeley.edu>

Lincoln Stein wrote:
> However, I'm also very interested in why grid-drawing takes so long. 
> When I've profiled drawing, neither grid drawing nor map_pt() consume 
> any significant amount of time.
Well, the approach that we've been taking is to hand 
Bio::Graphics::Panel a fake GD object that stores all of the graphical 
primitives (line, rectangle, filledRectangle, etc. + their parameters) 
and then draws them later in chunks (for each tile, we draw all the 
primitives that overlap its pixel boundaries).  We're doing this because 
trying to create a real GD object that's hundreds of millions of pixels 
wide takes too much RAM.  But storing all the gridlines (for a whole 
chromosome, at a high zoom level) also takes a lot of RAM, and getting 
the gridlines for the current tile and translating their coordinates 
into the coordinate space of the tile also takes a fair amount of CPU.  
The gridline hack I've been experimenting with (that prompted these 
emails) was motivated by the hope that the gridlines were regular enough 
that we wouldn't have to store them explicitly, but just draw the same 
gridlines over and over again.  It runs almost twice as fast as the 
version that explicitly stores the gridlines.
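A rough core-Perl sketch of the "fake GD object" approach described above: record primitives, then replay only those that overlap a tile, shifted into tile-local x coordinates. The class name `RecordingCanvas` and `ops_for_tile` are hypothetical (not part of GD or Bio::Graphics); it only handles primitives whose first four arguments are x1, y1, x2, y2, as for GD::Image's `line`, `rectangle` and `filledRectangle`.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the fake GD object: records drawing calls
# instead of rasterising them.
package RecordingCanvas;

sub new { my $class = shift; return bless { ops => [] }, $class }

sub _record {
    my ($self, $name, @args) = @_;
    push @{ $self->{ops} }, [ $name, @args ];    # (x1, y1, x2, y2, colour)
}
sub line            { my $self = shift; $self->_record('line', @_) }
sub rectangle       { my $self = shift; $self->_record('rectangle', @_) }
sub filledRectangle { my $self = shift; $self->_record('filledRectangle', @_) }

# Primitives whose x-extent overlaps [$x0, $x0 + $width - 1], with x
# translated so the tile starts at 0.
sub ops_for_tile {
    my ($self, $x0, $width) = @_;
    my $x_end = $x0 + $width - 1;
    my @out;
    for my $op (@{ $self->{ops} }) {
        my ($name, $x1, $y1, $x2, $y2, @rest) = @$op;
        my ($lo, $hi) = $x1 < $x2 ? ($x1, $x2) : ($x2, $x1);
        next if $hi < $x0 || $lo > $x_end;
        push @out, [ $name, $x1 - $x0, $y1, $x2 - $x0, $y2, @rest ];
    }
    return @out;
}

package main;

my $canvas = RecordingCanvas->new;
$canvas->line(5, 0, 5, 100, 'gray');                    # tile 0 only
$canvas->filledRectangle(990, 10, 1010, 20, 'blue');    # spans tiles 0 and 1
$canvas->line(1500, 0, 1500, 100, 'gray');              # tile 1 only

print join(' ', @$_), "\n" for $canvas->ops_for_tile(1000, 1000);
```

The linear scan per tile is the CPU cost mentioned above; an interval index over x would cut it, but memory use stays proportional to the number of stored primitives.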

So the main slowdown is not in draw_grid or map_pt, but in our code 
that's storing/retrieving and translating the gridlines.  Which we are 
also looking into speeding up.  But the memory usage is harder to 
reduce; I've experimented with trying to compress the gridline data but 
it seems easier to just have the panel draw the grid directly.

The more I read the Panel code, the more I think it would be nice to 
make more use of it.  One of the reasons that we're trying to fool it 
right now is that there seem to be a number of behaviors in it (and/or 
in the glyphs?) that take the current image boundaries into account 
(drawing an arrow where a feature runs off the edge of the image, 
etc.).  But in our browser each tile is supposed to mesh seamlessly with 
its neighbor, so if there's an easy way to turn off those edge-aware 
behaviors that would be pretty interesting.

Ian has also suggested that it might be better to store less information 
than the full set of graphics primitives.  For example, we could just 
store the Panel's glyph boxes and use their (pixel bound)->feature 
information to decide which features need to be drawn for each tile.
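Ian's alternative might look roughly like this, assuming boxes in the `[$feature, $x1, $y1, $x2, $y2, $track]` layout returned by Bio::Graphics::Panel's `boxes()` method. Plain strings stand in for feature objects so the sketch runs without Bio::Graphics; `features_for_tile` is a hypothetical helper.

```perl
use strict;
use warnings;

# Given glyph boxes [$feature, $x1, $y1, $x2, $y2, ...], return the
# distinct features whose pixel extent overlaps a tile.
sub features_for_tile {
    my ($boxes, $tile_x0, $tile_width) = @_;
    my $tile_x1 = $tile_x0 + $tile_width - 1;
    my (%seen, @features);
    for my $box (@$boxes) {
        my ($feature, $x1, undef, $x2) = @$box;
        next if $x2 < $tile_x0 || $x1 > $tile_x1;    # no overlap
        push @features, $feature unless $seen{$feature}++;
    }
    return @features;
}

my @boxes = (
    [ 'geneA', 10,   5, 400,  15 ],
    [ 'geneB', 950,  5, 1200, 15 ],
    [ 'geneC', 1500, 5, 1800, 15 ],
);
print join(',', features_for_tile(\@boxes, 1000, 1000)), "\n";    # geneB,geneC
```

Each selected feature would then be rendered into that tile only, so nothing but the boxes needs to be kept for the whole chromosome.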

I'm going to be spending some time reading the Bio::Graphics code in 
more depth.  I'd also welcome suggestions from you or anyone on the list.

Thanks,
Mitch


From sdbrown at annular.org  Wed Feb  7 18:41:13 2007
From: sdbrown at annular.org (Steven Brown)
Date: Wed, 7 Feb 2007 15:41:13 -0800
Subject: [Bioperl-l] Bio::Restriction::Analysis cut site problem in 1.5.2
Message-ID: <3828F5B9-BFF8-47F4-8228-3BEA036513A0@annular.org>

The module seems to have trouble handling the cut-site specifiers  
that surround the sequence that the enzyme is specific for.  The error:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Bad end parameter (22). End must be less than the total length of sequence (total=6)
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/local/lib/perl5/site_perl/5.8.6/Bio/Root/Root.pm:328
STACK: Bio::PrimarySeq::subseq /usr/local/lib/perl5/site_perl/5.8.6/Bio/PrimarySeq.pm:371
STACK: Bio::Restriction::Analysis::_enzyme_sites /usr/local/lib/perl5/site_perl/5.8.6/Bio/Restriction/Analysis.pm:884
STACK: Bio::Restriction::Analysis::_cuts /usr/local/lib/perl5/site_perl/5.8.6/Bio/Restriction/Analysis.pm:785
STACK: Bio::Restriction::Analysis::cut /usr/local/lib/perl5/site_perl/5.8.6/Bio/Restriction/Analysis.pm:369
STACK: Bio::Restriction::Analysis::cutters /usr/local/lib/perl5/site_perl/5.8.6/Bio/Restriction/Analysis.pm:678
---snip (my script line)---
-----------------------------------------------------------

The offending enzyme:

---snip---
<1>AcuI
<2>Eco57I,Bsp6II,BspD6II,BspKT5I,Eco112I,Eco125I,FsfI
<3>CTGAAG(16/14)
---snip---

If I get rid of the (16/14) the error disappears and the right  
sequence site is matched.  It seems like maybe a decision was made  
not to analyze enzymes with remote cut positions, or the code wouldn't  
throw the error...?  Any information on this would be helpful.

Thanks,
Steve


From adsj at novozymes.com  Thu Feb  8 03:55:50 2007
From: adsj at novozymes.com (Adam Sjøgren)
Date: Thu, 08 Feb 2007 09:55:50 +0100
Subject: [Bioperl-l] Bioperl 1.4 to 1.5.2,
	adding qualifiers to Bio::Seq-objects
References: <8764adoptn.fsf@topper.koldfront.dk>
	<C350729C-3964-4685-A89C-D3E5C24A5114@uiuc.edu>
Message-ID: <87fy9hqb8p.fsf@topper.koldfront.dk>

On Wed, 7 Feb 2007 11:50:13 -0600, Chris wrote:

> This works for me using bioperl-live (Mac OS X):

> ok 1 - Qualifier note found
> ok 2 - Qualifier db_xref found

*slaps forehead*

Thanks for the test - your diagnosis was spot on:

> If you haven't uninstalled 1.4, make sure you aren't running 1.4 or  
> mixing the two versions (you can check by using 'perldoc -l  
> Bio::Root::Root').

I had a modified version of Bio::Seq and Bio::SeqFeature::Generic in
my @INC (added while writing the patch mentioned here, and promptly
forgotten: <http://article.gmane.org/gmane.comp.lang.perl.bio.general/13349/>).

Removing those and patching 1.5.2 fixed my self-inflicted problem.
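`perldoc -l Bio::Root::Root` reports only the copy that gets loaded. To spot stale or modified copies further down @INC, something like the following core-Perl sketch can list every copy in search order; `module_copies` is a hypothetical helper name and the module list is just an example.

```perl
use strict;
use warnings;
use File::Spec;

# Every copy of a module file along @INC (or the given dirs), in search
# order. Only the first hit is loaded by "use"; later ones are shadowed.
sub module_copies {
    my ($module, @dirs) = @_;
    @dirs = @INC unless @dirs;
    (my $rel = "$module.pm") =~ s{::}{/}g;
    my @parts = split m{/}, $rel;
    return grep { -f $_ }
           map  { File::Spec->catfile($_, @parts) }
           grep { !ref $_ } @dirs;    # @INC may also hold code refs
}

for my $module (qw(Bio::Seq Bio::SeqFeature::Generic Bio::Root::Root)) {
    my @copies = module_copies($module);
    printf "%-26s %d copies\n", $module, scalar @copies;
    print "    $_\n" for @copies;
}
```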


  Thanks again!

     Adam

-- 
                                                          Adam Sjøgren
                                                    adsj at novozymes.com


From heikki at sanbi.ac.za  Thu Feb  8 04:39:47 2007
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Thu, 8 Feb 2007 11:39:47 +0200
Subject: [Bioperl-l] Bio::Restriction::Analysis cut site problem in 1.5.2
In-Reply-To: <3828F5B9-BFF8-47F4-8228-3BEA036513A0@annular.org>
References: <3828F5B9-BFF8-47F4-8228-3BEA036513A0@annular.org>
Message-ID: <200702081139.48125.heikki@sanbi.ac.za>

The error comes from Bio::PrimarySeq::subseq when it tries to cut beyond an 
existing sequence. Maybe your sequence has a restriction site that is near 
the end of your sequence?

This is a special case which has not been taken into account in the
Bio::Restriction::Analysis::_cuts method.

The question is: should the site be detected if its cut site is not within
the studied sequence?
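The question can be made concrete with a small core-Perl sketch. This is not Bio::Restriction::Analysis code: `remote_cuts` is a hypothetical helper that scans the forward strand only and reads `(16/14)` as cuts 16 and 14 nt downstream of the site's last base on the top and bottom strands, which is our reading of the REBASE convention. For a CTGAAG site starting at position 1 the top-strand cut comes out at 22, the same value as the bad end parameter in the exception above.

```perl
use strict;
use warnings;

# Forward-strand hits of a recognition site plus their remote cut
# positions (1-based; a cut "at" n falls between nt n and n+1).
sub remote_cuts {
    my ($seq, $site, $top, $bottom) = @_;
    my @hits;
    my $len = length $seq;
    while ($seq =~ /(?=\Q$site\E)/gi) {             # lookahead allows overlaps
        my $site_end = $-[0] + length $site;        # 1-based last base of site
        my %hit = (
            start      => $-[0] + 1,
            top_cut    => $site_end + $top,
            bottom_cut => $site_end + $bottom,
        );
        # flag sites whose cut would fall past the end of the sequence
        $hit{in_sequence} = $hit{top_cut} < $len && $hit{bottom_cut} < $len ? 1 : 0;
        push @hits, \%hit;
    }
    return @hits;
}

my $seq = 'AAAACTGAAG' . ('N' x 20) . 'CTGAAGTT';
for my $h (remote_cuts($seq, 'CTGAAG', 16, 14)) {
    printf "site at %d: top cut %d, bottom cut %d, %s\n",
        @{$h}{qw(start top_cut bottom_cut)},
        $h->{in_sequence} ? 'inside sequence' : 'beyond sequence end';
}
```

Whether a flagged site should be reported, skipped, or reported without fragments is exactly the design decision raised above.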

Please submit a bugzilla bug, so this gets solved. I probably do not have time 
to tweak the code myself.

	-Heikki


On Thursday 08 February 2007 01:41:13 Steven Brown wrote:
> The module seems to have trouble handling the cut-site specifiers
> that surround the sequence that the enzyme is specific for.  The error:
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: Bad end parameter (22). End must be less than the total length of sequence (total=6)
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /usr/local/lib/perl5/site_perl/5.8.6/Bio/Root/Root.pm:328
> STACK: Bio::PrimarySeq::subseq /usr/local/lib/perl5/site_perl/5.8.6/Bio/PrimarySeq.pm:371
> STACK: Bio::Restriction::Analysis::_enzyme_sites /usr/local/lib/perl5/site_perl/5.8.6/Bio/Restriction/Analysis.pm:884
> STACK: Bio::Restriction::Analysis::_cuts /usr/local/lib/perl5/site_perl/5.8.6/Bio/Restriction/Analysis.pm:785
> STACK: Bio::Restriction::Analysis::cut /usr/local/lib/perl5/site_perl/5.8.6/Bio/Restriction/Analysis.pm:369
> STACK: Bio::Restriction::Analysis::cutters /usr/local/lib/perl5/site_perl/5.8.6/Bio/Restriction/Analysis.pm:678
> ---snip (my script line)---
> -----------------------------------------------------------
>
> The offending enzyme:
>
> ---snip---
> <1>AcuI
> <2>Eco57I,Bsp6II,BspD6II,BspKT5I,Eco112I,Eco125I,FsfI
> <3>CTGAAG(16/14)
> ---snip---
>
> If I get rid of the (16/14) the error disappears and the right
> sequence site is matched.  It seems like maybe a decision was made
> not to analyze enzymes with remote cut positions, or the code wouldn't
> throw the error...?  Any information on this would be helpful.
>
> Thanks,
> Steve


-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From cjfields at uiuc.edu  Thu Feb  8 09:20:26 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 8 Feb 2007 08:20:26 -0600
Subject: [Bioperl-l] BLASTXML changes (good this time!)
Message-ID: <EB130F09-5670-4F3E-B481-60A700AEA38F@uiuc.edu>

All,

BLAST XML parsing should now work for any CPAN-based XML::SAX parser!

XML::SAX::PurePerl (comes with XML::SAX, the slowest)
XML::SAX::Expat
XML::SAX::ExpatXS (the fastest)
XML::LibXML::SAX
XML::LibXML::SAX::Parser

Grant McLean has updated XML::SAX on CPAN to fix an XML::SAX::PurePerl
bug, so using that parser will necessitate an XML::SAX upgrade.  I
had also found a bug in the SAX handler which chopped off a large
chunk of the hit descriptions; that is now fixed in CVS.

If Sendu is out there, I think we can safely remove any dependencies  
beyond XML::SAX 0.15 for the next release.  Should I go ahead and  
modify Build.PL?

chris


From lstein at cshl.edu  Thu Feb  8 10:51:49 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Thu, 8 Feb 2007 10:51:49 -0500
Subject: [Bioperl-l] Bio::Graphics::Panel gridlines and pixels
In-Reply-To: <45CA608B.80907@berkeley.edu>
References: <45C9578F.2060802@berkeley.edu>
	<6dce9a0b0702070418x139b4e5cu396bb629760a860c@mail.gmail.com>
	<45CA608B.80907@berkeley.edu>
Message-ID: <6dce9a0b0702080751m210e4d44k3e5c38bfdd3ee9ea@mail.gmail.com>

Hi,

I like the approach you're taking (creating a fake GD object that stores the
graphics primitives). Perhaps the best thing to do is to subclass Panel
itself so that it doesn't draw the gridlines (or turn gridlines off
completely). Then you can draw gridlines after the fact in each tile as
needed.
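One way to draw gridlines "after the fact in each tile" without storing anything chromosome-wide is to compute each tile's columns on demand. This core-Perl sketch is a hypothetical helper, not Bio::Graphics code: it puts the tile's first base at pixel 0 and uses floor instead of Panel's round-half-away-from-zero, so every tile gets identical columns, possibly one pixel off from where Panel itself would draw. Each column could then be drawn with GD::Image's `line` method.

```perl
use strict;
use warnings;
use POSIX qw(ceil floor);

# Hypothetical helper: tile-local x columns for gridlines every $minor bp,
# for a tile starting at base tile_start_bp and showing bp_per_px bases
# per pixel. Each tile is computed independently.
sub tile_grid_columns {
    my (%a) = @_;
    my $tile_end_bp = $a{tile_start_bp} + $a{width} * $a{bp_per_px};    # exclusive
    my @cols;
    for (my $bp = $a{minor} * ceil($a{tile_start_bp} / $a{minor});
         $bp < $tile_end_bp;
         $bp += $a{minor}) {
        push @cols, floor(($bp - $a{tile_start_bp}) / $a{bp_per_px});
    }
    return @cols;
}

for my $tile (0, 1) {
    my @cols = tile_grid_columns(tile_start_bp => $tile * 2000, width => 1000,
                                 bp_per_px => 2, minor => 10);
    printf "tile %d: %d gridlines, first at %s\n",
        $tile, scalar @cols, join(' ', @cols[0 .. 3]);
}
```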

Lincoln

On 2/7/07, Mitch Skinner <mitch_skinner at berkeley.edu> wrote:
>
> Lincoln Stein wrote:
> > However, I'm also very interested in why grid-drawing takes so long.
> > When I've profiled drawing, neither grid drawing nor map_pt() consume
> > any significant amount of time.
> Well, the approach that we've been taking is to hand
> Bio::Graphics::Panel a fake GD object that stores all of the graphical
> primitives (line, rectangle, filledRectangle, etc. + their parameters)
> and then draws them later in chunks (for each tile, we draw all the
> primitives that overlap its pixel boundaries).  We're doing this because
> trying to create a real GD object that's hundreds of millions of pixels
> wide takes too much RAM.  But storing all the gridlines (for a whole
> chromosome, at a high zoom level) also takes a lot of RAM, and getting
> the gridlines for the current tile and translating their coordinates
> into the coordinate space of the tile also takes a fair amount of CPU.
> The gridline hack I've been experimenting with (that prompted these
> emails) was motivated by the hope that the gridlines were regular enough
> that we wouldn't have to store them explicitly, but just draw the same
> gridlines over and over again.  It runs almost twice as fast as the
> version that explicitly stores the gridlines.
>
> So the main slowdown is not in draw_grid or map_pt, but in our code
> that's storing/retrieving and translating the gridlines.  Which we are
> also looking into speeding up.  But the memory usage is harder to
> reduce; I've experimented with trying to compress the gridline data but
> it seems easier to just have the panel draw the grid directly.
>
> The more I read the Panel code, the more I think it would be nice to
> make more use of it.  One of the reasons that we're trying to fool it
> right now is that there seem to be a number of behaviors in it (and/or
> in the glyphs?) that take the current image boundaries into account
> (drawing an arrow where a feature runs off the edge of the image,
> etc.).  But in our browser each tile is supposed to mesh seamlessly with
> its neighbor, so if there's an easy way to turn off those edge-aware
> behaviors that would be pretty interesting.
>
> Ian has also suggested that it might be better to store less information
> than the full set of graphics primitives.  For example, we could just
> store the Panel's glyph boxes and use their (pixel bound)->feature
> information to decide which features need to be drawn for each tile.
>
> I'm going to be spending some time reading the Bio::Graphics code in
> more depth.  I'd also welcome suggestions from you or anyone on the list.
>
> Thanks,
> Mitch
>


-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu


From Kevin.M.Brown at asu.edu  Thu Feb  8 10:28:30 2007
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Thu, 8 Feb 2007 08:28:30 -0700
Subject: [Bioperl-l] Bio::Graphics::Panel gridlines and pixels
References: <45C9578F.2060802@berkeley.edu><6dce9a0b0702070418x139b4e5cu396bb629760a860c@mail.gmail.com>
	<45CA608B.80907@berkeley.edu>
Message-ID: <1A4207F8295607498283FE9E93B775B402AAC1D0@EX02.asurite.ad.asu.edu>

> The more I read the Panel code, the more I think it would be 
> nice to make more use of it.  One of the reasons that we're 
> trying to fool it right now is that there seem to be a number 
> of behaviors in it (and/or in the glyphs?) that take the 
> current image boundaries into account (drawing an arrow where 
> a feature runs off the edge of the image, etc.).  But in our 
> browser each tile is supposed to mesh seamlessly with its 
> neighbor, so if there's an easy way to turn off those 
> edge-aware behaviors that would be pretty interesting.

I think the glyphs try to deal with edges because if they didn't, then
they would flow out into whatever right or left padding had been placed
around the image when the panel was created.  Something I've noticed is
that when I create tiles for the chromosomes I'm working on, the panels
don't line up because the bump position in one panel is not accounted
for when the next panel is drawn.


From sheris at eps.berkeley.edu  Thu Feb  8 12:42:27 2007
From: sheris at eps.berkeley.edu (Sheri Simmons)
Date: Thu, 08 Feb 2007 09:42:27 -0800
Subject: [Bioperl-l] bioperl newbie needs help with extracting cds info
Message-ID: <7.0.1.0.2.20070208093840.0202da90@eps.berkeley.edu>

Hi,
I'm a newbie to BioPerl so apologies if this is a very basic 
question. I am trying to parse GenBank files with the goal of 
creating concatenated gene lists in nucleic acid or amino acid 
format. It is working fine, except for one thing: I need to create 
gene labels incorporating information on whether the gene is on the 
complementary strand or not ("complement" in the CDS tag). How can I 
get Bioperl to tell me whether the CDS tag value includes the word 
"complement"?

Thanks
Sheri


From george.heller at yahoo.com  Thu Feb  8 13:54:41 2007
From: george.heller at yahoo.com (George Heller)
Date: Thu, 8 Feb 2007 10:54:41 -0800 (PST)
Subject: [Bioperl-l] Perl script to extract from ncbi
Message-ID: <178139.85769.qm@web56506.mail.re3.yahoo.com>

Hi all, 
   
  I have a question regarding extracting data from NCBI. I have a database to store the sequence data, but the files I have loaded into it don't have a proper description line specified. Based on the accession number, I need to find out the genus and species name (organism name) from NCBI. 
   
  I have about 1500 records for which I need to extract the names from NCBI. 
   
  Any ideas of how I can go about writing a perl script for extracting this information from NCBI?
   
  Thanks!
  George.

 


From Kevin.M.Brown at asu.edu  Thu Feb  8 14:11:50 2007
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Thu, 8 Feb 2007 12:11:50 -0700
Subject: [Bioperl-l] bioperl newbie needs help with extracting cds info
References: <7.0.1.0.2.20070208093840.0202da90@eps.berkeley.edu>
Message-ID: <1A4207F8295607498283FE9E93B775B402AAC29A@EX02.asurite.ad.asu.edu>

When you extract the features, just call the strand method on each
returned feature: it returns -1 for a feature on the complementary
strand and 1 for one on the forward strand.

my @features = $seq->all_SeqFeatures;
# keep only the CDS features and print their strand
for my $f (@features)
{
	my $tag = $f->primary_tag;
	if ($tag eq 'CDS')
	{
		print $f->strand ."\n";
	}
}

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> Sheri Simmons
> Sent: Thursday, February 08, 2007 10:42 AM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] bioperl newbie needs help with 
> extracting cds info
> 
> Hi,
> I'm a newbie to BioPerl so apologies if this is a very basic 
> question. I am trying to parse GenBank files with the goal of 
> creating concatenated gene lists in nucleic acid or amino 
> acid format. It is working fine, except for one thing: I need 
> to create gene labels incorporating information on whether 
> the gene is on the complementary strand or not ("complement" 
> in the CDS tag). How can I get Bioperl to tell me whether the 
> CDS tag value includes the word "complement"?
> 
> Thanks
> Sheri
> 


From barry.moore at genetics.utah.edu  Thu Feb  8 14:35:03 2007
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Thu, 8 Feb 2007 12:35:03 -0700
Subject: [Bioperl-l] bioperl newbie needs help with extracting cds info
In-Reply-To: <7.0.1.0.2.20070208093840.0202da90@eps.berkeley.edu>
References: <7.0.1.0.2.20070208093840.0202da90@eps.berkeley.edu>
Message-ID: <E6200600-30F2-4471-9107-29A355F543F9@genetics.utah.edu>

Sheri-

The Bio::SeqFeature::Generic object has a 'strand' method, so you can  
just call strand on the CDS (or any other) feature like this.

   my @features = grep { $_->primary_tag eq 'CDS' } $seq->get_SeqFeatures();
   for my $feature (@features) {
	  my $strand = $feature->strand;   # -1 if the CDS is on the complementary strand
   }
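Putting the answers together for the labelling goal in the question, here is a sketch assuming a standard BioPerl install. `cds_labels` and the label format are hypothetical choices; a tiny in-memory record replaces a GenBank file so the example runs on its own.

```perl
use strict;
use warnings;
use Bio::Seq;
use Bio::SeqFeature::Generic;

# One label per CDS, marking features whose location is complement(...);
# BioPerl reports those with strand -1.
sub cds_labels {
    my ($seq) = @_;
    my @labels;
    for my $cds (grep { $_->primary_tag eq 'CDS' } $seq->get_SeqFeatures) {
        my ($name) = $cds->has_tag('gene') ? $cds->get_tag_values('gene')
                                           : ('unnamed');
        my $suffix = $cds->strand == -1 ? '_complement' : '';
        push @labels, sprintf '%s_%d-%d%s', $name, $cds->start, $cds->end, $suffix;
    }
    return @labels;
}

# With a real file, call cds_labels on each record from
# Bio::SeqIO->new(-file => $file, -format => 'genbank')->next_seq instead.
my $seq = Bio::Seq->new(-display_id => 'demo', -seq => 'ATG' x 20);
$seq->add_SeqFeature(Bio::SeqFeature::Generic->new(
    -start => 1, -end => 30, -strand => 1, -primary_tag => 'CDS',
    -tag => { gene => 'abcA' }));
$seq->add_SeqFeature(Bio::SeqFeature::Generic->new(
    -start => 31, -end => 60, -strand => -1, -primary_tag => 'CDS',
    -tag => { gene => 'abcB' }));

print "$_\n" for cds_labels($seq);
# abcA_1-30
# abcB_31-60_complement
```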

Barry

On Feb 8, 2007, at 10:42 AM, Sheri Simmons wrote:

> Hi,
> I'm a newbie to BioPerl so apologies if this is a very basic
> question. I am trying to parse GenBank files with the goal of
> creating concatenated gene lists in nucleic acid or amino acid
> format. It is working fine, except for one thing: I need to create
> gene labels incorporating information on whether the gene is on the
> complementary strand or not ("complement" in the CDS tag). How can I
> get Bioperl to tell me whether the CDS tag value includes the word
> "complement"?
>
> Thanks
> Sheri


From torsten.seemann at infotech.monash.edu.au  Thu Feb  8 23:18:33 2007
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Fri, 9 Feb 2007 15:18:33 +1100
Subject: [Bioperl-l] BLASTXML changes (good this time!)
In-Reply-To: <EB130F09-5670-4F3E-B481-60A700AEA38F@uiuc.edu>
References: <EB130F09-5670-4F3E-B481-60A700AEA38F@uiuc.edu>
Message-ID: <a79f6a4b0702082018n564073fcyb1b886b8fe185b90@mail.gmail.com>

Chris,

> BLAST XML parsing should now work for any CPAN-based XML::SAX parser!
> XML::SAX::PurePerl (comes with XML::SAX, the slowest)
> XML::SAX::Expat
> XML::SAX::ExpatXS (the fastest)
> XML::LibXML::SAX
> XML::LibXML::SAX::Parser

That's excellent news - thanks for all the work you have put in on
this one. I'm impressed.

This is a good opportunity to encourage people who use Bio::SearchIO
for BLAST parsing to switch to 'blastxml' format over 'blast'.
Although the latter is more human readable, it perennially requires
parser source changes to cope with the variations and new formatting
introduced with each new NCBI BLAST release. Best to use "-m 7" XML
format, and convert as appropriate using one of the
Bio::SearchIO::Writer:: classes.
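A minimal sketch of that workflow, assuming BioPerl and XML::SAX are installed: parse an "-m 7" report with the `blastxml` format and write a tab-delimited hit table through Bio::SearchIO::Writer::HitTableWriter. Choosing a backend via `$XML::SAX::ParserPackage` assumes the blastxml module obtains its parser through XML::SAX::ParserFactory; `blastxml_to_table` and the file names are hypothetical.

```perl
use strict;
use warnings;
use Bio::SearchIO;
use Bio::SearchIO::Writer::HitTableWriter;

# Convert a BLAST XML report to a hit table; returns the number of
# Results written (one per query/iteration in the XML).
sub blastxml_to_table {
    my ($xml_file, $table_file, $sax_backend) = @_;

    # optionally pick one of the XML::SAX backends, e.g. XML::SAX::ExpatXS
    local $XML::SAX::ParserPackage = $XML::SAX::ParserPackage;
    $XML::SAX::ParserPackage = $sax_backend if $sax_backend;

    my $in     = Bio::SearchIO->new(-format => 'blastxml', -file => $xml_file);
    my $writer = Bio::SearchIO::Writer::HitTableWriter->new();
    my $out    = Bio::SearchIO->new(-writer => $writer, -file => ">$table_file");

    my $n = 0;
    while (my $result = $in->next_result) {
        $out->write_result($result);
        $n++;
    }
    return $n;
}
```

Usage would be something like `blastxml_to_table('report.xml', 'report.tab', 'XML::SAX::ExpatXS');`.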

--Torsten


From cjfields at uiuc.edu  Fri Feb  9 08:58:24 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 9 Feb 2007 07:58:24 -0600
Subject: [Bioperl-l] BLASTXML changes (good this time!)
In-Reply-To: <a79f6a4b0702082018n564073fcyb1b886b8fe185b90@mail.gmail.com>
References: <EB130F09-5670-4F3E-B481-60A700AEA38F@uiuc.edu>
	<a79f6a4b0702082018n564073fcyb1b886b8fe185b90@mail.gmail.com>
Message-ID: <4FC966A7-7A03-41D9-ABF7-6ACD888720FB@uiuc.edu>

On Feb 8, 2007, at 10:18 PM, Torsten Seemann wrote:

> Chris,
>
>> BLAST XML parsing should now work for any CPAN-based XML::SAX parser!
>> XML::SAX::PurePerl (comes with XML::SAX, the slowest)
>> XML::SAX::Expat
>> XML::SAX::ExpatXS (the fastest)
>> XML::LibXML::SAX
>> XML::LibXML::SAX::Parser
>
> That's excellent news - thanks for all the work you have put in on
> this one. I'm impressed.

Jason did most of the hard work; I just tinkered with it until it  
worked (and pestered a few perl XML guys along the way).  Thanks  
Grant and Bj?rn!

> This is a good opportunity to encourage people who use Bio::SearchIO
> for BLAST parsing to switch to 'blastxml' format over 'blast'.
> Although the latter is more human readable, it perennially requires
> parser source changes to cope with the variations and new formatting
> introduced with each new NCBI BLAST release. Best to use "-m 7" XML
> format, and convert as appropriate using one of the
> Bio::SearchIO::Writer:: classes.
>
> --Torsten

I'll try getting some benchmarks for the different parsers up today  
on the wiki if I have time.

Strangely enough, NCBI changed a few things about BLAST XML a few
releases back without mentioning it to anyone (it caused a silent bug in
BLAST XML parsing, which I fixed recently).  If you sent in multiple
queries to older versions of BLAST you would get all of the BLAST XML
reports concatenated together, which required pre-parsing the reports
to carve up the XML prior to parsing.  Now they treat it like PSI-BLAST,
where multiple queries = multiple iterations, so you get one
long XML BLAST report where each iteration is a Result.

The current parser should handle both as it just caches the other  
results and returns them one at a time prior to new parses, but I  
wouldn't recommend parsing a huge BLAST XML file with hundreds of  
queries as you'll quickly run out of memory!

If they get Perl SAX2 up to date with Expat they'll eventually add  
parse_chunk() and pause_parse() for each parser.  Until then...

chris

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cuiw at ncbi.nlm.nih.gov  Fri Feb  9 09:20:10 2007
From: cuiw at ncbi.nlm.nih.gov (Cui, Wenwu (NIH/NLM/NCBI) [C])
Date: Fri, 9 Feb 2007 09:20:10 -0500
Subject: [Bioperl-l] Perl script to extract from ncbi
In-Reply-To: <178139.85769.qm@web56506.mail.re3.yahoo.com>
References: <178139.85769.qm@web56506.mail.re3.yahoo.com>
Message-ID: <18C407FD4FFB424292D769FBD68C1987020BBC58@NIHCESMLBX8.nih.gov>

This is an example for fetching two GenBank records
(id=124504630,110665734) in XML format. Organism names like
'<GBSeq_organism>Rattus norvegicus</GBSeq_organism>' can be parsed from
the XML. 

 
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=124504630,110665734&retmode=xml&rettype=gb
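For the 1500-record case, a minimal core-Perl sketch of this approach: build the EFetch URL for a batch of IDs, then pull the organism names out of the returned XML. It assumes only the URL pattern shown above; the helper names and the batch size of 200 are arbitrary illustrative choices, and the actual network fetch (e.g. with LWP::Simple's get, as in the earlier one-liner) is left commented out.

```perl
use strict;
use warnings;

# Build an EFetch URL for a batch of nucleotide GIs, following the
# URL pattern above (db=nucleotide, retmode=xml, rettype=gb).
sub efetch_url {
    my @ids = @_;
    return 'http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi'
         . '?db=nucleotide&id=' . join(',', @ids)
         . '&retmode=xml&rettype=gb';
}

# Pull every <GBSeq_organism> value out of the XML, in record order.
# A regex is enough for this one flat element; use a real XML parser for more.
sub organisms_from_gbseq {
    my ($xml) = @_;
    return $xml =~ m{<GBSeq_organism>([^<]*)</GBSeq_organism>}g;
}

# Split a long ID list into fixed-size batches (one request per batch).
sub batches {
    my ($size, @ids) = @_;
    my @out;
    push @out, [ splice @ids, 0, $size ] while @ids;
    return @out;
}

for my $batch (batches(200, 124504630, 110665734)) {
    print efetch_url(@$batch), "\n";
    # my $xml = LWP::Simple::get(efetch_url(@$batch));   # network call, not run here
    # print "$_\n" for organisms_from_gbseq($xml);
}
```

Batching keeps the number of requests small (about eight for 1500 IDs at 200 per batch) instead of one request per record.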

 
Or you can get TaxIds and translate them into real names:

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=nucleotide&id=124504630,110665734&retmode=xml

 
Wenwu Cui, PhD

 
-----Original Message-----
From: George Heller [mailto:george.heller at yahoo.com] 
Sent: Thursday, February 08, 2007 1:55 PM
To: bioperl-l at lists.open-bio.org
Subject: [Bioperl-l] Perl script to extract from ncbi

 
Hi all, 

   
  I have a question regarding extracting data from NCBI. I have a
database to store the sequence data, but the files I have loaded into
it don't have a proper description line specified. Based on the
accession number, I need to find out the genus and species name
(organism name) from NCBI. 

   
  I have about 1500 records for which I need to extract the names from
ncbi. 

   
  Any ideas of how I can go about writing a perl script for extracting
this information from ncbi?

   
  Thanks!

  George.

 
_______________________________________________

Bioperl-l mailing list

Bioperl-l at lists.open-bio.org

http://lists.open-bio.org/mailman/listinfo/bioperl-l


From bosborne11 at verizon.net  Fri Feb  9 12:51:19 2007
From: bosborne11 at verizon.net (Brian Osborne)
Date: Fri, 09 Feb 2007 12:51:19 -0500
Subject: [Bioperl-l] Perl script to extract from ncbi
In-Reply-To: <178139.85769.qm@web56506.mail.re3.yahoo.com>
Message-ID: <C1F21EC7.CBAA%bosborne11@verizon.net>

George,

http://www.bioperl.org/wiki/HOWTO:Beginners#Retrieving_a_sequence_from_a_database

Brian O.


On 2/8/07 1:54 PM, "George Heller" <george.heller at yahoo.com> wrote:

> Hi all, 
>    
>   I have a question regarding extracting data from Ncbi. I have a database to
> store the sequence data, but the files I have loaded into it, dont have a
> proper description line specified. Based on the accession number, I need to
> find out what is the genus and species name (organism name) from ncbi.
>    
>   I have about 1500 records for which I need to extract the names from ncbi.
>    
>   Any ideas of how I can go about writing a perl script for extracting this
> information from ncbi?
>    
>   Thanks!
>   George.
> 


From johnston at biochem.ucl.ac.uk  Fri Feb  9 14:23:41 2007
From: johnston at biochem.ucl.ac.uk (Caroline Johnston)
Date: Fri, 9 Feb 2007 19:23:41 +0000 (GMT)
Subject: [Bioperl-l] WrapperBase
Message-ID: <Pine.LNX.4.58.0702091905300.3237@localhost.localdomain>

Hi,

Could WrapperBase::executable warn you if it doesn't find the exe in
program_path? At the moment it just silently goes ahead and uses one in
the system path if it exists.

Cass.

I've never used diff, so not sure if this is right, but:

305,308c305,314
<        if( $prog_path && -e $prog_path && -x $prog_path ) {
<            $self->{'_pathtoexe'} = $prog_path;
<        } else {
<            my $exe;
---
>        if($prog_path){
>        if(-e $prog_path && -x $prog_path){
>          $self->{'_pathtoexe'} = $prog_path;
>        }
>        else{
>          $self->warn("executable not found in $prog_path, trying system path...") if $warn;
>        }
>        }
>        unless ($self->{'_pathtoexe'}){
>        my $exe;
335a342
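A core-Perl sketch of the lookup order the patch proposes, written as a standalone function (locate_executable is a hypothetical name; this is not WrapperBase's actual code): use the explicit program path if it exists and is executable, warn only when such a path was given but cannot be used, and otherwise fall back to searching $ENV{PATH}.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper illustrating the patched behaviour:
# 1. an explicit program path wins if it exists and is executable;
# 2. if it was given but unusable, warn (when asked to) before falling back;
# 3. otherwise search each directory in $ENV{PATH}.
sub locate_executable {
    my ($name, $prog_path, $warn) = @_;
    if (defined $prog_path && length $prog_path) {
        return $prog_path if -e $prog_path && -x _;    # reuse the stat from -e
        warn "executable not found in $prog_path, trying system path...\n" if $warn;
    }
    for my $dir (File::Spec->path) {
        my $candidate = File::Spec->catfile($dir, $name);
        return $candidate if -f $candidate && -x _;
    }
    return;    # not found anywhere
}
```

No warning is issued when no program path is set, so wrappers for programs that simply live on the system path stay quiet.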


From bix at sendu.me.uk  Fri Feb  9 17:38:59 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 09 Feb 2007 22:38:59 +0000
Subject: [Bioperl-l] WrapperBase
In-Reply-To: <Pine.LNX.4.58.0702091905300.3237@localhost.localdomain>
References: <Pine.LNX.4.58.0702091905300.3237@localhost.localdomain>
Message-ID: <45CCF803.9030004@sendu.me.uk>

Caroline Johnston wrote:
> Hi,
> 
> Could WrapperBase::executable warn you if it doesn't find the exe in
> program_path? At the moment it just silently goes ahead and uses one in
> the system path if it exists.

No, I think not. That would be very annoying when using wrappers for 
programs that you just have in your system path.

What specific problem are you encountering with the current behaviour?


From bix at sendu.me.uk  Fri Feb  9 17:40:33 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 09 Feb 2007 22:40:33 +0000
Subject: [Bioperl-l] BLASTXML changes (good this time!)
In-Reply-To: <EB130F09-5670-4F3E-B481-60A700AEA38F@uiuc.edu>
References: <EB130F09-5670-4F3E-B481-60A700AEA38F@uiuc.edu>
Message-ID: <45CCF861.8030000@sendu.me.uk>

Chris Fields wrote:
> If Sendu is out there, I think we can safely remove any dependencies  
> beyond XML::SAX 0.15 for the next release.  Should I go ahead and  
> modify Build.PL?

Sure, good to hear.


From cjfields at uiuc.edu  Fri Feb  9 22:42:24 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 9 Feb 2007 21:42:24 -0600
Subject: [Bioperl-l] BLASTXML changes (good this time!)
In-Reply-To: <45CCF861.8030000@sendu.me.uk>
References: <EB130F09-5670-4F3E-B481-60A700AEA38F@uiuc.edu>
	<45CCF861.8030000@sendu.me.uk>
Message-ID: <DF989651-7CC0-45A0-BA5E-EEEE88FDD40E@uiuc.edu>


On Feb 9, 2007, at 4:40 PM, Sendu Bala wrote:

> Chris Fields wrote:
>> If Sendu is out there, I think we can safely remove any dependencies
>> beyond XML::SAX 0.15 for the next release.  Should I go ahead and
>> modify Build.PL?
>
> Sure, good to hear.

I added a version dependency for XML::SAX (v. 0.15) for the PurePerl  
fix.  That likely obviates the need for a Bundle for XML::Simple.   
Not too pressing; we can determine that before the next release.

chris


From johnston at biochem.ucl.ac.uk  Sat Feb 10 11:27:53 2007
From: johnston at biochem.ucl.ac.uk (Caroline Johnston)
Date: Sat, 10 Feb 2007 16:27:53 +0000 (GMT)
Subject: [Bioperl-l] WrapperBase
In-Reply-To: <45CCF803.9030004@sendu.me.uk>
References: <Pine.LNX.4.58.0702091905300.3237@localhost.localdomain>
	<45CCF803.9030004@sendu.me.uk>
Message-ID: <Pine.LNX.4.58.0702101558180.16119@localhost.localdomain>

> No, I think not. That would be very annoying when using wrappers for
> programs that you just have in your system path.
>

Hmm, maybe I misunderstood what the program_path was for? The executable
method goes straight to the system path unless program_path is set, so I
assumed you would only set program_path if you specifically wanted it to
look somewhere else. You wouldn't get a warning if you didn't specify a
program_path and just left it to look in the system path.

> What specific problem are you encountering with the current behaviour?

One version of the executable was in /usr/local, and another version, which
I wanted to use, was in my home directory.
The program_path method gets a path from an environment variable, which
was set to ~/.
I didn't realise I had the wrong permissions on my copy of the
executable, though, so it was silently failing to use my version and using
the one in /usr/local instead.


Cass


From george.heller at yahoo.com  Sat Feb 10 15:35:18 2007
From: george.heller at yahoo.com (George Heller)
Date: Sat, 10 Feb 2007 12:35:18 -0800 (PST)
Subject: [Bioperl-l] Error while parsing
Message-ID: <162150.76282.qm@web56511.mail.re3.yahoo.com>

Hi all,
   
  I am in the process of parsing a few files, actually blast results, but happen to get the following error:
   
  ------------- EXCEPTION  -------------
MSG: Can't get HSPs: data not collected.
STACK Bio::Search::Hit::GenericHit::hsp /usr/lib/perl5/site_perl/5.8.5/Bio/Search/Hit/GenericHit.pm:649
STACK toplevel parser.pl:31
  --------------------------------------

  I am not sure if this is a bug, or is there something I am doing wrong. Any pointers are appreciated. 
   
  Thanks!
  George.



From cjfields at uiuc.edu  Sat Feb 10 17:56:19 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 10 Feb 2007 16:56:19 -0600
Subject: [Bioperl-l] Error while parsing
In-Reply-To: <162150.76282.qm@web56511.mail.re3.yahoo.com>
References: <162150.76282.qm@web56511.mail.re3.yahoo.com>
Message-ID: <AF76B50B-AF7A-41D1-A64E-D993D8CC4C86@uiuc.edu>

On Feb 10, 2007, at 2:35 PM, George Heller wrote:

> Hi all,
>
>   I am in the process of parsing a few files, actually blast  
> results, but happen to get the following error:
>
>   ------------- EXCEPTION  -------------
> MSG: Can't get HSPs: data not collected.
> STACK Bio::Search::Hit::GenericHit::hsp /usr/lib/perl5/site_perl/ 
> 5.8.5/Bio/Search/Hit/GenericHit.pm:649
> STACK toplevel parser.pl:31
>   --------------------------------------
>
>   I am not sure if this is a bug, or is there something I am doing  
> wrong. Any pointers are appreciated.
>
>   Thanks!
>   George.

We'll need more to go on than that.  If the bioperl version is  
v1.5.2, please file a bug via the bioperl bugzilla:

http://bugzilla.open-bio.org/

Don't forget to attach a test file which triggers the bug using the  
'Create a new attachment' link after the report has been filed.

chris


From sac at bioperl.org  Sat Feb 10 22:56:10 2007
From: sac at bioperl.org (Steve Chervitz)
Date: Sat, 10 Feb 2007 19:56:10 -0800
Subject: [Bioperl-l] Error while parsing
In-Reply-To: <162150.76282.qm@web56511.mail.re3.yahoo.com>
References: <162150.76282.qm@web56511.mail.re3.yahoo.com>
Message-ID: <8f200b4c0702101956h53fea96dm241126c680d64ab4@mail.gmail.com>

Your report may be lacking HSP alignments for the hit you are attempting to
process. Note that by default, blast will report twice as many one-line
descriptions as it will alignments:

  -v  Number of database sequences to show one-line descriptions for (V) [Integer]
    default = 500
  -b  Number of database sequence to show alignments for (B) [Integer]
    default = 250

Verify that this isn't the case for your error. If not, go ahead and file a
bug report. Attach the report (zipped if big) as well as the relevant
portion of your processing script.

Steve

On 2/10/07, George Heller <george.heller at yahoo.com> wrote:
>
> Hi all,
>
>   I am in the process of parsing a few files, actually blast results, but
> happen to get the following error:
>
>   ------------- EXCEPTION  -------------
> MSG: Can't get HSPs: data not collected.
> STACK Bio::Search::Hit::GenericHit::hsp
> /usr/lib/perl5/site_perl/5.8.5/Bio/Search/Hit/GenericHit.pm:649
> STACK toplevel parser.pl:31
>   --------------------------------------
>
>   I am not sure if this is a bug, or is there something I am doing wrong.
> Any pointers are appreciated.
>
>   Thanks!
>   George.
>


From jay at jays.net  Sun Feb 11 09:24:55 2007
From: jay at jays.net (Jay Hannah)
Date: Sun, 11 Feb 2007 08:24:55 -0600
Subject: [Bioperl-l] wiki: Database error when attempting to change
	preferences (1054: Unknown column 'user_newpass_time')
Message-ID: <9CEA9E3C-25A6-4DEF-902D-3FD997B81C9C@jays.net>

Just a heads-up --

I wanted to check the "E-mail me when a page I'm watching is changed"  
box in my preferences

http://www.bioperl.org/wiki/Special:Preferences

But I can't. Even if I change nothing and hit the Save button I get  
this:

----------
Database error
A database query syntax error has occurred. This may indicate a bug  
in the software. The last attempted database query was:

     (SQL query hidden)

from within function "User::saveSettings". MySQL returned error  
"1054: Unknown column 'user_newpass_time' in 'field list' (localhost)".
----------

(Yes, it literally says "(SQL query hidden)". That wasn't me for the  
purposes of this email. -grin-)

Thanks,

j
seqlab.net
http://www.bioperl.org/wiki/User:Jhannah


Username:	Jhannah
User ID:	51


From jay at jays.net  Sun Feb 11 10:16:13 2007
From: jay at jays.net (Jay Hannah)
Date: Sun, 11 Feb 2007 09:16:13 -0600
Subject: [Bioperl-l] wiki: Database error when attempting to change
	preferences (1054: Unknown column 'user_newpass_time')
In-Reply-To: <9CEA9E3C-25A6-4DEF-902D-3FD997B81C9C@jays.net>
References: <9CEA9E3C-25A6-4DEF-902D-3FD997B81C9C@jays.net>
Message-ID: <9B38AD87-3EB4-4658-A24B-ACC75F7F8298@jays.net>

Hmm.... The error appears to not be limited to changing preferences.  
I tried to update a couple different pages and got errors like this:

------
Database error
A database query syntax error has occurred. This may indicate a bug  
in the software. The last attempted database query was:

     (SQL query hidden)

from within function "Article::updateRedirectOn". MySQL returned  
error "1146: Table 'perlwikidb.redirect' doesn't exist (localhost)".
------

So all changes to the wiki aren't working right now?

j
seqlab.net
http://www.bioperl.org/wiki/User:Jhannah


From jason at bioperl.org  Sun Feb 11 15:18:15 2007
From: jason at bioperl.org (Jason Stajich)
Date: Sun, 11 Feb 2007 12:18:15 -0800
Subject: [Bioperl-l] wiki: Database error when attempting to change
	preferences (1054: Unknown column 'user_newpass_time')
In-Reply-To: <9B38AD87-3EB4-4658-A24B-ACC75F7F8298@jays.net>
References: <9CEA9E3C-25A6-4DEF-902D-3FD997B81C9C@jays.net>
	<9B38AD87-3EB4-4658-A24B-ACC75F7F8298@jays.net>
Message-ID: <3B80F494-925F-454A-8249-1565A4D50EF6@bioperl.org>

Should be fine now - I did an upgrade to mediawiki 1.9 this weekend  
and i think the upgrade script didn't finish.

In the future system support requests should go to support - AT -  
open-bio.org so we can track them.

-jason
On Feb 11, 2007, at 7:16 AM, Jay Hannah wrote:

> Hmm.... The error appears to not be limited to changing preferences.
> I tried to update a couple different pages and got errors like this:
>
> ------
> Database error
> A database query syntax error has occurred. This may indicate a bug
> in the software. The last attempted database query was:
>
>      (SQL query hidden)
>
> from within function "Article::updateRedirectOn". MySQL returned
> error "1146: Table 'perlwikidb.redirect' doesn't exist (localhost)".
> ------
>
> So all changes to the wiki aren't working right now?
>
> j
> seqlab.net
> http://www.bioperl.org/wiki/User:Jhannah
>

--
Jason Stajich
Miller Research Fellow
University of California, Berkeley
lab: 510.642.8441
http://pmb.berkeley.edu/~taylor/people/js.html
http://fungalgenomes.org/


From cjfields at uiuc.edu  Sun Feb 11 15:51:53 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 11 Feb 2007 14:51:53 -0600
Subject: [Bioperl-l] wiki: Database error when attempting to change
	preferences (1054: Unknown column 'user_newpass_time')
In-Reply-To: <3B80F494-925F-454A-8249-1565A4D50EF6@bioperl.org>
References: <9CEA9E3C-25A6-4DEF-902D-3FD997B81C9C@jays.net>
	<9B38AD87-3EB4-4658-A24B-ACC75F7F8298@jays.net>
	<3B80F494-925F-454A-8249-1565A4D50EF6@bioperl.org>
Message-ID: <E54012AF-CA57-4FA3-B70D-E135D66107ED@uiuc.edu>

Is there a good place on the main wiki page to prominently display  
this?  I wanted to place something at the top of the main page but I  
didn't know if we wanted to post the support email address on the  
page itself.

chris

On Feb 11, 2007, at 2:18 PM, Jason Stajich wrote:

> Should be fine now - I did an upgrade to mediawiki 1.9 this weekend
> and i think the upgrade script didn't finish.
>
> In the future system support requests should go to support - AT -
> open-bio.org so we can track them.
>
> -jason
> On Feb 11, 2007, at 7:16 AM, Jay Hannah wrote:
>
>> Hmm.... The error appears to not be limited to changing preferences.
>> I tried to update a couple different pages and got errors like this:
>>
>> ------
>> Database error
>> A database query syntax error has occurred. This may indicate a bug
>> in the software. The last attempted database query was:
>>
>>      (SQL query hidden)
>>
>> from within function "Article::updateRedirectOn". MySQL returned
>> error "1146: Table 'perlwikidb.redirect' doesn't exist (localhost)".
>> ------
>>
>> So all changes to the wiki aren't working right now?
>>
>> j
>> seqlab.net
>> http://www.bioperl.org/wiki/User:Jhannah
>>
>
> --
> Jason Stajich
> Miller Research Fellow
> University of California, Berkeley
> lab: 510.642.8441
> http://pmb.berkeley.edu/~taylor/people/js.html
> http://fungalgenomes.org/
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From jay at jays.net  Sun Feb 11 15:56:53 2007
From: jay at jays.net (Jay Hannah)
Date: Sun, 11 Feb 2007 14:56:53 -0600
Subject: [Bioperl-l] wiki: Database error when attempting to change
	preferences (1054: Unknown column 'user_newpass_time')
In-Reply-To: <E54012AF-CA57-4FA3-B70D-E135D66107ED@uiuc.edu>
References: <9CEA9E3C-25A6-4DEF-902D-3FD997B81C9C@jays.net>
	<9B38AD87-3EB4-4658-A24B-ACC75F7F8298@jays.net>
	<3B80F494-925F-454A-8249-1565A4D50EF6@bioperl.org>
	<E54012AF-CA57-4FA3-B70D-E135D66107ED@uiuc.edu>
Message-ID: <CAF40EBD-F0E2-434C-91F4-2B766B20E734@jays.net>

On Feb 11, 2007, at 2:51 PM, Chris Fields wrote:
> Is there a good place on the main wiki page to prominently display  
> this?  I wanted to place something at the top of the main page but  
> I didn't know if we wanted to post the support email address on the  
> page itself.

I added it here:

http://www.bioperl.org/wiki/About_site

Which is linked from all pages via the left-hand bar:  community |  
About this site

j
seqlab.net
http://www.bioperl.org/wiki/User:Jhannah


From agd27 at cornell.edu  Sun Feb 11 12:47:03 2007
From: agd27 at cornell.edu (Adam Diehl)
Date: Sun, 11 Feb 2007 12:47:03 -0500
Subject: [Bioperl-l] Getting GFF output in UCSC-specific format
Message-ID: <45CF5697.60703@cornell.edu>

Good morning folks,

I've got sort of a newbie question regarding how to get gff's out of 
Bio::Tools::GFF objects that are formatted according to the UCSC browser 
conventions, described here:

http://genome.ucsc.edu/goldenPath/help/customTrack.html#GFF
(Ignore the custom track headers and what-not. I just need the fields to 
be set up according to the descriptions in 1 - 9).

The write_feature($feature) method isn't doing it for me, as I get lines 
like the following (newlines excepted):

chr1    EMBL/GenBank/SwissProt  gene    1712    2848    .       +       .       db_xref=GeneID:4063728;gene=dnaN;locus_tag=MGAS10270_Spy0002
chr1    EMBL/GenBank/SwissProt  CDS     1712    2848    .       +       .       EC_number=2.7.7.7;codon_start=1;db_xref=GI:94989511,GeneID:4063728;gene=dnaN;locus_tag=MGAS10270_Spy0002;product=DNA+polymerase+III%2C+beta+chain;protein_id=YP_597611.1;transl_table=11;translation=MIQFSINRTLFIHALNATKRAISTKNAIPILSSIKIEVTSTGVTLTGSNGQISIENTIPVSNENAGLLITSPGAILLEASFFINIISSLPDISINVKEIEQHQVVLTSGKSEITLKGKDVDQYPRLQEVSTENPLILKTKLLKSIIAETAFAASLQESRPILTGVHIVLSNHKDFKAVATDSHRMSQRLITLENTSADFDVVIPSKSLREFSAVFTDDIETVEVFFSPSQILFRSEHISFYTRLLEGNYPDTDRLLMTEFETEVVFNTQSLRHAMERAFLISNATQNGTVKLEITQNHISAHVNSPEVGKVNEDLDIVSQSGSDLTISFNPTYLIESLKAIKSETVKIHFLSPVRPFTLTPGDEEESFIQLITPVRTN

As you can see, field 8, which should be frame according to UCSC 
conventions is blank, and field 9, group according to UCSC, has frame, 
along with ID, etc. All this extra stuff causes the UCSC browser to 
choke. First off, it can't identify which features are the same (it does 
this by matching the group field), and second, it can't interpret the 
CDS's into translated proteins because it lacks frame data.

Basically what I need to do is, for CDS features, extract frame (or 
codon_start, as it were), from the last field, parse out the integer 
value and store that in field 8 (as frame), then parse out locus_tag 
from the last field, clear out everything else and store the locus_tag 
only in that field (preferably without the qualifier locus_tag=). For 
feature type gene, I just want to do the last step, so that the gene and 
CDS features for the same feature have matching group fields that are as 
simple as possible. Let me know if this is not clear.

The way I've been trying to do this is by stringifying each gff object, 
splitting into an array, @tmp1, splitting @tmp1[8] into @tmp2 with the 
following code:  my @tmp2 = split /\;\, $tmp1[8]; and finally, trying to 
parse out the bits I need with regular expressions and store back to 
@tmp1[n].  -- This does not work, because perl wants to interpret every 
/ + etc. as a metacharacter!

I am assuming there's a simple way to get at each value in the last 
field of the gff object using methods supplied by Bio::Tools::GFF, but 
the API docs seem a bit lacking in this area. Could anyone steer me 
towards what I need to know to do this? Please let me know if I can 
clarify any details!

Cheers,
Adam Diehl


From jason at bioperl.org  Sun Feb 11 18:29:16 2007
From: jason at bioperl.org (Jason Stajich)
Date: Sun, 11 Feb 2007 15:29:16 -0800
Subject: [Bioperl-l] Getting GFF output in UCSC-specific format
In-Reply-To: <45CF5697.60703@cornell.edu>
References: <45CF5697.60703@cornell.edu>
Message-ID: <F6B017A7-E91F-4739-9688-F1212EC857C8@bioperl.org>

I assume you are getting your features from a Bio::SeqIO parse of a  
Genbank file?

You get back Bio::SeqFeature::Generic objects, so you want to look
at the docs for that module to see what the API is.
You will need to set the frame via
$feature->frame($frame);
You are going to have to determine the frame yourself if that isn't
part of the feature; we don't calculate it for you.

For the 9th column, the data is available through the tag methods
has_tag, add_tag_value, get_tag_values, get_all_tags, and remove_tag,
so you can remove the tags you don't want with remove_tag (or remove
them all, saving the locus_tag first):
my $locus;
for my $tag ( $feature->get_all_tags ) {
  if( $tag eq 'locus_tag' ) { # save the locus_tag when we see it
   ($locus) = $feature->get_tag_values($tag);
  }
  $feature->remove_tag($tag);
}

You will also want to set the GFF version when you create the
Bio::Tools::GFF writer (-gff_version); I think the UCSC site only
supports GFF1. I don't know exactly how the tags get written then, since
they aren't paired as key=>value, but you'll need to set the tag to 'group', so
$feature->add_tag_value('group', $locus);

If this is all unsatisfactory, you can easily write your own GFF
writer for your flavor of the data with something like:
print join("\t",
           $feature->seq_id,
           $feature->source_tag,
           $feature->primary_tag,
           $feature->start,
           $feature->end,
           defined $feature->score ? $feature->score : '.',
           $feature->strand > 0 ? '+' : $feature->strand < 0 ? '-' : '.',
           defined $feature->frame ? $feature->frame : '.',
           $locus), "\n";


-jason
On Feb 11, 2007, at 9:47 AM, Adam Diehl wrote:

> Good morning folks,
>
> I've got sort of a newbie question regarding how to get gff's out of
> Bio::Tools:GFF objects that are formatted according to the UCSC  
> browser
> conventions, described here:
>
> http://genome.ucsc.edu/goldenPath/help/customTrack.html#GFF
> (Ignore the custom track headers and what-not. I just need the  
> fields to
> be set up according to the descriptions in 1 - 9).
>
> The write_feature($feature) method isn't doing it for me, as I get  
> lines
> like the following (newlines excepted):
>
> chr1    EMBL/GenBank/SwissProt  gene    1712    2848    .       +
> .       db_xref=GeneID:4063728;gene=dnaN;locus_tag=MGAS10270_Spy0002
> chr1    EMBL/GenBank/SwissProt  CDS     1712    2848    .       +
> .
> EC_number=2.7.7.7;codon_start=1;db_xref=GI:94989511,GeneID: 
> 4063728;gene=dnaN;locus_tag=MGAS10270_Spy0002;product=DNA+polymerase 
> +III%2C+beta+chain;protein_
> id=YP_597611.1;transl_table=11;translation=MIQFSINRTLFIHALNATKRAISTKNA 
> IPILSSIKIEVTSTGVTLTGSNGQISIENTIPVSNENAGLLITSPGAILLEASFFINIISSLPDISINVK 
> EIEQHQVVLTSGKSEITLKGKDVDQYPRLQEVSTENPLILKTKLLKSIIAETAFAASLQESRPILTGVHI 
> VLSNHKDFKAVATDSHRMSQRLIT
> LENTSADFDVVIPSKSLREFSAVFTDDIETVEVFFSPSQILFRSEHISFYTRLLEGNYPDTDRLLMTEFE 
> TEVVFNTQSLRHAMERAFLISNATQNGTVKLEITQNHISAHVNSPEVGKVNEDLDIVSQSGSDLTISFNP 
> TYLIESLKAIKSETVKIHFLSPVRPFTLTPGDEEESFIQLITPVRTN
>
> As you can see, field 8, which should be frame according to UCSC
> conventions is blank, and field 9, group according to UCSC, has frame,
> along with ID, etc. All this extra stuff causes the UCSC browser to
> choke. First off, it can't identify which features are the same (it  
> does
> this by matching the group field), and second, it can't interpret the
> CDS's into translated proteins because it lacks frame data.
>
> Basically what I need to do is, for CDS features, extract frame (or
> codon_start, as it were), from the last field, parse out the integer
> value and store that in field 8 (as frame), then parse out locus_tag
> from the last field, clear out everything else and store the locus_tag
> only in that field (preferably without the qualifier locus_tag=). For
> feature type gene, I just want to do the last step, so that the  
> gene and
> CDS features for the same feature have matching group fields that  
> are as
> simple as possible. Let me know if this is not clear.
>
> The way I've been trying to do this is by stringifying each gff  
> object,
> splitting into an array, @tmp1, splitting @tmp1[8] into @tmp2 with the
> following code:  my @tmp2 = split /\;\, $tmp1[8]; and finally,  
> trying to
> parse out the bits I need with regular expressions and store back to
> @tmp1[n].  -- This does not work, because perl wants to interpret  
> every
> / + etc. as a metacharacter!
>
> I am assuming there's a simple way to get at each value in the last
> field of the gff object using methods supplied by Bio::Tools::GFF, but
> the API docs seem a bit lacking in this area. Could anyone steer me
> towards what I need to know to do this? Please let me know if I can
> clarify any details!
>
> Cheers,
> Adam Diehl

--
Jason Stajich
Miller Research Fellow
University of California, Berkeley
lab: 510.642.8441
http://pmb.berkeley.edu/~taylor/people/js.html
http://fungalgenomes.org/


From bix at sendu.me.uk  Sun Feb 11 18:39:15 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Sun, 11 Feb 2007 23:39:15 +0000
Subject: [Bioperl-l] WrapperBase
In-Reply-To: <Pine.LNX.4.58.0702101558180.16119@localhost.localdomain>
References: <Pine.LNX.4.58.0702091905300.3237@localhost.localdomain>	<45CCF803.9030004@sendu.me.uk>
	<Pine.LNX.4.58.0702101558180.16119@localhost.localdomain>
Message-ID: <45CFA923.8010201@sendu.me.uk>

Caroline Johnston wrote:
>> No, I think not. That would be very annoying when using wrappers for
>> programs that you just have in your system path.
> 
> Hmm, maybe I misunderstood what the program_path was for? The executable
> method goes straight to the system path unless program_path is set, so I
> assumed you would only set program_path if you specifically wanted it to
> look somewhere else. You wouldn't get a warning if you didn't specify a
> program_path and just left it to look in the system path.

Yes, sorry. Having now actually looked at your patch it seems fine. I'll 
commit it unless someone beats me to it.


From flope004 at hotmail.com  Sun Feb 11 21:40:08 2007
From: flope004 at hotmail.com (Wolverine Fran)
Date: Mon, 12 Feb 2007 03:40:08 +0100
Subject: [Bioperl-l] TreeIO, how it works?
Message-ID: <BAY125-F39E82B0D8378C73702B91C8B910@phx.gbl>

Hi,

I have a problem. I don't understand how TreeIO reads the trees:
my input: ((dog:0.04,cat:0.08):0.12,(human:0.15,mouse:0.2));

An unrooted tree with 4 tips and 2 internal nodes.
when I asked for:
print "Total number of nodes ",$tree->number_nodes;

I get 6 but when I ask for:
foreach my $node (@nodes) {
	print $node->internal_id,",";
}
I get 6,0,1,2,3,4,5. Total 7.

The root is number 6 and 2 and 5 are my internal nodes.
If I set the root to be number 5 this node 6 is still present.
Why? what is the node 6?

when I try the following:
  $node5 = $tree->find_node(-internal_id => '5');
  $node6 = $tree->find_node(-internal_id => '6');
  $node2 = $tree->find_node(-internal_id => '2');
  $distance1 = $tree->distance(-nodes =>[$node5,$node2]);
  $distance2 = $tree->distance(-nodes =>[$node5,$node6]);
  $distance3 = $tree->distance(-nodes =>[$node2,$node6]);
  or any other distance I get 2 warnings:
  -------------------- WARNING ---------------------
MSG: Must provide a valid array reference for -nodes
---------------------------------------------------

-------------------- WARNING ---------------------
MSG: Could not find distance!
---------------------------------------------------
What am I doing incorrectly?

I am practicing with AlignIO and TreeIO to calculate the maximum likelihood
for a given tree, so other information about that would be of great help.
I am practicing with this to see how Bioperl can help me with more
complex problems.

Thank you very much for your help!



From jason at bioperl.org  Sun Feb 11 22:05:18 2007
From: jason at bioperl.org (Jason Stajich)
Date: Sun, 11 Feb 2007 19:05:18 -0800
Subject: [Bioperl-l] TreeIO, how (does) it work?
In-Reply-To: <BAY125-F39E82B0D8378C73702B91C8B910@phx.gbl>
References: <BAY125-F39E82B0D8378C73702B91C8B910@phx.gbl>
Message-ID: <60D2DD3B-2ED1-4A92-A9FA-2875FEAA28CC@bioperl.org>


On Feb 11, 2007, at 6:40 PM, Wolverine Fran wrote:

> Hi,
>
> I have a problem. I don't understand how TreeIO reads the trees:
> my input: ((dog:0.04,cat:0.08):0.12,(human:0.15,mouse:0.2));
>
> An unrooted tree with 4 tips and 2 internal nodes.
> when I asked for:
> print "Total number of nodes ",$tree->number_nodes;
>
> I get 6 but when I ask for:
> foreach my $node (@nodes) {
> 	print $node->internal_id,",";
> }
> I get 6,0,1,2,3,4,5. Total 7.
>
> The root is number 6 and 2 and 5 are my internal nodes.
> If I set the root to be number 5 this node 6 is still present.
> Why? What is node 6?

Node 6 holds the root, or a fake root with a trifurcation for
unrooted trees.  Did you actually call the reroot method to set the
root to node 5?
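A rough way to see why seven ids appear: in a Newick string every pair of parentheses becomes an internal node, and the outermost pair becomes the root, so ((dog,cat),(human,mouse)) gives 4 leaves plus 3 internal nodes, i.e. ids 0 to 6. The sketch below is plain Perl, not BioPerl's Newick parser; count_newick_nodes is a hypothetical helper, and it assumes a well-formed string with labeled leaves and no quoted labels or comments.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count the nodes implied by a Newick string. Assumption: every '(' opens
# one internal node (the outermost pair is the root) and every leaf label
# follows a '(' or a ','.
sub count_newick_nodes {
    my ($newick) = @_;
    ( my $s = $newick ) =~ s/:[-+.\deE]+//g;    # drop branch lengths
    $s =~ s/;\s*$//;                            # drop the terminating ';'
    my $internal = () = $s =~ /\(/g;            # one node per '('
    my $leaves   = () = $s =~ /[(,]\s*([^(),;:\s]+)/g;    # leaf labels
    return ( $internal, $leaves, $internal + $leaves );
}

my ( $internal, $leaves, $total ) =
  count_newick_nodes('((dog:0.04,cat:0.08):0.12,(human:0.15,mouse:0.2));');
print "internal=$internal leaves=$leaves total=$total\n";
```

The "rooted" version from later in the thread, (((dog,cat),human),mouse), has the same counts: its outermost parentheses also produce a node, so an extra node holding the root is expected in both cases.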

>
> When I try the following:
>   $node5 = $tree->find_node(-internal_id => '5');
>   $node6 = $tree->find_node(-internal_id => '6');
>   $node2 = $tree->find_node(-internal_id => '2');
>   $distance1 = $tree->distance(-nodes =>[$node5,$node2]);
>   $distance2 = $tree->distance(-nodes =>[$node5,$node6]);
>   $distance3 = $tree->distance(-nodes =>[$node2,$node6]);
>   or for any other distance, I get 2 warnings:
>   -------------------- WARNING ---------------------
> MSG: Must provide a valid array reference for -nodes
> ---------------------------------------------------
>
> -------------------- WARNING ---------------------
> MSG: Could not find distance!
> ---------------------------------------------------
> What am I doing incorrectly?
>
The distance method just sums the branch lengths on the path
between two nodes.  Is that what you are trying to do?
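Summing branch lengths on the path between two nodes means walking up from both nodes to their lowest common ancestor. The plain-Perl sketch below uses a hypothetical parent table for the thread's tree (this is not how Bio::Tree::Tree stores nodes, and 'dogcat'/'humanmouse' are made-up names for the two internal nodes). The (human,mouse) group has no branch length in the Newick string, so it is assumed to be 0.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical tree for ((dog:0.04,cat:0.08):0.12,(human:0.15,mouse:0.2));
# node => [ parent, branch length to parent ]; the root has no entry.
my %parent = (
    dog        => [ 'dogcat',     0.04 ],
    cat        => [ 'dogcat',     0.08 ],
    human      => [ 'humanmouse', 0.15 ],
    mouse      => [ 'humanmouse', 0.2 ],
    dogcat     => [ 'root',       0.12 ],
    humanmouse => [ 'root',       0 ],    # assumed: no length in the string
);

# Distance from $node to each of its ancestors (itself at 0).
sub path_to_root {
    my ($node) = @_;
    my %dist = ( $node => 0 );
    my $d = 0;
    while ( my $up = $parent{$node} ) {
        $d += $up->[1];
        $node = $up->[0];
        $dist{$node} = $d;
    }
    return \%dist;
}

# Sum of branch lengths between two nodes via their lowest common ancestor.
sub path_distance {
    my ( $x, $y ) = @_;
    my $from_x = path_to_root($x);
    my $d      = 0;
    my $node   = $y;
    while (1) {
        return $d + $from_x->{$node} if exists $from_x->{$node};
        my $up = $parent{$node} or die "no common ancestor";
        $d += $up->[1];
        $node = $up->[0];
    }
}

printf "%.2f\n", path_distance( 'dog', 'mouse' );    # 0.04 + 0.12 + 0 + 0.2
```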

The error message you report doesn't make sense, as
"Must provide a valid array reference for -nodes"
is only printed when you call is_monophyletic or is_paraphyletic, as
far as I can tell.

What version of BioPerl are you using?

> I am practicing with AlignIO and TreeIO to calculate the maximum
> likelihood for a given tree, so any other information about that
> would be of great help. I am practicing with this to see how BioPerl
> can help me with more complex problems.
>
Are you trying to calculate the likelihood of a given tree, or are you
trying to generate an ML tree from an alignment?

> Thank you very much for your help!
>

--
Jason Stajich
Miller Research Fellow
University of California, Berkeley
lab: 510.642.8441
http://pmb.berkeley.edu/~taylor/people/js.html
http://fungalgenomes.org/


From er at xs4all.nl  Mon Feb 12 08:03:06 2007
From: er at xs4all.nl (Erik)
Date: Mon, 12 Feb 2007 14:03:06 +0100 (CET)
Subject: [Bioperl-l] bioperl wiki changes rss / atom
In-Reply-To: <AF76B50B-AF7A-41D1-A64E-D993D8CC4C86@uiuc.edu>
References: <162150.76282.qm@web56511.mail.re3.yahoo.com>
	<AF76B50B-AF7A-41D1-A64E-D993D8CC4C86@uiuc.edu>
Message-ID: <20181.156.83.0.124.1171285386.squirrel@webmail.xs4all.nl>

Hi,


The BioPerl wiki recent-changes RSS/Atom feed has two leading empty lines,
which make the XML invalid:

XML Parsing Error: xml declaration not at start of external entity
Location:
http://www.bioperl.org/w/index.php?title=Special:Recentchanges&feed=rss
Line Number 3, Column 1:<?xml version="1.0" encoding="utf-8"?>
^

Could those be removed? (I didn't see a way to do it myself.) It might be a
useful feed :)
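The parser error follows from the XML rule that the declaration must be the very first thing in the document, so the two blank lines in front of it are enough to reject the feed. The real fix belongs on the wiki side. Until then, a client could strip the leading whitespace itself; the sketch below shows that hypothetical workaround (strip_before_xml_decl is an illustrative name).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Drop any whitespace that precedes "<?xml" so the declaration
# becomes the first thing in the document again.
sub strip_before_xml_decl {
    my ($text) = @_;
    $text =~ s/\A\s+(?=<\?xml)//;
    return $text;
}

my $feed = "\n\n<?xml version=\"1.0\" encoding=\"utf-8\"?>\n<rss version=\"2.0\"/>\n";
print strip_before_xml_decl($feed);
```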


thanks,

Erik


From cjfields at uiuc.edu  Mon Feb 12 09:52:44 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 12 Feb 2007 08:52:44 -0600
Subject: [Bioperl-l] bioperl wiki changes rss / atom
In-Reply-To: <20181.156.83.0.124.1171285386.squirrel@webmail.xs4all.nl>
References: <162150.76282.qm@web56511.mail.re3.yahoo.com>
	<AF76B50B-AF7A-41D1-A64E-D993D8CC4C86@uiuc.edu>
	<20181.156.83.0.124.1171285386.squirrel@webmail.xs4all.nl>
Message-ID: <DA1A57C0-32B5-4095-AB80-318B5F529730@uiuc.edu>

I have forwarded this to support at open-bio.org, which should take  
care of it.

chris

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From sm8 at sanger.ac.uk  Mon Feb 12 12:12:00 2007
From: sm8 at sanger.ac.uk (Stephen Montgomery)
Date: Mon, 12 Feb 2007 17:12:00 -0000
Subject: [Bioperl-l] subtract for Bio::RangeI.pm
Message-ID: <A8AB69F227E96F4DBED773D3D70A295B02FCF830@exchsrv2.internal.sanger.ac.uk>

Hi -

Here is a subtract function for the Bio::RangeI class (to be added if
there is interest).

All the best!
Stephen Montgomery


//ADD TO BIO::RANGEI


=head2 subtract

  Title   : subtract
  Usage   : my $subtracted = $r1->subtract($r2);
  Function: Subtract range r2 from range r1
  Args    : arg #1 = a range to subtract from this one (mandatory)
            arg #2 = strand option ('strong', 'weak', 'ignore') (optional)
  Returns : undef (an empty list in list context) if the strand test
            fails, if the ranges do not overlap, or if r2 contains this
            RangeI; otherwise an arrayref of Range objects (this is an
            array since some instances where the subtract range is
            enclosed within this range will result in the creation of
            two new disjoint ranges)

=cut

sub subtract {
    my ($self, $range, $so) = @_;
    $self->throw("missing arg: you need to pass in another feature")
      unless $range;

    if ($self eq "Bio::RangeI") {
        $self = "Bio::Range";
        $self->warn("calling static methods of an interface is deprecated; use $self instead");
    }
    $range->throw("Input a Bio::RangeI object")
      unless $range->isa('Bio::RangeI');
    return unless $self->_testStrand($range, $so);

    # No overlap: nothing to subtract (documented to return undef)
    return unless $self->overlaps($range);

    # $range covers all of $self: nothing is left
    return if $range->contains($self);

    # Subtract the intersection from the $self range
    my ($start, $end, $strand) = $self->intersection($range, $so);

    my @outranges;
    if ($self->start < $start) {
        # piece of $self to the left of the intersection
        push @outranges, $self->new(-start  => $self->start,
                                    -end    => $start - 1,
                                    -strand => $self->strand);
    }
    if ($self->end > $end) {
        # piece of $self to the right of the intersection
        push @outranges, $self->new(-start  => $end + 1,
                                    -end    => $self->end,
                                    -strand => $self->strand);
    }
    return \@outranges;
}


//UNIT TEST

#!/usr/bin/perl
use strict;
use warnings;
use Test;
use Bio::SeqFeature::Generic;

plan tests => 13;

my $feature1 = Bio::SeqFeature::Generic->new(-start => 1,   -end => 1000, -strand => 1);
my $feature2 = Bio::SeqFeature::Generic->new(-start => 100, -end => 900,  -strand => -1);

# feature2 lies inside feature1: two pieces remain (1..99 and 901..1000)
my $subtracted = $feature1->subtract($feature2);
ok(defined($subtracted));
ok(scalar(@$subtracted) == 2);
foreach my $range (@$subtracted) {
    ok($range->start == 1 || $range->start == 901);
    ok($range->end == 99 || $range->end == 1000);
}

# feature1 contains feature2 entirely: nothing remains
$subtracted = $feature2->subtract($feature1);
ok(!defined($subtracted));
# opposite strands fail both the 'weak' and the 'strong' strand tests
$subtracted = $feature1->subtract($feature2, 'weak');
ok(!defined($subtracted));
$subtracted = $feature1->subtract($feature2, 'strong');
ok(!defined($subtracted));

# partial overlap on the right: one piece remains (1..499)
my $feature3 = Bio::SeqFeature::Generic->new(-start => 500, -end => 1500, -strand => 1);
$subtracted = $feature1->subtract($feature3);
ok(defined($subtracted));
ok(scalar(@$subtracted) == 1);
my $subtracted_i = $subtracted->[0];
ok($subtracted_i->start == 1);
ok($subtracted_i->end == 499);
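The interval arithmetic inside subtract() can be checked without BioPerl. The plain-Perl sketch below works on hypothetical [start, end] pairs (1-based, inclusive, strand ignored) and mirrors the documented return values; subtract_interval is an illustrative name, not part of Bio::RangeI.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Returns an arrayref of the parts of $r1 not covered by $r2, or undef
# (empty list in list context) when the ranges do not overlap or $r2
# covers all of $r1.
sub subtract_interval {
    my ( $r1, $r2 ) = @_;
    my ( $s1, $e1 ) = @$r1;
    my ( $s2, $e2 ) = @$r2;
    return if $e2 < $s1 || $s2 > $e1;      # no overlap
    return if $s2 <= $s1 && $e2 >= $e1;    # $r2 contains $r1
    my @pieces;
    push @pieces, [ $s1, $s2 - 1 ] if $s1 < $s2;    # left of the overlap
    push @pieces, [ $e2 + 1, $e1 ] if $e1 > $e2;    # right of the overlap
    return \@pieces;
}

my $pieces = subtract_interval( [ 1, 1000 ], [ 100, 900 ] );
print join( ' ', map { join '-', @$_ } @$pieces ), "\n";    # 1-99 901-1000
```

The same cases as the unit test apply: an enclosed range leaves two pieces, a partial overlap leaves one, and a covering range leaves nothing.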


From flope004 at hotmail.com  Mon Feb 12 13:07:12 2007
From: flope004 at hotmail.com (Wolverine Fran)
Date: Mon, 12 Feb 2007 19:07:12 +0100
Subject: [Bioperl-l] TreeIO, how (does) it work?
In-Reply-To: <60D2DD3B-2ED1-4A92-A9FA-2875FEAA28CC@bioperl.org>
Message-ID: <BAY125-F384316569877D96C05FDC28B910@phx.gbl>

Thanks for your reply!

I am using Bioperl 1.4.

>Node 6 is to hold the root or a fake root with a trifurcation for
>unrooted trees.  Did you actually call the reroot method to set the
>root to node 5?

Yes, I tried the following with the same result:
$tree->reroot($tree->find_node(-internal_id => '5'));
or
$tree->set_root_node($tree->find_node(-internal_id => '5'));

Even if I use a rooted tree:
(((dog:0.04,cat:0.08):0.12,human:0.15):0.1,mouse:0.1);
I still get node #6. So, is it always present? Am I not representing a
rooted tree properly in Newick format?

>The distance method is just summing branch lengths on the path
>between two nodes.  Is that what are you trying to do?
>
>The error message you report doesn't make sense as
>"Must provide a valid array reference for -nodes"
>is only printed when you call is_monophyletic or is_paraphyletic as
>far as I can tell.

I do not know yet what I was doing incorrectly, but now it works. Yes, I was
using the distance method to find out where node 6 was located. For the
unrooted tree, node 6 was node 5 (an internal node), and for the rooted tree
node 6 was 0.1 from the mouse leaf and the internal node (root).
The error message "Must provide a valid array reference for -nodes" is
shown if I indicate a node which is not present in the tree.

>You are trying to calculate the likelihood of a tree or are you
>trying to generate a ML tree from an alignment?

I am trying to calculate the likelihood of a tree, as practice. There are
probably other BioPerl modules, besides AlignIO and TreeIO, which can help
me in the process and that I do not know about.
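For scoring a fixed tree, the standard technique is Felsenstein's pruning algorithm: working from the tips up, compute for each node the probability of the observed bases below it given each possible base at that node, then average over the root state. The sketch below does this for a single alignment column under the Jukes-Cantor (JC69) model. It is a textbook illustration, not a BioPerl API; the nested-array tree format and the example column are hypothetical. Assuming sites evolve independently, the tree's log-likelihood is the sum of the log site likelihoods over all columns.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical tree format (nested arrays):
#   leaf:     [ 'A', $branch_length ]          observed base at the tip
#   internal: [ [ @children ], $branch_length ]
my @BASES = qw(A C G T);

# JC69 probability of base $j at the end of a branch of length $t
# (substitutions per site), given base $i at its start.
sub jc_prob {
    my ( $i, $j, $t ) = @_;
    my $e = exp( -4 * $t / 3 );
    return $i eq $j ? 0.25 + 0.75 * $e : 0.25 - 0.25 * $e;
}

# Conditional likelihoods: probability of the tip bases below $node,
# for each possible base at $node.
sub partials {
    my ($node) = @_;
    my $content = $node->[0];
    if ( !ref $content ) {    # leaf: 1 for the observed base, 0 otherwise
        my %leaf = map { ( $_ => ( $_ eq $content ? 1 : 0 ) ) } @BASES;
        return \%leaf;
    }
    my %L = map { ( $_ => 1 ) } @BASES;
    for my $child (@$content) {
        my $cp = partials($child);
        my $t  = $child->[1];
        for my $x (@BASES) {
            my $sum = 0;
            $sum += jc_prob( $x, $_, $t ) * $cp->{$_} for @BASES;
            $L{$x} *= $sum;
        }
    }
    return \%L;
}

# Site likelihood: average the root partials over equal base frequencies.
sub site_likelihood {
    my ($root) = @_;
    my $L     = partials($root);
    my $total = 0;
    $total += 0.25 * $L->{$_} for @BASES;
    return $total;
}

# ((dog,cat),(human,mouse)) with the thread's branch lengths and a made-up
# column (dog/cat = A, human/mouse = G); the root's own length is unused.
my $tree = [
    [
        [ [ [ 'A', 0.04 ], [ 'A', 0.08 ] ], 0.12 ],    # (dog,cat)
        [ [ [ 'G', 0.15 ], [ 'G', 0.2 ] ],  0 ],       # (human,mouse)
    ],
    0,
];
printf "site likelihood = %.6g\n", site_likelihood($tree);
```

Finding the maximum-likelihood tree would additionally require optimizing branch lengths and searching over topologies, which is usually left to dedicated phylogenetics programs.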

Again, thank you for your time!

_________________________________________________________________
Take the MSN Premium challenge: protection for your children on the internet.
Download it and try it free for 2 months.
http://join.msn.com?XAPID=1697&DI=1055&HL=Footer_mailsenviados_proteccioninfantil


From dmessina at wustl.edu  Mon Feb 12 12:49:49 2007
From: dmessina at wustl.edu (David Messina)
Date: Mon, 12 Feb 2007 11:49:49 -0600
Subject: [Bioperl-l] subtract for Bio::RangeI.pm
In-Reply-To: <A8AB69F227E96F4DBED773D3D70A295B02FCF754@exchsrv2.internal.sanger.ac.uk>
References: <A8AB69F227E96F4DBED773D3D70A295B02FCF754@exchsrv2.internal.sanger.ac.uk>
Message-ID: <1574ACCF-92D5-4DEC-AD04-14EB7767F22A@wustl.edu>

Stephen,

Great, thanks for this. Could you submit it to Bugzilla as an  
enhancement?

http://bugzilla.open-bio.org/


Thanks,
Dave


From jason at bioperl.org  Mon Feb 12 13:38:11 2007
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 12 Feb 2007 10:38:11 -0800
Subject: [Bioperl-l] TreeIO, how (does) it work?
In-Reply-To: <BAY125-F384316569877D96C05FDC28B910@phx.gbl>
References: <BAY125-F384316569877D96C05FDC28B910@phx.gbl>
Message-ID: <BD0EF8B4-69A9-468E-A722-1110B02D0EF7@bioperl.org>

I would definitely suggest getting ahold of BioPerl 1.5.2, as I seem
to remember there are several fixes in the tree module code for
re-rooting a tree.
-jason

--
Jason Stajich
Miller Research Fellow
University of California, Berkeley
lab: 510.642.8441
http://pmb.berkeley.edu/~taylor/people/js.html
http://fungalgenomes.org/


From johnsonm at gmail.com  Mon Feb 12 18:13:09 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Mon, 12 Feb 2007 17:13:09 -0600
Subject: [Bioperl-l] Bio::Tools::Glimmer
In-Reply-To: <ebf5eb170702070850k10f1d24fr2d93d61534eeaaff@mail.gmail.com>
References: <ebf5eb170609190833r5fa138e9gc6c51346e5f7890c@mail.gmail.com>
	<451095C7.3020905@infotech.monash.edu.au>
	<0FBEEE2B-9641-4EFA-A5D2-0DA839556E4C@gmx.net>
	<ebf5eb170609201247r12eb9475v5cb553e0727163c0@mail.gmail.com>
	<ebf5eb170702061553t29b0d1f2q402582abba032031@mail.gmail.com>
	<20F07AFE-BADF-4BA5-83B1-C99EEBE915C2@bioperl.org>
	<ebf5eb170702070850k10f1d24fr2d93d61534eeaaff@mail.gmail.com>
Message-ID: <ebf5eb170702121513r20759845i21e366f916020baa@mail.gmail.com>

On 2/7/07, Mark Johnson <johnsonm at gmail.com> wrote:
>
>     Well, each format has some unique features.  If the user declines to
> specify the format, I can figure it out, but it will probably involve
> scanning the input file twice.  I'll take a look.
>     I can do all the parsing in one function, in fact I have, just to see
> how nasty it would end up being.  I just can't stomach having the code that
> tightly coupled and hard to read.  In the end it'll probably be three
> functions.  GlimmerM/HMM are pretty close.  Maybe two, Glimmer2 and
> Glimmer3 aren't *that* different, either.


    I've got a 4-in-1 parser roughed in per Chris Fields' suggestion, with two
actual parsing routines (prokaryotic and eukaryotic).  You can specify
-format as an arg to the constructor (Glimmer, GlimmerM, GlimmerHMM), or it
will look through the input until it can figure out what it is looking at.
    I've got one main issue to solve, the rest is just stuff like updating
the POD.  Torsten Seemann very helpfully added example output for all 4
formats to t/data.  Looking at GlimmerHMM.out, the first line is
'GlimmerHMM'.  However, I think there is a bug in the existing
_parse_predictions:

Shouldn't this:

} elsif( /^(Glimmer\S*)$/ ) { # GlimmerHMM has no version
    $source = $1;
    next;
}

be this instead:

} elsif( /^(GlimmerHMM\S*)$/ ) { # GlimmerHMM has no version
    $source = $1;
    next;
}


I lifted that bit of code to do format detection...we don't have GlimmerHMM
installed locally, so I'm assuming Torsten's output is correct and the above
is a bug.  Guess I'll go check bugzilla...


From torsten.seemann at infotech.monash.edu.au  Mon Feb 12 21:07:40 2007
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Tue, 13 Feb 2007 13:07:40 +1100
Subject: [Bioperl-l] Bio::Tools::Glimmer
In-Reply-To: <ebf5eb170702121513r20759845i21e366f916020baa@mail.gmail.com>
References: <ebf5eb170609190833r5fa138e9gc6c51346e5f7890c@mail.gmail.com>
	<451095C7.3020905@infotech.monash.edu.au>
	<0FBEEE2B-9641-4EFA-A5D2-0DA839556E4C@gmx.net>
	<ebf5eb170609201247r12eb9475v5cb553e0727163c0@mail.gmail.com>
	<ebf5eb170702061553t29b0d1f2q402582abba032031@mail.gmail.com>
	<20F07AFE-BADF-4BA5-83B1-C99EEBE915C2@bioperl.org>
	<ebf5eb170702070850k10f1d24fr2d93d61534eeaaff@mail.gmail.com>
	<ebf5eb170702121513r20759845i21e366f916020baa@mail.gmail.com>
Message-ID: <a79f6a4b0702121807i47d6360cwed1b0276865d6bd7@mail.gmail.com>

Mark,

>     I've got one main issue to solve, the rest is just stuff like updating
> the POD.  Torsten Seemann very helpfully added example output for all 4
> formats to t/data.  Looking at GlimmerHMM.out, the first line is
> 'GlimmerHMM'.  However, I think there is a bug in the existing
> _parse_predictions:
> Shouldn't this:
> } elsif( /^(Glimmer\S*)$/ ) { # GlimmerHMM has no version
> be this instead:
> } elsif( /^(GlimmerHMM\S*)$/ ) { # GlimmerHMM has no version

I think it should be what it says, or perhaps now /^(Glimmer(M|HMM))/.
Here's why:

I came onto the scene at Glimmer.pm rev 1.4. At that stage it only
parsed GlimmerM. I noted that GlimmerHMM had the same output format as
GlimmerM, except for the first line. So in rev 1.5 I modified the
regexp to match both, i.e. with \S*. This would also hopefully match any
other Glimmer-clone formats that arose. I also fixed the pdocs to say
this, and added tests to t/Genpred.t.
% cvs diff -r 1.4 -r 1.5 Bio/Tools/Glimmer.pm
% cvs diff -r 1.15 -r 1.16 t/Genpred.t
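Running the candidate header patterns against sample first lines shows why the existing \S* pattern already covers both programs, and what a \s* typo would break. The GlimmerM and GlimmerHMM lines follow the thread; 'Glimmer 3' is a made-up line containing a space, included only to show what \S* rejects.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %pattern = (
    committed => qr/^(Glimmer\S*)$/,       # rev 1.5: \S* = run of non-whitespace
    typo      => qr/^(Glimmer\s*)$/,       # \s* (whitespace) instead of \S*
    torsten   => qr/^(Glimmer(M|HMM))/,    # the narrower alternative suggested
);

for my $line ( 'GlimmerM', 'GlimmerHMM', 'Glimmer 3' ) {
    for my $name ( sort keys %pattern ) {
        my $got = $line =~ $pattern{$name} ? $1 : '(no match)';
        print "$line / $name: $got\n";
    }
}
```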

I then planned to extend support to Glimmer2 and Glimmer3. I added the
4 test files (t/Glimmer*.out) but never wrote the code. This is where
you have come in Mark :-)

> I lifted that bit of code to do format detection...we don't have GlimmerHMM
> installed locally, so I'm assuming Torsten's output is correct and the above
> is a bug.  Guess I'll go check bugzilla...

I'm pretty sure my 4 test files are correct - I spent a lot of time
ensuring they were consistent etc, as I was getting very confused with
the different "glimmer" versions!

Hope this all helps,

--Torsten


From avilella at gmail.com  Tue Feb 13 08:20:15 2007
From: avilella at gmail.com (Albert Vilella)
Date: Tue, 13 Feb 2007 13:20:15 +0000
Subject: [Bioperl-l] number of gaps for the other sequences in an alignment
Message-ID: <358f4d650702130520n269419cfkb9cb6dac8feaaa5c@mail.gmail.com>

Hi,

It would be great if we could have a method that, given one sequence in
an alignment, counts the number of gaps present in the rest of the
sequences of the alignment. That is, for each nucleotide/amino-acid
position of the sequence of interest, look at the alignment column,
count the gaps in it, and then sum these counts over all the columns in
which the sequence of interest is not gapped.

Has anyone tried this before?

My idea is to end up with a coefficient of indel contribution for each
of the sequences in the alignment, with this coefficient being high when
one sequence forces a lot of gaps to be inserted in the final alignment
in order to accommodate it.
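The proposed count can be prototyped on plain aligned strings before deciding where it belongs in SimpleAlign. gaps_forced_by is a hypothetical helper, not an existing SimpleAlign method; it assumes equal-length rows and treats '-' and '.' as gaps. In BioPerl the strings could presumably be gathered from $aln->each_seq and each sequence's seq method.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Gaps in the other rows, summed over the columns where $target has a residue.
sub gaps_forced_by {
    my ( $target, %aln ) = @_;
    my @t     = split //, $aln{$target};
    my $count = 0;
    for my $id ( grep { $_ ne $target } keys %aln ) {
        my @s = split //, $aln{$id};
        for my $col ( 0 .. $#t ) {
            next if $t[$col] =~ /[-.]/;        # target is gapped here: skip
            $count++ if $s[$col] =~ /[-.]/;    # gap in another sequence
        }
    }
    return $count;
}

my %aln = (
    seq1 => 'ACGTACGT',
    seq2 => 'AC--ACGT',
    seq3 => 'A---ACG-',
);
printf "%s: %d\n", $_, gaps_forced_by( $_, %aln ) for sort keys %aln;
```

For this toy alignment the counts are 6, 2 and 0, so seq1 is the sequence forcing most of the gaps. Dividing by the target's residue count times the number of other sequences would turn the raw count into a normalized coefficient, if that is preferred.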

I would say that the best place for this is either using methods
already available in SimpleAlign, or have something new added there.

Looking forward to your comments,

Cheers,

    Albert.


From bix at sendu.me.uk  Tue Feb 13 11:09:09 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 13 Feb 2007 16:09:09 +0000
Subject: [Bioperl-l] Bio::DB::SeqFeature::Store problem with transcripts
Message-ID: <45D1E2A5.6060104@sendu.me.uk>

I have some raw sequences in a Bio::DB::SeqFeature::Store mysql database
and wanted to associate some basic information with them, like exon
positions. I thought of creating Bio::SeqFeature::Gene::Transcript
objects and storing them so I could later use features() to see what
other features overlapped exons. I ran into a fatal error that can be
replicated with the following simplified one-liner:

perl -MBio::DB::SeqFeature::Store -MBio::SeqFeature::Gene::Transcript -e '
  $db = Bio::DB::SeqFeature::Store->new(-adaptor => "DBI::mysql", -dsn => "dbi:mysql:test");
  $trans = Bio::SeqFeature::Gene::Transcript->new(-start => 1, -end => 2, -seq_id => "test");
  $db->store($trans);
  @trans = $db->features(-seqid => "test", -type => "transcript");
  print "@trans\n";'

code sub {
     package Bio::SeqFeature::Generic;
     use strict 'refs';
     my $self = shift @_;
     foreach my $f (@{[] unless $$self{'_gsf_sub_array'};}) {
         $f = undef;
     }
     $$self{'_gsf_seq'} = undef;
     foreach my $t (keys %{{} unless $$self{'_gsf_tag_hash'};}) {
         $$self{'_gsf_tag_hash'}{$t} = undef;
         delete $$self{'_gsf_tag_hash'}{$t};
     }
} did not evaluate to a subroutine reference, at 
/.../Bio/DB/SeqFeature/Store.pm line 2280


Is this a bug? Or am I taking the wrong approach?


From johnsonm at gmail.com  Tue Feb 13 15:10:23 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Tue, 13 Feb 2007 14:10:23 -0600
Subject: [Bioperl-l] Bio::Tools::Glimmer
In-Reply-To: <a79f6a4b0702121807i47d6360cwed1b0276865d6bd7@mail.gmail.com>
References: <ebf5eb170609190833r5fa138e9gc6c51346e5f7890c@mail.gmail.com>
	<451095C7.3020905@infotech.monash.edu.au>
	<0FBEEE2B-9641-4EFA-A5D2-0DA839556E4C@gmx.net>
	<ebf5eb170609201247r12eb9475v5cb553e0727163c0@mail.gmail.com>
	<ebf5eb170702061553t29b0d1f2q402582abba032031@mail.gmail.com>
	<20F07AFE-BADF-4BA5-83B1-C99EEBE915C2@bioperl.org>
	<ebf5eb170702070850k10f1d24fr2d93d61534eeaaff@mail.gmail.com>
	<ebf5eb170702121513r20759845i21e366f916020baa@mail.gmail.com>
	<a79f6a4b0702121807i47d6360cwed1b0276865d6bd7@mail.gmail.com>
Message-ID: <ebf5eb170702131210l1fa27bb5n31095e9ca2d4ecf2@mail.gmail.com>

    You're quite correct.  I wasn't paying enough attention.  That does work
just fine.  I fat-fingered something somewhere else, broke my version of the
module for GlimmerHMM, then hallucinated and confused \S and \s.  8)
    All I have left now is to fix up the POD documentation and such; then
I can send the module along and somebody can make whatever tweaks are needed
and check it in.  Shall I open a ticket in Bugzilla for this and attach
diffs, or just send them along to somebody to take care of directly?
    Oh, one thing I have not mentioned: I also added a -seqname argument.
Glimmer2 does not provide any kind of sequence identifier in its output, and
only processes the first sequence in a FASTA file.  It would be tedious to
have to code around this by fixing up the predictions after they are
produced, so I added the option to provide this missing info up front,
hopefully sparing downstream code from needing a special case for fixing up
Glimmer2 predictions.


From cjfields at uiuc.edu  Tue Feb 13 15:47:19 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 13 Feb 2007 14:47:19 -0600
Subject: [Bioperl-l] Bio::Tools::Glimmer
In-Reply-To: <ebf5eb170702131210l1fa27bb5n31095e9ca2d4ecf2@mail.gmail.com>
References: <ebf5eb170609190833r5fa138e9gc6c51346e5f7890c@mail.gmail.com>
	<451095C7.3020905@infotech.monash.edu.au>
	<0FBEEE2B-9641-4EFA-A5D2-0DA839556E4C@gmx.net>
	<ebf5eb170609201247r12eb9475v5cb553e0727163c0@mail.gmail.com>
	<ebf5eb170702061553t29b0d1f2q402582abba032031@mail.gmail.com>
	<20F07AFE-BADF-4BA5-83B1-C99EEBE915C2@bioperl.org>
	<ebf5eb170702070850k10f1d24fr2d93d61534eeaaff@mail.gmail.com>
	<ebf5eb170702121513r20759845i21e366f916020baa@mail.gmail.com>
	<a79f6a4b0702121807i47d6360cwed1b0276865d6bd7@mail.gmail.com>
	<ebf5eb170702131210l1fa27bb5n31095e9ca2d4ecf2@mail.gmail.com>
Message-ID: <DB4D80A7-1D29-4809-A7D2-FCE5DB2502B4@uiuc.edu>

You'll also want to update whatever relevant tests there are for
Glimmer; it looks like they are in t/Genpred.t.

chris

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From thokeller at gmail.com  Tue Feb 13 17:00:06 2007
From: thokeller at gmail.com (Thomas Keller)
Date: Tue, 13 Feb 2007 14:00:06 -0800
Subject: [Bioperl-l] update/install problem
Message-ID: <baa466330702131400q30a65bb4ib5037756fc59a9b7@mail.gmail.com>

Could someone suggest a workaround or fix for this error?

$ sudo fink update bioperl-pm586
Information about 5850 packages read in 2 seconds.
The package 'bioperl-pm586' will be built and installed.
The package 'xml-sax-pm586' will be installed.
The package 'xml-sax-writer-pm586' will be built and installed.
The package 'xml-filter-buffertext-pm586' will be built and installed.
The following package will be installed or updated:
 bioperl-pm586
The following 3 additional packages will be installed:
 xml-filter-buffertext-pm586 xml-sax-pm586 xml-sax-writer-pm586
Do you want to continue? [Y/n] Y
/sw/bin/dpkg-lockwait -i
/sw/fink/dists/unstable/main/binary-darwin-powerpc/libs/perlmods/xml-sax-pm586_0.13-2_darwin-powerpc.deb
(Reading database ... 48029 files and directories currently installed.)
Preparing to replace xml-sax-pm586 0.13-2 (using
.../xml-sax-pm586_0.13-2_darwin-powerpc.deb) ...
Unpacking replacement xml-sax-pm586 ...
Setting up xml-sax-pm586 (0.13-2) ...
update-perl586-sax-parsers: adding Perl SAX parser module info file of
XML::SAX::PurePerl...
Can't locate object method "save_parsers_debian" via package "XML::SAX" at
/sw/sbin/update-perl586-sax-parsers line 96.
/sw/bin/dpkg: error processing xml-sax-pm586 (--install):
 subprocess post-installation script returned error exit status 22
Errors were encountered while processing:
 xml-sax-pm586
### execution of /sw/bin/dpkg-lockwait failed, exit code 1
Failed: can't install package xml-sax-pm586-0.13-2


-- 
Tom Keller
"Ecrasez l'Infame!" -- Voltaire


From sac at bioperl.org  Tue Feb 13 18:00:46 2007
From: sac at bioperl.org (Steve Chervitz)
Date: Tue, 13 Feb 2007 15:00:46 -0800
Subject: [Bioperl-l] Bio::Root::Utilities.pm
Message-ID: <8f200b4c0702131500m2f3790nceace3c3e3dffd4f@mail.gmail.com>

I noticed that Bio::Root::Utilities was purged from bioperl-live for the
1.5.2 release, but I'd like us to consider adding it back. I agree that the
other purged Root modules were ancient relics of the past, but
Bio::Root::Utilities.pm still has signs of life (at least I still find
occasion to use it, or refer to code in it).

I know that it's not currently used by any other modules in Bioperl, but
there are likely some legacy scripts out there that rely on it. Probably
most of those scripts are ones I've written, but there have been substantive
commits by others in the not-too-distant past (Dec 2005), so at least some
folks besides myself are using it and may hesitate to upgrade their bioperl
installation if it's absent.

I'm all for avoiding bloat in the codebase and am eager to see Bioperl be
more lean and mean, but I'd like to keep this module around. I'll agree to
add some tests for it as well as clean some things up (e.g., use
Bio::Root::IO to get temp file name).

Cheers,
Steve
--
Steve Chervitz
sac at bioperl.org


From cjfields at uiuc.edu  Tue Feb 13 20:29:03 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 13 Feb 2007 19:29:03 -0600
Subject: [Bioperl-l] update/install problem
In-Reply-To: <baa466330702131400q30a65bb4ib5037756fc59a9b7@mail.gmail.com>
References: <baa466330702131400q30a65bb4ib5037756fc59a9b7@mail.gmail.com>
Message-ID: <C8B7325D-0554-4E37-9B06-9A8344BF5F7E@uiuc.edu>

On Feb 13, 2007, at 4:00 PM, Thomas Keller wrote:

> Could someone suggest a workaround or fix for this error?
>
> $ sudo fink update bioperl-pm586
> Information about 5850 packages read in 2 seconds.
> The package 'bioperl-pm586' will be built and installed.
> The package 'xml-sax-pm586' will be installed.
> The package 'xml-sax-writer-pm586' will be built and installed.
> The package 'xml-filter-buffertext-pm586' will be built and installed.
> The following package will be installed or updated:
>  bioperl-pm586
> The following 3 additional packages will be installed:
>  xml-filter-buffertext-pm586 xml-sax-pm586 xml-sax-writer-pm586
> Do you want to continue? [Y/n] Y
> /sw/bin/dpkg-lockwait -i
> /sw/fink/dists/unstable/main/binary-darwin-powerpc/libs/perlmods/xml-sax-pm586_0.13-2_darwin-powerpc.deb
> (Reading database ... 48029 files and directories currently installed.)
> Preparing to replace xml-sax-pm586 0.13-2 (using
> .../xml-sax-pm586_0.13-2_darwin-powerpc.deb) ...
> Unpacking replacement xml-sax-pm586 ...
> Setting up xml-sax-pm586 (0.13-2) ...
> update-perl586-sax-parsers: adding Perl SAX parser module info file of
> XML::SAX::PurePerl...
> Can't locate object method "save_parsers_debian" via package "XML::SAX" at
> /sw/sbin/update-perl586-sax-parsers line 96.
> /sw/bin/dpkg: error processing xml-sax-pm586 (--install):
>  subprocess post-installation script returned error exit status 22
> Errors were encountered while processing:
>  xml-sax-pm586
> ### execution of /sw/bin/dpkg-lockwait failed, exit code 1
> Failed: can't install package xml-sax-pm586-0.13-2

The fink installation seems to be failing on XML::SAX, not bioperl.
You could try installing XML::SAX (now at v. 0.15) via CPAN using  
'sudo cpan'; I updated just recently w/o problems.
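Before retrying the fink build, it may help to confirm whether the installed XML::SAX already meets the 0.15 version mentioned above. This is a minimal plain-Perl sketch; the helper name `has_module_version` is hypothetical (not part of Bioperl or fink) and relies only on core Perl's `require` and the `VERSION` method every package inherits from UNIVERSAL.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $module loads and its $VERSION is >= $min.
sub has_module_version {
    my ($module, $min) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;    # XML::SAX -> XML/SAX.pm
    # require dies if the module is missing; UNIVERSAL::VERSION dies if
    # the installed version is lower than $min
    return eval { require $file; $module->VERSION($min); 1 } ? 1 : 0;
}

printf "XML::SAX >= 0.15: %s\n",
    has_module_version('XML::SAX', '0.15') ? 'yes' : 'no (try: sudo cpan XML::SAX)';
```

If it reports "no", `sudo cpan XML::SAX` asks the CPAN client to install or upgrade the module outside of fink.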

As an aside, you could similarly install bioperl directly from CPAN  
(which I also haven't had any problems with).  The installation  
allows for installing optional modules.

chris


From cjfields at uiuc.edu  Tue Feb 13 22:41:31 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 13 Feb 2007 21:41:31 -0600
Subject: [Bioperl-l] Bio::Root::Utilities.pm
In-Reply-To: <8f200b4c0702131500m2f3790nceace3c3e3dffd4f@mail.gmail.com>
References: <8f200b4c0702131500m2f3790nceace3c3e3dffd4f@mail.gmail.com>
Message-ID: <1D04BA5E-359E-4AC5-B674-8ECDC4F81E83@uiuc.edu>


On Feb 13, 2007, at 5:00 PM, Steve Chervitz wrote:

> I noticed that Bio::Root::Utilities was purged from bioperl-live for the
> 1.5.2 release, but I'd like us to consider adding it back. I agree that the
> other purged Root modules were ancient relics of the past, but
> Bio::Root::Utilities.pm still has signs of life (at least I still find
> occasion to use it, or refer to code in it).
>
> I know that it's not currently used by any other modules in Bioperl, but
> there are likely some legacy scripts out there that rely on it. Probably
> most of those scripts are ones I've written, but there have been substantive
> commits by others in the not-too-distant past (Dec 2005), so at least some
> folks besides myself are using it and may hesitate to upgrade their bioperl
> installation if it's absent.
>
> I'm all for avoiding bloat in the codebase and am eager to see Bioperl be
> more lean and mean, but I'd like to keep this module around. I'll agree to
> add some tests for it as well as clean some things up (e.g., use
> Bio::Root::IO to get temp file name).
>
> Cheers,
> Steve
> --
> Steve Chervitz
> sac at bioperl.org

I don't have a problem with adding it back, esp. if tests are added.   
Everything in Bio::Root* not tied to a module was yanked out when no  
one spoke up about cleaning up Bio::Root* modules:

http://thread.gmane.org/gmane.comp.lang.perl.bio.general/12792/focus=12839

Maybe others disagree?

chris


From bix at sendu.me.uk  Wed Feb 14 03:00:35 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 14 Feb 2007 08:00:35 +0000
Subject: [Bioperl-l] update/install problem
In-Reply-To: <C8B7325D-0554-4E37-9B06-9A8344BF5F7E@uiuc.edu>
References: <baa466330702131400q30a65bb4ib5037756fc59a9b7@mail.gmail.com>
	<C8B7325D-0554-4E37-9B06-9A8344BF5F7E@uiuc.edu>
Message-ID: <45D2C1A3.9060300@sendu.me.uk>

Chris Fields wrote:
> As an aside, you could similarly install bioperl directly from CPAN  
> (which I also haven't had any problems with).

Indeed. If you follow the unix instructions at 
http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix you should have 
a problem-free complete install under Mac OS X.


From bix at sendu.me.uk  Wed Feb 14 09:08:22 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 14 Feb 2007 14:08:22 +0000
Subject: [Bioperl-l] BLASTXML changes (good this time!)
In-Reply-To: <DF989651-7CC0-45A0-BA5E-EEEE88FDD40E@uiuc.edu>
References: <EB130F09-5670-4F3E-B481-60A700AEA38F@uiuc.edu>
	<45CCF861.8030000@sendu.me.uk>
	<DF989651-7CC0-45A0-BA5E-EEEE88FDD40E@uiuc.edu>
Message-ID: <45D317D6.5070903@sendu.me.uk>

Chris Fields wrote:
> 
> On Feb 9, 2007, at 4:40 PM, Sendu Bala wrote:
> 
>> Chris Fields wrote:
>>> If Sendu is out there, I think we can safely remove any dependencies
>>> beyond XML::SAX 0.15 for the next release.  Should I go ahead and
>>> modify Build.PL?
>>
>> Sure, good to hear.
> 
> I added a version dependency for XML::SAX (v. 0.15) for the PurePerl 
> fix.  That likely obviates the need for a Bundle for XML::Simple.  Not 
> too pressing; we can determine that before the next release.

The bundle is now obsolete. Does anything in Bioperl, or any of its 
dependencies, now make use of the expat library? If not, I can remove 
mention of it from the install documentation.


From bix at sendu.me.uk  Wed Feb 14 09:02:39 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 14 Feb 2007 14:02:39 +0000
Subject: [Bioperl-l] DB.t failures
Message-ID: <45D3167F.2000608@sendu.me.uk>

DB.t is failing with BIOPERLDEBUG set. Apparently, we are no longer 
getting sequences back from NCBI in the order we requested them in batch 
mode.

Is this a change at NCBI? Is there some way we can make sure to return 
the sequences in the expected order? Or shouldn't the order be expected 
(should the test script be altered)?


From cjfields at uiuc.edu  Wed Feb 14 09:37:07 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 14 Feb 2007 08:37:07 -0600
Subject: [Bioperl-l] DB.t failures
In-Reply-To: <45D3167F.2000608@sendu.me.uk>
References: <45D3167F.2000608@sendu.me.uk>
Message-ID: <49A5C7D3-8D63-452C-B0EA-6F7144F85E35@uiuc.edu>

Confirmed on this end.

It's possible that the default sort order from eutils is different
now, though I haven't seen anything on the eutils mail list.  There
may be a way to set the sort order via the base URL; I'll check into
it later today. I'm still digging myself out from the midwest blizzard.

chris

On Feb 14, 2007, at 8:02 AM, Sendu Bala wrote:

> DB.t is failing with BIOPERLDEBUG set. Apparently, we are no longer
> getting sequences back from NCBI in the order we requested them in
> batch mode.
>
> Is this a change at NCBI? Is there some way we can make sure to return
> the sequences in the expected order? Or shouldn't the order be
> expected (should the test script be altered)?

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Wed Feb 14 09:42:05 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 14 Feb 2007 08:42:05 -0600
Subject: [Bioperl-l] BLASTXML changes (good this time!)
In-Reply-To: <45D317D6.5070903@sendu.me.uk>
References: <EB130F09-5670-4F3E-B481-60A700AEA38F@uiuc.edu>
	<45CCF861.8030000@sendu.me.uk>
	<DF989651-7CC0-45A0-BA5E-EEEE88FDD40E@uiuc.edu>
	<45D317D6.5070903@sendu.me.uk>
Message-ID: <E9611B3C-658E-4CBC-A2ED-1990F929A130@uiuc.edu>


On Feb 14, 2007, at 8:08 AM, Sendu Bala wrote:

> Chris Fields wrote:
>>
>> On Feb 9, 2007, at 4:40 PM, Sendu Bala wrote:
>>
>>> Chris Fields wrote:
>>>> If Sendu is out there, I think we can safely remove any dependencies
>>>> beyond XML::SAX 0.15 for the next release.  Should I go ahead and
>>>> modify Build.PL?
>>>
>>> Sure, good to hear.
>>
>> I added a version dependency for XML::SAX (v. 0.15) for the PurePerl
>> fix.  That likely obviates the need for a Bundle for XML::Simple.  Not
>> too pressing; we can determine that before the next release.
>
> The bundle is now obsolete. Does anything in Bioperl, or any of its
> dependencies, now make use of the expat library? If not, I can remove
> mention of it from the install documentation.

I'll try getting something up about XML::SAX on the wiki today.   
XML::Parser, though, still requires expat AFAIK:

http://www.bioperl.org/wiki/BioPerl_Dependencies

chris


From kellert at ohsu.edu  Tue Feb 13 17:43:24 2007
From: kellert at ohsu.edu (Thomas J Keller)
Date: Tue, 13 Feb 2007 14:43:24 -0800
Subject: [Bioperl-l] HowTo:SearchIO
Message-ID: <CF656232-DEF2-47C4-9855-DAF748AEE862@ohsu.edu>

Greetings,
I've been away from programming and informatics for many months.  
Hoping to get back into it, I thought it would be good to review the  
tutorials.
I tried the code in the tutorial on the sample blast report in the  
tutorial and it worked fine. So I ran a blastx search and saved the  
results and tried to parse them: It gave the "... parsing" message,  
but no other results get reported.

Any suggestions?

Thanks,
Tom

Tom Keller, Ph.D.
kellert at ohsu.edu
503-494-2442
6339b Basic Science Bldg
http://www.ohsu.edu/research/core


From mrouard at gmail.com  Wed Feb 14 06:23:47 2007
From: mrouard at gmail.com (Mathieu Rouard)
Date: Wed, 14 Feb 2007 12:23:47 +0100
Subject: [Bioperl-l] get the sequence of a column in a multiple alignment
Message-ID: <ab1628190702140323m2d6f1e00h9cc85ffaf84fcafd@mail.gmail.com>

Dear all,

I am starting to use the bioperl API to parse multiple alignments and I am
wondering what is the most effective way to splice all the columns from an
alignment (all the AA at position 1, position 2, etc.). I quickly
implemented this simple code but it becomes quite slow when the length of
sequences increases.

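use strict;
use warnings;
use Bio::AlignIO;

my $inputfilename = shift @ARGV or die "usage: $0 alignment.sto\n";
my $stream = Bio::AlignIO->new(-file   => $inputfilename,
                               -format => 'stockholm');

my $aln = $stream->next_aln();

my $length = $aln->length();
my @column;    # $column[$i - 1] holds the residues of alignment column $i

for my $i (1 .. $length) {
    my $aa = '';
    foreach my $seq ($aln->each_seq()) {
        # trunc() creates a new sequence object for every residue
        my $obj = $seq->trunc($i, $i);
        $aa .= $obj->seq;
    }
    # need to track the column number and the sequence of the column
    push @column, $aa;
}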

Would you have any other suggestion?

thanks
Mathieu


From avilella at gmail.com  Wed Feb 14 10:29:02 2007
From: avilella at gmail.com (Albert Vilella)
Date: Wed, 14 Feb 2007 15:29:02 +0000
Subject: [Bioperl-l] get the sequence of a column in a multiple alignment
In-Reply-To: <ab1628190702140323m2d6f1e00h9cc85ffaf84fcafd@mail.gmail.com>
References: <ab1628190702140323m2d6f1e00h9cc85ffaf84fcafd@mail.gmail.com>
Message-ID: <358f4d650702140729u4dae2847qc8eeeb45b20faca4@mail.gmail.com>

there is a slice method:

  $mini_aln = $aln->slice(20,30);  # get a block of columns

 Title     : slice
 Usage     : $aln2 = $aln->slice(20,30)
 Function  : Creates a slice from the alignment inclusive of start and
             end columns, and the first column in the alignment is denoted 1.
             Sequences with no residues in the slice are excluded from the
             new alignment and a warning is printed. Slice beyond the length of
             the sequence does not do padding.
 Returns   : A Bio::SimpleAlign object
 Args      : Positive integer for start column, positive integer for end column,
             optional boolean which if true will keep gap-only columns
in the newly
             created slice. Example:

             $aln2 = $aln->slice(20,30,1)

but I don't know how well it behaves for lots of sequences :)
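Another option, sketched here under the assumption that the aligned rows are available as plain strings (with Bioperl they could be collected via `map { $_->seq } $aln->each_seq`): instead of calling `trunc` once per residue, which builds a new sequence object each time, split each row once and append every character to its column. `alignment_columns` is a hypothetical helper, not a Bio::SimpleAlign method.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bioperl): given the aligned sequence
# strings, return one string per column; $columns[0] holds the residues
# at alignment position 1, and so on.
sub alignment_columns {
    my @seqs = @_;
    my @columns;
    for my $s (@seqs) {
        my @chars = split //, $s;              # split each row only once
        $columns[$_] .= $chars[$_] for 0 .. $#chars;
    }
    return @columns;
}

my @cols = alignment_columns('AC-T', 'AGGT', 'TC-T');
print "column ", $_ + 1, ": $cols[$_]\n" for 0 .. $#cols;
```

This touches each residue once, so the cost grows with the alignment size rather than with the number of objects created.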


On 2/14/07, Mathieu Rouard <mrouard at gmail.com> wrote:
> Dear all,
>
> I am starting to use the bioperl API to parse multiple alignments and I am
> wondering what is the most effective way to splice all the columns from an
> alignment (all the AA at position 1, position 2, etc.). I quickly
> implemented this simple code but it becomes quite slow when the length of
> sequences increases.
>
> my $stream  = Bio::AlignIO->new(-file => $inputfilename,
>                         '-format' => 'stockholm');
>
> my $aln = $stream->next_aln();
>
> my $length = $aln->length();
> my @column;
>
> for (my $i=1;$i<=$length;$i++) {
>        my $aa;
>         foreach my $seq ($aln->each_seq()) {
>           my $obj = $seq->trunc($i,$i);
>           $aa .=$obj->seq;
>         }
>      # need to track the column number and the sequence of the column
>      push @column, $aa;
> }
>
> Would you have any other suggestion?
>
> thanks
> Mathieu


From jason at bioperl.org  Wed Feb 14 11:59:49 2007
From: jason at bioperl.org (Jason Stajich)
Date: Wed, 14 Feb 2007 08:59:49 -0800
Subject: [Bioperl-l] HowTo:SearchIO
In-Reply-To: <CF656232-DEF2-47C4-9855-DAF748AEE862@ohsu.edu>
References: <CF656232-DEF2-47C4-9855-DAF748AEE862@ohsu.edu>
Message-ID: <FAAF2A93-97CA-4E84-9178-0561C9BC7B5B@bioperl.org>

As always, reporting the version of BLAST and Bioperl you have  
installed will help someone diagnose if this is a fixed problem or  
not.  If you trawl through the list archives you'll see Chris and others
have been playing cat and mouse with the text version output from  
NCBI BLAST which appears to be an ever evolving beast.

So the best advice right now is to get the latest bioperl from CVS   
to ensure you have all the patches that might be needed to parse this version.  If
it still fails then the standard response will be to submit the  
report as an attachment to a new bug report on the bugzilla.

thanks,
-jason


On Feb 13, 2007, at 2:43 PM, Thomas J Keller wrote:

> Greetings,
> I've been away from programming and informatics for many months.
> Hoping to get back into it, I thought it would be good to review the
> tutorials.
> I tried the code in the tutorial on the sample blast report in the
> tutorial and it worked fine. So I ran a blastx search and saved the
> results and tried to parse them: It gave the "... parsing" message,
> but no other results get reported.
>
> Any suggestions?
>
> Thanks,
> Tom
>
> Tom Keller, Ph.D.
> kellert at ohsu.edu
> 503-494-2442
> 6339b Basic Science Bldg
> http://www.ohsu.edu/research/core
>

--
Jason Stajich
Miller Research Fellow
University of California, Berkeley
lab: 510.642.8441
http://pmb.berkeley.edu/~taylor/people/js.html
http://fungalgenomes.org/


From dmessina at wustl.edu  Wed Feb 14 11:58:45 2007
From: dmessina at wustl.edu (David Messina)
Date: Wed, 14 Feb 2007 10:58:45 -0600
Subject: [Bioperl-l] HowTo:SearchIO
In-Reply-To: <CF656232-DEF2-47C4-9855-DAF748AEE862@ohsu.edu>
References: <CF656232-DEF2-47C4-9855-DAF748AEE862@ohsu.edu>
Message-ID: <6E3CAB6B-9F9E-46FD-9021-50D7FE011860@wustl.edu>

Hi Tom,

Could you tell us what version of BioPerl you are using, and what  
specific example is failing for  you? And could you post your code?

That would make it easier to diagnose the problem.

Thanks,
Dave

-- 
Dave Messina
Senior Programmer/Analyst, Assembly Group
WashU Genome Sequencing Center
dmessina a t  wustl.edu
314-286-1415


From cjfields at uiuc.edu  Wed Feb 14 12:28:24 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 14 Feb 2007 11:28:24 -0600
Subject: [Bioperl-l] HowTo:SearchIO
In-Reply-To: <FAAF2A93-97CA-4E84-9178-0561C9BC7B5B@bioperl.org>
References: <CF656232-DEF2-47C4-9855-DAF748AEE862@ohsu.edu>
	<FAAF2A93-97CA-4E84-9178-0561C9BC7B5B@bioperl.org>
Message-ID: <3A7FFE76-808B-4249-B588-8709567B870F@uiuc.edu>

I would also strongly encourage switching to using XML-based parsing,  
which is much more stable now.  Here's the link to the NCBI response  
re: BLAST report parsing:

http://bioperl.org/wiki/NCBI_Blast_email

chris (taking a break from shoveling snow...)

On Feb 14, 2007, at 10:59 AM, Jason Stajich wrote:

> As always, reporting the version of BLAST and Bioperl you have
> installed will help someone diagnose if this is a fixed problem or
> not.  If you trawl through the list archives you'll chris and others
> have been playing cat and mouse with the text version output from
> NCBI BLAST which appears to be an ever evolving beast.
>
> So the best advice right now is to get the latest bioperl from CVS
> to ensure you have all the patches that might be needed to parse this version.  If
> it still fails then the standard response will be to submit the
> report as an attachment to a new bug report on the bugzilla.
>
> thanks,
> -jason
>
>
> On Feb 13, 2007, at 2:43 PM, Thomas J Keller wrote:
>
>> Greetings,
>> I've been away from programming and informatics for many months.
>> Hoping to get back into it, I thought it would be good to review the
>> tutorials.
>> I tried the code in the tutorial on the sample blast report in the
>> tutorial and it worked fine. So I ran a blastx search and saved the
>> results and tried to parse them: It gave the "... parsing" message,
>> but no other results get reported.
>>
>> Any suggestions?
>>
>> Thanks,
>> Tom
>>
>> Tom Keller, Ph.D.
>> kellert at ohsu.edu
>> 503-494-2442
>> 6339b Basic Science Bldg
>> http://www.ohsu.edu/research/core
>>
>
> --
> Jason Stajich
> Miller Research Fellow
> University of California, Berkeley
> lab: 510.642.8441
> http://pmb.berkeley.edu/~taylor/people/js.html
> http://fungalgenomes.org/

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From sac at bioperl.org  Wed Feb 14 13:20:17 2007
From: sac at bioperl.org (Steve Chervitz)
Date: Wed, 14 Feb 2007 10:20:17 -0800
Subject: [Bioperl-l] Bio::Root::Utilities.pm
In-Reply-To: <1D04BA5E-359E-4AC5-B674-8ECDC4F81E83@uiuc.edu>
References: <8f200b4c0702131500m2f3790nceace3c3e3dffd4f@mail.gmail.com>
	<1D04BA5E-359E-4AC5-B674-8ECDC4F81E83@uiuc.edu>
Message-ID: <8f200b4c0702141020x5815018bvd69873222582d9d5@mail.gmail.com>

On 2/13/07, Chris Fields <cjfields at uiuc.edu> wrote:
>
>
> On Feb 13, 2007, at 5:00 PM, Steve Chervitz wrote:
>
> > I noticed that Bio::Root::Utilities was purged from bioperl-live for the
> > 1.5.2 release, but I'd like us to consider adding it back. I agree that the
> > other purged Root modules were ancient relics of the past, but
> > Bio::Root::Utilities.pm still has signs of life (at least I still find
> > occasion to use it, or refer to code in it).
> >
> > I know that it's not currently used by any other modules in Bioperl, but
> > there are likely some legacy scripts out there that rely on it. Probably
> > most of those scripts are ones I've written, but there have been substantive
> > commits by others in the not-too-distant past (Dec 2005), so at least some
> > folks besides myself are using it and may hesitate to upgrade their bioperl
> > installation if it's absent.
> >
> > I'm all for avoiding bloat in the codebase and am eager to see Bioperl be
> > more lean and mean, but I'd like to keep this module around. I'll agree to
> > add some tests for it as well as clean some things up (e.g., use
> > Bio::Root::IO to get temp file name).
> >
> > Cheers,
> > Steve
> > --
> > Steve Chervitz
> > sac at bioperl.org
>
> I don't have a problem with adding it back, esp. if tests are added.
> Everything in Bio::Root* not tied to a module was yanked out when no
> one spoke up about cleaning up Bio::Root* modules:
>
> http://thread.gmane.org/gmane.comp.lang.perl.bio.general/12792/focus=12839
>
> Maybe others disagree?
>
> chris
>

Sorry I missed out on that thread. I had some trouble with my bioperl-l
email delivery getting disabled due to excessive bounces, and it took me a
while to catch it.

Bio::Root::Utilities is quite a grab bag of miscellaneous general functions
that are occasionally useful for perl scripting (e.g., determining
end-of-line characters, sending email, etc.). The code could definitely use
a review, and maybe an example script to advertise it. I can look into this,
and suggestions are welcome.
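As an illustration of the kind of general-purpose helper described (end-of-line detection), here is a minimal plain-Perl sketch. It is not the Bio::Root::Utilities implementation; `detect_eol` and its return codes are hypothetical. It checks for CRLF first, because a CRLF file also contains bare LF characters.

```perl
use strict;
use warnings;

# Hypothetical helper: classify the line endings used in a chunk of text.
# Returns 'CRLF' (DOS/Windows), 'CR' (classic Mac OS), 'LF' (Unix),
# or undef if the text contains no line breaks.
sub detect_eol {
    my ($text) = @_;
    return 'CRLF' if $text =~ /\r\n/;    # must be tested before lone CR or LF
    return 'CR'   if $text =~ /\r/;
    return 'LF'   if $text =~ /\n/;
    return;
}

# Read the start of a file in raw mode so no I/O layer translates the endings.
my $file = shift @ARGV;
if (defined $file) {
    open my $fh, '<:raw', $file or die "Cannot open $file: $!";
    defined(read $fh, my $buf, 4096) or die "Cannot read $file: $!";
    close $fh;
    print "$file: ", (detect_eol($buf) // 'no line breaks found'), "\n";
}
```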

Steve


From dmessina at wustl.edu  Wed Feb 14 13:55:18 2007
From: dmessina at wustl.edu (David Messina)
Date: Wed, 14 Feb 2007 12:55:18 -0600
Subject: [Bioperl-l] HowTo:SearchIO
In-Reply-To: <3A7FFE76-808B-4249-B588-8709567B870F@uiuc.edu>
References: <CF656232-DEF2-47C4-9855-DAF748AEE862@ohsu.edu>
	<FAAF2A93-97CA-4E84-9178-0561C9BC7B5B@bioperl.org>
	<3A7FFE76-808B-4249-B588-8709567B870F@uiuc.edu>
Message-ID: <49A80E44-EB7F-4E2A-86E8-2379065F0FC3@wustl.edu>


On Feb 14, 2007, at 11:28 AM, Chris Fields wrote:

> I would also strongly encourage switching to using XML-based parsing,

Unless anyone objects, I would be happy to update the HOWTO to  
suggest people make the switch and give an example of XML parsing.

The Bio::SearchIO synopsis is already an XML example. However,  
there's no warning about text-based parsing nor a suggestion to use  
XML that I can see. Perhaps one should be added?

Dave


From cjfields at uiuc.edu  Wed Feb 14 15:12:21 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 14 Feb 2007 14:12:21 -0600
Subject: [Bioperl-l] HowTo:SearchIO
In-Reply-To: <49A80E44-EB7F-4E2A-86E8-2379065F0FC3@wustl.edu>
References: <CF656232-DEF2-47C4-9855-DAF748AEE862@ohsu.edu>
	<FAAF2A93-97CA-4E84-9178-0561C9BC7B5B@bioperl.org>
	<3A7FFE76-808B-4249-B588-8709567B870F@uiuc.edu>
	<49A80E44-EB7F-4E2A-86E8-2379065F0FC3@wustl.edu>
Message-ID: <C9FDCBA5-5BDE-41FE-9281-F5C73B0A700B@uiuc.edu>


On Feb 14, 2007, at 12:55 PM, David Messina wrote:

>
> On Feb 14, 2007, at 11:28 AM, Chris Fields wrote:
>
>> I would also strongly encourage switching to using XML-based parsing,
>
> Unless anyone objects, I would be happy to update the HOWTO to
> suggest people make the switch and give an example of XML parsing.
>
> The Bio::SearchIO synopsis is already an XML example. However,
> there's no warning about text-based parsing nor a suggestion to use
> XML that I can see -- perhaps should be added?
>
> Dave

We should probably add something specifically for BLAST, yes.  Other  
text parsers should be fine.

Personally, I use XML or tabular output parsing simply b/c they are  
faster and do what I need.  I think we'll need to retain the  
capability for text-based BLAST parsing, but it will become extremely  
bloated long-term if we plan on continuing support for parsing all  
versions and flavors of BLAST, particularly if NCBI continues to  
change the output.
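To show why tabular output is comparatively easy to handle, here is a minimal plain-Perl sketch that reads lines of NCBI BLAST tabular output (`blastall -m 8`, or `-m 9` with comment lines), which has twelve tab-separated columns. Within Bioperl the same output can be read through Bio::SearchIO with the `blasttable` format; `parse_blast_tabular_line` below is a hypothetical helper for illustration only.

```perl
use strict;
use warnings;

# Column order of NCBI BLAST tabular output (-m 8 / -m 9).
my @FIELDS = qw(query subject pct_identity aln_length mismatches gap_opens
                q_start q_end s_start s_end evalue bit_score);

# Hypothetical helper: turn one tabular line into a hash ref keyed by @FIELDS.
# Returns undef for comment lines (written by -m 9), blank lines, or lines
# that do not have exactly 12 columns.
sub parse_blast_tabular_line {
    my ($line) = @_;
    chomp $line;
    return if $line =~ /^\s*#/ || $line !~ /\S/;
    my @vals = split /\t/, $line;
    return unless @vals == @FIELDS;
    my %hit;
    @hit{@FIELDS} = @vals;
    # E-values such as 'e-105' may lack a leading digit; prefix 1 so
    # Perl treats them as numbers
    $hit{evalue} = "1$hit{evalue}" if $hit{evalue} =~ /^e/i;
    return \%hit;
}

# Example: print hits with E-value <= 1e-10 from files named on the command line.
if (@ARGV) {
    while (my $line = <>) {
        my $hit = parse_blast_tabular_line($line) or next;
        print "$hit->{query}\t$hit->{subject}\t$hit->{evalue}\n"
            if $hit->{evalue} <= 1e-10;
    }
}
```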

chris


From dmessina at wustl.edu  Wed Feb 14 15:46:31 2007
From: dmessina at wustl.edu (David Messina)
Date: Wed, 14 Feb 2007 14:46:31 -0600
Subject: [Bioperl-l] HowTo:SearchIO
In-Reply-To: <C9FDCBA5-5BDE-41FE-9281-F5C73B0A700B@uiuc.edu>
References: <CF656232-DEF2-47C4-9855-DAF748AEE862@ohsu.edu>
	<FAAF2A93-97CA-4E84-9178-0561C9BC7B5B@bioperl.org>
	<3A7FFE76-808B-4249-B588-8709567B870F@uiuc.edu>
	<49A80E44-EB7F-4E2A-86E8-2379065F0FC3@wustl.edu>
	<C9FDCBA5-5BDE-41FE-9281-F5C73B0A700B@uiuc.edu>
Message-ID: <136DA052-B9FD-4547-B262-EC6E38B47392@wustl.edu>

On Feb 14, 2007, at 2:12 PM, Chris Fields wrote:

> We should probably add something specifically for BLAST, yes.   
> Other text parsers should be fine.

Good point -- I'll make it clear it's only pertinent to BLAST.


> I think we'll need to retain the capability for text-based BLAST  
> parsing,

Agreed. Through the 1.6 release at least, I would think.


> particularly if NCBI continues to change the output.

Well, clearly the solution is not to use the NCBI flavor of BLAST. :)


Dave
(look at my email address)


From jay at jays.net  Thu Feb 15 08:08:56 2007
From: jay at jays.net (Jay Hannah)
Date: Thu, 15 Feb 2007 07:08:56 -0600
Subject: [Bioperl-l] DB.t failures
In-Reply-To: <45D3167F.2000608@sendu.me.uk>
References: <45D3167F.2000608@sendu.me.uk>
Message-ID: <AEABC2CF-AABF-4BC9-9E66-17F2E3A51B37@jays.net>

On Feb 14, 2007, at 8:02 AM, Sendu Bala wrote:
> DB.t is failing with BIOPERLDEBUG set. Apparently, we are no longer
> getting sequences back from NCBI in the order we requested them in
> batch mode.

Is this the same result you get?


DIED. FAILED tests 59-60, 63-64, 67-68, 71-72
         Failed 8/113 tests, 92.92% okay (less 8 skipped tests: 97 okay, 85.84%)
Failed Test Stat Wstat Total Fail  Failed  List of Failed
-------------------------------------------------------------------------------
DB.t           8  2048   113    8   7.08%  59-60 63-64 67-68 71-72
8 subtests skipped.


Thanks,

j
seqlab.net
http://www.bioperl.org/wiki/User:Jhannah


From bix at sendu.me.uk  Thu Feb 15 08:37:32 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 15 Feb 2007 13:37:32 +0000
Subject: [Bioperl-l] DB.t failures
In-Reply-To: <AEABC2CF-AABF-4BC9-9E66-17F2E3A51B37@jays.net>
References: <45D3167F.2000608@sendu.me.uk>
	<AEABC2CF-AABF-4BC9-9E66-17F2E3A51B37@jays.net>
Message-ID: <45D4621C.6040309@sendu.me.uk>

Jay Hannah wrote:
> On Feb 14, 2007, at 8:02 AM, Sendu Bala wrote:
>> DB.t is failing with BIOPERLDEBUG set. Apparently, we are no longer
>> getting sequences back from NCBI in the order we requested them in
>> batch mode.
> 
> Is this the same result you get?
> 
> 
> DIED. FAILED tests 59-60, 63-64, 67-68, 71-72
>          Failed 8/113 tests, 92.92% okay (less 8 skipped tests: 97 okay, 85.84%)
> Failed Test Stat Wstat Total Fail  Failed  List of Failed
> -------------------------------------------------------------------------------
> DB.t           8  2048   113    8   7.08%  59-60 63-64 67-68 71-72
> 8 subtests skipped.

Yes, those fails are all caused by results in the wrong order (I believe).


From cjfields at uiuc.edu  Thu Feb 15 09:22:09 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 15 Feb 2007 08:22:09 -0600
Subject: [Bioperl-l] DB.t failures
In-Reply-To: <45D4621C.6040309@sendu.me.uk>
References: <45D3167F.2000608@sendu.me.uk>
	<AEABC2CF-AABF-4BC9-9E66-17F2E3A51B37@jays.net>
	<45D4621C.6040309@sendu.me.uk>
Message-ID: <CF92D281-CAC2-415C-91A9-CBA0893336B9@uiuc.edu>


On Feb 15, 2007, at 7:37 AM, Sendu Bala wrote:

> Jay Hannah wrote:
>> On Feb 14, 2007, at 8:02 AM, Sendu Bala wrote:
>>> DB.t is failing with BIOPERLDEBUG set. Apparently, we are no longer
>>> getting sequences back from NCBI in the order we requested them in
>>> batch mode.
>>
>> Is this the same result you get?
>>
>>
>> DIED. FAILED tests 59-60, 63-64, 67-68, 71-72
>>          Failed 8/113 tests, 92.92% okay (less 8 skipped tests: 97 okay, 85.84%)
>> Failed Test Stat Wstat Total Fail  Failed  List of Failed
>> -------------------------------------------------------------------------------
>> DB.t           8  2048   113    8   7.08%  59-60 63-64 67-68 71-72
>> 8 subtests skipped.
>
> Yes, those fails are all caused by results in the wrong order (I believe).

I'm fixing those now so it doesn't depend on order and will commit in  
the next few minutes.
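One way to make such a check independent of the order in which NCBI returns records is to compare sorted ID lists. This is a minimal sketch under that assumption; `same_ids_any_order` is a hypothetical helper, not the code actually committed to DB.t.

```perl
use strict;
use warnings;

# Hypothetical helper: true if both array refs contain the same IDs,
# with the same multiplicities, regardless of order.
sub same_ids_any_order {
    my ($got, $expected) = @_;
    return 0 unless @$got == @$expected;
    my @g = sort @$got;
    my @e = sort @$expected;
    for my $i (0 .. $#g) {
        return 0 unless $g[$i] eq $e[$i];
    }
    return 1;
}

print same_ids_any_order([qw(id2 id1)], [qw(id1 id2)]) ? "match\n" : "mismatch\n";
```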

chris


From bix at sendu.me.uk  Thu Feb 15 09:37:00 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 15 Feb 2007 14:37:00 +0000
Subject: [Bioperl-l] DB.t failures
In-Reply-To: <586AAA5B-DFA6-4746-BA70-F3C16CA97EE3@uiuc.edu>
References: <45D3167F.2000608@sendu.me.uk>
	<AEABC2CF-AABF-4BC9-9E66-17F2E3A51B37@jays.net>
	<45D4621C.6040309@sendu.me.uk>
	<586AAA5B-DFA6-4746-BA70-F3C16CA97EE3@uiuc.edu>
Message-ID: <45D4700C.8020305@sendu.me.uk>

Chris Fields wrote:
> 
> On Feb 15, 2007, at 7:37 AM, Sendu Bala wrote:
> 
>> Jay Hannah wrote:
>>> On Feb 14, 2007, at 8:02 AM, Sendu Bala wrote:
>>>> DB.t is failing with BIOPERLDEBUG set. Apparently, we are no longer
>>>> getting sequences back from NCBI in the order we requested them in
>>>> batch mode.
>
> Okay, I committed a fix for that.  I hope there are many users who 
> depend on the returned sequence order for anything!

s/are/aren't/ ?

I suspect there might be, and it's certainly a reasonable assumption to
make. Did you not see an easy way of maintaining the order?
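A minimal sketch of keeping the requested order on the client side: index the returned records by ID, then emit them in the order the IDs were requested. It assumes each record exposes one ID that matches what was requested (here a hash ref with an `id` key; for Bio::Seq objects something like `accession_number` or `display_id` might play that role). NCBI may return an identifier in a different form from the one requested (for example a GI number versus a versioned accession), in which case a mapping step would be needed. `restore_request_order` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return @$records sorted to follow @$requested.
# Records whose ID was not requested are dropped; requested IDs with
# no matching record are skipped.
sub restore_request_order {
    my ($requested, $records) = @_;
    my %by_id;
    $by_id{ $_->{id} } = $_ for @$records;
    return map { exists $by_id{$_} ? $by_id{$_} : () } @$requested;
}

my @requested = qw(id3 id1 id2);
my @returned  = map +{ id => $_ }, qw(id1 id2 id3);    # order as returned by the server
print join(' ', map { $_->{id} } restore_request_order(\@requested, \@returned)), "\n";
```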


From cjfields at uiuc.edu  Thu Feb 15 09:28:46 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 15 Feb 2007 08:28:46 -0600
Subject: [Bioperl-l] DB.t failures
In-Reply-To: <45D4621C.6040309@sendu.me.uk>
References: <45D3167F.2000608@sendu.me.uk>
	<AEABC2CF-AABF-4BC9-9E66-17F2E3A51B37@jays.net>
	<45D4621C.6040309@sendu.me.uk>
Message-ID: <586AAA5B-DFA6-4746-BA70-F3C16CA97EE3@uiuc.edu>


On Feb 15, 2007, at 7:37 AM, Sendu Bala wrote:

> Jay Hannah wrote:
>> On Feb 14, 2007, at 8:02 AM, Sendu Bala wrote:
>>> DB.t is failing with BIOPERLDEBUG set. Apparently, we are no longer
>>> getting sequences back from NCBI in the order we requested them in
>>> batch mode.
>>
>> Is this the same result you get?
>>
>>
>> DIED. FAILED tests 59-60, 63-64, 67-68, 71-72
>>          Failed 8/113 tests, 92.92% okay (less 8 skipped tests: 97
>> okay, 85.84%)
>> Failed Test Stat Wstat Total Fail  Failed  List of Failed
>> -------------------------------------------------------------------------------
>> DB.t           8  2048   113    8   7.08%  59-60 63-64 67-68 71-72
>> 8 subtests skipped.
>
> Yes, those fails are all caused by results in the wrong order (I  
> believe).

Okay, I committed a fix for that.  I hope there are many users who  
depend on the returned sequence order for anything!

chris


From michael.watson at bbsrc.ac.uk  Thu Feb 15 09:44:27 2007
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Thu, 15 Feb 2007 14:44:27 -0000
Subject: [Bioperl-l] The axis of GC content in Bio::Graphics::glyph:dna
Message-ID: <8975119BCD0AC5419D61A9CF1A923E9503E2EB70@iahce2ksrv1.iah.bbsrc.ac.uk>

Hi

OK, I have some great images out of this glyph, but I can't see the axis,
nor is it labelled (i.e. does it go from 0 to 100%?), so it isn't great for
publication.  The docs say:

"NOTE: -gc_window=>'auto' gives nice results and is recommended for
drawing GC content. The GC content axes draw slightly outside the
panel, so you may wish to add some extra padding on the right and
left. "

Any idea how to do this?

Basically, I want a nice GC graph with the axis quite clearly labelled,
and a nice "%GC" title next to it :)

Thanks

Mick

The information contained in this message may be confidential or legally
privileged and is intended solely for the addressee. If you have
received this message in error please delete it & notify the originator
immediately.
Unauthorised use, disclosure, copying or alteration of this message is
forbidden & may be unlawful. 
The contents of this e-mail are the views of the sender and do not
necessarily represent the views of the Institute. 
This email and associated attachments has been checked locally for
viruses but we can accept no responsibility once it has left our
systems.
Communications on Institute computers are monitored to secure the
effective operation of the systems and for other lawful purposes. 


From nehadnahar at yahoo.co.in  Thu Feb 15 10:28:42 2007
From: nehadnahar at yahoo.co.in (Neha Nahar)
Date: Thu, 15 Feb 2007 15:28:42 +0000 (GMT)
Subject: [Bioperl-l] Convert newick to nexus format
In-Reply-To: <84703383-600F-42F4-A860-DD0D1C43EE83@bioperl.org>
Message-ID: <777943.33252.qm@web8404.mail.in.yahoo.com>

Thank you, Jason. I ran the tests and they failed, so I reinstalled the BioPerl modules and now it works fine.

Regards,
Neha.

Jason Stajich <jason at bioperl.org> wrote:
Something is wrong with your install, I am guessing. Can you run the tests?
Go to the bioperl directory:
$ perl t/TreeIO.t

Can you describe how you installed bioperl?

On Feb 5, 2007, at 11:58 AM, Neha Nahar wrote:

>
> Hi,
> Thank you for the code.
> I tried it but I still get the same exception.
>
> ------------- EXCEPTION  -------------
> MSG: Cannot call method write_tree on Bio::TreeIO object must use a subclass
> STACK Bio::TreeIO::nexus::write_tree /usr/lib/perl5/vendor_perl/5.8.8/Bio/TreeIO/nexus.pm:170
> STACK toplevel ./nexus1.pl:18
>
> Please find attached the perl file (nexus.pl).
>
> Using bioperl-1.5.2_101.tar.gz module from http://search.cpan.org/~sendu/bioperl/Bio/TreeIO.pm
>
> Please let me know if I am using the correct version. If not, please
> point me to the latest one.
>
> Thank you.
> Regards,
> nnahar
>
> Jason Stajich wrote:
> Please cc the mailing list when asking a question or follow-up.
>
> Sorry I don't know what you are doing wrong - you didn't resend
> your code so I don't know if you still have a typo.
>
> This code works fine for me
>
> use Bio::TreeIO;
> use strict;
> my ($filein,$fileout) = @ARGV;
> my ($format,$oformat) = qw(newick nexus);
> my $in = Bio::TreeIO->new(-file  => $filein, -format => $format);
> my $out= Bio::TreeIO->new(-format => $oformat, -file => ">$fileout");
>
> while( my $t = $in->next_tree ) {
>  $out->write_tree($t);
> }
>
> On Feb 5, 2007, at 11:24 AM, Neha Nahar wrote:
>
> Thank you very much for the reply.
>
> I fixed the code as per your suggestion, but now I am getting a
> different error:
>
> $ ./nexus.pl mrp-input.txt nexus.out
> newickfile=mrp-input.txt, nexusfile=nexus.out
>
> -------------  EXCEPTION  -------------
> MSG: Cannot call method write_tree on Bio::TreeIO object must use a subclass
> STACK Bio::TreeIO::nexus::write_tree /usr/lib/perl5/vendor_perl/5.8.8/Bio/TreeIO/nexus.pm:170
> STACK toplevel ./nexus.pl:23
>
> --------------------------------------
>
> Please help me out with this script.
>
> Thank you.
> Regards,
> Neha
>
> Jason Stajich wrote:
> You want to write the TREE out, not the TREE WRITER.
>
> $treeout->write_tree($tree)
>
> not
> $treeout->write_tree($treeout);
>
> On Feb 5, 2007, at 9:59 AM, Neha Nahar wrote:
>
> Hello everyone,
>
> I am trying to convert a newick tree to nexus format,
> using the script referred to in an email from George dated Wed Sep
> 22 11:52:47 EDT 2004:
>
> /*------------------------------------------------------------*/
>
> $ cat nexus.pl
> #!/usr/bin/perl -w
>
> use Bio::TreeIO;
>
> ($NEWICKFILE, $NEXUSFILE) = @ARGV;
> print "newickfile=$NEWICKFILE,  nexusfile=$NEXUSFILE\n";
> my $treeio = new Bio::TreeIO(-format => 'newick', -file   => "$NEWICKFILE");
> my $treeout = new Bio::TreeIO(-format => 'nexus', -file   => "> $NEXUSFILE");
> while(my $tree = $treeio->next_tree) {
>         $treeout->write_tree($treeout);
>     }
>
> exit 0;
>
> /*------------------------------------------------------------*/
>
> Running the script from the command line
> gives the following error:
>
> $ ./nexus.pl mrp-input.txt nexus.out
> newickfile=mrp-input.txt, nexusfile=nexus.out
>
> ------------- EXCEPTION  -------------
> MSG: Cannot call method write_tree on Bio::TreeIO object must use a subclass
> STACK Bio::TreeIO::nexus::write_tree /usr/lib/perl5/vendor_perl/5.8.8/Bio/TreeIO/nexus.pm:170
> STACK toplevel ./nexus.pl:23
>
> --------------------------------------
>
> Using bioperl-1.5.2_101.tar.gz module from http://search.cpan.org/~sendu/bioperl/Bio/TreeIO.pm
>
> Questions:
>
> 1. Please let me know if I am using the correct version.
> If not, please point me to the latest one.
>
> 2. Provided that the version I am using is the right one, please
> let me know what is wrong with the script.
>
> Thank you.
> Regards,
> Neha.
>
> -Neha Nahar
>   " Work for cause and not for applause, live to express and not to  
> impress !"
>
> ---------------------------------
>  Here's a new way to find what you're looking for - Yahoo! Answers
>
>  --
> Jason Stajich
> Miller Research Fellow
> University of California, Berkeley
> lab: 510.642.8441
>
>
> http://pmb.berkeley.edu/~taylor/people/js.html
> http://fungalgenomes.org/


_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l


From cjfields at uiuc.edu  Thu Feb 15 10:44:23 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 15 Feb 2007 09:44:23 -0600
Subject: [Bioperl-l] DB.t failures
In-Reply-To: <45D4700C.8020305@sendu.me.uk>
References: <45D3167F.2000608@sendu.me.uk>
	<AEABC2CF-AABF-4BC9-9E66-17F2E3A51B37@jays.net>
	<45D4621C.6040309@sendu.me.uk>
	<586AAA5B-DFA6-4746-BA70-F3C16CA97EE3@uiuc.edu>
	<45D4700C.8020305@sendu.me.uk>
Message-ID: <809A1A9D-0344-412D-98D7-EA852EFED63D@uiuc.edu>


On Feb 15, 2007, at 8:37 AM, Sendu Bala wrote:

> Chris Fields wrote:
>>
>> On Feb 15, 2007, at 7:37 AM, Sendu Bala wrote:
>>
>>> Jay Hannah wrote:
>>>> On Feb 14, 2007, at 8:02 AM, Sendu Bala wrote:
>>>>> DB.t is failing with BIOPERLDEBUG set. Apparently, we are no longer
>>>>> getting sequences back from NCBI in the order we requested them in
>>>>> batch mode.
>>
>> Okay, I committed a fix for that.  I hope there are many users who
>> depend on the returned sequence order for anything!
>
> s/are/aren't/ ?

Yes, my oops.

> I suspect there might be, and it's certainly a reasonable assumption to
> make. Did you not see an easy way of maintaining the order?

I haven't looked (been busy the last few days), but I think there is  
a way via efetch.

We could add a parameter to the default base URL if one exists or,
probably better, add a sort_order() method to designate a particular
sort order, defaulting to the old order if not set.
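
Until something like the proposed sort_order() exists (it is only a suggestion here, not current BioPerl API), a caller could restore the requested order on the client side. A minimal sketch, assuming each returned record can be keyed by the ID that was requested; reorder_by_request and the hash-based records are hypothetical stand-ins for Bio::Seq objects:

```perl
use strict;
use warnings;

# Hypothetical helper: return records in the order their IDs were
# requested. Records whose IDs were not requested are dropped, and
# requested IDs with no matching record are skipped.
sub reorder_by_request {
    my ($requested, $records) = @_;
    my %by_id = map { $_->{id} => $_ } @$records;
    return map { exists $by_id{$_} ? $by_id{$_} : () } @$requested;
}

my @requested = qw(ACC1 ACC2 ACC3);
my @returned  = ({ id => 'ACC3' }, { id => 'ACC1' }, { id => 'ACC2' });
my @ordered   = reorder_by_request(\@requested, \@returned);
print join(',', map { $_->{id} } @ordered), "\n";    # prints ACC1,ACC2,ACC3
```

With real Bio::Seq objects the key would come from a method such as accession_number() or display_id(), depending on which kind of ID was requested.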

chris


From lstein at cshl.edu  Thu Feb 15 13:53:13 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Thu, 15 Feb 2007 13:53:13 -0500
Subject: [Bioperl-l] The axis of GC content in Bio::Graphics::glyph:dna
In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E9503E2EB70@iahce2ksrv1.iah.bbsrc.ac.uk>
References: <8975119BCD0AC5419D61A9CF1A923E9503E2EB70@iahce2ksrv1.iah.bbsrc.ac.uk>
Message-ID: <6dce9a0b0702151053v19ab190fh1f752fea3b2ed722@mail.gmail.com>

Hi Michael,

When you set up the panel, do this:

 Bio::Graphics::Panel->new(-blah -blah,
                           -pad_left  => 20,
                           -pad_right => 20);

This will leave enough room on the left and right for you to see the Y axis.
Otherwise it runs off the edge of the image. (OK, this is a mis-design, but
it was the only way to solve a chicken-and-egg problem about who gets to say
how wide the panel is.)

Lincoln

On 2/15/07, michael watson (IAH-C) <michael.watson at bbsrc.ac.uk> wrote:
>
> Hi
>
> OK, I have some great images out of this glyph, but I can't see the axis,
> nor is it labelled (i.e. does it go from 0 to 100%?), so it isn't great for
> publication.  The docs say:
>
> "NOTE: -gc_window=>'auto' gives nice results and is recommended for
> drawing GC content. The GC content axes draw slightly outside the
> panel, so you may wish to add some extra padding on the right and
> left. "
>
> Any idea how to do this?
>
> Basically, I want a nice GC graph with the axis quite clearly labelled,
> and a nice "%GC" title next to it :)
>
> Thanks
>
> Mick
>


-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu


From johnsonm at gmail.com  Thu Feb 15 14:24:08 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Thu, 15 Feb 2007 13:24:08 -0600
Subject: [Bioperl-l] Bio::Tools::Glimmer
In-Reply-To: <DB4D80A7-1D29-4809-A7D2-FCE5DB2502B4@uiuc.edu>
References: <ebf5eb170609190833r5fa138e9gc6c51346e5f7890c@mail.gmail.com>
	<0FBEEE2B-9641-4EFA-A5D2-0DA839556E4C@gmx.net>
	<ebf5eb170609201247r12eb9475v5cb553e0727163c0@mail.gmail.com>
	<ebf5eb170702061553t29b0d1f2q402582abba032031@mail.gmail.com>
	<20F07AFE-BADF-4BA5-83B1-C99EEBE915C2@bioperl.org>
	<ebf5eb170702070850k10f1d24fr2d93d61534eeaaff@mail.gmail.com>
	<ebf5eb170702121513r20759845i21e366f916020baa@mail.gmail.com>
	<a79f6a4b0702121807i47d6360cwed1b0276865d6bd7@mail.gmail.com>
	<ebf5eb170702131210l1fa27bb5n31095e9ca2d4ecf2@mail.gmail.com>
	<DB4D80A7-1D29-4809-A7D2-FCE5DB2502B4@uiuc.edu>
Message-ID: <ebf5eb170702151124l52492a51mbbca2b2e346d5ee8@mail.gmail.com>

Done.  Bug opened in Bugzilla, diffs attached including new/updated tests:

http://bugzilla.open-bio.org/show_bug.cgi?id=2206

Can somebody grab that, take a look, tweak to taste, test and commit?  Tests
pass on my end presently.

On 2/13/07, Chris Fields <cjfields at uiuc.edu> wrote:
>
> You'll also want to update whatever relevant tests there are for
> Glimmer; looks like they are in GenPred.t.
>
> chris
>


From cjfields at uiuc.edu  Thu Feb 15 14:37:22 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 15 Feb 2007 13:37:22 -0600
Subject: [Bioperl-l] Bio::Tools::Glimmer
In-Reply-To: <ebf5eb170702151124l52492a51mbbca2b2e346d5ee8@mail.gmail.com>
References: <ebf5eb170609190833r5fa138e9gc6c51346e5f7890c@mail.gmail.com>
	<0FBEEE2B-9641-4EFA-A5D2-0DA839556E4C@gmx.net>
	<ebf5eb170609201247r12eb9475v5cb553e0727163c0@mail.gmail.com>
	<ebf5eb170702061553t29b0d1f2q402582abba032031@mail.gmail.com>
	<20F07AFE-BADF-4BA5-83B1-C99EEBE915C2@bioperl.org>
	<ebf5eb170702070850k10f1d24fr2d93d61534eeaaff@mail.gmail.com>
	<ebf5eb170702121513r20759845i21e366f916020baa@mail.gmail.com>
	<a79f6a4b0702121807i47d6360cwed1b0276865d6bd7@mail.gmail.com>
	<ebf5eb170702131210l1fa27bb5n31095e9ca2d4ecf2@mail.gmail.com>
	<DB4D80A7-1D29-4809-A7D2-FCE5DB2502B4@uiuc.edu>
	<ebf5eb170702151124l52492a51mbbca2b2e346d5ee8@mail.gmail.com>
Message-ID: <4C15214E-AE4B-4D85-A710-60536B08BE86@uiuc.edu>


On Feb 15, 2007, at 1:24 PM, Mark Johnson wrote:

> Done.  Bug opened in Bugzilla, diffs attached including new/updated tests:
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=2206
>
> Can somebody grab that, take a look, tweak to taste, test and commit?
> Tests pass on my end presently.
>
> On 2/13/07, Chris Fields <cjfields at uiuc.edu> wrote:
>>
>> You'll also want to update whatever relevant tests there are for
>> Glimmer; looks like they are in GenPred.t.
>>
>> chris

Done; everything passed on this end as well, no tweaking necessary.   
If there are problems we'll definitely hear about it down the road  
(Glimmer is a popular tool), but I think you'll be fine.

Thanks Mark!

chris


From cjfields at uiuc.edu  Thu Feb 15 14:46:07 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 15 Feb 2007 13:46:07 -0600
Subject: [Bioperl-l] DB.t failures
In-Reply-To: <809A1A9D-0344-412D-98D7-EA852EFED63D@uiuc.edu>
References: <45D3167F.2000608@sendu.me.uk>
	<AEABC2CF-AABF-4BC9-9E66-17F2E3A51B37@jays.net>
	<45D4621C.6040309@sendu.me.uk>
	<586AAA5B-DFA6-4746-BA70-F3C16CA97EE3@uiuc.edu>
	<45D4700C.8020305@sendu.me.uk>
	<809A1A9D-0344-412D-98D7-EA852EFED63D@uiuc.edu>
Message-ID: <FA9F2E96-064B-4C8F-87BB-D72A7D6F6910@uiuc.edu>


On Feb 15, 2007, at 9:44 AM, Chris Fields wrote:

>
> On Feb 15, 2007, at 8:37 AM, Sendu Bala wrote:
>
>> Chris Fields wrote:
>>>
>>> On Feb 15, 2007, at 7:37 AM, Sendu Bala wrote:
>>>
>>>> Jay Hannah wrote:
>>>>> On Feb 14, 2007, at 8:02 AM, Sendu Bala wrote:
>>>>>> DB.t is failing with BIOPERLDEBUG set. Apparently, we are no longer
>>>>>> getting sequences back from NCBI in the order we requested them in
>>>>>> batch mode.
>>>
>>> Okay, I committed a fix for that.  I hope there are many users who
>>> depend on the returned sequence order for anything!
>>
>> s/are/aren't/ ?
>
> Yes, my oops.
>
>> I suspect there might be, and it's certainly a reasonable assumption to
>> make. Did you not see an easy way of maintaining the order?
>
> I haven't looked (been busy the last few days), but I think there is
> a way via efetch.
>
> We could add in something to the default base URL if there is
> something or (probably better) add a sort_order() method to designate
> a particular sort order, defaulting to the old order if not set.
>
> chris

Delving into it further, the problem only occurs when using
get_seq_stream() directly in batch mode, which is likely only used by
developers for testing.  The sort issue only pops up when eposting
IDs in that mode: retrieved seqs are returned in a different order
than through a direct efetch query (the default with get_Stream* or
get_Seq* methods).  No value of the 'sort' parameter gets around the
problem, which is not a complete surprise since it is supposed to work
only for PubMed.  Since the method is rarely used, I'll just leave the
bullet-proofed tests alone.
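
For comparison, a direct efetch query passes the IDs in the URL itself instead of eposting them to the history server first. A minimal sketch of building such a URL, assuming the 2007-era EUtils base URL and the standard db, id, rettype and retmode parameters; efetch_url is a hypothetical helper, the IDs are placeholders, and no URL escaping is done, so it assumes plain alphanumeric IDs:

```perl
use strict;
use warnings;

# Hypothetical helper: build a direct efetch URL for a list of IDs.
# Multiple IDs go into a single comma-separated "id" parameter.
sub efetch_url {
    my (%args) = @_;
    my $base = 'http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi';
    my $ids  = join ',', @{ $args{ids} };
    return "$base?db=$args{db}&id=$ids"
         . "&rettype=$args{rettype}&retmode=$args{retmode}";
}

print efetch_url(
    db      => 'protein',
    ids     => [qw(ID1 ID2 ID3)],
    rettype => 'fasta',
    retmode => 'text',
), "\n";
# prints http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&id=ID1,ID2,ID3&rettype=fasta&retmode=text
```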

chris


From letondal at pasteur.fr  Thu Feb 15 15:23:55 2007
From: letondal at pasteur.fr (Catherine Letondal)
Date: Thu, 15 Feb 2007 21:23:55 +0100
Subject: [Bioperl-l] Problem parsing clustalw format ni AlignIO
Message-ID: <ab16cb62f58278aaf565a0fd73ba4009@pasteur.fr>

Hi bioperlers,

I have a script called protal2dna
(http://bioweb.pasteur.fr/seqanal/interfaces/protal2dna.html, see
attachment #1) that realigns DNA sequences given the sequences plus the
corresponding protein alignment (sequences have to be in the same order
or named equivalently). Users have reported a parsing problem from the
AlignIO class when they enter some clustalw files (see attachment #2
for an example):

% protal2dna alig-protal2dna.dat dna-protal2dna.data
no alignment available in 'clustalw' format from file 
'alig-protal2dna.dat'
%

I have tried with bioperl 1.4. I have looked in the archive and in the
BUGS, but found nothing.
Is there any bug fix for this? I also provide the DNA sequences file if
you want to test.

Thanks a lot in advance,

--
Catherine Letondal -- Institut Pasteur
www.pasteur.fr/~letondal

-------------- next part --------------
A non-text attachment was scrubbed...
Name: protal2dna
Type: application/octet-stream
Size: 11093 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070215/42a0ba43/attachment-0006.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: alig-protal2dna.dat
Type: application/octet-stream
Size: 12022 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070215/42a0ba43/attachment-0007.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: dna-protal2dna.data
Type: application/octet-stream
Size: 7739 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070215/42a0ba43/attachment-0008.obj>

From Kevin.M.Brown at asu.edu  Thu Feb 15 16:38:25 2007
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Thu, 15 Feb 2007 14:38:25 -0700
Subject: [Bioperl-l] Problem parsing clustalw format ni AlignIO
In-Reply-To: <ab16cb62f58278aaf565a0fd73ba4009@pasteur.fr>
References: <ab16cb62f58278aaf565a0fd73ba4009@pasteur.fr>
Message-ID: <1A4207F8295607498283FE9E93B775B402BA7764@EX02.asurite.ad.asu.edu>

Did you try Bioperl 1.5.2 to see if updates to it might fix the issue?
IIRC 1.4 is nearly 2 years old now.  1.5.2 was released within the last
few months.

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> Catherine Letondal
> Sent: Thursday, February 15, 2007 1:24 PM
> To: bioperl-l
> Cc: Catherine Letondal; Katja Schuerer
> Subject: [Bioperl-l] Problem parsing clustalw format ni AlignIO
> 
> Hi bioperlers,
> 
> I have a script called protal2dna
> (http://bioweb.pasteur.fr/seqanal/interfaces/protal2dna.html, 
> see attachment #1) that realign DNA sequences giving their 
> sequences + the corresponding protein alignment (sequences 
> have to be in the same order or named equivalently). We have 
> a parsing problem reported from the AlignIO class when users 
> enter some clustalw file (see attachment #2 for an example):
> 
> % protal2dna alig-protal2dna.dat dna-protal2dna.data
> no alignment available in 'clustalw' format from file
> 'alig-protal2dna.dat'
> %
> 
> I have tried with bioperl 1.4. I have looked in the archive 
> and in the BUGS, but found nothing?
> Is there any bug fix for this? I also provide the DNA 
> sequences file if you want to test.
> 
> Thanks a lot in advance,
> 
> --
> Catherine Letondal -- Institut Pasteur
> www.pasteur.fr/~letondal
> 
> 


From cjfields at uiuc.edu  Thu Feb 15 16:50:54 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 15 Feb 2007 15:50:54 -0600
Subject: [Bioperl-l] Bio::Root::Utilities.pm
In-Reply-To: <8f200b4c0702141020x5815018bvd69873222582d9d5@mail.gmail.com>
References: <8f200b4c0702131500m2f3790nceace3c3e3dffd4f@mail.gmail.com>
	<1D04BA5E-359E-4AC5-B674-8ECDC4F81E83@uiuc.edu>
	<8f200b4c0702141020x5815018bvd69873222582d9d5@mail.gmail.com>
Message-ID: <C53B465C-8BBA-4DE7-92BC-FFC5DDBEB4AA@uiuc.edu>


On Feb 14, 2007, at 12:20 PM, Steve Chervitz wrote:
...

>>
>> I don't have a problem with adding it back, esp. if tests are added.
>> Everything in Bio::Root* not tied to a module was yanked out when no
>> one spoke up about cleaning up Bio::Root* modules:
>>
>> http://thread.gmane.org/gmane.comp.lang.perl.bio.general/12792/focus=12839
>>
>> Maybe others disagree?
>>
>> chris
>>
>
> Sorry I missed out on that thread. I had some trouble with my bioperl-l
> email delivery getting disabled due to excessive bounces, and it took me a
> while to catch it.
>
> Bio::Root::Utilities is quite a grab bag of miscellaneous general functions
> that are occasionally useful for perl scripting (e.g., determining
> end-of-line characters, sending email, etc.). The code could definitely use
> a review, and maybe an example script to advertise it. I can look into this,
> and suggestions are welcome.
>
> Steve

Steve,

I have added Root::Utilities back to CVS but I didn't know if I  
should add back the other related Root modules (didn't know what your  
future plans were for them).  Could the Bio::Root::Global and  
Bio::Root::Object stuff be consolidated into Bio::Root::Utilities or  
would that be too problematic?  None of the other Bio* modules  
currently use them.

Personally, I use Date::Manip for anything that requires date/time  
manipulation (updating seq records based on dates, for instance).   
Some of the other utilities could come in handy, though.  Don't know  
if that helps...

chris


From cjfields at uiuc.edu  Thu Feb 15 16:51:58 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 15 Feb 2007 15:51:58 -0600
Subject: [Bioperl-l] XEMBL deprecation
Message-ID: <72AD3CD9-4D42-4E65-A54E-56D01741CDCD@uiuc.edu>

I have gone ahead and marked Bio::DB::XEMBL and Bio::DB::XEMBLService  
both for deprecation in the wiki and in CVS (though I haven't set any  
timeline):

http://www.bioperl.org/wiki/Deprecated_modules

The XEMBL web services are no longer available, and it looks like  
everything is running through DBFetch now.  The XEMBL tests are  
skipped if no server is detected, so they shouldn't cause any  
problems with Bioperl installations.

Lincoln, was there anything to salvage from these?  I noticed they  
used SOAP::Lite, so maybe we could convert these over to a SOAP-based  
interface to DBFetch web services?

chris


From johnsonm at gmail.com  Thu Feb 15 17:29:37 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Thu, 15 Feb 2007 16:29:37 -0600
Subject: [Bioperl-l] Bio::Tools::Run::Glimmer?
Message-ID: <ebf5eb170702151429w233ec66dkfb89743a4b8e687e@mail.gmail.com>

    Now that I've got Bio::Tools::Glimmer parsing Glimmer2 and Glimmer3
output, I suppose I might as well go and write Bio::Tools::Run::Glimmer.  I
suspect another 4-in-1 module may be possible.  Now that I think about it,
I'll need one for GeneMark, too.
    Comments?  Suggestions on a good module to use as a template?


From hlapp at gmx.net  Thu Feb 15 20:18:56 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 15 Feb 2007 20:18:56 -0500
Subject: [Bioperl-l] XEMBL deprecation
In-Reply-To: <72AD3CD9-4D42-4E65-A54E-56D01741CDCD@uiuc.edu>
References: <72AD3CD9-4D42-4E65-A54E-56D01741CDCD@uiuc.edu>
Message-ID: <989C9C17-9D85-4B3F-A9FF-C9B9372D38AD@gmx.net>


On Feb 15, 2007, at 4:51 PM, Chris Fields wrote:

> The XEMBL web services are no longer available

What happens if someone invokes the module? Should it maybe return  
nothing and warn()? I don't think it's a good idea if the module just  
silently does not function because its backend is no more.

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cjfields at uiuc.edu  Thu Feb 15 20:48:12 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 15 Feb 2007 19:48:12 -0600
Subject: [Bioperl-l] XEMBL deprecation
In-Reply-To: <989C9C17-9D85-4B3F-A9FF-C9B9372D38AD@gmx.net>
References: <72AD3CD9-4D42-4E65-A54E-56D01741CDCD@uiuc.edu>
	<989C9C17-9D85-4B3F-A9FF-C9B9372D38AD@gmx.net>
Message-ID: <00A97EC9-A066-4848-A2A4-8E0773797783@uiuc.edu>

On Feb 15, 2007, at 7:18 PM, Hilmar Lapp wrote:

> On Feb 15, 2007, at 4:51 PM, Chris Fields wrote:
>
>> The XEMBL web services are no longer available
>
> What happens if someone invokes the module? Should it maybe return  
> nothing and warn()? I don't think it's a good idea if the module  
> just silently does not function because its backend is no more.
>
> 	-hilmar

Yes, I thought the same.  I have added a warn() noting the  
deprecation to the XEMBL constructor and removed XEMBL tests from  
CVS.  The modules are still there for the time being.
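
A minimal sketch of the warn-in-the-constructor approach, using a hypothetical package name rather than the real Bio::DB::XEMBL code (which isn't shown in this thread). The object is still built, but callers get a clear message instead of a module that silently does nothing:

```perl
use strict;
use warnings;

package My::DeprecatedService;    # hypothetical stand-in, not Bio::DB::XEMBL

# Constructor that still returns an object but warns on every call,
# so the missing backend is never a silent failure.
sub new {
    my ($class, %args) = @_;
    warn "$class is deprecated: its web service backend is no longer available.\n"
       . "See http://www.bioperl.org/wiki/Deprecated_modules\n";
    return bless { %args }, $class;
}

package main;

my $service = My::DeprecatedService->new(-format => 'embl');
print ref($service), "\n";    # prints My::DeprecatedService
```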

I actually worry more about the internals; it would be a shame to  
toss them altogether.  Would it be worth it to shift this towards a  
SOAP-based interface to DBFetch?  Or, more precisely, how much  
trouble would it be to do so?

chris


From hlapp at gmx.net  Thu Feb 15 20:54:29 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 15 Feb 2007 20:54:29 -0500
Subject: [Bioperl-l] XEMBL deprecation
In-Reply-To: <00A97EC9-A066-4848-A2A4-8E0773797783@uiuc.edu>
References: <72AD3CD9-4D42-4E65-A54E-56D01741CDCD@uiuc.edu>
	<989C9C17-9D85-4B3F-A9FF-C9B9372D38AD@gmx.net>
	<00A97EC9-A066-4848-A2A4-8E0773797783@uiuc.edu>
Message-ID: <FB0CC994-F2C2-407C-9EFB-6B8083B9545C@gmx.net>

Well, if dbFetch doesn't have a SOAP-based interface, how would you
want to do this?

	-hilmar

On Feb 15, 2007, at 8:48 PM, Chris Fields wrote:

> On Feb 15, 2007, at 7:18 PM, Hilmar Lapp wrote:
>
>> On Feb 15, 2007, at 4:51 PM, Chris Fields wrote:
>>
>>> The XEMBL web services are no longer available
>>
>> What happens if someone invokes the module? Should it maybe return  
>> nothing and warn()? I don't think it's a good idea if the module  
>> just silently does not function because its backend is no more.
>>
>> 	-hilmar
>
> Yes, I thought the same.  I have added a warn() noting the  
> deprecation to the XEMBL constructor and removed XEMBL tests from  
> CVS.  The modules are still there for the time being.
>
> I actually worry more about the internals; it would be a shame to  
> toss them altogether.  Would it be worth it to shift this towards a  
> SOAP-based interface to DBFetch?  Or, more precisely, how much  
> trouble would it be to do so?
>
> chris

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cjfields at uiuc.edu  Thu Feb 15 20:59:46 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 15 Feb 2007 19:59:46 -0600
Subject: [Bioperl-l] XEMBL deprecation
In-Reply-To: <FB0CC994-F2C2-407C-9EFB-6B8083B9545C@gmx.net>
References: <72AD3CD9-4D42-4E65-A54E-56D01741CDCD@uiuc.edu>
	<989C9C17-9D85-4B3F-A9FF-C9B9372D38AD@gmx.net>
	<00A97EC9-A066-4848-A2A4-8E0773797783@uiuc.edu>
	<FB0CC994-F2C2-407C-9EFB-6B8083B9545C@gmx.net>
Message-ID: <8C7E18C6-B38D-4E15-BE9C-84256B09C312@uiuc.edu>


On Feb 15, 2007, at 7:54 PM, Hilmar Lapp wrote:

> Well, if dbFetch doesn't have a SOAP-based interface, how would you
> want to do this?
>
> 	-hilmar

DBfetch has a SOAP-based interface:

http://www.ebi.ac.uk/Tools/webservices/services/dbfetch

Just not sure how easy it would be to switch XEMBL code over to using  
it.  We already have Bio::DB::DBFetch so it may be redundant, but I  
don't recall any other SOAP-based tools in BioPerl beyond some stuff  
in bioperl-run (and I'm not sure how up-to-date the DBFetch module is).

chris


From jimhu at tamu.edu  Fri Feb 16 00:20:09 2007
From: jimhu at tamu.edu (Jim Hu)
Date: Thu, 15 Feb 2007 23:20:09 -0600
Subject: [Bioperl-l] Pathway tools output parser
In-Reply-To: <Pine.LNX.4.44.0702062205510.13338-100000@sos.lbl.gov>
References: <Pine.LNX.4.44.0702062205510.13338-100000@sos.lbl.gov>
Message-ID: <1632E2BF-4402-47DE-B750-9763E02711D2@tamu.edu>

Hi Chris,

I need to check the list more often!  I never got an answer here, but  
Eric Just pointed out a perl api at TAIR that's linked from the  
BioCyc site.  I've used the lisp parser functions from that to move  
the data to a perl array of arrays, and I'm working on creating  
object classes for BioCyc objects, starting with genes and products.
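
As a rough illustration of the "lisp objects to a Perl array of arrays" step, here is a minimal s-expression reader in core Perl. It is a hypothetical sketch, not the TAIR Perl API mentioned above, and it only handles parenthesised lists, bare atoms and simple double-quoted strings (no escapes, comments or other Lisp reader syntax); the input is made up:

```perl
use strict;
use warnings;

# Split an s-expression string into tokens: parens, quoted strings, atoms.
sub tokenize {
    my ($text) = @_;
    return $text =~ /(\(|\)|"[^"]*"|[^\s()"]+)/g;
}

# Recursively turn the token list into nested array refs.
sub read_form {
    my ($tokens) = @_;
    die "unexpected end of input\n" unless @$tokens;
    my $tok = shift @$tokens;
    if ($tok eq '(') {
        my @list;
        while (@$tokens && $tokens->[0] ne ')') {
            push @list, read_form($tokens);
        }
        die "missing ')'\n" unless @$tokens;
        shift @$tokens;    # drop the closing paren
        return \@list;
    }
    die "unexpected ')'\n" if $tok eq ')';
    $tok =~ s/^"(.*)"$/$1/s;    # strip quotes from string atoms
    return $tok;
}

my @tokens = tokenize('(GENE (NAME "geneX") (PRODUCT GENEX-MONOMER))');
my $form   = read_form(\@tokens);
print $form->[1][1], "\n";    # prints geneX
```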

I need to look at the appropriate ways to link this up to the  
existing codebase for interconverting to Chado and other BioPerl data  
types.

Jim
=====================================
Jim Hu
Associate Professor
Dept. of Biochemistry and Biophysics
2128 TAMU
Texas A&M Univ.
College Station, TX 77843-2128
979-862-4054


On Feb 7, 2007, at 12:07 AM, Chris Mungall wrote:

>
> Hi Jim
>
> Did you ever get an answer to this? I'm interested in storing pathway data
> in Chado & I remember enough lisp to get it into something perl-manageable
> like XML
>
> On Thu, 25 Jan 2007, Jim Hu wrote:
>
>> Is there a module to parse the lisp object files from Peter Karp's
>> Pathway Tools?   I need a parser to convert the gene and protein
>> objects in EcoCyc releases into something that can be imported into
>> Chado.
>


From lstein at cshl.edu  Fri Feb 16 08:35:19 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Fri, 16 Feb 2007 08:35:19 -0500
Subject: [Bioperl-l] Bio::DB::SeqFeature::Store problem with transcripts
In-Reply-To: <45D1E2A5.6060104@sendu.me.uk>
References: <45D1E2A5.6060104@sendu.me.uk>
Message-ID: <6dce9a0b0702160535x946d6e5q4eda94ad174f1bac@mail.gmail.com>

Hi,

Older versions of Storable can't deal with features that contain subroutine
refs. You should get the current version from CPAN. Note that there is a
slight security problem here if you don't trust the objects stored in the
database. If they contain code refs, the code will be evaluated during
deserialization.
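
For reference, here is the Storable mechanism involved, as a minimal sketch that does not use BioPerl. With a Storable that supports it (2.05 or later, per the follow-up below), setting $Storable::Deparse makes freeze() store CODE refs as B::Deparse source text. thaw() only rebuilds them when $Storable::Eval is true, and that evaluation step is where the security concern comes from. The feature hash and cleanup sub are made up. I have not checked how Bio::DB::SeqFeature::Store sets these flags internally.

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);

# Ask Storable to serialize CODE refs as source text via B::Deparse.
local $Storable::Deparse = 1;

my $feature = { name => 'exon1', cleanup => sub { return "cleaned $_[0]" } };
my $frozen  = freeze($feature);

# Thawing CODE refs means eval'ing the stored source text, so enable it
# only for data you trust.
my $copy = do {
    local $Storable::Eval = 1;
    thaw($frozen);
};

print $copy->{cleanup}->('exon1'), "\n";    # prints "cleaned exon1"
```

Storable's documentation also describes setting $Storable::Eval to a code reference, for example one that calls Safe's reval, when the stored data is less trustworthy.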

Lincoln

On 2/13/07, Sendu Bala <bix at sendu.me.uk> wrote:
>
> I have some raw sequences in a Bio::DB::SeqFeature::Store mysql database
> and wanted to associate some basic information with them, like exon
> positions. I thought of creating Bio::SeqFeature::Gene::Transcript
> objects and storing them so I could later use features() to see what
> other features overlapped exons. I ran into a fatal error that can be
> replicated with the following simplified one-liner:
>
> perl -MBio::DB::SeqFeature::Store -MBio::SeqFeature::Gene::Transcript -e
> '$db = Bio::DB::SeqFeature::Store->new(-adaptor => "DBI::mysql", -dsn =>
> "dbi:mysql:test"); $trans =
> Bio::SeqFeature::Gene::Transcript->new(-start => 1, -end => 2, -seq_id
> => "test"); $db->store($trans); @trans = $db->features(-seqid => "test",
> -type => "transcript"); print "@trans\n";'
>
> code sub {
>      package Bio::SeqFeature::Generic;
>      use strict 'refs';
>      my $self = shift @_;
>      foreach my $f (@{[] unless $$self{'_gsf_sub_array'};}) {
>          $f = undef;
>      }
>      $$self{'_gsf_seq'} = undef;
>      foreach my $t (keys %{{} unless $$self{'_gsf_tag_hash'};}) {
>          $$self{'_gsf_tag_hash'}{$t} = undef;
>          delete $$self{'_gsf_tag_hash'}{$t};
>      }
> } did not evaluate to a subroutine reference, at
> /.../Bio/DB/SeqFeature/Store.pm line 2280
>
>
> Is this a bug? Or am I taking the wrong approach?
>


-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu


From lstein at cshl.edu  Fri Feb 16 08:47:29 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Fri, 16 Feb 2007 08:47:29 -0500
Subject: [Bioperl-l] Bio::DB::SeqFeature::Store problem with transcripts
In-Reply-To: <45D5B42A.1080303@sendu.me.uk>
References: <45D1E2A5.6060104@sendu.me.uk>
	<6dce9a0b0702160535x946d6e5q4eda94ad174f1bac@mail.gmail.com>
	<45D5B42A.1080303@sendu.me.uk>
Message-ID: <6dce9a0b0702160547s5873cd2bg2c5cf09779138249@mail.gmail.com>

Hi Sendu,

I'll do a little digging and let you know.

Lincoln

On 2/16/07, Sendu Bala <bix at sendu.me.uk> wrote:
>
> Lincoln Stein wrote:
> > Hi,
> >
> > Older versions of Storable can't deal with features that contain
> > subroutine refs. You should get the current version from CPAN.
>
> Do you have any idea which version of Storable first supported this? I
> can specify that version in Bioperl's Build.PL.
>
> (else I'll just specify the latest version)
>




From lstein at cshl.edu  Fri Feb 16 08:52:30 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Fri, 16 Feb 2007 08:52:30 -0500
Subject: [Bioperl-l] Bio::DB::SeqFeature::Store problem with transcripts
In-Reply-To: <45D5B42A.1080303@sendu.me.uk>
References: <45D1E2A5.6060104@sendu.me.uk>
	<6dce9a0b0702160535x946d6e5q4eda94ad174f1bac@mail.gmail.com>
	<45D5B42A.1080303@sendu.me.uk>
Message-ID: <6dce9a0b0702160552l11417675kec7b5b4537cbfa70@mail.gmail.com>

It looks like 2.05 or higher is the Storable version to use. It requires
B::Deparse, which is (I think) standard on perl 5.6 or higher.

Lincoln

On 2/16/07, Sendu Bala <bix at sendu.me.uk> wrote:
>
> Lincoln Stein wrote:
> > Hi,
> >
> > Older versions of Storable can't deal with features that contain
> > subroutine refs. You should get the current version from CPAN.
>
> Do you have any idea which version of Storable first supported this? I
> can specify that version in Bioperl's Build.PL.
>
> (else I'll just specify the latest version)
>




From lstein at cshl.edu  Fri Feb 16 08:55:06 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Fri, 16 Feb 2007 08:55:06 -0500
Subject: [Bioperl-l] XEMBL deprecation
In-Reply-To: <72AD3CD9-4D42-4E65-A54E-56D01741CDCD@uiuc.edu>
References: <72AD3CD9-4D42-4E65-A54E-56D01741CDCD@uiuc.edu>
Message-ID: <6dce9a0b0702160555i7eae057dt1c3cd46b1f732bd1@mail.gmail.com>

I like the idea of converting these over to use DBFetch's SOAP services. On
the other hand, it isn't likely that I'm going to have time to do this
anytime soon.

Probably the best thing to do is to issue a warning and return undef if
someone tries to use the XEMBL module. I'll make that change.
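
The "warn and return undef" approach can be as small as the following sketch. The package name Hypothetical::DB::XEMBL is a stand-in. This is not the code that was actually committed to Bio::DB::XEMBL. It uses Carp's carp so the warning points at the caller's line instead of the module's.

```perl
use strict;
use warnings;

package Hypothetical::DB::XEMBL;    # stand-in name, not the real Bio::DB::XEMBL
use Carp qw(carp);

sub new {
    my ($class) = @_;
    # The remote service no longer exists, so warn and hand back no object
    # instead of letting the caller wait on a dead server.
    carp "$class is deprecated: the XEMBL web service is no longer available";
    return;    # undef in scalar context, empty list in list context
}

package main;

my $db = Hypothetical::DB::XEMBL->new();
print defined $db ? "got an object\n" : "no object returned\n";
```

Callers that already check the constructor's return value will then fail cleanly.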

Lincoln

On 2/15/07, Chris Fields <cjfields at uiuc.edu> wrote:
>
> I have gone ahead and marked Bio::DB::XEMBL and Bio::DB::XEMBLService
> both for deprecation in the wiki and in CVS (though I haven't set any
> timeline):
>
> http://www.bioperl.org/wiki/Deprecated_modules
>
> The XEMBL web services are no longer available, and it looks like
> everything is running through DBFetch now.  The XEMBL tests are
> skipped if no server is detected, so they shouldn't cause any
> problems with Bioperl installations.
>
> Lincoln, was there anything to salvage from these?  I noticed they
> used SOAP::Lite, so maybe we could convert these over to a SOAP-based
> interface to DBFetch web services?
>
> chris
>




From lstein at cshl.edu  Fri Feb 16 08:55:47 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Fri, 16 Feb 2007 08:55:47 -0500
Subject: [Bioperl-l] XEMBL deprecation
In-Reply-To: <6dce9a0b0702160555i7eae057dt1c3cd46b1f732bd1@mail.gmail.com>
References: <72AD3CD9-4D42-4E65-A54E-56D01741CDCD@uiuc.edu>
	<6dce9a0b0702160555i7eae057dt1c3cd46b1f732bd1@mail.gmail.com>
Message-ID: <6dce9a0b0702160555s5b4d7d70n936c603bc79dd40d@mail.gmail.com>

Oh, looks like someone has inserted the warnings already. Good.

Lincoln

On 2/16/07, Lincoln Stein <lstein at cshl.edu> wrote:
>
> I like the idea of converting these over to use DBFetch's SOAP services.
> On the other hand, it isn't likely that I'm going to have time to do this
> anytime soon.
>
> Probably the best thing to do is to issue a warning and return undef if
> someone tries to use the XEMBL module. I'll make that change.
>
> Lincoln
>
> On 2/15/07, Chris Fields <cjfields at uiuc.edu> wrote:
> >
> > I have gone ahead and marked Bio::DB::XEMBL and Bio::DB::XEMBLService
> > both for deprecation in the wiki and in CVS (though I haven't set any
> > timeline):
> >
> > http://www.bioperl.org/wiki/Deprecated_modules
> >
> > The XEMBL web services are no longer available, and it looks like
> > everything is running through DBFetch now.  The XEMBL tests are
> > skipped if no server is detected, so they shouldn't cause any
> > problems with Bioperl installations.
> >
> > Lincoln, was there anything to salvage from these?  I noticed they
> > used SOAP::Lite, so maybe we could convert these over to a SOAP-based
> > interface to DBFetch web services?
> >
> > chris
> >
>
>
>




From bix at sendu.me.uk  Fri Feb 16 08:56:50 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 16 Feb 2007 13:56:50 +0000
Subject: [Bioperl-l] Bio::DB::SeqFeature::Store problem with transcripts
In-Reply-To: <6dce9a0b0702160552l11417675kec7b5b4537cbfa70@mail.gmail.com>
References: <45D1E2A5.6060104@sendu.me.uk>	
	<6dce9a0b0702160535x946d6e5q4eda94ad174f1bac@mail.gmail.com>	
	<45D5B42A.1080303@sendu.me.uk>
	<6dce9a0b0702160552l11417675kec7b5b4537cbfa70@mail.gmail.com>
Message-ID: <45D5B822.6080908@sendu.me.uk>

Lincoln Stein wrote:
> It looks like 2.05 or higher is the Storable version to use. It requires 
> B::Deparse, which is (I think) standard on perl 5.6 or higher.

Thanks, now recommended in Build.PL
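
A "recommends" entry in Build.PL only reports a missing or outdated module when Build.PL is run. Code that actually stores features containing code refs could also check the installed version at run time. Below is a minimal sketch using the standard UNIVERSAL::VERSION method, with the 2.05 threshold taken from Lincoln's message. $storable_ok is only an illustrative name.

```perl
use strict;
use warnings;
use Storable ();

# Minimum Storable version named in this thread for serializing code refs.
my $min_storable = '2.05';

# UNIVERSAL::VERSION dies if the installed version is lower than requested.
my $storable_ok = eval { Storable->VERSION($min_storable); 1 } ? 1 : 0;

if ($storable_ok) {
    print "Storable $Storable::VERSION can serialize code refs\n";
}
else {
    warn "Storable $Storable::VERSION is older than $min_storable; "
       . "features holding code refs cannot be stored\n";
}
```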


From cjfields at uiuc.edu  Fri Feb 16 09:05:08 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 16 Feb 2007 08:05:08 -0600
Subject: [Bioperl-l] XEMBL deprecation
In-Reply-To: <6dce9a0b0702160555s5b4d7d70n936c603bc79dd40d@mail.gmail.com>
References: <72AD3CD9-4D42-4E65-A54E-56D01741CDCD@uiuc.edu>
	<6dce9a0b0702160555i7eae057dt1c3cd46b1f732bd1@mail.gmail.com>
	<6dce9a0b0702160555s5b4d7d70n936c603bc79dd40d@mail.gmail.com>
Message-ID: <ACAF9E26-CBDD-43AC-8D3E-0CADFF5B9576@uiuc.edu>

I added the warning yesterday.

We can add something to the project priority list on modifying XEMBL  
to use DBFetch instead; I like the SOAP-based interface.  I am  
thinking of a similar interface for NCBI eutils but I haven't had  
time to work on it.

chris

On Feb 16, 2007, at 7:55 AM, Lincoln Stein wrote:

> Oh, looks like someone has inserted the warnings already. Good.
>
> Lincoln
>
> On 2/16/07, Lincoln Stein <lstein at cshl.edu> wrote:
>
> I like the idea of converting these over to use DBFetch's SOAP
> services. On the other hand, it isn't likely that I'm going to have
> time to do this anytime soon.
>
> Probably the best thing to do is to issue a warning and return
> undef if someone tries to use the XEMBL module. I'll make that
> change.
>
> Lincoln
>
>
> On 2/15/07, Chris Fields <cjfields at uiuc.edu> wrote:
>
> I have gone ahead and marked Bio::DB::XEMBL and Bio::DB::XEMBLService
> both for deprecation in the wiki and in CVS (though I haven't set any
> timeline):
>
> http://www.bioperl.org/wiki/Deprecated_modules
>
> The XEMBL web services are no longer available, and it looks like
> everything is running through DBFetch now.  The XEMBL tests are
> skipped if no server is detected, so they shouldn't cause any
> problems with Bioperl installations.
>
> Lincoln, was there anything to salvage from these?  I noticed they
> used SOAP::Lite, so maybe we could convert these over to a SOAP-based
> interface to DBFetch web services?
>
> chris
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From bix at sendu.me.uk  Fri Feb 16 08:39:54 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 16 Feb 2007 13:39:54 +0000
Subject: [Bioperl-l] Bio::DB::SeqFeature::Store problem with transcripts
In-Reply-To: <6dce9a0b0702160535x946d6e5q4eda94ad174f1bac@mail.gmail.com>
References: <45D1E2A5.6060104@sendu.me.uk>
	<6dce9a0b0702160535x946d6e5q4eda94ad174f1bac@mail.gmail.com>
Message-ID: <45D5B42A.1080303@sendu.me.uk>

Lincoln Stein wrote:
> Hi,
> 
> Older versions of Storable can't deal with features that contain 
> subroutine refs. You should get the current version from CPAN.

Do you have any idea which version of Storable first supported this? I 
can specify that version in Bioperl's Build.PL.

(else I'll just specify the latest version)


From eu at otelo-online.de  Sat Feb 17 07:55:08 2007
From: eu at otelo-online.de (eu at otelo-online.de)
Date: Sat, 17 Feb 2007 13:55:08 +0100 (CET)
Subject: [Bioperl-l] Bioperl Module OddCodes(help)
Message-ID: <29037001.1171716908969.JavaMail.ngmail@webmail18>

Hello @all,

I want to translate a sequence in FASTA format into only acidic, basic and polar classes, depending on the pH.
The OddCodes module can only do acidic, basic, polar and hydrophobic, and I think only at a default pH.

Can somebody help me? I don't know whether it is possible.
For each amino acid I need to know whether it is positively charged, negatively charged or uncharged.

thx
 



From The_Polymorph at rocketmail.com  Sun Feb 18 14:08:34 2007
From: The_Polymorph at rocketmail.com (Caitlin)
Date: Sun, 18 Feb 2007 11:08:34 -0800 (PST)
Subject: [Bioperl-l] Missing ppm for Bioperl 1.5.2_100(?)
Message-ID: <148421.50501.qm@web50801.mail.yahoo.com>

Hi.

In an attempt to upgrade my Bioperl install from 1.5.2 RC5 to
1.5.2_100, I noticed the ppm was not found on the activestate
repositories. 

Thanks,

~Caitlin




From bix at sendu.me.uk  Sun Feb 18 15:36:03 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Sun, 18 Feb 2007 20:36:03 +0000
Subject: [Bioperl-l] Missing ppm for Bioperl 1.5.2_100(?)
In-Reply-To: <148421.50501.qm@web50801.mail.yahoo.com>
References: <148421.50501.qm@web50801.mail.yahoo.com>
Message-ID: <45D8B8B3.4000408@sendu.me.uk>

Caitlin wrote:
> Hi.
> 
> In an attempt to upgrade my Bioperl install from 1.5.2 RC5 to
> 1.5.2_100, I noticed the ppm was not found on the activestate
> repositories. 

Follow the install instructions:
http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows

It's not in the normal ActiveState repository, but on bioperl.org.


From t.nugent at cs.ucl.ac.uk  Mon Feb 19 12:29:48 2007
From: t.nugent at cs.ucl.ac.uk (Tim Nugent)
Date: Mon, 19 Feb 2007 17:29:48 +0000
Subject: [Bioperl-l] Module to draw transmembrane protein toplogoy
Message-ID: <45D9DE8C.2010301@cs.ucl.ac.uk>

Hi everyone,

I've written a perl module to display transmembrane protein topology 
using GD. There are various options, including labels, helix/loop 
dimensions, colour schemes etc but it only requires a string or array 
containing the protein topology (e.g. transmembrane helix start/stop 
points). It produces output like this:

http://www.cs.ucl.ac.uk/staff/T.Nugent/images/blue.png
http://www.cs.ucl.ac.uk/staff/T.Nugent/images/yellow.png

using the code at the bottom.

Here is the module:
http://www.cs.ucl.ac.uk/staff/T.Nugent/source/DrawTransmembrane.pm

I've never submitted anything to Bioperl before - is this sort of thing 
likely to be of use to others? I imagine it would sit alongside some of 
the Bio::Graphics stuff.

Best wishes,

Tim

#!/usr/bin/perl

use strict;
use warnings;
use lib '/scratch0/NOT_BACKED_UP/tnugent/perl5lib'; ## path to module
use DrawTransmembrane;

my @topology = (20,45,59,70,86,109,145,168,194,220);

my %labels = ('5' => '5 - Sulphation Site',
               '21' => '1st Helix',
               '47' => '40 - Mutation',
               '60' => 'Voltage Sensor',
               '72' => '72 - Mutation 2',
               '73' => '73 - Mutation 3',
               '138' => '138 - Glycosylation Site',
               '170' => '170 - Phosphorylation Site',
               '200' => 'Last Helix');

my $im = DrawTransmembrane->draw_transmembrane(-title => 'This is a cartoon displaying transmembrane helices.',
                                                -topology => \@topology,
                                                -n_terminal => 'out',
                                                -helix_width => 48,
                                                -helix_height => 125,
                                                -short_loop_limit => 10,
                                                -long_loop_limit => 35,
                                                -loop_width => 25,
                                                -colour_scheme => 'yellow',
                                                -labels => \%labels,
                                                -text_offset => -10);

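## print the .png file
my $output = 'test.png';
open(my $out_fh, '>', $output) or die "Cannot open $output for writing: $!";
binmode $out_fh;
print {$out_fh} $im->png;
close $out_fh or die "Cannot close $output: $!";

## view the image with ImageMagick's display command
system('display', $output) == 0 or warn "display failed: $?";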
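
The module is described as taking transmembrane helix start/stop points. Assuming @topology is a flat list of alternating start and stop positions, as in the script above, it can be paired up and sanity-checked before drawing. helix_pairs is a hypothetical helper for illustration and is not part of DrawTransmembrane. The inclusive residue count is also an assumption about what the positions mean.

```perl
use strict;
use warnings;

# Turn a flat start/stop list into [start, stop] pairs, one per helix,
# checking that the list has an even length and strictly increases.
sub helix_pairs {
    my @points = @_;
    die "topology must contain an even number of positions\n" if @points % 2;
    for my $i (1 .. $#points) {
        die "positions must increase: $points[$i-1] then $points[$i]\n"
            unless $points[$i] > $points[$i-1];
    }
    return map { [ $points[2 * $_], $points[2 * $_ + 1] ] } 0 .. @points / 2 - 1;
}

my @topology = (20,45,59,70,86,109,145,168,194,220);
for my $helix (helix_pairs(@topology)) {
    # assumes both ends are inclusive
    printf "helix %d-%d (%d residues)\n",
        $helix->[0], $helix->[1], $helix->[1] - $helix->[0] + 1;
}
```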

-- 
Tim Nugent (MRes)
Research Student
Bioinformatics Unit
Department of Computer Science
University College London
Gower Street
London WC1E 6BT
Tel: 020-7679-0410
t.nugent at ucl.ac.uk


From bix at sendu.me.uk  Mon Feb 19 12:42:23 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 19 Feb 2007 17:42:23 +0000
Subject: [Bioperl-l] t/FeatureHolder.x
Message-ID: <45D9E17F.4030302@sendu.me.uk>

Is this supposed to work? It doesn't get run in the test suite normally 
because of its name.

With a live checkout I get:
./Build test --test_files t/FeatureHolder.x --verbose
t/FeatureHolder....1..6
ok 1
ok 2
Set group tag to: locus_tag
GROUPS:
   GROUP [?]:source

[snip]

   resolved pair Bio::SeqFeature::Generic=HASH(0x1375dc0) 
Bio::SeqFeature::Generic=HASH(0x1362830)
UNFLATTENING GROUP:
   GROUP [?]:gene
UNFLATTENING GROUP:
   GROUP [?]:repeat_region
UNFLATTENING GROUP:
   GROUP [?]:gene
UNFLATTENING GROUP:
   GROUP [?]:repeat_region
UNFLATTENING GROUP:
   GROUP [BG:DS07721.3]:gene mRNA CDS
UNFLATTENING GROUP:
   GROUP [BG:DS07721.6]:gene mRNA CDS

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: DUPLICATE ID: AAF53399.1
STACK: Error::throw
STACK: Bio::Root::Root::throw 
/home/sendu/src/bioperl/core/blib/lib/Bio/Root/Root.pm:359
STACK: 
Bio::SeqFeature::Tools::IDHandler::create_hierarchy_from_ParentIDs 
/home/sendu/src/bioperl/core/blib/lib/Bio/SeqFeature/Tools/IDHandler.pm:175
STACK: Bio::FeatureHolderI::create_hierarchy_from_ParentIDs 
/home/sendu/src/bioperl/core/blib/lib/Bio/FeatureHolderI.pm:245
STACK: t/FeatureHolder.x:68
-----------------------------------------------------------
dubious
         Test returned status 255 (wstat 65280, 0xff00)
DIED. FAILED tests 3-6
         Failed 4/6 tests, 33.33% okay
Failed Test       Stat Wstat Total Fail  List of Failed
-------------------------------------------------------------------------------
t/FeatureHolder.x  255 65280     6    8  3-6
Failed 1/1 test scripts. 4/6 subtests failed.
Files=1, Tests=6,  1 wallclock secs ( 0.55 cusr +  0.04 csys =  0.59 CPU)
Failed 1/1 test programs. 4/6 subtests failed.


It also fails quite differently with 1.5.2.


From cjfields at uiuc.edu  Mon Feb 19 15:04:20 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 19 Feb 2007 14:04:20 -0600
Subject: [Bioperl-l] t/FeatureHolder.x
In-Reply-To: <45D9E17F.4030302@sendu.me.uk>
References: <45D9E17F.4030302@sendu.me.uk>
Message-ID: <534215C5-11BB-48FA-B6BF-D8B3E2BFE20A@uiuc.edu>

Looks like that's some of Chris Mungall's stuff for GFF3.  Don't know  
if he's stalking the mail list.

Wonder if this has anything to do with the feature/annotation changes
around rel 1.5.

(the other) chris

On Feb 19, 2007, at 11:42 AM, Sendu Bala wrote:

> Is this supposed to work? It doesn't get run in the test suite
> normally because of its name.
>
> With a live checkout I get:
> ./Build test --test_files t/FeatureHolder.x --verbose
> t/FeatureHolder....1..6
...


From cjfields at uiuc.edu  Mon Feb 19 16:24:04 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 19 Feb 2007 15:24:04 -0600
Subject: [Bioperl-l] Module to draw transmembrane protein toplogoy
In-Reply-To: <45D9DE8C.2010301@cs.ucl.ac.uk>
References: <45D9DE8C.2010301@cs.ucl.ac.uk>
Message-ID: <29861210-1C15-4CFD-BA6F-1FF8EAAF8A62@uiuc.edu>

I think this is pretty nice!  We can add the code and test script to  
bugzilla and (if someone has time) try to see where it might fit in,  
though Bio::Graphics sounds like a good spot.

Anyone else have ideas on where this could go?

chris

On Feb 19, 2007, at 11:29 AM, Tim Nugent wrote:

> Hi everyone,
>
> I've written a perl module to display transmembrane protein topology
> using GD. There are various options, including labels, helix/loop
> dimensions, colour schemes etc but it only requires a string or array
> containing the protein topology (e.g. transmembrane helix start/stop
> points). It produces output like this:
>
> http://www.cs.ucl.ac.uk/staff/T.Nugent/images/blue.png
> http://www.cs.ucl.ac.uk/staff/T.Nugent/images/yellow.png
>
> using the code at the bottom.
>
> Here is the module:
> http://www.cs.ucl.ac.uk/staff/T.Nugent/source/DrawTransmembrane.pm
>
> I've never submitted anything to Bioperl before - is this sort of
> thing likely to be of use to others? I imagine it would sit
> alongside some of the Bio::Graphics stuff.
>
> Best wishes,
>
> Tim
>
> #!/usr/bin/perl
>
> use strict;
> use warnings;
> use lib '/scratch0/NOT_BACKED_UP/tnugent/perl5lib'; ## path to module
> use DrawTransmembrane;
>
> my @topology = (20,45,59,70,86,109,145,168,194,220);
>
> my %labels = ('5' => '5 - Sulphation Site',
>                '21' => '1st Helix',
>                '47' => '40 - Mutation',
>                '60' => 'Voltage Sensor',
>                '72' => '72 - Mutation 2',
>                '73' => '73 - Mutation 3',
>                '138' => '138 - Glycosylation Site',
>                '170' => '170 - Phosphorylation Site',
>                '200' => 'Last Helix');
>
> my $im = DrawTransmembrane->draw_transmembrane(-title => 'This is a cartoon displaying transmembrane helices.',
>                                                 -topology => \@topology,
>                                                 -n_terminal => 'out',
>                                                 -helix_width => 48,
>                                                 -helix_height => 125,
>                                                 -short_loop_limit => 10,
>                                                 -long_loop_limit => 35,
>                                                 -loop_width => 25,
>                                                 -colour_scheme => 'yellow',
>                                                 -labels => \%labels,
>                                                 -text_offset => -10);
>
> ## print the .png file
> my $output = 'test.png';
> open(OUTPUT, ">$output");
> binmode OUTPUT;
> print OUTPUT $im->png;
> close OUTPUT;
>
> my $system = `display $output`;
>


From cjm at fruitfly.org  Mon Feb 19 17:23:56 2007
From: cjm at fruitfly.org (Chris Mungall)
Date: Mon, 19 Feb 2007 14:23:56 -0800
Subject: [Bioperl-l] t/FeatureHolder.x
In-Reply-To: <534215C5-11BB-48FA-B6BF-D8B3E2BFE20A@uiuc.edu>
References: <45D9E17F.4030302@sendu.me.uk>
	<534215C5-11BB-48FA-B6BF-D8B3E2BFE20A@uiuc.edu>
Message-ID: <F54A07D7-B546-46E5-AFB2-54251ACE9182@fruitfly.org>


On Feb 19, 2007, at 12:04 PM, Chris Fields wrote:

> Looks like that's some of Chris Mungall's stuff for GFF3.  Don't know
> if he's stalking the mail list.

occasionally..

> Wonder if this has anything to do with the feature/annotation changes
> around rel 1.5.

possibly even before then.

there was a reason for the .x suffix... I think it was intended to
denote requirements; tests that don't pass yet but should in the future

anyway, this file can go
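
Test::More already offers a way to keep tests for not-yet-working behaviour in the normal suite, instead of hiding them behind a .x extension: a TODO block. Below is a minimal sketch. The test names and the TODO reason are placeholders and do not reflect the contents of t/FeatureHolder.x.

```perl
use strict;
use warnings;
use Test::More tests => 2;

ok(1, 'feature parsing that already works');

TODO: {
    # Failures inside this block are reported as "not ok ... # TODO"
    # and do not make the test script fail.
    local $TODO = 'duplicate IDs not handled by the unflattener yet';
    ok(0, 'hierarchy built from ParentIDs');
}
```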

> (the other) chris
>
> On Feb 19, 2007, at 11:42 AM, Sendu Bala wrote:
>
>> Is this supposed to work? It doesn't get run in the test suite
>> normally because of its name.
>>
>> With a live checkout I get:
>> ./Build test --test_files t/FeatureHolder.x --verbose
>> t/FeatureHolder....1..6
> ...
>


From torsten.seemann at infotech.monash.edu.au  Mon Feb 19 18:20:48 2007
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Tue, 20 Feb 2007 10:20:48 +1100
Subject: [Bioperl-l] Bioperl Module OddCodes(help)
In-Reply-To: <29037001.1171716908969.JavaMail.ngmail@webmail18>
References: <29037001.1171716908969.JavaMail.ngmail@webmail18>
Message-ID: <a79f6a4b0702191520l55625d6dif027df04b9841587@mail.gmail.com>

> I want to translate a sequence in FASTA format into only acidic, basic and polar classes, depending on the pH.
> The OddCodes module can only do acidic, basic, polar and hydrophobic, and I think only at a default pH.
> Can somebody help me? I don't know whether it is possible.
> For each amino acid I need to know whether it is positively charged, negatively charged or uncharged.

The latest released Bioperl 1.5.x has a charge() function which does
what you want:

http://doc.bioperl.org/releases/bioperl-1.5.2/Bio/Tools/OddCodes.html

It returns A, N, C for the charges.
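
If the classes really need to depend on pH, one minimal approach compares each ionizable side chain's pKa with the pH (Henderson-Hasselbalch) and emits the same three letters. This sketch rests on several assumptions. The pKa values are approximate textbook figures; tables differ, and pKa values in folded proteins can shift a lot. Chain termini are ignored. The letter meanings (A = negative, C = positive, N = neutral) are my reading of the OddCodes documentation. charge_code is a hypothetical helper and is not part of Bio::Tools::OddCodes.

```perl
use strict;
use warnings;

# Approximate side-chain pKa values (textbook-style figures; published tables
# differ, and values inside folded proteins can shift). Termini are ignored.
my %pka_acidic = (D => 3.9, E => 4.1, C => 8.3, Y => 10.1);
my %pka_basic  = (H => 6.0, K => 10.5, R => 12.5);

# Classify each residue at the given pH from its mean side-chain charge.
# Letters: A = negative, C = positive, N = neutral.
sub charge_code {
    my ($seq, $ph) = @_;
    my $out = '';
    for my $aa (split //, uc $seq) {
        my $charge = 0;
        if (exists $pka_acidic{$aa}) {
            # fraction deprotonated (charge -1) for acidic side chains
            $charge = -1 / (1 + 10 ** ($pka_acidic{$aa} - $ph));
        }
        elsif (exists $pka_basic{$aa}) {
            # fraction protonated (charge +1) for basic side chains
            $charge = 1 / (1 + 10 ** ($ph - $pka_basic{$aa}));
        }
        $out .= $charge <= -0.5 ? 'A' : $charge >= 0.5 ? 'C' : 'N';
    }
    return $out;
}

print charge_code('DEKRHAG', 7.0), "\n";    # AACCNNN
print charge_code('DEKRHAG', 5.0), "\n";    # AACCCNN
```

With BioPerl, the input string would typically come from $seq->seq on a sequence object read from the FASTA file with Bio::SeqIO.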

--Torsten


From bix at sendu.me.uk  Tue Feb 20 06:18:14 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 20 Feb 2007 11:18:14 +0000
Subject: [Bioperl-l] Bio::Graphics::FeatureBase seq_id question
Message-ID: <45DAD8F6.1030409@sendu.me.uk>

Bio::Graphics::FeatureBase::seq_id is currently implemented as a 
read-only alias to ref():
sub seq_id          { shift->ref()         }


What is the reasoning behind this? Can it be made to handle setting of 
the value as well?:
sub seq_id          { shift->ref(@_)       }
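
The difference between the two definitions is easiest to see with a combined getter/setter. The read-only alias silently drops any argument, while forwarding @_ lets seq_id() set the value as well. Hypothetical::Feature below is a minimal stand-in and not the real Bio::Graphics::FeatureBase. The sketch assumes that FeatureBase's ref() behaves as a getter/setter.

```perl
use strict;
use warnings;

package Hypothetical::Feature;    # stand-in class, not Bio::Graphics::FeatureBase

sub new {
    my ($class, %args) = @_;
    return bless { ref => $args{ref} }, $class;
}

# Combined getter/setter: stores a new value when one is passed.
sub ref {
    my $self = shift;
    $self->{ref} = shift if @_;
    return $self->{ref};
}

sub seq_id_readonly { shift->ref() }      # current style: any argument is ignored
sub seq_id          { shift->ref(@_) }    # proposed style: arguments are forwarded

package main;

my $f = Hypothetical::Feature->new(ref => 'chr1');
$f->seq_id_readonly('chr2');
print "after read-only alias: ", $f->ref, "\n";    # chr1, the set was ignored
$f->seq_id('chr2');
print "after forwarding alias: ", $f->ref, "\n";   # chr2
```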


Cheers,
Sendu.


From cjfields at uiuc.edu  Tue Feb 20 08:39:11 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 20 Feb 2007 07:39:11 -0600
Subject: [Bioperl-l] t/FeatureHolder.x
In-Reply-To: <F54A07D7-B546-46E5-AFB2-54251ACE9182@fruitfly.org>
References: <45D9E17F.4030302@sendu.me.uk>
	<534215C5-11BB-48FA-B6BF-D8B3E2BFE20A@uiuc.edu>
	<F54A07D7-B546-46E5-AFB2-54251ACE9182@fruitfly.org>
Message-ID: <67E26F10-67D5-405E-A00E-826EF51C476F@uiuc.edu>


On Feb 19, 2007, at 4:23 PM, Chris Mungall wrote:

> On Feb 19, 2007, at 12:04 PM, Chris Fields wrote:
>
>> Looks like that's some of Chris Mungall's stuff for GFF3.  Don't know
>> if he's stalking the mail list.
>
> occasionally..
>
>> Wonder if this has anything to do with the feature/annotation changes
>> around rel 1.5.
>
> possibly even before then.
>
> there was a reason for the .x suffix... I think it was intended to
> denote requirements; tests that don't pass yet but should in the
> future
>
> anyway, this file can go

Chris,

I removed it from CVS.  Thanks!

(the other) chris besides chris D.

P.S. I may have some Data::Stag questions for you at some point.  I'm  
guessing you're still at fruitfly.org?


From cjfields at uiuc.edu  Tue Feb 20 08:29:20 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 20 Feb 2007 07:29:20 -0600
Subject: [Bioperl-l] Fwd: help on remote blast
References: <20070220073200.M42567@bic.boseinst.ernet.in>
Message-ID: <6CC54E14-0581-45AF-8F12-E500A2FFDE86@uiuc.edu>

Sanjib,

You shouldn't email the developers directly.  Questions like this  
should go to the bioperl mail list in case I (or others) can't answer  
them immediately.

chris

Begin forwarded message:

> From: "Sanjib Kumar Gupta" <sanjib at bic.boseinst.ernet.in>
> Date: February 20, 2007 1:32:00 AM CST
> To: cjfields at uiuc.edu
> Subject: help on remote blast
>
> Dear Dr. Chris
> I am a very new user of BioPerl and have been using the script for
> retrieving some BLAST sequences, but it has suddenly stopped
> retrieving:
> #perl n9.pl
> te.pep
> waiting........
> for a long time
>
> I am attaching the file. Can you please tell me what I should do so
> that it runs again?
>
>
> --
> Sanjib Kumar Gupta
> Bioinformatics Centre
> Bose Institute
> Kolkata 700054, INDIA
> Phone  : +91-33-2355 6626, 2816, 2355 4766
> Fax    : +91-33-2355 3886
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: n9.pl
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070220/02f96eab/attachment-0002.pl>
-------------- next part --------------



From t.nugent at cs.ucl.ac.uk  Tue Feb 20 09:31:20 2007
From: t.nugent at cs.ucl.ac.uk (Tim Nugent)
Date: Tue, 20 Feb 2007 14:31:20 +0000
Subject: [Bioperl-l] Module to draw transmembrane protein topology
In-Reply-To: <29861210-1C15-4CFD-BA6F-1FF8EAAF8A62@uiuc.edu>
References: <45D9DE8C.2010301@cs.ucl.ac.uk>
	<29861210-1C15-4CFD-BA6F-1FF8EAAF8A62@uiuc.edu>
Message-ID: <45DB0638.1030001@cs.ucl.ac.uk>

Thanks Chris, glad it's appreciated.

Is there anything else I can do? If anyone has any requests/suggestions 
please let me know too.

Best wishes,

Tim

Chris Fields wrote:
> I think this is pretty nice!  We can add the code and test script to  
> bugzilla and (if someone has time) try to see where it might fit in,  
> though Bio::Graphics sounds like a good spot.
> 
> Anyone else have ideas on where this could go?
> 
> chris
> 
> On Feb 19, 2007, at 11:29 AM, Tim Nugent wrote:
> 
>> Hi everyone,
>>
>> I've written a perl module to display transmembrane protein topology
>> using GD. There are various options, including labels, helix/loop
>> dimensions, colour schemes etc but it only requires a string or array
>> containing the protein topology (e.g. transmembrane helix start/stop
>> points). It produces output like this:
>>
>> http://www.cs.ucl.ac.uk/staff/T.Nugent/images/blue.png
>> http://www.cs.ucl.ac.uk/staff/T.Nugent/images/yellow.png
>>
>> using the code at the bottom.
>>
>> Here is the module:
>> http://www.cs.ucl.ac.uk/staff/T.Nugent/source/DrawTransmembrane.pm
>>
>> I've never submitted anything to Bioperl before - is this sort of
>> thing likely to be of use to others? I imagine it would sit
>> alongside some of the Bio::Graphics stuff.
>>
>> Best wishes,
>>
>> Tim
>>
>> #!/usr/bin/perl
>>
>> use strict;
>> use warnings;
>> use lib '/scratch0/NOT_BACKED_UP/tnugent/perl5lib'; ## path to module
>> use DrawTransmembrane;
>>
>> my @topology = (20,45,59,70,86,109,145,168,194,220);
>>
>> my %labels = ('5' => '5 - Sulphation Site',
>>                '21' => '1st Helix',
>>                '47' => '40 - Mutation',
>>                '60' => 'Voltage Sensor',
>>                '72' => '72 - Mutation 2',
>>                '73' => '73 - Mutation 3',
>>                '138' => '138 - Glycosylation Site',
>>                '170' => '170 - Phosphorylation Site',
>>                '200' => 'Last Helix');
>>
>> my $im = DrawTransmembrane->draw_transmembrane(-title => 'This is a cartoon displaying transmembrane helices.',
>>                                                 -topology => \@topology,
>>                                                 -n_terminal => 'out',
>>                                                 -helix_width => 48,
>>                                                 -helix_height => 125,
>>                                                 -short_loop_limit => 10,
>>                                                 -long_loop_limit => 35,
>>                                                 -loop_width => 25,
>>                                                 -colour_scheme => 'yellow',
>>                                                 -labels => \%labels,
>>                                                 -text_offset => -10);
>>
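>> ## write the .png file
>> my $output = 'test.png';
>> open(my $out_fh, '>', $output) or die "Cannot open $output: $!";
>> binmode $out_fh;
>> print {$out_fh} $im->png;
>> close($out_fh) or die "Cannot close $output: $!";
>>
>> ## view the image (display is ImageMagick's viewer)
>> system('display', $output) == 0 or warn "Could not run display: $?";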
>>
>> -- 
>> Tim Nugent (MRes)
>> Research Student
>> Bioinformatics Unit
>> Department of Computer Science
>> University College London
>> Gower Street
>> London WC1E 6BT
>> Tel: 020-7679-0410
>> t.nugent at ucl.ac.uk
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
> 
> 
> 

-- 
Tim Nugent (MRes)
Research Student
Bioinformatics Unit
Department of Computer Science
University College London
Gower Street
London WC1E 6BT
Tel: 020-7679-0410
t.nugent at ucl.ac.uk


From marian.thieme at lycos.de  Tue Feb 20 08:34:24 2007
From: marian.thieme at lycos.de (marian thieme)
Date: Tue, 20 Feb 2007 13:34:24 +0000
Subject: [Bioperl-l] Alignment
Message-ID: <188661178021328@lycos-europe.com>

Hi all,

perhaps somebody can give some comments in the following matter:

I have a series of sequences which should be aligned against a reference sequence.
In this special case we don't need to calculate anything; we only need to represent the sequences and extract, for instance, some columns of interest.
The problem is that some sequences have gaps, and we need to represent gaps in the reference sequence as well as in some individual sequences.

Question: Can I use LocatableSeq to describe sequences with gaps and add them to the alignment?
If so, how should I understand the example in the documentation:
use Bio::LocatableSeq;
my $seq = Bio::LocatableSeq->new(-seq => "CAGT-GGT", -id => "seq1", -start => 1, -end => 7);

Does the "-" sign represent a gap? If this sequence starts at position 1,
why does it end at position 7, when there are 8 positions if the gap is counted?
Can the SimpleAlign object handle the gap?


Thanks for your attention,
Marian


From cjfields at uiuc.edu  Tue Feb 20 09:40:38 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 20 Feb 2007 08:40:38 -0600
Subject: [Bioperl-l] Module to draw transmembrane protein topology
In-Reply-To: <45DB0638.1030001@cs.ucl.ac.uk>
References: <45D9DE8C.2010301@cs.ucl.ac.uk>
	<29861210-1C15-4CFD-BA6F-1FF8EAAF8A62@uiuc.edu>
	<45DB0638.1030001@cs.ucl.ac.uk>
Message-ID: <E1D718F1-E0FA-496B-9798-7EC84E2D4439@uiuc.edu>

You can add the module and test code (the script) to bugzilla:

http://www.bioperl.org/wiki/Bugs
http://bugzilla.open-bio.org/

Basically, file a new bug report, but note that it is an enhancement
request when filling it out.  Attach the code and test script to the
report after it is generated (it may be easier to add all of the files
together as a zipped archive).  I think you could also add the graphical
output as binary files if they are huge.

chris

On Feb 20, 2007, at 8:31 AM, Tim Nugent wrote:

> Thanks Chris, glad it's appreciated.
>
> Is there anything else I can do? If anyone has any requests/suggestions
> please let me know too.
>
> Best wishes,
>
> Tim
>
> Chris Fields wrote:
>> I think this is pretty nice!  We can add the code and test script
>> to bugzilla and (if someone has time) try to see where it might
>> fit in, though Bio::Graphics sounds like a good spot.
>> Anyone else have ideas on where this could go?
>> chris
>> On Feb 19, 2007, at 11:29 AM, Tim Nugent wrote:
>>> Hi everyone,
>>>
>>> I've written a perl module to display transmembrane protein topology
>>> using GD. There are various options, including labels, helix/loop
>>> dimensions, colour schemes etc but it only requires a string or array
>>> containing the protein topology (e.g. transmembrane helix start/stop
>>> points). It produces output like this:
>>>
>>> http://www.cs.ucl.ac.uk/staff/T.Nugent/images/blue.png
>>> http://www.cs.ucl.ac.uk/staff/T.Nugent/images/yellow.png
>>>
>>> using the code at the bottom.
>>>
>>> Here is the module:
>>> http://www.cs.ucl.ac.uk/staff/T.Nugent/source/DrawTransmembrane.pm
>>>
>>> I've never submitted anything to Bioperl before - is this sort of thing
>>> likely to be of use to others? I imagine it would sit alongside some of
>>> the Bio::Graphics stuff.
>>>
>>> Best wishes,
>>>
>>> Tim
>>>
>>> #!/usr/bin/perl
>>>
>>> use strict;
>>> use warnings;
>>> use lib '/scratch0/NOT_BACKED_UP/tnugent/perl5lib'; ## path to module
>>> use DrawTransmembrane;
>>>
>>> my @topology = (20,45,59,70,86,109,145,168,194,220);
>>>
>>> my %labels = ('5' => '5 - Sulphation Site',
>>>                '21' => '1st Helix',
>>>                '47' => '40 - Mutation',
>>>                '60' => 'Voltage Sensor',
>>>                '72' => '72 - Mutation 2',
>>>                '73' => '73 - Mutation 3',
>>>                '138' => '138 - Glycosylation Site',
>>>                '170' => '170 - Phosphorylation Site',
>>>                '200' => 'Last Helix');
>>>
>>> my $im = DrawTransmembrane->draw_transmembrane(
>>>     -title            => 'This is a cartoon displaying transmembrane helices.',
>>>     -topology         => \@topology,
>>>     -n_terminal       => 'out',
>>>     -helix_width      => 48,
>>>     -helix_height     => 125,
>>>     -short_loop_limit => 10,
>>>     -long_loop_limit  => 35,
>>>     -loop_width       => 25,
>>>     -colour_scheme    => 'yellow',
>>>     -labels           => \%labels,
>>>     -text_offset      => -10,
>>> );
>>>
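>>> ## write the .png file
>>> my $output = 'test.png';
>>> open(my $out_fh, '>', $output) or die "Cannot open $output: $!";
>>> binmode $out_fh;
>>> print {$out_fh} $im->png;
>>> close($out_fh) or die "Cannot close $output: $!";
>>>
>>> ## view the image (display is ImageMagick's viewer)
>>> system('display', $output) == 0 or warn "Could not run display: $?";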
>>>
>>> -- 
>>> Tim Nugent (MRes)
>>> Research Student
>>> Bioinformatics Unit
>>> Department of Computer Science
>>> University College London
>>> Gower Street
>>> London WC1E 6BT
>>> Tel: 020-7679-0410
>>> t.nugent at ucl.ac.uk
>> Christopher Fields
>> Postdoctoral Researcher
>> Lab of Dr. Robert Switzer
>> Dept of Biochemistry
>> University of Illinois Urbana-Champaign
>
> -- 
> Tim Nugent (MRes)
> Research Student
> Bioinformatics Unit
> Department of Computer Science
> University College London
> Gower Street
> London WC1E 6BT
> Tel: 020-7679-0410
> t.nugent at ucl.ac.uk

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From avilella at gmail.com  Tue Feb 20 10:30:17 2007
From: avilella at gmail.com (Albert Vilella)
Date: Tue, 20 Feb 2007 15:30:17 +0000
Subject: [Bioperl-l] Alignment
In-Reply-To: <188661178021328@lycos-europe.com>
References: <188661178021328@lycos-europe.com>
Message-ID: <358f4d650702200730x5fd09bb8p7dec8ce740553378@mail.gmail.com>

I think the SimpleAlign object contains a set of sequences, each of
which is a LocatableSeq object.

These LocatableSeq objects will have gaps, represented by '-' or
whatever other symbol is specified (I think there are methods for it),
and then one can use methods like column_from_residue_number to map
the coordinates between the primary sequence and the aligned sequence.
The perldoc for LocatableSeq has some examples on how to use these
methods.
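To make this concrete for Marian's example: in Bio::LocatableSeq, -start and -end count residues, not alignment columns, so the gap in "CAGT-GGT" is why the sequence ends at 7 rather than 8. The plain-Perl sketch below illustrates that bookkeeping and the idea behind column_from_residue_number; the helpers residue_count and column_of_residue are hypothetical (not BioPerl code), and treating only '-' and '.' as gap symbols is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: number of residues (non-gap characters) in an
# aligned string. Assumes '-' and '.' are the only gap symbols.
sub residue_count {
    my ($aligned) = @_;
    (my $ungapped = $aligned) =~ tr/-.//d;
    return length $ungapped;
}

# Hypothetical helper: 1-based alignment column that holds residue number
# $res, where $start is the residue number of the first non-gap character.
# Returns undef if $res is not in the sequence.
sub column_of_residue {
    my ($aligned, $start, $res) = @_;
    my $current = $start - 1;
    for my $col (1 .. length $aligned) {
        my $char = substr($aligned, $col - 1, 1);
        next if $char eq '-' || $char eq '.';
        $current++;
        return $col if $current == $res;
    }
    return undef;
}

my $aligned = 'CAGT-GGT';
my $start   = 1;
my $end     = $start + residue_count($aligned) - 1;
print "end = $end\n";    # end = 7
print "residue 5 is in column ", column_of_residue($aligned, $start, 5), "\n";    # column 6
```

With BioPerl installed, the same lookup would go through the LocatableSeq object's own column_from_residue_number method rather than a helper like this.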

[Hopefully I haven't written any lie in this message],

Cheers,

    Albert.

On 2/20/07, marian thieme <marian.thieme at lycos.de> wrote:
> Hi all,
>
> perhaps somebody can give some comments in the following matter:
>
> I have a series of sequences which should be aligned against a reference sequence.
> In this special case we don't need to calculate anything; we only need to represent the sequences and extract, for instance, some columns of interest.
> The problem is that some sequences have gaps, and we need to represent gaps in the reference sequence as well as in some individual sequences.
>
> Question: Can I use LocatableSeq to describe sequences with gaps and add them to the alignment?
> If so, how should I understand the example in the documentation:
> use Bio::LocatableSeq;
> my $seq = Bio::LocatableSeq->new(-seq => "CAGT-GGT", -id => "seq1", -start => 1, -end => 7);
>
> Does the "-" sign represent a gap? If this sequence starts at position 1,
> why does it end at position 7, when there are 8 positions if the gap is counted?
> Can the SimpleAlign object handle the gap?
>
>
> Thanks for your attention,
> Marian
>
>


From cjfields at uiuc.edu  Tue Feb 20 10:30:15 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 20 Feb 2007 09:30:15 -0600
Subject: [Bioperl-l] Module to draw transmembrane protein topology
In-Reply-To: <45DB0638.1030001@cs.ucl.ac.uk>
References: <45D9DE8C.2010301@cs.ucl.ac.uk>
	<29861210-1C15-4CFD-BA6F-1FF8EAAF8A62@uiuc.edu>
	<45DB0638.1030001@cs.ucl.ac.uk>
Message-ID: <4BC43404-FFB2-4350-81DF-812E31AD7CB3@uiuc.edu>

Sorry, I sent that last one off prematurely.

I could see this being used as a very useful utility if a Bioperl  
object had SeqFeatures which described transmembrane regions, or if  
output from something like TMHMM were parsed and used for input.   
Don't know if it's included, but if not you probably should allow  
labeling of the intracellular/extracellular space to designate  
periplasmic space, mitochondrial matrix, thylakoid, etc.

I think Bio::Graphics namespace is definitely the place to go.  If I  
ever get around to writing up the RNA structural stuff I may put  
something there myself.

chris

On Feb 20, 2007, at 8:31 AM, Tim Nugent wrote:

> Thanks Chris, glad it's appreciated.
>
> Is there anything else I can do? If anyone has any requests/suggestions
> please let me know too.
>
> Best wishes,
>
> Tim


From cjfields at uiuc.edu  Tue Feb 20 10:49:56 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 20 Feb 2007 09:49:56 -0600
Subject: [Bioperl-l] Alignment
In-Reply-To: <358f4d650702200730x5fd09bb8p7dec8ce740553378@mail.gmail.com>
References: <188661178021328@lycos-europe.com>
	<358f4d650702200730x5fd09bb8p7dec8ce740553378@mail.gmail.com>
Message-ID: <97E36074-1CF4-4348-85AB-DF23F1048727@uiuc.edu>


On Feb 20, 2007, at 9:30 AM, Albert Vilella wrote:

> I think the SimpleAlign object contains a set of sequences, each of
> which is a LocatableSeq object.
>
> These LocatableSeq objects will have gaps, represented by '-' or
> whatever other symbol is specified (I think there are methods for it),
> and then one can use methods like column_from_residue_number to map
> the coordinates between the primary sequence and the aligned sequence.
> The perldoc for LocatableSeq has some examples on how to use these
> methods.
>
> [Hopefully I haven't written any lie in this message],
>
> Cheers,
>
>     Albert.

No lies.  The comparison methods are in SimpleAlign; if you look in
SimpleAlign.t you'll see several demos of how to go about adding
LocatableSeqs to a SimpleAlign object and then using SimpleAlign
methods on them.
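As a rough illustration of what pulling out "columns of interest" means once every sequence has been padded with gaps to the alignment length, here is a plain-Perl sketch that deliberately avoids BioPerl (SimpleAlign's own methods, demonstrated in SimpleAlign.t, are the route recommended above). The column_chars helper and the three aligned strings are invented for the example, and equal-length strings are assumed, as in a SimpleAlign.

```perl
use strict;
use warnings;

# Hypothetical helper: characters found in alignment column $col (1-based)
# for every sequence, keyed by sequence id. Assumes all aligned strings
# have the same length, as they do inside a SimpleAlign object.
sub column_chars {
    my ($aln, $col) = @_;
    my %chars;
    for my $id (keys %$aln) {
        die "column $col is outside sequence $id\n"
            if $col < 1 || $col > length $aln->{$id};
        $chars{$id} = substr($aln->{$id}, $col - 1, 1);
    }
    return \%chars;
}

# Invented example data: a gapped reference plus two gapped sequences.
my %aln = (
    ref  => 'CAGT-GGT',
    seq1 => 'CA-TAGGT',
    seq2 => 'CAGTAG-T',
);

for my $col (3, 5) {
    my $chars = column_chars(\%aln, $col);
    print "column $col: ",
        join(' ', map { "$_=$chars->{$_}" } sort keys %$chars), "\n";
}
```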

chris

PS (to marian): I'm a bit behind this week, so the bracket_strings  
stuff is lagging behind; I'm writing up some stuff on a deadline.


From t.nugent at cs.ucl.ac.uk  Tue Feb 20 10:50:10 2007
From: t.nugent at cs.ucl.ac.uk (Tim Nugent)
Date: Tue, 20 Feb 2007 15:50:10 +0000
Subject: [Bioperl-l] Module to draw transmembrane protein topology
In-Reply-To: <4BC43404-FFB2-4350-81DF-812E31AD7CB3@uiuc.edu>
References: <45D9DE8C.2010301@cs.ucl.ac.uk>
	<29861210-1C15-4CFD-BA6F-1FF8EAAF8A62@uiuc.edu>
	<45DB0638.1030001@cs.ucl.ac.uk>
	<4BC43404-FFB2-4350-81DF-812E31AD7CB3@uiuc.edu>
Message-ID: <45DB18B2.8070004@cs.ucl.ac.uk>

Labeling of inside/outside and membrane is already possible via
-inside_label, -outside_label and -membrane_label tags; the defaults are
intracellular, extracellular and plasma membrane.

I was definitely going to add an input parser for MEMSAT, developed here
at UCL, and probably for a few other popular TM predictors too, e.g.
PHOBIUS, TMHMM etc. It can already accept topology in the string format
used by OPM (http://opm.phar.umich.edu/).

Tim


Chris Fields wrote:
> Sorry, I sent that last one off prematurely.
> 
> I could see this being used as a very useful utility if a Bioperl object 
> had SeqFeatures which described transmembrane regions, or if output from 
> something like TMHMM were parsed and used for input.  Don't know if it's 
> included, but if not you probably should allow labeling of the 
> intracellular/extracellular space to designate periplasmic space, 
> mitochondrial matrix, thylakoid, etc.
> 
> I think Bio::Graphics namespace is definitely the place to go.  If I 
> ever get around to writing up the RNA structural stuff I may put 
> something there myself.
> 
> chris
> 
> On Feb 20, 2007, at 8:31 AM, Tim Nugent wrote:
> 
>> Thanks Chris, glad it's appreciated.
>>
>> Is there anything else I can do? If anyone has any requests/suggestions
>> please let me know too.
>>
>> Best wishes,
>>
>> Tim
> 
> 

-- 
Tim Nugent (MRes)
Research Student
Bioinformatics Unit
Department of Computer Science
University College London
Gower Street
London WC1E 6BT
Tel: 020-7679-0410
t.nugent at ucl.ac.uk


From cjfields at uiuc.edu  Tue Feb 20 11:09:00 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 20 Feb 2007 10:09:00 -0600
Subject: [Bioperl-l] Module to draw transmembrane protein topology
In-Reply-To: <45DB18B2.8070004@cs.ucl.ac.uk>
References: <45D9DE8C.2010301@cs.ucl.ac.uk>
	<29861210-1C15-4CFD-BA6F-1FF8EAAF8A62@uiuc.edu>
	<45DB0638.1030001@cs.ucl.ac.uk>
	<4BC43404-FFB2-4350-81DF-812E31AD7CB3@uiuc.edu>
	<45DB18B2.8070004@cs.ucl.ac.uk>
Message-ID: <FF7B4076-FA5A-4F44-ADE7-A44D2FCF4599@uiuc.edu>


On Feb 20, 2007, at 9:50 AM, Tim Nugent wrote:

> Labeling of inside/outside and membrane is already possible via
> -inside_label, -outside_label and -membrane_label tags, defaults are
> intracellular, extracellular and plasma membrane.
>
> Was definitely going to add an input/parser for MEMSAT, developed  
> here at UCL, and probably a few other popular TM predictors too,  
> e.g. PHOBIUS, TMHMM etc. Can already accept topology in the string  
> format used by OPM (http://opm.phar.umich.edu/).
>
> Tim

I'll definitely have to take a closer look at it when I have time.   
My guess is the best fit for the data would be SeqFeatures, either in a
collection or attached to a Bio::Seq.  As for the parsers, you can look at the
Bio::Tools::Tmhmm module, which scans TMHMM output and converts
everything to SeqFeatures.
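One way to picture the SeqFeature suggestion is to split the flat @topology array from Tim's script into one start/end record per helix, each of which could then be turned into a feature (for example a Bio::SeqFeature::Generic, not shown here so the sketch runs without BioPerl). This assumes, as in the script's example, that the array lists helix start/end pairs, and that loop sides simply alternate because each helix crosses the membrane once; helices_from_topology, loop_side and the 'TMhelix' tag are hypothetical, not taken from DrawTransmembrane.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a flat list of helix boundaries
# (start1, end1, start2, end2, ...) into one record per helix.
sub helices_from_topology {
    my @bounds = @_;
    die "topology needs an even number of positions\n" if @bounds % 2;
    my @helices;
    while (my ($start, $end) = splice(@bounds, 0, 2)) {
        die "helix end $end precedes start $start\n" if $end < $start;
        push @helices, { start => $start, end => $end, tag => 'TMhelix' };
    }
    return @helices;
}

# Hypothetical helper: membrane side ('in' or 'out') of the loop that
# precedes helix $i (0-based); $i == number of helices gives the C-terminus.
# Sides alternate because each helix is assumed to cross the membrane once.
sub loop_side {
    my ($n_terminal, $i) = @_;
    my $other = $n_terminal eq 'out' ? 'in' : 'out';
    return $i % 2 ? $other : $n_terminal;
}

my @topology = (20, 45, 59, 70, 86, 109, 145, 168, 194, 220);    # from Tim's script
my @helices  = helices_from_topology(@topology);
for my $i (0 .. $#helices) {
    printf "helix %d: %d-%d (preceding loop: %s)\n",
        $i + 1, $helices[$i]{start}, $helices[$i]{end}, loop_side('out', $i);
}
printf "C-terminus: %s\n", loop_side('out', scalar @helices);
```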

chris


From lstein at cshl.edu  Tue Feb 20 12:25:24 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Tue, 20 Feb 2007 12:25:24 -0500
Subject: [Bioperl-l] Bio::Graphics::FeatureBase seq_id question
In-Reply-To: <45DAD8F6.1030409@sendu.me.uk>
References: <45DAD8F6.1030409@sendu.me.uk>
Message-ID: <6dce9a0b0702200925g74d2db53j3252cca8a41765b@mail.gmail.com>

Just an oversight. I'll fix it.

Lincoln

On 2/20/07, Sendu Bala <bix at sendu.me.uk> wrote:
>
> Bio::Graphics::FeatureBase::seq_id is currently implemented as a
> read-only alias to ref():
> sub seq_id          { shift->ref()         }
>
>
> What is the reasoning behind this? Can it be made to handle setting of
> the value as well?:
> sub seq_id          { shift->ref(@_)       }
>
>
> Cheers,
> Sendu.
>
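Sendu's proposed one-liner works because shift removes the object from @_, so ref(@_) receives only the caller's arguments, while the current ref() call passes none and any new value is silently dropped. A self-contained sketch with a hypothetical stand-in class (not Bio::Graphics::FeatureBase itself), assuming ref() acts as a combined getter/setter:

```perl
use strict;
use warnings;

package Hypothetical::Feature;    # stand-in class, NOT Bio::Graphics::FeatureBase

sub new {
    my ($class, $ref) = @_;
    my $self = { _ref => $ref };
    return bless $self, $class;
}

# Assumed combined getter/setter: stores a new value when one is passed.
sub ref {
    my $self = shift;
    $self->{_ref} = shift if @_;
    return $self->{_ref};
}

# Read-only alias (current behaviour): any argument is silently dropped.
sub seq_id_readonly { shift->ref() }

# Read/write alias (proposed fix): forwards the remaining arguments to ref().
sub seq_id { shift->ref(@_) }

package main;

my $f = Hypothetical::Feature->new('chr1');
$f->seq_id_readonly('chr2');    # argument ignored
print $f->seq_id, "\n";         # chr1
$f->seq_id('chr3');             # forwarded to ref()
print $f->ref, "\n";            # chr3
```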


-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu


From khan at cshl.edu  Tue Feb 20 15:42:12 2007
From: khan at cshl.edu (Khan, Sohail)
Date: Tue, 20 Feb 2007 15:42:12 -0500
Subject: [Bioperl-l] parsing a list of ids to a fasta file.
Message-ID: <C8696843AE995F4EA4CDC3E2B83482A9018791C1@mailbox02.cshl.edu>

Dear list,

I am new to BioPerl and have the following question:
I have a list of IDs that I would like to look up in a large FASTA file, retrieving the sequence for each ID.  I would appreciate any suggestions.
Thanks.

Khan


From michael.watson at bbsrc.ac.uk  Tue Feb 20 16:33:19 2007
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Tue, 20 Feb 2007 21:33:19 -0000
Subject: [Bioperl-l] parsing a list of ids to a fasta file.
References: <C8696843AE995F4EA4CDC3E2B83482A9018791C1@mailbox02.cshl.edu>
Message-ID: <8975119BCD0AC5419D61A9CF1A923E95020680FD@iahce2ksrv1.iah.bbsrc.ac.uk>

Suggest you use Bio::Index::Fasta to create an index for the fasta file and then a simple script to retrieve sequences using that index.  Or just use the pre-written bp_index.PLS and bp_fetch.PLS scripts.
 
http://www.bioperl.org/wiki/Module:Bio::Index::Fasta
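Bio::Index::Fasta pays off when the same large file is queried repeatedly. For a one-off extraction, a different and simpler technique is to load the IDs into a hash and stream through the FASTA file once. A plain-Perl sketch of that approach follows; read_ids and filter_fasta are hypothetical names, and it assumes the ID is the first whitespace-delimited word after '>' on each header line.

```perl
use strict;
use warnings;

# Read one ID per line into a hash for constant-time lookups.
sub read_ids {
    my ($id_fh) = @_;
    my %want;
    while (my $line = <$id_fh>) {
        $line =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
        $want{$line} = 1 if length $line;
    }
    return \%want;
}

# Stream FASTA records from $in_fh and copy those whose ID (first word
# after '>') is wanted to $out_fh. Returns the number of records copied.
sub filter_fasta {
    my ($in_fh, $out_fh, $want) = @_;
    my ($keep, $count) = (0, 0);
    while (my $line = <$in_fh>) {
        if ($line =~ /^>/) {
            my ($id) = $line =~ /^>(\S+)/;
            $keep = (defined $id && exists $want->{$id}) ? 1 : 0;
            $count++ if $keep;
        }
        print {$out_fh} $line if $keep;
    }
    return $count;
}

if (@ARGV == 2) {
    my ($id_file, $fasta_file) = @ARGV;
    open my $id_fh, '<', $id_file    or die "Cannot open $id_file: $!\n";
    open my $in_fh, '<', $fasta_file or die "Cannot open $fasta_file: $!\n";
    my $n = filter_fasta($in_fh, \*STDOUT, read_ids($id_fh));
    warn "$n sequences written\n";
}
```

Saved as, say, extract_ids.pl (a hypothetical name), it would be run as `perl extract_ids.pl ids.txt big.fasta > subset.fasta`.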

________________________________

From: bioperl-l-bounces at lists.open-bio.org on behalf of Khan, Sohail
Sent: Tue 20/02/2007 8:42 PM
To: Bioperl-l at lists.open-bio.org
Subject: [Bioperl-l] parsing a list of ids to a fasta file.


Dear list,

I am new to BioPerl and have the following question:
I have a list of IDs that I would like to look up in a large FASTA file, retrieving the sequence for each ID.  I would appreciate any suggestions.
Thanks.

Khan




From neetisomaiya at gmail.com  Wed Feb 21 03:19:14 2007
From: neetisomaiya at gmail.com (neeti somaiya)
Date: Wed, 21 Feb 2007 13:49:14 +0530
Subject: [Bioperl-l] need help in Bio-SCF
Message-ID: <764978cf0702210019ncfa2b17l8d57088eb5ae278a@mail.gmail.com>

Hi All,

I downloaded the module
Bio-SCF-1.01 <http://search.cpan.org/%7Elds/Bio-SCF-1.01/> from CPAN.
When I tried to install it, I got the following error. Can someone
please guide me?

[root at ps2288 Bio-SCF-1.01]# perl Makefile.PL
Checking if your kit is complete...
Looks good
Note (probably harmless): No library found for -lread
Writing Makefile for Bio::SCF

[root at ps2288 Bio-SCF-1.01]# make
cp SCF.pm blib/lib/Bio/SCF.pm
cp SCF/Arrays.pm blib/lib/Bio/SCF/Arrays.pm
/usr/bin/perl /usr/lib/perl5/5.8.5/ExtUtils/xsubpp  -typemap
/usr/lib/perl5/5.8.5/ExtUtils/typemap  SCF.xs > SCF.xsc && mv SCF.xsc SCF.c
Please specify prototyping behavior for SCF.xs (see perlxs manual)
gcc -c   -D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBUGGING
-fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE
-D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm -O2 -g -pipe -m32 -march=i386
-mtune=pentium4   -DVERSION=\"1.01\" -DXS_VERSION=\"1.01\" -fPIC
"-I/usr/lib/perl5/5.8.5/i386-linux-thread-multi/CORE"  -DLITTLE_ENDIAN SCF.c
SCF.xs:12:24: io_lib/scf.h: No such file or directory
SCF.xs:13:26: io_lib/mFILE.h: No such file or directory
SCF.xs: In function `XS_Bio__SCF_get_scf_pointer':
SCF.xs:27: error: `Scf' undeclared (first use in this function)
SCF.xs:27: error: (Each undeclared identifier is reported only once
SCF.xs:27: error: for each function it appears in.)
SCF.xs:27: error: `scf_data' undeclared (first use in this function)
SCF.xs: In function `XS_Bio__SCF_get_scf_fpointer':
SCF.xs:66: error: `Scf' undeclared (first use in this function)
SCF.xs:66: error: `scf_data' undeclared (first use in this function)
SCF.xs:68: error: `mFILE' undeclared (first use in this function)
SCF.xs:68: error: `mf' undeclared (first use in this function)
SCF.xs: In function `XS_Bio__SCF_scf_free':
SCF.xs:89: error: `Scf' undeclared (first use in this function)
SCF.xs:89: error: syntax error before ')' token
SCF.xs: In function `XS_Bio__SCF_get_comments':
SCF.xs:95: error: `Scf' undeclared (first use in this function)
SCF.xs:95: error: `scf_data' undeclared (first use in this function)
SCF.xs:95: error: syntax error before ')' token
SCF.xs: In function `XS_Bio__SCF_set_comments':
SCF.xs:108: error: `Scf' undeclared (first use in this function)
SCF.xs:108: error: `scf_data' undeclared (first use in this function)
SCF.xs:108: error: syntax error before ')' token
SCF.xs: In function `XS_Bio__SCF_scf_write':
SCF.xs:121: error: `Scf' undeclared (first use in this function)
SCF.xs:121: error: `scf_data' undeclared (first use in this function)
SCF.xs:121: error: syntax error before ')' token
SCF.xs: In function `XS_Bio__SCF_scf_fwrite':
SCF.xs:135: error: `mFILE' undeclared (first use in this function)
SCF.xs:135: error: `mf' undeclared (first use in this function)
SCF.xs:137: error: `Scf' undeclared (first use in this function)
SCF.xs:137: error: `scf_data' undeclared (first use in this function)
SCF.xs:137: error: syntax error before ')' token
SCF.xs: In function `XS_Bio__SCF_get_from_header':
SCF.xs:159: error: `Scf' undeclared (first use in this function)
SCF.xs:159: error: `scf_data' undeclared (first use in this function)
SCF.xs:159: error: syntax error before ')' token
SCF.xs: In function `XS_Bio__SCF_get_at':
SCF.xs:186: error: `Scf' undeclared (first use in this function)
SCF.xs:186: error: `scf_data' undeclared (first use in this function)
SCF.xs:186: error: syntax error before ')' token
SCF.xs: In function `XS_Bio__SCF_set_base_at':
SCF.xs:242: error: `Scf' undeclared (first use in this function)
SCF.xs:242: error: `scf_data' undeclared (first use in this function)
SCF.xs:242: error: syntax error before ')' token
SCF.xs: In function `XS_Bio__SCF_set_at':
SCF.xs:255: error: `Scf' undeclared (first use in this function)
SCF.xs:255: error: `scf_data' undeclared (first use in this function)
SCF.xs:255: error: syntax error before ')' token
make: *** [SCF.o] Error 1


-- 
-Neeti
Even my blood says, B positive



From sdavis2 at mail.nih.gov  Wed Feb 21 06:17:50 2007
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Wed, 21 Feb 2007 06:17:50 -0500
Subject: [Bioperl-l] need help in Bio-SCF
In-Reply-To: <764978cf0702210019ncfa2b17l8d57088eb5ae278a@mail.gmail.com>
References: <764978cf0702210019ncfa2b17l8d57088eb5ae278a@mail.gmail.com>
Message-ID: <200702210617.50616.sdavis2@mail.nih.gov>

On Wednesday 21 February 2007 03:19, neeti somaiya wrote:
> Hi All,
>
> I downloaded the module
> Bio-SCF-1.01 <http://search.cpan.org/%7Elds/Bio-SCF-1.01/> from CPAN.
> When I tried to install it, I got the following error. Can someone
> please guide me?

You will probably need to read the INSTALL document; a couple of libraries
must be installed first.  It looks like you don't have the Staden io_lib
installed (the missing io_lib/scf.h and io_lib/mFILE.h headers come from it).


> [root at ps2288 Bio-SCF-1.01]# perl Makefile.PL
> Checking if your kit is complete...
> Looks good
> Note (probably harmless): No library found for -lread
> Writing Makefile for Bio::SCF
>
> [root at ps2288 Bio-SCF-1.01]# make
> cp SCF.pm blib/lib/Bio/SCF.pm
> cp SCF/Arrays.pm blib/lib/Bio/SCF/Arrays.pm
> /usr/bin/perl /usr/lib/perl5/5.8.5/ExtUtils/xsubpp  -typemap
> /usr/lib/perl5/5.8.5/ExtUtils/typemap  SCF.xs > SCF.xsc && mv SCF.xsc SCF.c
> Please specify prototyping behavior for SCF.xs (see perlxs manual)
> gcc -c   -D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBUGGING
> -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE
> -D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm -O2 -g -pipe -m32 -march=i386
> -mtune=pentium4   -DVERSION=\"1.01\" -DXS_VERSION=\"1.01\" -fPIC
> "-I/usr/lib/perl5/5.8.5/i386-linux-thread-multi/CORE"  -DLITTLE_ENDIAN SCF.c
> SCF.xs:12:24: io_lib/scf.h: No such file or directory
> SCF.xs:13:26: io_lib/mFILE.h: No such file or directory
> SCF.xs: In function `XS_Bio__SCF_get_scf_pointer':
> SCF.xs:27: error: `Scf' undeclared (first use in this function)
> SCF.xs:27: error: (Each undeclared identifier is reported only once
> SCF.xs:27: error: for each function it appears in.)
> SCF.xs:27: error: `scf_data' undeclared (first use in this function)
> SCF.xs: In function `XS_Bio__SCF_get_scf_fpointer':
> SCF.xs:66: error: `Scf' undeclared (first use in this function)
> SCF.xs:66: error: `scf_data' undeclared (first use in this function)
> SCF.xs:68: error: `mFILE' undeclared (first use in this function)
> SCF.xs:68: error: `mf' undeclared (first use in this function)
> SCF.xs: In function `XS_Bio__SCF_scf_free':
> SCF.xs:89: error: `Scf' undeclared (first use in this function)
> SCF.xs:89: error: syntax error before ')' token
> SCF.xs: In function `XS_Bio__SCF_get_comments':
> SCF.xs:95: error: `Scf' undeclared (first use in this function)
> SCF.xs:95: error: `scf_data' undeclared (first use in this function)
> SCF.xs:95: error: syntax error before ')' token
> SCF.xs: In function `XS_Bio__SCF_set_comments':
> SCF.xs:108: error: `Scf' undeclared (first use in this function)
> SCF.xs:108: error: `scf_data' undeclared (first use in this function)
> SCF.xs:108: error: syntax error before ')' token
> SCF.xs: In function `XS_Bio__SCF_scf_write':
> SCF.xs:121: error: `Scf' undeclared (first use in this function)
> SCF.xs:121: error: `scf_data' undeclared (first use in this function)
> SCF.xs:121: error: syntax error before ')' token
> SCF.xs: In function `XS_Bio__SCF_scf_fwrite':
> SCF.xs:135: error: `mFILE' undeclared (first use in this function)
> SCF.xs:135: error: `mf' undeclared (first use in this function)
> SCF.xs:137: error: `Scf' undeclared (first use in this function)
> SCF.xs:137: error: `scf_data' undeclared (first use in this function)
> SCF.xs:137: error: syntax error before ')' token
> SCF.xs: In function `XS_Bio__SCF_get_from_header':
> SCF.xs:159: error: `Scf' undeclared (first use in this function)
> SCF.xs:159: error: `scf_data' undeclared (first use in this function)
> SCF.xs:159: error: syntax error before ')' token
> SCF.xs: In function `XS_Bio__SCF_get_at':
> SCF.xs:186: error: `Scf' undeclared (first use in this function)
> SCF.xs:186: error: `scf_data' undeclared (first use in this function)
> SCF.xs:186: error: syntax error before ')' token
> SCF.xs: In function `XS_Bio__SCF_set_base_at':
> SCF.xs:242: error: `Scf' undeclared (first use in this function)
> SCF.xs:242: error: `scf_data' undeclared (first use in this function)
> SCF.xs:242: error: syntax error before ')' token
> SCF.xs: In function `XS_Bio__SCF_set_at':
> SCF.xs:255: error: `Scf' undeclared (first use in this function)
> SCF.xs:255: error: `scf_data' undeclared (first use in this function)
> SCF.xs:255: error: syntax error before ')' token
> make: *** [SCF.o] Error 1


From sdavis2 at mail.nih.gov  Wed Feb 21 06:17:50 2007
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Wed, 21 Feb 2007 06:17:50 -0500
Subject: [Bioperl-l] need help in Bio-SCF
In-Reply-To: <764978cf0702210019ncfa2b17l8d57088eb5ae278a@mail.gmail.com>
References: <764978cf0702210019ncfa2b17l8d57088eb5ae278a@mail.gmail.com>
Message-ID: <200702210617.50616.sdavis2@mail.nih.gov>

On Wednesday 21 February 2007 03:19, neeti somaiya wrote:
> Hi All,
>
> I downloaded module
> Bio-SCF-1.01 <http://search.cpan.org/%7Elds/Bio-SCF-1.01/> from CPAN.
> And I am trying to install it when I got the following error. Can someone
> please guide me.

You will probably need to read the INSTALL document.  You need to install a 
couple of libraries first.  Looks like you don't have the staden io-lib 
installed.
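In the log below, the first real failures are the two missing headers (io_lib/scf.h and io_lib/mFILE.h). The later `Scf`, `scf_data` and `mFILE` "undeclared" errors most likely all follow from those, and the "No library found for -lread" note from Makefile.PL likely points the same way. Before rerunning make, a minimal Perl sketch like the following could check whether the headers are visible. It assumes io_lib installs them under an io_lib/ subdirectory of an include path, as the #include lines in SCF.xs imply. The helper name `find_io_lib_include_dirs` and the two candidate directories are illustrative and not part of Bio::SCF.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: return those candidate include directories that
# contain both headers SCF.xs includes from Staden io_lib.
sub find_io_lib_include_dirs {
    my @candidates = @_;
    my @headers = (
        File::Spec->catfile('io_lib', 'scf.h'),
        File::Spec->catfile('io_lib', 'mFILE.h'),
    );
    my @found;
    for my $dir (@candidates) {
        # keep $dir only if every required header exists beneath it
        my $all_present = 1;
        for my $h (@headers) {
            my $path = File::Spec->catfile($dir, $h);
            $all_present = 0 unless -f $path;
        }
        push @found, $dir if $all_present;
    }
    return @found;
}

# /usr/local/include is already on the gcc command line in the log;
# /usr/include is the usual system default. Adjust for other prefixes.
my @dirs = find_io_lib_include_dirs('/usr/local/include', '/usr/include');
if (@dirs) {
    print "io_lib headers found in: @dirs\n";
}
else {
    print "io_lib headers not found; install Staden io_lib first\n";
}
```

If io_lib was installed under a non-standard prefix, that prefix's include directory is the one to pass in. The same directory then also has to be visible to gcc when Bio::SCF is built.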


> [root at ps2288 Bio-SCF-1.01]# perl Makefile.PL
> Checking if your kit is complete...
> Looks good
> Note (probably harmless): No library found for -lread
> Writing Makefile for Bio::SCF
>
> [root at ps2288 Bio-SCF-1.01]# make
> cp SCF.pm blib/lib/Bio/SCF.pm
> cp SCF/Arrays.pm blib/lib/Bio/SCF/Arrays.pm
> /usr/bin/perl /usr/lib/perl5/5.8.5/ExtUtils/xsubpp  -typemap
> /usr/lib/perl5/5.8.5/ExtUtils/typemap  SCF.xs > SCF.xsc && mv SCF.xsc SCF.c
> Please specify prototyping behavior for SCF.xs (see perlxs manual)
> gcc -c   -D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBUGGING
> -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE
> -D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm -O2 -g -pipe -m32 -march=i386
> -mtune=pentium4   -DVERSION=\"1.01\" -DXS_VERSION=\"1.01\" -fPIC
> "-I/usr/lib/perl5/5.8.5/i386-linux-thread-multi/CORE"  -DLITTLE_ENDIAN
> SCF.c SCF.xs:12:24: io_lib/scf.h: No such file or directory
> SCF.xs:13:26: io_lib/mFILE.h: No such file or directory
> SCF.xs: In function `XS_Bio__SCF_get_scf_pointer':
> SCF.xs:27: error: `Scf' undeclared (first use in this function)
> SCF.xs:27: error: (Each undeclared identifier is reported only once
> SCF.xs:27: error: for each function it appears in.)
> SCF.xs:27: error: `scf_data' undeclared (first use in this function)
> SCF.xs: In function `XS_Bio__SCF_get_scf_fpointer':
> SCF.xs:66: error: `Scf' undeclared (first use in this function)
> SCF.xs:66: error: `scf_data' undeclared (first use in this function)
> SCF.xs:68: error: `mFILE' undeclared (first use in this function)
> SCF.xs:68: error: `mf' undeclared (first use in this function)
> SCF.xs: In function `XS_Bio__SCF_scf_free':
> SCF.xs:89: error: `Scf' undeclared (first use in this function)
> SCF.xs:89: error: syntax error before ')' token
> SCF.xs: In function `XS_Bio__SCF_get_comments':
> SCF.xs:95: error: `Scf' undeclared (first use in this function)
> SCF.xs:95: error: `scf_data' undeclared (first use in this function)
> SCF.xs:95: error: syntax error before ')' token
> SCF.xs: In function `XS_Bio__SCF_set_comments':
> SCF.xs:108: error: `Scf' undeclared (first use in this function)
> SCF.xs:108: error: `scf_data' undeclared (first use in this function)
> SCF.xs:108: error: syntax error before ')' token
> SCF.xs: In function `XS_Bio__SCF_scf_write':
> SCF.xs:121: error: `Scf' undeclared (first use in this function)
> SCF.xs:121: error: `scf_data' undeclared (first use in this function)
> SCF.xs:121: error: syntax error before ')' token
> SCF.xs: In function `XS_Bio__SCF_scf_fwrite':
> SCF.xs:135: error: `mFILE' undeclared (first use in this function)
> SCF.xs:135: error: `mf' undeclared (first use in this function)
> SCF.xs:137: error: `Scf' undeclared (first use in this function)
> SCF.xs:137: error: `scf_data' undeclared (first use in this function)
> SCF.xs:137: error: syntax error before ')' token
> SCF.xs: In function `XS_Bio__SCF_get_from_header':
> SCF.xs:159: error: `Scf' undeclared (first use in this function)
> SCF.xs:159: error: `scf_data' undeclared (first use in this function)
> SCF.xs:159: error: syntax error before ')' token
> SCF.xs: In function `XS_Bio__SCF_get_at':
> SCF.xs:186: error: `Scf' undeclared (first use in this function)
> SCF.xs:186: error: `scf_data' undeclared (first use in this function)
> SCF.xs:186: error: syntax error before ')' token
> SCF.xs: In function `XS_Bio__SCF_set_base_at':
> SCF.xs:242: error: `Scf' undeclared (first use in this function)
> SCF.xs:242: error: `scf_data' undeclared (first use in this function)
> SCF.xs:242: error: syntax error before ')' token
> SCF.xs: In function `XS_Bio__SCF_set_at':
> SCF.xs:255: error: `Scf' undeclared (first use in this function)
> SCF.xs:255: error: `scf_data' undeclared (first use in this function)
> SCF.xs:255: error: syntax error before ')' token
> make: *** [SCF.o] Error 1


From cjfields at uiuc.edu  Wed Feb 21 07:08:57 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 21 Feb 2007 06:08:57 -0600
Subject: [Bioperl-l] need help in Bio-SCF
In-Reply-To: <200702210617.50616.sdavis2@mail.nih.gov>
References: <764978cf0702210019ncfa2b17l8d57088eb5ae278a@mail.gmail.com>
	<200702210617.50616.sdavis2@mail.nih.gov>
Message-ID: <40C288FE-C74C-4B3F-A835-1A5C563B2B8E@uiuc.edu>


On Feb 21, 2007, at 5:17 AM, Sean Davis wrote:

> On Wednesday 21 February 2007 03:19, neeti somaiya wrote:
>> Hi All,
>>
>> I downloaded module
>> Bio-SCF-1.01 <http://search.cpan.org/%7Elds/Bio-SCF-1.01/> from CPAN.
>> And I am trying to install it when I got the following error. Can
>> someone please guide me.
>
> You will probably need to read the INSTALL document.  You need to
> install a couple of libraries first.  Looks like you don't have the
> staden io-lib installed.

Just to note, this module isn't part of BioPerl (I don't even think  
it has a Bioperl interface).  You'll probably need to contact Lincoln  
for details on using this module.

One thing you may run into is errors with the version of io_lib  
installed (a problem I've encountered with bioperl-ext), probably  
from API changes.  If you run into problems with newer versions of  
io_lib you should try downgrading to io_lib 1.8.11 or 1.8.12.


From neetisomaiya at gmail.com  Wed Feb 21 07:25:26 2007
From: neetisomaiya at gmail.com (neeti somaiya)
Date: Wed, 21 Feb 2007 17:55:26 +0530
Subject: [Bioperl-l] need help in Bio-SCF
In-Reply-To: <200702210617.50616.sdavis2@mail.nih.gov>
References: <764978cf0702210019ncfa2b17l8d57088eb5ae278a@mail.gmail.com>
	<200702210617.50616.sdavis2@mail.nih.gov>
Message-ID: <764978cf0702210425j544330bbr12c86a89960dbb66@mail.gmail.com>

Thanks. It resolved my problem.

On 2/21/07, Sean Davis <sdavis2 at mail.nih.gov> wrote:
>
> On Wednesday 21 February 2007 03:19, neeti somaiya wrote:
> > Hi All,
> >
> > I downloaded module
> > Bio-SCF-1.01 <http://search.cpan.org/%7Elds/Bio-SCF-1.01/> from CPAN.
> > And I am trying to install it when I got the following error. Can
> > someone please guide me.
>
> You will probably need to read the INSTALL document.  You need to install
> a couple of libraries first.  Looks like you don't have the staden io-lib
> installed.
>
>
> > ...
>


-- 
-Neeti
Even my blood says, B positive


From jay at jays.net  Tue Feb 20 19:27:01 2007
From: jay at jays.net (Jay Hannah)
Date: Tue, 20 Feb 2007 18:27:01 -0600
Subject: [Bioperl-l] Alignment
In-Reply-To: <358f4d650702200730x5fd09bb8p7dec8ce740553378@mail.gmail.com>
References: <188661178021328@lycos-europe.com>
	<358f4d650702200730x5fd09bb8p7dec8ce740553378@mail.gmail.com>
Message-ID: <cdcd3b0549caf3cf9818bdf3d14f0796@jays.net>

> On 2/20/07, marian thieme <marian.thieme at lycos.de> wrote:
>> I have a series of sequences which should be aligned against a 
>> reference sequence.
>> In this special case we don't need to calculate anything; we only need
>> to represent the sequences and get, for instance, some columns of
>> interest.
>> The problem now is that some sequences have gaps, and we need to
>> represent gaps in the reference sequence as well as in some
>> individual sequences.

On Feb 20, 2007, at 9:30 AM, Albert Vilella wrote:
> I think the SimpleAlign object contains a set of sequences, each of
> which is a LocatableSeq object.

Fascinating. In my BLAST-centric universe I went and rolled my own 
solution for SeqLab where I hold onto the Bio::Seq from the reference 
sequences and then hold onto the Bio::Search::HSP::GenericHSP objects 
for all my BLAST hits. From that dataset I can write whatever reports I 
want and/or perform any subsequent actions. I wonder if I should have 
done that differently...

What typically creates .pfam files?

j
seqlab.net
http://www.bioperl.org/wiki/User:Jhannah


From cjfields at uiuc.edu  Wed Feb 21 08:36:02 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 21 Feb 2007 07:36:02 -0600
Subject: [Bioperl-l] Alignment
In-Reply-To: <cdcd3b0549caf3cf9818bdf3d14f0796@jays.net>
References: <188661178021328@lycos-europe.com>
	<358f4d650702200730x5fd09bb8p7dec8ce740553378@mail.gmail.com>
	<cdcd3b0549caf3cf9818bdf3d14f0796@jays.net>
Message-ID: <2233F0EE-94FE-42F0-B8E5-1BE14A25C0D4@uiuc.edu>


On Feb 20, 2007, at 6:27 PM, Jay Hannah wrote:
...
>
> On Feb 20, 2007, at 9:30 AM, Albert Vilella wrote:
>> I think the SimpleAlign object contains a set of sequences, each of
>> which is a LocatableSeq object.
>
> Fascinating. In my BLAST-centric universe I went and rolled my own
> solution for SeqLab where I hold onto the Bio::Seq from the reference
> sequences and then hold onto the Bio::Search::HSP::GenericHSP objects
> for all my BLAST hits. From that dataset I can write whatever reports I
> want and/or perform any subsequent actions. I wonder if I should have
> done that differently...
>
> What typically creates .pfam files?
>
> j
> seqlab.net
> http://www.bioperl.org/wiki/User:Jhannah

Pfam alignments come in two formats (pfam and stockholm) that can  
both be parsed into SimpleAlign objects via Bio::AlignIO:

use Bio::AlignIO;

my $alnin = Bio::AlignIO->new(-format => 'stockholm',
                              -file   => 'dho.sto');

while (my $aln = $alnin->next_aln) {
    # each $aln is a Bio::SimpleAlign object; do stuff with it here
}

Personally I stick with Stockholm as it's a richer format (with
annotations and so on), but the parser was rewritten recently (by
moi!), so it may still have some bugs.

I'm a bit confused as to what you do with BLAST files.  You can  
generate a SimpleAlign right from the HSP for most SearchIO parsers:

http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_the_methods

chris


From sanjib at bic.boseinst.ernet.in  Wed Feb 21 01:12:06 2007
From: sanjib at bic.boseinst.ernet.in (Sanjib Kumar Gupta)
Date: Wed, 21 Feb 2007 11:42:06 +0530
Subject: [Bioperl-l] help on remote blast
In-Reply-To: <20070220073200.M42567@bic.boseinst.ernet.in>
References: <20070220073200.M42567@bic.boseinst.ernet.in>
Message-ID: <20070221061206.M37845@bic.boseinst.ernet.in>

Hi
I have been running this script for some time and it was working fine. I am
using this Linux machine with a live IP (no proxy). But suddenly it stopped
working, with these errors:


waiting...waiting...
-------------------- WARNING ---------------------
MSG: <HTML>
<HEAD><TITLE>An Error Occurred</TITLE></HEAD>
<BODY>
<H1>An Error Occurred</H1>
500 Can't connect to www.ncbi.nlm.nih.gov:80 (Bad 
hostname 'www.ncbi.nlm.nih.gov')
</BODY>
</HTML>
 
---------------------------------------------------
xx.pep
 
-------------------- WARNING ---------------------
MSG: req was POST http://www.ncbi.nlm.nih.gov/blast/Blast.cgi
User-Agent: bioperl-Bio_Tools_Run_RemoteBlast/1.5
Content-Length: 497
Content-Type: application/x-www-form-urlencoded
 
DATABASE=nr&QUERY=%3ENM_005277.3%3B_1414961_1+837%
0AMEENMEEGQTQKGCFECCIKCLGGIPYASLIATILLYAGVALFCGCGHEALSGTVNILQTYFEMARTAGDTLDVF
TMIDIFKYVIYGIAAAFFVYGILLMVEGFFTTGAIKDLYGDFKITTCGRCVSAWFIMLTYLFMLAWLGVTAFTSLPV
YMYFNLWTICRNTTLVEGANLCLDLRQFGIVTIGEEKKICTVSENFLRMCESTELNMTFHLFIVALAGAGAAVIAMV
HYLMVLSANWAYVKDACRMQKYEDIKSKEEQELHDIHSTRSKERLNAYT*&COMPOSITION_BASED_STATISTI
CS=off&EXPECT=1e-
10&SERVICE=plain&FORMAT_OBJECT=Alignment&CMD=Put&MATRIX_NAME=BLOSUM62&ENTREZ_
QUERY=Xenopus+laevis[Organism]&PROGRAM=blastp
 
<HTML>
<HEAD><TITLE>An Error Occurred</TITLE></HEAD>
<BODY>
<H1>An Error Occurred</H1>
500 Can't connect to www.ncbi.nlm.nih.gov:80 (Bad 
hostname 'www.ncbi.nlm.nih.gov')
</BODY>
</HTML>
 
---------------------------------------------------
waiting...waiting...
-------------------- WARNING ---------------------
MSG: <HTML>
<HEAD><TITLE>An Error Occurred</TITLE></HEAD>
<BODY>
<H1>An Error Occurred</H1>
500 Internal Server Error
</BODY>
</HTML>
 
---------------------------------------------------

I can see the NCBI page from a browser, but I am unable to ping or
traceroute to the server.
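The text "Bad hostname 'www.ncbi.nlm.nih.gov'" in the first two warnings appears to be the message Perl's IO::Socket::INET produces when a host name cannot be resolved. That makes the machine's name lookup, rather than the BLAST request itself, a likely first suspect. A minimal check using only the core Socket module follows. It assumes the script's connection goes through the same system resolver that `inet_aton` uses. The `resolve_host` helper and the idea of running it as a standalone script are illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Socket qw(inet_aton inet_ntoa);

# Look up $host with the system resolver. Returns the dotted-quad
# address, or undef if the name cannot be resolved from this machine.
sub resolve_host {
    my ($host) = @_;
    my $packed = inet_aton($host);
    return defined $packed ? inet_ntoa($packed) : undef;
}

# Check every host name given on the command line.
for my $host (@ARGV) {
    my $addr = resolve_host($host);
    if (defined $addr) {
        print "$host resolves to $addr\n";
    }
    else {
        print "$host does not resolve from this machine; check its DNS settings\n";
    }
}
```

Usage: `perl check_host.pl www.ncbi.nlm.nih.gov`. If the name fails to resolve here while the browser still loads the page, the two are probably not using the same resolver or network path.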

Please help me.
--
Sanjib Kumar Gupta
Bioinformatics Centre
Bose Institute
Kolkata 700054, INDIA
Phone  : +91-33-2355 6626, 2816, 2355 4766
Fax    : +91-33-2355 3886
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: n9.pl
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070221/5a3382d6/attachment-0002.pl>

From granjeau at tagc.univ-mrs.fr  Wed Feb 21 08:50:39 2007
From: granjeau at tagc.univ-mrs.fr (Samuel GRANJEAUD - IR/IFR137)
Date: Wed, 21 Feb 2007 14:50:39 +0100
Subject: [Bioperl-l] Adding empty member list in Bio::Cluster::SequenceFamily
Message-ID: <45DC4E2F.4060804@tagc.univ-mrs.fr>

Hello!

I'm not sure whether this is intended, but I found a workaround by checking
for an empty list before adding. Here is what I noticed. Adding an empty
list () as members is not the same as adding a reference to an empty list
[], of course, but the two could be thought to be the same. Calling
get_members, in the second case I got a list of 0 members, but in the first
case I got 1 member, which is not an object at all. I am warned now, but
maybe the documentation should recommend passing an array reference.

Best regards,
--Samuel


use strict;
use warnings;
use Bio::Cluster::SequenceFamily;

my $f = Bio::Cluster::SequenceFamily->new( -id => 'aa' );
$f->add_members( () );
print scalar $f->get_members(), "\n";
# 1
my $g = Bio::Cluster::SequenceFamily->new( -id => 'aa' );
$g->add_members( [] );
print scalar $g->get_members(), "\n";
# 0
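Samuel's result is consistent with ordinary Perl argument passing. `add_members( () )` flattens to a call with no arguments at all, while `add_members( [] )` passes exactly one argument, a reference to an empty array. The sketch below is a hypothetical stand-in and not the Bio::Cluster::SequenceFamily source. `add_members_naive` shifts off the first argument and pushes it unless it is an array reference, so an empty call stores one `undef` member. `add_members_guarded` shows one way to make both calls add nothing.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a container's add_members(); NOT the BioPerl
# source. It shifts off the first argument and expands it only if it is an
# array reference; otherwise the value is pushed as-is.
sub add_members_naive {
    my ($members, @args) = @_;
    my $first = shift @args;          # undef when called with ()
    if (ref($first) eq 'ARRAY') {
        push @$members, @$first;      # [] expands to nothing
    }
    else {
        push @$members, $first;       # an empty call pushes one undef
    }
    push @$members, @args;            # any remaining plain arguments
    return scalar @$members;
}

# Guarded variant: expand array references, then drop undefined values,
# so calls with () and with [] both leave the member list unchanged.
sub add_members_guarded {
    my ($members, @args) = @_;
    my @new = map { ref($_) eq 'ARRAY' ? @$_ : $_ } @args;
    push @$members, grep { defined } @new;
    return scalar @$members;
}

# () vanishes during argument flattening; [] does not.
my (@with_empty_list, @with_empty_ref);
my $n_list = add_members_naive(\@with_empty_list, ());   # one undef member
my $n_ref  = add_members_naive(\@with_empty_ref, []);    # no members
print "$n_list $n_ref\n";
```

The guard in the second version is the same idea as Samuel's workaround of checking for an empty list, moved inside the method.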


From stephen.marshall at novartis.com  Wed Feb 21 12:01:00 2007
From: stephen.marshall at novartis.com (stephen.marshall at novartis.com)
Date: Wed, 21 Feb 2007 12:01:00 -0500
Subject: [Bioperl-l] Parsing kegg files
Message-ID: <OFA3726097.8019A09E-ON85257289.005D64E3-85257289.005D7997@ah.novartis.com>

Hello
I'm trying to parse a KEGG file and I can't seem to get at the pathway
information. Here's a snippet of my code; the only annotation keys I see
are dblink and description.

use strict;
use warnings;
use Bio::SeqIO;

my $filename = shift or die "usage: $0 file.kegg\n";
my $stream = Bio::SeqIO->new(-file => $filename, -format => 'KEGG');

while ( my $seq = $stream->next_seq() ) {
    # do something with $seq
    my $id = $seq->display_id();
    print "$id:\n";
    my $ann = $seq->annotation();
    foreach my $key ( $ann->get_all_annotation_keys() ) {
        my @values = $ann->get_Annotations($key);
        foreach my $value ( @values ) {
            print "Annotation: ", $key, " value: ", $value->as_text, "\n";
        }
    }
}
_________________________

CONFIDENTIALITY NOTICE

The information contained in this e-mail message is intended only for the 
exclusive use of the individual or entity named above and may contain 
information that is privileged, confidential or exempt from disclosure 
under applicable law. If the reader of this message is not the intended 
recipient, or the employee or agent responsible for delivery of the 
message to the intended recipient, you are hereby notified that any 
dissemination, distribution or copying of this communication is strictly 
prohibited. If you have received this communication in error, please 
notify the sender immediately by e-mail and delete the material from any 
computer.  Thank you.


From prateek.vit at gmail.com  Wed Feb 21 12:40:25 2007
From: prateek.vit at gmail.com (prateek singh yadav)
Date: Wed, 21 Feb 2007 23:10:25 +0530
Subject: [Bioperl-l] Problem in BioPerl Installation
Message-ID: <e7e0a2da0702210940j60a67f37t9bfddda7d760a277@mail.gmail.com>

Hello all,

I was trying to install BioPerl on my Red Hat Linux (EL) machine using
CPAN, but CPAN shows this problem:


[root at HX342SBC054 Desktop]# cpan
Terminal does not support AddHistory.

cpan shell -- CPAN exploration and modules installation (v1.7601)
ReadLine support available (try 'install Bundle::CPAN')

cpan> get bioperl
CPAN: Storable loaded ok
Going to read /root/.cpan/Metadata
Warning: Found only 25 objects in /root/.cpan/Metadata
Going to read /root/.cpan/sources/authors/01mailrc.txt.gz
Going to read /root/.cpan/sources/modules/02packages.details.txt.gz
Warning: Your /root/.cpan/sources/modules/02packages.details.txt.gz does not
contain a Line-Count header.
Please check the validity of the index file by comparing it to more
than one CPAN mirror. I'll continue but problems seem likely to
happen.
Warning: Your /root/.cpan/sources/modules/02packages.details.txt.gz does not
contain a Last-Updated header.
Please check the validity of the index file by comparing it to more
than one CPAN mirror. I'll continue but problems seem likely to
happen.
Going to read /root/.cpan/sources/modules/03modlist.data.gz
Can't locate object method "data" via package "CPAN::Modulelist" (perhaps
you forgot to load "CPAN::Modulelist"?) at (eval 13) line 1.
 at /usr/lib/perl5/5.8.5/CPAN.pm line 3406
        CPAN::Index::rd_modlist('CPAN::Index',
'/root/.cpan/sources/modules/03modlist.data.gz') called at
/usr/lib/perl5/5.8.5/CPAN.pm line 3129
        CPAN::Index::reload('CPAN::Index') called at
/usr/lib/perl5/5.8.5/CPAN.pm line 675
        CPAN::exists('CPAN=HASH(0x8548f20)', 'CPAN::Module', 'bioperl')
called at /usr/lib/perl5/5.8.5/CPAN.pm line 1842
        CPAN::Shell::expandany('CPAN::Shell', 'bioperl') called at
/usr/lib/perl5/5.8.5/CPAN.pm line 2078
        CPAN::Shell::rematein('CPAN::Shell', 'get', 'bioperl') called at
/usr/lib/perl5/5.8.5/CPAN.pm line 2157
        CPAN::Shell::get('CPAN::Shell', 'bioperl') called at
/usr/lib/perl5/5.8.5/CPAN.pm line 201
        eval {...} called at /usr/lib/perl5/5.8.5/CPAN.pm line 201
        CPAN::shell() called at /usr/bin/cpan line 193

cpan>

Can anyone tell me how to reconfigure CPAN, or how to install BioPerl on
Linux with all its dependencies? I think there is a problem with my CPAN
configuration.

Regards,
Prateek

-- 
Prateek Singh
3rd year Bioinformatics(BTech)
Vellore Institute Of Technology
Vellore-632014
prateek.vit at gmail.com


From bosborne11 at verizon.net  Wed Feb 21 12:29:40 2007
From: bosborne11 at verizon.net (Brian Osborne)
Date: Wed, 21 Feb 2007 12:29:40 -0500
Subject: [Bioperl-l] Parsing kegg files
In-Reply-To: <OFA3726097.8019A09E-ON85257289.005D64E3-85257289.005D7997@ah.novartis.com>
Message-ID: <C201EBB4.CEE7%bosborne11@verizon.net>

Stephen,

I don't know what your eventual goals are, but you might want to take a look
at bioperl-network. However, there are problems with this package. One, it
only parses DIP tab-delimited and PSI-MI files, and it parses PSI-MI only
partially (you will get the graph, though). Two, it seems to have only a
single interested developer (me) and few users. In my Bioperl experience,
projects like this tend to fade away.

http://www.bioperl.org/wiki/Network_package


Brian O.


On 2/21/07 12:01 PM, "stephen.marshall at novartis.com"
<stephen.marshall at novartis.com> wrote:

> Hello
> I'm trying to parse a Kegg file and I can't seem to get at the pathway
> information... Here's a snippet of my code. I only see dblink and
> description as annotation
> 
> use Bio::SeqIO;
> 
> my $stream = Bio::SeqIO->new(-file => $filename, -format => 'KEGG');
> 
> while ( my $seq = $stream->next_seq() ) {
>         # do something with $seq
>         my $id = $seq->display_id();
>         print "$id:";
>         my $ann = $seq->annotation();
>         foreach my $key ( $ann->get_all_annotation_keys() ) {
>                 my @values = $ann->get_Annotations($key);
>                 foreach my $value ( @values ) {
>                         print "Annotation: ",$key," value: ",$value->as_text,"\n";
>                 }
>         }
> 
> }
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From arareko at campus.iztacala.unam.mx  Wed Feb 21 13:18:37 2007
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Wed, 21 Feb 2007 12:18:37 -0600
Subject: [Bioperl-l] Problem in BioPerl Installation
In-Reply-To: <e7e0a2da0702210940j60a67f37t9bfddda7d760a277@mail.gmail.com>
References: <e7e0a2da0702210940j60a67f37t9bfddda7d760a277@mail.gmail.com>
Message-ID: <45DC8CFD.1060108@campus.iztacala.unam.mx>

You can always rebuild your CPAN configuration by deleting the existing
.cpan/ directory in root's $HOME dir (a quick & dirty trick) and then
invoking CPAN again from root's shell to rebuild the config:

# perl -MCPAN -e shell

Hope this helps.

Regards,
Mauricio.

prateek singh yadav wrote:
> Hello all,
> 
> I was trying to install Bioperl on my redhat linux (EL) using CPAN. but CPAN
> shows this problem.
> 
> 
> [root at HX342SBC054 Desktop]# cpan
> Terminal does not support AddHistory.
> 
> cpan shell -- CPAN exploration and modules installation (v1.7601)
> ReadLine support available (try 'install Bundle::CPAN')
> 
> cpan> get bioperl
> CPAN: Storable loaded ok
> Going to read /root/.cpan/Metadata
> Warning: Found only 25 objects in /root/.cpan/Metadata
> Going to read /root/.cpan/sources/authors/01mailrc.txt.gz
> Going to read /root/.cpan/sources/modules/02packages.details.txt.gz
> Warning: Your /root/.cpan/sources/modules/02packages.details.txt.gz does not
> contain a Line-Count header.
> Please check the validity of the index file by comparing it to more
> than one CPAN mirror. I'll continue but problems seem likely to
> happen.
> Warning: Your /root/.cpan/sources/modules/02packages.details.txt.gz does not
> contain a Last-Updated header.
> Please check the validity of the index file by comparing it to more
> than one CPAN mirror. I'll continue but problems seem likely to
> happen.
> Going to read /root/.cpan/sources/modules/03modlist.data.gz
> Can't locate object method "data" via package "CPAN::Modulelist" (perhaps
> you forgot to load "CPAN::Modulelist"?) at (eval 13) line 1.
>  at /usr/lib/perl5/5.8.5/CPAN.pm line 3406
>         CPAN::Index::rd_modlist('CPAN::Index',
> '/root/.cpan/sources/modules/03modlist.data.gz') called at
> /usr/lib/perl5/5.8.5/CPAN.pm line 3129
>         CPAN::Index::reload('CPAN::Index') called at
> /usr/lib/perl5/5.8.5/CPAN.pm line 675
>         CPAN::exists('CPAN=HASH(0x8548f20)', 'CPAN::Module', 'bioperl')
> called at /usr/lib/perl5/5.8.5/CPAN.pm line 1842
>         CPAN::Shell::expandany('CPAN::Shell', 'bioperl') called at
> /usr/lib/perl5/5.8.5/CPAN.pm line 2078
>         CPAN::Shell::rematein('CPAN::Shell', 'get', 'bioperl') called at
> /usr/lib/perl5/5.8.5/CPAN.pm line 2157
>         CPAN::Shell::get('CPAN::Shell', 'bioperl') called at
> /usr/lib/perl5/5.8.5/CPAN.pm line 201
>         eval {...} called at /usr/lib/perl5/5.8.5/CPAN.pm line 201
>         CPAN::shell() called at /usr/bin/cpan line 193
> 
> cpan>
> 
> Can anyone give me direction  how to configure cpan again or how to install
> BioPerl on linux with its complete dependencies. Because I think I have a
> problem in CPAN configuration.
> 
> Regards,
> Prateek
> 

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM


From hlapp at gmx.net  Wed Feb 21 13:33:17 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 21 Feb 2007 13:33:17 -0500
Subject: [Bioperl-l] Adding empty member list in
	Bio::Cluster::SequenceFamily
In-Reply-To: <45DC4E2F.4060804@tagc.univ-mrs.fr>
References: <45DC4E2F.4060804@tagc.univ-mrs.fr>
Message-ID: <5B31EEBD-FFE5-4A0F-BB05-DF7297103BBD@gmx.net>

Fixed in CVS HEAD. -hilmar

On Feb 21, 2007, at 8:50 AM, Samuel GRANJEAUD - IR/IFR137 wrote:

> Hello!
>
> I'm not sure why, but I found a workaround: checking for an empty list
> before adding. Here is what I noticed. Adding an empty list () as
> members is not the same as adding a reference to an empty list [], of
> course, but the two could be thought to be the same. Calling get_members
> in the second case, I got a list of 0 members, but in the first case I
> got a list of 1 member, which is not an object at all. I know to watch
> for this now, but maybe the documentation should emphasize using the
> reference form.
>
> Best regards,
> --Samuel
>
>
> use Bio::Cluster::SequenceFamily;
>
> $f = new Bio::Cluster::SequenceFamily( -id => 'aa' );
> $f->add_members( () );
> print scalar $f->get_members();
> # 1
> $g = new Bio::Cluster::SequenceFamily( -id => 'aa' );
> $g->add_members( [] );
> print scalar $g->get_members();
> # 0
>
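In Perl an empty list passed as an argument simply disappears, so add_members( () ) is exactly the same call as add_members(). The sketch below uses a hypothetical add_members-like function (not the actual Bio::Cluster::SequenceFamily code) to show one way that can produce a single undef "member", plus the kind of empty-argument guard described above:

```perl
use strict;
use warnings;

# Hypothetical stand-in for a method that accepts either an array
# reference or a single member (NOT the actual BioPerl implementation).
sub add_members_naive {
    my ($members, $arg) = @_;
    push @$members, ref $arg eq 'ARRAY' ? @$arg : ($arg);
    return scalar @$members;
}

# Defensive variant: a missing (undef) argument adds nothing.
sub add_members_guarded {
    my ($members, $arg) = @_;
    return scalar @$members unless defined $arg;
    push @$members, ref $arg eq 'ARRAY' ? @$arg : ($arg);
    return scalar @$members;
}

my @list = ();                                  # an empty list of members
print add_members_naive([], @list), "\n";       # 1: @list flattens away, $arg is undef
print add_members_naive([], \@list), "\n";      # 0: the reference keeps the empty list intact
print add_members_guarded([], @list), "\n";     # 0
```

Passing a reference ([] or \@list) is the unambiguous form, which is why it behaves as expected even without the guard.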

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cjfields at uiuc.edu  Wed Feb 21 14:12:57 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 21 Feb 2007 13:12:57 -0600
Subject: [Bioperl-l] GenBank accession bug?
Message-ID: <11B83C2C-7BFE-48B5-A20C-7EAF282F39BD@uiuc.edu>

Dmitry,

I'm forwarding this to the mail list.  In the future please post/respond 
to the regular mail list so other BioPerl developers/users can comment.  
You'll get feedback much faster here (and maybe even some support!).

The issue at hand is whether we can support GenBank accessions/display_id/version 
with your naming scheme.  My feeling is that support for nonalphanumerics 
was removed to be compliant with the GenBank standard for accessions, 
though I may be wrong.  Maybe someone who was around during bioperl 1.2 
can elaborate more?

 From http://bugzilla.open-bio.org/show_bug.cgi?id=2214
--------------------------------------------------
....
Thanks for the detailed explanation. It seems that I would need to apply
my local patches to the BioPerl module(s). With BioPerl-1.2 there was
no problem with '-' in sequence names.

The problem is that the project we participate in (the Vizier project)
adopted the following sequence naming convention:

VZ##<virus_ICTV>-(<GenBank LOCUS ID>or<strain designation>)-<$$>

VZ Stands for Vizier

## Your 2-digit Partner ID within the VIZIER consortium

<virus_ICTV> Virus name according to the ICTV nomenclature;

<GenBank LOCUS ID>,
<strain designation> The GenBank LOCUS ID; if the sequence has not been
assigned a GenBank LOCUS ID, the available strain designation, as short
as possible, should be used

<$$> Unique 2-digit number, at your discretion, to label the sequence variant
--------------------------------------------------

chris


From gabriel.cardona at uib.es  Thu Feb 22 04:33:14 2007
From: gabriel.cardona at uib.es (gcardona)
Date: Thu, 22 Feb 2007 01:33:14 -0800 (PST)
Subject: [Bioperl-l] bioperl-1.5.2_100 ... not found
Message-ID: <9096740.post@talk.nabble.com>


Hello,

I am trying to install Bioperl on a Windows system, following the
installation notes in 
http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows
When the Perl Package Manager tries to download bioperl-1.5.2_100, it cannot
find the package and answers:
Downloading bioperl-1.5.2_100 ... not found

I've looked at the contents of
http://bioperl.org/DIST
and in package.xml the version for bioperl is bioperl-1.5.2_100, but in that
folder the available version is bioperl-1.5.2_102
Is this a bug? or should I download and install manually?

Thank you in advance,

Gabriel Cardona
-- 
View this message in context: http://www.nabble.com/bioperl-1.5.2_100-...-not-found-tf3271747.html#a9096740
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From bix at sendu.me.uk  Thu Feb 22 07:35:14 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 22 Feb 2007 12:35:14 +0000
Subject: [Bioperl-l] bioperl-1.5.2_100 ... not found
In-Reply-To: <9096740.post@talk.nabble.com>
References: <9096740.post@talk.nabble.com>
Message-ID: <45DD8E02.1070404@sendu.me.uk>

gcardona wrote:
> Hello,
> 
> I am trying to install Bioperl on a Windows system, following the
> installation notes in 
> http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows
> When the Perl Package Manager tries to download bioperl-1.5.2_100, it cannot
> find the package and answers:
> Downloading bioperl-1.5.2_100 ... not found
> 
> I've looked the contents of
> http://bioperl.org/DIST
> and in package.xml the version for bioperl is bioperl-1.5.2_100, but in that
> folder the available version is bioperl-1.5.2_102
> Is this a bug? or should I download and install manually?

Sorry, my mistake. I accidentally moved the ppm to a different folder. 
It should work now though.

I may make a 1.5.2_102 ppm at some point, but there are no relevant 
differences between _102 and _100 as far as Windows users are concerned.


From enrique_rulz at yahoo.com  Thu Feb 22 15:41:37 2007
From: enrique_rulz at yahoo.com (Kurt Gobain)
Date: Thu, 22 Feb 2007 12:41:37 -0800 (PST)
Subject: [Bioperl-l] Sequence matching problem!
Message-ID: <9107936.post@talk.nabble.com>


Hi everyone,
I'm having a lot of trouble with simple pattern matching between a
sequence and a pattern. The program should be able to do two things:
1) plain matching: for example, in GATCAAT, if TC is entered, the output
should be 2; 2) matching with a special (wildcard) character: in the same
example, if C*T is entered, it should output 3 and display the matched
sequence, CAAT. I can get the first part easily, but the second part is
giving me trouble: I get 1 instead of 3. The code is really simple:

#!/usr/bin/perl
$alphabet = "GATCAAT";
$pattern=  "C*T ";

$alphabet =~ /($pattern)/i;

print "The entire '$pattern' match began at $-[0] and ended at $+[0]\n";

====================
OUTPUT!
The entire C*T match began at 1 and ended at 2
====================

but the output should be 3.
Also, is there any chance I can get the sequence too? I.e. instead of
'C*T' I need 'CAAT'.

Well, I don't have to use a regex, but I find it quite simple. Is there
any other method?

Thanks in advance!
Kurt!


From cjfields at uiuc.edu  Thu Feb 22 16:01:03 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 22 Feb 2007 15:01:03 -0600
Subject: [Bioperl-l] GenBank accession bug?
In-Reply-To: <51879.10.0.7.57.1172176272.squirrel@gscmail.wustl.edu>
References: <11B83C2C-7BFE-48B5-A20C-7EAF282F39BD@uiuc.edu>
	<51879.10.0.7.57.1172176272.squirrel@gscmail.wustl.edu>
Message-ID: <028E16D7-036A-44DA-BECD-F910BEA58E53@uiuc.edu>


On Feb 22, 2007, at 2:31 PM, dmessina at watson.wustl.edu wrote:

>> The issue at hand is whether we can support GenBank accessions/
>> display_id/version with your naming scheme.
>
> Chris, I'm a little unsure of what you're saying here (which might mean
> that you're already saying what I'm about to...say). Do you mean it might
> be tricky to support both the Genbank standard and Dmitry's
> simultaneously?
>
> I would argue any arbitrary ID should be supported as long as that ID is a
> contiguous non-space word (\S+).
>
> Actually the existing accession regex looks like it already supports IDs
> with '-':
>
> /^ACCESSION\s+(\S.*\S)/
>
> It's only the version regex which doesn't (\w doesn't include '-'):
>
> /^\w+\.(\d+)/
>
>
> Anyone else have thoughts or comments on this? Off the top of my head, I
> can't think of any issues that might arise from doing so (apart from
> having to modify all of the SeqIO modules to support it).
>
> Dave

You're right; the argument comes down simply to whether we would  
support \S+ or just \w+.  I'm neutral on this myself, but I wonder  
how allowing \S+ would affect other modules (for instance, indexing  
for a flat db), where one might just use \w+ for accessions,  
expecting them to be GenBank- or EMBL-like alphanumerics.  The fact  
that \S+ was supported in the past (as indicated in the bug report)  
and then wasn't post 1.2 makes me think there was a reason for  
someone going in and modifying it, but that was before my time on the  
group.

I'll have a look at the CVS history when I have time to see what I  
can dig up.

chris


From mkiwala at watson.wustl.edu  Thu Feb 22 15:36:33 2007
From: mkiwala at watson.wustl.edu (Michael Kiwala)
Date: Thu, 22 Feb 2007 14:36:33 -0600
Subject: [Bioperl-l] Bio::AnnotatableI vs Bio::FeatureHolderI
Message-ID: <45DDFED1.1090503@watson.wustl.edu>

Are Bio::AnnotatableI and Bio::FeatureHolderI competing interfaces?

I get the impression they are designed to do similar things.  If so, is 
one deprecated and the other preferred?

If their responsibilities are orthogonal to each other, what sorts of 
tasks are suited to each?

Thanks,
Michael


From dmessina at wustl.edu  Thu Feb 22 15:53:01 2007
From: dmessina at wustl.edu (Dave Messina)
Date: Thu, 22 Feb 2007 14:53:01 -0600 (CST)
Subject: [Bioperl-l] GenBank accession bug?
Message-ID: <51923.10.0.7.57.1172177581.squirrel@gscmail.wustl.edu>

> The issue at hand is whether we can support GenBank accessions/
> display_id/version with your naming scheme.

Chris, I'm a little unsure of what you're saying here (which might mean
that you're already saying what I'm about to...say). Do you mean it might
be tricky to support both the Genbank standard and Dmitry's
simultaneously?

I would argue any arbitrary ID should be supported as long as that ID is a
contiguous non-space word (\S+).

Actually the existing accession regex looks like it already supports IDs
with '-':

/^ACCESSION\s+(\S.*\S)/

It's only the version regex which doesn't (\w doesn't include '-'):

/^\w+\.(\d+)/
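A small sketch of the difference between the two regexes; the IDs are hypothetical examples (the second loosely follows the VZ naming scheme and contains '-'), and this is not the actual Bio::SeqIO::genbank code:

```perl
use strict;
use warnings;

# Version regex as quoted above: \w does not include '-', so an ID
# containing '-' never reaches the '.' and the match fails.
sub version_word {
    my ($id) = @_;
    return $id =~ /^\w+\.(\d+)/ ? $1 : undef;
}

# \S+ alternative: everything up to the last '.' is the accession,
# the trailing digits are the version.
sub version_nonspace {
    my ($id) = @_;
    return $id =~ /^(\S+)\.(\d+)$/ ? ($1, $2) : ();
}

# Hypothetical IDs: one GenBank-like, one VZ-style with '-'.
for my $id ('AB123456.1', 'VZ01DENV2-NGC-01.1') {
    my $v = version_word($id);
    my ($acc, $ver) = version_nonspace($id);
    printf "%s: \\w+ gives %s; \\S+ gives accession %s, version %s\n",
        $id, defined $v ? $v : 'no match', $acc, $ver;
}
```

Both patterns agree on plain GenBank-style IDs; they only differ once an ID contains a non-word character such as '-'.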


Anyone else have thoughts or comments on this? Off the top of my head, I
can't think of any issues that might arise from doing so (apart from
having to modify all of the SeqIO modules to support it).

Dave


From heikki at sanbi.ac.za  Fri Feb 23 03:25:39 2007
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Fri, 23 Feb 2007 10:25:39 +0200
Subject: [Bioperl-l] Sequence matching problem!
In-Reply-To: <9107936.post@talk.nabble.com>
References: <9107936.post@talk.nabble.com>
Message-ID: <200702231025.39416.heikki@sanbi.ac.za>

Kurt,

There are a few things in your code to note:

- regexp /C*T/ matches any T preceded by zero or more Cs,
  not what you meant
- $&, $` and $' are the "expensive" perl variables worth 
  not using unless you have to: using them even once in your 
  code slows down every regex match considerably. $-[0] and 
  $+[0] (elements of the @- and @+ arrays) avoid that penalty, 
  but there is always another way, e.g. pos().
- Keep in mind what you want to use the match positions for: 
  Human readable locations usually start counting with 1 but
  perl code uses 0 as the first location. The code below assumes
  you want to print the locations out.

Study my example code below.

Yours,
	-Heikki

###################################################################
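#!/usr/bin/perl
use strict;
use warnings;

my $seq = "GATCAAT";
#my $pattern = 'C*T';   # a T preceded by zero or more Cs
my $pattern = 'C.*T';   # a C, then anything, then a T

while ($seq =~ m/($pattern)/gi) {

    my $match = $1;
    my $end   = pos($seq);                  # 1-based position of the last matched residue
    my $start = $end - length($match) + 1;  # 1-based position of the first matched residue

    print "$match : $start - $end\n";
}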

###################################################################
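Kurt's two requirements (a 0-based start position, and '*' as a wildcard that also reports the matched sequence) could be sketched like this. The helper names are hypothetical, and reading '*' as a greedy "any run of residues" is an assumption:

```perl
use strict;
use warnings;

# Turn a user-supplied pattern in which '*' stands for "any run of
# residues" into a case-insensitive regex. Every other character is
# matched literally (quotemeta); stray whitespace such as the trailing
# space in "C*T " is dropped.
sub wildcard_to_regex {
    my ($pattern) = @_;
    $pattern =~ s/\s+//g;
    my $re = join '.*', map { quotemeta($_) } split /\*/, $pattern, -1;
    return qr/$re/i;
}

# Return the 0-based start and the matched substring of the first
# (leftmost, greedy) match, or an empty list if there is none.
sub find_first {
    my ($seq, $pattern) = @_;
    my $re = wildcard_to_regex($pattern);
    return () unless $seq =~ /($re)/g;
    my $match = $1;
    return (pos($seq) - length($match), $match);
}

for my $pattern ('TC', 'C*T') {
    my ($start, $match) = find_first('GATCAAT', $pattern);
    print "$pattern: start $start, match $match\n";
}
```

Because '.*' is greedy, C*T on a longer sequence spans from the first C to the last T after it; use '.*?' in the join instead if the shortest match is wanted.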




-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From avilella at gmail.com  Fri Feb 23 04:59:49 2007
From: avilella at gmail.com (Albert Vilella)
Date: Fri, 23 Feb 2007 09:59:49 +0000
Subject: [Bioperl-l] Sequence matching problem!
In-Reply-To: <200702231025.39416.heikki@sanbi.ac.za>
References: <9107936.post@talk.nabble.com>
	<200702231025.39416.heikki@sanbi.ac.za>
Message-ID: <358f4d650702230159n3a28fb03k6ecd850fccfed796@mail.gmail.com>

Now that we're on this pattern-matching thread, I was wondering if
any Perl guru could enlighten me on matching exact sequence patterns
against a gapped target sequence. E.g.:

my $seq = "CGATCAACGAATCGTACGTACTC";
my $gapped_seq =
"GGGGGGCG-------A---TC---AACGA-----ATC---GTA---CGTACTCTACTCGGGGG";

and one would like to get as a result:

"CG-------A---TC---AACGA-----ATC---GTA---CGTACTCTACTC"

which is the match of $seq but in $gapped_seq.

Cheers,

    Albert.




From js5 at sanger.ac.uk  Fri Feb 23 06:34:37 2007
From: js5 at sanger.ac.uk (James Smith)
Date: Fri, 23 Feb 2007 11:34:37 +0000 (GMT)
Subject: [Bioperl-l] Sequence matching problem!
In-Reply-To: <358f4d650702230159n3a28fb03k6ecd850fccfed796@mail.gmail.com>
References: <9107936.post@talk.nabble.com>
	<200702231025.39416.heikki@sanbi.ac.za>
	<358f4d650702230159n3a28fb03k6ecd850fccfed796@mail.gmail.com>
Message-ID: <Pine.OSF.4.58.0702231131230.1583589@cbi1c.internal.sanger.ac.uk>

On Fri, 23 Feb 2007, Albert Vilella wrote:

> now that we are at this pattern matching thread, I was wondering if
> any perl guru could enlighten me on the issue of matching exact
> sequence patterns on a gapped target sequence. E.g.:
>
> my $seq = "CGATCAACGAATCGTACGTACTC";
> my $gapped_seq =
> "GGGGGGCG-------A---TC---AACGA-----ATC---GTA---CGTACTCTACTCGGGGG";
>
> and one would like to get as a result:
>
> "CG-------A---TC---AACGA-----ATC---GTA---CGTACTCTACTC"
>
> which is the match of $seq but in $gapped_seq.

Try...

 my $seq = "CGATCAACGAATCGTACGTACTC";
 my $gapped_seq =
   "GGGGGGCG-------A---TC---AACGA-----ATC---GTA---CGTACTCTACTCGGGGG";

 my $regexp = '('.join('-*?',split//,$seq).')';

 if( $gapped_seq =~ /$regexp/ ) {
   print "Match is $1\n";
 } else {
   print "No match\n";
 }

 (not sure on the efficiency if $seq is long tho')
James

>
> Cheers,


From khoueiry at ibdm.univ-mrs.fr  Fri Feb 23 08:09:33 2007
From: khoueiry at ibdm.univ-mrs.fr (pierre)
Date: Fri, 23 Feb 2007 14:09:33 +0100
Subject: [Bioperl-l] Sequence matching problem!
In-Reply-To: <358f4d650702230159n3a28fb03k6ecd850fccfed796@mail.gmail.com>
References: <9107936.post@talk.nabble.com>
	<200702231025.39416.heikki@sanbi.ac.za>
	<358f4d650702230159n3a28fb03k6ecd850fccfed796@mail.gmail.com>
Message-ID: <1172236173.4309.6.camel@ciona-pierre>

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070223/0e08ebe6/attachment.pl>

From neetisomaiya at gmail.com  Fri Feb 23 07:27:28 2007
From: neetisomaiya at gmail.com (neeti somaiya)
Date: Fri, 23 Feb 2007 17:57:28 +0530
Subject: [Bioperl-l] need help urgently - needle output parsing
Message-ID: <764978cf0702230427x5b5acf73y6538527ade3fd453@mail.gmail.com>

Hi,

I am using the needle alignment tool (standalone, on a Linux machine), and
then I am using BioPerl to parse the output.
All the data (sequence files and alignment outputs) is attached to this mail.

I have 2 small sequences :- 693.seq and revcomp693.seq
I have 2 big sequences :- 80768-4291-5639.84809_84810_84809_1.scf.seq and
80768-4291-5639.84809_84810_84810_1.scf.seq
All these are in fasta format

Now I am doing the following :-
1) Aligning 80768-4291-5639.84809_84810_84809_1.scf.seq and 693.seq - output
file is 80768-4291-5639.84809_84810_84809_1.scf.out
parsing the output gives me the alignment start in 'traceseq' as 97
2) Aligning 80768-4291-5639.84809_84810_84809_1.scf.seq and revcomp693.seq -
output file is 80768-4291-5639.84809_84810_84809_1.scf.comp.out
parsing the output gives me the alignment start in 'traceseq' as 91

All this is correct.

Now I am doing the following :-
1) Aligning 80768-4291-5639.84809_84810_84810_1.scf.seq and 693.seq - output
file is 80768-4291-5639.84809_84810_84810_1.scf.out
parsing the output gives me the alignment start in 'traceseq' as 341 (this
is correct)
2) Aligning 80768-4291-5639.84809_84810_84810_1.scf.seq and revcomp693.seq -
output file is 80768-4291-5639.84809_84810_84810_1.scf.comp.out
parsing the output gives me the alignment start in 'traceseq' as 341 (this
is incorrect, correct position is 330)


Part of my code is as follows :-
---------------------------------------------

# running needle
`$needle_path./needle $trace.seq $snp_position_on_con.seq -gapopen 10.0 -gapextend 0.5 $output`;

# parsing needle output
my $str = Bio::AlignIO->new(-format => 'emboss',-file => $output);
my $aln = $str->next_aln();
my $pos = $aln->column_from_residue_number('original',1);

$logger->info("Alignment pos is $pos");

####################################

# running needle
`$needle_path./needle $trace.seq revcomp$snp_position_on_con.seq -gapopen 10.0 -gapextend 0.5 $comp_output`;

# parsing needle output
my $comp_str = Bio::AlignIO->new(-format => 'emboss',-file => $comp_output);
my $comp_aln = $comp_str->next_aln();
my $comp_pos = $comp_aln->column_from_residue_number('revcomp',1);

$logger->info("Alignment pos is $comp_pos");


Can someone please tell me what is going wrong here?


-- 
-Neeti
Even my blood says, B positive
-------------- next part --------------
A non-text attachment was scrubbed...
Name: data.zip
Type: application/zip
Size: 4456 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070223/21658b7d/attachment-0002.zip>

From bix at sendu.me.uk  Fri Feb 23 08:55:24 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 23 Feb 2007 13:55:24 +0000
Subject: [Bioperl-l] Sequence matching problem!
In-Reply-To: <Pine.OSF.4.58.0702231131230.1583589@cbi1c.internal.sanger.ac.uk>
References: <9107936.post@talk.nabble.com>	<200702231025.39416.heikki@sanbi.ac.za>	<358f4d650702230159n3a28fb03k6ecd850fccfed796@mail.gmail.com>
	<Pine.OSF.4.58.0702231131230.1583589@cbi1c.internal.sanger.ac.uk>
Message-ID: <45DEF24C.1010303@sendu.me.uk>

James Smith wrote:
> On Fri, 23 Feb 2007, Albert Vilella wrote:
> 
>> now that we are at this pattern matching thread, I was wondering if
>> any perl guru could enlighten me on the issue of matching exact
>> sequence patterns on a gapped target sequence. E.g.:
>>
>> my $seq = "CGATCAACGAATCGTACGTACTC";
>> my $gapped_seq =
>> "GGGGGGCG-------A---TC---AACGA-----ATC---GTA---CGTACTCTACTCGGGGG";
>>
>> and one would like to get as a result:
>>
>> "CG-------A---TC---AACGA-----ATC---GTA---CGTACTCTACTC"
>>
>> which is the match of $seq but in $gapped_seq.
> 
> Try...
> 
>  my $seq = "CGATCAACGAATCGTACGTACTC";
>  my $gapped_seq =
>    "GGGGGGCG-------A---TC---AACGA-----ATC---GTA---CGTACTCTACTCGGGGG";
> 
>  my $regexp = '('.join('-*?',split//,$seq).')';
> 
>  if( $gapped_seq =~ /$regexp/ ) {
>    print "Match is $1\n";
>  } else {
>    print "No match\n";
>  }

That's great stuff. If you were matching thousands of different $seq 
against the same very large $gapped_seq, and only needed the first match 
of $seq in $gapped_seq, the alternative to the above approach (remove 
the gaps from $gapped_seq and do index() matching) will be faster.

Here's one (overly long-winded) way of implementing it, that I found to 
take ~2s vs ~22s for the above regex approach when doing the job on 
999999 copies of $seq:

#!/usr/bin/perl -w
use strict;
use warnings;

my $gapped_seq = 
"GGGGGGCG-------A---TC---AACGA-----ATC---GTA---CGTACTCTACTCGGGGG";

# note the total gap-length at position in gapless 0-based coords
my @gap_lengths;
my $gap_length = 0;
while ($gapped_seq =~ /(-+)/g) {
   my $match = $1;
   my $prev_length = $gap_length;
   $gap_length += length($match);
   my $end = pos($gapped_seq) - $gap_length - 1;
   push(@gap_lengths, $prev_length) for (1..$end-$#gap_lengths);
}
push(@gap_lengths, $gap_length) for (1..(length($gapped_seq) - 
@gap_lengths - $gap_length));

# remove the gaps
my $gapless_seq = $gapped_seq;
$gapless_seq =~ s/-//g;

# now for each of thousands of seqs...
my $seq = 'CGATCAACGAATCGTACGTACTC';
my @seqs;
for (1..999999) {
   push(@seqs, $seq);
}
foreach my $seq (@seqs) {
   my $start = index($gapless_seq, $seq);
   if ($start == -1) {
     print "No match found for seq '$seq'\n";
     next;
   }
   my $end = $start + length($seq) - 1;

   # calculate the coords in $gapped_seq
   $start = $start + $gap_lengths[$start];
   $end = $end + $gap_lengths[$end];

   my $result = substr($gapped_seq, $start, ($end - $start + 1));
   #print $result, "\n";
}

exit;
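A shorter way to get the same gapless-to-gapped coordinate mapping is to record the gapped offset of each residue directly and then use index() as above. This is a sketch with hypothetical helper names; it has not been benchmarked against either approach in this thread:

```perl
use strict;
use warnings;

# For every residue (non-gap character) of a gapped sequence, record its
# 0-based offset in the gapped string: $offsets->[$i] is where the $i-th
# residue of the gapless sequence sits in the gapped one.
sub residue_offsets {
    my ($gapped) = @_;
    my @offsets;
    while ($gapped =~ /[^-]/g) {
        push @offsets, pos($gapped) - 1;
    }
    return \@offsets;
}

# Find the first exact match of $seq in the gapless sequence with index()
# and return the corresponding slice of the gapped sequence (gaps
# included), or undef if $seq is empty or not found.
sub gapped_match {
    my ($gapped, $gapless, $offsets, $seq) = @_;
    return if $seq eq '';
    my $start = index($gapless, $seq);
    return if $start < 0;
    my $end = $start + length($seq) - 1;
    my ($from, $to) = ($offsets->[$start], $offsets->[$end]);
    return substr($gapped, $from, $to - $from + 1);
}

my $gapped_seq = "GGGGGGCG-------A---TC---AACGA-----ATC---GTA---CGTACTCTACTCGGGGG";
(my $gapless_seq = $gapped_seq) =~ tr/-//d;
my $offsets = residue_offsets($gapped_seq);

my $result = gapped_match($gapped_seq, $gapless_seq, $offsets, 'CGATCAACGAATCGTACGTACTC');
print defined $result ? "$result\n" : "No match\n";
```

For the example sequences this prints CG-------A---TC---AACGA-----ATC---GTA---CGTACTC: the 23-residue $seq ends at the first CGTACTC, so the slice stops there. The offsets array is built once per gapped sequence and reused for every query.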


From MEC at stowers-institute.org  Fri Feb 23 10:54:57 2007
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Fri, 23 Feb 2007 09:54:57 -0600
Subject: [Bioperl-l] Bio::DB::SeqFeature to GFF mishandles attributes with
	multiple values
In-Reply-To: <6dce9a0b0701301446w7fc31d6eufe27442fecd0f20e@mail.gmail.com>
Message-ID: <CED81D34E37D5043A1211565277A51E50768F501@exchkc02.stowers-institute.org>

Lincoln, and other Bio::DB::SeqFeature wanderers:

I find that generating GFF from a Bio::DB::SeqFeature using gff3_string
does not respect the following:
 
"Multiple attributes of the same type are indicated by separating the
values with the comma "," character"  (c.f.
http://www.sequenceontology.org/gff3.shtml)
 
This one-liner demonstrates the problem:
 
perl -MBio::DB::SeqFeature -e 'print Bio::DB::SeqFeature->new(-seq_id =>
"J", -start => 1, -end => 2, -primary_tag => "PH", -source => "A",
-name => "mec", -attributes => {foo =>  [qw(bar blat)]})->gff3_string'
J	A	PH	1	2	.	.	.	foo=bar;foo=blat;Name=mec

Do you agree this is a problem? 
 
The fix is in the post-sig patch to
/Bio/DB/SeqFeature/NormalizedFeature.pm, in which I also took the
stylistic privilege of promoting any ID, Parent, or Name attribute to
the front of column 9, so output is now:

J	A	PH	1	2	.	.	.	Name=mec;foo=bar,blat

Do you agree this is better?

I am poised to commit it, as well as the functionally identical patch to the
equivalent function in Bio/Graphics/FeatureBase.pm

All clear?

-- Malcolm Cook

  
*** NormalizedFeature.pm 2 Feb 2007 21:05:42 -0000 1.25
--- NormalizedFeature.pm 23 Feb 2007 15:37:01 -0000
***************
*** 481,494 ****
      next if $t eq 'load_id';
      next if $t eq 'parent_id';
      foreach (@values) { s/\s+$// } # get rid of trailing whitespace
! 
!     push @result,join '=',$self->escape($t),$self->escape($_) foreach @values;
    }
    my $id   = $self->primary_id;
    my $name = $self->display_name;
!   push @result,"ID=".$self->escape($id)                     if defined $id;
!   push @result,"Parent=".$self->escape($parent->primary_id) if defined $parent;
!   push @result,"Name=".$self->escape($name)                   if defined $name;
    return join ';',@result;
  }
  
--- 481,498 ----
      next if $t eq 'load_id';
      next if $t eq 'parent_id';
      foreach (@values) { s/\s+$// } # get rid of trailing whitespace
!     
!      push @result,join '=',$self->escape($t),$self->escape($_) foreach @values;
!     # NO! Multiple attributes of the same type are indicated by
!     # separating the values with the comma "," character - per
!     # http://www.sequenceontology.org/gff3.shtml.  Do it this way:
!     #push @result,join '=',$self->escape($t),join(',', map {$self->escape($_)} @values);
    }
    my $id   = $self->primary_id;
    my $name = $self->display_name;
!   unshift @result,"ID=".$self->escape($id)                     if defined $id;
!   unshift @result,"Parent=".$self->escape($parent->primary_id) if defined $parent;
!   unshift @result,"Name=".$self->escape($name)                   if defined $name;
    return join ';',@result;
  }
 

From MEC at stowers-institute.org  Fri Feb 23 12:08:11 2007
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Fri, 23 Feb 2007 11:08:11 -0600
Subject: [Bioperl-l] Bio::DB::SeqFeature to GFF mishandles attributes
	with multiple values
In-Reply-To: <CED81D34E37D5043A1211565277A51E50768F501@exchkc02.stowers-institute.org>
Message-ID: <CED81D34E37D5043A1211565277A51E50768F509@exchkc02.stowers-institute.org>

Oy,

I hit send too soon.  The patch I sent had my new attribute encoder
commented out.  It should've been: 


*** NormalizedFeature.pm	2 Feb 2007 21:05:42 -0000	1.25
--- NormalizedFeature.pm	23 Feb 2007 17:06:37 -0000
***************
*** 481,494 ****
      next if $t eq 'load_id';
      next if $t eq 'parent_id';
      foreach (@values) { s/\s+$// } # get rid of trailing whitespace
! 
!     push @result,join '=',$self->escape($t),$self->escape($_) foreach @values;
    }
    my $id   = $self->primary_id;
    my $name = $self->display_name;
!   push @result,"ID=".$self->escape($id)                     if defined $id;
!   push @result,"Parent=".$self->escape($parent->primary_id) if defined $parent;
!   push @result,"Name=".$self->escape($name)                   if defined $name;
    return join ';',@result;
  }
  
--- 481,497 ----
      next if $t eq 'load_id';
      next if $t eq 'parent_id';
      foreach (@values) { s/\s+$// } # get rid of trailing whitespace
!     # push @result,join '=',$self->escape($t),$self->escape($_) foreach @values;
!     # NO! Multiple attributes of the same type are indicated by
!     # separating the values with the comma "," character - per
!     # http://www.sequenceontology.org/gff3.shtml.  Do it this way:
!     push @result,join '=',$self->escape($t),join(',', map {$self->escape($_)} @values);
    }
    my $id   = $self->primary_id;
    my $name = $self->display_name;
!   unshift @result,"ID=".$self->escape($id)                     if defined $id;
!   unshift @result,"Parent=".$self->escape($parent->primary_id) if defined $parent;
!   unshift @result,"Name=".$self->escape($name)                   if defined $name;
    return join ';',@result;
  }
  

Malcolm
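
The corrected rule in the patch above (emit each tag once as tag=value1,value2 and move ID, Parent and Name to the front) can be illustrated outside BioPerl. What follows is a minimal standalone sketch, not the NormalizedFeature.pm code. escape_gff3 and gff3_attributes are hypothetical helpers, the set of escaped characters is an assumption, and ID/Parent/Name are read from the attribute hash rather than from primary_id and display_name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for $self->escape(): percent-encode characters
# that are reserved in GFF3 column 9 (this character set is an assumption).
sub escape_gff3 {
    my ($s) = @_;
    $s =~ s/([,;=%&\t\n\r])/sprintf('%%%02X', ord $1)/ge;
    return $s;
}

# Hypothetical helper: build column 9 from tag => value or tag => [values].
# Each tag is emitted once, with its values joined by commas. Any Name,
# Parent or ID tag is moved to the front, in the order the patch's three
# unshift calls produce (Name, Parent, ID).
sub gff3_attributes {
    my (%attr) = @_;
    my @front    = grep { exists $attr{$_} } qw(Name Parent ID);
    my %is_front = map { $_ => 1 } @front;
    my @rest     = grep { !$is_front{$_} } sort keys %attr;

    my @pairs;
    for my $tag (@front, @rest) {
        my @values = ref $attr{$tag} ? @{ $attr{$tag} } : ($attr{$tag});
        s/\s+$// for @values;    # drop trailing whitespace, as the patch does
        push @pairs, escape_gff3($tag) . '='
                   . join(',', map { escape_gff3($_) } @values);
    }
    return join ';', @pairs;
}

print gff3_attributes(Name => 'mec', foo => [qw(bar blat)]), "\n";
# prints: Name=mec;foo=bar,blat
```

For Malcolm's example this prints the same column 9 that he reports for the patched gff3_string.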


From lstein at cshl.edu  Fri Feb 23 12:16:01 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Fri, 23 Feb 2007 12:16:01 -0500
Subject: [Bioperl-l] Bio::DB::SeqFeature to GFF mishandles attributes
	with multiple values
In-Reply-To: <CED81D34E37D5043A1211565277A51E50768F501@exchkc02.stowers-institute.org>
References: <6dce9a0b0701301446w7fc31d6eufe27442fecd0f20e@mail.gmail.com>
	<CED81D34E37D5043A1211565277A51E50768F501@exchkc02.stowers-institute.org>
Message-ID: <6dce9a0b0702230916o684e490cu9d01d3be3e2ae061@mail.gmail.com>

Hi Malcolm,

You're quite right, and I appreciate your work in tracking down and fixing
it. Before you commit the patch, can you confirm that the loader is working
correctly so that comma-separated values are read back into the data
structure as multiple attributes?

Lincoln

On 2/23/07, Cook, Malcolm <MEC at stowers-institute.org> wrote:
>
> Lincoln, and other Bio::DB::SeqFeature wanderers:
>
> I find that generating GFF from a Bio::DB::SeqFeature using gff3_string
> does not respect the following:
>
> "Multiple attributes of the same type are indicated by separating the
> values with the comma "," character"  (c.f.
> http://www.sequenceontology.org/gff3.shtml)
>
> This one-liner demonstrates the problem:
>
> perl -MBio::DB::SeqFeature -e 'print Bio::DB::SeqFeature->new(-seq_id =>
> "J", -start => 1, -end => 2, -primary_tag => "PH", -source => "A",
> -name => "mec", -attributes => {foo =>  [qw(bar blat)]})->gff3_string'
> J       A       PH      1       2       .       .       .       foo=bar;foo=blat;Name=mec
>
> Do you agree this is a problem?
>
> The fix is in the post-sig patch to
> /Bio/DB/SeqFeature/NormalizedFeature.pm, in which I also took the
> stylistic privilege of promoting any ID, Parent, or Name attribute to
> the front of column 9, so output is now:
>
> J       A       PH      1       2       .       .       .       Name=mec;foo=bar,blat
>
> Do you agree this is better?
>
> I am poised to commit it, as well as the functionally same patch to the
> equivalent function in Bio/Graphics/FeatureBase.pm
>
> All clear?
>
> -- Malcolm Cook
>
>
>
> *** NormalizedFeature.pm 2 Feb 2007 21:05:42 -0000 1.25
> --- NormalizedFeature.pm 23 Feb 2007 15:37:01 -0000
> ***************
> *** 481,494 ****
>       next if $t eq 'load_id';
>       next if $t eq 'parent_id';
>       foreach (@values) { s/\s+$// } # get rid of trailing whitespace
> !
> !     push @result,join '=',$self->escape($t),$self->escape($_) foreach @values;
>     }
>     my $id   = $self->primary_id;
>     my $name = $self->display_name;
> !   push @result,"ID=".$self->escape($id)                     if defined $id;
> !   push @result,"Parent=".$self->escape($parent->primary_id) if defined $parent;
> !   push @result,"Name=".$self->escape($name)                   if defined $name;
>     return join ';',@result;
>   }
>
> --- 481,498 ----
>       next if $t eq 'load_id';
>       next if $t eq 'parent_id';
>       foreach (@values) { s/\s+$// } # get rid of trailing whitespace
> !
> !      push @result,join '=',$self->escape($t),$self->escape($_) foreach @values;
> !     # NO! Multiple attributes of the same type are indicated by
> !     # separating the values with the comma "," character - per
> !     # http://www.sequenceontology.org/gff3.shtml.  Do it this way:
> !     #push @result,join '=',$self->escape($t),join(',', map {$self->escape($_)} @values);
>     }
>     my $id   = $self->primary_id;
>     my $name = $self->display_name;
> !   unshift @result,"ID=".$self->escape($id)                     if defined $id;
> !   unshift @result,"Parent=".$self->escape($parent->primary_id) if defined $parent;
> !   unshift @result,"Name=".$self->escape($name)                   if defined $name;
>     return join ';',@result;
>   }
>
>
>
>


-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu


From aaron.j.mackey at gsk.com  Fri Feb 23 09:36:18 2007
From: aaron.j.mackey at gsk.com (aaron.j.mackey at gsk.com)
Date: Fri, 23 Feb 2007 09:36:18 -0500
Subject: [Bioperl-l] Bio::AnnotatableI vs Bio::FeatureHolderI
In-Reply-To: <45DDFED1.1090503@watson.wustl.edu>
Message-ID: <OFC6F4AA9E.AE02C572-ON8525728B.004FF987-8525728B.00503B29@gsk.com>

The fundamental difference (in my mind) between a feature and an 
annotation is that a feature has a location/range, and thus the 
information represented in the feature is applicable only to that 
location/range.  An annotation, on the other hand, is "global", or at 
least non-localizable (note: a feature with a "fuzzy" location of 
"somewhere along this sequence, but I'm not sure where" is still not 
global - if you did/could know the location, you'd describe it as a 
feature, so it shouldn't be represented with an annotation).

-Aaron

bioperl-l-bounces at lists.open-bio.org wrote on 02/22/2007 03:36:33 PM:

> Are Bio::AnnotatableI and Bio::FeatureHolderI competing interfaces?
> 
> I get the impression they are designed to do similar things.  If so is 
> one deprecated and the other preferred?
> 
> If their responsibilities are orthogonal to each other, what sorts of 
> tasks are suited to each?
> 
> Thanks,
> Michael
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 


From MEC at stowers-institute.org  Fri Feb 23 13:46:00 2007
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Fri, 23 Feb 2007 12:46:00 -0600
Subject: [Bioperl-l] Bio::DB::SeqFeature to GFF mishandles attributes
	with multiple values
In-Reply-To: <6dce9a0b0702230916o684e490cu9d01d3be3e2ae061@mail.gmail.com>
Message-ID: <CED81D34E37D5043A1211565277A51E50768F50F@exchkc02.stowers-institute.org>

Lincoln,
 
OK.  I'll do that...
 
...let's see, a quick squiz at Bio/DB/SeqFeature/Store/ .... 
 
...ok - parse_attributes _looks_ right to me
 
...so, let's try it
 
#load a feature into a new database:
 
bp_seqfeature_load.PLS -dsn 'dbi:mysql:database=test;host=mysql-dev' \
  -create -user test -pass test \
  <(echo -e "J\tA\tPH\t1\t2\t.\t.\t.\tfoo=bar,blat;Name=mec\n")
 
#It loaded ok.  Now, let's print it out in GFF3:
 
perl -MBio::DB::SeqFeature::Store -e 'foreach (Bio::DB::SeqFeature::Store->new(-dsn =>
  "dbi:mysql:database=test;host=mysql-dev;user=test;password=test")->features(-type
  => "PH:A")) {print $_->gff3_string . "\n"}'
J A PH 1 2 . . . Name=mec;ID=1;foo=bar,blat

#output looks good to me

Note, I tried loading attributes foo=bar;foo=blat and it came back
foo=bar,blat.  So, you can load either way.

I'll commit later today.

--Malcolm  
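
Malcolm's check shows that the loader accepts both foo=bar,blat and foo=bar;foo=blat. The round-trip logic can be sketched in plain Perl as follows. This is an illustration under stated assumptions, not the parse_attributes code in Bio::DB::SeqFeature::Store; unescape_gff3 and parse_gff3_attributes are hypothetical names.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: undo %XX percent-encoding.
sub unescape_gff3 {
    my ($s) = @_;
    $s =~ s/%([0-9A-Fa-f]{2})/chr hex $1/ge;
    return $s;
}

# Hypothetical helper: parse GFF3 column 9 into tag => [values].
# Comma-separated values and repeated tags both accumulate into the
# same array, so foo=bar,blat and foo=bar;foo=blat give the same result.
sub parse_gff3_attributes {
    my ($col9) = @_;
    my %attr;
    for my $pair (split /;/, $col9) {
        my ($tag, $value) = split /=/, $pair, 2;
        next unless defined $value;    # skip empty or malformed pairs
        push @{ $attr{ unescape_gff3($tag) } },
            map { unescape_gff3($_) } split /,/, $value;
    }
    return \%attr;
}

for my $col9 ('foo=bar,blat;Name=mec', 'foo=bar;foo=blat;Name=mec') {
    my $attr = parse_gff3_attributes($col9);
    print "$col9 => foo: @{ $attr->{foo} }\n";
}
```

Both inputs yield foo => [bar, blat], which is the behaviour Lincoln asked to have confirmed before the encoder change went in.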

 


From cjfields at uiuc.edu  Fri Feb 23 13:49:44 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 23 Feb 2007 12:49:44 -0600
Subject: [Bioperl-l] Bio::AnnotatableI vs Bio::FeatureHolderI
In-Reply-To: <OFC6F4AA9E.AE02C572-ON8525728B.004FF987-8525728B.00503B29@gsk.com>
References: <OFC6F4AA9E.AE02C572-ON8525728B.004FF987-8525728B.00503B29@gsk.com>
Message-ID: <FEDC420E-AE3A-4AD4-A30B-54F8DF904D84@uiuc.edu>

To add to that, there's a HOWTO describing the differences:

http://www.bioperl.org/wiki/HOWTO:Feature-Annotation

I agree w/ Aaron; if it has a location it's a feature,  otherwise  
it's an annotation.

chris


Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From lstein at cshl.edu  Fri Feb 23 16:20:26 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Fri, 23 Feb 2007 16:20:26 -0500
Subject: [Bioperl-l] Bio::DB::SeqFeature to GFF mishandles attributes
	with multiple values
In-Reply-To: <CED81D34E37D5043A1211565277A51E50768F50F@exchkc02.stowers-institute.org>
References: <6dce9a0b0702230916o684e490cu9d01d3be3e2ae061@mail.gmail.com>
	<CED81D34E37D5043A1211565277A51E50768F50F@exchkc02.stowers-institute.org>
Message-ID: <6dce9a0b0702231320j1f24d4b4oe33bce6d2da96db7@mail.gmail.com>

Excellent!

Lincoln



From enrique_rulz at yahoo.com  Sat Feb 24 16:23:59 2007
From: enrique_rulz at yahoo.com (Kurt Gobain)
Date: Sat, 24 Feb 2007 13:23:59 -0800 (PST)
Subject: [Bioperl-l] Sequence matching problem!
In-Reply-To: <200702231025.39416.heikki@sanbi.ac.za>
References: <9107936.post@talk.nabble.com>
	<200702231025.39416.heikki@sanbi.ac.za>
Message-ID: <9137941.post@talk.nabble.com>


Heikki Lehvaslaiho wrote:
> 
> Kurt,
> 
> There are a few things in your code to note:
> 
> - regexp /C*T/ matches any T preceded by zero or more Cs,
>   not what you meant
> - $- and $+ are among the "expensive" Perl variables, best 
>   avoided unless you have to use them. Using them anywhere in your 
>   code slows execution down considerably. There is always 
>   another way.
> - Keep in mind what you want to use the match positions for: 
>   Human readable locations usually start counting with 1 but
>   perl code uses 0 as the first location. The code below assumes
>   you want to print the locations out.
> 
> Study my example code below.
> 
> Yours,
> 	-Heikki
> 
> ###################################################################
> #!/usr/bin/perl
> use strict;
> use warnings;
> 
> my $seq = "GATCAAT";
> #my $pattern = 'C*T';
> my $pattern = 'C.*T';
> 
> while ($seq =~ m/($pattern)/gi) {
> 
>     my $match = $1;
>     my $end   = pos($seq);
>     my $start = $end - length($match) + 1;
> 
>     print "$match : $start - $end\n";
> }
> 
> ###################################################################
> 
> 


Thanks for the quick reply! Sorry I couldn't reply earlier.

The code works fine, but sometimes it doesn't give the required output. For
example, if I enter the sequence "GATCAAGTCAGGAT" and the pattern T.*A, the
output I get from the program above is TCAAGTCAGGA instead of TCA.
One more thing: is there any way I can replace T*A with T.*A? The code I need
to write should accept only T*A as input, not T.*A. Can we use a substitution
regex, something like
$pattern =~  s/.*/*/ or something else?
But that gives an error again... Damn! Regex is really hairy! :P

Anyway, thanks a lot again for the code. Hope to hear from you soon!

Kurt!


-- 
View this message in context: http://www.nabble.com/Sequence-matching-problem%21-tf3275153.html#a9137941
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From biology0046 at hotmail.com  Sat Feb 24 23:14:51 2007
From: biology0046 at hotmail.com (=?gb2312?B?va0gzsTi/Q==?=)
Date: Sun, 25 Feb 2007 04:14:51 +0000
Subject: [Bioperl-l] how to change align output format
Message-ID: <BAY109-F2409DB6CAA116F289F8F17B48C0@phx.gbl>

Dear all:

I have a problem changing the output format of a clustal alignment.
I use the Bio::Tools::Run::Alignment::Clustalw module to carry out a
multiple sequence alignment, then I use the Bio::AlignIO module to write
out the alignment. My script looks like this:
my $aln_out=Bio::AlignIO->new(-file=>">./clustal/${outfilename}.aln",-format=>'clustalw');

The output:
dana_GLEANR_16071      MSKMKMLPVQLSLNQLNPGIWSDVLWRCPPAPSSQLAELKTQLPPSLPSDPRLWSREDVL
dere_GLEANR_9270       ..............S.............................................
FBgn0000097            ..............S.............................................
dsec_GLEANR_671        ..............S.............................................
dsim_GLEANR_6613       ..............S.............................................
dyak_GLEANR_1669       ..............S.............................................
                                     .


dana_GLEANR_16071      VFLRFCVREFDLPKLDFDLFQMNGKALCLLTRADFGHRCPGAGDVLHNVLQMLIIESHMM
dere_GLEANR_9270       ............................................................
FBgn0000097            ............................................................
dsec_GLEANR_671        ............................................................
dsim_GLEANR_6613       ............................................................
dyak_GLEANR_1669       ............................................................

But I want to change the output to the format below, which does not change
identical residues into "." characters.
dere_GLEANR_9270       MSKMKMLPVQLSLNSLNPGIWSDVLWRCPPAPSSQLAELKTQLPPSLPSDPRLWSREDVL
dyak_GLEANR_1669       MSKMKMLPVQLSLNSLNPGIWSDVLWRCPPAPSSQLAELKTQLPPSLPSDPRLWSREDVL
dsec_GLEANR_671        MSKMKMLPVQLSLNSLNPGIWSDVLWRCPPAPSSQLAELKTQLPPSLPSDPRLWSREDVL
dsim_GLEANR_6613       MSKMKMLPVQLSLNSLNPGIWSDVLWRCPPAPSSQLAELKTQLPPSLPSDPRLWSREDVL
FBgn0000097            MSKMKMLPVQLSLNSLNPGIWSDVLWRCPPAPSSQLAELKTQLPPSLPSDPRLWSREDVL
dana_GLEANR_16071      MSKMKMLPVQLSLNQLNPGIWSDVLWRCPPAPSSQLAELKTQLPPSLPSDPRLWSREDVL
                       **************.*********************************************

dere_GLEANR_9270       VFLRFCVREFDLPKLDFDLFQMNGKALCLLTRADFGHRCPGAGDVLHNVLQMLIIESHMM
dyak_GLEANR_1669       VFLRFCVREFDLPKLDFDLFQMNGKALCLLTRADFGHRCPGAGDVLHNVLQMLIIESHMM
dsec_GLEANR_671        VFLRFCVREFDLPKLDFDLFQMNGKALCLLTRADFGHRCPGAGDVLHNVLQMLIIESHMM
dsim_GLEANR_6613       VFLRFCVREFDLPKLDFDLFQMNGKALCLLTRADFGHRCPGAGDVLHNVLQMLIIESHMM
FBgn0000097            VFLRFCVREFDLPKLDFDLFQMNGKALCLLTRADFGHRCPGAGDVLHNVLQMLIIESHMM
dana_GLEANR_16071      VFLRFCVREFDLPKLDFDLFQMNGKALCLLTRADFGHRCPGAGDVLHNVLQMLIIESHMM
                       ************************************************************

Are there any parameters in the package that can be changed so that I can
get the latter output format? Thank you sincerely!

Jiang



From bix at sendu.me.uk  Sun Feb 25 05:53:48 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Sun, 25 Feb 2007 10:53:48 +0000
Subject: [Bioperl-l] [Fwd: Bio::Graph::SimpleGraph]
Message-ID: <45E16ABC.3060405@sendu.me.uk>

Tels,

I've forwarded this to the author of the module, Nat Goodman, and to the 
Bioperl mailing list 
(http://www.bioperl.org/wiki/Mailing_lists#Main_BioPerl_list).

But actually we have Bio::Graph::* as tentatively deprecated:
http://www.bioperl.org/wiki/Deprecated_modules#Bio::Graph_modules
so any further work on it doesn't seem worthwhile.


-------- Original Message --------
Subject: Bio::Graph::SimpleGraph
Date: Sat, 24 Feb 2007 12:07:31 +0100
From: Tels <nospam-abuse at bloodgate.com>

Hi,

I just stumbled over Bio::Graph::SimpleGraph and read this comment:

"This is a simple, hopefully fast undirected graph package. The only reason
this exists is that the standard CPAN Graph pacakge, Graph::Base, is
seriously broken."

Really sad to see people always reinventing the wheel :/

Anyway, I wonder if you would like to make your module support Graph::Easy
(http://search.cpan.org/~tels/Graph-Easy/)? I would be willing to submit
patches and do testing/documentation for that.

All the best,

Tels


From bix at sendu.me.uk  Sun Feb 25 05:45:21 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Sun, 25 Feb 2007 10:45:21 +0000
Subject: [Bioperl-l] Sequence matching problem!
In-Reply-To: <9137941.post@talk.nabble.com>
References: <9107936.post@talk.nabble.com>	<200702231025.39416.heikki@sanbi.ac.za>
	<9137941.post@talk.nabble.com>
Message-ID: <45E168C1.80306@sendu.me.uk>

Kurt Gobain wrote:
> Code works perfectly fine...but...sum time its not givin reqd o/p..For eg.
> If I type sequence as "GATCAAGTCAGGAT" & pattern to be matched as T.*A..then
> o/p which I am getting frm above prog is TCAAGTCAGGA instead of TCA...
> & 1 more thing Is there n e chance by which I can replace T*A to T.*A cos
> the code which I need to write says T*A shod be only the input not T.*A..So
> Can we use replacment reg ex...sumthing like 
> $pattern =~  s/.*/*/...or sumthing else...
> But its kinda givin sum error again...Dam! Regex is really hairy!!...:P

These aren't Bioperl questions. For regular expression help see:
http://perldoc.perl.org/perlretut.html

Basically, you want a non-greedy match, so T.*?A

You can convert T*A by doing s/\*/.*?/

Here are some more regexes for you:
s/sum/some/g
s/frm/from/g
s/n e/any/g
etc...
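
Combining the non-greedy match and the T*A conversion with Heikki's loop gives something like the following sketch. wildcard_to_regex and find_matches are hypothetical helpers. The quotemeta step is an addition, not part of the advice above: it makes every character other than '*' match literally, on the assumption that user patterns contain no other regex syntax.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Convert a user pattern in which '*' means "any run of characters"
# (e.g. T*A) into a non-greedy regex (T.*?A), quoting everything else
# so that other characters are matched literally.
sub wildcard_to_regex {
    my ($user_pattern) = @_;
    return join '.*?', map { quotemeta($_) } split /\*/, $user_pattern, -1;
}

# Return [match, start, end] for each non-overlapping match, using
# 1-based coordinates as in Heikki's example.
sub find_matches {
    my ($seq, $user_pattern) = @_;
    my $regex = wildcard_to_regex($user_pattern);
    my @hits;
    while ($seq =~ m/($regex)/gi) {
        my $match = $1;
        my $end   = pos($seq);                  # offset just past the match
        my $start = $end - length($match) + 1;  # 1-based start
        push @hits, [$match, $start, $end];
    }
    return @hits;
}

for my $hit (find_matches('GATCAAGTCAGGAT', 'T*A')) {
    print "$hit->[0] : $hit->[1] - $hit->[2]\n";
}
# prints:
# TCA : 3 - 5
# TCA : 8 - 10
```

With the greedy T.*A the first match would run to the last A in the sequence. The non-greedy .*? stops at the first A after each T, which gives the two TCA hits Kurt expected.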


From biology0046 at hotmail.com  Sun Feb 25 08:28:34 2007
From: biology0046 at hotmail.com (=?gb2312?B?va0gzsTi/Q==?=)
Date: Sun, 25 Feb 2007 13:28:34 +0000
Subject: [Bioperl-l] AlignIO problems
Message-ID: <BAY109-F38C3F2A7B77BE0A42F9ADEB48C0@phx.gbl>

Hi all,
I use the AlignIO module to convert an alignment file.
My original file is:
CLUSTAL W(1.81) multiple sequence alignment


dana_GLEANR_11249      MEAIAKHDFSATADDELSFRKTQTLKILNMEDDSNWYRAELDGKEGLIPSNYIEMKNHDW
dere_GLEANR_7213       ...V...................I....................................
dgri_GLEANR_6962       .......................I....................................
FBgn0004638            .......................I....................................
dmoj_GLEANR_6118       ...........N...........I....................................
dper_GLEANR_18885      ...V...................I....................................
dpse_GLEANR_14384      ...V...................I....................................
dsec_GLEANR_3096       .................N.....I....................................
dsim_GLEANR_9744       -----------------------------...............................
dvir_GLEANR_4811       .......................I....................................
dwil_GLEANR_10869      .......................I....................................
dyak_GLEANR_13576      .......................I....................................


dana_GLEANR_11249      YYGRITRADAEKLLSNKHEGAFLIRISESSPGDFSLSVKCPDGVQHFKVLRDAQSKFFLW
dere_GLEANR_7213       ............................................................
dgri_GLEANR_6962       ............................................................
FBgn0004638            ............................................................
dmoj_GLEANR_6118       .................L..........................................
dper_GLEANR_18885      ............................................................
dpse_GLEANR_14384      ............................................................
dsec_GLEANR_3096       ............................................................
dsim_GLEANR_9744       ............................................................
dvir_GLEANR_4811       ............................................................
dwil_GLEANR_10869      ............................................................
dyak_GLEANR_13576      ............................................................


dana_GLEANR_11249      VVKFNSLNELVEYHRTASVSRSQDVKLRDMIPEEMLVQALYDFVPQESGELDFRRGDVIT
dere_GLEANR_7213       ............................................................
dgri_GLEANR_6962       ............................................................
FBgn0004638            ............................................................
dmoj_GLEANR_6118       ..............................V.D...........................
dper_GLEANR_18885      .......................E....................................
dpse_GLEANR_14384      .......................E....................................
dsec_GLEANR_3096       ............................................................
dsim_GLEANR_9744       ............................................................
dvir_GLEANR_4811       ............................................................
dwil_GLEANR_10869      ............................................................
dyak_GLEANR_13576      ............................................................


dana_GLEANR_11249      VTDRSDENWWNGEIGNRKGIFPATYVTPYHS
dere_GLEANR_7213       ...............................
dgri_GLEANR_6962       ...............................
FBgn0004638            ...............................
dmoj_GLEANR_6118       ............Q..................
dper_GLEANR_18885      ...............................
dpse_GLEANR_14384      ...............................
dsec_GLEANR_3096       ...............................
dsim_GLEANR_9744       ...............................
dvir_GLEANR_4811       ...............................
dwil_GLEANR_10869      ...............................
dyak_GLEANR_13576      ...............................


I want to change those "." characters back to the actual residue letters, so I
wrote code like this:
use Bio::AlignIO;
my $in=Bio::AlignIO->new(-file =>"FBgn0000097.aln",
                      -format => 'clustalw');
my $out=Bio::AlignIO->new(-file=>">../clustalw/0097.aln",
                       -format =>'clustalw');
while (my $aln=$in->next_aln() ){
    $aln->unmatch();
    $aln->set_displayname_flat();
    $out->write_aln($aln);
}

But when I run the code, I get error messages like:

-------------------- WARNING ---------------------
MSG: Got a sequence with no letters in it cannot guess alphabet []
---------------------------------------------------

------------- EXCEPTION  -------------
MSG: No sequence with name [dsim_GLEANR_9744/1-182]
STACK Bio::SimpleAlign::displayname 
/home/src/bioperl/bioperl-live/Bio/SimpleAlign.pm:2307
STACK Bio::SimpleAlign::set_displayname_flat 
/home/src/bioperl/bioperl-live/Bio/SimpleAlign.pm:2374
STACK toplevel aligntest.pl:11

--------------------------------------

I don't know where the problem is.

Jiang



From cjfields at uiuc.edu  Sun Feb 25 14:58:23 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 25 Feb 2007 13:58:23 -0600
Subject: [Bioperl-l] AlignIO problems
In-Reply-To: <BAY109-F38C3F2A7B77BE0A42F9ADEB48C0@phx.gbl>
References: <BAY109-F38C3F2A7B77BE0A42F9ADEB48C0@phx.gbl>
Message-ID: <19EA5F46-D1A4-45B5-B2DB-55194F79215C@uiuc.edu>

Bio::AlignIO::clustalw doesn't work with masked sequences; it parses  
the output literally, so any '.' or '-' characters are treated as gaps.  
If a sequence is 100% identical to the reference, it ends up as 100%  
gaps with no residues at all, which gives you the warnings you see.

The best way to accomplish what you want is to not mask the sequence  
alignment to begin with when running clustalw/muscle/whatever.   
Exactly how are you generating these?  When I use clustalw no  
identity masking occurs by default.
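If regenerating an unmasked alignment is not an option, the masked rows could in principle be rebuilt before BioPerl ever sees them. The sketch below is plain Perl (no BioPerl) and assumes, as in Jiang's alignment, that the first row of each interleaved block is the unmasked reference and that '.' in any other row means "same residue as the reference in this column", while '-' is a real gap. The function name `unmask_rows` is hypothetical, and it would have to be applied block by block, with the sequence names set aside.

```perl
use strict;
use warnings;

# Rebuild the rows of one interleaved block of an identity-masked alignment.
# Assumption: $rows[0] is the unmasked reference row; in every other row a '.'
# means "same residue as the reference in this column", and '-' is a real gap.
sub unmask_rows {
    my @rows = @_;
    my @ref  = split //, $rows[0];
    my @out  = ($rows[0]);
    for my $row (@rows[1 .. $#rows]) {
        my @chars = split //, $row;
        for my $i (0 .. $#chars) {
            # copy the reference residue only where this row is masked
            $chars[$i] = $ref[$i] if $chars[$i] eq '.' && $i <= $#ref;
        }
        push @out, join '', @chars;
    }
    return @out;
}

# First 13 columns of the last block (dana, dmoj) plus a made-up gapped row
my @fixed = unmask_rows('VTDRSDENWWNGE', '............Q', '---..........');
print "$_\n" for @fixed;
```

This prints VTDRSDENWWNGE, VTDRSDENWWNGQ and ---RSDENWWNGE. Once every block is unmasked and the pieces are concatenated per sequence, the result could be written out as FASTA and read back with Bio::AlignIO using `-format => 'fasta'`.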

chris

On Feb 25, 2007, at 7:28 AM, ? ?? wrote:

> hi, all,
> I use the AlignIO module to convert the alignment file.
> my original file is :
> CLUSTAL W(1.81) multiple sequence alignment
>
>
> dana_GLEANR_11249      MEAIAKHDFSATADDELSFRKTQTLKILNMEDDSNWYRAELDGKEGLIPSNYIEMKNHDW
> dere_GLEANR_7213       ...V...................I....................................
> dgri_GLEANR_6962       .......................I....................................
> FBgn0004638            .......................I....................................
> dmoj_GLEANR_6118       ...........N...........I....................................
> dper_GLEANR_18885      ...V...................I....................................
> dpse_GLEANR_14384      ...V...................I....................................
> dsec_GLEANR_3096       .................N.....I....................................
> dsim_GLEANR_9744       -----------------------------...............................
> dvir_GLEANR_4811       .......................I....................................
> dwil_GLEANR_10869      .......................I....................................
> dyak_GLEANR_13576      .......................I....................................
>
>
>
> dana_GLEANR_11249      YYGRITRADAEKLLSNKHEGAFLIRISESSPGDFSLSVKCPDGVQHFKVLRDAQSKFFLW
> dere_GLEANR_7213       ............................................................
> dgri_GLEANR_6962       ............................................................
> FBgn0004638            ............................................................
> dmoj_GLEANR_6118       .................L..........................................
> dper_GLEANR_18885      ............................................................
> dpse_GLEANR_14384      ............................................................
> dsec_GLEANR_3096       ............................................................
> dsim_GLEANR_9744       ............................................................
> dvir_GLEANR_4811       ............................................................
> dwil_GLEANR_10869      ............................................................
> dyak_GLEANR_13576      ............................................................
>
>
>
> dana_GLEANR_11249      VVKFNSLNELVEYHRTASVSRSQDVKLRDMIPEEMLVQALYDFVPQESGELDFRRGDVIT
> dere_GLEANR_7213       ............................................................
> dgri_GLEANR_6962       ............................................................
> FBgn0004638            ............................................................
> dmoj_GLEANR_6118       ..............................V.D...........................
> dper_GLEANR_18885      .......................E....................................
> dpse_GLEANR_14384      .......................E....................................
> dsec_GLEANR_3096       ............................................................
> dsim_GLEANR_9744       ............................................................
> dvir_GLEANR_4811       ............................................................
> dwil_GLEANR_10869      ............................................................
> dyak_GLEANR_13576      ............................................................
>
>
>
> dana_GLEANR_11249      VTDRSDENWWNGEIGNRKGIFPATYVTPYHS
> dere_GLEANR_7213       ...............................
> dgri_GLEANR_6962       ...............................
> FBgn0004638            ...............................
> dmoj_GLEANR_6118       ............Q..................
> dper_GLEANR_18885      ...............................
> dpse_GLEANR_14384      ...............................
> dsec_GLEANR_3096       ...............................
> dsim_GLEANR_9744       ...............................
> dvir_GLEANR_4811       ...............................
> dwil_GLEANR_10869      ...............................
> dyak_GLEANR_13576      ...............................
>
>
> I want to change those "." characters back to residue letters, so I
> wrote this code:
> use Bio::AlignIO;
> my $in=Bio::AlignIO->new(-file =>"FBgn0000097.aln",
>                      -format => 'clustalw');
> my $out=Bio::AlignIO->new(-file=>">../clustalw/0097.aln",
>                       -format =>'clustalw');
> while (my $aln=$in->next_aln() ){
>    $aln->unmatch();
>    $aln->set_displayname_flat();
>    $out->write_aln($aln);
> }
>
> But when I run the code, I get error messages like these:
>
> -------------------- WARNING ---------------------
> MSG: Got a sequence with no letters in it cannot guess alphabet []
> ---------------------------------------------------
>
> ------------- EXCEPTION  -------------
> MSG: No sequence with name [dsim_GLEANR_9744/1-182]
> STACK Bio::SimpleAlign::displayname /home/src/bioperl/bioperl-live/Bio/SimpleAlign.pm:2307
> STACK Bio::SimpleAlign::set_displayname_flat /home/src/bioperl/bioperl-live/Bio/SimpleAlign.pm:2374
> STACK toplevel aligntest.pl:11
>
> --------------------------------------
>
> I don't know where the problem is.
>
> Jiang
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cristiangary at gmail.com  Sun Feb 25 16:04:57 2007
From: cristiangary at gmail.com (Cristian Gary)
Date: Sun, 25 Feb 2007 18:04:57 -0300
Subject: [Bioperl-l] problem with blast report to ncbi webpage
Message-ID: <95ef8cd0702251304o45bea6a0tcedc59156cb0cfe4@mail.gmail.com>

I have a problem submitting BLAST jobs to the NCBI server: while waiting
on the RIDs, I never get any results.
Is the problem the NCBI server or the BioPerl version?
P.S.: the same code worked very well 3 weeks ago.


-- 
"Knowledge belongs to humanity"

"Gnu/linux   -------- free your mind......
www.kubuntu.org


From granjeau at tagc.univ-mrs.fr  Mon Feb 26 04:17:15 2007
From: granjeau at tagc.univ-mrs.fr (Samuel GRANJEAUD - IR/IFR137)
Date: Mon, 26 Feb 2007 10:17:15 +0100
Subject: [Bioperl-l] Reading a XML sequence (UniParc) into a BioSeq object
Message-ID: <45E2A59B.6080300@tagc.univ-mrs.fr>

Hello !

I would like to fill a BioSeq object with the output of a dbfetch
request to the UniParc database at EBI (which replies only with XML; I am
interested in the references). Could somebody tell me which BioPerl object
to use, or a way to convert it to Swiss-Prot format, or share a piece of
code? Is http://doc.bioperl.org/bioperl-live/Bio/SeqIO/interpro.html a good
starting point? I would appreciate it a lot.

Best regards,
--Samuel

<entry accession="UPI00004A0D4A">
<dbReferenceList>
    <dbReference db="EMBL" id="CAI39485" version="1" version_i="1" 
active="Y" created="04-Jan-2005" last="15-Dec-2006"/>
    <dbReference db="UniProtKB/TrEMBL" id="Q5JVT0" version="1" 
version_i="1" active="N" created="15-Feb-2005" last="06-Feb-2007"/>
    <dbReference db="ENSEMBL" id="ENSP00000352958" version_i="2" 
active="Y" created="03-Apr-2006" last="27-Nov-2006"/>
    <dbReference db="IPI" id="IPI00418471" version="4" version_i="4" 
active="N" created="07-Mar-2005" last="07-Mar-2005"/>
    <dbReference db="IPI" id="IPI00646867" version="1" version_i="1" 
active="N" created="06-Sep-2005" last="06-Oct-2006"/>
    <dbReference db="VEGA" id="OTTHUMP00000019225" version_i="1" 
active="N" created="15-Aug-2005" last="02-Dec-2005"/>
</dbReferenceList>
<sequence length="431" crc64="8913D1F04A71CCFB">
MSTRSVSSSSYRRMFGGPGTASRPSSSRSYVTTSTRTYSLGSALRPSTSRSLYASSPGGV
YATRSSAVRLRSSVPGVRLLQDSVDFSLADAINTEFKNTRTNEKVELQELNDRFANYIDK
VRFLEQQNKILLAELEQLKGQGKSRLGDLYEEEMRELRRQVDQLTNDKARVEVERDNLAE
DIMRLREKLQEEMLQREEAENTLQSFRQDVDNASLARLDLERKVESLQEEIAFLKKLHEE
EIQELQAQIQEQHVQIDVDVSKPDLTAALRDVRQQYESVAAKNLQEAEEWYKSKFADLSE
AANRNNDALRQAKQESTEYRRQVQSLTCEVDALKGTNESLERQMREMEENFAVEAANYQD
TIGRLQDEIQNMKEEMARHLREYQDLLNVKMALDIEIATYRKLLEGEESRISLPLPNFSS
LNLRGKHFISL
</sequence>
</entry>
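As a stopgap for getting at the fields in an entry like the one above, the references and sequence can be scraped out with a few regular expressions. This is a sketch in plain Perl, not a BioPerl parser: it assumes the layout shown in this sample (one `<entry>`, double-quoted attributes, possibly wrapped over lines), the function name `parse_uniparc_entry` is hypothetical, and a real XML parser would be more robust.

```perl
use strict;
use warnings;

# Pull the accession, cross-references and sequence out of one UniParc <entry>.
# Assumption: the XML looks like the sample above (double-quoted attributes,
# one entry per string). This is regex scraping, not a real XML parser.
sub parse_uniparc_entry {
    my ($xml) = @_;
    my %entry = (dbrefs => []);
    ($entry{accession}) = $xml =~ /<entry\s+accession="([^"]+)"/;
    # attributes may be wrapped over several lines, as in the sample
    while ($xml =~ /<dbReference\s+([^>]*?)\/?>/g) {
        my $attr_text = $1;
        my %attr = $attr_text =~ /(\w+)="([^"]*)"/g;
        push @{ $entry{dbrefs} }, \%attr;
    }
    if ($xml =~ /<sequence[^>]*>(.*?)<\/sequence>/s) {
        ($entry{seq} = $1) =~ s/\s+//g;    # remove the line breaks
    }
    return \%entry;
}
```

The returned hash could then feed `Bio::Seq->new(-display_id => ..., -seq => ..., -alphabet => 'protein')`, with each cross-reference stored as a `Bio::Annotation::DBLink`; that BioPerl step is not shown here.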


From bix at sendu.me.uk  Mon Feb 26 06:46:39 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 26 Feb 2007 11:46:39 +0000
Subject: [Bioperl-l] [Fwd: Bio::Graph::SimpleGraph]
In-Reply-To: <45E16ABC.3060405@sendu.me.uk>
References: <45E16ABC.3060405@sendu.me.uk>
Message-ID: <45E2C89F.1020402@sendu.me.uk>

Nat replied, but I messed up the To: field, so his reply didn't make it to the
list. Here's what he said:


Nathan (Nat) Goodman wrote:
Hi Tels

I agree it's sad to reinvent the wheel, but I don't think that's what
happened here. Your module seems to be focused on rendering graphs while
my module is concerned with computations on graphs.

In any case, as Sendu notes, SimpleGraph is in the process of being
deprecated. I fully support this move. It was intended to be a stopgap
until the main Perl Graph module was fixed.  Since that has now
happened, it's time for SimpleGraph to retire.

For the benefit of anyone using Graph: last I checked (six months or
more ago), it had serious performance problems on large graphs (probably
not too much of a surprise), and also was inexplicably slow on graphs
with edge attributes.  I see that the latter bug is marked "resolved" in
CPAN, but there's no indication of when or how.  We've moved to Boost
for graphs as large as the human protein interaction network.

Best,
Nat


From sanjib at bic.boseinst.ernet.in  Mon Feb 26 00:23:36 2007
From: sanjib at bic.boseinst.ernet.in (Sanjib Kumar Gupta)
Date: Mon, 26 Feb 2007 10:53:36 +0530
Subject: [Bioperl-l] Remote blast
In-Reply-To: <20070221064743.M54123@bic.boseinst.ernet.in>
References: <mailman.0.1172037646.4756.bioperl-l@lists.open-bio.org>
	<20070221064743.M54123@bic.boseinst.ernet.in>
Message-ID: <20070226052336.M74918@bic.boseinst.ernet.in>

Hi
I have been running this script for some time and it was working fine. I am 
using a Linux machine with a live (public) IP and no proxy. But suddenly it has stopped 
working with these errors:

waiting...waiting...
-------------------- WARNING ---------------------
MSG:  <HTML>
<HEAD><TITLE>An Error Occurred</TITLE></HEAD>
<BODY>
<H1>An Error Occurred</H1>
500 Can't connect to www.ncbi.nlm.nih.gov:80 (Bad 
hostname 'www.ncbi.nlm.nih.gov')
</BODY>
</HTML>

---------------------------------------------------
xx.pep

-------------------- WARNING ---------------------
MSG: req was POST http://www.ncbi.nlm.nih.gov/blast/Blast.cgi
User-Agent: bioperl-Bio_Tools_Run_RemoteBlast/1.5
Content-Length: 497
Content-Type: application/x-www-form-urlencoded

DATABASE=nr&QUERY=%3ENM_005277.3%3B_1414961_1+837%
0AMEENMEEGQTQKGCFECCIKCLGGIPYASLIATILLYAGVALFCGCGHEALSGTVNILQTYFEMARTAGDTLDVF
TMIDIFKYVIYGIAAAFFVYGILLMVEGFFTTGAIKDLYGDFKITTCGRCVSAWFIMLTYLFMLAWLGVTAFTSLPV
YMYFNLWTICRNTTLVEGANLCLDLRQFGIVTIGEEKKICTVSENFLRMCESTELNMTFHLFIVALAGAGAAVIAMV
HYLMVLSANWAYVKDACRMQKYEDIKSKEEQELHDIHSTRSKERLNAYT*&COMPOSITION_BASED_STATISTI
CS=off&EXPECT=1e-
10&SERVICE=plain&FORMAT_OBJECT=Alignment&CMD=Put&MATRIX_NAME=BLOSUM62&ENTREZ_
QUERY=Xenopus+laevis[Organism]&PROGRAM=blastp

<HTML>
<HEAD><TITLE>An Error Occurred</TITLE></HEAD>
<BODY>
<H1>An Error Occurred</H1>
500 Can't connect to www.ncbi.nlm.nih.gov:80 (Bad 
hostname 'www.ncbi.nlm.nih.gov')
</BODY>
</HTML>

---------------------------------------------------
waiting...waiting...
-------------------- WARNING ---------------------
MSG:  <HTML>
<HEAD><TITLE>An Error Occurred</TITLE></HEAD>
<BODY>
<H1>An Error Occurred</H1>
500 Internal Server Error
</BODY>
</HTML>

---------------------------------------------------

Although I can see the NCBI page in a browser, I am unable to ping or 
traceroute to the server.

Please help me.


--
Sanjib Kumar Gupta
Bioinformatics Centre
Bose Institute
Kolkata 700054, INDIA
Phone  : +91-33-2355 6626, 2816, 2355 4766
Fax    : +91-33-2355 3886
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: n9.pl
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070226/86a0137c/attachment-0002.pl>

From cjfields at uiuc.edu  Mon Feb 26 09:59:21 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 26 Feb 2007 08:59:21 -0600
Subject: [Bioperl-l] Remote blast
In-Reply-To: <20070226052336.M74918@bic.boseinst.ernet.in>
References: <mailman.0.1172037646.4756.bioperl-l@lists.open-bio.org>
	<20070221064743.M54123@bic.boseinst.ernet.in>
	<20070226052336.M74918@bic.boseinst.ernet.in>
Message-ID: <C668C555-39ED-43A9-8B49-C7D0376D971F@uiuc.edu>

I tested this out and got BLAST to work for my test case (a single  
FASTA sequence, since you didn't send any sequences for testing).  It keeps  
querying for the RID in what appears to be an infinite loop (i.e. it  
doesn't get rid of the RID properly); you can see this if you add  
'-verbose => 1' to your parameters.  I don't have time to delve into it,  
but from a quick glance it may be due to your looping structure and  
how you are saving your RIDs.

As for your particular error, could it be something as simple as the  
server being overloaded or down?  It does happen from time to time...

Beyond that I can't make heads or tails of your script.  Was it  
cobbled together from a bunch of others?  If you are doing that you  
can probably expect some bugs to occur.
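The "Bad hostname 'www.ncbi.nlm.nih.gov'" text in the first errors is what LWP reports when the local name lookup fails, which is a different failure from the later "500 Internal Server Error" that the server itself returned. One way to tell the two apart is to ask Perl's own resolver directly. The helper below is a plain-Perl sketch using only the core Socket module; the name `resolve_host` is hypothetical.

```perl
use strict;
use warnings;
use Socket qw(inet_ntoa);

# Return the first IPv4 address the local resolver gives for $host,
# or undef if the name cannot be resolved. LWP reports that failure as
# "Can't connect to host:80 (Bad hostname 'host')".
sub resolve_host {
    my ($host) = @_;
    my ($name, $aliases, $addrtype, $length, @addrs) = gethostbyname($host);
    return unless @addrs;
    return inet_ntoa($addrs[0]);
}
```

If `resolve_host('www.ncbi.nlm.nih.gov')` returns undef on the machine running the script while the browser still loads the page, that would suggest a local name-resolution problem for Perl rather than an NCBI outage.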

chris

On Feb 25, 2007, at 11:23 PM, Sanjib Kumar Gupta wrote:

> Hi
> I have been running this script for some time and it was working
> fine. I am using a Linux machine with a live (public) IP and no proxy.
> But suddenly it has stopped working with these errors:
>
> waiting...waiting...
> -------------------- WARNING ---------------------
> MSG:  <HTML>
> <HEAD><TITLE>An Error Occurred</TITLE></HEAD>
> <BODY>
> <H1>An Error Occurred</H1>
> 500 Can't connect to www.ncbi.nlm.nih.gov:80 (Bad
> hostname 'www.ncbi.nlm.nih.gov')
> </BODY>
> </HTML>
>
> ---------------------------------------------------
> xx.pep
>
> -------------------- WARNING ---------------------
> MSG: req was POST http://www.ncbi.nlm.nih.gov/blast/Blast.cgi
> User-Agent: bioperl-Bio_Tools_Run_RemoteBlast/1.5
> Content-Length: 497
> Content-Type: application/x-www-form-urlencoded
>
> DATABASE=nr&QUERY=%3ENM_005277.3%3B_1414961_1+837%
> 0AMEENMEEGQTQKGCFECCIKCLGGIPYASLIATILLYAGVALFCGCGHEALSGTVNILQTYFEMARTA 
> GDTLDVF
> TMIDIFKYVIYGIAAAFFVYGILLMVEGFFTTGAIKDLYGDFKITTCGRCVSAWFIMLTYLFMLAWLGVT 
> AFTSLPV
> YMYFNLWTICRNTTLVEGANLCLDLRQFGIVTIGEEKKICTVSENFLRMCESTELNMTFHLFIVALAGAG 
> AAVIAMV
> HYLMVLSANWAYVKDACRMQKYEDIKSKEEQELHDIHSTRSKERLNAYT*&COMPOSITION_BASED_S 
> TATISTI
> CS=off&EXPECT=1e-
> 10&SERVICE=plain&FORMAT_OBJECT=Alignment&CMD=Put&MATRIX_NAME=BLOSUM62& 
> ENTREZ_
> QUERY=Xenopus+laevis[Organism]&PROGRAM=blastp
>
> <HTML>
> <HEAD><TITLE>An Error Occurred</TITLE></HEAD>
> <BODY>
> <H1>An Error Occurred</H1>
> 500 Can't connect to www.ncbi.nlm.nih.gov:80 (Bad
> hostname 'www.ncbi.nlm.nih.gov')
> </BODY>
> </HTML>
>
> ---------------------------------------------------
> waiting...waiting...
> -------------------- WARNING ---------------------
> MSG:  <HTML>
> <HEAD><TITLE>An Error Occurred</TITLE></HEAD>
> <BODY>
> <H1>An Error Occurred</H1>
> 500 Internal Server Error
> </BODY>
> </HTML>
>
> ---------------------------------------------------
>
> Although I can see the NCBI page in a browser, I am unable to ping or
> traceroute to the server.
>
> Please help me.


From cjfields at uiuc.edu  Mon Feb 26 10:05:50 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 26 Feb 2007 09:05:50 -0600
Subject: [Bioperl-l] AlignIO problems
In-Reply-To: <BAY109-F30DD48142FC0984AF9A284B4830@phx.gbl>
References: <BAY109-F30DD48142FC0984AF9A284B4830@phx.gbl>
Message-ID: <082E0708-6B1C-45CE-B387-429B8B6A8D7A@uiuc.edu>

Make sure to keep this on the list; others may have some input.

You should be able to test the various sequence objects you're  
retrieving from Bio::DB::Fasta via Bio::SeqIO to see if they are what  
you're expecting, then track down the problematic sequences.  My  
guess is the odd seqs are due to the way you are using Bio::DB::Fasta  
for each of the files.  I'm wondering if you are having problems with  
indices overwriting one another and are thus getting back blank seq  
objects.

You should probably consider just indexing all of your files  
together; according to the POD you can use a single Bio::DB::Fasta to  
index all of the files in one go (indicate the path and use '-glob')  
and retrieve what you need that way.  Alternatively, put the files  
into separate directories so the indices are also separate.

chris

On Feb 25, 2007, at 9:50 PM, ? ?? wrote:

> Thank you for your help!
> Maybe you are right. I use the following code to create my arrays of
> sequence objects:
>          my $outfilename=$dmel;
>          my $ana_pep_db=Bio::DB::Fasta->new("dana.translation.fasta");
>          my $ana_cdna_db=Bio::DB::Fasta->new("dana.cds.fasta");
>          my $ere_pep_db=Bio::DB::Fasta->new("dere.translation.fasta");
>          my $ere_cdna_db=Bio::DB::Fasta->new("dere.cds.fasta");
>          my $mel_pep_db=Bio::DB::Fasta->new("dmel.translation.fasta");
>          my $mel_cdna_db=Bio::DB::Fasta->new("dmel.cds.fasta");
>          my $sec_pep_db=Bio::DB::Fasta->new("dsec.translation.fasta");
>          my $sec_cdna_db=Bio::DB::Fasta->new("dsec.cds.fasta");
>          my $sim_pep_db=Bio::DB::Fasta->new("dsim.translation.fasta");
>          my $sim_cdna_db=Bio::DB::Fasta->new("dsim.cds.fasta");
>          my $yak_pep_db=Bio::DB::Fasta->new("dyak.translation.fasta");
>          my $yak_cdna_db=Bio::DB::Fasta->new("dyak.cds.fasta");
>          my $ana_pep_obj=$ana_pep_db->get_Seq_by_id($dana);
>          my $ana_nuc_obj=$ana_cdna_db->get_Seq_by_id($dana);
>          my $ere_pep_obj=$ere_pep_db->get_Seq_by_id($dere);
>          my $ere_nuc_obj=$ere_cdna_db->get_Seq_by_id($dere);
>          my $mel_pep_obj=$mel_pep_db->get_Seq_by_id($dmel);
>          my $mel_nuc_obj=$mel_cdna_db->get_Seq_by_id($dmel);
>          my $sec_pep_obj=$sec_pep_db->get_Seq_by_id($dsec);
>          my $sec_nuc_obj=$sec_cdna_db->get_Seq_by_id($dsec);
>          my $sim_pep_obj=$sim_pep_db->get_Seq_by_id($dsim);
>          my $sim_nuc_obj=$sim_cdna_db->get_Seq_by_id($dsim);
>          my $yak_pep_obj=$yak_pep_db->get_Seq_by_id($ddyak);
>          my $yak_nuc_obj=$yak_cdna_db->get_Seq_by_id($ddyak);
>          push @prots, $ana_pep_obj;
>          push @cdna, $ana_nuc_obj;
>          push @prots, $ere_pep_obj;
>          push @cdna, $ere_nuc_obj;
>          push @prots, $mel_pep_obj;
>          push @cdna, $mel_nuc_obj;
>          push @prots, $sec_pep_obj;
>          push @cdna, $sec_nuc_obj;
>          push @prots, $sim_pep_obj;
>          push @cdna, $sim_nuc_obj;
>          push @prots, $yak_pep_obj;
>          push @cdna, $yak_nuc_obj;
>
> Then I use @prots as input for my $aln=$aln_factory->align(\@prots);
> This method creates alignment files with the sequences masked.
>
> But if I use FASTA files (not objects) containing protein sequences
> as input:
> $inputfile='FBgn0000097.pep';
> @params=('outorder'=>'INPUT');
> $factory=Bio::Tools::Run::Alignment::Clustalw->new(@params);
> $aln=$factory->align($inputfile);
> #$aln->gap_char('-');
> $aln->map_chars('\.','-');
> $aln_out=Bio::AlignIO->new(-file=>">0097.aln",-format=>'clustalw');
> $aln_out->write_aln($aln);
>
> This method creates files without masking.
> I think sequence objects created directly from the sequence databases
> by "get_Seq_by_id" are not appropriate.
>
> Thank you for your suggestion again!
>
> Jiang.
>
>> From: Chris Fields <cjfields at uiuc.edu>
>> To: ????? <biology0046 at hotmail.com>
>> Subject: Re: [Bioperl-l] AlignIO problems
>> Date: Sun, 25 Feb 2007 21:26:34 -0600
>>
>> I ran the same thing using a local FASTA-formatted file on my system,
>> which works (no masking).
>>
>> Of note, the gaps were all marked as '.'.  Your gaps were both
>> '.' and '-', which may mean that something is wrong with the seq
>> objects themselves.  Maybe SeqIO is misreading them?
>>
>> chris
>>
>> On Feb 25, 2007, at 7:34 PM, ????? wrote:
>>
>>> I use the Bio::Tools::Run::Alignment::Clustalw module to carry out
>>> multiple alignments. My code is:
>>>         my @clustal_param=('outorder'=>'INPUT');
>>>         my $aln_factory=Bio::Tools::Run::Alignment::Clustalw->new(@clustal_param);
>>>         my $aln=$aln_factory->align(\@prots); ### @prots is an array of protein sequence objects
>>>         my $aln_out=Bio::AlignIO->new(-file=>">./dmel_group/clustal/${outfilename}.aln",-format=>'clustalw');
>>>
>>>         $aln_out->write_aln($aln);
>>> This code produces alignments in which identical residues are masked.
>>> But if I use ClustalW directly, the output is normal.
>>> Thank you for your help~
>>>
>>> Jiang
>>
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From michael.watson at bbsrc.ac.uk  Mon Feb 26 11:00:31 2007
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Mon, 26 Feb 2007 16:00:31 -0000
Subject: [Bioperl-l] The axis of GC content in Bio::Graphics::glyph:dna
In-Reply-To: <6dce9a0b0702151053v19ab190fh1f752fea3b2ed722@mail.gmail.com>
References: <8975119BCD0AC5419D61A9CF1A923E9503E2EB70@iahce2ksrv1.iah.bbsrc.ac.uk>
	<6dce9a0b0702151053v19ab190fh1f752fea3b2ed722@mail.gmail.com>
Message-ID: <8975119BCD0AC5419D61A9CF1A923E9503E2EBD3@iahce2ksrv1.iah.bbsrc.ac.uk>

Hi Lincoln/List
 
That's great, the axis now appears, but there are no labels.  This in
itself isn't a problem, as long as we can assume that the tick marks are
at 0, 50% and 100%.  Is that true?  If so, we can go with what we have;
otherwise I'm going to have to figure out a way to label the y-axis.
 
Thanks
Mick

________________________________

From: lincoln.stein at gmail.com [mailto:lincoln.stein at gmail.com] On Behalf
Of Lincoln Stein
Sent: 15 February 2007 18:53
To: michael watson (IAH-C)
Cc: BioPerl-List
Subject: Re: [Bioperl-l] The axis of GC content in
Bio::Graphics::glyph:dna


Hi Michael,

When you set up the panel, do this:

 Bio::Graphics::Panel->new(-blah -blah,
                           -pad_left  => 20,
                           -pad_right => 20);

This will leave enough room on the left and right for you to see the Y
axis. Otherwise it runs off the edge of the image (ok, this is a
mis-design, but it was the only way to solve a chicken-and-egg problem
about who gets to say how wide the panel is) 

Lincoln


On 2/15/07, michael watson (IAH-C) <michael.watson at bbsrc.ac.uk> wrote: 

	Hi
	
	OK, I have some great images out of this glyph, but I can't see the axis,
	and it isn't labelled (i.e. does it go from 0 - 100%?), so it isn't great
	for publication.  The docs say:
	
	"NOTE: -gc_window=>'auto' gives nice results and is recommended for
	drawing GC content. The GC content axes draw slightly outside the
	panel, so you may wish to add some extra padding on the right and
	left. "
	
	Any idea how to do this?
	
	Basically, I want a nice GC graph with the axis clearly labelled,
	and a nice "%GC" title next to it :)
	
	Thanks
	
	Mick
	

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory 
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse at cshl.edu 


From cjfields at uiuc.edu  Mon Feb 26 12:18:38 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 26 Feb 2007 11:18:38 -0600
Subject: [Bioperl-l] AlignIO problems
In-Reply-To: <BAY109-F1391C0C6FAEEA3B83565BFB4830@phx.gbl>
References: <BAY109-F1391C0C6FAEEA3B83565BFB4830@phx.gbl>
Message-ID: <7DF958E6-E233-427F-8901-3FE571CD99BD@uiuc.edu>


On Feb 26, 2007, at 9:59 AM, ? ?? wrote:

> Thank you!
> I have checked the sequences retrieved through many Bio::DB
> objects working simultaneously.
> None of the problems you mentioned occur; the sequences are not
> overwritten.

Again, keep this on the list.  I have my hands full this month so I  
will be checking the list only very sporadically; someone else may be  
able to help you.

The only explanation for the clustalw output you get is that you are  
not retrieving the correct sequence in some fundamental way,  
which to me indicates the bug originates either in the way the  
sequences are retrieved (i.e. somehow via Bio::DB::Fasta, hence my  
thought about conflicting indices) or in the way they are converted  
via Bio::SeqIO, which is used in Bio::Tools::Run::Alignment::Clustalw.

When I have used Bio::DB::Fasta in the past I have never had a  
problem when indexing multiple files and retrieving sequences, so  
short of running tests with your data I can't help you much beyond the  
above conjecture.

chris


From jason at bioperl.org  Mon Feb 26 13:45:34 2007
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 26 Feb 2007 10:45:34 -0800
Subject: [Bioperl-l] Question to Bio::Tools::Run RemoteBlast
In-Reply-To: <20070226095515.68810@gmx.net>
References: <20070226095515.68810@gmx.net>
Message-ID: <2D2DF6D9-6DAE-4BB7-B31B-8C19CCCA7301@bioperl.org>

Alex -
I am glad to see your interest in the module, but I don't  
currently have any time to maintain it, so queries should be sent  
to the BioPerl mailing list.  In general we prefer that you don't contact  
developers directly, but use the mailing list so that others can  
learn from the questions.

Please note there are several tutorials and plenty of documentation on the  
website; you will get a better response from people if you can show  
you have at least tried to use the existing example code to construct  
your program.

-jason
On Feb 26, 2007, at 1:55 AM, Alexander Auner wrote:

> Dear Jason Stajich,
> I hope you can help me.
>
> I am inspired by your module and would like to work with it.
> I am a student at TFH Wildau.
> I have trouble understanding the module.
>
> Could you send me an example?
>
> The example should process a text file (FASTA) with NCBI BLAST (web).
>
> Parameter:
> Choose database -> Others -> nr
> Limit by entrez query -> Campylobacter -> or select from: ->  
> Bacteria [ORGN]
> Expect -> 10
> Other advanced -> -q-1
>
> output format
> plain text without Graphical Overview
> Number of: -> Descriptions -> 10000
> Alignment view -> query-anchored with identities
>
> All other parameters remain undef.
>
> Thank you for your help.
>
> Yours faithfully, Alexander Auner


From jason at bioperl.org  Mon Feb 26 14:13:00 2007
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 26 Feb 2007 11:13:00 -0800
Subject: [Bioperl-l] BioPerl leadership additions
Message-ID: <41922044-0BDA-4F29-B045-5E544A4F679F@bioperl.org>

Dear BioPerl Users and Developers,

I want to announce an addition to the leadership of BioPerl.   
Christopher Fields and Sendu Bala are now members of the BioPerl  
Core developer group to recognize their ongoing leadership in the  
project.  Chris and Sendu were instrumental in the 1.5.2 Developer  
release and have made a significant commitment and contribution to  
the quality of the code and the documentation of the project.  We  
have invited them to be part of the core to recognize their work and  
to feel comfortable to ask them to do more. ;-)

The Core group was established to ensure that someone was responsible  
for making code releases, vetting new developers for CVS write  
accounts, and generally dealing with things that might otherwise slip  
through the cracks.  We are very excited to have more people  
contributing to and maintaining the toolkit.  We look forward to  
their help along with all the other developers, as we work towards a  
1.6 release this year.

As always, while there is a need for some individuals to lead the  
project, we encourage contributions from all levels of expertise to  
improve the code, documentation, and tutorials of the project.

We plan to discuss the progress of the toolkit at this year's  
Bioinformatics Open Source Conference held in Vienna, Austria in  
conjunction with the SIG meetings at ISMB.   We are trying to use  
BOSC 2007 as a chance for the developers of Open Bioinformatics  
Foundation sponsored and related projects to coordinate future  
development and release cycles.

Jason Stajich on behalf of the Core developers


From khan at cshl.edu  Mon Feb 26 15:29:19 2007
From: khan at cshl.edu (Khan, Sohail)
Date: Mon, 26 Feb 2007 15:29:19 -0500
Subject: [Bioperl-l] parsing a list of ids to a fasta file.
Message-ID: <C8696843AE995F4EA4CDC3E2B83482A9018791CA@mailbox02.cshl.edu>

Thanks Michael.  I have the scripts installed.  I can pass an ID to the indexed FASTA file and retrieve the sequence.  However, is there a way to pass a list of IDs from a file and get the sequences for all of them?
Thanks.

-Sohail

-----Original Message-----
From: michael watson (IAH-C) [mailto:michael.watson at bbsrc.ac.uk]
Sent: Tuesday, February 20, 2007 4:33 PM
To: Khan, Sohail; Bioperl-l at lists.open-bio.org
Subject: RE: [Bioperl-l] parsing a list of ids to a fasta file.


Suggest you use Bio::Index::Fasta to create an index for the fasta file and then a simple script to retrieve sequences using that index.  Or just use the pre-written bp_index.PLS and bp_fetch.PLS scripts.
 
http://www.bioperl.org/wiki/Module:Bio::Index::Fasta
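
For a list of IDs, a dependency-free alternative to an index is to load the IDs into a hash and stream the FASTA file once. A minimal sketch, assuming one ID per line in the ID file and that each ID matches the first whitespace-delimited word after '>' in a header; the subroutine names read_ids and fetch_by_ids and the script name fetch_ids.pl are hypothetical, not part of BioPerl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read one ID per line, ignoring blank lines and surrounding whitespace.
sub read_ids {
    my ($fh) = @_;
    my %ids;
    while (my $id = <$fh>) {
        $id =~ s/^\s+|\s+$//g;
        $ids{$id} = 1 if length $id;
    }
    return \%ids;
}

# Stream a FASTA file and return the records whose ID (the first word
# after '>') is a key in %$wanted, as one FASTA-formatted string.
sub fetch_by_ids {
    my ($fasta_fh, $wanted) = @_;
    my $keep = 0;
    my @out;
    while (my $line = <$fasta_fh>) {
        # On a header line, decide whether to keep the following record.
        $keep = exists $wanted->{$1} if $line =~ /^>(\S+)/;
        push @out, $line if $keep;
    }
    return join '', @out;
}

# Usage: fetch_ids.pl ids.txt seqs.fa > subset.fa
if (@ARGV == 2) {
    my ($id_file, $fasta_file) = @ARGV;
    open my $id_fh,    '<', $id_file    or die "Cannot open $id_file: $!";
    open my $fasta_fh, '<', $fasta_file or die "Cannot open $fasta_file: $!";
    print fetch_by_ids($fasta_fh, read_ids($id_fh));
}
```

This reads the whole FASTA file on every run; if the same large file is queried repeatedly, the Bio::Index::Fasta approach above avoids that cost.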

________________________________

From: bioperl-l-bounces at lists.open-bio.org on behalf of Khan, Sohail
Sent: Tue 20/02/2007 8:42 PM
To: Bioperl-l at lists.open-bio.org
Subject: [Bioperl-l] parsing a list of ids to a fasta file.


Dear list,

I am new to Bio-Perl.  I have the following question:
I have a list of IDs which I would like to look up in a large FASTA file, retrieving the sequence for each of those IDs.  I would appreciate any suggestions.
Thanks.

Khan


_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l


From arareko at campus.iztacala.unam.mx  Mon Feb 26 16:44:49 2007
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Mon, 26 Feb 2007 15:44:49 -0600
Subject: [Bioperl-l] BioPerl leadership additions
In-Reply-To: <41922044-0BDA-4F29-B045-5E544A4F679F@bioperl.org>
References: <41922044-0BDA-4F29-B045-5E544A4F679F@bioperl.org>
Message-ID: <45E354D1.4070600@campus.iztacala.unam.mx>

Congrats Chris & Sendu! Very well-deserved. Keep up the great work.

Cheers!
Mauricio.

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM


From lubapardo at gmail.com  Tue Feb 27 08:26:30 2007
From: lubapardo at gmail.com (Luba Pardo)
Date: Tue, 27 Feb 2007 14:26:30 +0100
Subject: [Bioperl-l] parsing blast results
Message-ID: <58ff33550702270526q43ab584ci7baf6a09e2584e38@mail.gmail.com>

Hi,
I am using the module Bio::SearchIO to parse some BLAST results. I would
like to store the IDs of the results in an array, but I am not sure whether
this is possible with an existing subroutine. Does anyone know whether
Bio::SearchIO includes a method to do so?
Thanks in advance,
L.Pardo


From cjfields at uiuc.edu  Tue Feb 27 09:11:37 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 27 Feb 2007 08:11:37 -0600
Subject: [Bioperl-l] parsing blast results
In-Reply-To: <58ff33550702270526q43ab584ci7baf6a09e2584e38@mail.gmail.com>
References: <58ff33550702270526q43ab584ci7baf6a09e2584e38@mail.gmail.com>
Message-ID: <E1B6ED22-1120-4333-AA73-19B57D102EA9@uiuc.edu>


On Feb 27, 2007, at 7:26 AM, Luba Pardo wrote:

> Hi,
> I am using the module Bio::SearchIO to parse some blast results. I would
> like to store the ids of the results into an array but I am not sure if this
> is possible to do it with an existing subroutine. Does anyone have an idea
> whether there is a method included within the module Bio::SearchIO to do so?
> Thanks in advance,
> L.Pardo

Bio::SearchIO doesn't currently have a method to retrieve all the  
accessions in a BLAST result.  The best way to do this is to iterate  
through the objects:

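use Bio::SearchIO;

my $searchio = Bio::SearchIO->new(-format => 'blast',
                                  -file   => 'report.bls');  # your BLAST report
my @accs;

while (my $result = $searchio->next_result) {
     while (my $hit = $result->next_hit) {
         push @accs, $hit->accession;
         # do whatever else here...
     }
}

print join(',', @accs), "\n";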

I don't think all accessions in the description are parsed out at the  
moment, just the first one (or the one in the hit table).  If you  
want all of them or if you want the NCBI GI you'll need to parse them  
out of the description heading ($hit->description).
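
As a minimal sketch of that last step, assuming NCBI-style pipe-delimited identifiers such as gi|46100068|gb|EAK85301.1| appear in the text you search (for NCBI BLAST output this is typically the hit name and/or description), a regular expression can collect every GI number. The helper name extract_gis is hypothetical.

```perl
use strict;
use warnings;

# Return every GI number found in a string of NCBI-style identifiers,
# e.g. "gi|46100068|gb|EAK85301.1| ... >gi|50292953|ref|XP_448909.1|".
sub extract_gis {
    my ($text) = @_;
    my @gis = $text =~ /\bgi\|(\d+)/g;
    return @gis;
}

# Inside the hit loop one might then write (assumption: the IDs are in
# the hit name or description):
#   push @gis, extract_gis($hit->name . ' ' . ($hit->description || ''));
```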

chris


From sac at bioperl.org  Tue Feb 27 12:59:22 2007
From: sac at bioperl.org (Steve Chervitz)
Date: Tue, 27 Feb 2007 09:59:22 -0800
Subject: [Bioperl-l] [Bioperl-announce-l] BioPerl leadership additions
In-Reply-To: <41922044-0BDA-4F29-B045-5E544A4F679F@bioperl.org>
References: <41922044-0BDA-4F29-B045-5E544A4F679F@bioperl.org>
Message-ID: <8f200b4c0702270959n7521f693l915bfabe7ccb7ef7@mail.gmail.com>

Welcome to the club, Chris & Sendu. Always good to have an infusion of new
blood and capable, motivated hands.

Steve



From cjfields at uiuc.edu  Tue Feb 27 15:57:40 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 27 Feb 2007 14:57:40 -0600
Subject: [Bioperl-l] Bio::SeqIO::FTHelper
Message-ID: <D6922F04-A349-41C4-B4DC-6763E3195B05@uiuc.edu>

Could anyone tell me what FTHelper is used for?  From what I gather  
it rolls up seqfeature data into a lightweight object but then  
creates a SeqFeature::Generic anyway (at least for GenBank/EMBL/Swiss),
which seems to be a waste of memory and time.  Is there
something I'm missing (besides my sanity of course)?

chris


From Jay at jays.net  Wed Feb 28 04:39:55 2007
From: Jay at jays.net (Jay Hannah)
Date: Wed, 28 Feb 2007 03:39:55 -0600
Subject: [Bioperl-l] "Command-Line Bioinformatics"
Message-ID: <F7C1E903-1712-40A5-B817-8CDAADECEBF4@jays.net>

Reading this article:
http://www.linuxjournal.com/article/6977
Sequencing the SARS Virus - Linux Journal, Nov 2003

This guy needs Perl and/or BioPerl.  :)

> The sequence file is in FASTA format consisting of a header line  
> and the sequence, split into fixed-width lines. The following  
> counts the number of Gs and Cs in the sequence and presents the  
> total as a fraction of the total number of bases:
>
> > grep -v "^>" AY274119.fa | fold -w 1 |
> tr "ATGC" "..xx" | sort | uniq -c |
> sed 's/[^0-9]//g' | tr -s "\012" " " |
> sed 's/\([0-9]*\) \([0-9]*\)/scale = 3; \2 \/ (\1+\2)/' |
> bc -i
> scale = 3; 12127 / (17624+12127)
> .407
>
> Out of the 29,751 bases in our sequence, 12,127 are either G or C,  
> giving a GC content of 41%.

BioPerl version:

use Bio::SeqIO;
my $io = Bio::SeqIO->new(
   -file   => 'AY274119.fa',
   -format => 'Fasta'
);
my $seq = $io->next_seq->seq;
print ( ($seq =~ tr/GC/GC/) / length ($seq) );

Command-line Perl:

perl -e '$/ = undef; $_ = <>; s/>.*//; s/\n//g; print tr/GC/GC/ / length($_)' AY274119.fa

I'm sure you can Perl Golf my stabs at it.  :)
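
A longer but more defensive variant, sketched below as a reusable subroutine (the name gc_fraction is hypothetical), skips every header line and counts lower-case bases too. For a multi-record file it pools all records, and like the one-liners above it uses the total length of the sequence lines as the denominator.

```perl
use strict;
use warnings;

# GC fraction over all sequence lines of a FASTA stream; header lines
# are skipped and both upper- and lower-case G/C are counted.
sub gc_fraction {
    my ($fh) = @_;
    my ($gc, $total) = (0, 0);
    while (my $line = <$fh>) {
        next if $line =~ /^>/;
        $line =~ s/\s+//g;            # drop newline and any whitespace
        $gc    += ($line =~ tr/GCgc//);  # tr/// returns the match count
        $total += length $line;
    }
    return $total ? $gc / $total : 0;
}

# Usage:
#   open my $fh, '<', 'AY274119.fa' or die $!;
#   printf "%.3f\n", gc_fraction($fh);
```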

j
seqlab.net
http://www.bioperl.org/wiki/User:Jhannah


From n.saunders at uq.edu.au  Wed Feb 28 05:25:08 2007
From: n.saunders at uq.edu.au (Neil Saunders)
Date: Wed, 28 Feb 2007 20:25:08 +1000
Subject: [Bioperl-l] Bio::Factory::EMBOSS, CGI and taint
Message-ID: <45E55884.9010908@uq.edu.au>

Dear Bioperlers,

I'm trying to understand an error that occurs when Bio::Factory::EMBOSS is used 
in a CGI script.  Using BioPerl 1.5.2 on Ubuntu Dapper, Apache 2.0.55, Perl 5.8.7.

If I load this test CGI script (cgi.pl) in a browser:

BEGIN CODE
----------
#!/usr/bin/perl -Tw
use strict;
use CGI;
use Bio::Factory::EMBOSS;

my $cgi = new CGI;
my $f   = new Bio::Factory::EMBOSS;

print $cgi->header,
       $cgi->start_html,
       $cgi->end_html;
--------
END CODE

I get a 500 server error and the Apache error log reads:
[error] [client 192.168.0.3] Premature end of script headers: cgi.pl

I can fix this in 2 ways:

(1) Move the "my $f = new Bio::Factory::EMBOSS" line to the end of the script, 
which isn't a very useful fix.
(2) Remove the -T switch from the shebang line

There seem to be a few old posts on the list regarding "taint-safe" modules.  It 
seems that the new Bio::Factory::EMBOSS object is interfering with the headers 
in some way, but I'm no CGI.pm guru and wondered if anyone could shed light on this.

thanks,
Neil
-- 
  School of Molecular and Microbial Sciences
  University of Queensland
  Brisbane 4072 Australia

http://nsaunders.wordpress.com


From n.saunders at uq.edu.au  Wed Feb 28 05:30:31 2007
From: n.saunders at uq.edu.au (Neil Saunders)
Date: Wed, 28 Feb 2007 20:30:31 +1000
Subject: [Bioperl-l] more on Bio::Factory::EMBOSS, CGI and taint
Message-ID: <45E559C7.1090308@uq.edu.au>

Further to my previous email, adding:

BEGIN {
     $|=1;
     print "Content-type: text/html\n\n";
     use CGI::Carp('fatalsToBrowser');
}

to my CGI script generates:

Insecure $ENV{PATH} while running with -T switch at 
/usr/local/share/perl/5.8.7/Bio/Factory/EMBOSS.pm line 251.


Neil
-- 
  School of Molecular and Microbial Sciences
  University of Queensland
  Brisbane 4072 Australia

http://nsaunders.wordpress.com


From n.saunders at uq.edu.au  Wed Feb 28 05:50:58 2007
From: n.saunders at uq.edu.au (Neil Saunders)
Date: Wed, 28 Feb 2007 20:50:58 +1000
Subject: [Bioperl-l] CGI taint solved
Message-ID: <45E55E92.10608@uq.edu.au>

Apologies for running a one-man thread, but I realised that I've now answered my 
own question regarding errors with CGI, Bio::Factory::EMBOSS and taint.

Given that the EMBOSS binaries are in /usr/local/bin, adding:

$ENV{'PATH'} = '/usr/local/bin';

near the top of the script does the trick.


Neil
-- 
  School of Molecular and Microbial Sciences
  University of Queensland
  Brisbane 4072 Australia

http://nsaunders.wordpress.com


From cjfields at uiuc.edu  Wed Feb 28 08:39:24 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 28 Feb 2007 07:39:24 -0600
Subject: [Bioperl-l] CGI taint solved
In-Reply-To: <45E55E92.10608@uq.edu.au>
References: <45E55E92.10608@uq.edu.au>
Message-ID: <E4959877-011C-4F2C-ABA2-20D3876F6B4C@uiuc.edu>

That could possibly clobber any other program calls from within the  
same script (unless they reside in /usr/local/bin) since you're  
explicitly assigning PATH, not appending:

$ENV{"PATH"} = '/usr/local/bin';

gets me (printing $ENV{"PATH"}):

/usr/local/bin

whereas this:

$ENV{"PATH"} = '/usr/local/bin:' . $ENV{"PATH"};

gets me:

/usr/local/bin:/Users/cjfields/bin:/Users/cjfields/dart/bin:/usr/ 
local/mysql/bin:/usr/local/sbin:/bin:/sbin:/usr/bin:/usr/sbin

There's probably a File::* module that does this safely per OS flavor.
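
One caveat, per the perlsec documentation: under -T, any value computed from tainted data is itself tainted, and the inherited $ENV{PATH} is tainted, so '/usr/local/bin:' . $ENV{PATH} would still trigger the "Insecure $ENV{PATH}" error when a command is run. perlsec's recommended pattern is to build PATH only from constant strings and to delete a few shell-related variables. A minimal sketch, assuming a Unix-style ':' separator and that EMBOSS lives in /usr/local/bin; the directory list and the set_safe_env name are illustrative only.

```perl
use strict;
use warnings;

# Illustrative whitelist: EMBOSS in /usr/local/bin plus standard system dirs.
my @TRUSTED_DIRS = ('/usr/local/bin', '/usr/bin', '/bin');

# Replace PATH with a value built only from constant strings, which is
# therefore untainted even under perl -T, and clear the shell-related
# variables that perlsec recommends deleting.
sub set_safe_env {
    my @dirs = @_;
    $ENV{PATH} = join ':', @dirs;
    delete @ENV{qw(IFS CDPATH ENV BASH_ENV)};
    return $ENV{PATH};
}

set_safe_env(@TRUSTED_DIRS);
```

Unlike prepending, this drops every other directory from the inherited PATH, so each program the script runs must live in one of the listed directories or be called by absolute path.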

chris

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From stefan.kirov at bms.com  Wed Feb 28 10:35:31 2007
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Wed, 28 Feb 2007 10:35:31 -0500
Subject: [Bioperl-l] CGI taint solved
In-Reply-To: <E4959877-011C-4F2C-ABA2-20D3876F6B4C@uiuc.edu>
References: <45E55E92.10608@uq.edu.au>
	<E4959877-011C-4F2C-ABA2-20D3876F6B4C@uiuc.edu>
Message-ID: <45E5A143.3080303@bms.com>

Neil, I believe this is your situation:
http://wn.cyberwerks.com/2000/0411.html
My advice: any command executed from within a CGI script should have its
path hardcoded whenever possible.
If those commands require a different PATH, try writing a wrapper shell
script that sets the environment (the changes are discarded once the
shell script terminates). It also depends on the type of environment
you have: if it is not secure, you may wish to think hard about how
to eliminate all security loopholes with CGI. I am definitely not an
expert on this.
Stefan



From lubapardo at gmail.com  Wed Feb 28 12:21:07 2007
From: lubapardo at gmail.com (Luba Pardo)
Date: Wed, 28 Feb 2007 18:21:07 +0100
Subject: [Bioperl-l] retrieven ids
Message-ID: <58ff33550702280921s25f749aagd6e62b1a5cc34edb@mail.gmail.com>

Hi everyone,
I wonder if someone could give me advice on the following:
I want to retrieve the DNA coding sequence of a RefSeq protein ID. I do not
want to back-translate the protein to DNA, but rather get the ID of the DNA
coding sequence and then retrieve the DNA sequence from GenBank. Is there a
module that retrieves all possible IDs for a sequence, given a protein GI?

Thank you very much in advance,
L. Pardo


From johnston at biochem.ucl.ac.uk  Wed Feb 28 12:05:49 2007
From: johnston at biochem.ucl.ac.uk (Caroline Johnston)
Date: Wed, 28 Feb 2007 17:05:49 +0000 (GMT)
Subject: [Bioperl-l] _rearrange
Message-ID: <Pine.LNX.4.58.0702281624380.8207@localhost.localdomain>

hi,

Is there a discussion of the rationale behind the _rearrange method
somewhere? I'm probably just being gormless, but I think I'm missing the
point a bit.

Is it okay for a method just to expect named params like
->foo(arg1=>'stuff', arg2=>'things'); ?

Cxx


From ckuanglim at yahoo.com  Wed Feb 28 10:51:50 2007
From: ckuanglim at yahoo.com (Chan Kuang Lim)
Date: Wed, 28 Feb 2007 07:51:50 -0800 (PST)
Subject: [Bioperl-l] Problem of Installing Bioperl
Message-ID: <459942.77644.qm@web60518.mail.yahoo.com>

I have a problem installing BioPerl on Windows using the command-line installation.
In the cmd window, after running
ppm-shell
search bioperl
install 2

many files were downloaded, but then the next line is:
Unpacking bioperl-1.5.2_100...ppm install failed: Can't extract files from C:.............../Bioperl-1.5.2_100.tgz


Hope you can answer my question. Thank you.

Regards,
Chan Kuang Lim
Malaysia

 


From cjfields at uiuc.edu  Wed Feb 28 13:30:45 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 28 Feb 2007 12:30:45 -0600
Subject: [Bioperl-l] _rearrange
In-Reply-To: <Pine.LNX.4.58.0702281624380.8207@localhost.localdomain>
References: <Pine.LNX.4.58.0702281624380.8207@localhost.localdomain>
Message-ID: <25C736A2-2DCA-413B-8F92-D799F583515B@uiuc.edu>

From what I gather, it's a convenient utility method that is used for
consistent and enforced parameter checking/setting for any method,
including the constructor.

There are a few modules that don't use _rearrange (Bio::WebAgent::new()
comes to mind).  It's not required that you use it, but the naming
conventions for parameters outlined in _rearrange (in  
Bio::Root::RootI POD) are generally enforced for consistency across  
classes.

As a note, Sendu has committed a related method (_set_from_args) to  
CVS which works rather well, but I don't think it is in the last  
release.
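
Inside BioPerl the usual call looks like my ($seq, $id) = $self->_rearrange([qw(SEQ ID)], @args); The standalone sketch below is a simplified, hypothetical illustration of the idea only (strip the leading dash, ignore case, return values in the requested order). It is not BioPerl's implementation, whose exact rules are documented in Bio::Root::RootI, and the name rearrange_args is made up.

```perl
use strict;
use warnings;

# Simplified stand-in for the idea behind _rearrange: accept
# (-Name => value, ...) pairs in any order and any case, and return the
# values in the order given by @$order (undef for names not supplied).
sub rearrange_args {
    my ($order, @args) = @_;
    my %param;
    while (my ($key, $value) = splice @args, 0, 2) {
        (my $name = uc $key) =~ s/^-//;   # "-Seq" -> "SEQ"
        $param{$name} = $value;
    }
    return @param{@$order};
}

# Usage:
#   my ($seq, $id) = rearrange_args([qw(SEQ ID)], -id => 'x1', -Seq => 'ACGT');
```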

chris



From dmessina at wustl.edu  Wed Feb 28 14:31:29 2007
From: dmessina at wustl.edu (Dave Messina)
Date: Wed, 28 Feb 2007 13:31:29 -0600 (CST)
Subject: [Bioperl-l] retrieven ids
In-Reply-To: <58ff33550702280921s25f749aagd6e62b1a5cc34edb@mail.gmail.com>
References: <58ff33550702280921s25f749aagd6e62b1a5cc34edb@mail.gmail.com>
Message-ID: <57419.10.0.7.57.1172691089.squirrel@gscmail.wustl.edu>

Whenever I'm unsure of how to do something, I first look to see if one of
the HOWTOs on bioperl.org covers it. In this case, the Features HOWTO has
example code which I think will do what you want.

GenBank records typically have the coding sequence of a protein as a
feature, so I would do something like:

- use the RefSeq protein IDs to query Entrez and get back the GenBank
records.

- read the Features HOWTO to refresh my memory on the syntax for grabbing
features.

That HOWTO is at:
http://www.bioperl.org/wiki/HOWTO:Feature-Annotation

- whip up a little script to loop through the GenBank records one at a
time with SeqIO and pull out the coding sequence (CDS) features.


Dave


From bix at sendu.me.uk  Wed Feb 28 14:38:46 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 28 Feb 2007 19:38:46 +0000
Subject: [Bioperl-l] _rearrange
In-Reply-To: <Pine.LNX.4.58.0702281624380.8207@localhost.localdomain>
References: <Pine.LNX.4.58.0702281624380.8207@localhost.localdomain>
Message-ID: <45E5DA46.3020503@sendu.me.uk>

Caroline Johnston wrote:
> hi,
> 
> Is there a discussion of the rationale behind the _rearrange method
> somewhere? I'm probably just being gormless, but I think I'm missing the
> point a bit.
> 
> Is it okay for a method just to expect named params like
> ->foo(arg1=>'stuff', arg2=>'things'); ?

The BioPerl convention for named arguments is a leading dash (-arg1), and
_rearrange accepts the names in any case. So make use of _rearrange; it
won't do you any harm.


From johnsonm at gmail.com  Wed Feb 28 14:59:09 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Wed, 28 Feb 2007 13:59:09 -0600
Subject: [Bioperl-l] Issues with new Bio::Tools::Run modules for Genemark
	and Glimmer
Message-ID: <ebf5eb170702281159y7dcedd88n7bdef2c8bb9d3288@mail.gmail.com>

    I happen to need something like Bio::Tools::Run::Genemark, so I'm coding
one up.  When I started on the tests for it, I realized I have a problem.  I
can distribute a fasta file downloaded from GenBank to use as input, but I
can't distribute the model file needed to actually run Genemark (
Genemark.hmm for prokaryotes, gmhmmp, in my case).
    It took *forever* to get a license, and I'm not thrilled with the
prospect of talking them out of a redistributable model file.  I'd love to
distribute the test, but I don't see how I'm going to be able to.
Suggestions?
    Also, I've settled on IPC::Run instead of system().  The docs indicate
the bits of it I'm using should be OK on Windows, except maybe for Win9X.
I don't want to clutter up the console, I don't like embedding stdout/stderr
redirection in command strings, and I don't want to have to worry about
signal handling (What if the child catches a ctrl-c halfway through
parsing?  What if the parent does?).  Anybody object to that?
   One final thing.  I'm lazy, I don't want to deal with parsing arguments
to the constructor, so I'm just calling _rearrange() to deal with it.  The
Bio::Tools:: parsers all take dash options, but it looks like a bunch of the
stuff in Bio::Tools::Run:: takes dashless args.  Objections?


From dmessina at wustl.edu  Wed Feb 28 15:14:56 2007
From: dmessina at wustl.edu (Dave Messina)
Date: Wed, 28 Feb 2007 14:14:56 -0600 (CST)
Subject: [Bioperl-l] Issues with new Bio::Tools::Run modules for
 Genemark and Glimmer
In-Reply-To: <ebf5eb170702281159y7dcedd88n7bdef2c8bb9d3288@mail.gmail.com>
References: <ebf5eb170702281159y7dcedd88n7bdef2c8bb9d3288@mail.gmail.com>
Message-ID: <57538.10.0.7.57.1172693696.squirrel@gscmail.wustl.edu>

> I'm not thrilled with the prospect of talking them out of a
> redistributable model file.

I suppose it's not possible to fake your own, or at least the parts of it
you're testing for?

If not, I'd put the tests in a skip block while waiting to hear from the
Genemark folks.


> The Bio::Tools:: parsers all take dash options, but it looks like a
> bunch of the stuff in Bio::Tools::Run:: takes dashless args.  Objections?

Sendu will chime in, I'm sure, but I think he was planning to switch
everything in Bio::Tools::Run over to dashed args anyway...


Dave


From bix at sendu.me.uk  Wed Feb 28 15:52:23 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 28 Feb 2007 20:52:23 +0000
Subject: [Bioperl-l] Issues with new Bio::Tools::Run modules for
 Genemark and Glimmer
In-Reply-To: <ebf5eb170702281159y7dcedd88n7bdef2c8bb9d3288@mail.gmail.com>
References: <ebf5eb170702281159y7dcedd88n7bdef2c8bb9d3288@mail.gmail.com>
Message-ID: <45E5EB87.9020106@sendu.me.uk>

Mark Johnson wrote:
>    One final thing.  I'm lazy, I don't want to deal with parsing arguments
> to the constructor, so I'm just calling _rearrange() to deal with it.  The
> Bio::Tools:: parsers all take dash options, but it looks like a bunch of the
> stuff in Bio::Tools::Run:: takes dashless args.  Objections?

You can make use of _set_from_args(). See Bio::Tools::Run::Phylo::Gumby 
for an example.


From bix at sendu.me.uk  Wed Feb 28 16:29:32 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 28 Feb 2007 21:29:32 +0000
Subject: [Bioperl-l] SVG output with Bio::Graphics::Panel fails
Message-ID: <45E5F43C.9080902@sendu.me.uk>

I have GD 2.35 and GD::SVG 2.33 installed.

I have a working script in which a Bio::Graphics::Panel object is made 
and output with:

print $panel->png;

This is fine. Changing it to:

print $panel->svg;

Gives the error:

Can't locate object method "svg" via package "GD:Image" at 
/.../Bio/Graphics/Panel.pm line 971, <DATA> line 192.


Am I supposed to do something else to get this to work?


Cheers,
Sendu.


From crabtree at tigr.ORG  Wed Feb 28 16:40:52 2007
From: crabtree at tigr.ORG (Jonathan Crabtree)
Date: Wed, 28 Feb 2007 16:40:52 -0500
Subject: [Bioperl-l] SVG output with Bio::Graphics::Panel fails
In-Reply-To: <45E5F43C.9080902@sendu.me.uk>
References: <45E5F43C.9080902@sendu.me.uk>
Message-ID: <45E5F6E4.80003@tigr.org>


Sendu-

I believe you must set 'image_class' to 'GD::SVG' when you create the 
Panel (and note that older versions of Bio::Graphics::Panel don't know 
anything about this parameter.)  Here's the relevant part of the Panel 
perldoc:

   -image_class To create output in scalable vector
                graphics (SVG), optionally pass the image
                class parameter 'GD::SVG'. Defaults to
                using vanilla GD. See the corresponding
                image_class() method below for details.

Jonathan




From bix at sendu.me.uk  Wed Feb 28 17:01:17 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 28 Feb 2007 22:01:17 +0000
Subject: [Bioperl-l] SVG output with Bio::Graphics::Panel fails
In-Reply-To: <45E5F6E4.80003@tigr.org>
References: <45E5F43C.9080902@sendu.me.uk> <45E5F6E4.80003@tigr.org>
Message-ID: <45E5FBAD.3030404@sendu.me.uk>

Jonathan Crabtree wrote:
> 
> Sendu-
> 
> I believe you must set 'image_class' to 'GD::SVG' when you create the 
> Panel (and note that older versions of Bio::Graphics::Panel don't know 
> anything about this parameter.)  Here's the relevant part of the Panel 
> perldoc:

... Oh! I had no idea there was any perldoc for these modules, hiding 
down there at the bottom. Does anyone want to intersperse the docs?...


From cjfields at uiuc.edu  Wed Feb 28 17:10:54 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 28 Feb 2007 16:10:54 -0600
Subject: [Bioperl-l] Issues with new Bio::Tools::Run modules for
	Genemark and Glimmer
In-Reply-To: <ebf5eb170702281159y7dcedd88n7bdef2c8bb9d3288@mail.gmail.com>
References: <ebf5eb170702281159y7dcedd88n7bdef2c8bb9d3288@mail.gmail.com>
Message-ID: <894754FB-2B20-415F-A6FC-7ACE839C6540@uiuc.edu>

On Feb 28, 2007, at 1:59 PM, Mark Johnson wrote:

>     I happen to need something like Bio::Tools::Run::Genemark, so I'm coding
> one up.  When I started on the tests for it, I realized I have a problem.  I
> can distribute a fasta file downloaded from GenBank to use as input, but I
> can't distribute the model file needed to actually run Genemark (
> Genemark.hmm for prokaryotes, gmhmmp, in my case).
>     It took *forever* to get a license, and I'm not thrilled with the
> prospect of talking them out of a redistributable model file.  I'd love to
> distribute the test, but I don't see how I'm going to be able to.
> Suggestions?

For bioperl-run tests you have to have the program installed for  
tests to work (otherwise they are passed over).  Therefore one would  
assume if you had the GeneMark program you would have the models as  
well.

You could set up your module to require an env. variable be set (like  
the HMMER module, for instance) which contains the executables and/or  
the models, so that if it isn't set the tests are skipped.
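
A minimal sketch of that pattern for a .t file, using Test::More's SKIP block. The GENEMARK_MODELS variable name and the genemark_models_dir helper are hypothetical, not an existing bioperl-run convention.

```perl
use strict;
use warnings;
use Test::More tests => 1;

# Return the directory named by the (hypothetical) GENEMARK_MODELS
# environment variable, or undef if it is unset or not a directory.
sub genemark_models_dir {
    my $dir = $ENV{GENEMARK_MODELS};
    return (defined $dir && -d $dir) ? $dir : undef;
}

SKIP: {
    my $dir = genemark_models_dir()
        or skip 'GENEMARK_MODELS not set; skipping GeneMark tests', 1;

    # Real tests would run the wrapper against a model file in $dir here.
    ok(-r $dir, "model directory $dir is readable");
}
```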

>     Also, I've settled on IPC::Run instead of system().  The docs indicate
> the bits of it I'm using should be OK on Windows, except maybe for Win9X.
> I don't want to clutter up the console, I don't like embedding stdout/stderr
> redirection in command strings, and I don't want to have to worry about
> signal handling (What if the child catches a ctrl-c halfway through
> parsing?  What if the parent does?).  Anybody object to that?

I wouldn't worry too much about Win9x.  Is IPC::Run in perl core?   
Otherwise we'll need to add it to the optional dependencies for  
bioperl-run.

>    One final thing.  I'm lazy, I don't want to deal with parsing arguments
> to the constructor, so I'm just calling _rearrange() to deal with it.  The
> Bio::Tools:: parsers all take dash options, but it looks like a bunch of the
> stuff in Bio::Tools::Run:: takes dashless args.  Objections?

Sendu's suggestion (_set_from_args() ) is the best.  As mentioned in  
another thread _rearrange() works as well.

chris


From johnsonm at gmail.com  Wed Feb 28 17:29:36 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Wed, 28 Feb 2007 16:29:36 -0600
Subject: [Bioperl-l] Issues with new Bio::Tools::Run modules for
	Genemark and Glimmer
In-Reply-To: <57538.10.0.7.57.1172693696.squirrel@gscmail.wustl.edu>
References: <ebf5eb170702281159y7dcedd88n7bdef2c8bb9d3288@mail.gmail.com>
	<57538.10.0.7.57.1172693696.squirrel@gscmail.wustl.edu>
Message-ID: <ebf5eb170702281429u51e8f7fgb9c0591a410500f8@mail.gmail.com>

On 2/28/07, Dave Messina <dmessina at wustl.edu> wrote:
>
> > I'm not thrilled with the prospect of talking them out of a
> redistributable model file.
>
> I suppose it's not possible to fake your own, or at least the parts of it
> you're testing for?


We got a gzipped tarball with some model files and a precompiled executable
(gmhmmp).  As far as building a model file goes, I don't even have two
sticks to rub together.  I suppose it's possible that it's not actually some
weird proprietary format; I'll go dig for some docs...but I don't hold out a
lot of hope.


From sukhinder.sandhu at osumc.edu  Wed Feb 28 16:49:31 2007
From: sukhinder.sandhu at osumc.edu (Sukhinder Sandhu)
Date: Wed, 28 Feb 2007 16:49:31 -0500
Subject: [Bioperl-l] Problem installing bioperl: plz reply soon. thx
Message-ID: <C20B631B.1E0%sukhinder.sandhu@osumc.edu>

Hi
I am having trouble installing Bundle::BioPerl through CPAN. I don't know if
this has something to do with my having root privileges. Can you please
suggest how I may proceed to get past this? I would really appreciate any
help. I am pasting part of the error it keeps giving after trying to install
every module.
######################
CPAN.pm: Going to build G/GA/GAAS/HTML-Parser-3.56.tar.gz

make: *** No rule to make target
`/System/Library/Perl/5.8.6/darwin-thread-multi-2level/CORE/config.h',
needed by `Makefile'.  Stop.
  /usr/bin/make  -- NOT OK
Running make test
  Can't test without successful make
Running make install
  make had returned bad status, install seems impossible

###############################
Thanks

sukhinder


From sukhinder.sandhu at osumc.edu  Tue Feb 27 23:41:43 2007
From: sukhinder.sandhu at osumc.edu (Sukhinder Sandhu)
Date: Tue, 27 Feb 2007 23:41:43 -0500
Subject: [Bioperl-l] Problem installing bioperl-1.5.2_102
Message-ID: <C20A7237.1DB%sukhinder.sandhu@osumc.edu>

Hi
I am trying to install bioperl on my Mac OS X machine and am having problems. I tried to
follow the instructions both at the www.tc.umn.edu..... and in the README
and INSTALL files in the bioperl folder that I downloaded.
The error I get is the following (the end part of the output is copied):
####################
t/versions........ok
t/xs..............skipped
        all skipped: C_support not enabled
Failed Test Stat Wstat Total Fail  Failed  List of Failed
-------------------------------------------------------------------------------
t/compat.t     5  1280    60    5   8.33%  25-28 31
4 tests and 31 subtests skipped.
Failed 1/22 test scripts, 95.45% okay. 5/683 subtests failed, 99.27% okay.
make: *** [test] Error 2
  /usr/bin/make test -- NOT OK
Running make install
  make test had returned bad status, won't install without force
Couldn't install Module::Build, giving up.
BEGIN failed--compilation aborted at ModuleBuildBioperl.pm line 51.
Compilation failed in require at Build.PL line 14.
BEGIN failed--compilation aborted at Build.PL line 14.
###########################################################################
I am not able to figure out what's going wrong.
And when I try to run CPAN, I get the following error. I have no idea how
to fix these. Any help is greatly appreciated.
############################################################################
[Sukhinders-Computer:~/Desktop/bioperl-1.5.2_102] sand60% perl -MCPAN -e shell
Terminal does not support AddHistory.

There seems to be running another CPAN process (pid 7207).  Contacting...
Lockfile /Users/sand60/.cpan/.lock not writeable by you. Cannot proceed.
    On UNIX try:
    rm /Users/sand60/.cpan/.lock
  and then rerun us.
 at -e line 1
###################################################
And doing what it says, removing the lock file, doesn't help. I am wondering
if all this has something to do with having root privileges on the system
and, if so, is there an alternative? Thanks


sukhinder


From stefan.kirov at bms.com  Wed Feb 28 16:44:05 2007
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Wed, 28 Feb 2007 16:44:05 -0500
Subject: [Bioperl-l] SVG output with Bio::Graphics::Panel fails
In-Reply-To: <45E5F43C.9080902@sendu.me.uk>
References: <45E5F43C.9080902@sendu.me.uk>
Message-ID: <45E5F7A5.3090805@bms.com>

I think you should create the object with -image_class => 'GD::SVG'. Can you
post the code with which you create the object?
Stefan

Sendu Bala wrote:
> I have GD 2.35 and GD::SVG 2.33 installed.
>
> I have a working script in which a Bio::Graphics::Panel object is made 
> and output with:
>
> print $panel->png;
>
> This is fine. Changing it to:
>
> print $panel->svg;
>
> Gives the error:
>
> Can't locate object method "svg" via package "GD:Image" at 
> /.../Bio/Graphics/Panel.pm line 971, <DATA> line 192.
>
>
> Am I supposed to do something else to get this to work?
>
>
> Cheers,
> Sendu.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>   


From johnsonm at gmail.com  Wed Feb 28 17:54:02 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Wed, 28 Feb 2007 16:54:02 -0600
Subject: [Bioperl-l] Issues with new Bio::Tools::Run modules for
	Genemark and Glimmer
In-Reply-To: <894754FB-2B20-415F-A6FC-7ACE839C6540@uiuc.edu>
References: <ebf5eb170702281159y7dcedd88n7bdef2c8bb9d3288@mail.gmail.com>
	<894754FB-2B20-415F-A6FC-7ACE839C6540@uiuc.edu>
Message-ID: <ebf5eb170702281454i52cea9eeqbcaebee5e4ec5e0e@mail.gmail.com>

On 2/28/07, Chris Fields <cjfields at uiuc.edu> wrote:

> For bioperl-run tests you have to have the program installed for
> tests to work (otherwise they are passed over).  Therefore one would
> assume if you had the GeneMark program you would have the models as
> well.
>
> You could set up your module to require an env. variable be set (like
> the HMMER module, for instance) which contains the executables and/or
> the models, so that if it isn't set the tests are skipped.


Sounds like a plan.

> I wouldn't worry too much about Win9x.  Is IPC::Run in perl core?
> Otherwise we'll need to add it to the optional dependencies for
> bioperl-run.


I'd test it, but I don't have access to any Win9x boxes anymore.  IPC::Run
is not a core module, but I think it's worth the dependency.  I considered
IPC::Open3, but it can't be made reliable on Win32, something about not
being able to select() on filehandles, only sockets.  I also looked at
IPC::Run3, but under the hood, it's just got STDOUT/STDERR redirection
layered on top of system().  I don't like using system() due to issues with
signals (such as the user hitting ctrl-c and taking out the child).  I feel
better knowing the wrapped executable is in another process disconnected
from the console.

> Sendu's suggestion (_set_from_args() ) is the best.  As mentioned in
> another thread _rearrange() works as well.


I'm using _rearrange() now.  I'll look at _set_from_args().  Is either one
preferred to the other?


From bix at sendu.me.uk  Wed Feb 28 19:13:29 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 01 Mar 2007 00:13:29 +0000
Subject: [Bioperl-l] Issues with new Bio::Tools::Run modules
 for	Genemark and Glimmer
In-Reply-To: <ebf5eb170702281454i52cea9eeqbcaebee5e4ec5e0e@mail.gmail.com>
References: <ebf5eb170702281159y7dcedd88n7bdef2c8bb9d3288@mail.gmail.com>	<894754FB-2B20-415F-A6FC-7ACE839C6540@uiuc.edu>
	<ebf5eb170702281454i52cea9eeqbcaebee5e4ec5e0e@mail.gmail.com>
Message-ID: <45E61AA9.9030906@sendu.me.uk>

Mark Johnson wrote:
> I'm using _rearrange() now.  I'll look at _set_from_args().  Is either one
> preferred to the other?

_set_from_args() is implemented using _rearrange() iirc. In any case, 
they do different things but _set_from_args() just makes creating 
wrapper modules a lot simpler. Another example: compare revisions 1.15 
and 1.16 of Bio::Tools::Run::Alignment::Lagan where I reimplemented it 
to use _set_from_args() and _setparams().

http://code.open-bio.org/cgi/viewcvs.cgi/bioperl-run/Bio/Tools/Run/Alignment/Lagan.pm.diff?r1=text&tr1=1.15&r2=text&tr2=1.16&diff_format=h

So, it's new, but I'd recommend new modules, especially wrappers, make
use of it.
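The following self-contained class in plain Perl illustrates the idea behind `_set_from_args()` with accessor creation. It is not BioPerl's implementation. It only shows the general pattern: named arguments are accepted with or without a leading dash, and each one is stored through a generated get/set method. `My::Wrapper` and its parameter names are made up for the example.

```perl
use strict;
use warnings;

package My::Wrapper;    # hypothetical wrapper class, for illustration only

# Parameters this imaginary wrapper accepts.
my @PARAMS = qw(model program quiet);

# Generate a plain get/set accessor for each accepted parameter.
for my $param (@PARAMS) {
    no strict 'refs';
    *{"My::Wrapper::$param"} = sub {
        my $self = shift;
        $self->{"_$param"} = shift if @_;
        return $self->{"_$param"};
    };
}

sub new {
    my ($class, @args) = @_;
    die "odd number of arguments\n" if @args % 2;
    my $self  = bless {}, $class;
    my %known = map { $_ => 1 } @PARAMS;
    while (my ($key, $value) = splice @args, 0, 2) {
        (my $name = lc $key) =~ s/^-//;    # -Model, model and -MODEL all work
        die "unknown parameter '$key'\n" unless $known{$name};
        $self->$name($value);              # store through the generated accessor
    }
    return $self;
}

package main;
```

With this in place, `My::Wrapper->new(-Model => 'example.mod', program => 'gmhmmp')` gives an object whose `model` and `program` methods return those values. Unknown parameters die instead of being silently ignored.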


From bix at sendu.me.uk  Wed Feb 28 19:19:29 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 01 Mar 2007 00:19:29 +0000
Subject: [Bioperl-l] Problem of Installing Bioperl
In-Reply-To: <459942.77644.qm@web60518.mail.yahoo.com>
References: <459942.77644.qm@web60518.mail.yahoo.com>
Message-ID: <45E61C11.90806@sendu.me.uk>

Chan Kuang Lim wrote:
> I have a problem installing bioperl on Windows using the command-line installation.
> In the cmd window, after
> ppm-shell
> search bioperl
> install 2
> 
> many downloads completed, but then the next line is:
> Unpacking bioperl-1.5.2_100...ppm install failed: Can't extract files from C:.............../Bioperl-1.5.2_100.tgz

Does that file exist on your system? Is it larger than 0kb? Can you open 
it yourself?


From cjfields at uiuc.edu  Wed Feb 28 20:19:31 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 28 Feb 2007 19:19:31 -0600
Subject: [Bioperl-l] Issues with new Bio::Tools::Run modules
	for	Genemark and Glimmer
In-Reply-To: <45E61AA9.9030906@sendu.me.uk>
References: <ebf5eb170702281159y7dcedd88n7bdef2c8bb9d3288@mail.gmail.com>	<894754FB-2B20-415F-A6FC-7ACE839C6540@uiuc.edu>
	<ebf5eb170702281454i52cea9eeqbcaebee5e4ec5e0e@mail.gmail.com>
	<45E61AA9.9030906@sendu.me.uk>
Message-ID: <93734147-BDDE-4D73-B8F1-FB4A9D073F9B@uiuc.edu>


On Feb 28, 2007, at 6:13 PM, Sendu Bala wrote:

> Mark Johnson wrote:
>> I'm using _rearrange() now.  I'll look at _set_from_args().  Is either one
>> preferred to the other?
>
> _set_from_args() is implemented using _rearrange() iirc. In any case,
> they do different things but _set_from_args() just makes creating
> wrapper modules a lot simpler. Another example: compare revisions 1.15
> and 1.16 of Bio::Tools::Run::Alignment::Lagan where I reimplemented it
> to use _set_from_args() and _setparams().
>
> http://code.open-bio.org/cgi/viewcvs.cgi/bioperl-run/Bio/Tools/Run/Alignment/Lagan.pm.diff?r1=text&tr1=1.15&r2=text&tr2=1.16&diff_format=h
>
> So, it's new, but I'd recommend new modules, especially wrappers, make
> use of it.

Agreed; I think it allows for parameter variations (dashed, dashless,  
etc) and can create on-the-fly simple get/setters, so is particularly  
suited for wrappers.

_rearrange() will always have use in situations where using named  
parameters helps (long arg lists) but you don't want get/setters,  
just values.
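By contrast, a `_rearrange()`-style call just hands back values in a requested order. The hypothetical `rearrange` helper below sketches that behaviour in plain Perl: names match case-insensitively and the leading dash is optional. It illustrates the idea only and is not the actual BioPerl code.

```perl
use strict;
use warnings;

# Hypothetical helper: return the values of the named arguments in @args in
# the order listed in @$order. Names match case-insensitively and a leading
# dash is optional. Names that were not passed come back as undef.
sub rearrange {
    my ($order, @args) = @_;
    die "odd number of arguments\n" if @args % 2;
    my %by_name;
    while (my ($key, $value) = splice @args, 0, 2) {
        (my $name = uc $key) =~ s/^-//;
        $by_name{$name} = $value;
    }
    return map { $by_name{ uc $_ } } @$order;
}
```

A constructor might then do `my ($model, $program) = rearrange([qw(MODEL PROGRAM)], @args);` and keep the values in ordinary variables, with no accessors created.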


From dmessina at wustl.edu  Wed Feb 28 20:40:39 2007
From: dmessina at wustl.edu (Dave Messina)
Date: Wed, 28 Feb 2007 19:40:39 -0600 (CST)
Subject: [Bioperl-l] Problem installing bioperl-1.5.2_102
In-Reply-To: <C20A7237.1DB%sukhinder.sandhu@osumc.edu>
References: <C20A7237.1DB%sukhinder.sandhu@osumc.edu>
Message-ID: <58485.75.33.119.169.1172713239.squirrel@gscmail.wustl.edu>

> t/compat.t     5  1280    60    5   8.33%  25-28 31

This is the test that failed. I think you snipped the part above where the
actual errors causing the failure were printed.


> There seems to be running another CPAN process (pid 7207). Contacting...
> Lockfile /Users/sand60/.cpan/.lock not writeable by you. Cannot proceed.
>     On UNIX try:
>     rm /Users/sand60/.cpan/.lock
>   and then rerun us.
>  at -e line 1
> ###################################################
> And doing what it says, removing some lock file doesn't help.

Are you sure the lock file is really being removed? If so, what was the
error you got when running it after doing that?


Also, this line is important:
>  /usr/bin/make test -- NOT OK

It looks like you're trying to install on OS X. By default, OS X has perl
but not make. So /usr/bin/make probably doesn't exist on your system,
along with lots of other UNIX tools you'll want. To verify this, type:

which /usr/bin/make

on the command line. If you get:
/usr/bin/make: Command not found.

you'll need to install the OS X developer tools, called Xcode. You'll need
to register first, but you can get the latest version at:
http://developer.apple.com/tools/download/

After you do that, reread the BioPerl install docs and try to install
again. Since you don't have root on your machine, be sure to read the part
of the install instructions that describes what to do in that case.


Dave


From hlapp at gmx.net  Wed Feb 28 23:16:38 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 28 Feb 2007 23:16:38 -0500
Subject: [Bioperl-l] Issues with new Bio::Tools::Run modules for
	Genemark and Glimmer
In-Reply-To: <ebf5eb170702281454i52cea9eeqbcaebee5e4ec5e0e@mail.gmail.com>
References: <ebf5eb170702281159y7dcedd88n7bdef2c8bb9d3288@mail.gmail.com>
	<894754FB-2B20-415F-A6FC-7ACE839C6540@uiuc.edu>
	<ebf5eb170702281454i52cea9eeqbcaebee5e4ec5e0e@mail.gmail.com>
Message-ID: <EE9CB4BA-3C6C-4F38-85DB-E0A21FCD8B07@gmx.net>


On Feb 28, 2007, at 5:54 PM, Mark Johnson wrote:

> I don't like using system() due to issues with
> signals (such as the user hitting ctrl-c and taking out the child).  I feel
> better knowing the wrapped executable is in another process disconnected
> from the console.

I'm not sure how the user would be able to take out the child by hitting
ctrl-c if you run it through system() (except if the parent
terminates first - but maybe then terminating a run-away child is in
good order).

I haven't read the IPC::Run POD in full detail but you will want to
make sure that if the parent gets killed the child does get killed  
too, or otherwise you'll have a run-away process that novices will  
have trouble with understanding or terminating.

Other than that though IPC::Run seems like a useful module, so
incurring this as a dependency should be fine.
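Hilmar's concern is that killing the parent should also take down the child. On Unix-like systems this can be sketched with core Perl alone. The sketch is not how IPC::Run works internally. It uses a list-form piped `open`, so no shell and no redirection strings are involved. It also installs INT/TERM handlers that send TERM to the child before the parent dies. A terminal ctrl-c normally reaches the whole foreground process group anyway, so the handler matters mostly when only the parent is signalled (for example by `kill` from another shell). The helper name `run_and_capture` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: run @cmd in a child process (list form, no shell),
# return its STDOUT, and terminate the child if the parent is interrupted.
sub run_and_capture {
    my @cmd = @_;
    my $pid = open(my $from_child, '-|', @cmd)
        or die "cannot run $cmd[0]: $!\n";

    # Forward INT (ctrl-c) and TERM to the child before giving up ourselves,
    # so no run-away process is left behind.
    my $stop = sub { kill 'TERM', $pid; die "interrupted\n" };
    local $SIG{INT}  = $stop;
    local $SIG{TERM} = $stop;

    my $output = do { local $/; <$from_child> };
    close $from_child;    # waits for the child and sets $?
    die "$cmd[0] exited with status " . ($? >> 8) . "\n" if $?;
    return defined $output ? $output : '';
}
```

For example, `run_and_capture($^X, '-e', 'print "hello from child"')` returns `hello from child`, and a non-zero exit status from the child turns into an exception.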

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cuiw at ncbi.nlm.nih.gov  Thu Feb  1 09:47:38 2007
From: cuiw at ncbi.nlm.nih.gov (Cui, Wenwu (NIH/NLM/NCBI) [C])
Date: Thu, 1 Feb 2007 09:47:38 -0500
Subject: [Bioperl-l] How to get from gi/ref/gb to genomic coordinates ?
In-Reply-To: <45C1059D.1070100@tbi.univie.ac.at>
References: <45C1059D.1070100@tbi.univie.ac.at>
Message-ID: <18C407FD4FFB424292D769FBD68C1987020BB960@NIHCESMLBX8.nih.gov>

This is a simple test from gene ID 3632373 (protein is 46100068) to
contig coordinates: 

perl -MLWP::Simple -e 'map {print $_, "\n" if
/<(Gene-source_src.*?>)(.*)?<$1/} (split "\n",
get(q{http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=gene&id=3632373&retmode=xml}))'

You need to translate protein id to gene id though. 

If the genome is available at Map Viewer, try (the contig name is
NW_101115 from the last step)
http://www.ncbi.nlm.nih.gov/mapview/maps.cgi?taxid=5270&gnl=NW_101115&MAPS=genes&cmd=txt

Wenwu Cui, PhD

-----Original Message-----
From: Rainer Machne [mailto:raim at tbi.univie.ac.at] 
Sent: Wednesday, January 31, 2007 4:10 PM
To: bioperl-l at lists.open-bio.org
Subject: [Bioperl-l] How to get from gi/ref/gb to genomic coordinates ?

Dear Bioperl list,

I hope this is not the wrong list; I have a short question:

Is there a standard way, or are there nice (Bioperl) tools, to get from a
gene id (gi) or the other ids (see below) to the genomic coordinates of the
respective gene?

We have Fasta files retrieved from NCBI protein Blast in fungal genomes:

 >gi|46100068|gb|EAK85301.1| hypothetical protein UM04252.1 [Ustilago maydis 521]
or
 >gi|50292953|ref|XP_448909.1| unnamed protein product [Candida glabrata]

(we only have gi, ref and gb ids in our set).

I retrieved all my fasta files from whole fungal genomes with available 
protein sequences at
http://www.ncbi.nlm.nih.gov/sutils/genom_table.cgi?organism=fungi

As I only searched whole finished genomes (not shotgun), I thought it 
would then be easy to get the genomic coordinates and retrieve upstream 
sequences, but so far we have failed to find a consistent way to do this
automatically. Many of the gi entries refer to mRNAs or partial mRNAs
and the way to the coordinates seems to differ for each case.

Any suggestions would be appreciated.

with kind regards,
Rainer Machne

University of Vienna
Department for Theoretical Chemistry
Theoretical Biochemistry Group


From raim at tbi.univie.ac.at  Thu Feb  1 07:54:21 2007
From: raim at tbi.univie.ac.at (Rainer Machne)
Date: Thu, 01 Feb 2007 13:54:21 +0100
Subject: [Bioperl-l] How to get from gi/ref/gb to genomic coordinates ?
In-Reply-To: <CB6F853A-B5BF-43FE-B26B-C203724677A3@genetics.utah.edu>
References: <45C1059D.1070100@tbi.univie.ac.at>
	<CB6F853A-B5BF-43FE-B26B-C203724677A3@genetics.utah.edu>
Message-ID: <45C1E2FD.3070709@tbi.univie.ac.at>

Barry and Jason,

thanks for your quick and very helpful replies.

I guess we should have done (or should repeat) our BLAST search at
http://fungal.genome.duke.edu/
to get a better mapping from proteins to genomes?

As I retrieved all my proteins via whole genome blasts we should find 
(most of) them in the genbank files ... a good opportunity for me to 
learn some Bioperl and the other packages you mentioned in case we want 
to do more complex analysis later :-)

Thank you very much!

Rainer


Barry Moore wrote:
> Rainer,
> 
> We use a perl library called CGL written by Mark Yandell and colleagues
> (which in turn uses Chris Mungall's BioChaos and Unflattener.pm referred
> to by Jason) for this type of task.  The basic pipeline is: convert
> GenBank files to Chaos XML, then use CGL with those XML files to get
> nice object-oriented access to exons, transcripts, proteins,
> coordinates and more for those genes.  I am currently using this
> with good success on most GenBank genomes (unfortunately I haven't been
> working with the fungal genomes, but it should work fine).  The Ensembl
> API provides similar functionality for Ensembl genomes - but there are
> not very many fungi there.
> 
> http://www.yandell-lab.org/cgl/
> http://www.ensembl.org/info/software/core/core_tutorial.html
> 
> Feel free to contact Mark or myself directly if you are interested in
> using CGL.
> 
> Barry
> 
> On Jan 31, 2007, at 2:09 PM, Rainer Machne wrote:
> 
>> Dear Bioperl list,
>>
>> I hope this is not the wrong list; I have a short question:
>>
>> Is there a standard way, or are there nice (Bioperl) tools, to get from a
>> gene id (gi) or the other ids (see below) to the genomic coordinates of the
>> respective gene?
>>
>> We have Fasta files retrieved from NCBI protein Blast in fungal genomes:
>>
>> >gi|46100068|gb|EAK85301.1| hypothetical protein UM04252.1 [Ustilago maydis 521]
>> or
>> >gi|50292953|ref|XP_448909.1| unnamed protein product [Candida glabrata]
>>
>> (we only have gi, ref and gb ids in our set).
>>
>> I retrieved all my fasta files from whole fungal genomes with available
>> protein sequences at
>> http://www.ncbi.nlm.nih.gov/sutils/genom_table.cgi?organism=fungi
>>
>> As I only searched whole finished genomes (not shotgun), I thought it
>> would then be easy to get the genomic coordinates and retrieve upstream
>> sequences, but so far we have failed to find a consistent way to do this
>> automatically. Many of the gi entries refer to mRNAs or partial mRNAs
>> and the way to the coordinates seems to differ for each case.
>>
>> Any suggestions would be appreciated.
>>
>> with kind regards,
>> Rainer Machne
>>
>> University of Vienna
>> Department for Theoretical Chemistry
>> Theoretical Biochemistry Group
> 
> 


From cjfields at uiuc.edu  Thu Feb  1 12:55:27 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 1 Feb 2007 11:55:27 -0600
Subject: [Bioperl-l] How to get from gi/ref/gb to genomic coordinates ?
In-Reply-To: <45C1E2FD.3070709@tbi.univie.ac.at>
References: <45C1059D.1070100@tbi.univie.ac.at>
	<CB6F853A-B5BF-43FE-B26B-C203724677A3@genetics.utah.edu>
	<45C1E2FD.3070709@tbi.univie.ac.at>
Message-ID: <E0AE71DF-DF6A-4E8A-866F-172AC5EF7B8E@uiuc.edu>


On Feb 1, 2007, at 6:54 AM, Rainer Machne wrote:

> Barry and Jason,
>
> thanks for your quick and very helpful replies.
>
> I guess we should have done (or repeat) our blast search at
> http://fungal.genome.duke.edu/
> to get better mapping from proteins to genomes ?
>
> As I retrieved all my proteins via whole genome blasts we should find
> (most of) them in the genbank files ... a good opportunity for me to
> learn some Bioperl and the other packages you mentioned in case we want
> to do more complex analysis later :-)
>
> Thank you very much!
>
> Rainer

If the data is available in GenBank you could run the BLAST searches  
at NCBI and limit the search with an Entrez query:

http://www.ncbi.nlm.nih.gov/BLAST/blastcgihelp.shtml#entrez_query

Most (all?) genome files are tagged as complete

I'm not sure but there might be a way of doing this via  
Bio::Tools::Run::RemoteBlast.  Jason, any ideas?

chris


From cjfields at uiuc.edu  Thu Feb  1 13:09:16 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 1 Feb 2007 12:09:16 -0600
Subject: [Bioperl-l] How to get from gi/ref/gb to genomic coordinates ?
In-Reply-To: <E0AE71DF-DF6A-4E8A-866F-172AC5EF7B8E@uiuc.edu>
References: <45C1059D.1070100@tbi.univie.ac.at>
	<CB6F853A-B5BF-43FE-B26B-C203724677A3@genetics.utah.edu>
	<45C1E2FD.3070709@tbi.univie.ac.at>
	<E0AE71DF-DF6A-4E8A-866F-172AC5EF7B8E@uiuc.edu>
Message-ID: <748CC48E-D224-4234-A5C4-E33968F17418@uiuc.edu>

> If the data is available in GenBank you could run the BLAST searches
> at NCBI and limit the search with an Entrez query:
>
> http://www.ncbi.nlm.nih.gov/BLAST/blastcgihelp.shtml#entrez_query
>
> Most (all?) genome files are tagged as complete

sorry, didn't finish that...

"Most (all?) genome files are tagged as complete, wgs, in progress,  
etc. and can be limited by taxonomy using Fungi[ORGN] or similar."

chris


From jason at bioperl.org  Thu Feb  1 13:36:02 2007
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 1 Feb 2007 10:36:02 -0800
Subject: [Bioperl-l] How to get from gi/ref/gb to genomic coordinates ?
In-Reply-To: <E0AE71DF-DF6A-4E8A-866F-172AC5EF7B8E@uiuc.edu>
References: <45C1059D.1070100@tbi.univie.ac.at>
	<CB6F853A-B5BF-43FE-B26B-C203724677A3@genetics.utah.edu>
	<45C1E2FD.3070709@tbi.univie.ac.at>
	<E0AE71DF-DF6A-4E8A-866F-172AC5EF7B8E@uiuc.edu>
Message-ID: <D8E2FDBC-AA2E-4EB9-8CB1-F3610776B41C@bioperl.org>


On Feb 1, 2007, at 9:55 AM, Chris Fields wrote:

>
> On Feb 1, 2007, at 6:54 AM, Rainer Machne wrote:
>
>> Barry and Jason,
>>
>> thanks for your quick and very helpful replies.
>>
>> I guess we should have done (or repeat) our blast search at
>> http://fungal.genome.duke.edu/
>> to get better mapping from proteins to genomes ?
>>

Well I'm not quite sure of your exact goals.  To find upstream  
regions of known genes, or look at upstream regions of orthologous  
genes?

You can first figure out orthologs based on protein similarities,  
then go in and extract upstream regions for the orthologous genes (I
provide a link to a big all-vs-all FASTA result at the bottom of the
page if you want those results, as well as some pairwise orthology
assignments, although you may want more or less stringent parameters).

All the GFF and AA data is freely available for download on the site  
for each genome we've annotated or for annotation we've re-formatted  
so you can do things locally and/or modify it to your liking.


>> As I retrieved all my proteins via whole genome blasts we should find
>> (most of) them in the genbank files ... a good opportunity for me to
>> learn some Bioperl and the other packages you mentioned in case we want
>> to do more complex analysis later :-)
>>
>> Thank you very much!
>>
>> Rainer
>
> If the data is available in GenBank you could run the BLAST  
> searches at NCBI and limit the search with an Entrez query:
>
> http://www.ncbi.nlm.nih.gov/BLAST/blastcgihelp.shtml#entrez_query
>
> Most (all?) genome files are tagged as complete
>
> I'm not sure but there might be a way of doing this via  
> Bio::Tools::Run::RemoteBlast.  Jason, any ideas?
>
> chris

--
Jason Stajich
Miller Research Fellow
University of California, Berkeley
lab: 510.642.8441
http://pmb.berkeley.edu/~taylor/people/js.html
http://fungalgenomes.org/


From reenayadav at gmail.com  Thu Feb  1 13:38:03 2007
From: reenayadav at gmail.com (Reena Yadav)
Date: Fri, 2 Feb 2007 00:08:03 +0530
Subject: [Bioperl-l] pdb parser
Message-ID: <76f897dd0702011038v7afe0207gb05465478e026205@mail.gmail.com>

Hi, I need to extract the atomic coordinates from a PDB file (1ake) and do
certain calculations.
I am going stepwise; the steps involved are:
(1) reading the atomic coordinates
(2) writing the result to a file.

I need to understand how to write the whole xyz line to another file.
Could someone help?
R.
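Here is a minimal sketch of the two steps in plain Perl. It assumes a standard fixed-column PDB file, where ATOM and HETATM records carry x, y and z in columns 31-38, 39-46 and 47-54. The helper `extract_xyz` is illustrative, and so is its output layout (atom name plus coordinates, one atom per line). If full structure objects are wanted, BioPerl's `Bio::Structure::IO` with the `pdb` format is another route.

```perl
use strict;
use warnings;

# Hypothetical helper: copy the x, y, z coordinates of every ATOM/HETATM
# record from $in_fh to $out_fh, one atom per line. Uses the fixed PDB
# columns (1-based): atom name 13-16, x 31-38, y 39-46, z 47-54.
sub extract_xyz {
    my ($in_fh, $out_fh) = @_;
    my $count = 0;
    while (my $line = <$in_fh>) {
        next unless $line =~ /^(?:ATOM  |HETATM)/;
        (my $atom = substr($line, 12, 4)) =~ s/\s+//g;
        my ($x, $y, $z) = map { 0 + substr($line, $_, 8) } 30, 38, 46;
        printf {$out_fh} "%-4s %8.3f %8.3f %8.3f\n", $atom, $x, $y, $z;
        $count++;
    }
    return $count;
}
```

To use it, open the input with `open my $in, '<', '1ake.pdb'` and the output with `open my $out, '>', 'xyz.txt'`, then call `extract_xyz($in, $out)`. The return value is the number of atoms written, and each output line can be fed into further calculations.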


From jason at bioperl.org  Thu Feb  1 08:06:42 2007
From: jason at bioperl.org (sandhya khatal)
Date: Thu, 1 Feb 2007 13:06:42 +0000
Subject: [Bioperl-l] Regarding Bioperl program
Message-ID: <75899ED1-72C6-4272-8CAC-028CF133A0B4@gmail.com>

Respected Sir,
I want to write a program that produces a dendrogram, as the UPGMA
clustering method does, but I want the dendrogram built using single
linkage or the centroid method. Can you help me with this? You have given
code for building a tree, but I want a dendrogram as output using either
of the above methods.

Thanks in advance.

Regards,
Sandhya Khatal.


From jason at bioperl.org  Thu Feb  1 19:55:26 2007
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 1 Feb 2007 16:55:26 -0800
Subject: [Bioperl-l] Fwd: Regarding Bioperl program
References: <394d31ee0702010506j4bbd79dck41d5ac2162eaafdd@mail.gmail.com>
Message-ID: <40020502-3421-407D-85EB-24F420AB699C@bioperl.org>

re-forwarding Sandhya's email to the list so the email address is  
visible.

The approach that is coded in bioperl is for distance based data such  
as evolutionary distance of DNA or protein sequences - I assume you  
are talking about clustering expression data? You may want to focus  
on the available literature and toolkits that focus on expression  
data - something BioPerl doesn't deliberately focus on right now.
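For reference, single-linkage clustering itself is short enough to sketch in plain Perl. The sketch below is not BioPerl code. It takes item names and a symmetric distance matrix, repeatedly merges the two clusters whose closest members are nearest, and returns the dendrogram topology as a Newick-style string with no branch lengths. Replacing the minimum over member pairs with a maximum would give complete linkage. Centroid linkage is not covered here.

```perl
use strict;
use warnings;

# Single-linkage agglomerative clustering of a symmetric distance matrix.
# $names is an array ref of labels; $dist->[$i][$j] is the distance between
# items $i and $j. Returns the dendrogram topology as a Newick-style string
# (no branch lengths). Straightforward O(n^3) loops; fine for small inputs.
sub single_linkage_newick {
    my ($names, $dist) = @_;
    my @clusters = map { +{ label => $names->[$_], members => [$_] } } 0 .. $#$names;

    while (@clusters > 1) {
        my ($best_p, $best_q, $best_d);
        for my $p (0 .. $#clusters - 1) {
            for my $q ($p + 1 .. $#clusters) {
                # Single linkage: distance between the two closest members.
                my $d;
                for my $i (@{ $clusters[$p]{members} }) {
                    for my $j (@{ $clusters[$q]{members} }) {
                        $d = $dist->[$i][$j]
                            if !defined $d || $dist->[$i][$j] < $d;
                    }
                }
                ($best_p, $best_q, $best_d) = ($p, $q, $d)
                    if !defined $best_d || $d < $best_d;
            }
        }
        # Merge the closest pair; $best_q > $best_p, so removing $best_q
        # first leaves index $best_p valid.
        my $merged = {
            label   => "($clusters[$best_p]{label},$clusters[$best_q]{label})",
            members => [ @{ $clusters[$best_p]{members} },
                         @{ $clusters[$best_q]{members} } ],
        };
        splice @clusters, $best_q, 1;
        $clusters[$best_p] = $merged;
    }
    return @clusters ? $clusters[0]{label} . ';' : ';';
}
```

With four items where A-B (1) and C-D (2) are the closest pairs, `single_linkage_newick([qw(A B C D)], $matrix)` returns `((A,B),(C,D));`. That string can be read by any Newick-aware tree viewer.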

-jason
Begin forwarded message:

> From: "sandhya khatal" <sandhya.khatal at gmail.com>
> Date: February 1, 2007 5:06:42 AM PST
> To: jason at bioperl.org
> Subject: Regarding Bioperl program
>
> Respected Sir,
> I want to write a program that produces a dendrogram, as the UPGMA
> clustering method does, but I want the dendrogram built using single
> linkage or the centroid method. Can you help me with this? You have given
> code for building a tree, but I want a dendrogram as output using either
> of the above methods.
>
> Thanks in advance.
>
> Regards,
> Sandhya Khatal.

--
Jason Stajich
Miller Research Fellow
University of California, Berkeley
lab: 510.642.8441
http://pmb.berkeley.edu/~taylor/people/js.html
http://fungalgenomes.org/


From lzhtom at hotmail.com  Thu Feb  1 22:20:10 2007
From: lzhtom at hotmail.com (zhihua li)
Date: Fri, 02 Feb 2007 03:20:10 +0000
Subject: [Bioperl-l] Bio::index::Fasta- where's the indexed file?
Message-ID: <BAY110-F24A936E35D7C6B9059EE3CC79B0@phx.gbl>




From lzhtom at hotmail.com  Thu Feb  1 22:27:39 2007
From: lzhtom at hotmail.com (zhihua li)
Date: Fri, 02 Feb 2007 03:27:39 +0000
Subject: [Bioperl-l] Bio::index::Fasta- where's the indexed file?
Message-ID: <BAY110-F9A27ED904272F0D2D4BB6C79B0@phx.gbl>

Sorry guys, the former empty mail was sent out by mistake.

I'm using Bio::Index::Fasta to index a file containing lots of sequences in
fasta format. All is fine except one thing.

According to the bioperl tutorial and the documentation, the following code
will make an indexed file:

my $inx = Bio::Index::Fasta->new(-filename   => "test.fasta.idx",
                                 -write_flag => 1);
$inx->make_index("test.fasta");

And in another script I can access the indexed file by saying

$ENV{BIOPERL_INDEX} = ".";   # find index in current directory
my $inx = Bio::Index::Fasta->new(-filename => "test.fasta.idx");
my $seq = $inx->fetch("ent1001");   # fetch the sequence named ent1001

However, after running the first script, I cannot find a new file 
test.fasta.idx in my current directory. And not surprisingly, when I ran 
the second script, perl told me it couldn't find "test.fasta.idx".

What's going on here?

Thanks a lot!



From jason at bioperl.org  Fri Feb  2 01:24:44 2007
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 1 Feb 2007 22:24:44 -0800
Subject: [Bioperl-l] Bio::index::Fasta- where's the indexed file?
In-Reply-To: <BAY110-F9A27ED904272F0D2D4BB6C79B0@phx.gbl>
References: <BAY110-F9A27ED904272F0D2D4BB6C79B0@phx.gbl>
Message-ID: <CFD213B9-5195-450F-80ED-E956EEF50F59@bioperl.org>

I don't think BIOPERL_INDEX does anything in the module so that  
documentation is not quite right.  the env variable is used in the  
scripts/index/bp_index and bp_fetch scripts so maybe a cut+paste job  
went bad somewhere.

you need to specify the full path you want with -filename - you can
just prepend the BIOPERL_INDEX value to the filename, like:
-filename => $ENV{BIOPERL_INDEX}."/$index"
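A small sketch of Jason's suggestion uses core `File::Spec`, so the same full path is built in both the indexing and the fetching script. The helper `index_path` is hypothetical. Falling back to the current directory when `BIOPERL_INDEX` is unset is an assumption of this sketch, not something Bio::Index::Fasta does itself.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: full path for an index file, using $ENV{BIOPERL_INDEX}
# as the directory when it is set and non-empty, else the current directory.
sub index_path {
    my ($index_name) = @_;
    my $dir = (defined $ENV{BIOPERL_INDEX} && length $ENV{BIOPERL_INDEX})
            ? $ENV{BIOPERL_INDEX}
            : File::Spec->curdir;
    return File::Spec->catfile($dir, $index_name);
}
```

Both scripts would then pass `-filename => index_path('test.fasta.idx')` to `Bio::Index::Fasta->new`. The writer and the reader then agree on where the index lives.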

-jason
On Feb 1, 2007, at 7:27 PM, zhihua li wrote:

> Sorry guys, the former empty mail was sent out by mistake.
>
> I'm using Bio::Index::Fasta to index a file containing lots of
> sequences in fasta format. All is fine except one thing.
>
> According to the bioperl tutorial and the documentation, the following
> code will make an indexed file:
>
> my $inx = Bio::Index::Fasta->new(-filename   => "test.fasta.idx",
>                                  -write_flag => 1);
> $inx->make_index("test.fasta");
>
> And in another script I can access the indexed file by saying
>
> $ENV{BIOPERL_INDEX} = ".";   # find index in current directory
> my $inx = Bio::Index::Fasta->new(-filename => "test.fasta.idx");
> my $seq = $inx->fetch("ent1001");   # fetch the sequence named ent1001
>
> However, after running the first script, I cannot find a new file
> test.fasta.idx in my current directory. And not surprisingly, when
> I ran the second script, perl told me it couldn't find
> "test.fasta.idx".
>
> What's going on here?
>
> Thanks a lot!
>

--
Jason Stajich
Miller Research Fellow
University of California, Berkeley
lab: 510.642.8441
http://pmb.berkeley.edu/~taylor/people/js.html
http://fungalgenomes.org/


From marian.thieme at lycos.de  Fri Feb  2 05:06:09 2007
From: marian.thieme at lycos.de (marian thieme)
Date: Fri, 2 Feb 2007 10:06:09 +0000
Subject: [Bioperl-l] seqDiff
Message-ID: <101051013116870@lycos-europe.com>

An HTML attachment was scrubbed...
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070202/cb3feed1/attachment-0003.html>

From marian.thieme at lycos.de  Fri Feb  2 06:37:05 2007
From: marian.thieme at lycos.de (marian thieme)
Date: Fri, 2 Feb 2007 11:37:05 +0000
Subject: [Bioperl-l] susp. header
Message-ID: <188661178024725@lycos-europe.com>

An HTML attachment was scrubbed...
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070202/d3c3535c/attachment-0003.html>

From lubapardo at gmail.com  Fri Feb  2 09:31:06 2007
From: lubapardo at gmail.com (Luba Pardo)
Date: Fri, 2 Feb 2007 15:31:06 +0100
Subject: [Bioperl-l] Problem using Bio::DB::Query::GenBank;
Message-ID: <58ff33550702020631l4e7bc59dmabcf8c72fa67a6d5@mail.gmail.com>

Hello, (I am using bioperl-1.5.2_100, linux machine)
I am trying to get the ids of a list of genes using the module
Bio::DB::Query::GenBank. I have the following code:

use Bio::DB::Query::GenBank;
use strict;
use warnings;

open (READER_1,"list.txt") || die "\n I can't open the file READER_1 !!\n";
my @a1=<READER_1>;
close (READER_1);

for (my $i=0; $i<=$#a1;$i=$i+1 ) {
        my @a1_s=split/\s+/,$a1[$i];

my $query_string = ' Homo Sapiens[Organism] AND $a1[$i] ';
   my $query = Bio::DB::Query::GenBank->new(-db    => 'Protein',
                                            -query => $query_string);
   my $count = $query->count;
   my @ids   = $query->ids;


print " gene: $a1[$i] first id is $ids[0]  o no? \n";

I want the program to take all the genes listed in the file
list.txt and retrieve their ids from GenBank. However, the program gives me
the following error:

------------EXCEPTION: Bio::Root::Exception -------------
MSG: Id list has been truncated even after maxids requested
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.1/Bio/Root/Root.pm:359
STACK: Bio::DB::Query::WebQuery::_fetch_ids /usr/lib/perl5/site_perl/5.8.1/Bio/DB/Query/WebQuery.pm:236
STACK: Bio::DB::Query::WebQuery::ids /usr/lib/perl5/site_perl/5.8.1/Bio/DB/Query/WebQuery.pm:200
STACK: query.pl:27
------------------
Is it a problem to use $a1[$i] in place of the gene name?
Thank you in advance for any attention you may pay to this message.
Regards,
Luba Pardo


From hlapp at gmx.net  Fri Feb  2 10:44:02 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Fri, 2 Feb 2007 10:44:02 -0500
Subject: [Bioperl-l] susp. header
In-Reply-To: <188661178024725@lycos-europe.com>
References: <188661178024725@lycos-europe.com>
Message-ID: <EE6A34C7-0579-487E-B529-1F82E714793D@gmx.net>

You are sending HTML emails. You should configure your mailer to  
ideally just send plain text. If you really must have fancy formatted  
emails (i.e., HTML-formatted emails), then configure it such that the
mailer will send both a plain-text and an HTML version.

(Many spam filters will flag email whose body consists only of
an HTML attachment.)

	-hilmar

On Feb 2, 2007, at 6:37 AM, marian thieme wrote:

> Why is each message I send to this list considered to have a
> suspicious header?
>
> Marian
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cain.cshl at gmail.com  Fri Feb  2 11:03:16 2007
From: cain.cshl at gmail.com (Scott Cain)
Date: Fri, 02 Feb 2007 11:03:16 -0500
Subject: [Bioperl-l] [Gmod-schema] beginning work on SeqIO::chadoxml
In-Reply-To: <675F525B-D3EB-4C0A-A777-7683F2A8F823@duke.edu>
References: <1170359746.2706.622.camel@localhost.localdomain>
	<675F525B-D3EB-4C0A-A777-7683F2A8F823@duke.edu>
Message-ID: <1170432196.2706.661.camel@localhost.localdomain>

Hi Hilmar,

That is a good idea; when I started down this road, it felt like there
would only be a few things that I might want to allow to be different,
but I think you are right that having one standard implementation that
can be subclassed for legacy systems is a good thing.

Scott


On Fri, 2007-02-02 at 10:09 -0500, Hilmar Lapp wrote:
> On Feb 1, 2007, at 2:55 PM, Scott Cain wrote:
> 
> > The second main change was to introduce a -flybase_compat argument when
> > initializing the Bio::SeqIO writer, so that 'old style' cv and cvterms
> > (that are compatible with flybase) will be used, but now the default
> > will be to use current standards:
> 
> Just my $0.02 ... obviously, Flybase may be the only organization  
> that uses an 'old style' or any other way not compliant with 'current  
> standards' (presumably SO), but if it's not the only one then this  
> approach won't scale.
> 
> Also, an argument -flybase_compat suggests to the unsuspecting that  
> this is an endorsed flavor of the standard and fine to use for  
> everyone else too.
> 
> If Flybase is idiosyncratic in this way, why not make chadoxml.pm  
> compliant with the standard as we all want it, keep it free from  
> litter caused by usage of old versions of SO, and create a second  
> module fb-chadoxml.pm that inherits from the first and merely  
> overrides a few things so that it works for Flybase. This way, other  
> organizations with similar needs can follow the path and create their  
> own xyz-chadoxml.pm, rather than having to muck around in the  
> chadoxml.pm that comes with the distribution.
> 
> I'm not sure I fully grasp the underlying issue, so I may not make  
> much sense here. Apologies if that's the case ...
> 
> 	-hilmar
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   cain.cshl at gmail.com
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory

-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070202/2488afc4/attachment-0003.bin>

From bosborne11 at verizon.net  Fri Feb  2 10:27:44 2007
From: bosborne11 at verizon.net (Brian Osborne)
Date: Fri, 02 Feb 2007 10:27:44 -0500
Subject: [Bioperl-l] [Gmod-schema] beginning work on SeqIO::chadoxml
In-Reply-To: <675F525B-D3EB-4C0A-A777-7683F2A8F823@duke.edu>
Message-ID: <C1E8C2A0.C967%bosborne11@verizon.net>

Hilmar,

I second your motion, good idea. Let's keep the standard module nice and
clean.

Brian O.


On 2/2/07 10:09 AM, "Hilmar Lapp" <hlapp at duke.edu> wrote:

> and create a second
> module fb-chadoxml.pm that inherits from the first and merely
> overrides a few things so that it works for Flybase


From Kevin.M.Brown at asu.edu  Fri Feb  2 10:52:20 2007
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Fri, 2 Feb 2007 08:52:20 -0700
Subject: [Bioperl-l] Problem using Bio::DB::Query::GenBank;
References: <58ff33550702020631l4e7bc59dmabcf8c72fa67a6d5@mail.gmail.com>
Message-ID: <1A4207F8295607498283FE9E93B775B402AABA1C@EX02.asurite.ad.asu.edu>

It looks like you have some problems with the code you posted.

use Bio::DB::Query::GenBank;
use strict;
use warnings;

open (READER_1,"list.txt") || die "\n I can't open the file READER_1 !!\n";
my @a1=<READER_1>;
close (READER_1);

for (my $i=0; $i < @a1; $i++) {
    # remove the trailing newline each line still carries from the file
    chomp $a1[$i];

    # is this necessary? You don't seem to use it anywhere later in your
    # code.
    my @a1_s=split/\s+/,$a1[$i];

    # you enclosed the variable in '' which means perl won't evaluate it;
    # changed the query so that perl can evaluate the variable
    my $query_string = ' Homo Sapiens[Organism] AND '.$a1[$i].' ';
    my $query = Bio::DB::Query::GenBank->new(-db    => 'Protein',
                                             -query => $query_string);
    my $count = $query->count;
    my @ids   = $query->ids;

    print " gene: $a1[$i] first id is $ids[0]  o no? \n";
}
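A minimal plain-Perl sketch of the two fixes above, runnable without BioPerl or network access: `chomp` strips the newline each line of list.txt still carries, and double quotes (like the concatenation above) put the gene name into the query, whereas single quotes send the literal text. The helper names `read_gene_list` and `build_query`, and the sample gene names, are illustrative assumptions.

```perl
use strict;
use warnings;

# Hypothetical helper: one gene name per non-blank line, with the trailing
# newline and surrounding whitespace removed.
sub read_gene_list {
    my ($fh) = @_;
    my @genes;
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/^\s+|\s+$//g;
        push @genes, $line if length $line;
    }
    return @genes;
}

# Hypothetical helper: double quotes interpolate $gene into the query text.
sub build_query {
    my ($gene) = @_;
    return "Homo sapiens[Organism] AND $gene";
}

# An in-memory string stands in for list.txt so the sketch runs anywhere.
my $data = "BRCA1\n  TP53 \n\n";
open(my $fh, '<', \$data) or die "Cannot open in-memory file: $!";
my @genes = read_gene_list($fh);
close($fh);

print build_query($_), "\n" for @genes;

# Single quotes do not interpolate: this prints the literal text $gene.
my $gene = 'TP53';
print 'Homo sapiens[Organism] AND $gene', "\n";
```

Each string returned by `build_query` could then be passed as `-query` to `Bio::DB::Query::GenBank->new`, as in the script above.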



From cjfields at uiuc.edu  Fri Feb  2 11:37:49 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 2 Feb 2007 10:37:49 -0600
Subject: [Bioperl-l] [Gmod-schema] beginning work on SeqIO::chadoxml
In-Reply-To: <1170432196.2706.661.camel@localhost.localdomain>
References: <1170359746.2706.622.camel@localhost.localdomain>
	<675F525B-D3EB-4C0A-A777-7683F2A8F823@duke.edu>
	<1170432196.2706.661.camel@localhost.localdomain>
Message-ID: <64E2727F-2052-417B-878E-0F7135A72FBC@uiuc.edu>

I was going to suggest maybe allowing one to switch out XML
handlers/writers based on the style (à la XML::SAX), but I see that
chadoxml currently uses XML::Writer and there is no next_seq()
implemented.  Oh well...

chris

On Feb 2, 2007, at 10:03 AM, Scott Cain wrote:

> Hi Hilmar,
>
> That is a good idea; when I started down this road, it felt like there
> would only be a few things that I might want to allow to be different,
> but I think you are right that having one standard implementation that
> can be subclassed for legacy systems is a good thing.
>
> Scott

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From hlapp at gmx.net  Fri Feb  2 11:45:30 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Fri, 2 Feb 2007 11:45:30 -0500
Subject: [Bioperl-l] [Gmod-schema] beginning work on SeqIO::chadoxml
In-Reply-To: <64E2727F-2052-417B-878E-0F7135A72FBC@uiuc.edu>
References: <1170359746.2706.622.camel@localhost.localdomain>
	<675F525B-D3EB-4C0A-A777-7683F2A8F823@duke.edu>
	<1170432196.2706.661.camel@localhost.localdomain>
	<64E2727F-2052-417B-878E-0F7135A72FBC@uiuc.edu>
Message-ID: <3A3FF1B0-129A-410D-9DE2-D1CB28015C9A@gmx.net>

There must be at least a stub for next_seq(). It may throw a
not-implemented exception, but it should not just be absent.

	-hilmar

On Feb 2, 2007, at 11:37 AM, Chris Fields wrote:

> I was going to suggest maybe allowing one to switch out XML
> handlers/writers based on the style (à la XML::SAX), but I see that
> chadoxml currently uses XML::Writer and there is no next_seq()
> implemented.  Oh well...
>
> chris

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cain.cshl at gmail.com  Fri Feb  2 12:02:32 2007
From: cain.cshl at gmail.com (Scott Cain)
Date: Fri, 02 Feb 2007 12:02:32 -0500
Subject: [Bioperl-l] [Gmod-schema] beginning work on SeqIO::chadoxml
In-Reply-To: <3A3FF1B0-129A-410D-9DE2-D1CB28015C9A@gmx.net>
References: <1170359746.2706.622.camel@localhost.localdomain>
	<675F525B-D3EB-4C0A-A777-7683F2A8F823@duke.edu>
	<1170432196.2706.661.camel@localhost.localdomain>
	<64E2727F-2052-417B-878E-0F7135A72FBC@uiuc.edu>
	<3A3FF1B0-129A-410D-9DE2-D1CB28015C9A@gmx.net>
Message-ID: <1170435752.2706.676.camel@localhost.localdomain>

Ah, I'll go ahead and add one, though it will just throw an exception
because this is a write-only adapter.
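A sketch of such a stub, written in plain Perl so it runs without BioPerl; the package `My::WriteOnlyIO` and its `write_seq` body are hypothetical placeholders. Inside a real Bio::SeqIO subclass, the `die` would more naturally be `$self->throw_not_implemented()`, which Bio::Root::RootI provides.

```perl
use strict;
use warnings;

# Hypothetical write-only adapter; the package name and storage are placeholders.
package My::WriteOnlyIO;

sub new { my ($class, %args) = @_; return bless { %args }, $class }

# Reading is unsupported: provide the method, but make it fail loudly.
sub next_seq {
    my ($self) = @_;
    die ref($self) . "::next_seq is not implemented: this is a write-only adapter\n";
}

sub write_seq {
    my ($self, $seq) = @_;
    push @{ $self->{written} }, $seq;    # stands in for the real XML output
    return 1;
}

package main;

my $io = My::WriteOnlyIO->new;
$io->write_seq('ACGT');
my $ok = eval { $io->next_seq; 1 };
print $ok ? "read succeeded\n" : "read failed: $@";
```

Callers that probe for reading support then get a clear error instead of a "Can't locate object method" failure.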

Scott


On Fri, 2007-02-02 at 11:45 -0500, Hilmar Lapp wrote:
> There must be at least a stub for next_seq(). It may throw a not- 
> implemented exception, but it should not just be absent.
> 
> 	-hilmar
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   cain.cshl at gmail.com
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory

-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070202/9acaa3c3/attachment-0003.bin>

From peili at morgan.harvard.edu  Fri Feb  2 10:56:56 2007
From: peili at morgan.harvard.edu (Peili Zhang)
Date: Fri, 02 Feb 2007 10:56:56 -0500
Subject: [Bioperl-l] [Gmod-schema] beginning work on SeqIO::chadoxml
In-Reply-To: <C1E8C2A0.C967%bosborne11@verizon.net>
References: <C1E8C2A0.C967%bosborne11@verizon.net>
Message-ID: <1170431816.6583.47.camel@jacks>

i 'third' Hilmar's opinion. flybase's fingerprint is shown in the module
because i wrote it for fb's data loading task. no need to worry about
flybase compatibility in making the module generic. in fact, at flybase,
i tweak the module frequently to make it work for different scenarios.

cheers,
peili
 
On Fri, 2007-02-02 at 10:27, Brian Osborne wrote:
> Hilmar,
> 
> I second your motion, good idea. Let's keep the standard module nice and
> clean.
> 
> Brian O.
> 
> 
> On 2/2/07 10:09 AM, "Hilmar Lapp" <hlapp at duke.edu> wrote:
> 
> > and create a second
> > module fb-chadoxml.pm that inherits from the first and merely
> > overrides a few things so that it works for Flybase
> 
> 
> 
> _______________________________________________
> Gmod-schema mailing list
> Gmod-schema at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/gmod-schema
> 


From cain.cshl at gmail.com  Fri Feb  2 13:05:47 2007
From: cain.cshl at gmail.com (Scott Cain)
Date: Fri, 02 Feb 2007 13:05:47 -0500
Subject: [Bioperl-l] [Gmod-schema] beginning work on SeqIO::chadoxml
In-Reply-To: <1170431816.6583.47.camel@jacks>
References: <C1E8C2A0.C967%bosborne11@verizon.net>
	<1170431816.6583.47.camel@jacks>
Message-ID: <1170439549.2706.683.camel@localhost.localdomain>

Hi Peili,

A little bit ago I checked in Bio::SeqIO::flybase_chadoxml that is
fairly simple.  My suggestion is that when you make tweaks for different
scenarios, that you turn the things you are tweaking into methods in
BSIO::chadoxml and then override them in flybase_chadoxml (and commit at
least the chadoxml module) to make it more flexible when other people
have similar scenarios.
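A plain-Perl sketch of that layout, assuming hypothetical package names (`My::ChadoXML`, `My::FlyBaseChadoXML`) and a hypothetical hook `feature_type_cv` with placeholder values; it is not the code of Bio::SeqIO::chadoxml or Bio::SeqIO::flybase_chadoxml. The base writer routes each site-specific choice through a small method, and the subclass overrides only that method.

```perl
use strict;
use warnings;

# Hypothetical base writer: each site-specific choice lives in a small
# method ("hook") that subclasses can override.
package My::ChadoXML;

sub new { my ($class) = @_; return bless {}, $class }

# Hook with a placeholder value; not a real chadoxml setting.
sub feature_type_cv { return 'standard_cv' }

sub describe_feature {
    my ($self, $type) = @_;
    # Method call, so a subclass's feature_type_cv is used when present.
    return $self->feature_type_cv . ':' . $type;
}

# Hypothetical site-specific subclass: overrides only the hook.
package My::FlyBaseChadoXML;
our @ISA = ('My::ChadoXML');

sub feature_type_cv { return 'legacy_cv' }

package main;

for my $class (qw(My::ChadoXML My::FlyBaseChadoXML)) {
    print "$class => ", $class->new->describe_feature('gene'), "\n";
}
```

Because `describe_feature` calls `$self->feature_type_cv`, method dispatch picks the subclass version without the base module containing any Flybase-specific code.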

Scott


On Fri, 2007-02-02 at 10:56 -0500, Peili Zhang wrote:
> i 'third' Hilmar's opinion. flybase's fingerprint is shown in the module
> because i wrote it for fb's data loading task. no need to worry about
> flybase compatibility in making the module generic. in fact, at flybase,
> i tweak the module frequently to make it work for different scenarios.
> 
> cheers,
> peili
>  
> On Fri, 2007-02-02 at 10:27, Brian Osborne wrote:
> > Hilmar,
> > 
> > I second your motion, good idea. Let's keep the standard module nice and
> > clean.
> > 
> > Brian O.
> > 
> > 
> > On 2/2/07 10:09 AM, "Hilmar Lapp" <hlapp at duke.edu> wrote:
> > 
> > > and create a second
> > > module fb-chadoxml.pm that inherits from the first and merely
> > > overrides a few things so that it works for Flybase
> > 
> > 
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   cain.cshl at gmail.com
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory

-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070202/a6d23204/attachment-0003.bin>

From cjfields at uiuc.edu  Fri Feb  2 15:33:46 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 2 Feb 2007 14:33:46 -0600
Subject: [Bioperl-l] seqDiff
In-Reply-To: <101051013116870@lycos-europe.com>
References: <101051013116870@lycos-europe.com>
Message-ID: <C752CE9D-61A7-4DF2-958E-7162723D0BA9@uiuc.edu>

Judging by the code you'll have to recreate the SeqDiff while  
iterating through various alleles; there is no method to remove  
particular variants or purge them (at least I couldn't find one).

I also noticed SeqDiff doesn't support deletions/insertions either;  
using a null allele (no seq) or leaving out either the mutant or  
original allele leads to errors.  I'll look into the latter, and I  
may try to add a method to at least purge variants and reset dna_mut().
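A plain-Perl illustration of the "one variant per mutant sequence" idea, not the Bio::Variation API: each alternative allele is substituted into a fresh copy of the reference, so every allele yields its own sequence. The function name, the 1-based coordinate convention, and the example sequence are assumptions. Because lvalue `substr` accepts a replacement of any length, an empty allele gives a deletion and a longer one an insertion.

```perl
use strict;
use warnings;

# Plain-Perl illustration (not Bio::Variation): substitute each alternative
# allele into a fresh copy of the reference sequence.
# $start is a 1-based position; $ref_allele must match the reference there.
sub mutant_sequences {
    my ($ref_seq, $start, $ref_allele, @alt_alleles) = @_;
    my $found = substr($ref_seq, $start - 1, length $ref_allele);
    die "Reference allele '$ref_allele' not found at $start (got '$found')\n"
        unless $found eq $ref_allele;
    my @mutants;
    for my $alt (@alt_alleles) {
        my $copy = $ref_seq;    # fresh copy, so alleles never accumulate
        substr($copy, $start - 1, length $ref_allele) = $alt;
        push @mutants, $copy;
    }
    return @mutants;
}

my @mut = mutant_sequences('ACGTACGT', 3, 'G', 'A', 'T');
print "$_\n" for @mut;
```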

chris

On Feb 2, 2007, at 4:06 AM, marian thieme wrote:

> HI,
>
> Is there a way to output all mutated sequences of a SeqDiff object?
> Suppose I add some variants via:
>
> $dnamut->add_Allele($a2);
> $dnamut->add_Allele($a3);
> $seqDiff->add_Variant($dnamut);
>
> and afterwards want to access the alternative sequences via
> $seqDiff->dna_mut()
>
> Which allele is chosen when using dna_mut()? That is, can I control
> whether I access the first or the second alternate sequence?
> If yes, how can I do this?
>
> Regards,
> Marian
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From MEC at stowers-institute.org  Fri Feb  2 16:47:08 2007
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Fri, 2 Feb 2007 15:47:08 -0600
Subject: [Bioperl-l] Bio::DB::SeqFeature treamtent of tags and
	annotations
Message-ID: <CED81D34E37D5043A1211565277A51E50768EDB3@exchkc02.stowers-institute.org>

Lincoln,
 
I don't think that adding this directive is a good idea after all
either.
 
But, I see that you remap the ID= to a load_id attribute which is
preserved in the Bio::DB::SeqFeatureStore database.
 
And then it gets squelched during GFF production by
NormalizedFeature::format_attributes.
 
However, if ID is prone to clashes, then certainly simply renaming the
attribute to be load_id does not preclude clashes from happening, and
only courts disaster.  Don't you think?
 
I'm a little blurry on the GFF3Loader, but it looks like you're using
load_id to facilitate loading parent/child features out of order.  Is
that right?  If so, I suggest you delete all load_id attributes
immediately after performing a load, since they have no further use.
 
Alternatively, instead of a `round-trip-ids` directive, you might give
the GFF3Loader an IDAttribute option that would let users of the loader
preserve the ID values, but under a named attribute.
 
In my case, munging flybase gff,  I would then use it like this:
 
bp_seqfeature_load.PLS --fast --IDAttribute flybaseID
 
which would preserve the ID values in the database, but under the
flybaseID attribute, for features so loaded.
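Until such an option exists, one possible workaround is to copy each ID into a named attribute before loading. This is a plain-Perl sketch, not a GFF3Loader feature: `copy_id_to_attribute` is a hypothetical helper, it assumes tab-separated nine-column feature lines with an unescaped `ID=` tag, and it returns `#` lines, blank lines, and any other lines unchanged.

```perl
use strict;
use warnings;

# Hypothetical helper (not a GFF3Loader option): append ";$attr_name=<ID>"
# to column 9 of a GFF3 feature line so the ID value survives loading
# under another attribute name. Other lines are returned unchanged.
sub copy_id_to_attribute {
    my ($gff3_line, $attr_name) = @_;
    return $gff3_line if $gff3_line =~ /^#/ || $gff3_line !~ /\S/;
    chomp(my $line = $gff3_line);
    my @cols = split /\t/, $line, -1;
    return $gff3_line unless @cols == 9;     # e.g. sequence lines after ##FASTA
    my ($id) = $cols[8] =~ /(?:^|;)ID=([^;]*)/;
    $cols[8] .= ";$attr_name=$id" if defined $id;
    return join("\t", @cols) . "\n";
}

# Made-up example feature line.
my $example = "ctg1\texample\tgene\t100\t900\t.\t+\t.\tID=gene0001;Name=abc1\n";
print copy_id_to_attribute($example, 'flybaseID');
```

Each line of the GFF3 file would be passed through such a filter before loading; the loader can then treat ID as file-local while flybaseID is stored like any other attribute.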
 
---------------------------------------------
 
On a related topic:
I just committed this patch to Bio::DB::SeqFeature::NormalizedFeature

_create_subfeatures : ensure that subfeatures get the `source` of their
parent

While doing this I wondered: what -class are subfeatures getting from
their parent? I left it in place. Please advise, and fix my thinking
if needed.

----------------------------------------------

Further, I observe that Bio::Graphics::FeatureBase::new handles the
-segments option by calling add_segment.  So, when I create a new
Bio::DB::SeqFeature with -segments [[100,200],[300,400]], the
-segments option gets handled by Bio::Graphics::FeatureBase::new, which,
as mentioned, calls add_segment. What surprised me when trying
to trace through the class modules and understand what is going on is
that what gets run at this point is not
Bio::Graphics::FeatureBase::add_segment, but rather
Bio::DB::SeqFeature::add_segment, whose semantics differ in at
least one regard, namely, that it does not set the start and stop of the
parent feature from the min and max of the segments.

I have committed a patch to Bio::Graphics::FeatureBase with a comment to
this effect, and have also patched its add_segment method to copy the
parent's source into the segment.
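A plain-Perl sketch of the dispatch surprise described above, using hypothetical packages rather than the BioPerl classes: `new` handles `-segments` by calling `$self->add_segment`, so when a subclass overrides `add_segment` without updating start/end, the parent's extent is never set from the min and max of its segments.

```perl
use strict;
use warnings;

# Hypothetical base class: add_segment() widens the parent's start/end to
# the min and max of its segments.
package My::Feature;
use List::Util qw(min max);

sub new {
    my ($class, %args) = @_;
    my $self = bless { segments => [] }, $class;
    # -segments is handled by calling add_segment, so a subclass override runs here
    $self->add_segment(@$_) for @{ $args{-segments} || [] };
    return $self;
}

sub add_segment {
    my ($self, $start, $end) = @_;
    push @{ $self->{segments} }, [$start, $end];
    $self->{start} = defined $self->{start} ? min($self->{start}, $start) : $start;
    $self->{end}   = defined $self->{end}   ? max($self->{end},   $end)   : $end;
    return $self;
}

sub start { return $_[0]{start} }
sub end   { return $_[0]{end} }

# Hypothetical subclass whose add_segment only stores the segment.
package My::StoredFeature;
our @ISA = ('My::Feature');

sub add_segment {
    my ($self, $start, $end) = @_;
    push @{ $self->{segments} }, [$start, $end];
    return $self;
}

package main;

my $show = sub { defined $_[0] ? $_[0] : 'undef' };
for my $class (qw(My::Feature My::StoredFeature)) {
    my $f = $class->new(-segments => [[100, 200], [300, 400]]);
    printf "%s: %s..%s\n", $class, $show->($f->start), $show->($f->end);
}
```

The base class reports 100..400, while the subclass leaves start and end undefined even though the same constructor ran.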

I hope my commits and suggestions further the cause.  Let me know if
not!
 
-- Malcolm
 

________________________________

	From: lincoln.stein at gmail.com [mailto:lincoln.stein at gmail.com]
On Behalf Of Lincoln Stein
	Sent: Tuesday, January 30, 2007 4:46 PM
	To: Cook, Malcolm
	Cc: bioperl list; lstein at cshl.org
	Subject: Re: Bio::DB::SeqFeature treamtent of tags and
annotations
	
	
	I've fixed the first issue in CVS. Sorry for the inconsistency.
add_tag_value(), delete_tag_value() and get_Annotations() now all work
as expected.
	
	The problem with the ID column is that it is supposed to be
LOCAL to the GFF3 file and is not intended to be stored in the database.
In contrast, Name can survive roundtripping. Perhaps the thing to do is
to add a flag to the GFF3 file that turns on ID round-tripping, e.g.
	
	##round-trip-ids: 1
	
	If you like this idea, I can implement it.
	
	Lincoln
	
	
	On 1/29/07, Cook, Malcolm < MEC at stowers-institute.org
<mailto:MEC at stowers-institute.org> > wrote: 

		Lincoln,
		 
		Thanks for your suggestions on how to approach my problems
augmenting Flybase annotation.  I am trying to follow them and am finding
the following oddities:
		 
		The first issue relates to the intermix of 'annotations'
and 'tag values'.  I find that Bio::DB::SeqFeature implements some of
the 'tag' methods and some of the 'Annotation' methods.  Here is a perl
one-liner showing that values stored using add_tag_value are not retrieved
with get_tag_values, but rather with get_Annotations.
		 
		> perl -MBio::DB::SeqFeature -e 'my $f=Bio::DB::SeqFeature->new; $f->add_tag_value("x",666); print "get_tag_values:\t" . $f->get_tag_values("x") . "\nget_Annotations:\t" . $f->get_Annotations("x");'
		 
		whose output is:
		get_tag_values: 
		get_Annotations:    666
		 
		Tracing this shows me that this results from the fact
that:
		 
		
		Bio::DB::SeqFeature uses Bio::Graphics::FeatureBase (via
		Bio::DB::SeqFeature::NormalizedFeature), which does not support
		-tags in ->new but rather -attributes, viz:
		 
		
		  -attributes   a hashref of tag value attributes, in which the key is
		                the tag and the value is an array reference of values
		 
		
		And though Bio::Graphics::FeatureBase purports to implement
		Bio::SeqFeatureI, it only partially implements the 'tag' methods
		(now deprecated and relegated to Bio::AnnotatableI).  In particular,
		the Bio::SeqFeatureI methods marked '*' are not implemented in
		Bio::Graphics::FeatureBase:

		  has_tag
		*  add_tag_value
		  get_tag_values
		  get_all_tags
		*  remove_tag
		  get_tagset_values
		  get_Annotations

		As a result, add_tag_value and remove_tag are inherited
from different modules whose understanding of tags is not the same!

		This one-liner :

		> perl -MClass::ISA -MClass::Inspector -MBio::DB::SeqFeature -e '
		    my @c = Class::ISA::self_and_super_path("Bio::DB::SeqFeature");
		    foreach my $fn (qw(add_tag_value get_tag_values)) {
		      print "\n$fn:\t", join "\t", grep { Class::Inspector->function_exists($_, $fn) } @c;
		    }'

		confirms that they are defined in different packages, namely:

		add_tag_value:	Bio::AnnotatableI
		get_tag_values:	Bio::Graphics::FeatureBase	Bio::AnnotatableI
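The same check can be done without Class::ISA or Class::Inspector. The sketch below uses the core `mro::get_linear_isa` (available from Perl 5.10, so newer than the 5.8.8 seen elsewhere in this archive) plus a symbol-table lookup, and runs against a hypothetical toy hierarchy shaped like the one described above so that it needs no BioPerl install.

```perl
use strict;
use warnings;
use mro;    # core since Perl 5.10; provides mro::get_linear_isa

# Return, in method-resolution order, every package in $class's hierarchy
# that itself defines a sub named $method (inherited copies are not counted).
sub definers {
    my ($class, $method) = @_;
    no strict 'refs';
    return grep { defined &{"${_}::$method"} } @{ mro::get_linear_isa($class) };
}

# Toy hierarchy standing in for the real modules (names are illustrative).
package Toy::AnnotatableI;
sub add_tag_value  { }
sub get_tag_values { }

package Toy::FeatureBase;
our @ISA = ('Toy::AnnotatableI');
sub get_tag_values { }

package Toy::SeqFeature;
our @ISA = ('Toy::FeatureBase');

package main;
for my $m (qw(add_tag_value get_tag_values)) {
    print "$m:\t", join("\t", definers('Toy::SeqFeature', $m)), "\n";
}
```

Against a real installation one would `use Bio::DB::SeqFeature;` and call `definers('Bio::DB::SeqFeature', 'add_tag_value')`; the first package listed is the one whose method actually runs.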

		
		Proposed solution... hmmmm... I dunno... maybe the following patch to
		Bio::Graphics::FeatureBase->add_tag_value:
		 
		sub add_tag_value {
		  my ($self, $tag, @vals) = @_;
		  # store values where get_tag_values will look for them
		  push @{$self->{attributes}{$tag}}, @vals;
		}
		
		
		It fixes my use case for now, but I'm still concerned and
		confused about this variety of methods.
		 
		Suggestions?
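To reproduce the mismatch without BioPerl, here is a toy model: one package's add_tag_value writes to one store while a subclass's get_tag_values reads `$self->{attributes}`, and a patched subclass applies the fix proposed above. All package names are hypothetical stand-ins, and the real Bio::AnnotatableI routes values through its annotation collection rather than a plain hash.

```perl
use strict;
use warnings;

# Toy "interface" package: add_tag_value writes to an annotations store.
package Toy::Annotatable;
sub add_tag_value {
    my ($self, $tag, @vals) = @_;
    push @{ $self->{annotations}{$tag} }, @vals;    # a different store!
}
sub get_Annotations {
    my ($self, $tag) = @_;
    return @{ $self->{annotations}{$tag} || [] };
}

# Toy FeatureBase: inherits add_tag_value but reads from {attributes}.
package Toy::FeatureBase;
our @ISA = ('Toy::Annotatable');
sub new { my $class = shift; my $self = { attributes => {} }; return bless $self, $class }
sub get_tag_values {
    my ($self, $tag) = @_;
    return @{ $self->{attributes}{$tag} || [] };
}

# Patched variant: write to the same hash that get_tag_values reads.
package Toy::FeatureBase::Patched;
our @ISA = ('Toy::FeatureBase');
sub add_tag_value {
    my ($self, $tag, @vals) = @_;
    push @{ $self->{attributes}{$tag} }, @vals;
}

package main;
for my $class (qw(Toy::FeatureBase Toy::FeatureBase::Patched)) {
    my $f = $class->new;
    $f->add_tag_value(x => 666);
    printf "%-26s get_tag_values: [%s]\n", $class, join ',', $f->get_tag_values('x');
}
```

The unpatched object prints an empty list for get_tag_values (the value is only visible through get_Annotations), while the patched one prints 666.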
		 

-------------------------------------------------------------------------

		Also, I think that any "ID" in column 9 of a GFF3 flat file should be
		preserved through a round trip through a Bio::DB::SeqFeature store. This
		is not yet possible, since any ID attribute in GFF3 column 9 is lost by
		GFF3Loader, so I have locally patched the GFF3Loader::handle_feature
		method to add the following:

		  # mec at stowers-institute.org: not all attributes are carried
		  # forward, so copy the ID tag in order to round-trip it. Although
		  # the ID is stored in the database as the load_id attribute, it
		  # was otherwise lost as ID itself.
		  $unreserved->{ID} = $reserved->{ID} if exists $reserved->{ID};
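A toy version of the idea behind this patch, as a minimal sketch: split a GFF3 column-9 string into "reserved" and "unreserved" attribute hashes, percent-decode values as GFF3 requires, then copy ID into the unreserved set. The function name `split_attributes` and the choice of reserved tags are assumptions for illustration; GFF3Loader's own handling may differ.

```perl
use strict;
use warnings;

# Illustrative subset of tags treated as "reserved" (an assumption).
my %RESERVED = map { $_ => 1 } qw(ID Name Parent);

# Hypothetical helper: parse column 9 into two hashes of array refs.
sub split_attributes {
    my ($col9) = @_;
    my (%reserved, %unreserved);
    for my $pair (split /;/, $col9) {
        my ($tag, $value) = split /=/, $pair, 2;
        next unless defined $value;
        # Split multiple values on ',' first, then decode %XX escapes.
        my @values = map { (my $v = $_) =~ s/%([0-9A-Fa-f]{2})/chr hex $1/ge; $v }
                     split /,/, $value;
        my $target = $RESERVED{$tag} ? \%reserved : \%unreserved;
        push @{ $target->{$tag} }, @values;
    }
    # Same idea as the patch: keep ID as an ordinary attribute too.
    $unreserved{ID} = $reserved{ID} if exists $reserved{ID};
    return (\%reserved, \%unreserved);
}

my ($res, $unres) = split_attributes('ID=FBgn0017545;Name=example;Note=test');
print "unreserved ID: @{ $unres->{ID} }\n";
```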

		Poised to patch.... what d'you think?

		Malcolm Cook
		Stowers Institute for Medical Research - Kansas City, Missouri
		  

________________________________

			From: lincoln.stein at gmail.com [mailto:lincoln.stein at gmail.com] On Behalf Of Lincoln Stein
			Sent: Tuesday, December 19, 2006 3:58 PM
			To: Cook, Malcolm
			Cc: bioperl list; lstein at cshl.org
			Subject: Re: bp_seqfeature_load / Bio::DB::SeqFeature::Store::GFF3Loader problems augmenting Flybase annotation
			
			
			Hi Malcolm,
			
			Your second guess was right. The use case of
augmenting an existing gene with additional splice forms isn't provided
for. You can get the functionality by making direct calls to
Bio::DB::SeqFeature::Store methods:
			
			# $db is an open Bio::DB::SeqFeature::Store
			my @genes = $db->get_features_by_name('FBgn0017545');
			@genes == 1 or die "Didn't get exactly one gene";
			my $parent = $genes[0];

			# reuse the parent gene's location for the new transcript
			my $chr    = $parent->seq_id;
			my $start  = $parent->start;
			my $end    = $parent->end;
			my $strand = $parent->strand;

			# create and store a new mRNA, then attach it to the gene
			my $new_splice_form = $db->new_feature(-primary_tag => 'mRNA',
			                                       -source      => 'added',
			                                       -seq_id      => $chr,
			                                       -strand      => $strand,
			                                       -start       => $start+10,
			                                       -end         => $end,
			                                      );
			$parent->add_SeqFeature($new_splice_form);

			# add two exons to the new splice form
			for my $pos ([$start+10, $start+100], [$start+200, $end]) {
			  my ($e_start, $e_end) = @$pos;
			  my $exon = Bio::DB::SeqFeature->new(-primary_tag => 'exon',
			                                      -store       => $db,
			                                      -seq_id      => $chr,
			                                      -strand      => $strand,
			                                      -start       => $e_start,
			                                      -end         => $e_end);
			  $new_splice_form->add_SeqFeature($exon);
			}
			
			I found a bug in updating the seqfeature database when I wrote
			this script, so you'll have to get the latest bioperl-live. I think
			you can use this to write a splice form updating script.
			
			In order to support the idea of adding new
splice forms to an existing gene using the GFF3 format, I will have to
either modify the loader, or write a separate script (probably better to
do the latter). It shouldn't be hard if you'd like to give it a try.
			
			Lincoln
			
			
			On 12/19/06, Cook, Malcolm <MEC at stowers-institute.org> wrote:

				Lincoln and fellow Bio::DB::SeqFeature travelers,
				
				I find that using bp_seqfeature_load.PLS to load subfeatures of
				genes already loaded using bp_seqfeature_load.PLS fails with
				
				------------- EXCEPTION  -------------
				MSG: FBgn0017545 doesn't have a primary id
				STACK Bio::DB::SeqFeature::Store::GFF3Loader::build_object_tree_in_tables /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:682
				STACK Bio::DB::SeqFeature::Store::GFF3Loader::build_object_tree /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:663
				STACK Bio::DB::SeqFeature::Store::GFF3Loader::finish_load /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:372
				STACK Bio::DB::SeqFeature::Store::GFF3Loader::load_fh /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:345
				STACK Bio::DB::SeqFeature::Store::GFF3Loader::load /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:242
				STACK toplevel /home/mec/cvs/bioperl-live/scripts/Bio-SeqFeature-Store/bp_seqfeature_load.PLS:76
				
				Here FBgn0017545 is the ID of a previously loaded gene.
				
				I am unsure how to remedy my situation and welcome any advice on
				a correct or improved approach to my problem.
				
				Here's some detail, if it helps. I am developing a pipeline to
				design microarray probes capable of distinguishing among splice
				variants in Drosophila (using the latest Flybase dmel_r5.1
				annotation). So I
				
				1) load a filtered selection of Flybase annotation using
				bp_seqfeature_load (for testing purposes, I am using a single
				gene's worth of annotation, FBgn0017545.gff, attached). This is
				done as follows:
				
				        > bp_seqfeature_load.PLS --create FBgn0017545.gff
				
				2) analyze all the genes in the database and create GFF3 output,
				each feature of which has a 'Parent' that is a previously loaded
				gene (i.e. FBgn0017545). (These features represent the unique
				introns, splice sites, and exonic design targets. The output of
				this analysis, FBgn0017545_matd.gff, is also attached.)
				
				3) load these analysis results into the same database, as follows:
				
				        > bp_seqfeature_load.PLS FBgn0017545_matd.gff
				
				It is at this point that I get the above error.
				
				However, I don't get any error, and the data loads fine, if I
				load the two files together, as follows:
				
				        > bp_seqfeature_load.PLS --create <(cat FBgn0017545.gff FBgn0017545_matd.gff)
				
				So, I suspect that either I am
misunderstanding when/how to use
				bp_seqfeature_load.PLS or else this use
case has not yet arisen and must 
				be provided for somehow.
				
				I am running against bioperl-live
				
				Thanks for your thoughts and assistance,
				
				Malcolm Cook
				Database Applications Manager - Bioinformatics
				Stowers Institute for Medical Research - Kansas City, Missouri
				
				
			-- 
			Lincoln D. Stein
			Cold Spring Harbor Laboratory
			1 Bungtown Road
			Cold Spring Harbor, NY 11724
			(516) 367-8380 (voice)
			(516) 367-8389 (fax)
			FOR URGENT MESSAGES & SCHEDULING, 
			PLEASE CONTACT MY ASSISTANT, 
			SANDRA MICHELSEN, AT michelse at cshl.edu


	-- 
	Lincoln D. Stein
	Cold Spring Harbor Laboratory
	1 Bungtown Road
	Cold Spring Harbor, NY 11724
	(516) 367-8380 (voice)
	(516) 367-8389 (fax)
	FOR URGENT MESSAGES & SCHEDULING, 
	PLEASE CONTACT MY ASSISTANT, 
	SANDRA MICHELSEN, AT michelse at cshl.edu 


From neha_bafs at yahoo.co.in  Mon Feb  5 12:59:03 2007
From: neha_bafs at yahoo.co.in (Neha Nahar)
Date: Mon, 5 Feb 2007 17:59:03 +0000 (GMT)
Subject: [Bioperl-l] Convert newick to nexus format
Message-ID: <20070205175903.30379.qmail@web8701.mail.in.yahoo.com>

Hello everyone,

I am trying to convert a newick tree to nexus format, using the following
script (adapted from an email from George dated Wed Sep 22 11:52:47 EDT 2004):

/*------------------------------------------------------------*/

$ cat nexus.pl
#!/usr/bin/perl -w

use Bio::TreeIO;

($NEWICKFILE, $NEXUSFILE) = @ARGV;
print "newickfile=$NEWICKFILE, nexusfile=$NEXUSFILE\n";
my $treeio = new Bio::TreeIO(-format => 'newick', -file   => "$NEWICKFILE");
my $treeout = new Bio::TreeIO(-format => 'nexus', -file   => ">$NEXUSFILE");
while(my $tree = $treeio->next_tree) {
        $treeout->write_tree($treeout);
    }

exit 0;


/*------------------------------------------------------------*/

Running the script from the command line gives the following error:

$ ./nexus.pl mrp-input.txt nexus.out
newickfile=mrp-input.txt, nexusfile=nexus.out

------------- EXCEPTION  -------------
MSG: Cannot call method write_tree on Bio::TreeIO object must use a subclass
STACK Bio::TreeIO::nexus::write_tree /usr/lib/perl5/vendor_perl/5.8.8/Bio/TreeIO/nexus.pm:170
STACK toplevel ./nexus.pl:23

--------------------------------------


Using  bioperl-1.5.2_101.tar.gz module from http://search.cpan.org/~sendu/bioperl/Bio/TreeIO.pm

Questions:-

1. Please let me know if I am using the correct version.
If not, please point me to the latest one.

2. Provided that the version I am using is the right one, please let me know what is wrong with the script.

Thank you.
Regards,
Neha.


-Neha Nahar
  " Work for cause and not for applause, live to express and not to impress !"
 				


From jason at bioperl.org  Mon Feb  5 13:10:42 2007
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 5 Feb 2007 10:10:42 -0800
Subject: [Bioperl-l] Convert newick to nexus format
In-Reply-To: <20070205175903.30379.qmail@web8701.mail.in.yahoo.com>
References: <20070205175903.30379.qmail@web8701.mail.in.yahoo.com>
Message-ID: <46219DCD-8C6E-4DBE-82F2-D4B58207AD54@bioperl.org>

You want to write the TREE out, not the TREE WRITER:

$treeout->write_tree($tree);

not
$treeout->write_tree($treeout);


--
Jason Stajich
Miller Research Fellow
University of California, Berkeley
lab: 510.642.8441
http://pmb.berkeley.edu/~taylor/people/js.html
http://fungalgenomes.org/


From nehadnahar at yahoo.co.in  Mon Feb  5 13:05:26 2007
From: nehadnahar at yahoo.co.in (Neha Nahar)
Date: Mon, 5 Feb 2007 18:05:26 +0000 (GMT)
Subject: [Bioperl-l] Convert newick to nexus format
Message-ID: <288335.22352.qm@web8412.mail.in.yahoo.com>

Hello everyone,

I am trying to convert a newick tree to nexus format, using the following
script (adapted from an email from George dated Wed Sep 22 11:52:47 EDT 2004):

/*------------------------------------------------------------*/

$ cat nexus.pl
#!/usr/bin/perl -w

use Bio::TreeIO;

($NEWICKFILE, $NEXUSFILE) = @ARGV;
print "newickfile=$NEWICKFILE, nexusfile=$NEXUSFILE\n";
my $treeio = new Bio::TreeIO(-format => 'newick', -file   => "$NEWICKFILE");
my $treeout = new Bio::TreeIO(-format => 'nexus', -file   => ">$NEXUSFILE");
while(my $tree = $treeio->next_tree) {
        $treeout->write_tree($treeout);
    }

exit 0;


/*------------------------------------------------------------*/

Running the script from the command line gives the following error:

$ ./nexus.pl mrp-input.txt nexus.out
newickfile=mrp-input.txt, nexusfile=nexus.out

------------- EXCEPTION  -------------
MSG: Cannot call method write_tree on Bio::TreeIO object must use a subclass
STACK Bio::TreeIO::nexus::write_tree /usr/lib/perl5/vendor_perl/5.8.8/Bio/TreeIO/nexus.pm:170
STACK toplevel ./nexus.pl:23

--------------------------------------


Using  bioperl-1.5.2_101.tar.gz module from http://search.cpan.org/~sendu/bioperl/Bio/TreeIO.pm

Questions:-

1. Please let me know if I am using the correct version.
If not, please point me to the latest one.

2. Provided that the version I am using is the right one, please let me know what is wrong with the script.

Thank you.
Regards,
Neha.


-Neha Nahar
  " Work for cause and not for applause, live to express and not to impress !"
 				


From hlapp at duke.edu  Fri Feb  2 10:09:57 2007
From: hlapp at duke.edu (Hilmar Lapp)
Date: Fri, 2 Feb 2007 10:09:57 -0500
Subject: [Bioperl-l] [Gmod-schema] beginning work on SeqIO::chadoxml
In-Reply-To: <1170359746.2706.622.camel@localhost.localdomain>
References: <1170359746.2706.622.camel@localhost.localdomain>
Message-ID: <675F525B-D3EB-4C0A-A777-7683F2A8F823@duke.edu>


On Feb 1, 2007, at 2:55 PM, Scott Cain wrote:

> The second main change was to introduce a -flybase_compat argument
> when initializing the Bio::SeqIO writer, so that 'old style' cv and
> cvterms (that are compatible with flybase) will be used, but now the
> default will be to use current standards:

Just my $0.02 ... Flybase may well be the only organization that uses
an 'old style' (or any other convention not compliant with 'current
standards', presumably SO), but if it's not the only one, this
approach won't scale.

Also, an argument -flybase_compat suggests to the unsuspecting that  
this is an endorsed flavor of the standard and fine to use for  
everyone else too.

If Flybase is idiosyncratic in this way, why not make chadoxml.pm  
compliant with the standard as we all want it, keep it free from  
litter caused by usage of old versions of SO, and create a second  
module fb-chadoxml.pm that inherits from the first and merely  
overrides a few things so that it works for Flybase. This way, other  
organizations with similar needs can follow the path and create their  
own xyz-chadoxml.pm, rather than having to muck around in the  
chadoxml.pm that comes with the distribution.

I'm not sure I fully grasp the underlying issue, so I may not make  
much sense here. Apologies if that's the case ...

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:- hlapp at duke dot edu :
===========================================================


From jason at bioperl.org  Mon Feb  5 14:43:09 2007
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 5 Feb 2007 11:43:09 -0800
Subject: [Bioperl-l] Convert newick to nexus format
In-Reply-To: <209988.63723.qm@web8715.mail.in.yahoo.com>
References: <209988.63723.qm@web8715.mail.in.yahoo.com>
Message-ID: <9E477447-67F5-46CA-BCC1-47BB4170EC76@bioperl.org>

Please cc the mailing list when asking a question or following up.

Sorry, I don't know what you are doing wrong; you didn't resend your
code, so I don't know if you still have a typo.

This code works fine for me:

use strict;
use warnings;
use Bio::TreeIO;

my ($filein, $fileout) = @ARGV;
my ($format, $oformat) = qw(newick nexus);
my $in  = Bio::TreeIO->new(-file => $filein,     -format => $format);
my $out = Bio::TreeIO->new(-file => ">$fileout", -format => $oformat);

# hand each tree object (not the writer) to write_tree
while ( my $t = $in->next_tree ) {
  $out->write_tree($t);
}


On Feb 5, 2007, at 11:24 AM, Neha Nahar wrote:

> Thank you very much for the reply.
>
> I fixed the code as per your suggestion, but now am getting a
> different error:
>
> $ ./nexus.pl mrp-input.txt nexus.out
> newickfile=mrp-input.txt, nexusfile=nexus.out
>
> ------------- EXCEPTION  -------------
> MSG: Cannot call method write_tree on Bio::TreeIO object must use a  
> subclass
> STACK Bio::TreeIO::nexus::write_tree /usr/lib/perl5/vendor_perl/ 
> 5.8.8/Bio/TreeIO/nexus.pm:170
> STACK toplevel ./nexus.pl:23
>
> --------------------------------------
>
> Please help me out with this script.
>
> Thank you.
> Regards,
> Neha

--
Jason Stajich
Miller Research Fellow
University of California, Berkeley
lab: 510.642.8441
http://pmb.berkeley.edu/~taylor/people/js.html
http://fungalgenomes.org/


From nehadnahar at yahoo.co.in  Mon Feb  5 14:58:08 2007
From: nehadnahar at yahoo.co.in (Neha Nahar)
Date: Mon, 5 Feb 2007 19:58:08 +0000 (GMT)
Subject: [Bioperl-l] Convert newick to nexus format
In-Reply-To: <99196.23114.qm@web8711.mail.in.yahoo.com>
Message-ID: <36024.1212.qm@web8405.mail.in.yahoo.com>


Hi,
Thank you for the code.
I tried it but I still get the same exception.

------------- EXCEPTION  -------------
MSG: Cannot call method write_tree on Bio::TreeIO object must use a subclass
STACK Bio::TreeIO::nexus::write_tree /usr/lib/perl5/vendor_perl/5.8.8/Bio/TreeIO/nexus.pm:170
STACK toplevel ./nexus1.pl:18


Please find attached the Perl file (nexus.pl).


I am using the bioperl-1.5.2_101.tar.gz distribution from http://search.cpan.org/~sendu/bioperl/Bio/TreeIO.pm

Please let me know if I am using the correct version. If not, please point me to the latest one.

Thank you.
Regards,
nnahar


-------------- next part --------------
A non-text attachment was scrubbed...
Name: nexus.pl
Type: application/x-perl
Size: 811 bytes
Desc: 1389215665-nexus.pl
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070205/c6453dcf/attachment-0003.bin>

From jason at bioperl.org  Mon Feb  5 17:15:52 2007
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 5 Feb 2007 14:15:52 -0800
Subject: [Bioperl-l] Convert newick to nexus format
In-Reply-To: <36024.1212.qm@web8405.mail.in.yahoo.com>
References: <36024.1212.qm@web8405.mail.in.yahoo.com>
Message-ID: <84703383-600F-42F4-A860-DD0D1C43EE83@bioperl.org>

I am guessing something is wrong with your install. Can you run the
tests? Go to the bioperl directory and run:
$ perl t/TreeIO.t

Can you describe how you installed bioperl?
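One way to follow up on the install question: the trace shows Bio/TreeIO/nexus.pm being loaded from /usr/lib/perl5/vendor_perl/5.8.8, a directory typically used by distribution packages, so one possibility (an assumption, not confirmed here) is that an older packaged copy shadows the 1.5.2_101 tarball. This core-Perl sketch lists every copy of a module along `@INC` in search order; `module_copies` is a hypothetical helper name.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Return every copy of $module found along @INC, in search order.
# The first hit is the one "use" loads; later hits are shadowed.
sub module_copies {
    my ($module) = @_;
    my @parts = split /::/, $module;
    $parts[-1] .= '.pm';
    return grep { -f $_ }
           map  { File::Spec->catfile($_, @parts) }
           grep { !ref $_ } @INC;      # skip code-ref hooks in @INC
}

my @modules = @ARGV ? @ARGV : qw(Bio::TreeIO Bio::TreeIO::nexus);
for my $mod (@modules) {
    my @hits = module_copies($mod);
    print "$mod:\n";
    print "  $_\n" for @hits;
    print "  (not found in \@INC)\n" unless @hits;
}
```

More than one path for Bio::TreeIO::nexus would suggest mixed installations; the first path printed is the file whose line 170 raised the exception.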



From lzhtom at hotmail.com  Mon Feb  5 22:31:56 2007
From: lzhtom at hotmail.com (zhihua li)
Date: Tue, 06 Feb 2007 03:31:56 +0000
Subject: [Bioperl-l] Bio::index::Fasta- where's the indexed file?
In-Reply-To: <CFD213B9-5195-450F-80ED-E956EEF50F59@bioperl.org>
Message-ID: <BAY110-F28F9C9145AC24F2D0E0D34C79F0@phx.gbl>

Thanks a lot!

After checking out the bp_index script, I changed the syntax to:
my $inx = Bio::Index::Fasta->new("test.fasta.idx", 'WRITE');
$inx->make_index("test.fasta");


Now I have an index file, test.fasta.idx, in my current directory, and I
can use it in a later script by saying
my $inx = Bio::Index::Fasta->new(-filename => "test.fasta.idx");

So now everything is OK. But I don't understand why I have to use that
syntax, and why the syntax given in the documentation didn't work.


>From: Jason Stajich <jason at bioperl.org>
>To: zhihua li <lzhtom at hotmail.com>
>CC: bioperl-l at lists.open-bio.org, arokfl at yahoo.com
>Subject: Re: [Bioperl-l] Bio::index::Fasta- where's the indexed file?
>Date: Thu, 1 Feb 2007 22:24:44 -0800
>
>I don't think BIOPERL_INDEX does anything in the module, so that
>documentation is not quite right.  The env variable is used in the
>scripts/index/bp_index and bp_fetch scripts, so maybe a cut+paste job
>went bad somewhere.
>
>You need to specify the full path you want with -filename; you can
>just prepend $ENV{BIOPERL_INDEX} to the filename, like this:
>-filename => $ENV{BIOPERL_INDEX}."/$index"
>
>-jason
>On Feb 1, 2007, at 7:27 PM, zhihua li wrote:
>
> > Sorry guys, the former empty mail was sent out by mistake.
> >
> > I'm using Bio::Index::Fasta to index a file containing lots of
> > sequences in fasta format. All is fine except one thing.
> >
> > According to the bioperl tutorial and the documents, the following
> > code will make an indexed file:
> >
> > my $inx = Bio::Index::Fasta->new(-filename => "test.fasta.idx",
> >                                  -write_flag => 1);
> > $inx->make_index("test.fasta");
> >
> > And in another script I can access the indexed file by saying
> >
> > $ENV{BIOPERL_INDEX} = "."; # find index in current directory
> > my $inx = Bio::Index::Fasta->new(-filename => "test.fasta.idx");
> > my $seq = $inx->fetch("ent1001");    # fetch the sequence named ent1001
> >
> > However, after running the first script, I cannot find a new file
> > test.fasta.idx in my current directory. And not surprisingly, when
> > I ran the second script, perl told me it couldn't find
> > "test.fasta.idx".
> >
> > What's going on here?
> >
> > Thanks a lot!
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>--
>Jason Stajich
>Miller Research Fellow
>University of California, Berkeley
>lab: 510.642.8441
>http://pmb.berkeley.edu/~taylor/people/js.html
>http://fungalgenomes.org/
>
>



From johnston at biochem.ucl.ac.uk  Tue Feb  6 06:52:08 2007
From: johnston at biochem.ucl.ac.uk (Caroline Johnston)
Date: Tue, 6 Feb 2007 11:52:08 +0000 (GMT)
Subject: [Bioperl-l] RNA folding
Message-ID: <Pine.LNX.4.58.0702061143420.4383@localhost.localdomain>

Hello,

I've just joined the list - I'm a Bioinformatics PhD student at Essex
University doing transcriptomics-related things. Mainly microarray
analysis and more recently looking at RNA structure prediction.

I was thinking about having a go at writing a bioperl-run wrapper around
some of the structure prediction stuff, but according to the wiki this is
being done already (at least for the Vienna tools). I spoke to Albert
Vilella at the EBI the other day and he said Chris Fields was the man to
speak to. So could he (or anyone) let me know what the current state of
RNA structure prediction tools in bioperl is?

Cheers,
Cass xx


From marian.thieme at lycos.de  Tue Feb  6 08:52:10 2007
From: marian.thieme at lycos.de (Marian Thieme)
Date: Tue, 06 Feb 2007 14:52:10 +0100
Subject: [Bioperl-l] dbSNP
Message-ID: <45C8880A.7030702@lycos.de>

Hello all,

I looked in the documentation for a method/class/function/script that can
generate an SNP assay suitable for submission to dbSNP
(http://www.ncbi.nlm.nih.gov/projects/SNP/
http://www.ncbi.nlm.nih.gov/projects/SNP/how_to_submit.html).
I didn't find such code, but I noticed that there is at least an XML
parser for reading dbSNP reports.

Does anybody know whether there is also an output class to generate dbSNP
reports? I imagine that at least the SNP assay section would be worth
implementing.

NCBI gives this example:


TYPE:SNPASSAY
HANDLE:WI
BATCH: 1.98
MOLTYPE:Genomic
METHOD:RESEQ
SYN NAMES:WI-SNP,DnaId,MapDna
COMMENT:
Here is where some public comment that applies to the entire
batch of SNPS could be put.
PRIVATE:
Here is where a note to NCBI regarding processing that would
not be seen by the outside, could be put.
Note that these are is not exactly real SNPs, as
the data were modified.
||
SNP:WI|WIAF-1234567
SYNONYM:EST4291092,EST8291092,EST7291092
ACCESSION:H30533
LENGTH:101
5'_ASSAY:GGCAGGGAAGGAAAATCCTAGGGNCAGCATTGGGGAGGGGGGGACTCTG
OBSERVED:C/T
3'_ASSAY:TAAATTTATTGGGCAACAGGCTGCAGGTGAGGGGGCTGACAGGAGGAGGGA
||
SNP:WI|WIAF-1722
SYNONYM:STS-T17494,STS-T17494,STS-T17494
ACCESSION:T17494
LENGTH:269
5'_FLANK:CTTTCCCTCATCCCCTCTTCCACCACACCATCCCGGAACAAGTGCTCCAGGATT
5'_ASSAY:CCCTGCCCACTGGCCATTTTGGAGTGTGTCC
OBSERVED:A/T
3'_ASSAY:GTGGGTAGCAATGTGGAAACCACCAGGGCCTTTGTGGAGAAAA
3'_FLANK:TGGAGGGGGTTGAGGGAGTCCCAGGAGGGGCTTATTTGAGGGCCTTTGCCACTT
    GCTCATAGGCGAGCTCGATCTCCTCATCATCTGGACAGGTGGAAGCGAATTCTT
    CCCGGGCGTAGGCATTGCTCAAGTACCGAT
||
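Until such a writer exists, a record in the SNP-assay section can be emitted with a few lines of Perl. The sketch below is hypothetical (format_snpassay_record is not a BioPerl method): it covers only the per-SNP fields shown in NCBI's example, writes them in that order, and does not wrap long flanking sequences the way the 3'_FLANK entry above is wrapped.

```perl
use strict;
use warnings;

# Field order and labels taken from NCBI's SNPASSAY example
my @FIELDS = (
    [ snp       => 'SNP' ],
    [ synonym   => 'SYNONYM' ],
    [ accession => 'ACCESSION' ],
    [ length    => 'LENGTH' ],
    [ flank5    => "5'_FLANK" ],
    [ assay5    => "5'_ASSAY" ],
    [ observed  => 'OBSERVED' ],
    [ assay3    => "3'_ASSAY" ],
    [ flank3    => "3'_FLANK" ],
);

# Hypothetical helper: format one SNP record as "LABEL:value" lines
sub format_snpassay_record {
    my ($rec) = @_;
    my $out = '';
    for my $field (@FIELDS) {
        my ($key, $label) = @$field;
        next unless defined $rec->{$key};    # skip fields that were not supplied
        $out .= "${label}:$rec->{$key}\n";
    }
    return $out . "||\n";                    # each record ends with a "||" line
}

# Rebuild the first record of NCBI's example
my $text = format_snpassay_record({
    snp       => 'WI|WIAF-1234567',
    synonym   => 'EST4291092,EST8291092,EST7291092',
    accession => 'H30533',
    length    => 101,
    assay5    => 'GGCAGGGAAGGAAAATCCTAGGGNCAGCATTGGGGAGGGGGGGACTCTG',
    observed  => 'C/T',
    assay3    => 'TAAATTTATTGGGCAACAGGCTGCAGGTGAGGGGGCTGACAGGAGGAGGGA',
});
print $text;
```

The batch header (TYPE, HANDLE, BATCH, METHOD and so on) would be written once before the records and is not handled here.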


Regards,
Marian

P.S. This does not contradict my earlier request about the bracket
notation. We need both formats.


From cjfields at uiuc.edu  Tue Feb  6 11:45:36 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 6 Feb 2007 10:45:36 -0600
Subject: [Bioperl-l] RNA folding
In-Reply-To: <Pine.LNX.4.58.0702061143420.4383@localhost.localdomain>
References: <Pine.LNX.4.58.0702061143420.4383@localhost.localdomain>
Message-ID: <C7C6DC30-ACBB-4626-A0D7-CC0248AF847E@uiuc.edu>

On Feb 6, 2007, at 5:52 AM, Caroline Johnston wrote:

> Hello,
>
> I've just joined the list - I'm a Bioinformatics PhD student at Essex
> University doing transcriptomics-related things. Mainly microarray
> analysis and more recently looking at RNA structure prediction.
>
> I was thinking about having a go at writing a bioperl-run wrapper around
> some of the structure prediction stuff, but according to the wiki this is
> being done already (at least for the Vienna tools). I spoke to Albert
> Vilella at the EBI the other day and he said Chris Fields was the man to
> speak to. So could he (or anyone) let me know what the current state of
> RNA structure prediction tools in bioperl is?
>
> Cheers,
> Cass xx

Actually, the only RNA tool wrappers I have made are ones for ERPIN,
RNAMotif, and Infernal (the only one in bioperl-run CVS at this time
is RNAMotif).  I am planning on writing up wrappers for Vienna,
UNAFold, and a few others but haven't really started yet.  Here's
where I'm at right now...

I am writing up a new set of AnnotationI classes which positionally  
describe data (Meta) which I hope will help deal with this stuff.   
These would be similar in nature to Heikki's Bio::Seq::Meta classes:

http://bioperl.org/pipermail/bioperl-l/2006-December/024414.html

I would use a regular Bio::SeqI and store the structural data and
anything else (such as energy calculations, etc.) as Annotation
objects in an AnnotationCollection, and then write up a series of
SeqIO modules to get data into/out of the designated structure
formats, like UNAFold ct, RNAML, and so on.  Each sequence would then
be capable of holding more than one structural Annotation (e.g. to
represent different folding pathways, alternative RNA folds, and so on).

At this point I represent the data as an array of hashes where $array[0]
is nt 1 and the hash keys indicate the type of interaction, the base
interacted with, etc.  The text representation would be the simple
Eddy WUSS (Rfam-like) format by default, which is capable of
representing some complex data (pseudoknots, for instance), is
compact, and is documented (via the Infernal manual).  Tags will
probably switch to more ontologically relevant terms (probably from
RNAML or RNA Ontology), but in general it is something like this:

[
  { 'interaction' => 'WC', 'base' => 24 },
  { 'interaction' => 'WC', 'base' => 23 },
  { 'interaction' => 'SS' },
  ...
]

In this implementation every seq position would have some kind of  
interaction designation, though that's open for debate as it could  
just be simple text or undef for single-stranded regions.

This is also scalable based on the complexity of the data: if one wanted
to add tertiary/quaternary interactions, location, base modifications,
remote sequence interactions, etc., extra key/value pairs could be
used.  Conversely, if one only wanted secondary structure (for drawing RNA
structures, for example), then only that data would be parsed.
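As a rough illustration of the array-of-hashes layout described above, the sketch below converts a bracket-notation structure string into it. wuss_to_pairs is a hypothetical helper, not part of BioPerl. It tags every paired base 'WC' (a simplification, since bracket notation does not record the pair type), treats every non-bracket character as single-stranded, and does not handle the WUSS pseudoknot letters (Aa, Bb, ...).

```perl
use strict;
use warnings;

# Closing bracket => matching opening bracket (WUSS nested-pair symbols)
my %OPEN_FOR = ( ')' => '(', '>' => '<', ']' => '[', '}' => '{' );

# Hypothetical helper: $struct->[0] describes nt 1, partners are 1-based
sub wuss_to_pairs {
    my ($wuss) = @_;
    my @chars  = split //, $wuss;
    my %stack;                                     # one stack per bracket type
    my @struct = map { +{ interaction => 'SS' } } @chars;
    for my $i (0 .. $#chars) {
        my $c = $chars[$i];
        if ($c =~ /[(<\[{]/) {
            push @{ $stack{$c} }, $i;              # remember the opening position
        }
        elsif (exists $OPEN_FOR{$c}) {
            my $j = pop @{ $stack{ $OPEN_FOR{$c} } || [] };
            die "Unbalanced '$c' at position ", $i + 1, "\n" unless defined $j;
            $struct[$i] = { interaction => 'WC', base => $j + 1 };
            $struct[$j] = { interaction => 'WC', base => $i + 1 };
        }
    }
    for my $open (keys %stack) {
        die "Unbalanced '$open'\n" if @{ $stack{$open} };
    }
    return \@struct;
}

my $s = wuss_to_pairs('((..))');
# $s->[0] is { interaction => 'WC', base => 6 }; $s->[2] is { interaction => 'SS' }
```

Extra keys (modifications, tertiary contacts and so on) could be added to each hash later without changing this basic layout.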

If you (or anyone listening) have any suggestions I would greatly  
appreciate them.

chris


From johnsonm at gmail.com  Tue Feb  6 18:53:49 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Tue, 6 Feb 2007 17:53:49 -0600
Subject: [Bioperl-l] Bio::Tools::Glimmer
In-Reply-To: <ebf5eb170609201247r12eb9475v5cb553e0727163c0@mail.gmail.com>
References: <ebf5eb170609190833r5fa138e9gc6c51346e5f7890c@mail.gmail.com>
	<451095C7.3020905@infotech.monash.edu.au>
	<0FBEEE2B-9641-4EFA-A5D2-0DA839556E4C@gmx.net>
	<ebf5eb170609201247r12eb9475v5cb553e0727163c0@mail.gmail.com>
Message-ID: <ebf5eb170702061553t29b0d1f2q402582abba032031@mail.gmail.com>

Okay, I need to get something going for a project I'm working on.  Options:

1) Stick it all in one module:  This can get a bit ugly, as Glimmer, as
opposed to GlimmerM and GlimmerHMM, does not explicitly identify itself in
the prediction report.  You can pick up on some unique things in the output
file, but you don't know what you've got until you're actually parsing it.
If you require a format argument up front, though, you can split the
parsing code up into different functions.
2) Two modules, one for GlimmerM/GlimmerHMM and one for Glimmer2/Glimmer3.
With or without an abstract dispatch front end.

I suppose at this point, after getting my hands dirty, I'd prefer 1), with
an explicit -format => Glimmer2/3/M/HMM arg required in the constructor.
Though I'm not opposed to 2) if that is what it takes to get it into
Bioperl.

If we can achieve some sort of consensus without too much bloodshed, I'll
shoot y'all some patches and we can consider this issue checked off the
list.

On 9/20/06, Mark Johnson <johnsonm at gmail.com> wrote:
>
>     I think it's going to be at least two modules, one for the
> prokaryotic stuff and one for the eukaryotic.  And really, the
> prokaryotic stuff is different enough to warrant two modules. So three
> different parsers.  Could do it in one, but it would be ugly and
> nasty.  However, this does not preclude three parsers and one abstract
> interface, which is your excellent suggestion.
>     Oh, and excuse me, but I have a bit of a rant here, after dealing
> with parsers and pipelines for the last few months.  Parsers should
> not load the whole input file into RAM to parse it.  And Pipelines
> using the parsers (Ensembl / biopipe) should not stuff the whole
> result set from the parser into a single array.  When you're trying to
> annotate assemblies, it sucks to have to split up contigs/supercontigs
> because the whole result set won't fit into RAM on a 12 gig blade.
> Sheesh.  Though this doesn't matter for bacterial genomes, as they're
> tiny (by comparison to vertebrates).  There, sorry, been saving up
> that frustration for a while.  No offense meant, hope I didn't tick
> anybody off.  8)
>     Torsten:  You sound like you know what you're doing with respect
> to Bioperl more than I do, and I know I don't have CVS access, so I'll
> defer to you.  I'd be happy to help out, though.
>
>
> On 9/20/06, Hilmar Lapp <hlapp at gmx.net> wrote:
> >
> > On Sep 19, 2006, at 9:13 PM, Torsten Seemann wrote:
> >
> > > I'm not sure whether to
> > >
> > > 1. parse them all under the same module, perhaps with a
> > > -format=>'glimmerXXX' parameter
> > >
> > > 2. create a single new module for Glimmer2 and Glimmer3
> > >
> > > 3. create two new modules, one for Glimmer2 and one for Glimmer3,
> > > given they are different outputs both in syntax and number of output files
> > >
> > > Any advice from Bioperl 'old timers' appreciated ;-)
> > >
> >
> > If at all possible I'd favor 1), with e.g. Bio::Tools::GFF being an
> > example for how this can work.
> >
> > If this would amount to basically 4 modules strung together into
> > one file (because the parsing code can't share much if anything
> > between the flavors), it'd still be advantageous to have a single
> > frontend module that would then dispatch.
> >
> >         -hilmar
> >
> > --
> > ===========================================================
> > : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> > ===========================================================
> >
>


From jason at bioperl.org  Tue Feb  6 19:33:11 2007
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 6 Feb 2007 16:33:11 -0800
Subject: [Bioperl-l] Bio::Tools::Glimmer
In-Reply-To: <ebf5eb170702061553t29b0d1f2q402582abba032031@mail.gmail.com>
References: <ebf5eb170609190833r5fa138e9gc6c51346e5f7890c@mail.gmail.com>
	<451095C7.3020905@infotech.monash.edu.au>
	<0FBEEE2B-9641-4EFA-A5D2-0DA839556E4C@gmx.net>
	<ebf5eb170609201247r12eb9475v5cb553e0727163c0@mail.gmail.com>
	<ebf5eb170702061553t29b0d1f2q402582abba032031@mail.gmail.com>
Message-ID: <20F07AFE-BADF-4BA5-83B1-C99EEBE915C2@bioperl.org>

I definitely vote for 1) - worst case you have 4 separate methods if  
there is no good way to condense the parsing for each format and  
require the user to specify the format.

I have no problem with requiring user to specify what program she  
used - if we can be fancy and guess the format later (i.e. guess  
format in SeqIO) -then that's icing.

-jason
On Feb 6, 2007, at 3:53 PM, Mark Johnson wrote:

> Okay, I need to get something going for a project I'm working on.  Options:
>
> 1) Stick it all in one module:  This can get a bit ugly, as Glimmer, as
> opposed to GlimmerM and GlimmerHMM, does not explicitly identify itself in
> the prediction report.  You can pick up on some unique things in the output
> file, but you don't know what you've got until you're actually parsing it.
> If you require a format argument up front, though, you can split the
> parsing code up into different functions.
> 2) Two modules, one for GlimmerM/GlimmerHMM and one for Glimmer2/Glimmer3.
> With or without an abstract dispatch front end.
>
> I suppose at this point, after getting my hands dirty, I'd prefer 1), with
> an explicit -format => Glimmer2/3/M/HMM arg required in the constructor.
> Though I'm not opposed to 2) if that is what it takes to get it into
> Bioperl.
>
> If we can achieve some sort of consensus without too much bloodshed, I'll
> shoot y'all some patches and we can consider this issue checked off the
> list.
>
> On 9/20/06, Mark Johnson <johnsonm at gmail.com> wrote:
>>
>>     I think it's going to be at least two modules, one for the
>> prokaryotic stuff and one for the eukaryotic.  And really, the
>> prokaryotic stuff is different enough to warrant two modules. So three
>> different parsers.  Could do it in one, but it would be ugly and
>> nasty.  However, this does not preclude three parsers and one abstract
>> interface, which is your excellent suggestion.
>>     Oh, and excuse me, but I have a bit of a rant here, after dealing
>> with parsers and pipelines for the last few months.  Parsers should
>> not load the whole input file into RAM to parse it.  And pipelines
>> using the parsers (Ensembl / biopipe) should not stuff the whole
>> result set from the parser into a single array.  When you're trying to
>> annotate assemblies, it sucks to have to split up contigs/supercontigs
>> because the whole result set won't fit into RAM on a 12 gig blade.
>> Sheesh.  Though this doesn't matter for bacterial genomes, as they're
>> tiny (by comparison to vertebrates).  There, sorry, been saving up
>> that frustration for a while.  No offense meant, hope I didn't tick
>> anybody off.  8)
>>     Torsten:  You sound like you know what you're doing with respect
>> to Bioperl more than I do, and I know I don't have CVS access, so I'll
>> defer to you.  I'd be happy to help out, though.
>>
>>
>> On 9/20/06, Hilmar Lapp <hlapp at gmx.net> wrote:
>>>
>>> On Sep 19, 2006, at 9:13 PM, Torsten Seemann wrote:
>>>
>>>> I'm not sure whether to
>>>>
>>>> 1. parse them all under the same module, perhaps with a
>>>> -format=>'glimmerXXX' parameter
>>>>
>>>> 2. create a single new module for Glimmer2 and Glimmer3
>>>>
>>>> 3. create two new modules, one for Glimmer2 and one for Glimmer3,
>>>> given they are different outputs both in syntax and number of output
>>>> files
>>>>
>>>> Any advice from Bioperl 'old timers' appreciated ;-)
>>>>
>>>
>>> If at all possible I'd favor 1), with e.g. Bio::Tools::GFF being an
>>> example for how this can work.
>>>
>>> If this would amount to basically 4 modules strung together into
>>> one file (because the parsing code can't share much if anything
>>> between the flavors), it'd still be advantageous to have a single
>>> frontend module that would then dispatch.
>>>
>>>         -hilmar
>>>
>>> --
>>> ===========================================================
>>> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
>>> ===========================================================
>>>
>>

--
Jason Stajich
Miller Research Fellow
University of California, Berkeley
lab: 510.642.8441
http://pmb.berkeley.edu/~taylor/people/js.html
http://fungalgenomes.org/


From torsten.seemann at infotech.monash.edu.au  Tue Feb  6 21:36:54 2007
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Wed, 7 Feb 2007 13:36:54 +1100
Subject: [Bioperl-l] Bio::Tools::Glimmer
In-Reply-To: <20F07AFE-BADF-4BA5-83B1-C99EEBE915C2@bioperl.org>
References: <ebf5eb170609190833r5fa138e9gc6c51346e5f7890c@mail.gmail.com>
	<451095C7.3020905@infotech.monash.edu.au>
	<0FBEEE2B-9641-4EFA-A5D2-0DA839556E4C@gmx.net>
	<ebf5eb170609201247r12eb9475v5cb553e0727163c0@mail.gmail.com>
	<ebf5eb170702061553t29b0d1f2q402582abba032031@mail.gmail.com>
	<20F07AFE-BADF-4BA5-83B1-C99EEBE915C2@bioperl.org>
Message-ID: <a79f6a4b0702061836l7e63933bs3f065b773054c9c4@mail.gmail.com>

> I definitely vote for 1) - worst case you have 4 separate methods if
> there is no good way to condense the parsing for each format and
> require the user to specify the format.

And make the default -format be what it currently parses, i.e.
GlimmerM/GlimmerHMM.

> I have no problem with requiring user to specify what program she
> used - if we can be fancy and guess the format later (i.e. guess
> format in SeqIO) -then that's icing.

Agreed.

>> Okay, I need to get something going for a project I'm working on.

I would normally try to help but I am so swamped with work-work at the
moment. Just a reminder that last year I added examples of the
different Glimmer outputs to the CVS repository:

./t/data/Glimmer3.predict
./t/data/Glimmer3.detail
./t/data/GlimmerHMM.out
./t/data/Glimmer2.out
./t/data/GlimmerM.out
./t/data/glimmer.out (this was the original one)

Thanks for taking this on!

--Torsten


From mitch_skinner at berkeley.edu  Tue Feb  6 23:37:35 2007
From: mitch_skinner at berkeley.edu (Mitch Skinner)
Date: Tue, 06 Feb 2007 20:37:35 -0800
Subject: [Bioperl-l] Bio::Graphics::Panel gridlines and pixels
Message-ID: <45C9578F.2060802@berkeley.edu>

Hello,

I'm working on an AJAX version of GBrowse (http://genome.biowiki.org), 
where we're pre-rendering entire chromosomes by breaking them up into 
tiles.  One of the problems we have is that it takes a long time to 
render all those tiles.  One of the things that's slowing the process 
down (and using lots of RAM) is rendering the gridlines, and it would 
make things a lot easier (and faster) for us if we could assume that the 
gridlines were the same for each tile.  Since we're only rendering at a 
particular set of zoom levels (that we have control over), I think this 
is a reasonable assumption.

Given the right set of zoom levels, the assumption works almost all the 
time, except for one specific case.  It has to do with the way draw_grid 
and map_pt in Bio::Graphics::Panel work for the very first gridline.

Here's how draw_grid (in CVS HEAD) computes the first gridline:

    my $first_tick = $minor * int($self->start/$minor);

$first_tick, $minor and $self->start are in base-pair space, which is 
1-based.  However, if ($self->start < $minor) then $first_tick is 0.  
This might not be a problem, except that $first_tick is translated into 
pixel coordinates in map_pt, which expects 1-based bp coordinates.  Here 
are the relevant lines in map_pt:

    my $val = $flip 
      ? int (0.5 + $pr - ($length - ($_- 1)) * $scale)
      : int (0.5 + ($_-$offset-1) * $scale);

This style of rounding only works for positive numbers; rounding 0.6 by 
doing int(0.5 + 0.6) gives you 1 as expected, but rounding -0.6 by doing 
int(0.5 + -0.6) gives you 0.  So if the first three gridlines are at 0, 
10, and 20 bp, then (assuming $scale is 1, $offset is 0, $flip evaluates 
false, and pad left is 0) they're drawn at pixels 0, 9, and 19.

I think that there should be gridlines at pixels 0, 10, and 20.  The 
fact that currently the first interval is 9 pixels and the second is 10 
pixels is breaking my hopeful assumption about the gridlines.

AFAICT my problems are solved if we make two changes:
change the above line from draw_grid to this:
    my $first_tick = 1 + $minor * int(($start - 1)/$minor);
and change the lines from map_pt to this:

    my $val = $flip 
      ? ($pr - ($length - ($_- 1)) * $scale)
      : (($_-$offset-1) * $scale);
    $val = int($val + .5 * ($val <=> 0));

Does this make sense?  If people agree that these changes are right then 
I can also produce a proper patch if y'all would prefer that.
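The effect of both proposed changes can be checked outside Bio::Graphics with a few lines of Perl. This is a sketch: bp_to_px is a simplified, hypothetical stand-in for the non-flipped branch of map_pt, using the assumptions stated above ($scale = 1, $offset = 0, no left padding) plus $start = 1 and $minor = 10.

```perl
use strict;
use warnings;

# Rounding as in the current map_pt: int() truncates toward zero,
# so adding 0.5 only rounds correctly for non-negative values
sub round_old { my ($x) = @_; return int(0.5 + $x) }

# Proposed rounding: add +/-0.5 depending on the sign, then truncate
sub round_sym { my ($x) = @_; return int($x + 0.5 * ($x <=> 0)) }

# Simplified non-flipped map_pt: 1-based bp -> pixel
sub bp_to_px {
    my ($bp, $round, $scale, $offset) = @_;
    return $round->(($bp - $offset - 1) * $scale);
}

my ($start, $minor) = (1, 10);
my $old_first = $minor * int($start / $minor);              # 0 (not a valid 1-based bp)
my $new_first = 1 + $minor * int(($start - 1) / $minor);    # 1

my @old_px = map { bp_to_px($old_first + $_ * $minor, \&round_old, 1, 0) } 0 .. 2;
my @new_px = map { bp_to_px($new_first + $_ * $minor, \&round_sym, 1, 0) } 0 .. 2;
print "old: @old_px\nnew: @new_px\n";    # old: 0 9 19 / new: 0 10 20
```

With the current code the first interval is 9 pixels; with both changes every interval is 10 pixels, which is the property the tiling relies on.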

Regards,
Mitch


From lstein at cshl.edu  Wed Feb  7 07:17:22 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Wed, 7 Feb 2007 07:17:22 -0500
Subject: [Bioperl-l] Bio::Graphics::Panel gridlines and pixels
In-Reply-To: <45C9578F.2060802@berkeley.edu>
References: <45C9578F.2060802@berkeley.edu>
Message-ID: <6dce9a0b0702070417k692fedcchfac5b1f0e01f72bd@mail.gmail.com>

Hi Mitch,

Zero is not a forbidden coordinate, since gbrowse also works on genetic maps
which have negative and floating point coordinates. You've simply picked up
a boundary case where the rounding isn't working properly. I will fix this
now.

Lincoln


On 2/6/07, Mitch Skinner <mitch_skinner at berkeley.edu> wrote:
>
> Hello,
>
> I'm working on an AJAX version of GBrowse (http://genome.biowiki.org),
> where we're pre-rendering entire chromosomes by breaking them up into
> tiles.  One of the problems we have is that it takes a long time to
> render all those tiles.  One of the things that's slowing the process
> down (and using lots of RAM) is rendering the gridlines, and it would
> make things a lot easier (and faster) for us if we could assume that the
> gridlines were the same for each tile.  Since we're only rendering at a
> particular set of zoom levels (that we have control over), I think this
> is a reasonable assumption.
>
> Given the right set of zoom levels, the assumption works almost all the
> time, except for one specific case.  It has to do with the way draw_grid
> and map_pt in Bio::Graphics::Panel work for the very first gridline.
>
> Here's how draw_grid (in CVS HEAD) computes the first gridline:
>
>     my $first_tick = $minor * int($self->start/$minor);
>
> $first_tick, $minor and $self->start are in base-pair space, which is
> 1-based.  However, if ($self->start < $minor) then $first_tick is 0.
> This might not be a problem, except that $first_tick is translated into
> pixel coordinates in map_pt, which expects 1-based bp coordinates.  Here
> are the relevant lines in map_pt:
>
>     my $val = $flip
>       ? int (0.5 + $pr - ($length - ($_- 1)) * $scale)
>       : int (0.5 + ($_-$offset-1) * $scale);
>
> This style of rounding only works for positive numbers; rounding 0.6 by
> doing int(0.5 + 0.6) gives you 1 as expected, but rounding -0.6 by doing
> int(0.5 + -0.6) gives you 0.  So if the first three gridlines are at 0,
> 10, and 20 bp, then (assuming $scale is 1, $offset is 0, $flip evaluates
> false, and pad left is 0) they're drawn at pixels 0, 9, and 19.
>
> I think that there should be gridlines at pixels 0, 10, and 20.  The
> fact that currently the first interval is 9 pixels and the second is 10
> pixels is breaking my hopeful assumption about the gridlines.
>
> AFAICT my problems are solved if we make two changes:
> change the above line from draw_grid to this:
>     my $first_tick = 1 + $minor * int(($start - 1)/$minor);
> and change the lines from map_pt to this:
>
>     my $val = $flip
>       ? ($pr - ($length - ($_- 1)) * $scale)
>       : (($_-$offset-1) * $scale);
>     $val = int($val + .5 * ($val <=> 0));
>
> Does this make sense?  If people agree that these changes are right then
> I can also produce a proper patch if y'all would prefer that.
>
> Regards,
> Mitch
>
>


-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu


From lstein at cshl.edu  Wed Feb  7 07:18:40 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Wed, 7 Feb 2007 07:18:40 -0500
Subject: [Bioperl-l] Bio::Graphics::Panel gridlines and pixels
In-Reply-To: <45C9578F.2060802@berkeley.edu>
References: <45C9578F.2060802@berkeley.edu>
Message-ID: <6dce9a0b0702070418x139b4e5cu396bb629760a860c@mail.gmail.com>

However, I'm also very interested in why grid-drawing takes so long. When
I've profiled drawing, neither grid drawing nor map_pt() consume any
significant amount of time.

Lincoln

On 2/6/07, Mitch Skinner <mitch_skinner at berkeley.edu> wrote:
>
> Hello,
>
> I'm working on an AJAX version of GBrowse (http://genome.biowiki.org),
> where we're pre-rendering entire chromosomes by breaking them up into
> tiles.  One of the problems we have is that it takes a long time to
> render all those tiles.  One of the things that's slowing the process
> down (and using lots of RAM) is rendering the gridlines, and it would
> make things a lot easier (and faster) for us if we could assume that the
> gridlines were the same for each tile.  Since we're only rendering at a
> particular set of zoom levels (that we have control over), I think this
> is a reasonable assumption.
>
> Given the right set of zoom levels, the assumption works almost all the
> time, except for one specific case.  It has to do with the way draw_grid
> and map_pt in Bio::Graphics::Panel work for the very first gridline.
>
> Here's how draw_grid (in CVS HEAD) computes the first gridline:
>
>     my $first_tick = $minor * int($self->start/$minor);
>
> $first_tick, $minor and $self->start are in base-pair space, which is
> 1-based.  However, if ($self->start < $minor) then $first_tick is 0.
> This might not be a problem, except that $first_tick is translated into
> pixel coordinates in map_pt, which expects 1-based bp coordinates.  Here
> are the relevant lines in map_pt:
>
>     my $val = $flip
>       ? int (0.5 + $pr - ($length - ($_- 1)) * $scale)
>       : int (0.5 + ($_-$offset-1) * $scale);
>
> This style of rounding only works for positive numbers; rounding 0.6 by
> doing int(0.5 + 0.6) gives you 1 as expected, but rounding -0.6 by doing
> int(0.5 + -0.6) gives you 0.  So if the first three gridlines are at 0,
> 10, and 20 bp, then (assuming $scale is 1, $offset is 0, $flip evaluates
> false, and pad left is 0) they're drawn at pixels 0, 9, and 19.
>
> I think that there should be gridlines at pixels 0, 10, and 20.  The
> fact that currently the first interval is 9 pixels and the second is 10
> pixels is breaking my hopeful assumption about the gridlines.
>
> AFAICT my problems are solved if we make two changes:
> change the above line from draw_grid to this:
>     my $first_tick = 1 + $minor * int(($start - 1)/$minor);
> and change the lines from map_pt to this:
>
>     my $val = $flip
>       ? ($pr - ($length - ($_- 1)) * $scale)
>       : (($_-$offset-1) * $scale);
>     $val = int($val + .5 * ($val <=> 0));
>
> Does this make sense?  If people agree that these changes are right then
> I can also produce a proper patch if y'all would prefer that.
>
> Regards,
> Mitch
>
>


-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu


From johnsonm at gmail.com  Wed Feb  7 11:50:05 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Wed, 7 Feb 2007 10:50:05 -0600
Subject: [Bioperl-l] Bio::Tools::Glimmer
In-Reply-To: <20F07AFE-BADF-4BA5-83B1-C99EEBE915C2@bioperl.org>
References: <ebf5eb170609190833r5fa138e9gc6c51346e5f7890c@mail.gmail.com>
	<451095C7.3020905@infotech.monash.edu.au>
	<0FBEEE2B-9641-4EFA-A5D2-0DA839556E4C@gmx.net>
	<ebf5eb170609201247r12eb9475v5cb553e0727163c0@mail.gmail.com>
	<ebf5eb170702061553t29b0d1f2q402582abba032031@mail.gmail.com>
	<20F07AFE-BADF-4BA5-83B1-C99EEBE915C2@bioperl.org>
Message-ID: <ebf5eb170702070850k10f1d24fr2d93d61534eeaaff@mail.gmail.com>

    Well, each format has some unique features.  If the user declines to
specify the format, I can figure it out, but it will probably involve
scanning the input file twice.  I'll take a look.
    I can do all the parsing in one function (in fact I have, just to see
how nasty it would end up being), but I just can't stomach having the code
that tightly coupled and hard to read.  In the end it'll probably be three
functions, since GlimmerM/HMM are pretty close.  Maybe two, as Glimmer2 and
Glimmer3 aren't *that* different, either.
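The direction the thread settles on (option 1: a single module with a -format argument, defaulting to GlimmerM/GlimmerHMM as Torsten suggests, and two or three internal parser functions) can be sketched as a dispatch table. Everything here is hypothetical: My::GlimmerDispatch, next_prediction and the _parse_* methods are illustrative names, not Bio::Tools::Glimmer's API, and the parsers are left as stubs.

```perl
package My::GlimmerDispatch;    # hypothetical name, not a BioPerl module
use strict;
use warnings;
use Carp qw(croak);

# Each accepted -format value maps to the internal method that parses it.
# GlimmerM/GlimmerHMM share one parser and Glimmer2/Glimmer3 share another,
# following the "maybe two functions" grouping above.
my %PARSER_FOR = (
    glimmerm   => '_parse_eukaryotic',
    glimmerhmm => '_parse_eukaryotic',
    glimmer2   => '_parse_prokaryotic',
    glimmer3   => '_parse_prokaryotic',
);

sub new {
    my ($class, %args) = @_;
    # Default to GlimmerM/GlimmerHMM, the output the existing parser handles
    my $requested = defined $args{-format} ? $args{-format} : 'GlimmerHMM';
    my $method    = $PARSER_FOR{ lc($requested) }
        or croak "Unknown Glimmer format '$requested'";
    return bless { format => lc($requested), parser => $method }, $class;
}

# Name of the parser method chosen for this object
sub parser_method { return $_[0]{parser} }

# Public entry point: route the call to the format-specific parser
sub next_prediction {
    my $self   = shift;
    my $method = $self->{parser};
    return $self->$method(@_);
}

# Placeholders: real parsers would read the report one line at a time
sub _parse_eukaryotic  { croak '_parse_eukaryotic is not implemented in this sketch' }
sub _parse_prokaryotic { croak '_parse_prokaryotic is not implemented in this sketch' }
```

Format names are matched case-insensitively, so -format => 'Glimmer3' and -format => 'glimmer3' select the same parser, and an unknown name fails in the constructor rather than partway through parsing. Auto-detection could later be added as one more entry point that sets the format before dispatching.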

On 2/6/07, Jason Stajich <jason at bioperl.org> wrote:
>
> I definitely vote for 1) - worst case you have 4 separate methods if there
> is no good way to condense the parsing for each format and require the user
> to specify the format.
>
> I have no problem with requiring user to specify what program she used -
> if we can be fancy and guess the format later (i.e. guess format in SeqIO)
> -then that's icing.
>
> -jason
>
>


From adsj at novozymes.com  Wed Feb  7 12:11:32 2007
From: adsj at novozymes.com (Adam Sjøgren)
Date: Wed, 07 Feb 2007 18:11:32 +0100
Subject: [Bioperl-l] Bioperl 1.4 to 1.5.2,
	adding qualifiers to Bio::Seq-objects
Message-ID: <8764adoptn.fsf@topper.koldfront.dk>

  Hi.


I am transitioning from Bioperl 1.4 to 1.5.2, and the qualifiers I add
to features in Bio::Seq objects have stopped appearing when I output
them as EMBL or GenBank-files.

Below is a test-script that exercises the problem.

I guess I should be doing something else when adding qualifiers, now
with 1.5.2 (as reading an EMBL-file with Bio::SeqIO and outputting it
again of course works perfectly), but I can't deduce what from perldoc
Bio::SeqFeature::Generic - it still lists the add_tag_value method,
and calling it doesn't croak nor warn.

I have found some comments on this in the release notes of 1.5.0¹ on
the Bioperl wiki, but I must admit I wasn't able to extract what
methods I should be calling instead.

If someone could point me to the relevant documentation or tell me
what method to use instead, I would be happy as a clam.


  Best regards,

    Adam

== =
use Test::More tests=>2;

use strict;
use warnings;

use Bio::Seq;
use Bio::SeqFeature::Generic;
use IO::String;
use Bio::SeqIO;

my $seq=Bio::Seq->new(
                      -seq=>'actgactgactg',
                     );

$seq->display_id('D27');
$seq->accession_number('DB:D27');

my $seq_feature=Bio::SeqFeature::Generic->new(
                                              -strand=>1,
                                              -primary=>'source',
                                             );
$seq_feature->set_attributes(-start=>2, -end=>8);
$seq_feature->add_tag_value(note=>'TEST');
$seq_feature->add_tag_value(db_xref=>'DB:D27');

$seq->add_SeqFeature($seq_feature);

my $raw='';
my $fh=IO::String->new($raw);
my $out=Bio::SeqIO->new(-format=>'EMBL', -fh=>$fh);
$out->write_seq($seq);

ok($raw=~m!/note!, 'Qualifier note found');
ok($raw=~m!/db_xref!, 'Qualifier db_xref found');
== =


¹ <http://www.bioperl.org/wiki/Core_1.4.0_1.5.0_delta>

-- 
                                                          Adam Sjøgren
                                                    adsj at novozymes.com


From cjfields at uiuc.edu  Wed Feb  7 12:50:13 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 7 Feb 2007 11:50:13 -0600
Subject: [Bioperl-l] Bioperl 1.4 to 1.5.2,
	adding qualifiers to Bio::Seq-objects
In-Reply-To: <8764adoptn.fsf@topper.koldfront.dk>
References: <8764adoptn.fsf@topper.koldfront.dk>
Message-ID: <C350729C-3964-4685-A89C-D3E5C24A5114@uiuc.edu>


On Feb 7, 2007, at 11:11 AM, Adam Sjøgren wrote:

>   Hi.
>
>
> I am transitioning from Bioperl 1.4 to 1.5.2, and the qualifiers I add
> to features in Bio::Seq objects have stopped appearing when I output
> them as EMBL or GenBank-files.
>
> Below is a test-script that exercises the problem.
>
> I guess I should be doing something else when adding qualifiers, now
> with 1.5.2 (as reading an EMBL-file with Bio::SeqIO and outputting it
> again of course works perfectly), but I can't deduce what from perldoc
> Bio::SeqFeature::Generic - it still lists the add_tag_value method,
> and calling it doesn't croak nor warn.
>
> I have found some comments on this in the release notes of 1.5.0¹ on
> the Bioperl wiki, but I must admit I wasn't able to extract what
> methods I should be calling instead.
>
> If someone could point me to the relevant documentation or tell me
> what method to use instead, I would be happy as a clam.
>
>
>   Best regards,
>
>     Adam

...

This works for me using bioperl-live (Mac OS X):

ok 1 - Qualifier note found
ok 2 - Qualifier db_xref found

If I print the string I get:

ID   DB:D27; SV 1; linear; unassigned DNA; STD; UNC; 12 BP.
XX
AC   DB:D27;
XX
XX
FH   Key             Location/Qualifiers
FH
FT   source          2..8
FT                   /db_xref="DB:D27"
FT                   /note="TEST"
XX
SQ   Sequence 12 BP; 3 A; 3 C; 3 G; 3 T; 0 other;
     actgactgac tg                                                     12
//

GenBank also works:

LOCUS       D27                       12 bp    dna     linear   UNK
ACCESSION   DB:D27
FEATURES             Location/Qualifiers
      source          2..8
                      /db_xref="DB:D27"
                      /note="TEST"
BASE COUNT        3 a      3 c      3 g      3 t
ORIGIN
         1 actgactgac tg
//

If you haven't uninstalled 1.4, make sure you aren't running 1.4 or  
mixing the two versions (you can check by using 'perldoc -l  
Bio::Root::Root').
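A quick way to spot the kind of version mixing described above is to list every copy of a module that Perl can see along @INC; the first copy listed is normally the one `use` loads. This is a minimal sketch using only core Perl; find_copies is a hypothetical helper name, not part of BioPerl. More than one path under a module is a hint that an older copy may be shadowing the newer one.

```perl
use strict;
use warnings;
use File::Spec;

# List every file along @INC that would satisfy 'use Module::Name',
# in search order; the first one is normally the copy Perl loads.
sub find_copies {
    my ($module) = @_;
    my @parts = split /::/, $module;
    $parts[-1] .= '.pm';
    my @found;
    for my $dir (grep { !ref } @INC) {    # skip code-ref hooks in @INC
        my $path = File::Spec->catfile($dir, @parts);
        push @found, $path if -f $path;
    }
    return @found;
}

# Check the modules from the command line, or a few BioPerl defaults
my @modules = @ARGV ? @ARGV : qw(Bio::Root::Root Bio::Seq Bio::SeqFeature::Generic);
for my $module (@modules) {
    my @copies = find_copies($module);
    print "$module:\n";
    if (@copies) { print "    $_\n" for @copies }
    else         { print "    not found\n" }
}
```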

chris


From cjfields at uiuc.edu  Wed Feb  7 13:04:33 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 7 Feb 2007 12:04:33 -0600
Subject: [Bioperl-l] Bio::Tools::Glimmer
In-Reply-To: <ebf5eb170702070850k10f1d24fr2d93d61534eeaaff@mail.gmail.com>
References: <ebf5eb170609190833r5fa138e9gc6c51346e5f7890c@mail.gmail.com>
	<451095C7.3020905@infotech.monash.edu.au>
	<0FBEEE2B-9641-4EFA-A5D2-0DA839556E4C@gmx.net>
	<ebf5eb170609201247r12eb9475v5cb553e0727163c0@mail.gmail.com>
	<ebf5eb170702061553t29b0d1f2q402582abba032031@mail.gmail.com>
	<20F07AFE-BADF-4BA5-83B1-C99EEBE915C2@bioperl.org>
	<ebf5eb170702070850k10f1d24fr2d93d61534eeaaff@mail.gmail.com>
Message-ID: <91A3A651-C0D5-495F-941F-05B8AA0DDA60@uiuc.edu>


On Feb 7, 2007, at 10:50 AM, Mark Johnson wrote:

>     Well, each format has some unique features.  If the user declines to
> specify the format, I can figure it out, but it will probably involve
> scanning the input file twice.  I'll take a look.
>     I can do all the parsing in one function, in fact I have, just to see
> how nasty it would end up being.  I just can't stomach having the code that
> tightly coupled and hard to read.  In the end it'll probably be three
> functions.  GlimmerM/HMM are pretty close.  Maybe two, Glimmer2 and
> Glimmer3 aren't *that* different, either.

I don't see a problem with passing off the parse to a defined class  
method either right off or mid-parse.  I'm doing something like this  
with a revamped GenBank parser:

# declare local to module

my %GLIMMER_METHODS = (
     'GlimmerHMM' => '_parsehmm',
     'Glimmer'    => '_parsenormal',
     # ... others if needed
     '_DEFAULT_'  => '_parseabnormal',
);

...

Then either preparse part of file using _readline() to determine  
format, or use -format and bypass preparsing:

sub next_thingy {
    my $self = shift;
    # ... ($format comes from -format, if the user gave one)
    if (!$format) {
        while (defined(my $line = $self->_readline())) {
            if ($line =~ m{(something)}) {
                $format = $1; $self->_pushback($line); last;
            }
        }
    }
    my $method = (defined $format && exists $GLIMMER_METHODS{$format})
               ? $GLIMMER_METHODS{$format}
               : $GLIMMER_METHODS{'_DEFAULT_'};   # fallback to this one

    return $self->$method();   # hand off parsing flow to the proper parser
}

# all parser variants would have this structure:

sub _parsehmm {
    my $self = shift;
    # ... init stuff here
    while (defined(my $line = $self->_readline())) {
        # ... do stuff until END of next prediction/report
    }
    # ... return data if any
}
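The dispatch-table idea can be tried out without any BioPerl machinery. The sketch below is plain, runnable Perl. Several pieces are hypothetical stand-ins: the package name GlimmerSketch, the -lines/-format constructor arguments, the first-line detection regex, and the three stub parsers, which only report which branch ran. _readline/_pushback are simplified imitations of the buffered-reading methods mentioned above.

```perl
package GlimmerSketch;
use strict;
use warnings;

# Program name => parser method (hypothetical names, as in the message)
my %GLIMMER_METHODS = (
    'GlimmerHMM' => '_parsehmm',
    'Glimmer'    => '_parsenormal',
    '_DEFAULT_'  => '_parseabnormal',
);

sub new {
    my ($class, %args) = @_;
    my $self = {
        lines  => [ @{ $args{-lines} || [] } ],   # input, one line per element
        format => $args{-format},                 # optional user-specified format
    };
    return bless $self, $class;
}

# Simplified stand-ins for buffered line reading with push-back
sub _readline { my $self = shift; return shift @{ $self->{lines} } }
sub _pushback { my ($self, $line) = @_; unshift @{ $self->{lines} }, $line; return }

sub next_report {
    my $self   = shift;
    my $format = $self->{format};
    unless ($format) {
        # Pre-parse: scan until a line names the program, then push it back
        while (defined(my $line = $self->_readline)) {
            if ($line =~ /^(GlimmerHMM|Glimmer)\b/) {
                $format = $1;
                $self->_pushback($line);
                last;
            }
        }
    }
    my $method = (defined $format && exists $GLIMMER_METHODS{$format})
        ? $GLIMMER_METHODS{$format}
        : $GLIMMER_METHODS{'_DEFAULT_'};    # fallback parser
    return $self->$method();                # hand off to the chosen parser
}

# Stub parsers: a real one would loop over _readline until the report ends
sub _parsehmm      { return 'parsed as GlimmerHMM' }
sub _parsenormal   { return 'parsed as Glimmer' }
sub _parseabnormal { return 'parsed with fallback' }

package main;

my $parser = GlimmerSketch->new(-lines => ["GlimmerHMM 3.0\n", "gene 1\n"]);
print $parser->next_report, "\n";    # parsed as GlimmerHMM
```

Keeping the mapping in a hash means supporting another variant is one new entry plus one method, and the '_DEFAULT_' entry keeps unrecognised input from dying outright.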

chris

> On 2/6/07, Jason Stajich <jason at bioperl.org> wrote:
>>
>> I definitely vote for 1) - worst case you have 4 separate methods if there
>> is no good way to condense the parsing for each format and require the user
>> to specify the format.
>>
>> I have no problem with requiring user to specify what program she used -
>> if we can be fancy and guess the format later (i.e. guess format in SeqIO)
>> -then that's icing.
>>
>> -jason
>>
>>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From johnston at biochem.ucl.ac.uk  Wed Feb  7 13:56:52 2007
From: johnston at biochem.ucl.ac.uk (Caroline Johnston)
Date: Wed, 7 Feb 2007 18:56:52 +0000 (GMT)
Subject: [Bioperl-l] RNA folding
In-Reply-To: <C7C6DC30-ACBB-4626-A0D7-CC0248AF847E@uiuc.edu>
References: <Pine.LNX.4.58.0702061143420.4383@localhost.localdomain>
	<C7C6DC30-ACBB-4626-A0D7-CC0248AF847E@uiuc.edu>
Message-ID: <Pine.LNX.4.58.0702071535320.30225@localhost.localdomain>

Thanks Chris.

Storing the interaction data as a hash according to an ontology and using
an extended bracket notation as the string representation seems to make
sense, but I'm still unsure how this is supposed to be
attached to the Seq objects. You reckon it should be an AnnotationI?

I'm not sure I understand the distinction between annotations and
features. From the docs I got the impression that Features were like
annotation on bits of sequences and had a reference to the sequence to
which they belong, whereas annotations don't. If that's the case though,
why would RNA structure be an annotation rather than a feature? If not,
what is the distinction between them? Are the positional Annotation
subclasses you're developing intended to replace features? Have I got the
wrong end of the stick entirely?

Cheers,
Cass


On Tue, 6 Feb 2007, Chris Fields wrote:

> Actually, the only RNA tool wrappers I have made are ones for ERPIN,
> RNAMotif, and Infernal (the only one in bioperl-run CVS at this time
> is RNAMotif).  I am planning on writing up wrappers for Vienna,
> UNAFold, and a few others but haven't really started in.  Here's
> where I'm at right now...
>
> I am writing up a new set of AnnotationI classes which positionally
> describe data (Meta) which I hope will help deal with this stuff.
> These would be similar in nature to Heikki's Bio::Seq::Meta classes:
>
> http://bioperl.org/pipermail/bioperl-l/2006-December/024414.html
>
> I would use a regular Bio::SeqI and store the structural data and
> anything else (such as energy calculations, etc) as Annotation
> objects in an AnnotationCollection, and then write up a series of
> SeqIO modules to get data into/out of the designated structure
> formats, like UNAfold ct, RNAML, and so on.  Each sequence would then
> be capable of holding more than one structural Annotation (i.e. could
> represent different folding pathways, alternative RNA folds, and so on).
>
> At this point I represent the data as an array of hashes where $array[0]
> is nt 1 and the hash keys indicate the type of interaction, base
> interacted with, etc.  The text representation would be simple
> Eddy WUSS (Rfam-like) format by default, which is capable of
> representing some complex data (pseudoknots, for instance), is
> compact, and is documented (via the Infernal manual).  Tags will
> probably switch to more ontologically relevant terms (probably from
> RNAML or RNA Ontology), but in general it is something like this:
>
> [
>   {'interaction' => 'WC',
>     'base'  => 24},
>   {'interaction' => 'WC',
>     'base'  => 23},
>   {'interaction' => 'SS'},
> ...
> ]
>
> In this implementation every seq position would have some kind of
> interaction designation, though that's open for debate as it could
> just be simple text or undef for single-stranded regions.
>
> This is also scalable based on complexity of the data: if one wanted
> to add tert/quaternary interactions, location, base modifications,
> remote sequence interactions, etc., extra key/value pairs could be
> used.  Conversely, if one only wanted sec structure (for drawing RNA
> structures, for example), then only that data would be parsed.
>
> If you (or anyone listening) have any suggestions I would greatly
> appreciate them.
>
> chris
>
>
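To make the array-of-hashes layout concrete, here is a small plain-Perl sketch that stores a toy hairpin in that shape and renders it as a string. It emits plain dot-bracket notation ('(' and ')' for paired bases, '.' for unpaired) rather than full WUSS, which adds further symbols for loop types and pseudoknots. The to_dot_bracket name and the 1-based 'base' partner convention are assumptions made for the example.

```perl
use strict;
use warnings;

# One hash per nucleotide; $structure->[0] describes nt 1.
# 'base' (assumed 1-based) names the partner of a paired nucleotide.
sub to_dot_bracket {
    my ($structure) = @_;
    my $str = '';
    for my $i (0 .. $#{$structure}) {
        my $pos = $i + 1;
        my $nt  = $structure->[$i];
        if (defined $nt->{base}) {
            # opening bracket if the partner lies downstream, closing otherwise
            $str .= $nt->{base} > $pos ? '(' : ')';
        }
        else {
            $str .= '.';    # single-stranded (no partner recorded)
        }
    }
    return $str;
}

# Toy 9-nt hairpin: 1-9 and 2-8 paired, 3-7 single-stranded
my @hairpin = (
    { interaction => 'WC', base => 9 },
    { interaction => 'WC', base => 8 },
    (map +{ interaction => 'SS' }, 1 .. 5),
    { interaction => 'WC', base => 2 },
    { interaction => 'WC', base => 1 },
);
print to_dot_bracket(\@hairpin), "\n";    # ((.....))
```

Because every position carries its own hash, extra keys (tertiary contacts, base modifications) can be added without touching this rendering code.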


From cjfields at uiuc.edu  Wed Feb  7 17:15:44 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 7 Feb 2007 16:15:44 -0600
Subject: [Bioperl-l] RNA folding
In-Reply-To: <Pine.LNX.4.58.0702071535320.30225@localhost.localdomain>
References: <Pine.LNX.4.58.0702061143420.4383@localhost.localdomain>
	<C7C6DC30-ACBB-4626-A0D7-CC0248AF847E@uiuc.edu>
	<Pine.LNX.4.58.0702071535320.30225@localhost.localdomain>
Message-ID: <7360B66F-6AF3-4CB1-8343-0A19E42AD7F8@uiuc.edu>


On Feb 7, 2007, at 12:56 PM, Caroline Johnston wrote:

> Thanks Chris.
>
> Storing the interaction data as a hash according to an ontology and using
> an extended bracket notation as the string representation seems to make
> sense, but I'm still unsure how this is supposed to be
> attached to the Seq objects. You reckon it should be an AnnotationI?

As long as it describes everything in the object and that there is a  
reasonable way of textually representing the data, I think you can  
attach anything as annotation.  A recent example is the addition of  
trees as annotation.  Also, Annotation can be used to describe  
alignments (such as the structure consensus string in Rfam  
alignments), or added to SeqFeatures.  The class just needs to  
implement AnnotatableI.

> I'm not sure I understand the distinction between annotations and
> features. From the docs I got the impression that Features were like
> annotation on bits of sequences and had a reference to the sequence to
> which they belong, whereas annotations don't. If that's the case though,
> why would RNA structure be an annotation rather than a feature? If not,
> what is the distinction between them? Are the positional Annotation
> subclasses you're developing intended to replace features? Have I got the
> wrong end of the stick entirely?
>
> Cheers,
> Cass

The key distinction between seqfeatures and annotations is that  
annotations are normally associated with the entire sequence record,  
while seqfeatures normally describe a part of the sequence (and thus  
have a location on the sequence).  There are a few exceptions, but in  
general that's the case.  The HOWTO gives a bit more background:

http://www.bioperl.org/wiki/HOWTO:Feature-Annotation

Using annotations or seqfeatures in a case like this may be  
completely dependent on one's point of view.  For instance, one  
implementation I had considered was adding an interface to Bio::Seq  
which would allow Seq objects to also have Bio::Structure objects,  
since my view is that any sequence could (optionally) have a  
structure associated with it.  However, I reasoned that a sequence  
could actually have multiple structures (RNA, ssDNA, and protein can  
have several alternative folds or different folding pathways, for  
instance).   Instead of splitting up each structure into individual  
seqfeatures (each of which would have to be tagged with the  
relevant structure and score info), I could have one class encompass  
all of that data in a reasonable way.  Hence I used Annotation.

BTW, this isn't meant to replace features in any way.  It would be  
primarily used to describe (1) a sequence as a whole, such as a tRNA  
sequence, (2) a seqfeature, such as a tRNA, rRNA, riboswitch, etc in  
a genome sequence, or (3) a conserved structure in an alignment, such  
as Rfam stockholm output.

I'll add that the option of splitting the data into seqfeatures isn't  
ruled out.  It would be a matter of using a helper method, maybe in  
SeqUtils or directly in Annotation::Meta or whatever I end up calling  
it.  I plan on adding something along those lines at some point.

chris


From mitch_skinner at berkeley.edu  Wed Feb  7 18:26:53 2007
From: mitch_skinner at berkeley.edu (Mitch Skinner)
Date: Wed, 07 Feb 2007 15:26:53 -0800
Subject: [Bioperl-l] Bio::Graphics::Panel gridlines and pixels
In-Reply-To: <6dce9a0b0702070417k692fedcchfac5b1f0e01f72bd@mail.gmail.com>
References: <45C9578F.2060802@berkeley.edu>
	<6dce9a0b0702070417k692fedcchfac5b1f0e01f72bd@mail.gmail.com>
Message-ID: <45CA603D.1070901@berkeley.edu>

Lincoln Stein wrote:
> Zero is not a forbidden coordinate, since gbrowse also works on 
> genetic maps which have negative and floating point coordinates. 
> You've simply picked up a boundary case where the rounding isn't 
> working properly. I will fix this now.
Thanks for the fix.  What do you think of the following case?  This is 
something I actually ran into.  Suppose you have the original draw_grid:

    my $first_tick = $minor * int($self->start/$minor);

and my version of map_pt:

    my $val = $flip
      ? ($pr - ($length - ($_- 1)) * $scale)
      : (($_-$offset-1) * $scale);
    $val = int($val + .5 * ($val <=> 0));

and scale=0.5, offset=0, pad_left=0, flip=0, and minor=10.
Our tiles are currently 1000px wide.  So the first gridline will be at 
0bp => -1px and the 200th gridline will be at 2000bp => 1000px.  So the 
first tile will not have a gridline at its 0th pixel but the second 
tile will have one there.  Last night I was thinking that this was an 
artifact of having gridlines start at 0bp but now I'm thinking this is 
just because rounding half-pixels leaves an extra space when crossing 
zero.  Which is not unreasonable; it just invalidates the assumption I 
was hoping to make that the gridlines are the same for each tile.  Maybe 
it's just unreasonable to think that floating point calculations will 
give pixel-exact results.
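The arithmetic in this case can be checked directly. This is a minimal plain-Perl sketch. It assumes the non-flipped branch of the modified map_pt quoted above and the stated parameters (scale=0.5, offset=0, pad_left=0, minor=10, 1000px tiles). bp_to_px and tile_gridlines are hypothetical names, not Bio::Graphics::Panel methods.

```perl
use strict;
use warnings;

my ($scale, $offset) = (0.5, 0);    # parameters from the example; flip=0

# Non-flipped branch of the modified map_pt: round half away from zero
sub bp_to_px {
    my ($bp) = @_;
    my $val = ($bp - $offset - 1) * $scale;
    return int($val + .5 * ($val <=> 0));
}

# Tile-relative pixel positions of the gridlines that land in one tile
sub tile_gridlines {
    my ($tile, $width, $minor, $max_bp) = @_;
    my ($lo, $hi) = ($tile * $width, ($tile + 1) * $width - 1);
    my @px;
    for (my $bp = 0; $bp <= $max_bp; $bp += $minor) {
        my $px = bp_to_px($bp);
        push @px, $px - $lo if $px >= $lo && $px <= $hi;
    }
    return @px;
}

printf "0bp -> %dpx, 2000bp -> %dpx\n", bp_to_px(0), bp_to_px(2000);
for my $tile (0, 1) {
    my @lines = tile_gridlines($tile, 1000, 10, 4000);
    printf "tile %d: first gridlines at %s\n", $tile, join(',', @lines[0 .. 2]);
}
# prints:
# 0bp -> -1px, 2000bp -> 1000px
# tile 0: first gridlines at 5,10,15
# tile 1: first gridlines at 0,5,10
```

The asymmetry comes from -0.5 rounding away from zero to -1. As a result, tile 0 loses the gridline at its pixel 0 while tile 1 keeps one, even though both tiles share the same 5px spacing.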

Or I may just be barking up the wrong tree entirely.  Perhaps it's time 
to reconsider at a higher level (see my next message).

Mitch


From mitch_skinner at berkeley.edu  Wed Feb  7 18:28:11 2007
From: mitch_skinner at berkeley.edu (Mitch Skinner)
Date: Wed, 07 Feb 2007 15:28:11 -0800
Subject: [Bioperl-l] Bio::Graphics::Panel gridlines and pixels
In-Reply-To: <6dce9a0b0702070418x139b4e5cu396bb629760a860c@mail.gmail.com>
References: <45C9578F.2060802@berkeley.edu>
	<6dce9a0b0702070418x139b4e5cu396bb629760a860c@mail.gmail.com>
Message-ID: <45CA608B.80907@berkeley.edu>

Lincoln Stein wrote:
> However, I'm also very interested in why grid-drawing takes so long. 
> When I've profiled drawing, neither grid drawing nor map_pt() consume 
> any significant amount of time.
Well, the approach that we've been taking is to hand 
Bio::Graphics::Panel a fake GD object that stores all of the graphical 
primitives (line, rectangle, filledRectangle, etc. + their parameters) 
and then draws them later in chunks (for each tile, we draw all the 
primitives that overlap its pixel boundaries).  We're doing this because 
trying to create a real GD object that's hundreds of millions of pixels 
wide takes too much RAM.  But storing all the gridlines (for a whole 
chromosome, at a high zoom level) also takes a lot of RAM, and getting 
the gridlines for the current tile and translating their coordinates 
into the coordinate space of the tile also takes a fair amount of CPU.  
The gridline hack I've been experimenting with (that prompted these 
emails) was motivated by the hope that the gridlines were regular enough 
that we wouldn't have to store them explicitly, but just draw the same 
gridlines over and over again.  It runs almost twice as fast as the 
version that explicitly stores the gridlines.

So the main slowdown is not in draw_grid or map_pt, but in our code 
that's storing/retrieving and translating the gridlines.  Which we are 
also looking into speeding up.  But the memory usage is harder to 
reduce; I've experimented with trying to compress the gridline data but 
it seems easier to just have the panel draw the grid directly.

The more I read the Panel code, the more I think it would be nice to 
make more use of it.  One of the reasons that we're trying to fool it 
right now is that there seem to be a number of behaviors in it (and/or 
in the glyphs?) that take the current image boundaries into account 
(drawing an arrow where a feature runs off the edge of the image, 
etc.).  But in our browser each tile is supposed to mesh seamlessly with 
its neighbor, so if there's an easy way to turn off those edge-aware 
behaviors that would be pretty interesting.

Ian has also suggested that it might be better to store less information 
than the full set of graphics primitives.  For example, we could just 
store the Panel's glyph boxes and use their (pixel bound)->feature 
information to decide which features need to be drawn for each tile.

I'm going to be spending some time reading the Bio::Graphics code in 
more depth.  I'd also welcome suggestions from you or anyone on the list.

Thanks,
Mitch


From sdbrown at annular.org  Wed Feb  7 18:41:13 2007
From: sdbrown at annular.org (Steven Brown)
Date: Wed, 7 Feb 2007 15:41:13 -0800
Subject: [Bioperl-l] Bio::Restriction::Analysis cut site problem in 1.5.2
Message-ID: <3828F5B9-BFF8-47F4-8228-3BEA036513A0@annular.org>

The module seems to have trouble handling the cut-site specifiers  
that surround the sequence that the enzyme is specific for.  The error:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Bad end parameter (22). End must be less than the total length of sequence (total=6)
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/local/lib/perl5/site_perl/5.8.6/Bio/Root/Root.pm:328
STACK: Bio::PrimarySeq::subseq /usr/local/lib/perl5/site_perl/5.8.6/Bio/PrimarySeq.pm:371
STACK: Bio::Restriction::Analysis::_enzyme_sites /usr/local/lib/perl5/site_perl/5.8.6/Bio/Restriction/Analysis.pm:884
STACK: Bio::Restriction::Analysis::_cuts /usr/local/lib/perl5/site_perl/5.8.6/Bio/Restriction/Analysis.pm:785
STACK: Bio::Restriction::Analysis::cut /usr/local/lib/perl5/site_perl/5.8.6/Bio/Restriction/Analysis.pm:369
STACK: Bio::Restriction::Analysis::cutters /usr/local/lib/perl5/site_perl/5.8.6/Bio/Restriction/Analysis.pm:678
---snip (my script line)---
-----------------------------------------------------------

The offending enzyme:

---snip---
<1>AcuI
<2>Eco57I,Bsp6II,BspD6II,BspKT5I,Eco112I,Eco125I,FsfI
<3>CTGAAG(16/14)
---snip---

If I get rid of the (16/14) the error disappears and the right  
sequence site is matched.  It seems like maybe a decision was made  
not to analyze enzymes with remote cut positions, or the code wouldn't  
throw the error...?  Any information on this would be helpful.

Thanks,
Steve


From adsj at novozymes.com  Thu Feb  8 03:55:50 2007
From: adsj at novozymes.com (Adam Sjøgren)
Date: Thu, 08 Feb 2007 09:55:50 +0100
Subject: [Bioperl-l] Bioperl 1.4 to 1.5.2,
	adding qualifiers to Bio::Seq-objects
References: <8764adoptn.fsf@topper.koldfront.dk>
	<C350729C-3964-4685-A89C-D3E5C24A5114@uiuc.edu>
Message-ID: <87fy9hqb8p.fsf@topper.koldfront.dk>

On Wed, 7 Feb 2007 11:50:13 -0600, Chris wrote:

> This works for me using bioperl-live (Mac OS X):

> ok 1 - Qualifier note found
> ok 2 - Qualifier db_xref found

*slaps forehead*

Thanks for the test - your diagnosis was spot on:

> If you haven't uninstalled 1.4, make sure you aren't running 1.4 or  
> mixing the two versions (you can check by using 'perldoc -l  
> Bio::Root::Root').

I had a modified version of Bio::Seq and Bio::SeqFeature::Generic in
my @INC (added, and promptly forgotten, writing the patch mentioned
here: <http://article.gmane.org/gmane.comp.lang.perl.bio.general/13349/>).

Removing those and patching 1.5.2 fixed my self-inflicted problem.


  Thanks again!

     Adam

-- 
                                                          Adam Sjøgren
                                                    adsj at novozymes.com


From heikki at sanbi.ac.za  Thu Feb  8 04:39:47 2007
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Thu, 8 Feb 2007 11:39:47 +0200
Subject: [Bioperl-l] Bio::Restriction::Analysis cut site problem in 1.5.2
In-Reply-To: <3828F5B9-BFF8-47F4-8228-3BEA036513A0@annular.org>
References: <3828F5B9-BFF8-47F4-8228-3BEA036513A0@annular.org>
Message-ID: <200702081139.48125.heikki@sanbi.ac.za>

The error comes from Bio::PrimarySeq::subseq when it tries to cut beyond the 
end of an existing sequence. Maybe your sequence has a restriction site that is near 
the end of your sequence?

This is a special case which has not been taken into account in the 
Bio::Restriction::Analysis::_cuts method. 

The question is: should the site be detected if its cut site is not within 
the studied sequence?
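The numbers in the exception fit this explanation. For a site written CTGAAG(16/14), the top strand is cut 16 nt past the end of the 6-nt recognition sequence. A site at positions 1-6 of a 6-nt sequence therefore puts the cut at position 22, which matches "Bad end parameter (22) ... total=6" in the error. This is a plain-Perl sketch of that bounds check, assuming 1-based coordinates and scanning only the given strand. remote_cuts is a hypothetical helper, not part of Bio::Restriction.

```perl
use strict;
use warnings;

# Hypothetical helper: locate sites of an enzyme written in REBASE style,
# e.g. 'CTGAAG(16/14)', and flag cuts that fall outside the sequence.
# Coordinates are 1-based; only the given strand is scanned.
sub remote_cuts {
    my ($seq, $spec) = @_;
    my ($site, $top, $bottom) = $spec =~ /^([ACGTN]+)\((-?\d+)\/(-?\d+)\)$/
        or die "Unrecognised enzyme specification '$spec'\n";
    (my $pattern = $site) =~ s/N/./g;    # N matches any base
    my @hits;
    while ($seq =~ /(?=$pattern)/gi) {    # lookahead allows overlapping sites
        my $start = pos($seq) + 1;        # 1-based start of the site
        my $end   = $start + length($site) - 1;
        my $cut   = $end + $top;          # top-strand cut lies after this base
        push @hits, {
            start      => $start,
            cut_top    => $cut,
            cut_bottom => $end + $bottom,
            in_bounds  => $cut <= length($seq) ? 1 : 0,
        };
    }
    return @hits;
}

for my $seq ('CTGAAG', 'CTGAAG' . 'A' x 20) {
    for my $hit (remote_cuts($seq, 'CTGAAG(16/14)')) {
        printf "length %d: site at %d, top-strand cut at %d (%s)\n",
            length $seq, $hit->{start}, $hit->{cut_top},
            $hit->{in_bounds} ? 'inside' : 'outside the sequence';
    }
}
```

Whether such out-of-range sites should be reported at all is the open question above. The sketch keeps them but marks them, leaving that choice to the caller.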

Please submit a bugzilla bug, so this gets solved. I probably do not have time 
to tweak the code myself.

	-Heikki


On Thursday 08 February 2007 01:41:13 Steven Brown wrote:
> The module seems to have trouble handling the cut-site specifiers
> that surround the sequence that the enzyme is specific for.  The error:
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: Bad end parameter (22). End must be less than the total length of sequence (total=6)
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /usr/local/lib/perl5/site_perl/5.8.6/Bio/Root/Root.pm:328
> STACK: Bio::PrimarySeq::subseq /usr/local/lib/perl5/site_perl/5.8.6/Bio/PrimarySeq.pm:371
> STACK: Bio::Restriction::Analysis::_enzyme_sites /usr/local/lib/perl5/site_perl/5.8.6/Bio/Restriction/Analysis.pm:884
> STACK: Bio::Restriction::Analysis::_cuts /usr/local/lib/perl5/site_perl/5.8.6/Bio/Restriction/Analysis.pm:785
> STACK: Bio::Restriction::Analysis::cut /usr/local/lib/perl5/site_perl/5.8.6/Bio/Restriction/Analysis.pm:369
> STACK: Bio::Restriction::Analysis::cutters /usr/local/lib/perl5/site_perl/5.8.6/Bio/Restriction/Analysis.pm:678
> ---snip (my script line)---
> -----------------------------------------------------------
>
> The offending enzyme:
>
> ---snip---
> <1>AcuI
> <2>Eco57I,Bsp6II,BspD6II,BspKT5I,Eco112I,Eco125I,FsfI
> <3>CTGAAG(16/14)
> ---snip---
>
> If I get rid of the (16/14) the error disappears and the right
> sequence site is matched.  It seems like maybe a decision was made
> not to analyze enzymes with remote cut positions, or the code wouldn't
> throw the error...?  Any information on this would be helpful.
>
> Thanks,
> Steve
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From cjfields at uiuc.edu  Thu Feb  8 09:20:26 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 8 Feb 2007 08:20:26 -0600
Subject: [Bioperl-l] BLASTXML changes (good this time!)
Message-ID: <EB130F09-5670-4F3E-B481-60A700AEA38F@uiuc.edu>

All,

BLAST XML parsing should now work for any CPAN-based XML::SAX parser!

XML::SAX::PurePerl (comes with XML::SAX, the slowest)
XML::SAX::Expat
XML::SAX::ExpatXS (the fastest)
XML::LibXML::SAX
XML::LibXML::SAX::Parser

Grant McLean has updated XML::SAX on CPAN to fix an XML::SAX::PurePerl  
bug, so using that parser will necessitate an XML::SAX upgrade.  I  
had also found a bug in the SAX handler which chopped off a large  
chunk of the description for hits; this is now fixed in CVS.

If Sendu is out there, I think we can safely remove any dependencies  
beyond XML::SAX 0.15 for the next release.  Should I go ahead and  
modify Build.PL?

chris


From lstein at cshl.edu  Thu Feb  8 10:51:49 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Thu, 8 Feb 2007 10:51:49 -0500
Subject: [Bioperl-l] Bio::Graphics::Panel gridlines and pixels
In-Reply-To: <45CA608B.80907@berkeley.edu>
References: <45C9578F.2060802@berkeley.edu>
	<6dce9a0b0702070418x139b4e5cu396bb629760a860c@mail.gmail.com>
	<45CA608B.80907@berkeley.edu>
Message-ID: <6dce9a0b0702080751m210e4d44k3e5c38bfdd3ee9ea@mail.gmail.com>

Hi,

I like the approach you're taking (creating a fake GD object that stores the
graphics primitives). Perhaps the best thing to do is to subclass Panel
itself so that it doesn't draw the gridlines (or turn gridlines off
completely). Then you can draw gridlines after the fact in each tile as
needed.

Lincoln

On 2/7/07, Mitch Skinner <mitch_skinner at berkeley.edu> wrote:
>
> Lincoln Stein wrote:
> > However, I'm also very interested in why grid-drawing takes so long.
> > When I've profiled drawing, neither grid drawing nor map_pt() consume
> > any significant amount of time.
> Well, the approach that we've been taking is to hand
> Bio::Graphics::Panel a fake GD object that stores all of the graphical
> primitives (line, rectangle, filledRectangle, etc. + their parameters)
> and then draws them later in chunks (for each tile, we draw all the
> primitives that overlap its pixel boundaries).  We're doing this because
> trying to create a real GD object that's hundreds of millions of pixels
> wide takes too much RAM.  But storing all the gridlines (for a whole
> chromosome, at a high zoom level) also takes a lot of RAM, and getting
> the gridlines for the current tile and translating their coordinates
> into the coordinate space of the tile also takes a fair amount of CPU.
> The gridline hack I've been experimenting with (that prompted these
> emails) was motivated by the hope that the gridlines were regular enough
> that we wouldn't have to store them explicitly, but just draw the same
> gridlines over and over again.  It runs almost twice as fast as the
> version that explicitly stores the gridlines.
>
> So the main slowdown is not in draw_grid or map_pt, but in our code
> that's storing/retrieving and translating the gridlines.  Which we are
> also looking into speeding up.  But the memory usage is harder to
> reduce; I've experimented with trying to compress the gridline data but
> it seems easier to just have the panel draw the grid directly.
>
> The more I read the Panel code, the more I think it would be nice to
> make more use of it.  One of the reasons that we're trying to fool it
> right now is that there seem to be a number of behaviors in it (and/or
> in the glyphs?) that take the current image boundaries into account
> (drawing an arrow where a feature runs off the edge of the image,
> etc.).  But in our browser each tile is supposed to mesh seamlessly with
> its neighbor, so if there's an easy way to turn off those edge-aware
> behaviors that would be pretty interesting.
>
> Ian has also suggested that it might be better to store less information
> than the full set of graphics primitives.  For example, we could just
> store the Panel's glyph boxes and use their (pixel bound)->feature
> information to decide which features need to be drawn for each tile.
>
> I'm going to be spending some time reading the Bio::Graphics code in
> more depth.  I'd also welcome suggestions from you or anyone on the list.
>
> Thanks,
> Mitch
>


-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu


From Kevin.M.Brown at asu.edu  Thu Feb  8 10:28:30 2007
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Thu, 8 Feb 2007 08:28:30 -0700
Subject: [Bioperl-l] Bio::Graphics::Panel gridlines and pixels
References: <45C9578F.2060802@berkeley.edu><6dce9a0b0702070418x139b4e5cu396bb629760a860c@mail.gmail.com>
	<45CA608B.80907@berkeley.edu>
Message-ID: <1A4207F8295607498283FE9E93B775B402AAC1D0@EX02.asurite.ad.asu.edu>

> The more I read the Panel code, the more I think it would be 
> nice to make more use of it.  One of the reasons that we're 
> trying to fool it right now is that there seem to be a number 
> of behaviors in it (and/or in the glyphs?) that take the 
> current image boundaries into account (drawing an arrow where 
> a feature runs off the edge of the image, etc.).  But in our 
> browser each tile is supposed to mesh seamlessly with its 
> neighbor, so if there's an easy way to turn off those 
> edge-aware behaviors that would be pretty interesting.

I think the glyphs try to deal with edges because if they didn't, then
they would flow out into whatever right or left padding had been placed
around the image when the panel was created.  Something I've noticed is
that when I create tiles for the chromosomes I'm working on the panels
don't line up because the bump position in one panel is not accounted
for when the next panel is drawn.


From sheris at eps.berkeley.edu  Thu Feb  8 12:42:27 2007
From: sheris at eps.berkeley.edu (Sheri Simmons)
Date: Thu, 08 Feb 2007 09:42:27 -0800
Subject: [Bioperl-l] bioperl newbie needs help with extracting cds info
Message-ID: <7.0.1.0.2.20070208093840.0202da90@eps.berkeley.edu>

Hi,
I'm a newbie to BioPerl so apologies if this is a very basic 
question. I am trying to parse GenBank files with the goal of 
creating concatenated gene lists in nucleic acid or amino acid 
format. It is working fine, except for one thing: I need to create 
gene labels incorporating information on whether the gene is on the 
complementary strand or not ("complement" in the CDS tag). How can I 
get Bioperl to tell me whether the CDS tag value includes the word 
"complement"?

Thanks
Sheri


From george.heller at yahoo.com  Thu Feb  8 13:54:41 2007
From: george.heller at yahoo.com (George Heller)
Date: Thu, 8 Feb 2007 10:54:41 -0800 (PST)
Subject: [Bioperl-l] Perl script to extract from ncbi
Message-ID: <178139.85769.qm@web56506.mail.re3.yahoo.com>

Hi all,

I have a question regarding extracting data from NCBI. I have a database to store the sequence data, but the files I have loaded into it don't have a proper description line. Based on the accession number, I need to find out the genus and species name (organism name) from NCBI.

I have about 1500 records for which I need to extract the names from NCBI.

Any ideas on how I can go about writing a Perl script to extract this information from NCBI?

Thanks!
George.

 


From Kevin.M.Brown at asu.edu  Thu Feb  8 14:11:50 2007
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Thu, 8 Feb 2007 12:11:50 -0700
Subject: [Bioperl-l] bioperl newbie needs help with extracting cds info
References: <7.0.1.0.2.20070208093840.0202da90@eps.berkeley.edu>
Message-ID: <1A4207F8295607498283FE9E93B775B402AAC29A@EX02.asurite.ad.asu.edu>

When you extract the features, just call the strand method on each
returned feature to find out: it returns 1 for the forward strand and
-1 for the complementary strand.

my @features = $seq->all_SeqFeatures;
# keep only the CDS features and report their strand
for my $f (@features)
{
	my $tag = $f->primary_tag;
	if ($tag eq 'CDS')
	{
		print $f->strand . "\n";
	}
}

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> Sheri Simmons
> Sent: Thursday, February 08, 2007 10:42 AM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] bioperl newbie needs help with 
> extracting cds info
> 
> Hi,
> I'm a newbie to BioPerl so apologies if this is a very basic 
> question. I am trying to parse GenBank files with the goal of 
> creating concatenated gene lists in nucleic acid or amino 
> acid format. It is working fine, except for one thing: I need 
> to create gene labels incorporating information on whether 
> the gene is on the complementary strand or not ("complement" 
> in the CDS tag). How can I get Bioperl to tell me whether the 
> CDS tag value includes the word "complement"?
> 
> Thanks
> Sheri
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 


From barry.moore at genetics.utah.edu  Thu Feb  8 14:35:03 2007
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Thu, 8 Feb 2007 12:35:03 -0700
Subject: [Bioperl-l] bioperl newbie needs help with extracting cds info
In-Reply-To: <7.0.1.0.2.20070208093840.0202da90@eps.berkeley.edu>
References: <7.0.1.0.2.20070208093840.0202da90@eps.berkeley.edu>
Message-ID: <E6200600-30F2-4471-9107-29A355F543F9@genetics.utah.edu>

Sheri-

The Bio::SeqFeature::Generic object has a 'strand' method, so you can  
just call strand on the CDS (or any other) feature like this.

   my @features = grep { $_->primary_tag eq 'CDS' } $seq->get_SeqFeatures();
   for my $feature (@features) {
       my $strand = $feature->strand;   # 1 = forward, -1 = complement
   }
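If the goal is a gene label that records the strand, the strand value can be turned into GenBank-style text directly instead of searching the location for the word "complement". Below is a minimal sketch in plain Perl; the helper name cds_label and the label format are hypothetical choices, and in a real script the start, end and strand values would come from $feature->start, $feature->end and $feature->strand on each CDS feature. (BioPerl location objects also have a to_FTstring method that should produce the complement(...) notation itself; check the Bio::LocationI docs for your version.)

```perl
use strict;
use warnings;

# Hypothetical helper: build a gene label from a name and coordinates,
# marking features on the complementary strand the way GenBank does.
# $strand follows the BioPerl convention: 1 = forward, -1 = reverse.
sub cds_label {
    my ($name, $start, $end, $strand) = @_;
    my $range = "$start..$end";
    $range = "complement($range)" if defined $strand && $strand < 0;
    return "${name}_$range";
}

# Example values only; normally taken from each CDS feature.
print cds_label('geneA', 1712, 2848, 1),  "\n";   # geneA_1712..2848
print cds_label('geneB', 3000, 4100, -1), "\n";   # geneB_complement(3000..4100)
```

Testing $strand < 0 works the same way for any feature type, not just CDS.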

Barry

On Feb 8, 2007, at 10:42 AM, Sheri Simmons wrote:

> Hi,
> I'm a newbie to BioPerl so apologies if this is a very basic
> question. I am trying to parse GenBank files with the goal of
> creating concatenated gene lists in nucleic acid or amino acid
> format. It is working fine, except for one thing: I need to create
> gene labels incorporating information on whether the gene is on the
> complementary strand or not ("complement" in the CDS tag). How can I
> get Bioperl to tell me whether the CDS tag value includes the word
> "complement"?
>
> Thanks
> Sheri
>


From torsten.seemann at infotech.monash.edu.au  Thu Feb  8 23:18:33 2007
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Fri, 9 Feb 2007 15:18:33 +1100
Subject: [Bioperl-l] BLASTXML changes (good this time!)
In-Reply-To: <EB130F09-5670-4F3E-B481-60A700AEA38F@uiuc.edu>
References: <EB130F09-5670-4F3E-B481-60A700AEA38F@uiuc.edu>
Message-ID: <a79f6a4b0702082018n564073fcyb1b886b8fe185b90@mail.gmail.com>

Chris,

> BLAST XML parsing should now work for any CPAN-based XML::SAX parser!
> XML::SAX::PurePerl (comes with XML::SAX, the slowest)
> XML::SAX::Expat
> XML::SAX::ExpatXS (the fastest)
> XML::LibXML::SAX
> XML::LibXML::SAX::Parser

That's excellent news - thanks for all the work you have put in on
this one. I'm impressed.

This is a good opportunity to encourage people who use Bio::SearchIO
for BLAST parsing to switch to 'blastxml' format over 'blast'.
Although the latter is more human readable, it perennially requires
parser source changes to cope with the variations and new formatting
introduced with each new NCBI BLAST release. Best to use "-m 7" XML
format, and convert as appropriate using one of the
Bio::SearchIO::Writer:: classes.

--Torsten


From cjfields at uiuc.edu  Fri Feb  9 08:58:24 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 9 Feb 2007 07:58:24 -0600
Subject: [Bioperl-l] BLASTXML changes (good this time!)
In-Reply-To: <a79f6a4b0702082018n564073fcyb1b886b8fe185b90@mail.gmail.com>
References: <EB130F09-5670-4F3E-B481-60A700AEA38F@uiuc.edu>
	<a79f6a4b0702082018n564073fcyb1b886b8fe185b90@mail.gmail.com>
Message-ID: <4FC966A7-7A03-41D9-ABF7-6ACD888720FB@uiuc.edu>

On Feb 8, 2007, at 10:18 PM, Torsten Seemann wrote:

> Chris,
>
>> BLAST XML parsing should now work for any CPAN-based XML::SAX parser!
>> XML::SAX::PurePerl (comes with XML::SAX, the slowest)
>> XML::SAX::Expat
>> XML::SAX::ExpatXS (the fastest)
>> XML::LibXML::SAX
>> XML::LibXML::SAX::Parser
>
> That's excellent news - thanks for all the work you have put in on
> this one. I'm impressed.

Jason did most of the hard work; I just tinkered with it until it  
worked (and pestered a few perl XML guys along the way).  Thanks  
Grant and Björn!

> This is a good opportunity to encourage people who use Bio::SearchIO
> for BLAST parsing to switch to 'blastxml' format over 'blast'.
> Although the latter is more human readable, it perennially requires
> parser source changes to cope with the variations and new formatting
> introduced with each new NCBI BLAST release. Best to use "-m 7" XML
> format, and convert as appropriate using one of the
> Bio::SearchIO::Writer:: classes.
>
> --Torsten

I'll try getting some benchmarks for the different parsers up today  
on the wiki if I have time.

Strangely enough, NCBI changed a few things about BLAST XML a few  
releases back w/o mentioning it to anyone (it was a silent bug in  
BLAST XML parsing which I fixed recently).  If you sent in multiple  
queries in older versions of BLAST you would get all of the BLAST XML  
reports concatenated together, which required preparsing the reports  
to carve up the XML prior to parsing.  Now they treat it like PSI-BLAST,
where multiple queries = multiple iterations, so you get one long
BLAST XML report where each iteration = one Result.

The current parser should handle both as it just caches the other  
results and returns them one at a time prior to new parses, but I  
wouldn't recommend parsing a huge BLAST XML file with hundreds of  
queries as you'll quickly run out of memory!

If they get Perl SAX2 up to date with Expat they'll eventually add  
parse_chunk() and pause_parse() for each parser.  Until then...

chris

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cuiw at ncbi.nlm.nih.gov  Fri Feb  9 09:20:10 2007
From: cuiw at ncbi.nlm.nih.gov (Cui, Wenwu (NIH/NLM/NCBI) [C])
Date: Fri, 9 Feb 2007 09:20:10 -0500
Subject: [Bioperl-l] Perl script to extract from ncbi
In-Reply-To: <178139.85769.qm@web56506.mail.re3.yahoo.com>
References: <178139.85769.qm@web56506.mail.re3.yahoo.com>
Message-ID: <18C407FD4FFB424292D769FBD68C1987020BBC58@NIHCESMLBX8.nih.gov>

This is an example for fetching two GenBank records
(id=124504630,110665734) in XML format. Organism names like
'<GBSeq_organism>Rattus norvegicus</GBSeq_organism>' can be parsed from
the XML. 

 
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=124504630,110665734&retmode=xml&rettype=gb

Or you can get TaxIds and translate them into real names:

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=nucleotide&id=124504630,110665734&retmode=xml
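With about 1500 accessions, the ids can be sent to efetch as comma-separated lists, as in the URL above, and the organism names pulled out of the GBSeq XML that comes back. The following is a minimal sketch in plain Perl of the parsing step only: it uses regular expressions instead of a real XML parser, does not decode XML entities, and the sample XML and accessions (ACC0001, ACC0002) are made-up placeholders. Fetching the XML (for example with LWP::Simple's get on an efetch URL) is left out so the example runs offline.

```perl
use strict;
use warnings;

# Map each accession in a GBSeq XML document (efetch with retmode=xml,
# rettype=gb) to its organism name. Regex-based sketch: it assumes one
# <GBSeq> element per record and does not decode XML entities.
sub organisms_from_gbseq {
    my ($xml) = @_;
    my %organism_for;
    while ($xml =~ m{<GBSeq>(.*?)</GBSeq>}gs) {
        my $record = $1;
        my ($acc) = $record =~ m{<GBSeq_primary-accession>([^<]+)</GBSeq_primary-accession>};
        my ($org) = $record =~ m{<GBSeq_organism>([^<]+)</GBSeq_organism>};
        $organism_for{$acc} = $org if defined $acc && defined $org;
    }
    return \%organism_for;
}

# Made-up, trimmed sample in the GBSeq layout.
my $xml = <<'XML';
<GBSet>
  <GBSeq>
    <GBSeq_primary-accession>ACC0001</GBSeq_primary-accession>
    <GBSeq_organism>Rattus norvegicus</GBSeq_organism>
  </GBSeq>
  <GBSeq>
    <GBSeq_primary-accession>ACC0002</GBSeq_primary-accession>
    <GBSeq_organism>Homo sapiens</GBSeq_organism>
  </GBSeq>
</GBSet>
XML

my $map = organisms_from_gbseq($xml);
print "$_\t$map->{$_}\n" for sort keys %$map;
```

Matching each record as a whole avoids depending on the order of elements inside a <GBSeq> entry.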

 
Wenwu Cui, PhD

 
-----Original Message-----
From: George Heller [mailto:george.heller at yahoo.com] 
Sent: Thursday, February 08, 2007 1:55 PM
To: bioperl-l at lists.open-bio.org
Subject: [Bioperl-l] Perl script to extract from ncbi

 
Hi all,

I have a question regarding extracting data from NCBI. I have a
database to store the sequence data, but the files I have loaded into
it don't have a proper description line. Based on the accession
number, I need to find out the genus and species name (organism name)
from NCBI.

I have about 1500 records for which I need to extract the names from
NCBI.

Any ideas on how I can go about writing a Perl script to extract this
information from NCBI?

Thanks!
George.

 


From bosborne11 at verizon.net  Fri Feb  9 12:51:19 2007
From: bosborne11 at verizon.net (Brian Osborne)
Date: Fri, 09 Feb 2007 12:51:19 -0500
Subject: [Bioperl-l] Perl script to extract from ncbi
In-Reply-To: <178139.85769.qm@web56506.mail.re3.yahoo.com>
Message-ID: <C1F21EC7.CBAA%bosborne11@verizon.net>

George,

http://www.bioperl.org/wiki/HOWTO:Beginners#Retrieving_a_sequence_from_a_database

Brian O.


On 2/8/07 1:54 PM, "George Heller" <george.heller at yahoo.com> wrote:

> Hi all, 
>    
>   I have a question regarding extracting data from Ncbi. I have a database to
> store the sequence data, but the files I have loaded into it, dont have a
> proper description line specified. Based on the accession number, I need to
> find out what is the genus and species name (organism name) from ncbi.
>    
>   I have about 1500 records for which I need to extract the names from ncbi.
>    
>   Any ideas of how I can go about writing a perl script for extracting this
> information from ncbi?
>    
>   Thanks!
>   George.
> 
>  


From johnston at biochem.ucl.ac.uk  Fri Feb  9 14:23:41 2007
From: johnston at biochem.ucl.ac.uk (Caroline Johnston)
Date: Fri, 9 Feb 2007 19:23:41 +0000 (GMT)
Subject: [Bioperl-l] WrapperBase
Message-ID: <Pine.LNX.4.58.0702091905300.3237@localhost.localdomain>

Hi,

Could WrapperBase::executable warn you if it doesn't find the exe in
program_path? At the moment it just silently goes ahead and uses one in
the system path if it exists.

Cass.

I've never used diff, so not sure if this is right, but:

305,308c305,314
<        if( $prog_path && -e $prog_path && -x $prog_path ) {
<            $self->{'_pathtoexe'} = $prog_path;
<        } else {
<            my $exe;
---
>        if($prog_path){
>        if(-e $prog_path && -x $prog_path){
>          $self->{'_pathtoexe'} = $prog_path;
>        }
>        else{
>          $self->warn("executable not found in $prog_path, trying system path...") if $warn;
>        }
>        }
>        unless ($self->{'_pathtoexe'}){
>        my $exe;
335a342
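The behaviour the patch proposes (prefer the executable in program_path, warn if it is missing or not executable there, then fall back to the system path) can be sketched outside BioPerl as a plain function. This is a hypothetical stand-alone illustration, not WrapperBase's actual code: the function name find_executable and its arguments are made up, and the PATH search uses File::Spec->path rather than whatever WrapperBase does internally.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical stand-alone sketch (not WrapperBase's real code):
# prefer $name inside $program_path; if it is missing or not executable
# there, warn and fall back to searching the directories in $PATH.
sub find_executable {
    my ($name, $program_path) = @_;
    if (defined $program_path) {
        my $candidate = File::Spec->catfile($program_path, $name);
        return $candidate if -f $candidate && -x _;
        warn "$name not found or not executable in $program_path, "
           . "trying system path...\n";
    }
    for my $dir (File::Spec->path) {
        my $candidate = File::Spec->catfile($dir, $name);
        return $candidate if -f $candidate && -x _;
    }
    return;    # not found anywhere
}
```

In a situation like a copy in the home directory that lacks execute permission, a lookup like this would print the warning before falling back to another copy on the system path, instead of switching silently.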


From bix at sendu.me.uk  Fri Feb  9 17:38:59 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 09 Feb 2007 22:38:59 +0000
Subject: [Bioperl-l] WrapperBase
In-Reply-To: <Pine.LNX.4.58.0702091905300.3237@localhost.localdomain>
References: <Pine.LNX.4.58.0702091905300.3237@localhost.localdomain>
Message-ID: <45CCF803.9030004@sendu.me.uk>

Caroline Johnston wrote:
> Hi,
> 
> Could WrapperBase::executable warn you if it doesn't find the exe in
> program_path? At the moment it just silently goes ahead and uses one in
> the system path if it exists.

No, I think not. That would be very annoying when using wrappers for 
programs that you just have in your system path.

What specific problem are you encountering with the current behaviour?


From bix at sendu.me.uk  Fri Feb  9 17:40:33 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 09 Feb 2007 22:40:33 +0000
Subject: [Bioperl-l] BLASTXML changes (good this time!)
In-Reply-To: <EB130F09-5670-4F3E-B481-60A700AEA38F@uiuc.edu>
References: <EB130F09-5670-4F3E-B481-60A700AEA38F@uiuc.edu>
Message-ID: <45CCF861.8030000@sendu.me.uk>

Chris Fields wrote:
> If Sendu is out there, I think we can safely remove any dependencies  
> beyond XML::SAX 0.15 for the next release.  Should I go ahead and  
> modify Build.PL?

Sure, good to hear.


From cjfields at uiuc.edu  Fri Feb  9 22:42:24 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 9 Feb 2007 21:42:24 -0600
Subject: [Bioperl-l] BLASTXML changes (good this time!)
In-Reply-To: <45CCF861.8030000@sendu.me.uk>
References: <EB130F09-5670-4F3E-B481-60A700AEA38F@uiuc.edu>
	<45CCF861.8030000@sendu.me.uk>
Message-ID: <DF989651-7CC0-45A0-BA5E-EEEE88FDD40E@uiuc.edu>


On Feb 9, 2007, at 4:40 PM, Sendu Bala wrote:

> Chris Fields wrote:
>> If Sendu is out there, I think we can safely remove any dependencies
>> beyond XML::SAX 0.15 for the next release.  Should I go ahead and
>> modify Build.PL?
>
> Sure, good to hear.

I added a version dependency for XML::SAX (v. 0.15) for the PurePerl  
fix.  That likely obviates the need for a Bundle for XML::Simple.   
Not too pressing; we can determine that before the next release.

chris


From johnston at biochem.ucl.ac.uk  Sat Feb 10 11:27:53 2007
From: johnston at biochem.ucl.ac.uk (Caroline Johnston)
Date: Sat, 10 Feb 2007 16:27:53 +0000 (GMT)
Subject: [Bioperl-l] WrapperBase
In-Reply-To: <45CCF803.9030004@sendu.me.uk>
References: <Pine.LNX.4.58.0702091905300.3237@localhost.localdomain>
	<45CCF803.9030004@sendu.me.uk>
Message-ID: <Pine.LNX.4.58.0702101558180.16119@localhost.localdomain>

> No, I think not. That would be very annoying when using wrappers for
> programs that you just have in your system path.
>

Hmm, maybe I misunderstood what program_path is for? The executable
method goes straight to the system path unless program_path is set, so I
assumed you would only set program_path if you specifically wanted it to
look somewhere else. You wouldn't get a warning if you didn't specify a
program_path and just left it to look in the system path.

> What specific problem are you encountering with the current behaviour?

I had one version of an executable in /usr/local and another version,
the one I wanted to use, in my home directory. The program_path method
gets its path from an environment variable, which was set to ~/.
I didn't realise I had the wrong permissions on the executable in my
home directory, so it was silently failing to use my version and using
the one in /usr/local instead.


Cass


From george.heller at yahoo.com  Sat Feb 10 15:35:18 2007
From: george.heller at yahoo.com (George Heller)
Date: Sat, 10 Feb 2007 12:35:18 -0800 (PST)
Subject: [Bioperl-l] Error while parsing
Message-ID: <162150.76282.qm@web56511.mail.re3.yahoo.com>

Hi all,
   
  I am in the process of parsing a few files, actually blast results, but happen to get the following error:
   
  ------------- EXCEPTION  -------------
MSG: Can't get HSPs: data not collected.
STACK Bio::Search::Hit::GenericHit::hsp /usr/lib/perl5/site_perl/5.8.5/Bio/Search/Hit/GenericHit.pm:649
STACK toplevel parser.pl:31
  --------------------------------------

  I am not sure if this is a bug, or is there something I am doing wrong. Any pointers are appreciated. 
   
  Thanks!
  George.

 


From cjfields at uiuc.edu  Sat Feb 10 17:56:19 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 10 Feb 2007 16:56:19 -0600
Subject: [Bioperl-l] Error while parsing
In-Reply-To: <162150.76282.qm@web56511.mail.re3.yahoo.com>
References: <162150.76282.qm@web56511.mail.re3.yahoo.com>
Message-ID: <AF76B50B-AF7A-41D1-A64E-D993D8CC4C86@uiuc.edu>

On Feb 10, 2007, at 2:35 PM, George Heller wrote:

> Hi all,
>
>   I am in the process of parsing a few files, actually blast  
> results, but happen to get the following error:
>
>   ------------- EXCEPTION  -------------
> MSG: Can't get HSPs: data not collected.
> STACK Bio::Search::Hit::GenericHit::hsp /usr/lib/perl5/site_perl/ 
> 5.8.5/Bio/Search/Hit/GenericHit.pm:649
> STACK toplevel parser.pl:31
>   --------------------------------------
>
>   I am not sure if this is a bug, or is there something I am doing  
> wrong. Any pointers are appreciated.
>
>   Thanks!
>   George.

We'll need more to go on than that.  If the bioperl version is  
v1.5.2, please file a bug via the bioperl bugzilla:

http://bugzilla.open-bio.org/

Don't forget to attach a test file which triggers the bug using the  
'Create a new attachment' link after the report has been filed.

chris


From sac at bioperl.org  Sat Feb 10 22:56:10 2007
From: sac at bioperl.org (Steve Chervitz)
Date: Sat, 10 Feb 2007 19:56:10 -0800
Subject: [Bioperl-l] Error while parsing
In-Reply-To: <162150.76282.qm@web56511.mail.re3.yahoo.com>
References: <162150.76282.qm@web56511.mail.re3.yahoo.com>
Message-ID: <8f200b4c0702101956h53fea96dm241126c680d64ab4@mail.gmail.com>

Your report may be lacking HSP alignments for the hit you are attempting to
process. Note that by default, blast will report twice as many one-line
descriptions as it will alignments:

  -v  Number of database sequences to show one-line descriptions for (V) [Integer]
    default = 500
  -b  Number of database sequence to show alignments for (B) [Integer]
    default = 250

Verify that this isn't the case for your error. If not, go ahead and file a
bug report. Attach the report (zipped if big) as well as the relevant
portion of your processing script.

Steve

On 2/10/07, George Heller <george.heller at yahoo.com> wrote:
>
> Hi all,
>
>   I am in the process of parsing a few files, actually blast results, but
> happen to get the following error:
>
>   ------------- EXCEPTION  -------------
> MSG: Can't get HSPs: data not collected.
> STACK Bio::Search::Hit::GenericHit::hsp
> /usr/lib/perl5/site_perl/5.8.5/Bio/Search/Hit/GenericHit.pm:649
> STACK toplevel parser.pl:31
>   --------------------------------------
>
>   I am not sure if this is a bug, or is there something I am doing wrong.
> Any pointers are appreciated.
>
>   Thanks!
>   George.
>
>
>


From jay at jays.net  Sun Feb 11 09:24:55 2007
From: jay at jays.net (Jay Hannah)
Date: Sun, 11 Feb 2007 08:24:55 -0600
Subject: [Bioperl-l] wiki: Database error when attempting to change
	preferences (1054: Unknown column 'user_newpass_time')
Message-ID: <9CEA9E3C-25A6-4DEF-902D-3FD997B81C9C@jays.net>

Just a heads-up --

I wanted to check the "E-mail me when a page I'm watching is changed"  
box in my preferences

http://www.bioperl.org/wiki/Special:Preferences

But I can't. Even if I change nothing and hit the Save button I get  
this:

----------
Database error
A database query syntax error has occurred. This may indicate a bug  
in the software. The last attempted database query was:

     (SQL query hidden)

from within function "User::saveSettings". MySQL returned error  
"1054: Unknown column 'user_newpass_time' in 'field list' (localhost)".
----------

(Yes, it literally says "(SQL query hidden)". That wasn't me for the  
purposes of this email. -grin-)

Thanks,

j
seqlab.net
http://www.bioperl.org/wiki/User:Jhannah


Username:	Jhannah
User ID:	51


From jay at jays.net  Sun Feb 11 10:16:13 2007
From: jay at jays.net (Jay Hannah)
Date: Sun, 11 Feb 2007 09:16:13 -0600
Subject: [Bioperl-l] wiki: Database error when attempting to change
	preferences (1054: Unknown column 'user_newpass_time')
In-Reply-To: <9CEA9E3C-25A6-4DEF-902D-3FD997B81C9C@jays.net>
References: <9CEA9E3C-25A6-4DEF-902D-3FD997B81C9C@jays.net>
Message-ID: <9B38AD87-3EB4-4658-A24B-ACC75F7F8298@jays.net>

Hmm.... The error appears to not be limited to changing preferences.  
I tried to update a couple different pages and got errors like this:

------
Database error
A database query syntax error has occurred. This may indicate a bug  
in the software. The last attempted database query was:

     (SQL query hidden)

from within function "Article::updateRedirectOn". MySQL returned  
error "1146: Table 'perlwikidb.redirect' doesn't exist (localhost)".
------

So all changes to the wiki aren't working right now?

j
seqlab.net
http://www.bioperl.org/wiki/User:Jhannah


From jason at bioperl.org  Sun Feb 11 15:18:15 2007
From: jason at bioperl.org (Jason Stajich)
Date: Sun, 11 Feb 2007 12:18:15 -0800
Subject: [Bioperl-l] wiki: Database error when attempting to change
	preferences (1054: Unknown column 'user_newpass_time')
In-Reply-To: <9B38AD87-3EB4-4658-A24B-ACC75F7F8298@jays.net>
References: <9CEA9E3C-25A6-4DEF-902D-3FD997B81C9C@jays.net>
	<9B38AD87-3EB4-4658-A24B-ACC75F7F8298@jays.net>
Message-ID: <3B80F494-925F-454A-8249-1565A4D50EF6@bioperl.org>

Should be fine now - I did an upgrade to MediaWiki 1.9 this weekend
and I think the upgrade script didn't finish.

In the future, system support requests should go to support - AT -
open-bio.org so we can track them.

-jason
On Feb 11, 2007, at 7:16 AM, Jay Hannah wrote:

> Hmm.... The error appears to not be limited to changing preferences.
> I tried to update a couple different pages and got errors like this:
>
> ------
> Database error
> A database query syntax error has occurred. This may indicate a bug
> in the software. The last attempted database query was:
>
>      (SQL query hidden)
>
> from within function "Article::updateRedirectOn". MySQL returned
> error "1146: Table 'perlwikidb.redirect' doesn't exist (localhost)".
> ------
>
> So all changes to the wiki aren't working right now?
>
> j
> seqlab.net
> http://www.bioperl.org/wiki/User:Jhannah
>

--
Jason Stajich
Miller Research Fellow
University of California, Berkeley
lab: 510.642.8441
http://pmb.berkeley.edu/~taylor/people/js.html
http://fungalgenomes.org/


From cjfields at uiuc.edu  Sun Feb 11 15:51:53 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 11 Feb 2007 14:51:53 -0600
Subject: [Bioperl-l] wiki: Database error when attempting to change
	preferences (1054: Unknown column 'user_newpass_time')
In-Reply-To: <3B80F494-925F-454A-8249-1565A4D50EF6@bioperl.org>
References: <9CEA9E3C-25A6-4DEF-902D-3FD997B81C9C@jays.net>
	<9B38AD87-3EB4-4658-A24B-ACC75F7F8298@jays.net>
	<3B80F494-925F-454A-8249-1565A4D50EF6@bioperl.org>
Message-ID: <E54012AF-CA57-4FA3-B70D-E135D66107ED@uiuc.edu>

Is there a good place on the main wiki page to prominently display  
this?  I wanted to place something at the top of the main page but I  
didn't know if we wanted to post the support email address on the  
page itself.

chris

On Feb 11, 2007, at 2:18 PM, Jason Stajich wrote:

> Should be fine now - I did an upgrade to mediawiki 1.9 this weekend
> and i think the upgrade script didn't finish.
>
> In the future system support requests should go to support - AT -
> open-bio.org so we can track them.
>
> -jason
> On Feb 11, 2007, at 7:16 AM, Jay Hannah wrote:
>
>> Hmm.... The error appears to not be limited to changing preferences.
>> I tried to update a couple different pages and got errors like this:
>>
>> ------
>> Database error
>> A database query syntax error has occurred. This may indicate a bug
>> in the software. The last attempted database query was:
>>
>>      (SQL query hidden)
>>
>> from within function "Article::updateRedirectOn". MySQL returned
>> error "1146: Table 'perlwikidb.redirect' doesn't exist (localhost)".
>> ------
>>
>> So all changes to the wiki aren't working right now?
>>
>> j
>> seqlab.net
>> http://www.bioperl.org/wiki/User:Jhannah
>>
>
> --
> Jason Stajich
> Miller Research Fellow
> University of California, Berkeley
> lab: 510.642.8441
> http://pmb.berkeley.edu/~taylor/people/js.html
> http://fungalgenomes.org/
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From jay at jays.net  Sun Feb 11 15:56:53 2007
From: jay at jays.net (Jay Hannah)
Date: Sun, 11 Feb 2007 14:56:53 -0600
Subject: [Bioperl-l] wiki: Database error when attempting to change
	preferences (1054: Unknown column 'user_newpass_time')
In-Reply-To: <E54012AF-CA57-4FA3-B70D-E135D66107ED@uiuc.edu>
References: <9CEA9E3C-25A6-4DEF-902D-3FD997B81C9C@jays.net>
	<9B38AD87-3EB4-4658-A24B-ACC75F7F8298@jays.net>
	<3B80F494-925F-454A-8249-1565A4D50EF6@bioperl.org>
	<E54012AF-CA57-4FA3-B70D-E135D66107ED@uiuc.edu>
Message-ID: <CAF40EBD-F0E2-434C-91F4-2B766B20E734@jays.net>

On Feb 11, 2007, at 2:51 PM, Chris Fields wrote:
> Is there a good place on the main wiki page to prominently display  
> this?  I wanted to place something at the top of the main page but  
> I didn't know if we wanted to post the support email address on the  
> page itself.

I added it here:

http://www.bioperl.org/wiki/About_site

Which is linked from all pages via the left-hand bar:  community |  
About this site

j
seqlab.net
http://www.bioperl.org/wiki/User:Jhannah


From agd27 at cornell.edu  Sun Feb 11 12:47:03 2007
From: agd27 at cornell.edu (Adam Diehl)
Date: Sun, 11 Feb 2007 12:47:03 -0500
Subject: [Bioperl-l] Getting GFF output in UCSC-specific format
Message-ID: <45CF5697.60703@cornell.edu>

Good morning folks,

I've got sort of a newbie question regarding how to get gff's out of 
Bio::Tools::GFF objects that are formatted according to the UCSC browser
conventions, described here:

http://genome.ucsc.edu/goldenPath/help/customTrack.html#GFF
(Ignore the custom track headers and what-not. I just need the fields to 
be set up according to the descriptions in 1 - 9).

The write_feature($feature) method isn't doing it for me, as I get lines 
like the following (newlines excepted):

chr1    EMBL/GenBank/SwissProt  gene    1712    2848    .       +       .       db_xref=GeneID:4063728;gene=dnaN;locus_tag=MGAS10270_Spy0002
chr1    EMBL/GenBank/SwissProt  CDS     1712    2848    .       +       .       EC_number=2.7.7.7;codon_start=1;db_xref=GI:94989511,GeneID:4063728;gene=dnaN;locus_tag=MGAS10270_Spy0002;product=DNA+polymerase+III%2C+beta+chain;protein_id=YP_597611.1;transl_table=11;translation=MIQFSINRTLFIHALNATKRAISTKNAIPILSSIKIEVTSTGVTLTGSNGQISIENTIPVSNENAGLLITSPGAILLEASFFINIISSLPDISINVKEIEQHQVVLTSGKSEITLKGKDVDQYPRLQEVSTENPLILKTKLLKSIIAETAFAASLQESRPILTGVHIVLSNHKDFKAVATDSHRMSQRLITLENTSADFDVVIPSKSLREFSAVFTDDIETVEVFFSPSQILFRSEHISFYTRLLEGNYPDTDRLLMTEFETEVVFNTQSLRHAMERAFLISNATQNGTVKLEITQNHISAHVNSPEVGKVNEDLDIVSQSGSDLTISFNPTYLIESLKAIKSETVKIHFLSPVRPFTLTPGDEEESFIQLITPVRTN

As you can see, field 8, which should be frame according to UCSC
conventions, is blank, and field 9, group according to UCSC, has frame,
along with ID, etc. All this extra stuff causes the UCSC browser to 
choke. First off, it can't identify which features are the same (it does 
this by matching the group field), and second, it can't interpret the 
CDS's into translated proteins because it lacks frame data.

Basically what I need to do is, for CDS features, extract frame (or 
codon_start, as it were), from the last field, parse out the integer 
value and store that in field 8 (as frame), then parse out locus_tag 
from the last field, clear out everything else and store the locus_tag 
only in that field (preferably without the qualifier locus_tag=). For 
feature type gene, I just want to do the last step, so that the gene and 
CDS features for the same feature have matching group fields that are as 
simple as possible. Let me know if this is not clear.

The way I've been trying to do this is by stringifying each gff object, 
splitting into an array, @tmp1, splitting @tmp1[8] into @tmp2 with the 
following code:  my @tmp2 = split /\;\, $tmp1[8]; and finally, trying to 
parse out the bits I need with regular expressions and store back to 
@tmp1[n].  -- This does not work, because perl wants to interpret every 
/ + etc. as a metacharacter!
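For reference, a plain split /;/ is enough to break up the ninth column: a semicolon is not a regex metacharacter, and since the values in the sample output are URL-escaped (+, %2C), splitting on ";" and then on the first "=" recovers each key/value pair. The following minimal sketch rewrites one line of write_feature output into the layout described above. It assumes the columns are tab-separated, that GenBank's 1-based codon_start (1, 2 or 3) maps to GFF's 0-based frame as codon_start - 1, and that the bare locus_tag is what should go in the group column; the helper name to_ucsc_gff is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: convert one tab-separated GFF line whose 9th column
# holds key=value pairs into a UCSC-style line with the frame in column 8
# and only the bare locus_tag in column 9.
sub to_ucsc_gff {
    my ($line) = @_;
    chomp $line;
    my @col = split /\t/, $line, 9;
    return unless @col == 9;

    # Parse the attribute column; ';' needs no escaping in a regex.
    my %attr;
    for my $pair (split /;/, $col[8]) {
        my ($key, $value) = split /=/, $pair, 2;
        $attr{$key} = $value if defined $value;
    }

    # GenBank codon_start is 1-based (1..3); GFF frame is 0-based (0..2).
    if ($col[2] eq 'CDS' && defined $attr{codon_start}) {
        $col[7] = $attr{codon_start} - 1;
    }

    # Keep only the locus_tag as the group, so gene and CDS lines match.
    $col[8] = defined $attr{locus_tag} ? $attr{locus_tag} : '.';
    return join("\t", @col);
}

my $cds = join("\t", 'chr1', 'EMBL/GenBank/SwissProt', 'CDS', 1712, 2848,
    '.', '+', '.',
    'EC_number=2.7.7.7;codon_start=1;gene=dnaN;locus_tag=MGAS10270_Spy0002');
print to_ucsc_gff($cds), "\n";   # frame 0, group MGAS10270_Spy0002
```

Working on the parsed key/value pairs rather than on regexes over the whole line also avoids escaping problems with characters such as / and + in the values.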

I am assuming there's a simple way to get at each value in the last 
field of the gff object using methods supplied by Bio::Tools::GFF, but 
the API docs seem a bit lacking in this area. Could anyone steer me 
towards what I need to know to do this? Please let me know if I can 
clarify any details!

Cheers,
Adam Diehl


From jason at bioperl.org  Sun Feb 11 18:29:16 2007
From: jason at bioperl.org (Jason Stajich)
Date: Sun, 11 Feb 2007 15:29:16 -0800
Subject: [Bioperl-l] Getting GFF output in UCSC-specific format
In-Reply-To: <45CF5697.60703@cornell.edu>
References: <45CF5697.60703@cornell.edu>
Message-ID: <F6B017A7-E91F-4739-9688-F1212EC857C8@bioperl.org>

I assume you are getting your features from a Bio::SeqIO parse of a  
Genbank file?

You get back Bio::SeqFeature::Generic objects, so you want to look
at the docs for that module to see what the API is.
You will need to set the frame via
$feature->frame($frame)
You are going to have to determine the frame yourself if that isn't
part of the feature; we don't calculate it for you.

For the 9th column, this is available through the tag methods
has_tag, add_tag_value, get_tag_values, get_all_tags, and remove_tag,
so you can remove the tags you don't want through remove_tag (or
remove them all, as here, keeping only the locus_tag value):
my $locus;
for my $tag ( $feature->get_all_tags ) {
  if( $tag eq 'locus_tag' ) { # save the locus_tag when we see it
   ($locus) = $feature->get_tag_values($tag);
  }
  $feature->remove_tag($tag);
}

You will also want to set the GFF version (-gff_version) when you
create the Bio::Tools::GFF object - I think the UCSC site only supports
GFF1. I don't know exactly how the tags get written then, since they
aren't paired as key=>value, but you'll need to set the tag to 'group', so
$feature->add_tag_value('group', $locus);

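If this is all unsatisfactory, you can easily write your own GFF
writer for your flavor of the data with something like:
my $strand = $feature->strand || 0;
print join("\t",
                  $feature->seq_id,
                  $feature->source_tag,
                  $feature->primary_tag,
                  $feature->start,
                  $feature->end,
                  defined $feature->score ? $feature->score : '.',
                  $strand > 0 ? '+' : $strand < 0 ? '-' : '.',
                  defined $feature->frame ? $feature->frame : '.',
                  $locus), "\n";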


-jason
On Feb 11, 2007, at 9:47 AM, Adam Diehl wrote:

> Good morning folks,
>
> I've got sort of a newbie question regarding how to get GFFs out of
> Bio::Tools::GFF objects that are formatted according to the UCSC browser
> conventions, described here:
>
> http://genome.ucsc.edu/goldenPath/help/customTrack.html#GFF
> (Ignore the custom track headers and what-not. I just need the fields to
> be set up according to the descriptions in 1 - 9).
>
> The write_feature($feature) method isn't doing it for me, as I get lines
> like the following (newlines excepted):
>
> chr1    EMBL/GenBank/SwissProt  gene    1712    2848    .       +       .       db_xref=GeneID:4063728;gene=dnaN;locus_tag=MGAS10270_Spy0002
> chr1    EMBL/GenBank/SwissProt  CDS     1712    2848    .       +       .       EC_number=2.7.7.7;codon_start=1;db_xref=GI:94989511,GeneID:4063728;gene=dnaN;locus_tag=MGAS10270_Spy0002;product=DNA+polymerase+III%2C+beta+chain;protein_id=YP_597611.1;transl_table=11;translation=MIQFSINRTLFIHALNATKRAISTKNAIPILSSIKIEVTSTGVTLTGSNGQISIENTIPVSNENAGLLITSPGAILLEASFFINIISSLPDISINVKEIEQHQVVLTSGKSEITLKGKDVDQYPRLQEVSTENPLILKTKLLKSIIAETAFAASLQESRPILTGVHIVLSNHKDFKAVATDSHRMSQRLITLENTSADFDVVIPSKSLREFSAVFTDDIETVEVFFSPSQILFRSEHISFYTRLLEGNYPDTDRLLMTEFETEVVFNTQSLRHAMERAFLISNATQNGTVKLEITQNHISAHVNSPEVGKVNEDLDIVSQSGSDLTISFNPTYLIESLKAIKSETVKIHFLSPVRPFTLTPGDEEESFIQLITPVRTN
>
> As you can see, field 8, which should be frame according to UCSC
> conventions is blank, and field 9, group according to UCSC, has frame,
> along with ID, etc. All this extra stuff causes the UCSC browser to
> choke. First off, it can't identify which features are the same (it does
> this by matching the group field), and second, it can't interpret the
> CDS's into translated proteins because it lacks frame data.
>
> Basically what I need to do is, for CDS features, extract frame (or
> codon_start, as it were), from the last field, parse out the integer
> value and store that in field 8 (as frame), then parse out locus_tag
> from the last field, clear out everything else and store the locus_tag
> only in that field (preferably without the qualifier locus_tag=). For
> feature type gene, I just want to do the last step, so that the gene and
> CDS features for the same feature have matching group fields that are as
> simple as possible. Let me know if this is not clear.
>
> The way I've been trying to do this is by stringifying each gff object,
> splitting into an array, @tmp1, splitting $tmp1[8] into @tmp2 with the
> following code: my @tmp2 = split /;/, $tmp1[8]; and finally, trying to
> parse out the bits I need with regular expressions and store back to
> $tmp1[n]. -- This does not work, because perl wants to interpret every
> / + etc. as a metacharacter!
>
> I am assuming there's a simple way to get at each value in the last
> field of the gff object using methods supplied by Bio::Tools::GFF, but
> the API docs seem a bit lacking in this area. Could anyone steer me
> towards what I need to know to do this? Please let me know if I can
> clarify any details!
>
> Cheers,
> Adam Diehl
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
Miller Research Fellow
University of California, Berkeley
lab: 510.642.8441
http://pmb.berkeley.edu/~taylor/people/js.html
http://fungalgenomes.org/


From bix at sendu.me.uk  Sun Feb 11 18:39:15 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Sun, 11 Feb 2007 23:39:15 +0000
Subject: [Bioperl-l] WrapperBase
In-Reply-To: <Pine.LNX.4.58.0702101558180.16119@localhost.localdomain>
References: <Pine.LNX.4.58.0702091905300.3237@localhost.localdomain>	<45CCF803.9030004@sendu.me.uk>
	<Pine.LNX.4.58.0702101558180.16119@localhost.localdomain>
Message-ID: <45CFA923.8010201@sendu.me.uk>

Caroline Johnston wrote:
>> No, I think not. That would be very annoying when using wrappers for
>> programs that you just have in your system path.
> 
> Hmm, maybe I misunderstood what the program_path was for? The executable
> method goes straight to the system path unless program_path is set, so I
> assumed you would only set program_path if you specifically wanted it to
> look somewhere else. You wouldn't get a warning if you didn't specify a
> program_path and just left it to look in the system path.

Yes, sorry. Having now actually looked at your patch, it seems fine. I'll
commit it unless someone beats me to it.

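The lookup order Caroline describes (use program_path if it is set, otherwise search the system path) can be sketched in plain Perl. This is a hypothetical mimic for illustration, not Bio::Tools::Run::WrapperBase's actual code; find_executable is a made-up name:

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical mimic of the lookup described above: an explicitly set
# program path wins; otherwise search each directory listed in PATH.
sub find_executable {
    my ($name, $program_path) = @_;
    if (defined $program_path) {
        my $ok = -f $program_path && -x _;
        return $ok ? $program_path : undef;
    }
    for my $dir (File::Spec->path) {
        my $candidate = File::Spec->catfile($dir, $name);
        return $candidate if -f $candidate && -x _;
    }
    return;    # not found anywhere
}
```

With this order, nobody who relies on PATH ever sees a warning about program_path, which matches the behaviour discussed above.
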

From flope004 at hotmail.com  Sun Feb 11 21:40:08 2007
From: flope004 at hotmail.com (Wolverine Fran)
Date: Mon, 12 Feb 2007 03:40:08 +0100
Subject: [Bioperl-l] TreeIO, how it works?
Message-ID: <BAY125-F39E82B0D8378C73702B91C8B910@phx.gbl>

Hi,

I have a problem. I don't understand how TreeIO reads the trees:
my input: ((dog:0.04,cat:0.08):0.12,(human:0.15,mouse:0.2));

An unrooted tree with 4 tips and 2 internal nodes.
When I asked for:
print "Total number of nodes ",$tree->number_nodes;

I get 6 but when I ask for:
foreach my $node (@nodes) {
	print $node->internal_id,",";
}
I get 6,0,1,2,3,4,5. Total 7.

The root is number 6, and 2 and 5 are my internal nodes.
If I set the root to be number 5, node 6 is still present.
Why? What is node 6?

When I try the following:
  my $node5 = $tree->find_node(-internal_id => '5');
  my $node6 = $tree->find_node(-internal_id => '6');
  my $node2 = $tree->find_node(-internal_id => '2');
  my $distance1 = $tree->distance(-nodes => [$node5, $node2]);
  my $distance2 = $tree->distance(-nodes => [$node5, $node6]);
  my $distance3 = $tree->distance(-nodes => [$node2, $node6]);
or any other distance, I get 2 warnings:
  -------------------- WARNING ---------------------
MSG: Must provide a valid array reference for -nodes
---------------------------------------------------

-------------------- WARNING ---------------------
MSG: Could not find distance!
---------------------------------------------------
What am I doing incorrectly?

I am practicing with AlignIO and TreeIO to calculate the maximum likelihood
of a given tree, so any other information about that would be of great help.
I am practicing with this to see how BioPerl can help me with more complex
problems.

Thank you very much for your help!



From jason at bioperl.org  Sun Feb 11 22:05:18 2007
From: jason at bioperl.org (Jason Stajich)
Date: Sun, 11 Feb 2007 19:05:18 -0800
Subject: [Bioperl-l] TreeIO, how (does) it work?
In-Reply-To: <BAY125-F39E82B0D8378C73702B91C8B910@phx.gbl>
References: <BAY125-F39E82B0D8378C73702B91C8B910@phx.gbl>
Message-ID: <60D2DD3B-2ED1-4A92-A9FA-2875FEAA28CC@bioperl.org>


On Feb 11, 2007, at 6:40 PM, Wolverine Fran wrote:

> Hi,
>
> I have a problem. I don't understand how TreeIO reads the trees:
> my input: ((dog:0.04,cat:0.08):0.12,(human:0.15,mouse:0.2));
>
> An unrooted tree with 4 tips and 2 internal nodes.
> when I asked for:
> print "Total number of nodes ",$tree->number_nodes;
>
> I get 6 but when I ask for:
> foreach my $node (@nodes) {
> 	print $node->internal_id,",";
> }
> I get 6,0,1,2,3,4,5. Total 7.
>
> The root is number 6 and 2 and 5 are my internal nodes.
> If I set the root to be number 5 this node 6 is still present.
> Why? what is the node 6?

Node 6 holds the root, or a fake root with a trifurcation for
unrooted trees. Did you actually call the reroot method to set the
root to node 5?

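The seven internal IDs follow from the Newick string itself: every parenthesised group, including the outermost one that stands for the (possibly fake) root, is a node, and so is every leaf. A rough, hypothetical counter (not part of Bio::TreeIO, and not a full Newick parser, since it ignores quoted labels and comments) shows that both the unrooted and the rooted strings in this thread describe 3 + 4 = 7 nodes:

```perl
use strict;
use warnings;

# Count nodes in a simple Newick string: each '(' opens one internal node
# (the outermost pair is the root), and each leaf label directly follows
# a '(' or ','. Quoted labels and [comments] are not handled.
sub count_newick_nodes {
    my ($newick) = @_;
    my $internal = () = $newick =~ /\(/g;
    my $tips     = () = $newick =~ /[(,]\s*[^(),:;\s]/g;
    return ($internal, $tips, $internal + $tips);
}

my ($internal, $tips, $total) =
    count_newick_nodes('((dog:0.04,cat:0.08):0.12,(human:0.15,mouse:0.2));');
print "internal=$internal tips=$tips total=$total\n";   # internal=3 tips=4 total=7
```

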
>
> when I try the following:
>   $node5 = $tree->find_node(-internal_id => '5');
>   $node6 = $tree->find_node(-internal_id => '6');
>   $node2 = $tree->find_node(-internal_id => '2');
>   $distance1 = $tree->distance(-nodes =>[$node5,$node2]);
>   $distance2 = $tree->distance(-nodes =>[$node5,$node6]);
>   $distance3 = $tree->distance(-nodes =>[$node2,$node6]);
>   or any other distance I get 2 warnings:
>   -------------------- WARNING ---------------------
> MSG: Must provide a valid array reference for -nodes
> ---------------------------------------------------
>
> -------------------- WARNING ---------------------
> MSG: Could not find distance!
> ---------------------------------------------------
> What am I doing incorrectly?
>
The distance method just sums branch lengths on the path
between two nodes. Is that what you are trying to do?

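As a rough illustration of "summing branch lengths on the path between two nodes", here is a plain-Perl sketch over a hand-built parent table for the example tree. The internal node names (dogcat, humouse, root) are made up, the missing branch length on the (human,mouse) clade is treated as 0, and this is not how Bio::Tree::Tree stores its nodes:

```perl
use strict;
use warnings;

# node => [parent, branch length to parent]; the root has no entry.
my %parent = (
    dog     => ['dogcat',  0.04],
    cat     => ['dogcat',  0.08],
    dogcat  => ['root',    0.12],
    human   => ['humouse', 0.15],
    mouse   => ['humouse', 0.2],
    humouse => ['root',    0],
);

# Distance from $node to itself and to each of its ancestors.
sub path_to_root {
    my ($node) = @_;
    my %dist  = ($node => 0);
    my $total = 0;
    while (my $up = $parent{$node}) {
        $total += $up->[1];
        $node = $up->[0];
        $dist{$node} = $total;
    }
    return %dist;
}

# Walk up from $y until we hit an ancestor of $x (the lowest common
# ancestor), then add the distance from $x to that ancestor.
sub tree_distance {
    my ($x, $y) = @_;
    my %from_x = path_to_root($x);
    my $d    = 0;
    my $node = $y;
    until (exists $from_x{$node}) {
        my $up = $parent{$node} or die "no common ancestor\n";
        $d   += $up->[1];
        $node = $up->[0];
    }
    return $d + $from_x{$node};
}
```

For example, dog to human goes dog, dogcat, root, humouse, human, i.e. 0.04 + 0.12 + 0 + 0.15 = 0.31.
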
The error message you report doesn't make sense as
"Must provide a valid array reference for -nodes"
is only printed when you call is_monophyletic or is_paraphyletic as  
far as I can tell.

What version of BioPerl are you using?

> I am practicing with AlignIO and TreeIO to calculate the maximum likelihood
> for a given tree. So, other information about that would be of great help.
> I am practicing with this to see how Bioperl can help me with more complex
> problems.
>
Are you trying to calculate the likelihood of a tree, or are you
trying to generate an ML tree from an alignment?

> Thank you very much for your help!
>

--
Jason Stajich
Miller Research Fellow
University of California, Berkeley
lab: 510.642.8441
http://pmb.berkeley.edu/~taylor/people/js.html
http://fungalgenomes.org/


From er at xs4all.nl  Mon Feb 12 08:03:06 2007
From: er at xs4all.nl (Erik)
Date: Mon, 12 Feb 2007 14:03:06 +0100 (CET)
Subject: [Bioperl-l] bioperl wiki changes rss / atom
In-Reply-To: <AF76B50B-AF7A-41D1-A64E-D993D8CC4C86@uiuc.edu>
References: <162150.76282.qm@web56511.mail.re3.yahoo.com>
	<AF76B50B-AF7A-41D1-A64E-D993D8CC4C86@uiuc.edu>
Message-ID: <20181.156.83.0.124.1171285386.squirrel@webmail.xs4all.nl>

Hi,


The bioperl wiki recent-changes RSS/Atom feed has two leading empty lines,
which invalidate the XML:

XML Parsing Error: xml declaration not at start of external entity
Location:
http://www.bioperl.org/w/index.php?title=Special:Recentchanges&feed=rss
Line Number 3, Column 1:<?xml version="1.0" encoding="utf-8"?>
^

Could those be removed? (I didn't see a way to do it myself). Might be a
useful feed :)
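
Until the server side is fixed, a feed reader can work around it: XML only allows the <?xml ...?> declaration at the very start of the document, so stripping any whitespace in front of it is enough. A minimal client-side sketch (the function name is hypothetical):

```perl
use strict;
use warnings;

# Client-side workaround sketch: remove whitespace (such as the feed's two
# leading blank lines) that precedes the XML declaration, so a parser sees
# <?xml ...?> at the very start of the document.
sub strip_space_before_xml_decl {
    my ($text) = @_;
    $text =~ s/\A\s+(?=<\?xml)//;
    return $text;
}
```
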


thanks,

Erik


From cjfields at uiuc.edu  Mon Feb 12 09:52:44 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 12 Feb 2007 08:52:44 -0600
Subject: [Bioperl-l] bioperl wiki changes rss / atom
In-Reply-To: <20181.156.83.0.124.1171285386.squirrel@webmail.xs4all.nl>
References: <162150.76282.qm@web56511.mail.re3.yahoo.com>
	<AF76B50B-AF7A-41D1-A64E-D993D8CC4C86@uiuc.edu>
	<20181.156.83.0.124.1171285386.squirrel@webmail.xs4all.nl>
Message-ID: <DA1A57C0-32B5-4095-AB80-318B5F529730@uiuc.edu>

I have forwarded this to support at open-bio.org, which should take  
care of it.

chris

On Feb 12, 2007, at 7:03 AM, Erik wrote:

> Hi,
>
>
> The bioperl wiki changes rss / atom feed has two leading empty lines which
> invalidate the xml:
>
> XML Parsing Error: xml declaration not at start of external entity
> Location:
> http://www.bioperl.org/w/index.php?title=Special:Recentchanges&feed=rss
> Line Number 3, Column 1:<?xml version="1.0" encoding="utf-8"?>
> ^
>
> Could those be removed? (I didn't see a way to do it myself). Might be a
> useful feed :)
>
>
> thanks,
>
> Erik
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From sm8 at sanger.ac.uk  Mon Feb 12 12:12:00 2007
From: sm8 at sanger.ac.uk (Stephen Montgomery)
Date: Mon, 12 Feb 2007 17:12:00 -0000
Subject: [Bioperl-l] subtract for Bio::RangeI.pm
Message-ID: <A8AB69F227E96F4DBED773D3D70A295B02FCF830@exchsrv2.internal.sanger.ac.uk>

Hi -

Here is a subtract function for the Bio::RangeI class (to be added if
there is interest).

All the best!
Stephen Montgomery


//ADD TO BIO::RANGEI


=head2 subtract

  Title   : subtract
  Usage   : my $subtracted = $r1->subtract($r2)
  Function: Subtract range r2 from range r1
  Args    : arg #1 = a range to subtract from this one (mandatory)
            arg #2 = strand option ('strong', 'weak', 'ignore') (optional)
  Returns : undef if they do not overlap, if r2 contains this RangeI, or
            if the strands are incompatible under the strand option;
            otherwise an arrayref of Range objects (an array because
            subtracting a range enclosed within this one produces two
            new disjoint ranges)

=cut

sub subtract {
    my ($self, $range, $so) = @_;
    $self->throw("missing arg: you need to pass in another feature")
        unless $range;
    $self->throw("Input a Bio::RangeI object")
        unless ref($range) && $range->isa('Bio::RangeI');

    if ($self eq "Bio::RangeI") {
        $self = "Bio::Range";
        $self->warn("calling static methods of an interface is deprecated; use $self instead");
    }

    # Incompatible strands under the requested strand option
    return unless $self->_testStrand($range, $so);

    # Nothing to subtract if the ranges do not overlap
    return unless $self->overlaps($range);

    # $range covers all of $self: nothing is left
    return if $range->contains($self);

    # Part of $self that overlaps $range
    my ($start, $end) = $self->intersection($range, $so);

    # Keep whatever lies outside the intersection, on either side
    my @outranges;
    if ($self->start < $start) {
        push @outranges, $self->new(-start  => $self->start,
                                    -end    => $start - 1,
                                    -strand => $self->strand);
    }
    if ($self->end > $end) {
        push @outranges, $self->new(-start  => $end + 1,
                                    -end    => $self->end,
                                    -strand => $self->strand);
    }
    return \@outranges;
}

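The interval arithmetic at the heart of subtract() can be checked without BioPerl. This is a plain-Perl sketch using [start, end] array refs with inclusive coordinates, ignoring strand; subtract_interval is a hypothetical name, and it reproduces the coordinates expected by the unit test below:

```perl
use strict;
use warnings;

# Return the pieces of $r1 left after removing $r2. Intervals are
# [start, end] array refs with inclusive coordinates; strand is ignored.
sub subtract_interval {
    my ($r1, $r2) = @_;
    my ($s1, $e1) = @$r1;
    my ($s2, $e2) = @$r2;

    # No overlap: nothing is removed.
    return ([$s1, $e1]) if $e2 < $s1 || $s2 > $e1;

    # Intersection of the two intervals.
    my $is = $s1 > $s2 ? $s1 : $s2;
    my $ie = $e1 < $e2 ? $e1 : $e2;

    my @pieces;
    push @pieces, [$s1, $is - 1] if $s1 < $is;   # left remainder
    push @pieces, [$ie + 1, $e1] if $e1 > $ie;   # right remainder
    return @pieces;
}
```

One design difference: the proposed method returns undef when the ranges do not overlap, whereas this sketch returns the untouched interval, so "nothing left" (an empty list) and "nothing removed" stay distinguishable.
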

//UNIT TEST

#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqFeature::Generic;
use Test;

plan tests => 13;

my $feature1 = Bio::SeqFeature::Generic->new(-start => 1,   -end => 1000, -strand => 1);
my $feature2 = Bio::SeqFeature::Generic->new(-start => 100, -end => 900,  -strand => -1);

# feature2 lies inside feature1: expect two pieces, 1..99 and 901..1000
my $subtracted = $feature1->subtract($feature2);
ok(defined($subtracted));
ok(scalar(@$subtracted) == 2);
foreach my $range (@$subtracted) {
    ok($range->start == 1 || $range->start == 901);
    ok($range->end == 99 || $range->end == 1000);
}

# feature1 covers all of feature2: nothing is left
$subtracted = $feature2->subtract($feature1);
ok(!defined($subtracted));

# opposite strands are incompatible under 'weak' and 'strong'
$subtracted = $feature1->subtract($feature2, 'weak');
ok(!defined($subtracted));
$subtracted = $feature1->subtract($feature2, 'strong');
ok(!defined($subtracted));

# partial overlap on the right: expect a single piece, 1..499
my $feature3 = Bio::SeqFeature::Generic->new(-start => 500, -end => 1500, -strand => 1);
$subtracted = $feature1->subtract($feature3);
ok(defined($subtracted));
ok(scalar(@$subtracted) == 1);
my $subtracted_i = $subtracted->[0];
ok($subtracted_i->start == 1);
ok($subtracted_i->end == 499);


From sm8 at sanger.ac.uk  Mon Feb 12 11:04:41 2007
From: sm8 at sanger.ac.uk (Stephen Montgomery)
Date: Mon, 12 Feb 2007 16:04:41 -0000
Subject: [Bioperl-l] subtract for Bio::RangeI.pm
Message-ID: <A8AB69F227E96F4DBED773D3D70A295B02FCF754@exchsrv2.internal.sanger.ac.uk>


From flope004 at hotmail.com  Mon Feb 12 13:07:12 2007
From: flope004 at hotmail.com (Wolverine Fran)
Date: Mon, 12 Feb 2007 19:07:12 +0100
Subject: [Bioperl-l] TreeIO, how (does) it work?
In-Reply-To: <60D2DD3B-2ED1-4A92-A9FA-2875FEAA28CC@bioperl.org>
Message-ID: <BAY125-F384316569877D96C05FDC28B910@phx.gbl>

Thanks for your reply!

I am using Bioperl 1.4.

>Node 6 is to hold the root or a fake root with a trifurcation for
>unrooted trees.  Did you actually call the reroot method to set the
>root to node 5?

Yes, I tried the following with the same result:
$tree->reroot($tree->find_node(-internal_id => '5'));
or
$tree->set_root_node($tree->find_node(-internal_id => '5'));

Even if I use a rooted tree:
(((dog:0.04,cat:0.08):0.12,human:0.15):0.1,mouse:0.1);
I still get node #6. So, is it always present? Am I not representing a
rooted tree properly in Newick format?

>The distance method is just summing branch lengths on the path
>between two nodes.  Is that what are you trying to do?
>
>The error message you report doesn't make sense as
>"Must provide a valid array reference for -nodes"
>is only printed when you call is_monophyletic or is_paraphyletic as
>far as I can tell.

I do not know yet what I was doing incorrectly, but now it works. Yes, I was
using the distance method to find out where node 6 was located. For the
unrooted tree, node 6 was node 5 (an internal node), and for the rooted tree
node 6 was 0.1 from the mouse leaf and the internal node (root).
The error message "Must provide a valid array reference for -nodes" is
shown if I indicate a node which is not present in the tree.

>You are trying to calculate the likelihood of a tree or are you
>trying to generate a ML tree from an alignment?

I am trying to calculate the likelihood of a tree, as a practice. Probably 
there are other  bioperl modules, besides AlignIO and TreeIO, which can help 
me in the process and I do not know them.

Again, thank you for your time!



From dmessina at wustl.edu  Mon Feb 12 12:49:49 2007
From: dmessina at wustl.edu (David Messina)
Date: Mon, 12 Feb 2007 11:49:49 -0600
Subject: [Bioperl-l] subtract for Bio::RangeI.pm
In-Reply-To: <A8AB69F227E96F4DBED773D3D70A295B02FCF754@exchsrv2.internal.sanger.ac.uk>
References: <A8AB69F227E96F4DBED773D3D70A295B02FCF754@exchsrv2.internal.sanger.ac.uk>
Message-ID: <1574ACCF-92D5-4DEC-AD04-14EB7767F22A@wustl.edu>

Stephen,

Great, thanks for this. Could you submit it to Bugzilla as an  
enhancement?

http://bugzilla.open-bio.org/


Thanks,
Dave


From jason at bioperl.org  Mon Feb 12 13:38:11 2007
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 12 Feb 2007 10:38:11 -0800
Subject: [Bioperl-l] TreeIO, how (does) it work?
In-Reply-To: <BAY125-F384316569877D96C05FDC28B910@phx.gbl>
References: <BAY125-F384316569877D96C05FDC28B910@phx.gbl>
Message-ID: <BD0EF8B4-69A9-468E-A722-1110B02D0EF7@bioperl.org>

I would definitely suggest getting hold of BioPerl 1.5.2, as I seem
to remember there are several fixes in the tree module code for
re-rooting a tree.
-jason

On Feb 12, 2007, at 10:07 AM, Wolverine Fran wrote:

> thanks for your reply!
>
> I am using Bioperl 1.4.
>
>> Node 6 is to hold the root or a fake root with a trifurcation for
>> unrooted trees.  Did you actually call the reroot method to set the
>> root to node 5?
>
> Yes, I tried the following with the same result:
> $tree->reroot($tree->find_node(-internal_id => '5'));
> or
> $tree->set_root_node($tree->find_node(-internal_id => '5'));
>
> Even if I use a rooted tree: (((dog:0.04,cat:0.08):0.12,human:0.15):0.1,mouse:0.1);
> I get the node #6. So, is it always present? Am I not representing
> properly a rooted tree  in newick format?
>
>> The distance method is just summing branch lengths on the path
>> between two nodes.  Is that what are you trying to do?
>>
>> The error message you report doesn't make sense as
>> "Must provide a valid array reference for -nodes"
>> is only printed when you call is_monophyletic or is_paraphyletic as
>> far as I can tell.
>
> I do not know yet what I was doing incorrectly but now It works.  
> Yes, I was using the distance method to know where the node 6 was  
> located. For the unrooted tree, node 6 was node 5 (an internal  
> node) and for the rooted tree node 6 was 0.1 from the mouse leaf  
> and the internal node (root).
> The error message: "Must provide a valid array reference for -nodes"
> is shown if I indicate a node which is not present in the tree.
>
>> You are trying to calculate the likelihood of a tree or are you
>> trying to generate a ML tree from an alignment?
>
> I am trying to calculate the likelihood of a tree, as a practice.  
> Probably there are other  bioperl modules, besides AlignIO and  
> TreeIO, which can help me in the process and I do not know them.
>
> Again, thank you for your time!
>
>

--
Jason Stajich
Miller Research Fellow
University of California, Berkeley
lab: 510.642.8441
http://pmb.berkeley.edu/~taylor/people/js.html
http://fungalgenomes.org/


From johnsonm at gmail.com  Mon Feb 12 18:13:09 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Mon, 12 Feb 2007 17:13:09 -0600
Subject: [Bioperl-l] Bio::Tools::Glimmer
In-Reply-To: <ebf5eb170702070850k10f1d24fr2d93d61534eeaaff@mail.gmail.com>
References: <ebf5eb170609190833r5fa138e9gc6c51346e5f7890c@mail.gmail.com>
	<451095C7.3020905@infotech.monash.edu.au>
	<0FBEEE2B-9641-4EFA-A5D2-0DA839556E4C@gmx.net>
	<ebf5eb170609201247r12eb9475v5cb553e0727163c0@mail.gmail.com>
	<ebf5eb170702061553t29b0d1f2q402582abba032031@mail.gmail.com>
	<20F07AFE-BADF-4BA5-83B1-C99EEBE915C2@bioperl.org>
	<ebf5eb170702070850k10f1d24fr2d93d61534eeaaff@mail.gmail.com>
Message-ID: <ebf5eb170702121513r20759845i21e366f916020baa@mail.gmail.com>

On 2/7/07, Mark Johnson <johnsonm at gmail.com> wrote:
>
>     Well, each format has some unique features.  If the user declines to
> specify the format, I can figure it out, but it will probably involve
> scanning the input file twice.  I'll take a look.
>     I can do all the parsing in one function, in fact I have, just to see
> how nasty it would end up being.  I just can't stomach having the code that
> tightly coupled and hard to read.  In the end it'll probably be three
> functions.  GlimmerM/HMM are pretty close.  Maybe two, Glimmer2 and
> Glimmer3 aren't *that* different, either.


    I've got a 4-in-1 parser roughed in per Chris Fields' suggestion, with two
actual parsing routines (prokaryotic and eukaryotic).  You can specify
-format as an argument to the constructor (Glimmer, GlimmerM, GlimmerHMM), or
it will look through the input until it can figure out what it is looking at.
    I've got one main issue to solve, the rest is just stuff like updating
the POD.  Torsten Seemann very helpfully added example output for all 4
formats to t/data.  Looking at GlimmerHMM.out, the first line is
'GlimmerHMM'.  However, I think there is a bug in the existing
_parse_predictions:

Shouldn't this:

} elsif( /^(Glimmer\S*)$/ ) { # GlimmerHMM has no version
            $source = $1;
            next;
        }

be this instead:

} elsif( /^(GlimmerHMM\S*)$/ ) { # GlimmerHMM has no version
            $source = $1;
            next;
        }


I lifted that bit of code to do format detection...we don't have GlimmerHMM
installed locally, so I'm assuming Torsten's output is correct and the above
is a bug.  Guess I'll go check bugzilla...


From torsten.seemann at infotech.monash.edu.au  Mon Feb 12 21:07:40 2007
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Tue, 13 Feb 2007 13:07:40 +1100
Subject: [Bioperl-l] Bio::Tools::Glimmer
In-Reply-To: <ebf5eb170702121513r20759845i21e366f916020baa@mail.gmail.com>
References: <ebf5eb170609190833r5fa138e9gc6c51346e5f7890c@mail.gmail.com>
	<451095C7.3020905@infotech.monash.edu.au>
	<0FBEEE2B-9641-4EFA-A5D2-0DA839556E4C@gmx.net>
	<ebf5eb170609201247r12eb9475v5cb553e0727163c0@mail.gmail.com>
	<ebf5eb170702061553t29b0d1f2q402582abba032031@mail.gmail.com>
	<20F07AFE-BADF-4BA5-83B1-C99EEBE915C2@bioperl.org>
	<ebf5eb170702070850k10f1d24fr2d93d61534eeaaff@mail.gmail.com>
	<ebf5eb170702121513r20759845i21e366f916020baa@mail.gmail.com>
Message-ID: <a79f6a4b0702121807i47d6360cwed1b0276865d6bd7@mail.gmail.com>

Mark,

>     I've got one main issue to solve, the rest is just stuff like updating
> the POD.  Torsten Seemann very helpfully added example output for all 4
> formats to t/data.  Looking at GlimmerHMM.out, the first line is
> 'GlimmerHMM'.  However, I think there is a bug in the existing
> _parse_predictions:
> Shouldn't this:
> } elsif( /^(Glimmer\S*)$/ ) { # GlimmerHMM has no version
> be this instead:
> } elsif( /^(GlimmerHMM\S*)$/ ) { # GlimmerHMM has no version

I think it should be what it says, or perhaps now /^(Glimmer(M|HMM))/.
Here's why:

I came onto the scene at Glimmer.pm rev 1.4. At that stage it only
parsed GlimmerM. I noted that GlimmerHMM had the same output format as
GlimmerM, except for the first line. So in rev 1.5 I modified the
regexp to match both, i.e. \S*. This would also hopefully match any
other Glimmer-clone formats that arose. I also fixed the pdocs to say
this, and added tests to t/Genpred.t.
% cvs diff -r 1.4 -r 1.5 Bio/Tools/Glimmer.pm
% cvs diff -r 1.15 -r 1.16 t/Genpred.t
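
The difference between \S* (non-whitespace) and \s* (whitespace) is easy to check directly. A small sketch comparing the existing pattern, the tighter /^(Glimmer(M|HMM))/ alternative, and a \s* variant against a 'GlimmerHMM' header line and a bare 'Glimmer'; the helper and its key names are made up for illustration:

```perl
use strict;
use warnings;

# Header patterns from this discussion, keyed by a short name.
my %header_re = (
    existing        => qr/^(Glimmer\S*)$/,     # current _parse_predictions test
    suggested       => qr/^(Glimmer(M|HMM))/,  # tighter alternative
    whitespace_typo => qr/^(Glimmer\s*)$/,     # \s instead of \S
);

# Sorted names of the patterns that match one line.
sub matching_patterns {
    my ($line) = @_;
    my @hits = grep { $line =~ $header_re{$_} } keys %header_re;
    return sort @hits;
}
```

'GlimmerHMM' matches the existing and suggested patterns but not the \s* variant; a bare 'Glimmer' matches the broader existing pattern, in line with the aim of catching other Glimmer-like headers.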

I then planned to extend support to Glimmer2 and Glimmer3. I added the
4 test files (t/Glimmer*.out) but never wrote the code. This is where
you have come in Mark :-)

> I lifted that bit of code to do format detection...we don't have GlimmerHMM
> installed locally, so I'm assuming Torsten's output is correct and the above
> is a bug.  Guess I'll go check bugzilla...

I'm pretty sure my 4 test files are correct - I spent a lot of time
ensuring they were consistent etc, as I was getting very confused with
the different "glimmer" versions!

Hope this all helps,

--Torsten


From avilella at gmail.com  Tue Feb 13 08:20:15 2007
From: avilella at gmail.com (Albert Vilella)
Date: Tue, 13 Feb 2007 13:20:15 +0000
Subject: [Bioperl-l] number of gaps for the other sequences in an alignment
Message-ID: <358f4d650702130520n269419cfkb9cb6dac8feaaa5c@mail.gmail.com>

Hi,

It would be great if we could have a method to count, given one
sequence in an alignment, the number of gaps present in the rest of
the sequences of the alignment. That is, for each
nucleotide/amino-acid position of the sequence of interest, look at
the column in the alignment and count the gaps in the other sequences,
then sum these counts over all the non-gapped columns of the sequence
of interest.

Has anyone tried this before?

My idea is to end up having a coefficient of indel contribution for
each of the sequences in the alignment, with this coefficient being
high when one sequence forces a lot of gaps to be inserted in the
final alignment in order to accommodate it.

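The count described above can be sketched on plain aligned strings before worrying about where it lives in SimpleAlign. This assumes equal-length strings and treats '-' and '.' as gap characters; gaps_forced_by is a hypothetical name, not an existing Bio::SimpleAlign method:

```perl
use strict;
use warnings;

# For sequence $i, visit only the columns where it has a residue and
# count the gaps the other sequences have in those columns.
sub gaps_forced_by {
    my ($aln, $i, $gap_re) = @_;
    $gap_re ||= qr/^[-.]$/;                          # assumed gap characters
    my $ref   = $aln->[$i];
    my $count = 0;
    for my $col (0 .. length($ref) - 1) {
        next if substr($ref, $col, 1) =~ $gap_re;    # gapped in seq $i: skip
        for my $j (0 .. $#$aln) {
            next if $j == $i;
            $count++ if substr($aln->[$j], $col, 1) =~ $gap_re;
        }
    }
    return $count;
}
```

With a Bio::SimpleAlign object, the list of strings could be built with [ map { $_->seq } $aln->each_seq ]. How to turn the raw count into a coefficient (for example, scaling by sequence length) is left open, as in the proposal.
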
I would say that the best place for this is either using methods
already available in SimpleAlign, or have something new added there.

Looking forward to your comments,

Cheers,

    Albert.


From bix at sendu.me.uk  Tue Feb 13 11:09:09 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 13 Feb 2007 16:09:09 +0000
Subject: [Bioperl-l] Bio::DB::SeqFeature::Store problem with transcripts
Message-ID: <45D1E2A5.6060104@sendu.me.uk>

I have some raw sequences in a Bio::DB::SeqFeature::Store mysql database
and wanted to associate some basic information with them, like exon
positions. I thought of creating Bio::SeqFeature::Gene::Transcript
objects and storing them so I could later use features() to see what
other features overlapped exons. I ran into a fatal error that can be
replicated with the following simplified one-liner:

perl -MBio::DB::SeqFeature::Store -MBio::SeqFeature::Gene::Transcript -e '
  $db = Bio::DB::SeqFeature::Store->new(-adaptor => "DBI::mysql", -dsn => "dbi:mysql:test");
  $trans = Bio::SeqFeature::Gene::Transcript->new(-start => 1, -end => 2, -seq_id => "test");
  $db->store($trans);
  @trans = $db->features(-seqid => "test", -type => "transcript");
  print "@trans\n";'

code sub {
     package Bio::SeqFeature::Generic;
     use strict 'refs';
     my $self = shift @_;
     foreach my $f (@{[] unless $$self{'_gsf_sub_array'};}) {
         $f = undef;
     }
     $$self{'_gsf_seq'} = undef;
     foreach my $t (keys %{{} unless $$self{'_gsf_tag_hash'};}) {
         $$self{'_gsf_tag_hash'}{$t} = undef;
         delete $$self{'_gsf_tag_hash'}{$t};
     }
} did not evaluate to a subroutine reference, at 
/.../Bio/DB/SeqFeature/Store.pm line 2280


Is this a bug? Or am I taking the wrong approach?


From johnsonm at gmail.com  Tue Feb 13 15:10:23 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Tue, 13 Feb 2007 14:10:23 -0600
Subject: [Bioperl-l] Bio::Tools::Glimmer
In-Reply-To: <a79f6a4b0702121807i47d6360cwed1b0276865d6bd7@mail.gmail.com>
References: <ebf5eb170609190833r5fa138e9gc6c51346e5f7890c@mail.gmail.com>
	<451095C7.3020905@infotech.monash.edu.au>
	<0FBEEE2B-9641-4EFA-A5D2-0DA839556E4C@gmx.net>
	<ebf5eb170609201247r12eb9475v5cb553e0727163c0@mail.gmail.com>
	<ebf5eb170702061553t29b0d1f2q402582abba032031@mail.gmail.com>
	<20F07AFE-BADF-4BA5-83B1-C99EEBE915C2@bioperl.org>
	<ebf5eb170702070850k10f1d24fr2d93d61534eeaaff@mail.gmail.com>
	<ebf5eb170702121513r20759845i21e366f916020baa@mail.gmail.com>
	<a79f6a4b0702121807i47d6360cwed1b0276865d6bd7@mail.gmail.com>
Message-ID: <ebf5eb170702131210l1fa27bb5n31095e9ca2d4ecf2@mail.gmail.com>

    You're quite correct.  I wasn't paying enough attention.  That does work
just fine.  I fat-fingered something somewhere else, broke my version of the
module for GlimmerHMM, hallucinated and confused \S and \s.  8)
    All I have left now is to fix up the POD documentation and such, and then
I can send the module along so that somebody can make whatever tweaks are
needed and check it in.  Shall I open a ticket in Bugzilla for this and attach
diffs, or just send them along to somebody to take care of directly?
    Oh, one thing I have not mentioned: I also added a -seqname argument.
Glimmer2 does not provide any kind of sequence identifier in its output, and
it only processes the first sequence in a FASTA file.  It would be tedious to
have to code around this by fixing up the predictions after they are
produced, so I added the option to provide this missing information up front,
hopefully sparing downstream code from needing a special case to fix up
Glimmer2 predictions.

On 2/12/07, Torsten Seemann <torsten.seemann at infotech.monash.edu.au> wrote:

> I think it should be what it says, or perhaps now /^(Glimmer(M|HMM))/.
> Here's why:
>
> I came onto the scene at Glimmer.pm rev 1.4. At that stage it only
> parse GlimmerM. I noted that GlimmerHMM was the same output format as
> GlimmerM, except for the first line. So in rev 1.5 I modified the
> regexp to match both ie. \S* . This would also hopefully match any
> other Glimmer-clone formats that arose. I also fixed the pdocs to say
> this, and added tests to t/Genpred.t.
> % cvs diff -r 1.4 -r 1.5 Bio/Tools/Glimmer.pm
> % cvs diff -r 1.15 -r 1.16 t/Genpred.t
>
> I then planned to extend support to Glimmer2 and Glimmer3. I added the
> 4 test files (t/Glimmer*.out) but never wrote the code. This is where
> you have come in Mark :-)
>
> > I lifted that bit of code to do format detection...we don't have GlimmerHMM
> > installed locally, so I'm assuming Torsten's output is correct and the above
> > is a bug.  Guess I'll go check bugzilla...
>
> I'm pretty sure my 4 test files are correct - I spent a lot of time
> ensuring they were consistent etc, as I was getting very confused with
> the different "glimmer" versions!
>
> Hope this all helps,
>
> --Torsten
>
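To make the \S versus \s confusion and the two candidate patterns concrete, here is a small pure-Perl sketch; it is not the actual Bio::Tools::Glimmer code. The header strings are hypothetical stand-ins for the first line of each report type (real GlimmerM/GlimmerHMM headers may look different). `broad` is the \S* pattern Torsten describes, `strict` is his suggested /^(Glimmer(M|HMM))/ written with a non-capturing group, and `typo` shows what a mistyped \s does.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Candidate format-detection patterns discussed in the thread.
my %pattern = (
    broad  => qr/^(Glimmer\S*)/,          # any Glimmer variant
    strict => qr/^(Glimmer(?:M|HMM))/,    # GlimmerM / GlimmerHMM only
    typo   => qr/^(Glimmer\s*)/,          # \s instead of \S
);

# Return the program name captured from a report's first line, or undef.
sub detect_format {
    my ($kind, $first_line) = @_;
    return $first_line =~ $pattern{$kind} ? $1 : undef;
}

# Hypothetical header lines, for illustration only.
for my $line ('GlimmerM', 'GlimmerHMM', 'Glimmer2') {
    for my $kind (qw(broad strict typo)) {
        my $got = detect_format($kind, $line);
        printf "%-11s %-6s => %s\n", $line, $kind,
            defined $got ? $got : '(no match)';
    }
}
```

With \s* every variant collapses to plain "Glimmer", which would defeat any attempt to tell the formats apart, while the strict pattern rejects a Glimmer2 header, so Glimmer2/Glimmer3 support would need its own detection branch.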


From cjfields at uiuc.edu  Tue Feb 13 15:47:19 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 13 Feb 2007 14:47:19 -0600
Subject: [Bioperl-l] Bio::Tools::Glimmer
In-Reply-To: <ebf5eb170702131210l1fa27bb5n31095e9ca2d4ecf2@mail.gmail.com>
References: <ebf5eb170609190833r5fa138e9gc6c51346e5f7890c@mail.gmail.com>
	<451095C7.3020905@infotech.monash.edu.au>
	<0FBEEE2B-9641-4EFA-A5D2-0DA839556E4C@gmx.net>
	<ebf5eb170609201247r12eb9475v5cb553e0727163c0@mail.gmail.com>
	<ebf5eb170702061553t29b0d1f2q402582abba032031@mail.gmail.com>
	<20F07AFE-BADF-4BA5-83B1-C99EEBE915C2@bioperl.org>
	<ebf5eb170702070850k10f1d24fr2d93d61534eeaaff@mail.gmail.com>
	<ebf5eb170702121513r20759845i21e366f916020baa@mail.gmail.com>
	<a79f6a4b0702121807i47d6360cwed1b0276865d6bd7@mail.gmail.com>
	<ebf5eb170702131210l1fa27bb5n31095e9ca2d4ecf2@mail.gmail.com>
Message-ID: <DB4D80A7-1D29-4809-A7D2-FCE5DB2502B4@uiuc.edu>

You'll also want to update whatever relevant tests there are for  
Glimmer; looks like they are in GenPred.t.

chris

On Feb 13, 2007, at 2:10 PM, Mark Johnson wrote:

>     You're quite correct.  I wasn't paying enough attention.  That does work
> just fine.  I fat-fingered something somewhere else, broke my version of the
> module for GlimmerHMM, hallucinated and confused \S and \s.  8)
>     All I have left now is to fixup the POD documentation and such and then
> I can send the module along and somebody can make whatever tweaks and check
> it in.  Shall I open a ticket in Bugzilla for this and attach diffs, or just
> send them along to somebody to take care of directly?
>     Oh, one thing I have not mentioned.  I also added a -seqname argument.
> Glimmer2 does not provide any kind of sequence identifier in the output, and
> only processes the first sequence in a fasta file.  It would be tedious to
> have to code around this by fixing up the predictions after they are
> produced, so I added the option to provide this missing info up front,
> hopefully allowing downstream code to not have to care as much and have a
> special case for fixing up Glimmer2 predictions.
>
> On 2/12/07, Torsten Seemann <torsten.seemann at infotech.monash.edu.au> wrote:
>
>> I think it should be what it says, or perhaps now /^(Glimmer(M|HMM))/.
>> Here's why:
>>
>> I came onto the scene at Glimmer.pm rev 1.4. At that stage it only
>> parse GlimmerM. I noted that GlimmerHMM was the same output format as
>> GlimmerM, except for the first line. So in rev 1.5 I modified the
>> regexp to match both ie. \S* . This would also hopefully match any
>> other Glimmer-clone formats that arose. I also fixed the pdocs to say
>> this, and added tests to t/Genpred.t.
>> % cvs diff -r 1.4 -r 1.5 Bio/Tools/Glimmer.pm
>> % cvs diff -r 1.15 -r 1.16 t/Genpred.t
>>
>> I then planned to extend support to Glimmer2 and Glimmer3. I added the
>> 4 test files (t/Glimmer*.out) but never wrote the code. This is where
>> you have come in Mark :-)
>>
>>> I lifted that bit of code to do format detection...we don't have GlimmerHMM
>>> installed locally, so I'm assuming Torsten's output is correct and the above
>>> is a bug.  Guess I'll go check bugzilla...
>>
>> I'm pretty sure my 4 test files are correct - I spent a lot of time
>> ensuring they were consistent etc, as I was getting very confused with
>> the different "glimmer" versions!
>>
>> Hope this all helps,
>>
>> --Torsten
>>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From thokeller at gmail.com  Tue Feb 13 17:00:06 2007
From: thokeller at gmail.com (Thomas Keller)
Date: Tue, 13 Feb 2007 14:00:06 -0800
Subject: [Bioperl-l] update/install problem
Message-ID: <baa466330702131400q30a65bb4ib5037756fc59a9b7@mail.gmail.com>

Could someone suggest a workaround or fix for this error?

$ sudo fink update bioperl-pm586
Information about 5850 packages read in 2 seconds.
The package 'bioperl-pm586' will be built and installed.
The package 'xml-sax-pm586' will be installed.
The package 'xml-sax-writer-pm586' will be built and installed.
The package 'xml-filter-buffertext-pm586' will be built and installed.
The following package will be installed or updated:
 bioperl-pm586
The following 3 additional packages will be installed:
 xml-filter-buffertext-pm586 xml-sax-pm586 xml-sax-writer-pm586
Do you want to continue? [Y/n] Y
/sw/bin/dpkg-lockwait -i /sw/fink/dists/unstable/main/binary-darwin-powerpc/libs/perlmods/xml-sax-pm586_0.13-2_darwin-powerpc.deb
(Reading database ... 48029 files and directories currently installed.)
Preparing to replace xml-sax-pm586 0.13-2 (using .../xml-sax-pm586_0.13-2_darwin-powerpc.deb) ...
Unpacking replacement xml-sax-pm586 ...
Setting up xml-sax-pm586 (0.13-2) ...
update-perl586-sax-parsers: adding Perl SAX parser module info file of XML::SAX::PurePerl...
Can't locate object method "save_parsers_debian" via package "XML::SAX" at /sw/sbin/update-perl586-sax-parsers line 96.
/sw/bin/dpkg: error processing xml-sax-pm586 (--install):
 subprocess post-installation script returned error exit status 22
Errors were encountered while processing:
 xml-sax-pm586
### execution of /sw/bin/dpkg-lockwait failed, exit code 1
Failed: can't install package xml-sax-pm586-0.13-2


-- 
Tom Keller
"Ecrasez l'Infame!" -- Voltaire


From sac at bioperl.org  Tue Feb 13 18:00:46 2007
From: sac at bioperl.org (Steve Chervitz)
Date: Tue, 13 Feb 2007 15:00:46 -0800
Subject: [Bioperl-l] Bio::Root::Utilities.pm
Message-ID: <8f200b4c0702131500m2f3790nceace3c3e3dffd4f@mail.gmail.com>

I noticed that Bio::Root::Utilities was purged from bioperl-live for the
1.5.2 release, but I'd like us to consider adding it back. I agree that the
other purged Root modules were ancient relics of the past, but
Bio::Root::Utilities.pm still has signs of life (at least I still find
occasion to use it, or refer to code in it).

I know that it's not currently used by any other modules in Bioperl, but
there are likely some legacy scripts out there that rely on it. Probably
most of those scripts are ones I've written, but there have been substantive
commits by others in the not-too-distant past (Dec 2005), so at least some
folks besides myself are using it and may hesitate to upgrade their bioperl
installation if it's absent.

I'm all for avoiding bloat in the codebase and am eager to see Bioperl be
more lean and mean, but I'd like to keep this module around. I'll agree to
add some tests for it as well as clean some things up (e.g., use
Bio::Root::IO to get temp file name).

Cheers,
Steve
--
Steve Chervitz
sac at bioperl.org


From cjfields at uiuc.edu  Tue Feb 13 20:29:03 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 13 Feb 2007 19:29:03 -0600
Subject: [Bioperl-l] update/install problem
In-Reply-To: <baa466330702131400q30a65bb4ib5037756fc59a9b7@mail.gmail.com>
References: <baa466330702131400q30a65bb4ib5037756fc59a9b7@mail.gmail.com>
Message-ID: <C8B7325D-0554-4E37-9B06-9A8344BF5F7E@uiuc.edu>

On Feb 13, 2007, at 4:00 PM, Thomas Keller wrote:

> Could someone suggest a workaround or fix for this error?
>
> $ sudo fink update bioperl-pm586
> Information about 5850 packages read in 2 seconds.
> The package 'bioperl-pm586' will be built and installed.
> The package 'xml-sax-pm586' will be installed.
> The package 'xml-sax-writer-pm586' will be built and installed.
> The package 'xml-filter-buffertext-pm586' will be built and installed.
> The following package will be installed or updated:
>  bioperl-pm586
> The following 3 additional packages will be installed:
>  xml-filter-buffertext-pm586 xml-sax-pm586 xml-sax-writer-pm586
> Do you want to continue? [Y/n] Y
> /sw/bin/dpkg-lockwait -i /sw/fink/dists/unstable/main/binary-darwin-powerpc/libs/perlmods/xml-sax-pm586_0.13-2_darwin-powerpc.deb
> (Reading database ... 48029 files and directories currently installed.)
> Preparing to replace xml-sax-pm586 0.13-2 (using .../xml-sax-pm586_0.13-2_darwin-powerpc.deb) ...
> Unpacking replacement xml-sax-pm586 ...
> Setting up xml-sax-pm586 (0.13-2) ...
> update-perl586-sax-parsers: adding Perl SAX parser module info file of XML::SAX::PurePerl...
> Can't locate object method "save_parsers_debian" via package "XML::SAX" at /sw/sbin/update-perl586-sax-parsers line 96.
> /sw/bin/dpkg: error processing xml-sax-pm586 (--install):
>  subprocess post-installation script returned error exit status 22
> Errors were encountered while processing:
>  xml-sax-pm586
> ### execution of /sw/bin/dpkg-lockwait failed, exit code 1
> Failed: can't install package xml-sax-pm586-0.13-2

The fink installation seems to be failing on XML::SAX, not bioperl.
You could try installing XML::SAX (now at v. 0.15) via CPAN using
'sudo cpan'; I updated just recently w/o problems.

As an aside, you could similarly install bioperl directly from CPAN  
(which I also haven't had any problems with).  The installation  
allows for installing optional modules.

chris
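Before rebuilding through fink or CPAN, it may help to check which XML::SAX is already installed. This is a minimal sketch using only core Perl; `has_min_version` is a hypothetical helper, and it relies on UNIVERSAL's VERSION method, which dies when the installed version is lower than the one requested. The 0.15 threshold is the version Chris mentions above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return 1 if $module can be loaded and its $VERSION is at least $min,
# otherwise 0. UNIVERSAL::VERSION dies when the installed version is too low.
sub has_min_version {
    my ($module, $min) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;   # XML::SAX -> XML/SAX.pm
    return 0 unless eval { require $file; 1 };
    return eval { $module->VERSION($min); 1 } ? 1 : 0;
}

my $ok = has_min_version('XML::SAX', '0.15');
print $ok ? "XML::SAX >= 0.15 is installed\n"
          : "XML::SAX is missing or older than 0.15; try: sudo cpan XML::SAX\n";
```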


From cjfields at uiuc.edu  Tue Feb 13 22:41:31 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 13 Feb 2007 21:41:31 -0600
Subject: [Bioperl-l] Bio::Root::Utilities.pm
In-Reply-To: <8f200b4c0702131500m2f3790nceace3c3e3dffd4f@mail.gmail.com>
References: <8f200b4c0702131500m2f3790nceace3c3e3dffd4f@mail.gmail.com>
Message-ID: <1D04BA5E-359E-4AC5-B674-8ECDC4F81E83@uiuc.edu>


On Feb 13, 2007, at 5:00 PM, Steve Chervitz wrote:

> I noticed that Bio::Root::Utilities was purged from bioperl-live for the
> 1.5.2 release, but I'd like us to consider adding it back. I agree that the
> other purged Root modules were ancient relics of the past, but
> Bio::Root::Utilities.pm still has signs of life (at least I still find
> occasion to use it, or refer to code in it).
>
> I know that it's not currently used by any other modules in Bioperl, but
> there are likely some legacy scripts out there that rely on it. Probably
> most of those scripts are ones I've written, but there have been substantive
> commits by others in the not-to-distant past (Dec 2005), so at least some
> folks besides myself are using it and may hesitate to upgrade their bioperl
> installation if it's absent.
>
> I'm all for avoiding bloat in the codebase and am eager to see Bioperl be
> more lean and mean, but I'd like to keep this module around. I'll agree to
> add some tests for it as well as clean some things up (e.g., use
> Bio::Root::IO to get temp file name).
>
> Cheers,
> Steve
> --
> Steve Chervitz
> sac at bioperl.org

I don't have a problem with adding it back, esp. if tests are added.   
Everything in Bio::Root* not tied to a module was yanked out when no  
one spoke up about cleaning up Bio::Root* modules:

http://thread.gmane.org/gmane.comp.lang.perl.bio.general/12792/focus=12839

Maybe others disagree?

chris


From bix at sendu.me.uk  Wed Feb 14 03:00:35 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 14 Feb 2007 08:00:35 +0000
Subject: [Bioperl-l] update/install problem
In-Reply-To: <C8B7325D-0554-4E37-9B06-9A8344BF5F7E@uiuc.edu>
References: <baa466330702131400q30a65bb4ib5037756fc59a9b7@mail.gmail.com>
	<C8B7325D-0554-4E37-9B06-9A8344BF5F7E@uiuc.edu>
Message-ID: <45D2C1A3.9060300@sendu.me.uk>

Chris Fields wrote:
> As an aside, you could similarly install bioperl directly from CPAN  
> (which I also haven't had any problems with).

Indeed. If you follow the unix instructions at 
http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix you should have 
a problem-free complete install under Mac OS X.


From bix at sendu.me.uk  Wed Feb 14 09:08:22 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 14 Feb 2007 14:08:22 +0000
Subject: [Bioperl-l] BLASTXML changes (good this time!)
In-Reply-To: <DF989651-7CC0-45A0-BA5E-EEEE88FDD40E@uiuc.edu>
References: <EB130F09-5670-4F3E-B481-60A700AEA38F@uiuc.edu>
	<45CCF861.8030000@sendu.me.uk>
	<DF989651-7CC0-45A0-BA5E-EEEE88FDD40E@uiuc.edu>
Message-ID: <45D317D6.5070903@sendu.me.uk>

Chris Fields wrote:
> 
> On Feb 9, 2007, at 4:40 PM, Sendu Bala wrote:
> 
>> Chris Fields wrote:
>>> If Sendu is out there, I think we can safely remove any dependencies
>>> beyond XML::SAX 0.15 for the next release.  Should I go ahead and
>>> modify Build.PL?
>>
>> Sure, good to hear.
> 
> I added a version dependency for XML::SAX (v. 0.15) for the PurePerl 
> fix.  That likely obviates the need for a Bundle for XML::Simple.  Not 
> too pressing; we can determine that before the next release.

The bundle is now obsolete. Does anything in Bioperl, or any of its 
dependencies, now make use of the expat library? If not, I can remove 
mention of it from the install documentation.


From bix at sendu.me.uk  Wed Feb 14 09:02:39 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 14 Feb 2007 14:02:39 +0000
Subject: [Bioperl-l] DB.t failures
Message-ID: <45D3167F.2000608@sendu.me.uk>

DB.t is failing with BIOPERLDEBUG set. Apparently, we are no longer 
getting sequences back from NCBI in the order we requested them in batch 
mode.

Is this a change at NCBI? Is there some way we can make sure to return 
the sequences in the expected order? Or shouldn't the order be expected 
(should the test script be altered)?
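If the test should not depend on the order in which NCBI returns a batch, one option is to reorder the results client-side by identifier before comparing. This is a plain-Perl sketch with made-up accessions and a hypothetical `reorder_by_request` helper; it is not how Bio::DB::GenBank or DB.t currently behave, and with real Bio::Seq objects you would key on something like `$seq->accession_number` instead of the first array element.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Reorder fetched records so they follow the order of the requested IDs.
# Each record is an [id, sequence] pair; requested IDs that were not
# returned are skipped.
sub reorder_by_request {
    my ($requested, $records) = @_;
    my %by_id = map { ($_->[0] => $_) } @$records;
    return map { exists $by_id{$_} ? $by_id{$_} : () } @$requested;
}

# Hypothetical accessions, returned out of order.
my @requested = qw(AAA111 BBB222 CCC333);
my @returned  = (['CCC333', 'MKV'], ['AAA111', 'MST'], ['BBB222', 'MLL']);

my @ordered = reorder_by_request(\@requested, \@returned);
print join(',', map { $_->[0] } @ordered), "\n";   # AAA111,BBB222,CCC333
```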


From cjfields at uiuc.edu  Wed Feb 14 09:37:07 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 14 Feb 2007 08:37:07 -0600
Subject: [Bioperl-l] DB.t failures
In-Reply-To: <45D3167F.2000608@sendu.me.uk>
References: <45D3167F.2000608@sendu.me.uk>
Message-ID: <49A5C7D3-8D63-452C-B0EA-6F7144F85E35@uiuc.edu>

Confirmed on this end.

It's possible that the default sort order from eutils is different
now, though I haven't seen anything on the eutils mailing list.  There
may be a way to set the sort order via the base URL; I'll check into
it later today (I'm still digging myself out from the Midwest blizzard).

chris

On Feb 14, 2007, at 8:02 AM, Sendu Bala wrote:

> DB.t is failing with BIOPERLDEBUG set. Apparently, we are no longer
> getting sequences back from NCBI in the order we requested them in batch
> mode.
>
> Is this a change at NCBI? Is there some way we can make sure to return
> the sequences in the expected order? Or shouldn't the order be expected
> (should the test script be altered)?

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Wed Feb 14 09:42:05 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 14 Feb 2007 08:42:05 -0600
Subject: [Bioperl-l] BLASTXML changes (good this time!)
In-Reply-To: <45D317D6.5070903@sendu.me.uk>
References: <EB130F09-5670-4F3E-B481-60A700AEA38F@uiuc.edu>
	<45CCF861.8030000@sendu.me.uk>
	<DF989651-7CC0-45A0-BA5E-EEEE88FDD40E@uiuc.edu>
	<45D317D6.5070903@sendu.me.uk>
Message-ID: <E9611B3C-658E-4CBC-A2ED-1990F929A130@uiuc.edu>


On Feb 14, 2007, at 8:08 AM, Sendu Bala wrote:

> Chris Fields wrote:
>>
>> On Feb 9, 2007, at 4:40 PM, Sendu Bala wrote:
>>
>>> Chris Fields wrote:
>>>> If Sendu is out there, I think we can safely remove any dependencies
>>>> beyond XML::SAX 0.15 for the next release.  Should I go ahead and
>>>> modify Build.PL?
>>>
>>> Sure, good to hear.
>>
>> I added a version dependency for XML::SAX (v. 0.15) for the PurePerl
>> fix.  That likely obviates the need for a Bundle for XML::Simple.  Not
>> too pressing; we can determine that before the next release.
>
> The bundle is now obsolete. Does anything in Bioperl, or any of its
> dependencies, now make use of the expat library? If not, I can remove
> mention of it from the install documentation.

I'll try getting something up about XML::SAX on the wiki today.   
XML::Parser, though, still requires expat AFAIK:

http://www.bioperl.org/wiki/BioPerl_Dependencies

chris


From kellert at ohsu.edu  Tue Feb 13 17:43:24 2007
From: kellert at ohsu.edu (Thomas J Keller)
Date: Tue, 13 Feb 2007 14:43:24 -0800
Subject: [Bioperl-l] HowTo:SearchIO
Message-ID: <CF656232-DEF2-47C4-9855-DAF748AEE862@ohsu.edu>

Greetings,
I've been away from programming and informatics for many months.  
Hoping to get back into it, I thought it would be good to review the  
tutorials.
I tried the code in the tutorial on the sample blast report in the  
tutorial and it worked fine. So I ran a blastx search and saved the  
results and tried to parse them: it gave the "... parsing" message,
but no other results were reported.

Any suggestions?

Thanks,
Tom

Tom Keller, Ph.D.
kellert at ohsu.edu
503-494-2442
6339b Basic Science Bldg
http://www.ohsu.edu/research/core


From mrouard at gmail.com  Wed Feb 14 06:23:47 2007
From: mrouard at gmail.com (Mathieu Rouard)
Date: Wed, 14 Feb 2007 12:23:47 +0100
Subject: [Bioperl-l] get the sequence of a column in a multiple alignment
Message-ID: <ab1628190702140323m2d6f1e00h9cc85ffaf84fcafd@mail.gmail.com>

Dear all,

I am starting to use the bioperl API to parse multiple alignments and I am
wondering what is the most efficient way to extract all the columns from an
alignment (all the AA at position 1, position 2, etc.). I quickly
implemented this simple code, but it becomes quite slow as the length of the
sequences increases.

use Bio::AlignIO;

my $stream = Bio::AlignIO->new(-file   => $inputfilename,
                               -format => 'stockholm');

my $aln = $stream->next_aln();

my $length = $aln->length();
my @column;    # $column[$i-1] holds the residues of alignment column $i

for (my $i = 1; $i <= $length; $i++) {
    my $aa = '';
    foreach my $seq ($aln->each_seq()) {
        my $obj = $seq->trunc($i, $i);    # one-residue sub-sequence
        $aa .= $obj->seq;
    }
    # need to track the column number and the sequence of the column
    push @column, $aa;
}

Would you have any other suggestion?

thanks
Mathieu
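One likely cost in the loop above is that `trunc` builds a new sequence object for every residue of every sequence. A lighter-weight sketch, assuming you first pull each aligned string out once (for example with `map { $_->seq } $aln->each_seq`), is to index the plain strings directly with `substr`. `alignment_columns` is a hypothetical helper, the toy alignment is made up, and it assumes every string has the same aligned length, gaps included.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build every alignment column from plain aligned strings (gaps included).
# No per-residue objects are created; substr() picks one character per
# sequence for each column.
sub alignment_columns {
    my (@seqs) = @_;
    return () unless @seqs;
    my $length = length $seqs[0];
    my @columns;
    for my $i (0 .. $length - 1) {
        $columns[$i] = join '', map { substr($_, $i, 1) } @seqs;
    }
    return @columns;    # $columns[0] is alignment column 1
}

# Hypothetical toy alignment of three sequences.
my @seqs    = ('MK-V', 'MKLV', 'M-LV');
my @columns = alignment_columns(@seqs);
print "column ", $_ + 1, ": $columns[$_]\n" for 0 .. $#columns;
```

The column number is simply the array index plus one, which covers the "track the column number" comment in the original loop.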


From avilella at gmail.com  Wed Feb 14 10:29:02 2007
From: avilella at gmail.com (Albert Vilella)
Date: Wed, 14 Feb 2007 15:29:02 +0000
Subject: [Bioperl-l] get the sequence of a column in a multiple alignment
In-Reply-To: <ab1628190702140323m2d6f1e00h9cc85ffaf84fcafd@mail.gmail.com>
References: <ab1628190702140323m2d6f1e00h9cc85ffaf84fcafd@mail.gmail.com>
Message-ID: <358f4d650702140729u4dae2847qc8eeeb45b20faca4@mail.gmail.com>

there is a slice method:

  $mini_aln = $aln->slice(20,30);  # get a block of columns

 Title     : slice
 Usage     : $aln2 = $aln->slice(20,30)
 Function  : Creates a slice from the alignment inclusive of start and
             end columns, and the first column in the alignment is denoted 1.
             Sequences with no residues in the slice are excluded from the
             new alignment and a warning is printed. Slice beyond the length of
             the sequence does not do padding.
 Returns   : A Bio::SimpleAlign object
 Args      : Positive integer for start column, positive integer for end column,
             optional boolean which if true will keep gap-only columns in the
             newly created slice. Example:

             $aln2 = $aln->slice(20,30,1)

but I don't know how well it behaves for lots of sequences :)


On 2/14/07, Mathieu Rouard <mrouard at gmail.com> wrote:
> Dear all,
>
> I am starting to use the bioperl API to parse multiple alignments and I am
> wondering what is the most effective way to splice all the columns from an
> alignment (all the AA at the postion 1, position 2 etc.). I quickly
> implemented this simple code but it becomes quite slow when the length of
> sequences increases.
>
> my $stream  = Bio::AlignIO->new(-file => $inputfilename,
>                         '-format' => 'stockholm');
>
> my $aln = $stream->next_aln();
>
> my $length = $aln->length();
> my %column;
>
> for (my $i=1;$i<=$length;$i++) {
>        my $aa;
>         foreach my $seq ($aln->each_seq()) {
>           my $obj = $seq->trunc($i,$i);
>           $aa .=$obj->seq;
>         }
>      # need to track the column number and the sequence of the column
>      push $column,  $aa;
> }
>
> Would you have any other suggestion?
>
> thanks
> Mathieu
>


From jason at bioperl.org  Wed Feb 14 11:59:49 2007
From: jason at bioperl.org (Jason Stajich)
Date: Wed, 14 Feb 2007 08:59:49 -0800
Subject: [Bioperl-l] HowTo:SearchIO
In-Reply-To: <CF656232-DEF2-47C4-9855-DAF748AEE862@ohsu.edu>
References: <CF656232-DEF2-47C4-9855-DAF748AEE862@ohsu.edu>
Message-ID: <FAAF2A93-97CA-4E84-9178-0561C9BC7B5B@bioperl.org>

As always, reporting the version of BLAST and Bioperl you have
installed will help someone diagnose whether this is a fixed problem or
not.  If you trawl through the list archives you'll see that Chris and others
have been playing cat and mouse with the text output from
NCBI BLAST, which appears to be an ever-evolving beast.

So the best advice right now is to get the latest bioperl from CVS
to ensure you have all the patches that might parse this version.  If
it still fails, then the standard response will be to submit the
report as an attachment to a new bug report on the bugzilla.

thanks,
-jason


On Feb 13, 2007, at 2:43 PM, Thomas J Keller wrote:

> Greetings,
> I've been away from programming and informatics for many months.
> Hoping to get back into it, I thought it would be good to review the
> tutorials.
> I tried the code in the tutorial on the sample blast report in the
> tutorial and it worked fine. So I ran a blastx search and saved the
> results and tried to parse them: It gave the "... parsing" message,
> but no other results get reported.
>
> Any suggestions?
>
> Thanks,
> Tom
>
> Tom Keller, Ph.D.
> kellert at ohsu.edu
> 503-494-2442
> 6339b Basic Science Bldg
> http://www.ohsu.edu/research/core
>
>
>

--
Jason Stajich
Miller Research Fellow
University of California, Berkeley
lab: 510.642.8441
http://pmb.berkeley.edu/~taylor/people/js.html
http://fungalgenomes.org/


From dmessina at wustl.edu  Wed Feb 14 11:58:45 2007
From: dmessina at wustl.edu (David Messina)
Date: Wed, 14 Feb 2007 10:58:45 -0600
Subject: [Bioperl-l] HowTo:SearchIO
In-Reply-To: <CF656232-DEF2-47C4-9855-DAF748AEE862@ohsu.edu>
References: <CF656232-DEF2-47C4-9855-DAF748AEE862@ohsu.edu>
Message-ID: <6E3CAB6B-9F9E-46FD-9021-50D7FE011860@wustl.edu>

Hi Tom,

Could you tell us what version of BioPerl you are using, and what  
specific example is failing for  you? And could you post your code?

That would make it easier to diagnose the problem.

Thanks,
Dave

-- 
Dave Messina
Senior Programmer/Analyst, Assembly Group
WashU Genome Sequencing Center
dmessina a t  wustl.edu
314-286-1415


From cjfields at uiuc.edu  Wed Feb 14 12:28:24 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 14 Feb 2007 11:28:24 -0600
Subject: [Bioperl-l] HowTo:SearchIO
In-Reply-To: <FAAF2A93-97CA-4E84-9178-0561C9BC7B5B@bioperl.org>
References: <CF656232-DEF2-47C4-9855-DAF748AEE862@ohsu.edu>
	<FAAF2A93-97CA-4E84-9178-0561C9BC7B5B@bioperl.org>
Message-ID: <3A7FFE76-808B-4249-B588-8709567B870F@uiuc.edu>

I would also strongly encourage switching to using XML-based parsing,  
which is much more stable now.  Here's the link to the NCBI response  
re: BLAST report parsing:

http://bioperl.org/wiki/NCBI_Blast_email

chris (taking a break from shoveling snow...)

On Feb 14, 2007, at 10:59 AM, Jason Stajich wrote:

> As always, reporting the version of BLAST and Bioperl you have
> installed will help someone diagnose if this is a fixed problem or
> not.  If you trawl through the list archives you'll chris and others
> have been playing cat and mouse with the text version output from
> NCBI BLAST which appears to be an ever evolving beast.
>
> So the best advice right now is to get the latest bioperl from CVS
> to insure you have all the patches that might parse this version.  If
> it still fails then the standard response will be to submit the
> report as an attachment to a new bug report on the bugzilla.
>
> thanks,
> -jason
>
>
> On Feb 13, 2007, at 2:43 PM, Thomas J Keller wrote:
>
>> Greetings,
>> I've been away from programming and informatics for many months.
>> Hoping to get back into it, I thought it would be good to review the
>> tutorials.
>> I tried the code in the tutorial on the sample blast report in the
>> tutorial and it worked fine. So I ran a blastx search and saved the
>> results and tried to parse them: It gave the "... parsing" message,
>> but no other results get reported.
>>
>> Any suggestions?
>>
>> Thanks,
>> Tom
>>
>> Tom Keller, Ph.D.
>> kellert at ohsu.edu
>> 503-494-2442
>> 6339b Basic Science Bldg
>> http://www.ohsu.edu/research/core
>>
>>
>>
>
> --
> Jason Stajich
> Miller Research Fellow
> University of California, Berkeley
> lab: 510.642.8441
> http://pmb.berkeley.edu/~taylor/people/js.html
> http://fungalgenomes.org/
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From sac at bioperl.org  Wed Feb 14 13:20:17 2007
From: sac at bioperl.org (Steve Chervitz)
Date: Wed, 14 Feb 2007 10:20:17 -0800
Subject: [Bioperl-l] Bio::Root::Utilities.pm
In-Reply-To: <1D04BA5E-359E-4AC5-B674-8ECDC4F81E83@uiuc.edu>
References: <8f200b4c0702131500m2f3790nceace3c3e3dffd4f@mail.gmail.com>
	<1D04BA5E-359E-4AC5-B674-8ECDC4F81E83@uiuc.edu>
Message-ID: <8f200b4c0702141020x5815018bvd69873222582d9d5@mail.gmail.com>

On 2/13/07, Chris Fields <cjfields at uiuc.edu> wrote:
>
>
> On Feb 13, 2007, at 5:00 PM, Steve Chervitz wrote:
>
> > I noticed that Bio::Root::Utilities was purged from bioperl-live for the
> > 1.5.2 release, but I'd like us to consider adding it back. I agree that the
> > other purged Root modules were ancient relics of the past, but
> > Bio::Root::Utilities.pm still has signs of life (at least I still find
> > occasion to use it, or refer to code in it).
> >
> > I know that it's not currently used by any other modules in Bioperl, but
> > there are likely some legacy scripts out there that rely on it. Probably
> > most of those scripts are ones I've written, but there have been substantive
> > commits by others in the not-to-distant past (Dec 2005), so at least some
> > folks besides myself are using it and may hesitate to upgrade their bioperl
> > installation if it's absent.
> >
> > I'm all for avoiding bloat in the codebase and am eager to see Bioperl be
> > more lean and mean, but I'd like to keep this module around. I'll agree to
> > add some tests for it as well as clean some things up (e.g., use
> > Bio::Root::IO to get temp file name).
> >
> > Cheers,
> > Steve
> > --
> > Steve Chervitz
> > sac at bioperl.org
>
> I don't have a problem with adding it back, esp. if tests are added.
> Everything in Bio::Root* not tied to a module was yanked out when no
> one spoke up about cleaning up Bio::Root* modules:
>
> http://thread.gmane.org/gmane.comp.lang.perl.bio.general/12792/focus=12839
>
> Maybe others disagree?
>
> chris
>

Sorry I missed out on that thread. I had some trouble with my bioperl-l
email delivery getting disabled due to excessive bounces, and it took me a
while to catch it.

Bio::Root::Utilities is quite a grab bag of miscellaneous general functions
that are occasionally useful for perl scripting (e.g., determining
end-of-line characters, sending email, etc.). The code could definitely use
a review, and maybe an example script to advertise it. I can look into this,
and suggestions are welcome.

Steve
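As an illustration of the kind of small utility Steve describes, here is a standalone sketch of end-of-line detection. It is not the Bio::Root::Utilities implementation or API; `guess_newline` is a hypothetical name, and it simply checks for CRLF first, then a bare CR, then LF.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Guess the end-of-line convention of a chunk of text:
# "\r\n" (DOS/Windows), "\r" (classic Mac OS) or "\n" (Unix).
# Returns undef if the text contains no line terminator at all.
sub guess_newline {
    my ($text) = @_;
    return "\r\n" if $text =~ /\r\n/;
    return "\r"   if $text =~ /\r/;
    return "\n"   if $text =~ /\n/;
    return;
}

my %name = ("\r\n" => 'CRLF', "\r" => 'CR', "\n" => 'LF');
for my $sample ("a\r\nb\r\n", "a\rb\r", "a\nb\n", "no newline") {
    my $nl = guess_newline($sample);
    print defined $nl ? $name{$nl} : 'none', "\n";
}
```

Checking CRLF before a bare CR matters, since a DOS file also contains "\r" characters.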


From dmessina at wustl.edu  Wed Feb 14 13:55:18 2007
From: dmessina at wustl.edu (David Messina)
Date: Wed, 14 Feb 2007 12:55:18 -0600
Subject: [Bioperl-l] HowTo:SearchIO
In-Reply-To: <3A7FFE76-808B-4249-B588-8709567B870F@uiuc.edu>
References: <CF656232-DEF2-47C4-9855-DAF748AEE862@ohsu.edu>
	<FAAF2A93-97CA-4E84-9178-0561C9BC7B5B@bioperl.org>
	<3A7FFE76-808B-4249-B588-8709567B870F@uiuc.edu>
Message-ID: <49A80E44-EB7F-4E2A-86E8-2379065F0FC3@wustl.edu>


On Feb 14, 2007, at 11:28 AM, Chris Fields wrote:

> I would also strongly encourage switching to using XML-based parsing,

Unless anyone objects, I would be happy to update the HOWTO to  
suggest people make the switch and give an example of XML parsing.

The Bio::SearchIO synopsis is already an XML example. However,
there's no warning about text-based parsing nor a suggestion to use
XML that I can see -- perhaps one should be added?

Dave


From cjfields at uiuc.edu  Wed Feb 14 15:12:21 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 14 Feb 2007 14:12:21 -0600
Subject: [Bioperl-l] HowTo:SearchIO
In-Reply-To: <49A80E44-EB7F-4E2A-86E8-2379065F0FC3@wustl.edu>
References: <CF656232-DEF2-47C4-9855-DAF748AEE862@ohsu.edu>
	<FAAF2A93-97CA-4E84-9178-0561C9BC7B5B@bioperl.org>
	<3A7FFE76-808B-4249-B588-8709567B870F@uiuc.edu>
	<49A80E44-EB7F-4E2A-86E8-2379065F0FC3@wustl.edu>
Message-ID: <C9FDCBA5-5BDE-41FE-9281-F5C73B0A700B@uiuc.edu>


On Feb 14, 2007, at 12:55 PM, David Messina wrote:

>
> On Feb 14, 2007, at 11:28 AM, Chris Fields wrote:
>
>> I would also strongly encourage switching to using XML-based parsing,
>
> Unless anyone objects, I would be happy to update the HOWTO to
> suggest people make the switch and give an example of XML parsing.
>
> The Bio::SearchIO synopsis is already an XML example. However,
> there's no warning about text-based parsing nor a suggestion to use
> XML that I can see -- perhaps one should be added?
>
> Dave

We should probably add something specifically for BLAST, yes.  Other  
text parsers should be fine.

Personally, I use XML or tabular output parsing simply b/c they are  
faster and do what I need.  I think we'll need to retain the  
capability for text-based BLAST parsing, but it will become extremely  
bloated long-term if we plan on continuing support for parsing all  
versions and flavors of BLAST, particularly if NCBI continues to  
change the output.
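As a rough sketch of why tabular output is easy to handle, the following core Perl parses one line of NCBI tabular BLAST output. It assumes the standard 12-column layout produced by blastall -m 8. The sub name parse_tab_line and the sample line are made up for illustration; within BioPerl, Bio::SearchIO's tabular parser (-format => 'blasttable') is the supported route.

```perl
use strict;
use warnings;

# Assumed column order of NCBI tabular BLAST output (blastall -m 8).
my @COLS = qw(query subject percent_id aln_length mismatches gap_opens
              q_start q_end s_start s_end evalue bit_score);

# Parse one tab-delimited line into a hash ref.
# Returns undef for comment/blank lines or lines with the wrong field count.
sub parse_tab_line {
    my ($line) = @_;
    chomp $line;
    return undef if $line =~ /^\s*#/ || $line !~ /\S/;
    my @fields = split /\t/, $line;
    return undef unless @fields == @COLS;
    my %hit;
    @hit{@COLS} = @fields;   # hash slice: column name => value
    return \%hit;
}

# Invented example line, for illustration only
my $hit = parse_tab_line("query1\tsubject1\t98.50\t200\t3\t0\t1\t200\t11\t210\t1e-50\t380\n");
print "$hit->{subject} evalue=$hit->{evalue} bits=$hit->{bit_score}\n";
```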

chris


From dmessina at wustl.edu  Wed Feb 14 15:46:31 2007
From: dmessina at wustl.edu (David Messina)
Date: Wed, 14 Feb 2007 14:46:31 -0600
Subject: [Bioperl-l] HowTo:SearchIO
In-Reply-To: <C9FDCBA5-5BDE-41FE-9281-F5C73B0A700B@uiuc.edu>
References: <CF656232-DEF2-47C4-9855-DAF748AEE862@ohsu.edu>
	<FAAF2A93-97CA-4E84-9178-0561C9BC7B5B@bioperl.org>
	<3A7FFE76-808B-4249-B588-8709567B870F@uiuc.edu>
	<49A80E44-EB7F-4E2A-86E8-2379065F0FC3@wustl.edu>
	<C9FDCBA5-5BDE-41FE-9281-F5C73B0A700B@uiuc.edu>
Message-ID: <136DA052-B9FD-4547-B262-EC6E38B47392@wustl.edu>

On Feb 14, 2007, at 2:12 PM, Chris Fields wrote:

> We should probably add something specifically for BLAST, yes.   
> Other text parsers should be fine.

Good point -- I'll make it clear it's only pertinent to BLAST.


> I think we'll need to retain the capability for text-based BLAST  
> parsing,

Agreed. Through the 1.6 release at least, I would think.


> particularly if NCBI continues to change the output.

Well, clearly the solution is not to use the NCBI flavor of BLAST. :)


Dave
(look at my email address)


From jay at jays.net  Thu Feb 15 08:08:56 2007
From: jay at jays.net (Jay Hannah)
Date: Thu, 15 Feb 2007 07:08:56 -0600
Subject: [Bioperl-l] DB.t failures
In-Reply-To: <45D3167F.2000608@sendu.me.uk>
References: <45D3167F.2000608@sendu.me.uk>
Message-ID: <AEABC2CF-AABF-4BC9-9E66-17F2E3A51B37@jays.net>

On Feb 14, 2007, at 8:02 AM, Sendu Bala wrote:
> DB.t is failing with BIOPERLDEBUG set. Apparently, we are no longer
> getting sequences back from NCBI in the order we requested them in
> batch mode.

Is this the same result you get?

DIED. FAILED tests 59-60, 63-64, 67-68, 71-72
         Failed 8/113 tests, 92.92% okay (less 8 skipped tests: 97 okay, 85.84%)
Failed Test Stat Wstat Total Fail  Failed  List of Failed
-------------------------------------------------------------------------------
DB.t           8  2048   113    8   7.08%  59-60 63-64 67-68 71-72
8 subtests skipped.


Thanks,

j
seqlab.net
http://www.bioperl.org/wiki/User:Jhannah


From bix at sendu.me.uk  Thu Feb 15 08:37:32 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 15 Feb 2007 13:37:32 +0000
Subject: [Bioperl-l] DB.t failures
In-Reply-To: <AEABC2CF-AABF-4BC9-9E66-17F2E3A51B37@jays.net>
References: <45D3167F.2000608@sendu.me.uk>
	<AEABC2CF-AABF-4BC9-9E66-17F2E3A51B37@jays.net>
Message-ID: <45D4621C.6040309@sendu.me.uk>

Jay Hannah wrote:
> On Feb 14, 2007, at 8:02 AM, Sendu Bala wrote:
>> DB.t is failing with BIOPERLDEBUG set. Apparently, we are no longer
>> getting sequences back from NCBI in the order we requested them in
>> batch mode.
>
> Is this the same result you get?
>
> DIED. FAILED tests 59-60, 63-64, 67-68, 71-72
>          Failed 8/113 tests, 92.92% okay (less 8 skipped tests: 97 okay, 85.84%)
> Failed Test Stat Wstat Total Fail  Failed  List of Failed
> -------------------------------------------------------------------------------
> DB.t           8  2048   113    8   7.08%  59-60 63-64 67-68 71-72
> 8 subtests skipped.

Yes, those fails are all caused by results in the wrong order (I believe).


From cjfields at uiuc.edu  Thu Feb 15 09:22:09 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 15 Feb 2007 08:22:09 -0600
Subject: [Bioperl-l] DB.t failures
In-Reply-To: <45D4621C.6040309@sendu.me.uk>
References: <45D3167F.2000608@sendu.me.uk>
	<AEABC2CF-AABF-4BC9-9E66-17F2E3A51B37@jays.net>
	<45D4621C.6040309@sendu.me.uk>
Message-ID: <CF92D281-CAC2-415C-91A9-CBA0893336B9@uiuc.edu>


On Feb 15, 2007, at 7:37 AM, Sendu Bala wrote:

> Jay Hannah wrote:
>> On Feb 14, 2007, at 8:02 AM, Sendu Bala wrote:
>>> DB.t is failing with BIOPERLDEBUG set. Apparently, we are no longer
>>> getting sequences back from NCBI in the order we requested them in
>>> batch mode.
>>
>> Is this the same result you get?
>>
>> DIED. FAILED tests 59-60, 63-64, 67-68, 71-72
>>          Failed 8/113 tests, 92.92% okay (less 8 skipped tests: 97 okay, 85.84%)
>> Failed Test Stat Wstat Total Fail  Failed  List of Failed
>> -------------------------------------------------------------------------------
>> DB.t           8  2048   113    8   7.08%  59-60 63-64 67-68 71-72
>> 8 subtests skipped.
>
> Yes, those fails are all caused by results in the wrong order (I  
> believe).

I'm fixing those now so it doesn't depend on order and will commit in  
the next few minutes.
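A minimal sketch of an order-independent check in plain Perl, assuming the test has the list of requested IDs and the list of IDs actually returned. This is not the change that was committed to DB.t, and the sub name is hypothetical. Comparing the sorted lists treats both as multisets, so duplicates are still counted.

```perl
use strict;
use warnings;

# Compare two lists of IDs without caring about order. Sorting both
# lists first means duplicates are still counted (multiset comparison).
sub same_ids_any_order {
    my ($got, $want) = @_;
    return 0 unless @$got == @$want;
    my @g = sort @$got;
    my @w = sort @$want;
    for my $i (0 .. $#g) {
        return 0 unless $g[$i] eq $w[$i];
    }
    return 1;
}

print same_ids_any_order([qw(ID2 ID1 ID3)], [qw(ID1 ID2 ID3)]) ? "ok\n" : "not ok\n";
```

With Test::More the same idea is simply is_deeply([sort @got], [sort @want]).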

chris


From bix at sendu.me.uk  Thu Feb 15 09:37:00 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 15 Feb 2007 14:37:00 +0000
Subject: [Bioperl-l] DB.t failures
In-Reply-To: <586AAA5B-DFA6-4746-BA70-F3C16CA97EE3@uiuc.edu>
References: <45D3167F.2000608@sendu.me.uk>
	<AEABC2CF-AABF-4BC9-9E66-17F2E3A51B37@jays.net>
	<45D4621C.6040309@sendu.me.uk>
	<586AAA5B-DFA6-4746-BA70-F3C16CA97EE3@uiuc.edu>
Message-ID: <45D4700C.8020305@sendu.me.uk>

Chris Fields wrote:
> 
> On Feb 15, 2007, at 7:37 AM, Sendu Bala wrote:
> 
>> Jay Hannah wrote:
>>> On Feb 14, 2007, at 8:02 AM, Sendu Bala wrote:
>>>> DB.t is failing with BIOPERLDEBUG set. Apparently, we are no longer
>>>> getting sequences back from NCBI in the order we requested them in
>>>> batch mode.
>
> Okay, I committed a fix for that.  I hope there are many users who 
> depend on the returned sequence order for anything!

s/are/aren't/ ?

I suspect there might be, and it's certainly a reasonable assumption to 
make. Did you not see an easy way of maintaining the order?
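One way to restore the requested order on the client side, sketched in plain Perl: rank each requested ID, then sort the returned records by that rank. Records are plain hash refs with an 'id' key here, an assumption for illustration; with BioPerl sequence objects the key would come from something like display_id or accession_number, depending on which IDs were requested. The sub name is hypothetical.

```perl
use strict;
use warnings;

# Return the records sorted into the order their IDs were requested in.
# Records are hash refs with an 'id' key (an assumption for this sketch).
# Any ID that was not in the request list is placed after the requested ones.
sub restore_request_order {
    my ($requested, $records) = @_;
    my %rank;
    @rank{@$requested} = (0 .. $#{$requested});
    my $unknown = scalar @$requested;
    return sort {
        ( exists $rank{ $a->{id} } ? $rank{ $a->{id} } : $unknown )
            <=>
        ( exists $rank{ $b->{id} } ? $rank{ $b->{id} } : $unknown )
    } @$records;
}

my @wanted = qw(X2 X1 X3);
my @got    = map { +{ id => $_ } } qw(X1 X3 X2);   # e.g. the server's order
print join(',', map { $_->{id} } restore_request_order(\@wanted, \@got)), "\n";   # X2,X1,X3
```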


From cjfields at uiuc.edu  Thu Feb 15 09:28:46 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 15 Feb 2007 08:28:46 -0600
Subject: [Bioperl-l] DB.t failures
In-Reply-To: <45D4621C.6040309@sendu.me.uk>
References: <45D3167F.2000608@sendu.me.uk>
	<AEABC2CF-AABF-4BC9-9E66-17F2E3A51B37@jays.net>
	<45D4621C.6040309@sendu.me.uk>
Message-ID: <586AAA5B-DFA6-4746-BA70-F3C16CA97EE3@uiuc.edu>


On Feb 15, 2007, at 7:37 AM, Sendu Bala wrote:

> Jay Hannah wrote:
>> On Feb 14, 2007, at 8:02 AM, Sendu Bala wrote:
>>> DB.t is failing with BIOPERLDEBUG set. Apparently, we are no longer
>>> getting sequences back from NCBI in the order we requested them in
>>> batch mode.
>>
>> Is this the same result you get?
>>
>> DIED. FAILED tests 59-60, 63-64, 67-68, 71-72
>>          Failed 8/113 tests, 92.92% okay (less 8 skipped tests: 97 okay, 85.84%)
>> Failed Test Stat Wstat Total Fail  Failed  List of Failed
>> -------------------------------------------------------------------------------
>> DB.t           8  2048   113    8   7.08%  59-60 63-64 67-68 71-72
>> 8 subtests skipped.
>
> Yes, those fails are all caused by results in the wrong order (I  
> believe).

Okay, I committed a fix for that.  I hope there are many users who  
depend on the returned sequence order for anything!

chris


From michael.watson at bbsrc.ac.uk  Thu Feb 15 09:44:27 2007
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Thu, 15 Feb 2007 14:44:27 -0000
Subject: [Bioperl-l] The axis of GC content in Bio::Graphics::glyph:dna
Message-ID: <8975119BCD0AC5419D61A9CF1A923E9503E2EB70@iahce2ksrv1.iah.bbsrc.ac.uk>

Hi

OK, I have some great images out of this glyph, but I can't see the axis,
nor is it labelled (i.e., does it go from 0 to 100%?), so it isn't great for
publication.  The docs say:

"NOTE: -gc_window=>'auto' gives nice results and is recommended for
drawing GC content. The GC content axes draw slightly outside the
panel, so you may wish to add some extra padding on the right and
left. "

Any idea how to do this?

Basically, I want a nice GC graph with the axis quite clearly labelled,
and a nice "%GC" title next to it :)

Thanks

Mick

The information contained in this message may be confidential or legally
privileged and is intended solely for the addressee. If you have
received this message in error please delete it & notify the originator
immediately.
Unauthorised use, disclosure, copying or alteration of this message is
forbidden & may be unlawful. 
The contents of this e-mail are the views of the sender and do not
necessarily represent the views of the Institute. 
This email and associated attachments has been checked locally for
viruses but we can accept no responsibility once it has left our
systems.
Communications on Institute computers are monitored to secure the
effective operation of the systems and for other lawful purposes. 


From nehadnahar at yahoo.co.in  Thu Feb 15 10:28:42 2007
From: nehadnahar at yahoo.co.in (Neha Nahar)
Date: Thu, 15 Feb 2007 15:28:42 +0000 (GMT)
Subject: [Bioperl-l] Convert newick to nexus format
In-Reply-To: <84703383-600F-42F4-A860-DD0D1C43EE83@bioperl.org>
Message-ID: <777943.33252.qm@web8404.mail.in.yahoo.com>

Thank you Jason. I ran the tests and they failed, so I re-installed the bioperl module and now it works fine.

Regards,
Neha.

Jason Stajich <jason at bioperl.org> wrote: Something is wrong with your install, I am guessing. Can you run the
tests? Go to the bioperl directory and run:
$ perl t/TreeIO.t

Can you describe how you installed bioperl?

On Feb 5, 2007, at 11:58 AM, Neha Nahar wrote:

>
> Hi,
> Thank you for the code.
> I tried it but I still get the same exception.
>
> ------------- EXCEPTION  -------------
> MSG: Cannot call method write_tree on Bio::TreeIO object must use a subclass
> STACK Bio::TreeIO::nexus::write_tree /usr/lib/perl5/vendor_perl/5.8.8/Bio/TreeIO/nexus.pm:170
> STACK toplevel ./nexus1.pl:18
>
> Please find attached the perl file (nexus.pl).
>
> Using the bioperl-1.5.2_101.tar.gz module from http://search.cpan.org/~sendu/bioperl/Bio/TreeIO.pm
>
> Please let me know if I am using the correct version. If not, please
> point me to the latest one.
>
> Thank you.
> Regards,
> nnahar
>
> Jason Stajich wrote: please cc the mailing list
> when asking a question or following up.
>
> Sorry I don't know what you are doing wrong - you didn't resend  
> your code so I don't know if you still have a typo.
>
> This code works fine for me
>
> use Bio::TreeIO;
> use strict;
> my ($filein,$fileout) = @ARGV;
> my ($format,$oformat) = qw(newick nexus);
> my $in  = Bio::TreeIO->new(-file => $filein, -format => $format);
> my $out = Bio::TreeIO->new(-format => $oformat, -file => ">$fileout");
>
> while( my $t = $in->next_tree ) {
>     $out->write_tree($t);
> }
>
> On Feb 5, 2007, at 11:24 AM, Neha Nahar wrote:
>
> Thank you very much for the reply.
>
> I fixed the code as per your suggestion, but now I am getting a
> different error:
>
> $ ./nexus.pl mrp-input.txt nexus.out
> newickfile=mrp-input.txt, nexusfile=nexus.out
>
> -------------  EXCEPTION  -------------
> MSG: Cannot call method write_tree on Bio::TreeIO object must use a subclass
> STACK Bio::TreeIO::nexus::write_tree /usr/lib/perl5/vendor_perl/5.8.8/Bio/TreeIO/nexus.pm:170
> STACK toplevel ./nexus.pl:23
>
> --------------------------------------
>
> Please help me out with this script.
>
> Thank you.
> Regards,
> Neha
>
> Jason Stajich wrote: you want to write the TREE
> out, not the TREE WRITER:
>
> $treeout->write_tree($tree);
>
> not
>
> $treeout->write_tree($treeout);
>
> On Feb 5, 2007, at 9:59 AM, Neha Nahar wrote:
>
> Hello everyone,
>
> I am trying to convert a newick tree to nexus format,
> using the script (referred from an email from George dated Wed Sep
> 22 11:52:47 EDT 2004):
>
> /*------------------------------------------------------------*/
>
> $ cat nexus.pl
> #!/usr/bin/perl -w
>
> use Bio::TreeIO;
>
> ($NEWICKFILE, $NEXUSFILE) = @ARGV;
> print "newickfile=$NEWICKFILE,  nexusfile=$NEXUSFILE\n";
> my $treeio = new Bio::TreeIO(-format => 'newick', -file => "$NEWICKFILE");
> my $treeout = new Bio::TreeIO(-format => 'nexus', -file => "> $NEXUSFILE");
> while(my $tree = $treeio->next_tree) {
>         $treeout->write_tree($treeout);
>     }
>
> exit 0;
>
> /*------------------------------------------------------------*/
>
> Running the script from the command line
> gives the following error:
>
> $ ./nexus.pl mrp-input.txt nexus.out
> newickfile=mrp-input.txt, nexusfile=nexus.out
>
> ------------- EXCEPTION  -------------
> MSG: Cannot call method write_tree on Bio::TreeIO object must use a subclass
> STACK  Bio::TreeIO::nexus::write_tree /usr/lib/perl5/vendor_perl/5.8.8/Bio/TreeIO/nexus.pm:170
> STACK toplevel ./nexus.pl:23
>
> --------------------------------------
>
> Using the bioperl-1.5.2_101.tar.gz module from http://search.cpan.org/~sendu/bioperl/Bio/TreeIO.pm
>
> Questions:
>
> 1. Please let me know if I am using the correct version.
> If not, please point me to the latest one.
>
> 2. Provided that the version I am using is the right one, please
> let me know what is wrong with the script.
>
> Thank you.
> Regards,
> Neha.
>
> -Neha Nahar
>   " Work for cause and not for applause, live to express and not to impress !"
>
> ---------------------------------
>  Here's a new way to find what you're looking for - Yahoo! Answers
>
>  --
> Jason Stajich
> Miller Research Fellow
> University of California, Berkeley
> lab: 510.642.8441
>
> http://pmb.berkeley.edu/~taylor/people/js.html
> http://fungalgenomes.org/


_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l


From cjfields at uiuc.edu  Thu Feb 15 10:44:23 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 15 Feb 2007 09:44:23 -0600
Subject: [Bioperl-l] DB.t failures
In-Reply-To: <45D4700C.8020305@sendu.me.uk>
References: <45D3167F.2000608@sendu.me.uk>
	<AEABC2CF-AABF-4BC9-9E66-17F2E3A51B37@jays.net>
	<45D4621C.6040309@sendu.me.uk>
	<586AAA5B-DFA6-4746-BA70-F3C16CA97EE3@uiuc.edu>
	<45D4700C.8020305@sendu.me.uk>
Message-ID: <809A1A9D-0344-412D-98D7-EA852EFED63D@uiuc.edu>


On Feb 15, 2007, at 8:37 AM, Sendu Bala wrote:

> Chris Fields wrote:
>>
>> On Feb 15, 2007, at 7:37 AM, Sendu Bala wrote:
>>
>>> Jay Hannah wrote:
>>>> On Feb 14, 2007, at 8:02 AM, Sendu Bala wrote:
>>>>> DB.t is failing with BIOPERLDEBUG set. Apparently, we are no longer
>>>>> getting sequences back from NCBI in the order we requested them in
>>>>> batch mode.
>>
>> Okay, I committed a fix for that.  I hope there are many users who
>> depend on the returned sequence order for anything!
>
> s/are/aren't/ ?

Yes, my oops.

> I suspect there might be, and it's certainly a reasonable assumption to
> make. Did you not see an easy way of maintaining the order?

I haven't looked (been busy the last few days), but I think there is  
a way via efetch.

We could add something to the default base URL if there is such an  
option, or (probably better) add a sort_order() method to designate  
a particular sort order, defaulting to the old order if not set.
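A sketch of what such a sort_order() accessor could look like, in the usual get/set style with a default. sort_order() is only a proposal in this thread, and the class name My::EUtilsFetcher and the values 'requested' and 'server' are hypothetical placeholders.

```perl
use strict;
use warnings;

package My::EUtilsFetcher;   # hypothetical class, for illustration only

sub new {
    my ($class, %args) = @_;
    return bless {%args}, $class;
}

# Get/set accessor: with an argument it stores the value; without one it
# returns the stored value, or the default 'requested' if none was set.
sub sort_order {
    my $self = shift;
    $self->{_sort_order} = shift if @_;
    return defined $self->{_sort_order} ? $self->{_sort_order} : 'requested';
}

package main;

my $fetcher = My::EUtilsFetcher->new;
print $fetcher->sort_order, "\n";    # requested
$fetcher->sort_order('server');
print $fetcher->sort_order, "\n";    # server
```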

chris


From lstein at cshl.edu  Thu Feb 15 13:53:13 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Thu, 15 Feb 2007 13:53:13 -0500
Subject: [Bioperl-l] The axis of GC content in Bio::Graphics::glyph:dna
In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E9503E2EB70@iahce2ksrv1.iah.bbsrc.ac.uk>
References: <8975119BCD0AC5419D61A9CF1A923E9503E2EB70@iahce2ksrv1.iah.bbsrc.ac.uk>
Message-ID: <6dce9a0b0702151053v19ab190fh1f752fea3b2ed722@mail.gmail.com>

Hi Michael,

When you set up the panel, do this:

 my $panel = Bio::Graphics::Panel->new(
     # ...your other options (e.g. -segment, -width) go here...
     -pad_left  => 20,
     -pad_right => 20,
 );

This will leave enough room on the left and right for you to see the Y axis.
Otherwise it runs off the edge of the image (OK, this is a mis-design, but
it was the only way to solve a chicken-and-egg problem about who gets to say
how wide the panel is).
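On the axis-range question: assuming the glyph plots percent GC per window (an assumption, not checked against the glyph source), the plotted values lie between 0 and 100, though the drawn axis may be scaled differently. The following core-Perl sketch computes that quantity over fixed, non-overlapping windows. It is not the glyph's own code, and a fixed window size stands in for whatever -gc_window => 'auto' chooses.

```perl
use strict;
use warnings;

# Percent GC in consecutive, non-overlapping windows of $size bases.
# The final window may be shorter than $size.
sub gc_windows {
    my ($seq, $size) = @_;
    my @pct;
    for (my $i = 0; $i < length($seq); $i += $size) {
        my $win = uc substr($seq, $i, $size);
        my $gc  = ($win =~ tr/GC//);      # tr/// returns the count of G and C
        push @pct, 100 * $gc / length($win);
    }
    return @pct;
}

print join(' ', gc_windows('GGCCAATTGA', 5)), "\n";   # 80 20
```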

Lincoln

On 2/15/07, michael watson (IAH-C) <michael.watson at bbsrc.ac.uk> wrote:
>
> Hi
>
> OK, I have some great images out of this glyph, but I can't see the axis,
> nor is it labelled (i.e., does it go from 0 to 100%?), so it isn't great for
> publication.  The docs say:
>
> "NOTE: -gc_window=>'auto' gives nice results and is recommended for
> drawing GC content. The GC content axes draw slightly outside the
> panel, so you may wish to add some extra padding on the right and
> left. "
>
> Any idea how to do this?
>
> Basically, I want a nice GC graph with the axis quite clearly labelled,
> and a nice "%GC" title next to it :)
>
> Thanks
>
> Mick


-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu


From johnsonm at gmail.com  Thu Feb 15 14:24:08 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Thu, 15 Feb 2007 13:24:08 -0600
Subject: [Bioperl-l] Bio::Tools::Glimmer
In-Reply-To: <DB4D80A7-1D29-4809-A7D2-FCE5DB2502B4@uiuc.edu>
References: <ebf5eb170609190833r5fa138e9gc6c51346e5f7890c@mail.gmail.com>
	<0FBEEE2B-9641-4EFA-A5D2-0DA839556E4C@gmx.net>
	<ebf5eb170609201247r12eb9475v5cb553e0727163c0@mail.gmail.com>
	<ebf5eb170702061553t29b0d1f2q402582abba032031@mail.gmail.com>
	<20F07AFE-BADF-4BA5-83B1-C99EEBE915C2@bioperl.org>
	<ebf5eb170702070850k10f1d24fr2d93d61534eeaaff@mail.gmail.com>
	<ebf5eb170702121513r20759845i21e366f916020baa@mail.gmail.com>
	<a79f6a4b0702121807i47d6360cwed1b0276865d6bd7@mail.gmail.com>
	<ebf5eb170702131210l1fa27bb5n31095e9ca2d4ecf2@mail.gmail.com>
	<DB4D80A7-1D29-4809-A7D2-FCE5DB2502B4@uiuc.edu>
Message-ID: <ebf5eb170702151124l52492a51mbbca2b2e346d5ee8@mail.gmail.com>

Done.  Bug opened in Bugzilla, diffs attached including new/updated tests:

http://bugzilla.open-bio.org/show_bug.cgi?id=2206

Can somebody grab that, take a look, tweak to taste, test and commit?  Tests
pass on my end presently.

On 2/13/07, Chris Fields <cjfields at uiuc.edu> wrote:
>
> You'll also want to update whatever relevant tests there are for
> Glimmer; looks like they are in GenPred.t.
>
> chris
>


From cjfields at uiuc.edu  Thu Feb 15 14:37:22 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 15 Feb 2007 13:37:22 -0600
Subject: [Bioperl-l] Bio::Tools::Glimmer
In-Reply-To: <ebf5eb170702151124l52492a51mbbca2b2e346d5ee8@mail.gmail.com>
References: <ebf5eb170609190833r5fa138e9gc6c51346e5f7890c@mail.gmail.com>
	<0FBEEE2B-9641-4EFA-A5D2-0DA839556E4C@gmx.net>
	<ebf5eb170609201247r12eb9475v5cb553e0727163c0@mail.gmail.com>
	<ebf5eb170702061553t29b0d1f2q402582abba032031@mail.gmail.com>
	<20F07AFE-BADF-4BA5-83B1-C99EEBE915C2@bioperl.org>
	<ebf5eb170702070850k10f1d24fr2d93d61534eeaaff@mail.gmail.com>
	<ebf5eb170702121513r20759845i21e366f916020baa@mail.gmail.com>
	<a79f6a4b0702121807i47d6360cwed1b0276865d6bd7@mail.gmail.com>
	<ebf5eb170702131210l1fa27bb5n31095e9ca2d4ecf2@mail.gmail.com>
	<DB4D80A7-1D29-4809-A7D2-FCE5DB2502B4@uiuc.edu>
	<ebf5eb170702151124l52492a51mbbca2b2e346d5ee8@mail.gmail.com>
Message-ID: <4C15214E-AE4B-4D85-A710-60536B08BE86@uiuc.edu>


On Feb 15, 2007, at 1:24 PM, Mark Johnson wrote:

> Done.  Bug opened in Bugzilla, diffs attached including new/updated  
> tests:
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=2206
>
> Can somebody grab that, take a look, tweak to taste, test and  
> commit?  Tests
> pass on my end presently.
>
> On 2/13/07, Chris Fields <cjfields at uiuc.edu> wrote:
>>
>> You'll also want to update whatever relevant tests there are for
>> Glimmer; looks like they are in GenPred.t.
>>
>> chris

Done; everything passed on this end as well, no tweaking necessary.   
If there are problems we'll definitely hear about it down the road  
(Glimmer is a popular tool), but I think you'll be fine.

Thanks Mark!

chris


From cjfields at uiuc.edu  Thu Feb 15 14:46:07 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 15 Feb 2007 13:46:07 -0600
Subject: [Bioperl-l] DB.t failures
In-Reply-To: <809A1A9D-0344-412D-98D7-EA852EFED63D@uiuc.edu>
References: <45D3167F.2000608@sendu.me.uk>
	<AEABC2CF-AABF-4BC9-9E66-17F2E3A51B37@jays.net>
	<45D4621C.6040309@sendu.me.uk>
	<586AAA5B-DFA6-4746-BA70-F3C16CA97EE3@uiuc.edu>
	<45D4700C.8020305@sendu.me.uk>
	<809A1A9D-0344-412D-98D7-EA852EFED63D@uiuc.edu>
Message-ID: <FA9F2E96-064B-4C8F-87BB-D72A7D6F6910@uiuc.edu>


On Feb 15, 2007, at 9:44 AM, Chris Fields wrote:

>
> On Feb 15, 2007, at 8:37 AM, Sendu Bala wrote:
>
>> Chris Fields wrote:
>>>
>>> On Feb 15, 2007, at 7:37 AM, Sendu Bala wrote:
>>>
>>>> Jay Hannah wrote:
>>>>> On Feb 14, 2007, at 8:02 AM, Sendu Bala wrote:
>>>>>> DB.t is failing with BIOPERLDEBUG set. Apparently, we are no longer
>>>>>> getting sequences back from NCBI in the order we requested them in
>>>>>> batch mode.
>>>
>>> Okay, I committed a fix for that.  I hope there are many users who
>>> depend on the returned sequence order for anything!
>>
>> s/are/aren't/ ?
>
> Yes, my oops.
>
>> I suspect there might be, and it's certainly a reasonable
>> assumption to make. Did you not see an easy way of maintaining the order?
>
> I haven't looked (been busy the last few days), but I think there is
> a way via efetch.
>
> We could add something to the default base URL if there is such an
> option, or (probably better) add a sort_order() method to designate
> a particular sort order, defaulting to the old order if not set.
>
> chris

Delving into it further, the problem only occurs when using  
get_seq_stream() directly in batch mode, which is likely only used by  
developers for testing.  The sort issue only pops up when eposting  
IDs using that mode; retrieved seqs are returned in a different order  
than through a direct efetch query (the default with get_Stream* or  
get_Seq* methods).  No setting of the 'sort' parameter gets around  
the problem, which is not a complete surprise since it is only supposed  
to work for PubMed.  Since the method is rarely used, I'll just leave  
the bullet-proofed tests alone.

chris


From letondal at pasteur.fr  Thu Feb 15 15:23:55 2007
From: letondal at pasteur.fr (Catherine Letondal)
Date: Thu, 15 Feb 2007 21:23:55 +0100
Subject: [Bioperl-l] Problem parsing clustalw format ni AlignIO
Message-ID: <ab16cb62f58278aaf565a0fd73ba4009@pasteur.fr>

Hi bioperlers,

I have a script called protal2dna 
(http://bioweb.pasteur.fr/seqanal/interfaces/protal2dna.html, see 
attachment #1) that realigns DNA sequences given the DNA sequences plus 
the corresponding protein alignment (sequences have to be in the same 
order or named equivalently). Users report a parsing problem from the 
AlignIO class when they enter certain clustalw files (see attachment #2 
for an example):

% protal2dna alig-protal2dna.dat dna-protal2dna.data
no alignment available in 'clustalw' format from file 'alig-protal2dna.dat'
%

I have tried with bioperl 1.4. I have looked in the archive and in the 
BUGS, but found nothing.
Is there any bug fix for this? I also provide the DNA sequences file if 
you want to test.

Thanks a lot in advance,

--
Catherine Letondal -- Institut Pasteur
www.pasteur.fr/~letondal

-------------- next part --------------
A non-text attachment was scrubbed...
Name: protal2dna
Type: application/octet-stream
Size: 11093 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070215/42a0ba43/attachment-0009.obj>
-------------- next part --------------

-------------- next part --------------
A non-text attachment was scrubbed...
Name: alig-protal2dna.dat
Type: application/octet-stream
Size: 12022 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070215/42a0ba43/attachment-0010.obj>
-------------- next part --------------

-------------- next part --------------
A non-text attachment was scrubbed...
Name: dna-protal2dna.data
Type: application/octet-stream
Size: 7739 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070215/42a0ba43/attachment-0011.obj>

From Kevin.M.Brown at asu.edu  Thu Feb 15 16:38:25 2007
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Thu, 15 Feb 2007 14:38:25 -0700
Subject: [Bioperl-l] Problem parsing clustalw format ni AlignIO
In-Reply-To: <ab16cb62f58278aaf565a0fd73ba4009@pasteur.fr>
References: <ab16cb62f58278aaf565a0fd73ba4009@pasteur.fr>
Message-ID: <1A4207F8295607498283FE9E93B775B402BA7764@EX02.asurite.ad.asu.edu>

Did you try Bioperl 1.5.2 to see if updates to it might fix the issue?
IIRC 1.4 is nearly 2 years old now.  1.5.2 was released within the last
few months.

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> Catherine Letondal
> Sent: Thursday, February 15, 2007 1:24 PM
> To: bioperl-l
> Cc: Catherine Letondal; Katja Schuerer
> Subject: [Bioperl-l] Problem parsing clustalw format ni AlignIO
> 
> Hi bioperlers,
> 
> I have a script called protal2dna
> (http://bioweb.pasteur.fr/seqanal/interfaces/protal2dna.html, 
> see attachment #1) that realigns DNA sequences given the DNA 
> sequences plus the corresponding protein alignment (sequences 
> have to be in the same order or named equivalently). Users report 
> a parsing problem from the AlignIO class when they enter certain 
> clustalw files (see attachment #2 for an example):
> 
> % protal2dna alig-protal2dna.dat dna-protal2dna.data
> no alignment available in 'clustalw' format from file 'alig-protal2dna.dat'
> %
> 
> I have tried with bioperl 1.4. I have looked in the archive 
> and in the BUGS, but found nothing.
> Is there any bug fix for this? I also provide the DNA 
> sequences file if you want to test.
> 
> Thanks a lot in advance,
> 
> --
> Catherine Letondal -- Institut Pasteur
> www.pasteur.fr/~letondal
> 
> 


From cjfields at uiuc.edu  Thu Feb 15 16:50:54 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 15 Feb 2007 15:50:54 -0600
Subject: [Bioperl-l] Bio::Root::Utilities.pm
In-Reply-To: <8f200b4c0702141020x5815018bvd69873222582d9d5@mail.gmail.com>
References: <8f200b4c0702131500m2f3790nceace3c3e3dffd4f@mail.gmail.com>
	<1D04BA5E-359E-4AC5-B674-8ECDC4F81E83@uiuc.edu>
	<8f200b4c0702141020x5815018bvd69873222582d9d5@mail.gmail.com>
Message-ID: <C53B465C-8BBA-4DE7-92BC-FFC5DDBEB4AA@uiuc.edu>


On Feb 14, 2007, at 12:20 PM, Steve Chervitz wrote:
...

>>
>> I don't have a problem with adding it back, esp. if tests are added.
>> Everything in Bio::Root* not tied to a module was yanked out when no
>> one spoke up about cleaning up Bio::Root* modules:
>>
>> http://thread.gmane.org/gmane.comp.lang.perl.bio.general/12792/focus=12839
>>
>> Maybe others disagree?
>>
>> chris
>>
>
> Sorry I missed out on that thread. I had some trouble with my bioperl-l
> email delivery getting disabled due to excessive bounces, and it took me a
> while to catch it.
>
> Bio::Root::Utilities is quite a grab bag of miscellaneous general functions
> that are occasionally useful for perl scripting (e.g., determining
> end-of-line characters, sending email, etc.). The code could definitely use
> a review, and maybe an example script to advertise it. I can look into this,
> and suggestions are welcome.
>
> Steve

Steve,

I have added Root::Utilities back to CVS but I didn't know if I  
should add back the other related Root modules (didn't know what your  
future plans were for them).  Could the Bio::Root::Global and  
Bio::Root::Object stuff be consolidated into Bio::Root::Utilities or  
would that be too problematic?  None of the other Bio* modules  
currently use them.

Personally, I use Date::Manip for anything that requires date/time  
manipulation (updating seq records based on dates, for instance).   
Some of the other utilities could come in handy, though.  Don't know  
if that helps...

chris


From cjfields at uiuc.edu  Thu Feb 15 16:51:58 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 15 Feb 2007 15:51:58 -0600
Subject: [Bioperl-l] XEMBL deprecation
Message-ID: <72AD3CD9-4D42-4E65-A54E-56D01741CDCD@uiuc.edu>

I have gone ahead and marked Bio::DB::XEMBL and Bio::DB::XEMBLService  
both for deprecation in the wiki and in CVS (though I haven't set any  
timeline):

http://www.bioperl.org/wiki/Deprecated_modules

The XEMBL web services are no longer available, and it looks like  
everything is running through DBFetch now.  The XEMBL tests are  
skipped if no server is detected, so they shouldn't cause any  
problems with Bioperl installations.

Lincoln, was there anything to salvage from these?  I noticed they  
used SOAP::Lite, so maybe we could convert these over to a SOAP-based  
interface to DBFetch web services?

chris


From johnsonm at gmail.com  Thu Feb 15 17:29:37 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Thu, 15 Feb 2007 16:29:37 -0600
Subject: [Bioperl-l] Bio::Tools::Run::Glimmer?
Message-ID: <ebf5eb170702151429w233ec66dkfb89743a4b8e687e@mail.gmail.com>

    Now that I've got Bio::Tools::Glimmer parsing Glimmer2 and Glimmer3
output, I suppose I might as well go and write Bio::Tools::Run::Glimmer.  I
suspect another 4-in-1 module may be possible.  Now that I think about it,
I'll need one for GeneMark, too.
    Comments?  Suggestions on a good module to use as a template?


From hlapp at gmx.net  Thu Feb 15 20:18:56 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 15 Feb 2007 20:18:56 -0500
Subject: [Bioperl-l] XEMBL deprecation
In-Reply-To: <72AD3CD9-4D42-4E65-A54E-56D01741CDCD@uiuc.edu>
References: <72AD3CD9-4D42-4E65-A54E-56D01741CDCD@uiuc.edu>
Message-ID: <989C9C17-9D85-4B3F-A9FF-C9B9372D38AD@gmx.net>


On Feb 15, 2007, at 4:51 PM, Chris Fields wrote:

> The XEMBL web services are no longer available

What happens if someone invokes the module? Should it maybe return  
nothing and warn()? I don't think it's a good idea if the module just  
silently does not function because its backend is no more.

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cjfields at uiuc.edu  Thu Feb 15 20:48:12 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 15 Feb 2007 19:48:12 -0600
Subject: [Bioperl-l] XEMBL deprecation
In-Reply-To: <989C9C17-9D85-4B3F-A9FF-C9B9372D38AD@gmx.net>
References: <72AD3CD9-4D42-4E65-A54E-56D01741CDCD@uiuc.edu>
	<989C9C17-9D85-4B3F-A9FF-C9B9372D38AD@gmx.net>
Message-ID: <00A97EC9-A066-4848-A2A4-8E0773797783@uiuc.edu>

On Feb 15, 2007, at 7:18 PM, Hilmar Lapp wrote:

> On Feb 15, 2007, at 4:51 PM, Chris Fields wrote:
>
>> The XEMBL web services are no longer available
>
> What happens if someone invokes the module? Should it maybe return  
> nothing and warn()? I don't think it's a good idea if the module  
> just silently does not function because its backend is no more.
>
> 	-hilmar

Yes, I thought the same.  I have added a warn() noting the  
deprecation to the XEMBL constructor and removed XEMBL tests from  
CVS.  The modules are still there for the time being.

I actually worry more about the internals; it would be a shame to  
toss them altogether.  Would it be worth it to shift this towards a  
SOAP-based interface to DBFetch?  Or, more precisely, how much  
trouble would it be to do so?

chris


From hlapp at gmx.net  Thu Feb 15 20:54:29 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 15 Feb 2007 20:54:29 -0500
Subject: [Bioperl-l] XEMBL deprecation
In-Reply-To: <00A97EC9-A066-4848-A2A4-8E0773797783@uiuc.edu>
References: <72AD3CD9-4D42-4E65-A54E-56D01741CDCD@uiuc.edu>
	<989C9C17-9D85-4B3F-A9FF-C9B9372D38AD@gmx.net>
	<00A97EC9-A066-4848-A2A4-8E0773797783@uiuc.edu>
Message-ID: <FB0CC994-F2C2-407C-9EFB-6B8083B9545C@gmx.net>

Well, if dbFetch doesn't have a SOAP-based interface, how would you
want to do this?

	-hilmar

On Feb 15, 2007, at 8:48 PM, Chris Fields wrote:

> On Feb 15, 2007, at 7:18 PM, Hilmar Lapp wrote:
>
>> On Feb 15, 2007, at 4:51 PM, Chris Fields wrote:
>>
>>> The XEMBL web services are no longer available
>>
>> What happens if someone invokes the module? Should it maybe return  
>> nothing and warn()? I don't think it's a good idea if the module  
>> just silently does not function because its backend is no more.
>>
>> 	-hilmar
>
> Yes, I thought the same.  I have added a warn() noting the  
> deprecation to the XEMBL constructor and removed XEMBL tests from  
> CVS.  The modules are still there for the time being.
>
> I actually worry more about the internals; it would be a shame to  
> toss them altogether.  Would it be worth it to shift this towards a  
> SOAP-based interface to DBFetch?  Or, more precisely, how much  
> trouble would it be to do so?
>
> chris



From cjfields at uiuc.edu  Thu Feb 15 20:59:46 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 15 Feb 2007 19:59:46 -0600
Subject: [Bioperl-l] XEMBL deprecation
In-Reply-To: <FB0CC994-F2C2-407C-9EFB-6B8083B9545C@gmx.net>
References: <72AD3CD9-4D42-4E65-A54E-56D01741CDCD@uiuc.edu>
	<989C9C17-9D85-4B3F-A9FF-C9B9372D38AD@gmx.net>
	<00A97EC9-A066-4848-A2A4-8E0773797783@uiuc.edu>
	<FB0CC994-F2C2-407C-9EFB-6B8083B9545C@gmx.net>
Message-ID: <8C7E18C6-B38D-4E15-BE9C-84256B09C312@uiuc.edu>


On Feb 15, 2007, at 7:54 PM, Hilmar Lapp wrote:

> Well, if dbFetch doesn't have a SOAP-based interface, how would you
> want to do this?
>
> 	-hilmar

DBfetch has a SOAP-based interface:

http://www.ebi.ac.uk/Tools/webservices/services/dbfetch

Just not sure how easy it would be to switch XEMBL code over to using  
it.  We already have Bio::DB::DBFetch so it may be redundant, but I  
don't recall any other SOAP-based tools in BioPerl beyond some stuff  
in bioperl-run (and I'm not sure how up-to-date the DBFetch module is).

chris


From jimhu at tamu.edu  Fri Feb 16 00:20:09 2007
From: jimhu at tamu.edu (Jim Hu)
Date: Thu, 15 Feb 2007 23:20:09 -0600
Subject: [Bioperl-l] Pathway tools output parser
In-Reply-To: <Pine.LNX.4.44.0702062205510.13338-100000@sos.lbl.gov>
References: <Pine.LNX.4.44.0702062205510.13338-100000@sos.lbl.gov>
Message-ID: <1632E2BF-4402-47DE-B750-9763E02711D2@tamu.edu>

Hi Chris,

I need to check the list more often!  I never got an answer here, but
Eric Just pointed out a Perl API at TAIR that's linked from the
BioCyc site.  I've used the Lisp parser functions from it to move
the data into a Perl array of arrays, and I'm working on creating
object classes for BioCyc objects, starting with genes and products.
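For anyone attempting the same Lisp-to-Perl step, here is a minimal sketch of reading s-expressions into nested Perl array refs. It assumes the records are plain parenthesised s-expressions with double-quoted strings and no escapes, comments or quote syntax; `parse_sexprs` and `read_form` are hypothetical helpers, not the TAIR/BioCyc API mentioned above, and the sample record is made up.

```perl
use strict;
use warnings;

# Read one form from the front of the token list (consumes tokens).
sub read_form {
    my ($tokens) = @_;
    my $tok = shift @$tokens;
    die "unexpected end of input\n" unless defined $tok;
    if ($tok eq '(') {
        my @list;
        while (1) {
            die "unbalanced parentheses\n" unless @$tokens;
            if ($tokens->[0] eq ')') { shift @$tokens; last }
            push @list, read_form($tokens);
        }
        return \@list;              # a list becomes an array ref
    }
    die "unexpected ')'\n" if $tok eq ')';
    $tok =~ s/^"(.*)"$/$1/s;        # strip quotes from string atoms
    return $tok;                    # atoms stay plain scalars
}

# Tokenize into '(' / ')' / quoted strings / bare atoms, then read all forms.
sub parse_sexprs {
    my ($text) = @_;
    my @tokens = $text =~ /("[^"]*"|[()]|[^\s()"]+)/g;
    my @forms;
    push @forms, read_form(\@tokens) while @tokens;
    return \@forms;
}

# Made-up record, only to show the resulting array-of-arrays shape.
my $forms = parse_sexprs('(GENE (NAME "thr operon") (LEFT-END 190))');
print $forms->[0][0], "\n";       # prints GENE
print $forms->[0][1][1], "\n";    # prints thr operon
print $forms->[0][2][1], "\n";    # prints 190
```

A real Pathway Tools dump may use syntax this sketch does not handle, so a tested parser such as the one Jim found is the safer choice for production data.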

I need to look at the appropriate ways to link this up to the  
existing codebase for interconverting to Chado and other BioPerl data  
types.

Jim
=====================================
Jim Hu
Associate Professor
Dept. of Biochemistry and Biophysics
2128 TAMU
Texas A&M Univ.
College Station, TX 77843-2128
979-862-4054


On Feb 7, 2007, at 12:07 AM, Chris Mungall wrote:

>
> Hi Jim
>
> Did you ever get an answer to this? I'm interested in storing pathway data
> in Chado & I remember enough lisp to get it into something perl-manageable
> like XML
>
> On Thu, 25 Jan 2007, Jim Hu wrote:
>
>> Is there a module to parse the lisp object files from Peter Karp's
>> Pathway Tools?   I need a parser to convert the gene and protein
>> objects in EcoCyc releases into something that can be imported into
>> Chado.
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
>


From lstein at cshl.edu  Fri Feb 16 08:35:19 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Fri, 16 Feb 2007 08:35:19 -0500
Subject: [Bioperl-l] Bio::DB::SeqFeature::Store problem with transcripts
In-Reply-To: <45D1E2A5.6060104@sendu.me.uk>
References: <45D1E2A5.6060104@sendu.me.uk>
Message-ID: <6dce9a0b0702160535x946d6e5q4eda94ad174f1bac@mail.gmail.com>

Hi,

Older versions of Storable can't deal with features that contain subroutine
refs. You should get the current version from CPAN. Note that there is a
slight security problem here if you don't trust the objects stored in the
database. If they contain code refs, the code will be evaluated during
deserialization.
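A minimal sketch of what is involved, using Storable's documented `$Storable::Deparse` and `$Storable::Eval` switches (CODE ref support arrived in the Storable versions discussed later in this thread). The hash is a made-up stand-in for a feature object that holds a CODE ref. Setting `$Storable::Eval` is exactly where the security concern arises, because thawing then runs code taken from the stored data.

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);

# Serialize CODE refs by turning them into source text with B::Deparse.
local $Storable::Deparse = 1;

# Re-create CODE refs on thaw by eval'ing that source text.
# SECURITY: this executes code read from the serialized data, so only
# enable it for trusted data. Storable also accepts a code ref here,
# e.g. one that evaluates the text inside a Safe compartment.
local $Storable::Eval = 1;

# Made-up object holding a CODE ref, standing in for a feature object.
my $obj = { name => 'test', cleanup => sub { return "cleaned $_[0]" } };

my $copy = thaw(freeze($obj));
print $copy->{cleanup}->('x'), "\n";   # prints "cleaned x"
```

Only the sub's source survives the round trip, so a closure would lose the lexical variables it captured.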

Lincoln

On 2/13/07, Sendu Bala <bix at sendu.me.uk> wrote:
>
> I have some raw sequences in a Bio::DB::SeqFeature::Store mysql database
> and wanted to associate some basic information with them, like exon
> positions. I thought of creating Bio::SeqFeature::Gene::Transcript
> objects and storing them so I could later use features() to see what
> other features overlapped exons. I ran into a fatal error that can be
> replicated with the following simplified one-liner:
>
> perl -MBio::DB::SeqFeature::Store -MBio::SeqFeature::Gene::Transcript -e '
>   $db = Bio::DB::SeqFeature::Store->new(-adaptor => "DBI::mysql", -dsn => "dbi:mysql:test");
>   $trans = Bio::SeqFeature::Gene::Transcript->new(-start => 1, -end => 2, -seq_id => "test");
>   $db->store($trans);
>   @trans = $db->features(-seqid => "test", -type => "transcript");
>   print "@trans\n";'
>
> code sub {
>      package Bio::SeqFeature::Generic;
>      use strict 'refs';
>      my $self = shift @_;
>      foreach my $f (@{[] unless $$self{'_gsf_sub_array'};}) {
>          $f = undef;
>      }
>      $$self{'_gsf_seq'} = undef;
>      foreach my $t (keys %{{} unless $$self{'_gsf_tag_hash'};}) {
>          $$self{'_gsf_tag_hash'}{$t} = undef;
>          delete $$self{'_gsf_tag_hash'}{$t};
>      }
> } did not evaluate to a subroutine reference, at
> /.../Bio/DB/SeqFeature/Store.pm line 2280
>
>
> Is this a bug? Or am I taking the wrong approach?
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu


From lstein at cshl.edu  Fri Feb 16 08:47:29 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Fri, 16 Feb 2007 08:47:29 -0500
Subject: [Bioperl-l] Bio::DB::SeqFeature::Store problem with transcripts
In-Reply-To: <45D5B42A.1080303@sendu.me.uk>
References: <45D1E2A5.6060104@sendu.me.uk>
	<6dce9a0b0702160535x946d6e5q4eda94ad174f1bac@mail.gmail.com>
	<45D5B42A.1080303@sendu.me.uk>
Message-ID: <6dce9a0b0702160547s5873cd2bg2c5cf09779138249@mail.gmail.com>

Hi Sendu,

I'll do a little digging and let you know.

Lincoln

On 2/16/07, Sendu Bala <bix at sendu.me.uk> wrote:
>
> Lincoln Stein wrote:
> > Hi,
> >
> > Older versions of Storable can't deal with features that contain
> > subroutine refs. You should get the current version from CPAN.
>
> Do you have any idea which version of Storable first supported this? I
> can specify that version in Bioperl's Build.PL.
>
> (otherwise I'll just specify the latest version)
>




From lstein at cshl.edu  Fri Feb 16 08:52:30 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Fri, 16 Feb 2007 08:52:30 -0500
Subject: [Bioperl-l] Bio::DB::SeqFeature::Store problem with transcripts
In-Reply-To: <45D5B42A.1080303@sendu.me.uk>
References: <45D1E2A5.6060104@sendu.me.uk>
	<6dce9a0b0702160535x946d6e5q4eda94ad174f1bac@mail.gmail.com>
	<45D5B42A.1080303@sendu.me.uk>
Message-ID: <6dce9a0b0702160552l11417675kec7b5b4537cbfa70@mail.gmail.com>

It looks like 2.05 or higher is the Storable version to use. It requires
B::Deparse, which is (I think) standard on perl 5.6 or higher.
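If code needs to decide at run time whether the installed Storable is recent enough, a check along these lines is possible. It is a sketch: `storable_can_store_code` is a hypothetical helper, and the 2.05 minimum comes from this thread rather than from Storable's own changelog. It relies on the standard `->VERSION` method, which dies when the installed module is older than requested.

```perl
use strict;
use warnings;

# Hypothetical helper: true if Storable is at least $min (default 2.05,
# the version suggested in this thread) and B::Deparse can be loaded.
sub storable_can_store_code {
    my ($min) = @_;
    $min = '2.05' unless defined $min;
    return eval {
        require Storable;
        Storable->VERSION($min);   # dies if the installed Storable is older
        require B::Deparse;        # what Storable uses to serialize CODE refs
        1;
    } ? 1 : 0;
}

if (storable_can_store_code()) {
    print 'Storable ', Storable->VERSION, " should handle CODE refs\n";
}
else {
    print "Storable is missing or older than 2.05; upgrade it from CPAN\n";
}
```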

Lincoln

On 2/16/07, Sendu Bala <bix at sendu.me.uk> wrote:
>
> Lincoln Stein wrote:
> > Hi,
> >
> > Older versions of Storable can't deal with features that contain
> > subroutine refs. You should get the current version from CPAN.
>
> Do you have any idea which version of Storable first supported this? I
> can specify that version in Bioperl's Build.PL.
>
> (otherwise I'll just specify the latest version)
>




From lstein at cshl.edu  Fri Feb 16 08:55:06 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Fri, 16 Feb 2007 08:55:06 -0500
Subject: [Bioperl-l] XEMBL deprecation
In-Reply-To: <72AD3CD9-4D42-4E65-A54E-56D01741CDCD@uiuc.edu>
References: <72AD3CD9-4D42-4E65-A54E-56D01741CDCD@uiuc.edu>
Message-ID: <6dce9a0b0702160555i7eae057dt1c3cd46b1f732bd1@mail.gmail.com>

I like the idea of converting these over to use DBFetch's SOAP services. On
the other hand, it isn't likely that I'm going to have time to do this
anytime soon.

Probably the best thing to do is to issue a warning and return undef if
someone tries to use the XEMBL modules. I'll make that change.
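One way to follow that suggestion, sketched under assumptions: `Hypothetical::XEMBL` is a stand-in class (not the actual Bio::DB::XEMBL source), and the constructor uses Carp's `carp` so the warning points at the caller's line. A bare `return` yields undef in scalar context, so code that checks the constructor's result can fail cleanly instead of silently talking to a service that no longer exists.

```perl
package Hypothetical::XEMBL;   # stand-in name, not the real Bio::DB::XEMBL code
use strict;
use warnings;
use Carp qw(carp);

sub new {
    my ($class, @args) = @_;
    # Warn from the caller's point of view, then hand back nothing.
    carp "$class is deprecated: the XEMBL web service is no longer available;"
       . " consider a dbfetch-based module such as Bio::DB::DBFetch instead";
    return;   # undef in scalar context, empty list in list context
}

package main;

my $db = Hypothetical::XEMBL->new();
print defined $db ? "got an object\n" : "no object returned\n";
```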

Lincoln

On 2/15/07, Chris Fields <cjfields at uiuc.edu> wrote:
>
> I have gone ahead and marked Bio::DB::XEMBL and Bio::DB::XEMBLService
> both for deprecation in the wiki and in CVS (though I haven't set any
> timeline):
>
> http://www.bioperl.org/wiki/Deprecated_modules
>
> The XEMBL web services are no longer available, and it looks like
> everything is running through DBFetch now.  The XEMBL tests are
> skipped if no server is detected, so they shouldn't cause any
> problems with Bioperl installations.
>
> Lincoln, was there anything to salvage from these?  I noticed they
> used SOAP::Lite, so maybe we could convert these over to a SOAP-based
> interface to DBFetch web services?
>
> chris
>




From lstein at cshl.edu  Fri Feb 16 08:55:47 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Fri, 16 Feb 2007 08:55:47 -0500
Subject: [Bioperl-l] XEMBL deprecation
In-Reply-To: <6dce9a0b0702160555i7eae057dt1c3cd46b1f732bd1@mail.gmail.com>
References: <72AD3CD9-4D42-4E65-A54E-56D01741CDCD@uiuc.edu>
	<6dce9a0b0702160555i7eae057dt1c3cd46b1f732bd1@mail.gmail.com>
Message-ID: <6dce9a0b0702160555s5b4d7d70n936c603bc79dd40d@mail.gmail.com>

Oh, looks like someone has inserted the warnings already. Good.

Lincoln

On 2/16/07, Lincoln Stein <lstein at cshl.edu> wrote:
>
> I like the idea of converting these over to use DBFetch's SOAP services.
> On the other hand, it isn't likely that I'm going to have time to do this
> anytime soon.
>
> Probably the best thing to do is to issue a warning and return undef if
> someone tries to use the XEMBL modules. I'll make that change.
>
> Lincoln
>
> On 2/15/07, Chris Fields <cjfields at uiuc.edu> wrote:
> >
> > I have gone ahead and marked Bio::DB::XEMBL and Bio::DB::XEMBLService
> > both for deprecation in the wiki and in CVS (though I haven't set any
> > timeline):
> >
> > http://www.bioperl.org/wiki/Deprecated_modules
> >
> > The XEMBL web services are no longer available, and it looks like
> > everything is running through DBFetch now.  The XEMBL tests are
> > skipped if no server is detected, so they shouldn't cause any
> > problems with Bioperl installations.
> >
> > Lincoln, was there anything to salvage from these?  I noticed they
> > used SOAP::Lite, so maybe we could convert these over to a SOAP-based
> > interface to DBFetch web services?
> >
> > chris
> >
>
>
>




From bix at sendu.me.uk  Fri Feb 16 08:56:50 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 16 Feb 2007 13:56:50 +0000
Subject: [Bioperl-l] Bio::DB::SeqFeature::Store problem with transcripts
In-Reply-To: <6dce9a0b0702160552l11417675kec7b5b4537cbfa70@mail.gmail.com>
References: <45D1E2A5.6060104@sendu.me.uk>	
	<6dce9a0b0702160535x946d6e5q4eda94ad174f1bac@mail.gmail.com>	
	<45D5B42A.1080303@sendu.me.uk>
	<6dce9a0b0702160552l11417675kec7b5b4537cbfa70@mail.gmail.com>
Message-ID: <45D5B822.6080908@sendu.me.uk>

Lincoln Stein wrote:
> It looks like 2.05 or higher is the Storable version to use. It requires 
> B::Deparse, which is (I think) standard on perl 5.6 or higher.

Thanks, now recommended in Build.PL


From cjfields at uiuc.edu  Fri Feb 16 09:05:08 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 16 Feb 2007 08:05:08 -0600
Subject: [Bioperl-l] XEMBL deprecation
In-Reply-To: <6dce9a0b0702160555s5b4d7d70n936c603bc79dd40d@mail.gmail.com>
References: <72AD3CD9-4D42-4E65-A54E-56D01741CDCD@uiuc.edu>
	<6dce9a0b0702160555i7eae057dt1c3cd46b1f732bd1@mail.gmail.com>
	<6dce9a0b0702160555s5b4d7d70n936c603bc79dd40d@mail.gmail.com>
Message-ID: <ACAF9E26-CBDD-43AC-8D3E-0CADFF5B9576@uiuc.edu>

I added the warning yesterday.

We can add something to the project priority list on modifying XEMBL  
to use DBFetch instead; I like the SOAP-based interface.  I am  
thinking of a similar interface for NCBI eutils but I haven't had  
time to work on it.

chris

On Feb 16, 2007, at 7:55 AM, Lincoln Stein wrote:

> Oh, looks like someone has inserted the warnings already. Good.
>
> Lincoln
>
> On 2/16/07, Lincoln Stein <lstein at cshl.edu> wrote:
>
> I like the idea of converting these over to use DBFetch's SOAP services.
> On the other hand, it isn't likely that I'm going to have time to do this
> anytime soon.
>
> Probably the best thing to do is to issue a warning and return
> undef if someone tries to use the XEMBL modules. I'll make that
> change.
>
> Lincoln
>
>
> On 2/15/07, Chris Fields <cjfields at uiuc.edu> wrote:
>
> I have gone ahead and marked Bio::DB::XEMBL and Bio::DB::XEMBLService
> both for deprecation in the wiki and in CVS (though I haven't set any
> timeline):
>
> http://www.bioperl.org/wiki/Deprecated_modules
>
> The XEMBL web services are no longer available, and it looks like
> everything is running through DBFetch now.  The XEMBL tests are
> skipped if no server is detected, so they shouldn't cause any
> problems with Bioperl installations.
>
> Lincoln, was there anything to salvage from these?  I noticed they
> used SOAP::Lite, so maybe we could convert these over to a SOAP-based
> interface to DBFetch web services?
>
> chris
>
>
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From bix at sendu.me.uk  Fri Feb 16 08:39:54 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 16 Feb 2007 13:39:54 +0000
Subject: [Bioperl-l] Bio::DB::SeqFeature::Store problem with transcripts
In-Reply-To: <6dce9a0b0702160535x946d6e5q4eda94ad174f1bac@mail.gmail.com>
References: <45D1E2A5.6060104@sendu.me.uk>
	<6dce9a0b0702160535x946d6e5q4eda94ad174f1bac@mail.gmail.com>
Message-ID: <45D5B42A.1080303@sendu.me.uk>

Lincoln Stein wrote:
> Hi,
> 
> Older versions of Storable can't deal with features that contain 
> subroutine refs. You should get the current version from CPAN.

Do you have any idea which version of Storable first supported this? I 
can specify that version in Bioperl's Build.PL.

(otherwise I'll just specify the latest version)


From eu at otelo-online.de  Sat Feb 17 07:55:08 2007
From: eu at otelo-online.de (eu at otelo-online.de)
Date: Sat, 17 Feb 2007 13:55:08 +0100 (CET)
Subject: [Bioperl-l] Bioperl Module OddCodes(help)
Message-ID: <29037001.1171716908969.JavaMail.ngmail@webmail18>

Hello @all,

I want to translate a sequence in FASTA format into only acidic, basic and polar classes, depending on the pH.
The OddCodes module can only translate into acidic, basic, polar and hydrophobic, and I think only at the default pH.

Can somebody help me? I don't know whether it is possible.
For each amino acid I need to know whether it has a positive charge, a negative charge, or no charge.

thx
 



From The_Polymorph at rocketmail.com  Sun Feb 18 14:08:34 2007
From: The_Polymorph at rocketmail.com (Caitlin)
Date: Sun, 18 Feb 2007 11:08:34 -0800 (PST)
Subject: [Bioperl-l] Missing ppm for Bioperl 1.5.2_100(?)
Message-ID: <148421.50501.qm@web50801.mail.yahoo.com>

Hi.

In an attempt to upgrade my Bioperl install from 1.5.2 RC5 to
1.5.2_100, I noticed the ppm was not found in the ActiveState
repositories.

Thanks,

~Caitlin




From bix at sendu.me.uk  Sun Feb 18 15:36:03 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Sun, 18 Feb 2007 20:36:03 +0000
Subject: [Bioperl-l] Missing ppm for Bioperl 1.5.2_100(?)
In-Reply-To: <148421.50501.qm@web50801.mail.yahoo.com>
References: <148421.50501.qm@web50801.mail.yahoo.com>
Message-ID: <45D8B8B3.4000408@sendu.me.uk>

Caitlin wrote:
> Hi.
> 
> In an attempt to upgrade my Bioperl install from 1.5.2 RC5 to
> 1.5.2_100, I noticed the ppm was not found on the activestate
> repositories. 

Follow the install instructions:
http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows

It's not in the normal ActiveState repository, but on bioperl.org.


From t.nugent at cs.ucl.ac.uk  Mon Feb 19 12:29:48 2007
From: t.nugent at cs.ucl.ac.uk (Tim Nugent)
Date: Mon, 19 Feb 2007 17:29:48 +0000
Subject: [Bioperl-l] Module to draw transmembrane protein toplogoy
Message-ID: <45D9DE8C.2010301@cs.ucl.ac.uk>

Hi everyone,

I've written a perl module to display transmembrane protein topology 
using GD. There are various options, including labels, helix/loop 
dimensions, colour schemes etc but it only requires a string or array 
containing the protein topology (e.g. transmembrane helix start/stop 
points). It produces output like this:

http://www.cs.ucl.ac.uk/staff/T.Nugent/images/blue.png
http://www.cs.ucl.ac.uk/staff/T.Nugent/images/yellow.png

using the code at the bottom.

Here is the module:
http://www.cs.ucl.ac.uk/staff/T.Nugent/source/DrawTransmembrane.pm

I've never submitted anything to Bioperl before - is this sort of thing 
likely to be of use to others? I imagine it would sit alongside some of 
the Bio::Graphics stuff.

Best wishes,

Tim

#!/usr/bin/perl

use strict;
use warnings;
use lib '/scratch0/NOT_BACKED_UP/tnugent/perl5lib'; ## path to module
use DrawTransmembrane;

## transmembrane helix boundaries as start/stop pairs (five helices here)
my @topology = (20,45,59,70,86,109,145,168,194,220);

## residue position => label text
my %labels = ('5' => '5 - Sulphation Site',
               '21' => '1st Helix',
               '47' => '40 - Mutation',
               '60' => 'Voltage Sensor',
               '72' => '72 - Mutation 2',
               '73' => '73 - Mutation 3',
               '138' => '138 - Glycosylation Site',
               '170' => '170 - Phosphorylation Site',
               '200' => 'Last Helix');

my $im = DrawTransmembrane->draw_transmembrane(-title => 'This is a cartoon displaying transmembrane helices.',
                                                -topology => \@topology,
                                                -n_terminal => 'out',
                                                -helix_width => 48,
                                                -helix_height => 125,
                                                -short_loop_limit => 10,
                                                -long_loop_limit => 35,
                                                -loop_width => 25,
                                                -colour_scheme => 'yellow',
                                                -labels => \%labels,
                                                -text_offset => -10);

## write the image to a .png file (3-argument open with error checking)
my $output = 'test.png';
open(my $out_fh, '>', $output) or die "Cannot write $output: $!";
binmode $out_fh;
print {$out_fh} $im->png;
close $out_fh or die "Cannot close $output: $!";

## view the result with ImageMagick's 'display', if it is installed
system('display', $output) == 0
    or warn "Could not run 'display' on $output\n";

-- 
Tim Nugent (MRes)
Research Student
Bioinformatics Unit
Department of Computer Science
University College London
Gower Street
London WC1E 6BT
Tel: 020-7679-0410
t.nugent at ucl.ac.uk


From bix at sendu.me.uk  Mon Feb 19 12:42:23 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 19 Feb 2007 17:42:23 +0000
Subject: [Bioperl-l] t/FeatureHolder.x
Message-ID: <45D9E17F.4030302@sendu.me.uk>

Is this supposed to work? It doesn't get run in the test suite normally 
because of its name.

With a live checkout I get:
./Build test --test_files t/FeatureHolder.x --verbose
t/FeatureHolder....1..6
ok 1
ok 2
Set group tag to: locus_tag
GROUPS:
   GROUP [?]:source

[snip]

   resolved pair Bio::SeqFeature::Generic=HASH(0x1375dc0) Bio::SeqFeature::Generic=HASH(0x1362830)
UNFLATTENING GROUP:
   GROUP [?]:gene
UNFLATTENING GROUP:
   GROUP [?]:repeat_region
UNFLATTENING GROUP:
   GROUP [?]:gene
UNFLATTENING GROUP:
   GROUP [?]:repeat_region
UNFLATTENING GROUP:
   GROUP [BG:DS07721.3]:gene mRNA CDS
UNFLATTENING GROUP:
   GROUP [BG:DS07721.6]:gene mRNA CDS

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: DUPLICATE ID: AAF53399.1
STACK: Error::throw
STACK: Bio::Root::Root::throw /home/sendu/src/bioperl/core/blib/lib/Bio/Root/Root.pm:359
STACK: Bio::SeqFeature::Tools::IDHandler::create_hierarchy_from_ParentIDs /home/sendu/src/bioperl/core/blib/lib/Bio/SeqFeature/Tools/IDHandler.pm:175
STACK: Bio::FeatureHolderI::create_hierarchy_from_ParentIDs /home/sendu/src/bioperl/core/blib/lib/Bio/FeatureHolderI.pm:245
STACK: t/FeatureHolder.x:68
-----------------------------------------------------------
dubious
         Test returned status 255 (wstat 65280, 0xff00)
DIED. FAILED tests 3-6
         Failed 4/6 tests, 33.33% okay
Failed Test       Stat Wstat Total Fail  List of Failed
-------------------------------------------------------------------------------
t/FeatureHolder.x  255 65280     6    8  3-6
Failed 1/1 test scripts. 4/6 subtests failed.
Files=1, Tests=6,  1 wallclock secs ( 0.55 cusr +  0.04 csys =  0.59 CPU)
Failed 1/1 test programs. 4/6 subtests failed.


It also fails quite differently with 1.5.2.


From cjfields at uiuc.edu  Mon Feb 19 15:04:20 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 19 Feb 2007 14:04:20 -0600
Subject: [Bioperl-l] t/FeatureHolder.x
In-Reply-To: <45D9E17F.4030302@sendu.me.uk>
References: <45D9E17F.4030302@sendu.me.uk>
Message-ID: <534215C5-11BB-48FA-B6BF-D8B3E2BFE20A@uiuc.edu>

Looks like that's some of Chris Mungall's stuff for GFF3.  Don't know  
if he's stalking the mail list.

Wonder if this has anything to do with the feature/annotation changes
around rel 1.5.

(the other) chris

On Feb 19, 2007, at 11:42 AM, Sendu Bala wrote:

> Is this supposed to work? It doesn't get run in the test suite normally
> because of its name.
>
> With a live checkout I get:
> ./Build test --test_files t/FeatureHolder.x --verbose
> t/FeatureHolder....1..6
...


From cjfields at uiuc.edu  Mon Feb 19 16:24:04 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 19 Feb 2007 15:24:04 -0600
Subject: [Bioperl-l] Module to draw transmembrane protein toplogoy
In-Reply-To: <45D9DE8C.2010301@cs.ucl.ac.uk>
References: <45D9DE8C.2010301@cs.ucl.ac.uk>
Message-ID: <29861210-1C15-4CFD-BA6F-1FF8EAAF8A62@uiuc.edu>

I think this is pretty nice!  We can add the code and test script to  
bugzilla and (if someone has time) try to see where it might fit in,  
though Bio::Graphics sounds like a good spot.

Anyone else have ideas on where this could go?

chris

On Feb 19, 2007, at 11:29 AM, Tim Nugent wrote:

> Hi everyone,
>
> I've written a perl module to display transmembrane protein topology
> using GD. There are various options, including labels, helix/loop
> dimensions, colour schemes etc but it only requires a string or array
> containing the protein topology (e.g. transmembrane helix start/stop
> points). It produces output like this:
>
> http://www.cs.ucl.ac.uk/staff/T.Nugent/images/blue.png
> http://www.cs.ucl.ac.uk/staff/T.Nugent/images/yellow.png
>
> using the code at the bottom.
>
> Here is the module:
> http://www.cs.ucl.ac.uk/staff/T.Nugent/source/DrawTransmembrane.pm
>
> I've never submitted anything to Bioperl before - is this sort of thing
> likely to be of use to others? I imagine it would sit alongside some of
> the Bio::Graphics stuff.
>
> Best wishes,
>
> Tim
>
> #!/usr/bin/perl
>
> use strict;
> use warnings;
> use lib '/scratch0/NOT_BACKED_UP/tnugent/perl5lib'; ## path to module
> use DrawTransmembrane;
>
> my @topology = (20,45,59,70,86,109,145,168,194,220);
>
> my %labels = ('5' => '5 - Sulphation Site',
>                '21' => '1st Helix',
>                '47' => '40 - Mutation',
>                '60' => 'Voltage Sensor',
>                '72' => '72 - Mutation 2',
>                '73' => '73 - Mutation 3',
>                '138' => '138 - Glycosylation Site',
>                '170' => '170 - Phosphorylation Site',
>                '200' => 'Last Helix');
>
> my $im = DrawTransmembrane->draw_transmembrane(-title => 'This is a cartoon displaying transmembrane helices.',
>                                                 -topology => \@topology,
>                                                 -n_terminal => 'out',
>                                                 -helix_width => 48,
>                                                 -helix_height => 125,
>                                                 -short_loop_limit => 10,
>                                                 -long_loop_limit => 35,
>                                                 -loop_width => 25,
>                                                 -colour_scheme => 'yellow',
>                                                 -labels => \%labels,
>                                                 -text_offset => -10);
>
> ## print the .png file
> my $output = 'test.png';
> open(OUTPUT, ">$output");
> binmode OUTPUT;
> print OUTPUT $im->png;
> close OUTPUT;
>
> my $system = `display $output`;
>



From cjm at fruitfly.org  Mon Feb 19 17:23:56 2007
From: cjm at fruitfly.org (Chris Mungall)
Date: Mon, 19 Feb 2007 14:23:56 -0800
Subject: [Bioperl-l] t/FeatureHolder.x
In-Reply-To: <534215C5-11BB-48FA-B6BF-D8B3E2BFE20A@uiuc.edu>
References: <45D9E17F.4030302@sendu.me.uk>
	<534215C5-11BB-48FA-B6BF-D8B3E2BFE20A@uiuc.edu>
Message-ID: <F54A07D7-B546-46E5-AFB2-54251ACE9182@fruitfly.org>


On Feb 19, 2007, at 12:04 PM, Chris Fields wrote:

> Looks like that's some of Chris Mungall's stuff for GFF3.  Don't know
> if he's stalking the mail list.

occasionally..

> Wonder if this has anything to do with the feature/annotation changes
> around rel 1.5.

possibly even before then.

there was a reason for the .x extension... I think it was intended to
denote requirements: tests that don't pass yet but should in the future

anyway, this file can go

> (the other) chris
>
> On Feb 19, 2007, at 11:42 AM, Sendu Bala wrote:
>
>> Is this supposed to work? It doesn't get run in the test suite normally
>> because of its name.
>>
>> With a live checkout I get:
>> ./Build test --test_files t/FeatureHolder.x --verbose
>> t/FeatureHolder....1..6
> ...
>


From torsten.seemann at infotech.monash.edu.au  Mon Feb 19 18:20:48 2007
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Tue, 20 Feb 2007 10:20:48 +1100
Subject: [Bioperl-l] Bioperl Module OddCodes(help)
In-Reply-To: <29037001.1171716908969.JavaMail.ngmail@webmail18>
References: <29037001.1171716908969.JavaMail.ngmail@webmail18>
Message-ID: <a79f6a4b0702191520l55625d6dif027df04b9841587@mail.gmail.com>

> I want to translate a sequence in FASTA format into just three classes (acidic, basic and polar), depending on the pH.
> The OddCodes module can only translate to acidic, basic, polar and hydrophobic, and I think only at the default pH.
> Can somebody help me? I don't know whether it is possible,
> because I need each amino acid classed as positively charged, negatively charged or uncharged.

The latest released Bioperl 1.5.x has a charge() function which does
what you want:

http://doc.bioperl.org/releases/bioperl-1.5.2/Bio/Tools/OddCodes.html

It returns a three-letter code: A (negative charge), C (positive charge) and N (neutral).
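For reference, here is a minimal sketch of calling charge(). It assumes BioPerl 1.5.x is installed, and the protein sequence and its ID are made up. The description of the return value follows my reading of the OddCodes documentation: each method returns a reference to the recoded string. As far as I can tell, charge() takes no pH argument, so it applies one fixed classification rather than a pH-specific one.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::PrimarySeq;
use Bio::Tools::OddCodes;

# A made-up protein sequence, for illustration only.
my $protein = Bio::PrimarySeq->new(
    -seq      => 'MKDERHAGLS',
    -id       => 'example_protein',
    -alphabet => 'protein',
);

my $oddcode = Bio::Tools::OddCodes->new(-seq => $protein);

# charge() returns a reference to a string in the three-letter
# alphabet A (negative), C (positive) and N (neutral).
my $charge_ref = $oddcode->charge();
print ${$charge_ref}, "\n";
```

The other OddCodes methods (chemical(), functional(), hydrophobic(), ...) are called the same way. If a genuinely pH-dependent classification is needed, it would have to be written separately.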

--Torsten


From bix at sendu.me.uk  Tue Feb 20 06:18:14 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 20 Feb 2007 11:18:14 +0000
Subject: [Bioperl-l] Bio::Graphics::FeatureBase seq_id question
Message-ID: <45DAD8F6.1030409@sendu.me.uk>

Bio::Graphics::FeatureBase::seq_id is currently implemented as a 
read-only alias to ref():
sub seq_id          { shift->ref()         }


What is the reasoning behind this? Can it be made to handle setting of 
the value as well?:
sub seq_id          { shift->ref(@_)       }
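Here is a minimal plain-Perl sketch of why the two one-liners behave differently. Toy::Feature and seq_id_readonly are hypothetical names used only for illustration. The ref() getter/setter shown is a simple stand-in, not the actual Bio::Graphics::FeatureBase code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical class illustrating the two seq_id() styles.
package Toy::Feature;

sub new {
    my ($class, $ref_value) = @_;
    return bless { ref => $ref_value }, $class;
}

# Combined getter/setter: store a new value if one is passed.
sub ref {
    my $self = shift;
    $self->{ref} = shift if @_;
    return $self->{ref};
}

# Read-only alias: any argument is silently discarded.
sub seq_id_readonly { shift->ref() }

# Pass-through alias: forwards the caller's arguments, so setting works.
sub seq_id { shift->ref(@_) }

package main;

my $f = Toy::Feature->new('chr1');
$f->seq_id_readonly('chr2');   # ignored: ref() receives no arguments
print $f->seq_id, "\n";        # prints "chr1"
$f->seq_id('chr3');            # forwarded as ref('chr3')
print $f->seq_id, "\n";        # prints "chr3"
```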


Cheers,
Sendu.


From cjfields at uiuc.edu  Tue Feb 20 08:39:11 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 20 Feb 2007 07:39:11 -0600
Subject: [Bioperl-l] t/FeatureHolder.x
In-Reply-To: <F54A07D7-B546-46E5-AFB2-54251ACE9182@fruitfly.org>
References: <45D9E17F.4030302@sendu.me.uk>
	<534215C5-11BB-48FA-B6BF-D8B3E2BFE20A@uiuc.edu>
	<F54A07D7-B546-46E5-AFB2-54251ACE9182@fruitfly.org>
Message-ID: <67E26F10-67D5-405E-A00E-826EF51C476F@uiuc.edu>


On Feb 19, 2007, at 4:23 PM, Chris Mungall wrote:

> On Feb 19, 2007, at 12:04 PM, Chris Fields wrote:
>
>> Looks like that's some of Chris Mungall's stuff for GFF3.  Don't know
>> if he's stalking the mail list.
>
> occasionally..
>
>> Wonder if this has anything to do with the feature/annotation changes
>> around rel 1.5.
>
> possibly even before then.
>
> there was a reason for the .x suffix... I think it was intended to
> denote requirements: tests that don't pass yet but should in the
> future
>
> anyway, this file can go

Chris,

I removed it from CVS.  Thanks!

(the other) chris besides chris D.

P.S. I may have some Data::Stag questions for you at some point.  I'm  
guessing you're still at fruitfly.org?


From cjfields at uiuc.edu  Tue Feb 20 08:29:20 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 20 Feb 2007 07:29:20 -0600
Subject: [Bioperl-l] Fwd: help on remote blast
References: <20070220073200.M42567@bic.boseinst.ernet.in>
Message-ID: <6CC54E14-0581-45AF-8F12-E500A2FFDE86@uiuc.edu>

Sanjib,

You shouldn't email the developers directly.  Questions like this  
should go to the bioperl mail list in case I (or others) can't answer  
them immediately.

chris

Begin forwarded message:

> From: "Sanjib Kumar Gupta" <sanjib at bic.boseinst.ernet.in>
> Date: February 20, 2007 1:32:00 AM CST
> To: cjfields at uiuc.edu
> Subject: help on remote blast
>
> Dear Dr. Chris
> I am a very new user of BioPerl and have been using the script for
> retrieving some BLAST sequences. But suddenly it has stopped
> retrieving:
> #perl n9.pl
> te.pep
> waiting........
> for a long time
>
> I am attaching the file. Can you please tell me what I should do so
> that it runs again?
>
>
> --
> Sanjib Kumar Gupta
> Bioinformatics Centre
> Bose Institute
> Kolkata 700054, INDIA
> Phone  : +91-33-2355 6626, 2816, 2355 4766
> Fax    : +91-33-2355 3886
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: n9.pl
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070220/02f96eab/attachment-0003.pl>
-------------- next part --------------

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From t.nugent at cs.ucl.ac.uk  Tue Feb 20 09:31:20 2007
From: t.nugent at cs.ucl.ac.uk (Tim Nugent)
Date: Tue, 20 Feb 2007 14:31:20 +0000
Subject: [Bioperl-l] Module to draw transmembrane protein topology
In-Reply-To: <29861210-1C15-4CFD-BA6F-1FF8EAAF8A62@uiuc.edu>
References: <45D9DE8C.2010301@cs.ucl.ac.uk>
	<29861210-1C15-4CFD-BA6F-1FF8EAAF8A62@uiuc.edu>
Message-ID: <45DB0638.1030001@cs.ucl.ac.uk>

Thanks Chris, glad it's appreciated.

Is there anything else I can do? If anyone has any requests/suggestions 
please let me know too.

Best wishes,

Tim

Chris Fields wrote:
> I think this is pretty nice!  We can add the code and test script to  
> bugzilla and (if someone has time) try to see where it might fit in,  
> though Bio::Graphics sounds like a good spot.
> 
> Anyone else have ideas on where this could go?
> 
> chris
> 
> On Feb 19, 2007, at 11:29 AM, Tim Nugent wrote:
> 
>> Hi everyone,
>>
>> I've written a perl module to display transmembrane protein topology
>> using GD. There are various options, including labels, helix/loop
>> dimensions, colour schemes, etc., but it only requires a string or array
>> containing the protein topology (e.g. transmembrane helix start/stop
>> points). It produces output like this:
>>
>> http://www.cs.ucl.ac.uk/staff/T.Nugent/images/blue.png
>> http://www.cs.ucl.ac.uk/staff/T.Nugent/images/yellow.png
>>
>> using the code at the bottom.
>>
>> Here is the module:
>> http://www.cs.ucl.ac.uk/staff/T.Nugent/source/DrawTransmembrane.pm
>>
>> I've never submitted anything to Bioperl before - is this sort of thing
>> likely to be of use to others? I imagine it would sit alongside some of
>> the Bio::Graphics stuff.
>>
>> Best wishes,
>>
>> Tim
>>
>> #!/usr/bin/perl
>>
>> use strict;
>> use warnings;
>> use lib '/scratch0/NOT_BACKED_UP/tnugent/perl5lib'; ## path to module
>> use DrawTransmembrane;
>>
>> my @topology = (20,45,59,70,86,109,145,168,194,220);
>>
>> my %labels = ('5' => '5 - Sulphation Site',
>>                '21' => '1st Helix',
>>                '47' => '40 - Mutation',
>>                '60' => 'Voltage Sensor',
>>                '72' => '72 - Mutation 2',
>>                '73' => '73 - Mutation 3',
>>                '138' => '138 - Glycosylation Site',
>>                '170' => '170 - Phosphorylation Site',
>>                '200' => 'Last Helix');
>>
>> my $im = DrawTransmembrane->draw_transmembrane(-title => 'This is a cartoon displaying transmembrane helices.',
>>                                                 -topology => \@topology,
>>                                                 -n_terminal => 'out',
>>                                                 -helix_width => 48,
>>                                                 -helix_height => 125,
>>                                                 -short_loop_limit => 10,
>>                                                 -long_loop_limit => 35,
>>                                                 -loop_width => 25,
>>                                                 -colour_scheme => 'yellow',
>>                                                 -labels => \%labels,
>>                                                 -text_offset => -10);
>>
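>> ## print the .png file
>> my $output = 'test.png';
>> open(my $out_fh, '>', $output) or die "Cannot write $output: $!";
>> binmode $out_fh;
>> print {$out_fh} $im->png;
>> close $out_fh or die "Cannot close $output: $!";
>>
>> system('display', $output);  ## view the image with ImageMagick's display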
>>
>> -- 
>> Tim Nugent (MRes)
>> Research Student
>> Bioinformatics Unit
>> Department of Computer Science
>> University College London
>> Gower Street
>> London WC1E 6BT
>> Tel: 020-7679-0410
>> t.nugent at ucl.ac.uk
> 
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
> 
> 
> 

-- 
Tim Nugent (MRes)
Research Student
Bioinformatics Unit
Department of Computer Science
University College London
Gower Street
London WC1E 6BT
Tel: 020-7679-0410
t.nugent at ucl.ac.uk


From marian.thieme at lycos.de  Tue Feb 20 08:34:24 2007
From: marian.thieme at lycos.de (marian thieme)
Date: Tue, 20 Feb 2007 13:34:24 +0000
Subject: [Bioperl-l] Alignment
Message-ID: <188661178021328@lycos-europe.com>

Hi all,

perhaps somebody can give some comments in the following matter:

I have a series of sequences which should be aligned against a reference sequence.
In this special case we don't need to calculate anything; we only need to represent the sequences and get, for instance, some columns of interest.
The problem is that some sequences have gaps, and we need to represent gaps in the reference sequence as well as in some individual sequences.

Question: Can I use LocatableSeq to describe sequences with gaps and add them to the alignment?
If yes, how should I understand the example in the docs:
use Bio::LocatableSeq;
my $seq = Bio::LocatableSeq->new(-seq => "CAGT-GGT", -id => "seq1", -start => 1, -end => 7);

Does the "-" sign represent a gap? If this sequence starts at position 1,
why does it end at position 7, when, counting the gap, there are 8 positions?
Can the SimpleAlign object handle the gap?


Thanks for your attention,
Marian


From cjfields at uiuc.edu  Tue Feb 20 09:40:38 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 20 Feb 2007 08:40:38 -0600
Subject: [Bioperl-l] Module to draw transmembrane protein topology
In-Reply-To: <45DB0638.1030001@cs.ucl.ac.uk>
References: <45D9DE8C.2010301@cs.ucl.ac.uk>
	<29861210-1C15-4CFD-BA6F-1FF8EAAF8A62@uiuc.edu>
	<45DB0638.1030001@cs.ucl.ac.uk>
Message-ID: <E1D718F1-E0FA-496B-9798-7EC84E2D4439@uiuc.edu>

You can add the module and test code (the script) to bugzilla:

http://www.bioperl.org/wiki/Bugs
http://bugzilla.open-bio.org/

Basically, file a new bug report, but note that it is an enhancement
request when filling it out.  Attach the code and test script to the
report after it is generated (note that it may be easier to add all
of the files together as a zipped archive).  I think you could also
add the graphical output as a binary file if they are huge files.

chris

On Feb 20, 2007, at 8:31 AM, Tim Nugent wrote:

> Thanks Chris, glad it's appreciated.
>
> Is there anything else I can do? If anyone has any requests/suggestions
> please let me know too.
>
> Best wishes,
>
> Tim
>
> Chris Fields wrote:
>> I think this is pretty nice!  We can add the code and test script  
>> to  bugzilla and (if someone has time) try to see where it might  
>> fit in,  though Bio::Graphics sounds like a good spot.
>> Anyone else have ideas on where this could go?
>> chris
>> On Feb 19, 2007, at 11:29 AM, Tim Nugent wrote:
>>> Hi everyone,
>>>
>>> I've written a perl module to display transmembrane protein topology
>>> using GD. There are various options, including labels, helix/loop
>>> dimensions, colour schemes, etc., but it only requires a string or array
>>> containing the protein topology (e.g. transmembrane helix start/stop
>>> points). It produces output like this:
>>>
>>> http://www.cs.ucl.ac.uk/staff/T.Nugent/images/blue.png
>>> http://www.cs.ucl.ac.uk/staff/T.Nugent/images/yellow.png
>>>
>>> using the code at the bottom.
>>>
>>> Here is the module:
>>> http://www.cs.ucl.ac.uk/staff/T.Nugent/source/DrawTransmembrane.pm
>>>
>>> I've never submitted anything to Bioperl before - is this sort of thing
>>> likely to be of use to others? I imagine it would sit alongside some of
>>> the Bio::Graphics stuff.
>>>
>>> Best wishes,
>>>
>>> Tim
>>>
>>> #!/usr/bin/perl
>>>
>>> use strict;
>>> use warnings;
>>> use lib '/scratch0/NOT_BACKED_UP/tnugent/perl5lib'; ## path to module
>>> use DrawTransmembrane;
>>>
>>> my @topology = (20,45,59,70,86,109,145,168,194,220);
>>>
>>> my %labels = ('5' => '5 - Sulphation Site',
>>>                '21' => '1st Helix',
>>>                '47' => '40 - Mutation',
>>>                '60' => 'Voltage Sensor',
>>>                '72' => '72 - Mutation 2',
>>>                '73' => '73 - Mutation 3',
>>>                '138' => '138 - Glycosylation Site',
>>>                '170' => '170 - Phosphorylation Site',
>>>                '200' => 'Last Helix');
>>>
>>> my $im = DrawTransmembrane->draw_transmembrane(-title => 'This is a cartoon displaying transmembrane helices.',
>>>                                                 -topology => \@topology,
>>>                                                 -n_terminal => 'out',
>>>                                                 -helix_width => 48,
>>>                                                 -helix_height => 125,
>>>                                                 -short_loop_limit => 10,
>>>                                                 -long_loop_limit => 35,
>>>                                                 -loop_width => 25,
>>>                                                 -colour_scheme => 'yellow',
>>>                                                 -labels => \%labels,
>>>                                                 -text_offset => -10);
>>>
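>>> ## print the .png file
>>> my $output = 'test.png';
>>> open(my $out_fh, '>', $output) or die "Cannot write $output: $!";
>>> binmode $out_fh;
>>> print {$out_fh} $im->png;
>>> close $out_fh or die "Cannot close $output: $!";
>>>
>>> system('display', $output);  ## view the image with ImageMagick's display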
>>>
>>> -- 
>>> Tim Nugent (MRes)
>>> Research Student
>>> Bioinformatics Unit
>>> Department of Computer Science
>>> University College London
>>> Gower Street
>>> London WC1E 6BT
>>> Tel: 020-7679-0410
>>> t.nugent at ucl.ac.uk
>> Christopher Fields
>> Postdoctoral Researcher
>> Lab of Dr. Robert Switzer
>> Dept of Biochemistry
>> University of Illinois Urbana-Champaign
>
> -- 
> Tim Nugent (MRes)
> Research Student
> Bioinformatics Unit
> Department of Computer Science
> University College London
> Gower Street
> London WC1E 6BT
> Tel: 020-7679-0410
> t.nugent at ucl.ac.uk

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From avilella at gmail.com  Tue Feb 20 10:30:17 2007
From: avilella at gmail.com (Albert Vilella)
Date: Tue, 20 Feb 2007 15:30:17 +0000
Subject: [Bioperl-l] Alignment
In-Reply-To: <188661178021328@lycos-europe.com>
References: <188661178021328@lycos-europe.com>
Message-ID: <358f4d650702200730x5fd09bb8p7dec8ce740553378@mail.gmail.com>

I think the SimpleAlign object contains a set of sequences, each of
which is a LocatableSeq object.

These LocatableSeq objects will have gaps, represented by '-' or
whatever other symbol is specified (I think there are methods for it),
and then one can use methods like column_from_residue_number to map
the coordinates between the primary sequence and the aligned sequence.
The perldoc for LocatableSeq has some examples on how to use these
methods.

[Hopefully I haven't written any lies in this message],

Cheers,

    Albert.

On 2/20/07, marian thieme <marian.thieme at lycos.de> wrote:
> Hi all,
>
> perhaps somebody can give some comments in the following matter:
>
> I have a series of sequences which should be aligned against a reference sequence.
> In this special case we don't need to calculate anything; we only need to represent the sequences and get, for instance, some columns of interest.
> The problem is that some sequences have gaps, and we need to represent gaps in the reference sequence as well as in some individual sequences.
>
> Question: Can I use LocatableSeq to describe sequences with gaps and add them to the alignment?
> If yes, how should I understand the example in the docs:
> use Bio::LocatableSeq;
> my $seq = Bio::LocatableSeq->new(-seq => "CAGT-GGT", -id => "seq1", -start => 1, -end => 7);
>
> Does the "-" sign represent a gap? If this sequence starts at position 1,
> why does it end at position 7, when, counting the gap, there are 8 positions?
> Can the SimpleAlign object handle the gap?
>
>
> Thanks for your attention,
> Marian
>
>


From cjfields at uiuc.edu  Tue Feb 20 10:30:15 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 20 Feb 2007 09:30:15 -0600
Subject: [Bioperl-l] Module to draw transmembrane protein topology
In-Reply-To: <45DB0638.1030001@cs.ucl.ac.uk>
References: <45D9DE8C.2010301@cs.ucl.ac.uk>
	<29861210-1C15-4CFD-BA6F-1FF8EAAF8A62@uiuc.edu>
	<45DB0638.1030001@cs.ucl.ac.uk>
Message-ID: <4BC43404-FFB2-4350-81DF-812E31AD7CB3@uiuc.edu>

Sorry, I sent that last one off prematurely.

I could see this being used as a very useful utility if a Bioperl  
object had SeqFeatures which described transmembrane regions, or if  
output from something like TMHMM were parsed and used for input.   
Don't know if it's included, but if not you probably should allow  
labeling of the intracellular/extracellular space to designate  
periplasmic space, mitochondrial matrix, thylakoid, etc.

I think Bio::Graphics namespace is definitely the place to go.  If I  
ever get around to writing up the RNA structural stuff I may put  
something there myself.

chris

On Feb 20, 2007, at 8:31 AM, Tim Nugent wrote:

> Thanks Chris, glad it's appreciated.
>
> Is there anything else I can do? If anyone has any requests/suggestions
> please let me know too.
>
> Best wishes,
>
> Tim


From cjfields at uiuc.edu  Tue Feb 20 10:49:56 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 20 Feb 2007 09:49:56 -0600
Subject: [Bioperl-l] Alignment
In-Reply-To: <358f4d650702200730x5fd09bb8p7dec8ce740553378@mail.gmail.com>
References: <188661178021328@lycos-europe.com>
	<358f4d650702200730x5fd09bb8p7dec8ce740553378@mail.gmail.com>
Message-ID: <97E36074-1CF4-4348-85AB-DF23F1048727@uiuc.edu>


On Feb 20, 2007, at 9:30 AM, Albert Vilella wrote:

> I think the SimpleAlign object contains a set of sequences, each of
> which is a LocatableSeq object.
>
> These LocatableSeq objects will have gaps, represented by '-' or
> whatever other symbol is specified (I think there are methods for it),
> and then one can use methods like column_from_residue_number to map
> the coordinates between the primary sequence and the aligned sequence.
> The perldoc for LocatableSeq has some examples on how to use these
> methods.
>
> [Hopefully I haven't written any lies in this message],
>
> Cheers,
>
>     Albert.

No lies.  The comparison methods are in SimpleAlign; if you look in
SimpleAlign.t you'll see several demos of how to go about adding
LocatableSeqs to a SimpleAlign object and then using SimpleAlign
methods on them.
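Here is a minimal sketch of building an alignment from gapped LocatableSeqs and mapping coordinates. It assumes a BioPerl 1.5.x installation, and the second sequence and both IDs are made up. column_from_residue_number() and slice() are SimpleAlign methods. -start and -end count residues only, which is why 'CAGT-GGT' spans residues 1 to 7 across 8 alignment columns.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::LocatableSeq;
use Bio::SimpleAlign;

# '-' marks a gap; -start/-end count residues only, so the
# 8-column string 'CAGT-GGT' holds residues 1..7.
my $ref = Bio::LocatableSeq->new(-seq => 'CAGT-GGT', -id => 'ref',
                                 -start => 1, -end => 7);
# A second, made-up sequence with no gap: 8 residues.
my $other = Bio::LocatableSeq->new(-seq => 'CAGTAGGT', -id => 'seq2',
                                   -start => 1, -end => 8);

my $aln = Bio::SimpleAlign->new();
$aln->add_seq($ref);
$aln->add_seq($other);

# Map residue 5 of 'ref' (the G right after the gap) to its column.
my $col = $aln->column_from_residue_number('ref', 5);
print "residue 5 of ref is in column $col\n";

# Extract alignment columns 4..6 as a new SimpleAlign.
my $sub = $aln->slice(4, 6);
for my $seq ($sub->each_seq) {
    print $seq->id, "\t", $seq->seq, "\n";
}
```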

chris

PS (to marian): I'm a bit behind this week, so the bracket_strings  
stuff is lagging behind; I'm writing up some stuff on a deadline.


From t.nugent at cs.ucl.ac.uk  Tue Feb 20 10:50:10 2007
From: t.nugent at cs.ucl.ac.uk (Tim Nugent)
Date: Tue, 20 Feb 2007 15:50:10 +0000
Subject: [Bioperl-l] Module to draw transmembrane protein topology
In-Reply-To: <4BC43404-FFB2-4350-81DF-812E31AD7CB3@uiuc.edu>
References: <45D9DE8C.2010301@cs.ucl.ac.uk>
	<29861210-1C15-4CFD-BA6F-1FF8EAAF8A62@uiuc.edu>
	<45DB0638.1030001@cs.ucl.ac.uk>
	<4BC43404-FFB2-4350-81DF-812E31AD7CB3@uiuc.edu>
Message-ID: <45DB18B2.8070004@cs.ucl.ac.uk>

Labeling of inside/outside and membrane is already possible via the
-inside_label, -outside_label and -membrane_label tags; the defaults are
intracellular, extracellular and plasma membrane.

I was definitely going to add an input parser for MEMSAT, developed here
at UCL, and probably for a few other popular TM predictors too, e.g.
PHOBIUS, TMHMM, etc. The module can already accept topology in the string
format used by OPM (http://opm.phar.umich.edu/).

Tim


Chris Fields wrote:
> Sorry, I sent that last one off prematurely.
> 
> I could see this being used as a very useful utility if a Bioperl object 
> had SeqFeatures which described transmembrane regions, or if output from 
> something like TMHMM were parsed and used for input.  Don't know if it's 
> included, but if not you probably should allow labeling of the 
> intracellular/extracellular space to designate periplasmic space, 
> mitochondrial matrix, thylakoid, etc.
> 
> I think Bio::Graphics namespace is definitely the place to go.  If I 
> ever get around to writing up the RNA structural stuff I may put 
> something there myself.
> 
> chris
> 
> On Feb 20, 2007, at 8:31 AM, Tim Nugent wrote:
> 
>> Thanks Chris, glad it's appreciated.
>>
>> Is there anything else I can do? If anyone has any requests/suggestions
>> please let me know too.
>>
>> Best wishes,
>>
>> Tim
> 
> 

-- 
Tim Nugent (MRes)
Research Student
Bioinformatics Unit
Department of Computer Science
University College London
Gower Street
London WC1E 6BT
Tel: 020-7679-0410
t.nugent at ucl.ac.uk


From cjfields at uiuc.edu  Tue Feb 20 11:09:00 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 20 Feb 2007 10:09:00 -0600
Subject: [Bioperl-l] Module to draw transmembrane protein topology
In-Reply-To: <45DB18B2.8070004@cs.ucl.ac.uk>
References: <45D9DE8C.2010301@cs.ucl.ac.uk>
	<29861210-1C15-4CFD-BA6F-1FF8EAAF8A62@uiuc.edu>
	<45DB0638.1030001@cs.ucl.ac.uk>
	<4BC43404-FFB2-4350-81DF-812E31AD7CB3@uiuc.edu>
	<45DB18B2.8070004@cs.ucl.ac.uk>
Message-ID: <FF7B4076-FA5A-4F44-ADE7-A44D2FCF4599@uiuc.edu>


On Feb 20, 2007, at 9:50 AM, Tim Nugent wrote:

> Labeling of inside/outside and membrane is already possible via the
> -inside_label, -outside_label and -membrane_label tags; the defaults are
> intracellular, extracellular and plasma membrane.
>
> I was definitely going to add an input parser for MEMSAT, developed
> here at UCL, and probably for a few other popular TM predictors too,
> e.g. PHOBIUS, TMHMM, etc. The module can already accept topology in the
> string format used by OPM (http://opm.phar.umich.edu/).
>
> Tim

I'll definitely have to take a closer look at it when I have time.
My guess is that the best fit for the data would be SeqFeatures, either in a
collection or attached to a Bio::Seq.  As for the parsers, you can look at the
Bio::Tools::Tmhmm module, which scans TMHMM output and converts
everything to SeqFeatures.
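Here is a minimal sketch of the SeqFeature route, assuming BioPerl is installed. Helices stored as Bio::SeqFeature::Generic objects on a Bio::Seq are flattened into the start/stop list that the DrawTransmembrane example earlier in this thread takes as -topology. The 'transmembrane' primary tag, the sequence and the coordinates are assumptions for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::Seq;
use Bio::SeqFeature::Generic;

# A made-up 230-residue protein carrying two predicted helices as
# features; the 'transmembrane' primary tag is an assumption.
my $seq = Bio::Seq->new(-seq => 'A' x 230, -id => 'example',
                        -alphabet => 'protein');
for my $helix ([59, 70], [20, 45]) {
    $seq->add_SeqFeature(Bio::SeqFeature::Generic->new(
        -start       => $helix->[0],
        -end         => $helix->[1],
        -primary_tag => 'transmembrane',
    ));
}

# Keep the helix features, sort them by start, and flatten them into
# the (start, stop, start, stop, ...) list used as -topology.
my @topology = map  { ($_->start, $_->end) }
               sort { $a->start <=> $b->start }
               grep { $_->primary_tag eq 'transmembrane' }
               $seq->get_SeqFeatures;

print "@topology\n";   # prints "20 45 59 70"
```

Sorting first means the features can be added in any order, which matters if they come from a parser rather than being built by hand.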

chris


From lstein at cshl.edu  Tue Feb 20 12:25:24 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Tue, 20 Feb 2007 12:25:24 -0500
Subject: [Bioperl-l] Bio::Graphics::FeatureBase seq_id question
In-Reply-To: <45DAD8F6.1030409@sendu.me.uk>
References: <45DAD8F6.1030409@sendu.me.uk>
Message-ID: <6dce9a0b0702200925g74d2db53j3252cca8a41765b@mail.gmail.com>

Just an oversight. I'll fix it.

Lincoln

On 2/20/07, Sendu Bala <bix at sendu.me.uk> wrote:
>
> Bio::Graphics::FeatureBase::seq_id is currently implemented as a
> read-only alias to ref():
> sub seq_id          { shift->ref()         }
>
>
> What is the reasoning behind this? Can it be made to handle setting of
> the value as well?:
> sub seq_id          { shift->ref(@_)       }
>
>
> Cheers,
> Sendu.
>


-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu


From khan at cshl.edu  Tue Feb 20 15:42:12 2007
From: khan at cshl.edu (Khan, Sohail)
Date: Tue, 20 Feb 2007 15:42:12 -0500
Subject: [Bioperl-l] parsing a list of ids to a fasta file.
Message-ID: <C8696843AE995F4EA4CDC3E2B83482A9018791C1@mailbox02.cshl.edu>

Dear list,

I am new to BioPerl.  I have the following question:
I have a list of IDs that I would like to look up in a large FASTA file, to retrieve the sequence for each ID.  I would appreciate any suggestions.
Thanks.

Khan


From michael.watson at bbsrc.ac.uk  Tue Feb 20 16:33:19 2007
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Tue, 20 Feb 2007 21:33:19 -0000
Subject: [Bioperl-l] parsing a list of ids to a fasta file.
References: <C8696843AE995F4EA4CDC3E2B83482A9018791C1@mailbox02.cshl.edu>
Message-ID: <8975119BCD0AC5419D61A9CF1A923E95020680FD@iahce2ksrv1.iah.bbsrc.ac.uk>

Suggest you use Bio::Index::Fasta to create an index for the fasta file and then a simple script to retrieve sequences using that index.  Or just use the pre-written bp_index.PLS and bp_fetch.PLS scripts.
 
http://www.bioperl.org/wiki/Module:Bio::Index::Fasta
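Here is a minimal sketch of the Bio::Index::Fasta approach, assuming BioPerl is installed. fetch_by_ids() is a hypothetical helper name. The script expects the FASTA file and a file with one ID per line. By default the index is keyed on the first word after '>', so the IDs must match that word exactly.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::Index::Fasta;
use Bio::SeqIO;

# Hypothetical helper: index $fasta_file into $index_file, then fetch
# each requested ID (as a Bio::Seq), warning about any that are missing.
sub fetch_by_ids {
    my ($fasta_file, $index_file, @ids) = @_;
    my $inx = Bio::Index::Fasta->new(-filename   => $index_file,
                                     -write_flag => 1);
    $inx->make_index($fasta_file);
    my @found;
    for my $id (@ids) {
        my $seq = $inx->fetch($id);   # undef if the ID is not indexed
        if ($seq) { push @found, $seq }
        else      { warn "No sequence found for '$id'\n" }
    }
    return @found;
}

# Usage: perl fetch_ids.pl big.fasta id_list.txt > subset.fasta
if (@ARGV == 2) {
    my ($fasta, $id_list) = @ARGV;
    open(my $ids_fh, '<', $id_list) or die "Cannot read $id_list: $!";
    chomp(my @ids = grep { /\S/ } <$ids_fh>);   # skip blank lines
    close $ids_fh;
    my $out = Bio::SeqIO->new(-fh => \*STDOUT, -format => 'fasta');
    $out->write_seq($_) for fetch_by_ids($fasta, "$fasta.idx", @ids);
}
```

The index only needs to be built once. Later lookups against the same large file could open the existing index without -write_flag and skip make_index().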

________________________________

From: bioperl-l-bounces at lists.open-bio.org on behalf of Khan, Sohail
Sent: Tue 20/02/2007 8:42 PM
To: Bioperl-l at lists.open-bio.org
Subject: [Bioperl-l] parsing a list of ids to a fasta file.


Dear list,

I am new to BioPerl.  I have the following question:
I have a list of IDs that I would like to look up in a large FASTA file, to retrieve the sequence for each ID.  I would appreciate any suggestions.
Thanks.

Khan




From neetisomaiya at gmail.com  Wed Feb 21 03:19:14 2007
From: neetisomaiya at gmail.com (neeti somaiya)
Date: Wed, 21 Feb 2007 13:49:14 +0530
Subject: [Bioperl-l] need help in Bio-SCF
Message-ID: <764978cf0702210019ncfa2b17l8d57088eb5ae278a@mail.gmail.com>

Hi All,

I downloaded the module
Bio-SCF-1.01 (http://search.cpan.org/%7Elds/Bio-SCF-1.01/) from CPAN,
and when I tried to install it I got the following error. Can someone
please guide me?

[root at ps2288 Bio-SCF-1.01]# perl Makefile.PL
Checking if your kit is complete...
Looks good
Note (probably harmless): No library found for -lread
Writing Makefile for Bio::SCF

[root at ps2288 Bio-SCF-1.01]# make
cp SCF.pm blib/lib/Bio/SCF.pm
cp SCF/Arrays.pm blib/lib/Bio/SCF/Arrays.pm
/usr/bin/perl /usr/lib/perl5/5.8.5/ExtUtils/xsubpp  -typemap
/usr/lib/perl5/5.8.5/ExtUtils/typemap  SCF.xs > SCF.xsc && mv SCF.xsc SCF.c
Please specify prototyping behavior for SCF.xs (see perlxs manual)
gcc -c   -D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBUGGING
-fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE
-D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm -O2 -g -pipe -m32 -march=i386
-mtune=pentium4   -DVERSION=\"1.01\" -DXS_VERSION=\"1.01\" -fPIC
"-I/usr/lib/perl5/5.8.5/i386-linux-thread-multi/CORE"  -DLITTLE_ENDIAN SCF.c
SCF.xs:12:24: io_lib/scf.h: No such file or directory
SCF.xs:13:26: io_lib/mFILE.h: No such file or directory
SCF.xs: In function `XS_Bio__SCF_get_scf_pointer':
SCF.xs:27: error: `Scf' undeclared (first use in this function)
SCF.xs:27: error: (Each undeclared identifier is reported only once
SCF.xs:27: error: for each function it appears in.)
SCF.xs:27: error: `scf_data' undeclared (first use in this function)
SCF.xs: In function `XS_Bio__SCF_get_scf_fpointer':
SCF.xs:66: error: `Scf' undeclared (first use in this function)
SCF.xs:66: error: `scf_data' undeclared (first use in this function)
SCF.xs:68: error: `mFILE' undeclared (first use in this function)
SCF.xs:68: error: `mf' undeclared (first use in this function)
SCF.xs: In function `XS_Bio__SCF_scf_free':
SCF.xs:89: error: `Scf' undeclared (first use in this function)
SCF.xs:89: error: syntax error before ')' token
SCF.xs: In function `XS_Bio__SCF_get_comments':
SCF.xs:95: error: `Scf' undeclared (first use in this function)
SCF.xs:95: error: `scf_data' undeclared (first use in this function)
SCF.xs:95: error: syntax error before ')' token
SCF.xs: In function `XS_Bio__SCF_set_comments':
SCF.xs:108: error: `Scf' undeclared (first use in this function)
SCF.xs:108: error: `scf_data' undeclared (first use in this function)
SCF.xs:108: error: syntax error before ')' token
SCF.xs: In function `XS_Bio__SCF_scf_write':
SCF.xs:121: error: `Scf' undeclared (first use in this function)
SCF.xs:121: error: `scf_data' undeclared (first use in this function)
SCF.xs:121: error: syntax error before ')' token
SCF.xs: In function `XS_Bio__SCF_scf_fwrite':
SCF.xs:135: error: `mFILE' undeclared (first use in this function)
SCF.xs:135: error: `mf' undeclared (first use in this function)
SCF.xs:137: error: `Scf' undeclared (first use in this function)
SCF.xs:137: error: `scf_data' undeclared (first use in this function)
SCF.xs:137: error: syntax error before ')' token
SCF.xs: In function `XS_Bio__SCF_get_from_header':
SCF.xs:159: error: `Scf' undeclared (first use in this function)
SCF.xs:159: error: `scf_data' undeclared (first use in this function)
SCF.xs:159: error: syntax error before ')' token
SCF.xs: In function `XS_Bio__SCF_get_at':
SCF.xs:186: error: `Scf' undeclared (first use in this function)
SCF.xs:186: error: `scf_data' undeclared (first use in this function)
SCF.xs:186: error: syntax error before ')' token
SCF.xs: In function `XS_Bio__SCF_set_base_at':
SCF.xs:242: error: `Scf' undeclared (first use in this function)
SCF.xs:242: error: `scf_data' undeclared (first use in this function)
SCF.xs:242: error: syntax error before ')' token
SCF.xs: In function `XS_Bio__SCF_set_at':
SCF.xs:255: error: `Scf' undeclared (first use in this function)
SCF.xs:255: error: `scf_data' undeclared (first use in this function)
SCF.xs:255: error: syntax error before ')' token
make: *** [SCF.o] Error 1


-- 
-Neeti
Even my blood says, B positive


From sdavis2 at mail.nih.gov  Wed Feb 21 06:17:50 2007
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Wed, 21 Feb 2007 06:17:50 -0500
Subject: [Bioperl-l] need help in Bio-SCF
In-Reply-To: <764978cf0702210019ncfa2b17l8d57088eb5ae278a@mail.gmail.com>
References: <764978cf0702210019ncfa2b17l8d57088eb5ae278a@mail.gmail.com>
Message-ID: <200702210617.50616.sdavis2@mail.nih.gov>

On Wednesday 21 February 2007 03:19, neeti somaiya wrote:
> Hi All,
>
> I downloaded module
> Bio-SCF-1.01<http://search.cpan.org/%7Elds/Bio-SCF-1.01/>from CPAN.
> And I am trying to install it when I got the following error. Can someone
> please guide me.

You will probably need to read the INSTALL document.  You need to install a 
couple of libraries first.  Looks like you don't have the staden io-lib 
installed.
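The log points the same way. The first compiler errors report that io_lib/scf.h and io_lib/mFILE.h cannot be found, and the later "Scf undeclared" and "mFILE undeclared" errors follow from those missing headers. The earlier "No library found for -lread" note probably refers to io_lib's library as well. Below is a minimal Perl sketch that reports whether those headers are visible before you re-run make. It assumes io_lib was installed under /usr or /usr/local. That directory list is an assumption, not something taken from the Bio-SCF INSTALL file.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Headers that SCF.xs includes, taken from the compiler errors above.
my @headers = ('io_lib/scf.h', 'io_lib/mFILE.h');

# Assumed candidate include directories; adjust to your io_lib prefix.
my @include_dirs = ('/usr/include', '/usr/local/include');

# Return the first directory in @$dirs that contains $header, or undef.
sub find_header {
    my ($header, $dirs) = @_;
    for my $dir (@$dirs) {
        return $dir if -f File::Spec->catfile($dir, $header);
    }
    return;
}

for my $h (@headers) {
    my $dir = find_header($h, \@include_dirs);
    if (defined $dir) {
        print "found $h under $dir\n";
    }
    else {
        print "missing $h: io_lib headers are not in the assumed include dirs\n";
    }
}
```

If the headers only turn up under a non-standard prefix, check the INSTALL document for how to point the build at that location.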


> [root at ps2288 Bio-SCF-1.01]# perl Makefile.PL
> Checking if your kit is complete...
> Looks good
> Note (probably harmless): No library found for -lread
> Writing Makefile for Bio::SCF
>
> [root at ps2288 Bio-SCF-1.01]# make
> cp SCF.pm blib/lib/Bio/SCF.pm
> cp SCF/Arrays.pm blib/lib/Bio/SCF/Arrays.pm
> /usr/bin/perl /usr/lib/perl5/5.8.5/ExtUtils/xsubpp  -typemap
> /usr/lib/perl5/5.8.5/ExtUtils/typemap  SCF.xs > SCF.xsc && mv SCF.xsc SCF.c
> Please specify prototyping behavior for SCF.xs (see perlxs manual)
> gcc -c   -D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBUGGING
> -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE
> -D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm -O2 -g -pipe -m32 -march=i386
> -mtune=pentium4   -DVERSION=\"1.01\" -DXS_VERSION=\"1.01\" -fPIC
> "-I/usr/lib/perl5/5.8.5/i386-linux-thread-multi/CORE"  -DLITTLE_ENDIAN SCF.c
> SCF.xs:12:24: io_lib/scf.h: No such file or directory
> SCF.xs:13:26: io_lib/mFILE.h: No such file or directory
> SCF.xs: In function `XS_Bio__SCF_get_scf_pointer':
> SCF.xs:27: error: `Scf' undeclared (first use in this function)
> SCF.xs:27: error: (Each undeclared identifier is reported only once
> SCF.xs:27: error: for each function it appears in.)
> SCF.xs:27: error: `scf_data' undeclared (first use in this function)
> SCF.xs: In function `XS_Bio__SCF_get_scf_fpointer':
> SCF.xs:66: error: `Scf' undeclared (first use in this function)
> SCF.xs:66: error: `scf_data' undeclared (first use in this function)
> SCF.xs:68: error: `mFILE' undeclared (first use in this function)
> SCF.xs:68: error: `mf' undeclared (first use in this function)
> SCF.xs: In function `XS_Bio__SCF_scf_free':
> SCF.xs:89: error: `Scf' undeclared (first use in this function)
> SCF.xs:89: error: syntax error before ')' token
> SCF.xs: In function `XS_Bio__SCF_get_comments':
> SCF.xs:95: error: `Scf' undeclared (first use in this function)
> SCF.xs:95: error: `scf_data' undeclared (first use in this function)
> SCF.xs:95: error: syntax error before ')' token
> SCF.xs: In function `XS_Bio__SCF_set_comments':
> SCF.xs:108: error: `Scf' undeclared (first use in this function)
> SCF.xs:108: error: `scf_data' undeclared (first use in this function)
> SCF.xs:108: error: syntax error before ')' token
> SCF.xs: In function `XS_Bio__SCF_scf_write':
> SCF.xs:121: error: `Scf' undeclared (first use in this function)
> SCF.xs:121: error: `scf_data' undeclared (first use in this function)
> SCF.xs:121: error: syntax error before ')' token
> SCF.xs: In function `XS_Bio__SCF_scf_fwrite':
> SCF.xs:135: error: `mFILE' undeclared (first use in this function)
> SCF.xs:135: error: `mf' undeclared (first use in this function)
> SCF.xs:137: error: `Scf' undeclared (first use in this function)
> SCF.xs:137: error: `scf_data' undeclared (first use in this function)
> SCF.xs:137: error: syntax error before ')' token
> SCF.xs: In function `XS_Bio__SCF_get_from_header':
> SCF.xs:159: error: `Scf' undeclared (first use in this function)
> SCF.xs:159: error: `scf_data' undeclared (first use in this function)
> SCF.xs:159: error: syntax error before ')' token
> SCF.xs: In function `XS_Bio__SCF_get_at':
> SCF.xs:186: error: `Scf' undeclared (first use in this function)
> SCF.xs:186: error: `scf_data' undeclared (first use in this function)
> SCF.xs:186: error: syntax error before ')' token
> SCF.xs: In function `XS_Bio__SCF_set_base_at':
> SCF.xs:242: error: `Scf' undeclared (first use in this function)
> SCF.xs:242: error: `scf_data' undeclared (first use in this function)
> SCF.xs:242: error: syntax error before ')' token
> SCF.xs: In function `XS_Bio__SCF_set_at':
> SCF.xs:255: error: `Scf' undeclared (first use in this function)
> SCF.xs:255: error: `scf_data' undeclared (first use in this function)
> SCF.xs:255: error: syntax error before ')' token
> make: *** [SCF.o] Error 1


From cjfields at uiuc.edu  Wed Feb 21 07:08:57 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 21 Feb 2007 06:08:57 -0600
Subject: [Bioperl-l] need help in Bio-SCF
In-Reply-To: <200702210617.50616.sdavis2@mail.nih.gov>
References: <764978cf0702210019ncfa2b17l8d57088eb5ae278a@mail.gmail.com>
	<200702210617.50616.sdavis2@mail.nih.gov>
Message-ID: <40C288FE-C74C-4B3F-A835-1A5C563B2B8E@uiuc.edu>


On Feb 21, 2007, at 5:17 AM, Sean Davis wrote:

> On Wednesday 21 February 2007 03:19, neeti somaiya wrote:
>> Hi All,
>>
>> I downloaded module
>> Bio-SCF-1.01<http://search.cpan.org/%7Elds/Bio-SCF-1.01/>from CPAN.
>> And I am trying to install it when I got the following error. Can someone
>> please guide me.
>
> You will probably need to read the INSTALL document.  You need to install a
> couple of libraries first.  Looks like you don't have the staden io-lib
> installed.

Just to note, this module isn't part of BioPerl (I don't even think  
it has a Bioperl interface).  You'll probably need to contact Lincoln  
for details on using this module.

One thing you may run into is errors with the version of io_lib  
installed (a problem I've encountered with bioperl-ext), probably  
from API changes.  If you run into problems with newer versions of  
io_lib you should try downgrading to io_lib 1.8.11 or 1.8.12.


From neetisomaiya at gmail.com  Wed Feb 21 07:25:26 2007
From: neetisomaiya at gmail.com (neeti somaiya)
Date: Wed, 21 Feb 2007 17:55:26 +0530
Subject: [Bioperl-l] need help in Bio-SCF
In-Reply-To: <200702210617.50616.sdavis2@mail.nih.gov>
References: <764978cf0702210019ncfa2b17l8d57088eb5ae278a@mail.gmail.com>
	<200702210617.50616.sdavis2@mail.nih.gov>
Message-ID: <764978cf0702210425j544330bbr12c86a89960dbb66@mail.gmail.com>

Thanks. It resolved my problem.

On 2/21/07, Sean Davis <sdavis2 at mail.nih.gov> wrote:
>
> On Wednesday 21 February 2007 03:19, neeti somaiya wrote:
> > Hi All,
> >
> > I downloaded module
> > Bio-SCF-1.01<http://search.cpan.org/%7Elds/Bio-SCF-1.01/>from CPAN.
> > And I am trying to install it when I got the following error. Can someone
> > please guide me.
>
> You will probably need to read the INSTALL document.  You need to install a
> couple of libraries first.  Looks like you don't have the staden io-lib
> installed.
>
>
> > [root at ps2288 Bio-SCF-1.01]# perl Makefile.PL
> > Checking if your kit is complete...
> > Looks good
> > Note (probably harmless): No library found for -lread
> > Writing Makefile for Bio::SCF
> >
> > [root at ps2288 Bio-SCF-1.01]# make
> > cp SCF.pm blib/lib/Bio/SCF.pm
> > cp SCF/Arrays.pm blib/lib/Bio/SCF/Arrays.pm
> > /usr/bin/perl /usr/lib/perl5/5.8.5/ExtUtils/xsubpp  -typemap
> > /usr/lib/perl5/5.8.5/ExtUtils/typemap  SCF.xs > SCF.xsc && mv SCF.xsc SCF.c
> > Please specify prototyping behavior for SCF.xs (see perlxs manual)
> > gcc -c   -D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBUGGING
> > -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE
> > -D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm -O2 -g -pipe -m32 -march=i386
> > -mtune=pentium4   -DVERSION=\"1.01\" -DXS_VERSION=\"1.01\" -fPIC
> > "-I/usr/lib/perl5/5.8.5/i386-linux-thread-multi/CORE"  -DLITTLE_ENDIAN SCF.c
> > SCF.xs:12:24: io_lib/scf.h: No such file or directory
> > SCF.xs:13:26: io_lib/mFILE.h: No such file or directory
> > SCF.xs: In function `XS_Bio__SCF_get_scf_pointer':
> > SCF.xs:27: error: `Scf' undeclared (first use in this function)
> > SCF.xs:27: error: (Each undeclared identifier is reported only once
> > SCF.xs:27: error: for each function it appears in.)
> > SCF.xs:27: error: `scf_data' undeclared (first use in this function)
> > SCF.xs: In function `XS_Bio__SCF_get_scf_fpointer':
> > SCF.xs:66: error: `Scf' undeclared (first use in this function)
> > SCF.xs:66: error: `scf_data' undeclared (first use in this function)
> > SCF.xs:68: error: `mFILE' undeclared (first use in this function)
> > SCF.xs:68: error: `mf' undeclared (first use in this function)
> > SCF.xs: In function `XS_Bio__SCF_scf_free':
> > SCF.xs:89: error: `Scf' undeclared (first use in this function)
> > SCF.xs:89: error: syntax error before ')' token
> > SCF.xs: In function `XS_Bio__SCF_get_comments':
> > SCF.xs:95: error: `Scf' undeclared (first use in this function)
> > SCF.xs:95: error: `scf_data' undeclared (first use in this function)
> > SCF.xs:95: error: syntax error before ')' token
> > SCF.xs: In function `XS_Bio__SCF_set_comments':
> > SCF.xs:108: error: `Scf' undeclared (first use in this function)
> > SCF.xs:108: error: `scf_data' undeclared (first use in this function)
> > SCF.xs:108: error: syntax error before ')' token
> > SCF.xs: In function `XS_Bio__SCF_scf_write':
> > SCF.xs:121: error: `Scf' undeclared (first use in this function)
> > SCF.xs:121: error: `scf_data' undeclared (first use in this function)
> > SCF.xs:121: error: syntax error before ')' token
> > SCF.xs: In function `XS_Bio__SCF_scf_fwrite':
> > SCF.xs:135: error: `mFILE' undeclared (first use in this function)
> > SCF.xs:135: error: `mf' undeclared (first use in this function)
> > SCF.xs:137: error: `Scf' undeclared (first use in this function)
> > SCF.xs:137: error: `scf_data' undeclared (first use in this function)
> > SCF.xs:137: error: syntax error before ')' token
> > SCF.xs: In function `XS_Bio__SCF_get_from_header':
> > SCF.xs:159: error: `Scf' undeclared (first use in this function)
> > SCF.xs:159: error: `scf_data' undeclared (first use in this function)
> > SCF.xs:159: error: syntax error before ')' token
> > SCF.xs: In function `XS_Bio__SCF_get_at':
> > SCF.xs:186: error: `Scf' undeclared (first use in this function)
> > SCF.xs:186: error: `scf_data' undeclared (first use in this function)
> > SCF.xs:186: error: syntax error before ')' token
> > SCF.xs: In function `XS_Bio__SCF_set_base_at':
> > SCF.xs:242: error: `Scf' undeclared (first use in this function)
> > SCF.xs:242: error: `scf_data' undeclared (first use in this function)
> > SCF.xs:242: error: syntax error before ')' token
> > SCF.xs: In function `XS_Bio__SCF_set_at':
> > SCF.xs:255: error: `Scf' undeclared (first use in this function)
> > SCF.xs:255: error: `scf_data' undeclared (first use in this function)
> > SCF.xs:255: error: syntax error before ')' token
> > make: *** [SCF.o] Error 1
>


-- 
-Neeti
Even my blood says, B positive


From jay at jays.net  Tue Feb 20 19:27:01 2007
From: jay at jays.net (Jay Hannah)
Date: Tue, 20 Feb 2007 18:27:01 -0600
Subject: [Bioperl-l] Alignment
In-Reply-To: <358f4d650702200730x5fd09bb8p7dec8ce740553378@mail.gmail.com>
References: <188661178021328@lycos-europe.com>
	<358f4d650702200730x5fd09bb8p7dec8ce740553378@mail.gmail.com>
Message-ID: <cdcd3b0549caf3cf9818bdf3d14f0796@jays.net>

> On 2/20/07, marian thieme <marian.thieme at lycos.de> wrote:
>> I have a series of sequences which should be aligned against a 
>> reference sequence.
>> In this special case we don't need to calculate anything, we only need 
>> to represent the sequences and get for instance some columns of 
>> interest.
>> The problem now is, that some sequences have gaps and we need to 
>> represent gaps in the reference sequence as well as in some 
>> individual sequences.

On Feb 20, 2007, at 9:30 AM, Albert Vilella wrote:
> I think the SimpleAlign object contains a set of sequences, each of
> which is a LocatableSeq object.

Fascinating. In my BLAST-centric universe I went and rolled my own 
solution for SeqLab where I hold onto the Bio::Seq from the reference 
sequences and then hold onto the Bio::Search::HSP::GenericHSP objects 
for all my BLAST hits. From that dataset I can write whatever reports I 
want and/or perform any subsequent actions. I wonder if I should have 
done that differently...

What typically creates .pfam files?

j
seqlab.net
http://www.bioperl.org/wiki/User:Jhannah


From cjfields at uiuc.edu  Wed Feb 21 08:36:02 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 21 Feb 2007 07:36:02 -0600
Subject: [Bioperl-l] Alignment
In-Reply-To: <cdcd3b0549caf3cf9818bdf3d14f0796@jays.net>
References: <188661178021328@lycos-europe.com>
	<358f4d650702200730x5fd09bb8p7dec8ce740553378@mail.gmail.com>
	<cdcd3b0549caf3cf9818bdf3d14f0796@jays.net>
Message-ID: <2233F0EE-94FE-42F0-B8E5-1BE14A25C0D4@uiuc.edu>


On Feb 20, 2007, at 6:27 PM, Jay Hannah wrote:
...
>
> On Feb 20, 2007, at 9:30 AM, Albert Vilella wrote:
>> I think the SimpleAlign object contains a set of sequences, each of
>> which is a LocatableSeq object.
>
> Fascinating. In my BLAST-centric universe I went and rolled my own
> solution for SeqLab where I hold onto the Bio::Seq from the reference
> sequences and then hold onto the Bio::Search::HSP::GenericHSP objects
> for all my BLAST hits. From that dataset I can write whatever reports I
> want and/or perform any subsequent actions. I wonder if I should have
> done that differently...
>
> What typically creates .pfam files?
>
> j
> seqlab.net
> http://www.bioperl.org/wiki/User:Jhannah

Pfam alignments come in two formats (pfam and stockholm) that can  
both be parsed into SimpleAlign objects via Bio::AlignIO:

use Bio::AlignIO;

my $alnin = Bio::AlignIO->new(-format => 'stockholm',
                              -file   => 'dho.sto');

while (my $aln = $alnin->next_aln) {
    # do stuff to $aln, a Bio::SimpleAlign object
}

Personally I stick with Stockholm as it's a richer format (with  
annotations and so on), but the parser was rewritten recently (by  
moi!) so may have some bugs still.

I'm a bit confused as to what you do with BLAST files.  You can  
generate a SimpleAlign right from the HSP for most SearchIO parsers:

http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_the_methods

chris


From sanjib at bic.boseinst.ernet.in  Wed Feb 21 01:12:06 2007
From: sanjib at bic.boseinst.ernet.in (Sanjib Kumar Gupta)
Date: Wed, 21 Feb 2007 11:42:06 +0530
Subject: [Bioperl-l] help on remote blast
In-Reply-To: <20070220073200.M42567@bic.boseinst.ernet.in>
References: <20070220073200.M42567@bic.boseinst.ernet.in>
Message-ID: <20070221061206.M37845@bic.boseinst.ernet.in>

Hi
I have been running this script for some time without problems, on a Linux
machine with a public IP (no proxy). But it has suddenly stopped working with
these errors:


waiting...waiting...
-------------------- WARNING ---------------------
MSG: <HTML>
<HEAD><TITLE>An Error Occurred</TITLE></HEAD>
<BODY>
<H1>An Error Occurred</H1>
500 Can't connect to www.ncbi.nlm.nih.gov:80 (Bad 
hostname 'www.ncbi.nlm.nih.gov')
</BODY>
</HTML>
 
---------------------------------------------------
xx.pep
 
-------------------- WARNING ---------------------
MSG: req was POST http://www.ncbi.nlm.nih.gov/blast/Blast.cgi
User-Agent: bioperl-Bio_Tools_Run_RemoteBlast/1.5
Content-Length: 497
Content-Type: application/x-www-form-urlencoded
 
DATABASE=nr&QUERY=%3ENM_005277.3%3B_1414961_1+837%
0AMEENMEEGQTQKGCFECCIKCLGGIPYASLIATILLYAGVALFCGCGHEALSGTVNILQTYFEMARTAGDTLDVF
TMIDIFKYVIYGIAAAFFVYGILLMVEGFFTTGAIKDLYGDFKITTCGRCVSAWFIMLTYLFMLAWLGVTAFTSLPV
YMYFNLWTICRNTTLVEGANLCLDLRQFGIVTIGEEKKICTVSENFLRMCESTELNMTFHLFIVALAGAGAAVIAMV
HYLMVLSANWAYVKDACRMQKYEDIKSKEEQELHDIHSTRSKERLNAYT*&COMPOSITION_BASED_STATISTI
CS=off&EXPECT=1e-
10&SERVICE=plain&FORMAT_OBJECT=Alignment&CMD=Put&MATRIX_NAME=BLOSUM62&ENTREZ_
QUERY=Xenopus+laevis[Organism]&PROGRAM=blastp
 
<HTML>
<HEAD><TITLE>An Error Occurred</TITLE></HEAD>
<BODY>
<H1>An Error Occurred</H1>
500 Can't connect to www.ncbi.nlm.nih.gov:80 (Bad 
hostname 'www.ncbi.nlm.nih.gov')
</BODY>
</HTML>
 
---------------------------------------------------
waiting...waiting...
-------------------- WARNING ---------------------
MSG: <HTML>
<HEAD><TITLE>An Error Occurred</TITLE></HEAD>
<BODY>
<H1>An Error Occurred</H1>
500 Internal Server Error
</BODY>
</HTML>
 
---------------------------------------------------

I can still load the NCBI page in a browser, but I am unable to ping or
traceroute the server.

Please help me.
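The "Bad hostname 'www.ncbi.nlm.nih.gov'" text in the first two warnings most likely comes from the socket layer LWP uses. If so, the machine running the script could not resolve the name at all, and NCBI never saw the request. Below is a minimal sketch that uses only Perl's core Socket module to check name resolution from the same environment the script runs in. The resolve_host helper and the script name are hypothetical.

```perl
use strict;
use warnings;
use Socket qw(inet_aton inet_ntoa);

# Return the dotted-quad IPv4 address for $host, or undef if the name
# cannot be resolved from this machine.
sub resolve_host {
    my ($host) = @_;
    my $packed = inet_aton($host);   # performs a name lookup for hostnames
    return defined $packed ? inet_ntoa($packed) : undef;
}

# Usage: perl check_dns.pl www.ncbi.nlm.nih.gov
for my $host (@ARGV) {
    my $ip = resolve_host($host);
    if (defined $ip) {
        print "$host resolves to $ip\n";
    }
    else {
        print "$host does not resolve: check the resolver settings (e.g. /etc/resolv.conf)\n";
    }
}
```

If the lookup fails here while the browser still works, compare the resolver and network settings the browser uses with those of the account running the script.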
--
Sanjib Kumar Gupta
Bioinformatics Centre
Bose Institute
Kolkata 700054, INDIA
Phone  : +91-33-2355 6626, 2816, 2355 4766
Fax    : +91-33-2355 3886
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: n9.pl
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070221/5a3382d6/attachment-0003.pl>

From granjeau at tagc.univ-mrs.fr  Wed Feb 21 08:50:39 2007
From: granjeau at tagc.univ-mrs.fr (Samuel GRANJEAUD - IR/IFR137)
Date: Wed, 21 Feb 2007 14:50:39 +0100
Subject: [Bioperl-l] Adding empty member list in Bio::Cluster::SequenceFamily
Message-ID: <45DC4E2F.4060804@tagc.univ-mrs.fr>

Hello!

This is not entirely clear to me, and I found a workaround by checking for 
an empty list before adding, but here is what I noticed. Adding an empty 
list () as members is not the same as adding a reference to an empty list 
[], of course, but the two could be thought to be the same. Calling 
get_members in the second case, I got a list of 0 members, but in the first 
case I got 1 member, which is not an object at all. I know this now, but 
maybe the documentation should emphasize passing members by reference.

Best regards,
--Samuel


use strict;
use warnings;
use Bio::Cluster::SequenceFamily;

my $f = Bio::Cluster::SequenceFamily->new( -id => 'aa' );
$f->add_members( () );     # empty list
print scalar $f->get_members(), "\n";
# 1
my $g = Bio::Cluster::SequenceFamily->new( -id => 'aa' );
$g->add_members( [] );     # reference to an empty list
print scalar $g->get_members(), "\n";
# 0
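At the call site, the difference comes from Perl's argument passing. An empty list () flattens away, so the method receives no member argument at all. A [] arrives as a single (empty) array reference. How SequenceFamily then ends up with one non-object member depends on its implementation, which this sketch does not reproduce. The pure-Perl sketch shows only what the method receives, plus the empty-list guard described above. Both count_args and add_if_nonempty are hypothetical helpers for illustration, not BioPerl methods.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a method: returns how many arguments
# arrive after the invocant.
sub count_args {
    my ($self, @args) = @_;
    return scalar @args;
}

print 'empty list: ', count_args('obj', ()), "\n";   # 0: () adds nothing to @_
print 'array ref:  ', count_args('obj', []), "\n";   # 1: one (empty) array reference

# Hypothetical helper implementing the workaround: skip the call entirely
# when there is nothing to add, otherwise pass the members as an array ref.
# $family is assumed to have an add_members method accepting an array ref,
# as in the [] call above.
sub add_if_nonempty {
    my ($family, @members) = @_;
    return 0 unless @members;
    $family->add_members(\@members);
    return scalar @members;
}
```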


From stephen.marshall at novartis.com  Wed Feb 21 12:01:00 2007
From: stephen.marshall at novartis.com (stephen.marshall at novartis.com)
Date: Wed, 21 Feb 2007 12:01:00 -0500
Subject: [Bioperl-l] Parsing kegg files
Message-ID: <OFA3726097.8019A09E-ON85257289.005D64E3-85257289.005D7997@ah.novartis.com>

Hello
I'm trying to parse a KEGG file, but I can't seem to get at the pathway 
information. Here's a snippet of my code; the only annotation keys I see 
are dblink and description.

use strict;
use warnings;
use Bio::SeqIO;

my $filename = shift @ARGV;   # path to the KEGG file (assumed: command-line argument)
my $stream = Bio::SeqIO->new(-file => $filename, -format => 'KEGG');

while ( my $seq = $stream->next_seq() ) {
        # do something with $seq
        my $id = $seq->display_id();
        print "$id:";
        my $ann = $seq->annotation();
        # print every annotation key/value pair attached to this entry
        foreach my $key ( $ann->get_all_annotation_keys() ) {
                my @values = $ann->get_Annotations($key);
                foreach my $value ( @values ) {
                        print "Annotation: ", $key, " value: ", $value->as_text, "\n";
                }
        }
}


From prateek.vit at gmail.com  Wed Feb 21 12:40:25 2007
From: prateek.vit at gmail.com (prateek singh yadav)
Date: Wed, 21 Feb 2007 23:10:25 +0530
Subject: [Bioperl-l] Problem in BioPerl Installation
Message-ID: <e7e0a2da0702210940j60a67f37t9bfddda7d760a277@mail.gmail.com>

Hello all,

I was trying to install BioPerl on my Red Hat Enterprise Linux system using
CPAN, but CPAN shows this problem:


[root at HX342SBC054 Desktop]# cpan
Terminal does not support AddHistory.

cpan shell -- CPAN exploration and modules installation (v1.7601)
ReadLine support available (try 'install Bundle::CPAN')

cpan> get bioperl
CPAN: Storable loaded ok
Going to read /root/.cpan/Metadata
Warning: Found only 25 objects in /root/.cpan/Metadata
Going to read /root/.cpan/sources/authors/01mailrc.txt.gz
Going to read /root/.cpan/sources/modules/02packages.details.txt.gz
Warning: Your /root/.cpan/sources/modules/02packages.details.txt.gz does not
contain a Line-Count header.
Please check the validity of the index file by comparing it to more
than one CPAN mirror. I'll continue but problems seem likely to
happen.
Warning: Your /root/.cpan/sources/modules/02packages.details.txt.gz does not
contain a Last-Updated header.
Please check the validity of the index file by comparing it to more
than one CPAN mirror. I'll continue but problems seem likely to
happen.
Going to read /root/.cpan/sources/modules/03modlist.data.gz
Can't locate object method "data" via package "CPAN::Modulelist" (perhaps
you forgot to load "CPAN::Modulelist"?) at (eval 13) line 1.
 at /usr/lib/perl5/5.8.5/CPAN.pm line 3406
        CPAN::Index::rd_modlist('CPAN::Index',
'/root/.cpan/sources/modules/03modlist.data.gz') called at
/usr/lib/perl5/5.8.5/CPAN.pm line 3129
        CPAN::Index::reload('CPAN::Index') called at
/usr/lib/perl5/5.8.5/CPAN.pm line 675
        CPAN::exists('CPAN=HASH(0x8548f20)', 'CPAN::Module', 'bioperl')
called at /usr/lib/perl5/5.8.5/CPAN.pm line 1842
        CPAN::Shell::expandany('CPAN::Shell', 'bioperl') called at
/usr/lib/perl5/5.8.5/CPAN.pm line 2078
        CPAN::Shell::rematein('CPAN::Shell', 'get', 'bioperl') called at
/usr/lib/perl5/5.8.5/CPAN.pm line 2157
        CPAN::Shell::get('CPAN::Shell', 'bioperl') called at
/usr/lib/perl5/5.8.5/CPAN.pm line 201
        eval {...} called at /usr/lib/perl5/5.8.5/CPAN.pm line 201
        CPAN::shell() called at /usr/bin/cpan line 193

cpan>

Can anyone tell me how to configure CPAN again, or how to install BioPerl
on Linux with all its dependencies? I think I have a problem in my CPAN
configuration.

Regards,
Prateek

-- 
Prateek Singh
3rd year Bioinformatics(BTech)
Vellore Institute Of Technology
Vellore-632014
prateek.vit at gmail.com


From bosborne11 at verizon.net  Wed Feb 21 12:29:40 2007
From: bosborne11 at verizon.net (Brian Osborne)
Date: Wed, 21 Feb 2007 12:29:40 -0500
Subject: [Bioperl-l] Parsing kegg files
In-Reply-To: <OFA3726097.8019A09E-ON85257289.005D64E3-85257289.005D7997@ah.novartis.com>
Message-ID: <C201EBB4.CEE7%bosborne11@verizon.net>

Stephen,

I don't know what your eventual goals are, but you might want to take a look
at bioperl-network. However, there are problems with this package. First, it
only parses DIP tab-delimited and PSI-MI formats, and the latter only
partially (you will get the graph, though). Second, it seems to have only a
single interested developer (me) and few users. In my BioPerl experience,
projects like this tend to fade away.

http://www.bioperl.org/wiki/Network_package


Brian O.


On 2/21/07 12:01 PM, "stephen.marshall at novartis.com"
<stephen.marshall at novartis.com> wrote:

> Hello
> I'm trying to parse a KEGG file and I can't seem to get at the pathway
> information. Here's a snippet of my code; I only see dblink and
> description as annotations.
> 
> use Bio::SeqIO;
> 
> my $stream = Bio::SeqIO->new(-file => $filename, -format => 'KEGG');
> 
> while ( my $seq = $stream->next_seq() ) {
>         # do something with $seq
>         my $id = $seq->display_id();
>         print "$id:";
>         my $ann = $seq->annotation();
>         foreach my $key ( $ann->get_all_annotation_keys() ) {
>                 my @values = $ann->get_Annotations($key);
>                 foreach my $value ( @values ) {
>                         print "Annotation: ", $key, " value: ", $value->as_text, "\n";
>                 }
>         }
> 
> }
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From arareko at campus.iztacala.unam.mx  Wed Feb 21 13:18:37 2007
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Wed, 21 Feb 2007 12:18:37 -0600
Subject: [Bioperl-l] Problem in BioPerl Installation
In-Reply-To: <e7e0a2da0702210940j60a67f37t9bfddda7d760a277@mail.gmail.com>
References: <e7e0a2da0702210940j60a67f37t9bfddda7d760a277@mail.gmail.com>
Message-ID: <45DC8CFD.1060108@campus.iztacala.unam.mx>

You can always rebuild your CPAN configuration by deleting the existing
.cpan/ directory in root's $HOME dir (a quick & dirty trick) and then
invoking CPAN again from root's shell to rebuild the config:

# perl -MCPAN -e shell

Hope this helps.

Regards,
Mauricio.

prateek singh yadav wrote:
> Hello all,
> 
> I was trying to install BioPerl on my Red Hat Enterprise Linux system using
> CPAN, but CPAN shows this problem:
> 
> 
> [root at HX342SBC054 Desktop]# cpan
> Terminal does not support AddHistory.
> 
> cpan shell -- CPAN exploration and modules installation (v1.7601)
> ReadLine support available (try 'install Bundle::CPAN')
> 
> cpan> get bioperl
> CPAN: Storable loaded ok
> Going to read /root/.cpan/Metadata
> Warning: Found only 25 objects in /root/.cpan/Metadata
> Going to read /root/.cpan/sources/authors/01mailrc.txt.gz
> Going to read /root/.cpan/sources/modules/02packages.details.txt.gz
> Warning: Your /root/.cpan/sources/modules/02packages.details.txt.gz does not
> contain a Line-Count header.
> Please check the validity of the index file by comparing it to more
> than one CPAN mirror. I'll continue but problems seem likely to
> happen.
> Warning: Your /root/.cpan/sources/modules/02packages.details.txt.gz does not
> contain a Last-Updated header.
> Please check the validity of the index file by comparing it to more
> than one CPAN mirror. I'll continue but problems seem likely to
> happen.
> Going to read /root/.cpan/sources/modules/03modlist.data.gz
> Can't locate object method "data" via package "CPAN::Modulelist" (perhaps
> you forgot to load "CPAN::Modulelist"?) at (eval 13) line 1.
>  at /usr/lib/perl5/5.8.5/CPAN.pm line 3406
>         CPAN::Index::rd_modlist('CPAN::Index',
> '/root/.cpan/sources/modules/03modlist.data.gz') called at
> /usr/lib/perl5/5.8.5/CPAN.pm line 3129
>         CPAN::Index::reload('CPAN::Index') called at
> /usr/lib/perl5/5.8.5/CPAN.pm line 675
>         CPAN::exists('CPAN=HASH(0x8548f20)', 'CPAN::Module', 'bioperl')
> called at /usr/lib/perl5/5.8.5/CPAN.pm line 1842
>         CPAN::Shell::expandany('CPAN::Shell', 'bioperl') called at
> /usr/lib/perl5/5.8.5/CPAN.pm line 2078
>         CPAN::Shell::rematein('CPAN::Shell', 'get', 'bioperl') called at
> /usr/lib/perl5/5.8.5/CPAN.pm line 2157
>         CPAN::Shell::get('CPAN::Shell', 'bioperl') called at
> /usr/lib/perl5/5.8.5/CPAN.pm line 201
>         eval {...} called at /usr/lib/perl5/5.8.5/CPAN.pm line 201
>         CPAN::shell() called at /usr/bin/cpan line 193
> 
> cpan>
> 
> Can anyone tell me how to configure CPAN again, or how to install BioPerl
> on Linux with all its dependencies? I think I have a problem in my CPAN
> configuration.
> 
> Regards,
> Prateek
> 

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM


From hlapp at gmx.net  Wed Feb 21 13:33:17 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 21 Feb 2007 13:33:17 -0500
Subject: [Bioperl-l] Adding empty member list in
	Bio::Cluster::SequenceFamily
In-Reply-To: <45DC4E2F.4060804@tagc.univ-mrs.fr>
References: <45DC4E2F.4060804@tagc.univ-mrs.fr>
Message-ID: <5B31EEBD-FFE5-4A0F-BB05-DF7297103BBD@gmx.net>

Fixed in CVS HEAD. -hilmar

On Feb 21, 2007, at 8:50 AM, Samuel GRANJEAUD - IR/IFR137 wrote:

> Hello!
>
> It's not clear to me, but I found a workaround by checking for an empty
> list before adding. Here is what I noticed. Adding an empty list () as
> members is not the same as adding a reference to an empty list [], of
> course, but the two could be thought to be the same. Calling
> get_members, in the second case I got a list of 0 members, but in the
> first case I got 1 member, which is not an object at all. I know now,
> but maybe the documentation should emphasize using the reference call.
>
> Best regards,
> --Samuel
>
>
> use strict;
> use warnings;
> use Bio::Cluster::SequenceFamily;
>
> my $f = Bio::Cluster::SequenceFamily->new( -id => 'aa' );
> $f->add_members( () );
> print scalar $f->get_members(), "\n";
> # 1
> my $g = Bio::Cluster::SequenceFamily->new( -id => 'aa' );
> $g->add_members( [] );
> print scalar $g->get_members(), "\n";
> # 0
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================
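
The two calls in the report differ at the Perl level before BioPerl is even involved: a literal () is flattened away, so add_members( () ) receives no arguments at all, whereas add_members( [] ) receives one argument, a reference to an empty array. How add_members then handled an empty argument list is internal to BioPerl and is not reproduced here. The sketch below only illustrates the flattening, plus a defensive way to accept either calling style; count_args and normalize_members are hypothetical helpers, not the code committed to Bio::Cluster::SequenceFamily.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Perl flattens argument lists: f(()) passes no arguments at all,
# while f([]) passes exactly one argument, a reference to an empty array.
sub count_args { return scalar @_ }

# Hypothetical helper (not BioPerl code): accept either a flat list of
# members or a single array reference, and drop undefined entries.
sub normalize_members {
    my @members = ( @_ == 1 && ref $_[0] eq 'ARRAY' ) ? @{ $_[0] } : @_;
    return grep { defined } @members;
}

print count_args(()), "\n";                   # 0 arguments
print count_args([]), "\n";                   # 1 argument (an array ref)
print scalar( normalize_members(()) ), "\n";  # 0 members
print scalar( normalize_members([]) ), "\n";  # 0 members
```

With a normalizer like this, both calling styles yield an empty member list, and a stray undef never becomes a "member".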


From cjfields at uiuc.edu  Wed Feb 21 14:12:57 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 21 Feb 2007 13:12:57 -0600
Subject: [Bioperl-l] GenBank accession bug?
Message-ID: <11B83C2C-7BFE-48B5-A20C-7EAF282F39BD@uiuc.edu>

Dmitry,

I'm forwarding this to the mailing list. In the future, please post and
respond to the regular mailing list so other BioPerl developers/users
can comment. You'll get feedback much faster here (and maybe even
some support!).

The issue at hand is whether we can support GenBank accessions/
display_id/version with your naming scheme. My feeling is that
support for nonalphanumerics was removed to be compliant with the
GenBank standard for accessions, though I may be wrong. Perhaps
someone who was around during BioPerl 1.2 can elaborate?

From http://bugzilla.open-bio.org/show_bug.cgi?id=2214:
--------------------------------------------------
....
Thanks for the detailed explanation. It seems that I would need to apply
my local patches to the BioPerl module(s). With BioPerl 1.2 there was
no problem with '-' in sequence names.

The problem is that the project we participate in (the Vizier project)
adopted the following sequence naming convention:

VZ##<virus_ICTV>-(<GenBank LOCUS ID>or<strain designation>)-<$$>

VZ                    Stands for Vizier

##                    Your 2-digit partner ID within the VIZIER consortium

<virus_ICTV>          Virus name according to the ICTV nomenclature

<GenBank LOCUS ID>,
<strain designation>  If the sequence has not been assigned a GenBank LOCUS
                      ID, the available strain designation, as short as
                      possible, should be used

<$$>                  Unique 2-digit number, at your discretion, to label
                      the sequence variant
--------------------------------------------------

chris


From gabriel.cardona at uib.es  Thu Feb 22 04:33:14 2007
From: gabriel.cardona at uib.es (gcardona)
Date: Thu, 22 Feb 2007 01:33:14 -0800 (PST)
Subject: [Bioperl-l] bioperl-1.5.2_100 ... not found
Message-ID: <9096740.post@talk.nabble.com>


Hello,

I am trying to install Bioperl on a Windows system, following the
installation notes in 
http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows
When the Perl Package Manager tries to download bioperl-1.5.2_100, it cannot
find the package and answers:
Downloading bioperl-1.5.2_100 ... not found

I've looked at the contents of
http://bioperl.org/DIST
and in package.xml the version for bioperl is bioperl-1.5.2_100, but in that
folder the available version is bioperl-1.5.2_102.
Is this a bug, or should I download and install it manually?

Thank you in advance,

Gabriel Cardona
-- 
View this message in context: http://www.nabble.com/bioperl-1.5.2_100-...-not-found-tf3271747.html#a9096740
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From bix at sendu.me.uk  Thu Feb 22 07:35:14 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 22 Feb 2007 12:35:14 +0000
Subject: [Bioperl-l] bioperl-1.5.2_100 ... not found
In-Reply-To: <9096740.post@talk.nabble.com>
References: <9096740.post@talk.nabble.com>
Message-ID: <45DD8E02.1070404@sendu.me.uk>

gcardona wrote:
> Hello,
> 
> I am trying to install Bioperl on a Windows system, following the
> installation notes in 
> http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows
> When the Perl Package Manager tries to download bioperl-1.5.2_100, it cannot
> find the package and answers:
> Downloading bioperl-1.5.2_100 ... not found
> 
> I've looked at the contents of
> http://bioperl.org/DIST
> and in package.xml the version for bioperl is bioperl-1.5.2_100, but in that
> folder the available version is bioperl-1.5.2_102.
> Is this a bug, or should I download and install it manually?

Sorry, my mistake. I accidentally moved the ppm to a different folder. 
It should work now though.

I may make a 1.5.2_102 ppm at some point, but there are no relevant 
differences between _102 and _100 as far as Windows users are concerned.


From enrique_rulz at yahoo.com  Thu Feb 22 15:41:37 2007
From: enrique_rulz at yahoo.com (Kurt Gobain)
Date: Thu, 22 Feb 2007 12:41:37 -0800 (PST)
Subject: [Bioperl-l] Sequence matching problem!
Message-ID: <9107936.post@talk.nabble.com>


Hi everyone,
I'm having a lot of trouble with simple pattern matching between a
sequence and a pattern. The program should be able to do two things:
1) normal matching, e.g. for GATCAAT, if TC is entered, the output should
be 2; 2) matching using a special character, e.g. in the same example, if
C*T is entered, the output should be 3 and the sequence displayed should
be CAAT. I can easily get the first part, but the second part gives me a
problem: the output I get is 1 instead of 3. The code is really simple:

#!/usr/bin/perl
$alphabet = "GATCAAT";
$pattern=  "C*T ";

$alphabet =~ /($pattern)/i;

print "The entire '$pattern' match began at $-[0] and ended at $+[0]\n";

====================
OUTPUT!
The entire C*T match began at 1 and ended at 2
====================

But the output should be 3!
Is there any chance I can get the sequence too? I mean, instead of 'C*T' I
need 'CAAT'.

Well, it's not compulsory to use a regex, but I find it quite simple. Could
there be any other method?

Thanks in advance!
Kurt!
 
-- 
View this message in context: http://www.nabble.com/Sequence-matching-problem%21-tf3275153.html#a9107936
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From cjfields at uiuc.edu  Thu Feb 22 16:01:03 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 22 Feb 2007 15:01:03 -0600
Subject: [Bioperl-l] GenBank accession bug?
In-Reply-To: <51879.10.0.7.57.1172176272.squirrel@gscmail.wustl.edu>
References: <11B83C2C-7BFE-48B5-A20C-7EAF282F39BD@uiuc.edu>
	<51879.10.0.7.57.1172176272.squirrel@gscmail.wustl.edu>
Message-ID: <028E16D7-036A-44DA-BECD-F910BEA58E53@uiuc.edu>


On Feb 22, 2007, at 2:31 PM, dmessina at watson.wustl.edu wrote:

>> The issue at hand is whether we can support GenBank accessions/
>> display_id/version with your naming scheme.
>
> Chris, I'm a little unsure of what you're saying here (which might  
> mean
> that you're already saying what I'm about to...say). Do you mean it  
> might
> be tricky to support both the Genbank standard and Dmitry's
> simultaneously?
>
> I would argue any arbitrary ID should be supported as long as that  
> ID is a
> contiguous non-space word (\S+).
>
> Actually the existing accession regex looks like it already  
> supports IDs
> with '-':
>
> /^ACCESSION\s+(\S.*\S)/
>
> It's only the version regex which doesn't (\w doesn't include '-'):
>
> /^\w+\.(\d+)/
>
>
> Anyone else have thoughts or comments on this? Off the top of my  
> head, I
> can't think of any issues that might arise from doing so (apart from
> having to modify all of the SeqIO modules to support it).
>
> Dave

You're right; the argument comes down simply to whether we would  
support \S+ or just \w+.  I'm neutral on this myself, but I wonder  
how allowing \S+ would affect other modules (for instance, indexing  
for a flat db), where one might just use \w+ for accessions,  
expecting them to be GenBank- or EMBL-like alphanumerics.  The fact  
that \S+ was supported in the past (as indicated in the bug report)  
and then wasn't post 1.2 makes me think there was a reason for  
someone going in and modifying it, but that was before my time on the  
group.

I'll have a look at the CVS history when I have time to see what I  
can dig up.

chris


From mkiwala at watson.wustl.edu  Thu Feb 22 15:36:33 2007
From: mkiwala at watson.wustl.edu (Michael Kiwala)
Date: Thu, 22 Feb 2007 14:36:33 -0600
Subject: [Bioperl-l] Bio::AnnotatableI vs Bio::FeatureHolderI
Message-ID: <45DDFED1.1090503@watson.wustl.edu>

Are Bio::AnnotatableI and Bio::FeatureHolderI competing interfaces?

I get the impression they are designed to do similar things.  If so is 
one deprecated and the other preferred?

If their responsibilities are orthogonal to each other, what sorts of 
tasks are suited to each?

Thanks,
Michael


From dmessina at wustl.edu  Thu Feb 22 15:53:01 2007
From: dmessina at wustl.edu (Dave Messina)
Date: Thu, 22 Feb 2007 14:53:01 -0600 (CST)
Subject: [Bioperl-l] GenBank accession bug?
Message-ID: <51923.10.0.7.57.1172177581.squirrel@gscmail.wustl.edu>

> The issue at hand is whether we can support GenBank accessions/
> display_id/version with your naming scheme.

Chris, I'm a little unsure of what you're saying here (which might mean
that you're already saying what I'm about to...say). Do you mean it might
be tricky to support both the Genbank standard and Dmitry's
simultaneously?

I would argue any arbitrary ID should be supported as long as that ID is a
contiguous non-space word (\S+).

Actually the existing accession regex looks like it already supports IDs
with '-':

/^ACCESSION\s+(\S.*\S)/

It's only the version regex which doesn't (\w doesn't include '-'):

/^\w+\.(\d+)/
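
To see concretely why that version regex rejects hyphenated IDs, the sketch below applies the pattern quoted above, and a hypothetical \S+ variant, to a bare "accession.version" string. Applying the regex to an isolated string is a simplification of how the GenBank parser uses it, and the second ID is made up in the spirit of the Vizier naming convention.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Extract the version number from an "accession.version" string with the
# \w+ pattern quoted in the thread and with a hypothetical \S+ variant.
sub versions {
    my ($id) = @_;
    my ($w_ver) = $id =~ /^\w+\.(\d+)/;    # \w does not match '-'
    my ($s_ver) = $id =~ /^\S+\.(\d+)/;    # \S matches any non-space
    return ($w_ver, $s_ver);
}

# The second ID is a made-up example containing hyphens.
for my $id ('AB123456.2', 'VZ01Foo-AB123-01.1') {
    my ($w_ver, $s_ver) = versions($id);
    printf "%-20s \\w+: %-5s \\S+: %s\n", $id,
        defined $w_ver ? $w_ver : 'none',
        defined $s_ver ? $s_ver : 'none';
}
```

Because \S+ is greedy, for an ID containing several dots it takes the digits after the last dot, which is one of the side effects to weigh before relaxing the pattern.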


Anyone else have thoughts or comments on this? Off the top of my head, I
can't think of any issues that might arise from doing so (apart from
having to modify all of the SeqIO modules to support it).

Dave


From heikki at sanbi.ac.za  Fri Feb 23 03:25:39 2007
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Fri, 23 Feb 2007 10:25:39 +0200
Subject: [Bioperl-l] Sequence matching problem!
In-Reply-To: <9107936.post@talk.nabble.com>
References: <9107936.post@talk.nabble.com>
Message-ID: <200702231025.39416.heikki@sanbi.ac.za>

Kurt,

There are a few things in your code to note:

- The regexp /C*T/ matches any T preceded by zero or more Cs,
  which is not what you meant.
- $- and $+ are among the "expensive" Perl variables that are best
  avoided unless you have to use them. Using them once in your
  code slows execution down considerably. There is always
  another way.
- Keep in mind what you want to use the match positions for:
  human-readable locations usually start counting at 1, but
  Perl code uses 0 as the first location. The code below assumes
  you want to print the locations out.

Study my example code below.

Yours,
	-Heikki

###################################################################
#!/usr/bin/perl
use strict;
use warnings;

my $seq = "GATCAAT";
#my $pattern = 'C*T';
my $pattern = 'C.*T';

while ($seq =~ m/($pattern)/gi) {

    my $match = $1;
    my $end   = pos($seq);
    my $start = $end - length($match) + 1;

    print "$match : $start - $end\n";
}

###################################################################
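
Kurt's original requirement (treat '*' in a user-supplied pattern as a wildcard and also report the matched sequence) can be sketched by escaping the user's pattern with quotemeta and turning each escaped '*' into '.*', the same greedy wildcard as in 'C.*T' above. The helper name find_wildcard is hypothetical, and positions are reported 0-based because that is what Kurt's expected outputs (2 for TC, 3 for C*T) use; add 1 for human-readable coordinates.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return [start, matched_text] for every match of a
# simple wildcard pattern in $seq. '*' means "any run of characters";
# everything else is matched literally, ignoring case.
sub find_wildcard {
    my ($seq, $user_pattern) = @_;
    $user_pattern =~ s/^\s+|\s+$//g;        # ignore stray spaces such as "C*T "
    my $regex = quotemeta($user_pattern);   # escape regex metacharacters
    $regex =~ s/\\\*/.*/g;                  # escaped '*' becomes greedy '.*'

    my @hits;
    while ($seq =~ /($regex)/gi) {
        my $match = $1;
        my $start = pos($seq) - length($match);   # 0-based start
        push @hits, [ $start, $match ];
    }
    return @hits;
}

for my $pattern ('TC', 'C*T') {
    for my $hit ( find_wildcard('GATCAAT', $pattern) ) {
        printf "%s: start %d (0-based), matched %s\n", $pattern, @$hit;
    }
}
```

For GATCAAT this reports TC at 2 and CAAT at 3, using pos() and length() rather than the match-position variables.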


On Thursday 22 February 2007 22:41:37 Kurt Gobain wrote:
> Hi everyone,
> I'm having a lot of trouble with simple pattern matching between a
> sequence and a pattern. The program should be able to do two things:
> 1) normal matching, e.g. for GATCAAT, if TC is entered, the output should
> be 2; 2) matching using a special character, e.g. in the same example, if
> C*T is entered, the output should be 3 and the sequence displayed should
> be CAAT. I can easily get the first part, but the second part gives me a
> problem: the output I get is 1 instead of 3. The code is really simple:
>
> #!/usr/bin/perl
> $alphabet = "GATCAAT";
> $pattern=  "C*T ";
>
> $alphabet =~ /($pattern)/i;
>
> print "The entire '$pattern' match began at $-[0] and ended at $+[0]\n";
>
> ====================
> OUTPUT!
> The entire C*T match began at 1 and ended at 2
> ====================
>
> But the output should be 3!
> Is there any chance I can get the sequence too? I mean, instead of 'C*T' I
> need 'CAAT'.
>
> Well, it's not compulsory to use a regex, but I find it quite simple. Could
> there be any other method?
>
> Thanks in advance!
> Kurt!


-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From avilella at gmail.com  Fri Feb 23 04:59:49 2007
From: avilella at gmail.com (Albert Vilella)
Date: Fri, 23 Feb 2007 09:59:49 +0000
Subject: [Bioperl-l] Sequence matching problem!
In-Reply-To: <200702231025.39416.heikki@sanbi.ac.za>
References: <9107936.post@talk.nabble.com>
	<200702231025.39416.heikki@sanbi.ac.za>
Message-ID: <358f4d650702230159n3a28fb03k6ecd850fccfed796@mail.gmail.com>

Now that we are on this pattern-matching thread, I was wondering if
any Perl guru could enlighten me on matching exact sequence patterns
against a gapped target sequence. E.g.:

my $seq = "CGATCAACGAATCGTACGTACTC";
my $gapped_seq =
"GGGGGGCG-------A---TC---AACGA-----ATC---GTA---CGTACTCTACTCGGGGG";

and one would like to get as a result:

"CG-------A---TC---AACGA-----ATC---GTA---CGTACTC"

which is the match of $seq but in $gapped_seq.

Cheers,

    Albert.


On 2/23/07, Heikki Lehvaslaiho <heikki at sanbi.ac.za> wrote:
> Kurt,
>
> There are a few things in your code to note:
>
> - The regexp /C*T/ matches any T preceded by zero or more Cs,
>   which is not what you meant.
> - $- and $+ are among the "expensive" Perl variables that are best
>   avoided unless you have to use them. Using them once in your
>   code slows execution down considerably. There is always
>   another way.
> - Keep in mind what you want to use the match positions for:
>   human-readable locations usually start counting at 1, but
>   Perl code uses 0 as the first location. The code below assumes
>   you want to print the locations out.
>
> Study my example code below.
>
> Yours,
>         -Heikki
>
> ###################################################################
> #!/usr/bin/perl
> $seq = "GATCAAT";
> #$pattern=  'C*T';
> $pattern=  'C.*T';
>
> while ($seq =~ m/($pattern)/gi) {
>
>     $match = $1;
>     $end = pos($seq);
>     $start = $end - length($match) +1;
>
>     print "$match : $start - $end\n";
> }
>
> ###################################################################
>
>
> On Thursday 22 February 2007 22:41:37 Kurt Gobain wrote:
> > Hi everyone,
> > I'm having a lot of trouble with simple pattern matching between a
> > sequence and a pattern. The program should be able to do two things:
> > 1) normal matching, e.g. for GATCAAT, if TC is entered, the output should
> > be 2; 2) matching using a special character, e.g. in the same example, if
> > C*T is entered, the output should be 3 and the sequence displayed should
> > be CAAT. I can easily get the first part, but the second part gives me a
> > problem: the output I get is 1 instead of 3. The code is really simple:
> >
> > #!/usr/bin/perl
> > $alphabet = "GATCAAT";
> > $pattern=  "C*T ";
> >
> > $alphabet =~ /($pattern)/i;
> >
> > print "The entire '$pattern' match began at $-[0] and ended at $+[0]\n";
> >
> > ====================
> > OUTPUT!
> > The entire C*T match began at 1 and ended at 2
> > ====================
> >
> > But the output should be 3!
> > Is there any chance I can get the sequence too? I mean, instead of 'C*T' I
> > need 'CAAT'.
> >
> > Well, it's not compulsory to use a regex, but I find it quite simple. Could
> > there be any other method?
> >
> > Thanks in advance!
> > Kurt!
>


From js5 at sanger.ac.uk  Fri Feb 23 06:34:37 2007
From: js5 at sanger.ac.uk (James Smith)
Date: Fri, 23 Feb 2007 11:34:37 +0000 (GMT)
Subject: [Bioperl-l] Sequence matching problem!
In-Reply-To: <358f4d650702230159n3a28fb03k6ecd850fccfed796@mail.gmail.com>
References: <9107936.post@talk.nabble.com>
	<200702231025.39416.heikki@sanbi.ac.za>
	<358f4d650702230159n3a28fb03k6ecd850fccfed796@mail.gmail.com>
Message-ID: <Pine.OSF.4.58.0702231131230.1583589@cbi1c.internal.sanger.ac.uk>

On Fri, 23 Feb 2007, Albert Vilella wrote:

> Now that we are on this pattern-matching thread, I was wondering if
> any Perl guru could enlighten me on matching exact sequence patterns
> against a gapped target sequence. E.g.:
>
> my $seq = "CGATCAACGAATCGTACGTACTC";
> my $gapped_seq =
> "GGGGGGCG-------A---TC---AACGA-----ATC---GTA---CGTACTCTACTCGGGGG";
>
> and one would like to get as a result:
>
> "CG-------A---TC---AACGA-----ATC---GTA---CGTACTC"
>
> which is the match of $seq but in $gapped_seq.

Try...

 my $seq = "CGATCAACGAATCGTACGTACTC";
 my $gapped_seq =
   "GGGGGGCG-------A---TC---AACGA-----ATC---GTA---CGTACTCTACTCGGGGG";

 my $regexp = '('.join('-*?',split//,$seq).')';

 if( $gapped_seq =~ /$regexp/ ) {
   print "Match is $1\n";
 } else {
   print "No match\n";
 }

 (I'm not sure about the efficiency if $seq is long, though.)
James

>
> Cheers,
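
James's one-liner can be packaged as a reusable subroutine. The sketch below does that with a hypothetical gapped_match helper; the only changes are escaping each base with quotemeta (harmless for plain DNA, safer for arbitrary input) and compiling the pattern once with qr//. For the example strings it returns CG-------A---TC---AACGA-----ATC---GTA---CGTACTC, the gapped span that covers exactly the 23 bases of $seq.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: first occurrence of an ungapped $seq inside
# $gapped_seq, allowing any number of '-' gap characters between bases.
sub gapped_match {
    my ($seq, $gapped_seq) = @_;
    # Same idea as above: escape each base and join them with '-*?'
    my $pattern = join '-*?', map { quotemeta($_) } split //, $seq;
    my $re = qr/($pattern)/;              # compile the pattern once
    return $gapped_seq =~ $re ? $1 : undef;
}

my $hit = gapped_match(
    'CGATCAACGAATCGTACGTACTC',
    'GGGGGGCG-------A---TC---AACGA-----ATC---GTA---CGTACTCTACTCGGGGG',
);
print defined $hit ? "Match is $hit\n" : "No match\n";
```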


From khoueiry at ibdm.univ-mrs.fr  Fri Feb 23 08:09:33 2007
From: khoueiry at ibdm.univ-mrs.fr (pierre)
Date: Fri, 23 Feb 2007 14:09:33 +0100
Subject: [Bioperl-l] Sequence matching problem!
In-Reply-To: <358f4d650702230159n3a28fb03k6ecd850fccfed796@mail.gmail.com>
References: <9107936.post@talk.nabble.com>
	<200702231025.39416.heikki@sanbi.ac.za>
	<358f4d650702230159n3a28fb03k6ecd850fccfed796@mail.gmail.com>
Message-ID: <1172236173.4309.6.camel@ciona-pierre>

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070223/0e08ebe6/attachment-0001.pl>

From neetisomaiya at gmail.com  Fri Feb 23 07:27:28 2007
From: neetisomaiya at gmail.com (neeti somaiya)
Date: Fri, 23 Feb 2007 17:57:28 +0530
Subject: [Bioperl-l] need help urgently - needle output parsing
Message-ID: <764978cf0702230427x5b5acf73y6538527ade3fd453@mail.gmail.com>

Hi,

I am using the needle alignment tool (standalone, on a Linux machine), and
then using BioPerl to parse the output.
All data (sequence files and alignment outputs) are attached to this mail.

I have 2 small sequences: 693.seq and revcomp693.seq
I have 2 big sequences: 80768-4291-5639.84809_84810_84809_1.scf.seq and
80768-4291-5639.84809_84810_84810_1.scf.seq
All of these are in FASTA format.

Now I am doing the following:
1) Aligning 80768-4291-5639.84809_84810_84809_1.scf.seq and 693.seq - output
file is 80768-4291-5639.84809_84810_84809_1.scf.out
parsing the output gives me the alignment start in 'traceseq' as 97
2) Aligning 80768-4291-5639.84809_84810_84809_1.scf.seq and revcomp693.seq -
output file is 80768-4291-5639.84809_84810_84809_1.scf.comp.out
parsing the output gives me the alignment start in 'traceseq' as 91

All this is correct.

Next, I am doing the following:
1) Aligning 80768-4291-5639.84809_84810_84810_1.scf.seq and 693.seq - output
file is 80768-4291-5639.84809_84810_84810_1.scf.out
parsing the output gives me the alignment start in 'traceseq' as 341 (this
is correct)
2) Aligning 80768-4291-5639.84809_84810_84810_1.scf.seq and revcomp693.seq -
output file is 80768-4291-5639.84809_84810_84810_1.scf.comp.out
parsing the output gives me the alignment start in 'traceseq' as 341 (this
is incorrect, correct position is 330)


Part of my code is as follows:
---------------------------------------------

# running needle
`$needle_path./needle $trace.seq $snp_position_on_con.seq -gapopen 10.0 -gapextend 0.5 $output`;

# parsing needle output
my $str = Bio::AlignIO->new(-format => 'emboss',-file => $output);
my $aln = $str->next_aln();
my $pos = $aln->column_from_residue_number('original',1);

$logger->info("Alignment pos is $pos");

####################################

# running needle
`$needle_path./needle $trace.seq revcomp$snp_position_on_con.seq -gapopen 10.0 -gapextend 0.5 $comp_output`;

# parsing needle output
my $comp_str = Bio::AlignIO->new(-format => 'emboss',-file => $comp_output);
my $comp_aln = $comp_str->next_aln();
my $comp_pos = $comp_aln->column_from_residue_number('revcomp',1);

$logger->info("Alignment pos is $comp_pos");


Can someone please tell me what is going wrong here?


-- 
-Neeti
Even my blood says, B positive
-------------- next part --------------
A non-text attachment was scrubbed...
Name: data.zip
Type: application/zip
Size: 4456 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070223/21658b7d/attachment-0003.zip>

From bix at sendu.me.uk  Fri Feb 23 08:55:24 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 23 Feb 2007 13:55:24 +0000
Subject: [Bioperl-l] Sequence matching problem!
In-Reply-To: <Pine.OSF.4.58.0702231131230.1583589@cbi1c.internal.sanger.ac.uk>
References: <9107936.post@talk.nabble.com>	<200702231025.39416.heikki@sanbi.ac.za>	<358f4d650702230159n3a28fb03k6ecd850fccfed796@mail.gmail.com>
	<Pine.OSF.4.58.0702231131230.1583589@cbi1c.internal.sanger.ac.uk>
Message-ID: <45DEF24C.1010303@sendu.me.uk>

James Smith wrote:
> On Fri, 23 Feb 2007, Albert Vilella wrote:
> 
>> Now that we are on this pattern-matching thread, I was wondering if
>> any Perl guru could enlighten me on matching exact sequence patterns
>> against a gapped target sequence. E.g.:
>>
>> my $seq = "CGATCAACGAATCGTACGTACTC";
>> my $gapped_seq =
>> "GGGGGGCG-------A---TC---AACGA-----ATC---GTA---CGTACTCTACTCGGGGG";
>>
>> and one would like to get as a result:
>>
>> "CG-------A---TC---AACGA-----ATC---GTA---CGTACTC"
>>
>> which is the match of $seq but in $gapped_seq.
> 
> Try...
> 
>  my $seq = "CGATCAACGAATCGTACGTACTC";
>  my $gapped_seq =
>    "GGGGGGCG-------A---TC---AACGA-----ATC---GTA---CGTACTCTACTCGGGGG";
> 
>  my $regexp = '('.join('-*?',split//,$seq).')';
> 
>  if( $gapped_seq =~ /$regexp/ ) {
>    print "Match is $1\n";
>  } else {
>    print "No match\n";
>  }

That's great stuff. If you were matching thousands of different $seq 
against the same very large $gapped_seq, and only needed the first match 
of $seq in $gapped_seq, the alternative to the above approach (remove 
the gaps from $gapped_seq and do index() matching) will be faster.

Here's one (overly long-winded) way of implementing it, that I found to 
take ~2s vs ~22s for the above regex approach when doing the job on 
999999 copies of $seq:

#!/usr/bin/perl -w
use strict;
use warnings;

my $gapped_seq = 
"GGGGGGCG-------A---TC---AACGA-----ATC---GTA---CGTACTCTACTCGGGGG";

# for each 0-based position in the gapless sequence, record the total
# length of the gaps that precede it
my @gap_lengths;
my $gap_length = 0;
while ($gapped_seq =~ /(-+)/g) {
   my $match = $1;
   my $prev_length = $gap_length;
   $gap_length += length($match);
   my $end = pos($gapped_seq) - $gap_length - 1;
   push(@gap_lengths, $prev_length) for (1..$end-$#gap_lengths);
}
push(@gap_lengths, $gap_length) for (1..(length($gapped_seq) - 
@gap_lengths - $gap_length));

# remove the gaps
my $gapless_seq = $gapped_seq;
$gapless_seq =~ s/-//g;

# now for each of thousands of seqs...
my $seq = 'CGATCAACGAATCGTACGTACTC';
my @seqs;
for (1..999999) {
   push(@seqs, $seq);
}
foreach my $seq (@seqs) {
   my $start = index($gapless_seq, $seq);
   if ($start == -1) {
     print "No match found for seq '$seq'\n";
     next;
   }
   my $end = $start + length($seq) - 1;

   # calculate the coords in $gapped_seq
   $start = $start + $gap_lengths[$start];
   $end = $end + $gap_lengths[$end];

   my $result = substr($gapped_seq, $start, ($end - $start + 1));
   #print $result, "\n";
}

exit;
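Below is a more compact sketch of the same gap-stripping idea, assuming you only need the first hit of each query. Instead of accumulating gap lengths, it records the index in the gapped string of every non-gap residue, so both ends of a match can be looked up directly. The sub name gapped_match is illustrative and is not part of BioPerl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the span of $gapped that covers the first
# occurrence of $query once gaps ('-') are ignored, or undef if absent.
sub gapped_match {
    my ($gapped, $query) = @_;

    # $map[$i] = index in $gapped of the $i-th non-gap residue
    my @map;
    while ($gapped =~ /[^-]/g) {
        push @map, pos($gapped) - 1;
    }

    # gapless copy for fast index() searching
    (my $gapless = $gapped) =~ tr/-//d;

    my $start = index($gapless, $query);
    return undef if $start < 0 || length($query) == 0;
    my $end = $start + length($query) - 1;

    return substr($gapped, $map[$start], $map[$end] - $map[$start] + 1);
}

my $gapped_seq =
  "GGGGGGCG-------A---TC---AACGA-----ATC---GTA---CGTACTCTACTCGGGGG";
print gapped_match($gapped_seq, 'CGATCAACGAATCGTACGTACTC'), "\n";
# prints CG-------A---TC---AACGA-----ATC---GTA---CGTACTC
```

When matching many queries against one large gapped sequence, compute @map and $gapless once outside the loop rather than on every call, as the longer version above does.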


From MEC at stowers-institute.org  Fri Feb 23 10:54:57 2007
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Fri, 23 Feb 2007 09:54:57 -0600
Subject: [Bioperl-l] Bio::DB::SeqFeature to GFF mishandles attributes with
	multiple values
In-Reply-To: <6dce9a0b0701301446w7fc31d6eufe27442fecd0f20e@mail.gmail.com>
Message-ID: <CED81D34E37D5043A1211565277A51E50768F501@exchkc02.stowers-institute.org>

Lincoln, and other Bio::DB::SeqFeature wanderers:

I find that generating GFF from a Bio::DB::SeqFeature using gff3_string
does not respect the following:
 
"Multiple attributes of the same type are indicated by separating the
values with the comma "," character"  (c.f.
http://www.sequenceontology.org/gff3.shtml)
 
This one-liner demonstrates the problem:
 
perl -MBio::DB::SeqFeature -e 'print Bio::DB::SeqFeature->new(-seq_id =>
"J", -start => 1, -end => 2, -primary_tag => "PH", -source => "A",
-name => "mec", -attributes => {foo =>  [qw(bar blat)]})->gff3_string'
J	A	PH	1	2	.	.	.	foo=bar;foo=blat;Name=mec

Do you agree this is a problem? 
 
The fix is in the post-sig patch to
/Bio/DB/SeqFeature/NormalizedFeature.pm, in which I also took the
stylistic privilege of promoting any ID, Parent, or Name attribute to
the front of column 9, so output is now:

J	A	PH	1	2	.	.	.	Name=mec;foo=bar,blat

Do you agree this is better?

I am poised to commit it, as well as the functionally same patch to the
equivalent function in Bio/Graphics/FeatureBase.pm

All clear?

-- Malcolm Cook

  
*** NormalizedFeature.pm 2 Feb 2007 21:05:42 -0000 1.25
--- NormalizedFeature.pm 23 Feb 2007 15:37:01 -0000
***************
*** 481,494 ****
      next if $t eq 'load_id';
      next if $t eq 'parent_id';
      foreach (@values) { s/\s+$// } # get rid of trailing whitespace
! 
!     push @result,join '=',$self->escape($t),$self->escape($_) foreach @values;
    }
    my $id   = $self->primary_id;
    my $name = $self->display_name;
!   push @result,"ID=".$self->escape($id)                     if defined $id;
!   push @result,"Parent=".$self->escape($parent->primary_id) if defined $parent;
!   push @result,"Name=".$self->escape($name)                   if defined $name;
    return join ';',@result;
  }
  
--- 481,498 ----
      next if $t eq 'load_id';
      next if $t eq 'parent_id';
      foreach (@values) { s/\s+$// } # get rid of trailing whitespace
!     
!      push @result,join '=',$self->escape($t),$self->escape($_) foreach @values; 
!     # NO! Multiple attributes of the same type are indicated by
!     # separating the values with the comma "," character - per
!     # http://www.sequenceontology.org/gff3.shtml.  Do it this way:
!     #push @result,join '=',$self->escape($t),join(',', map {$self->escape($_)} @values);
    }
    my $id   = $self->primary_id;
    my $name = $self->display_name;
!   unshift @result,"ID=".$self->escape($id)                     if defined $id;
!   unshift @result,"Parent=".$self->escape($parent->primary_id) if defined $parent;
!   unshift @result,"Name=".$self->escape($name)                   if defined $name;
    return join ';',@result;
  }
 

From MEC at stowers-institute.org  Fri Feb 23 12:08:11 2007
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Fri, 23 Feb 2007 11:08:11 -0600
Subject: [Bioperl-l] Bio::DB::SeqFeature to GFF mishandles attributes
	withmultiple values
In-Reply-To: <CED81D34E37D5043A1211565277A51E50768F501@exchkc02.stowers-institute.org>
Message-ID: <CED81D34E37D5043A1211565277A51E50768F509@exchkc02.stowers-institute.org>

Oy,

I hit send too soon.  The patch I sent had my new attribute encoder
commented out.  It should've been: 


*** NormalizedFeature.pm	2 Feb 2007 21:05:42 -0000	1.25
--- NormalizedFeature.pm	23 Feb 2007 17:06:37 -0000
***************
*** 481,494 ****
      next if $t eq 'load_id';
      next if $t eq 'parent_id';
      foreach (@values) { s/\s+$// } # get rid of trailing whitespace
! 
!     push @result,join '=',$self->escape($t),$self->escape($_) foreach @values;
    }
    my $id   = $self->primary_id;
    my $name = $self->display_name;
!   push @result,"ID=".$self->escape($id)                     if defined $id;
!   push @result,"Parent=".$self->escape($parent->primary_id) if defined $parent;
!   push @result,"Name=".$self->escape($name)                   if defined $name;
    return join ';',@result;
  }
  
--- 481,497 ----
      next if $t eq 'load_id';
      next if $t eq 'parent_id';
      foreach (@values) { s/\s+$// } # get rid of trailing whitespace
!     # push @result,join '=',$self->escape($t),$self->escape($_) foreach @values; 
!     # NO! Multiple attributes of the same type are indicated by
!     # separating the values with the comma "," character - per
!     # http://www.sequenceontology.org/gff3.shtml.  Do it this way:
!     push @result,join '=',$self->escape($t),join(',', map {$self->escape($_)} @values);
    }
    my $id   = $self->primary_id;
    my $name = $self->display_name;
!   unshift @result,"ID=".$self->escape($id)                     if defined $id;
!   unshift @result,"Parent=".$self->escape($parent->primary_id) if defined $parent;
!   unshift @result,"Name=".$self->escape($name)                   if defined $name;
    return join ';',@result;
  }
  

Malcolm
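The encoding rule the corrected patch implements can be sketched outside BioPerl. This is a minimal illustration, not the NormalizedFeature.pm code: it emits one tag=value pair per tag with repeated values joined by commas, puts Name, Parent and ID first (mirroring the unshift order in the patch), and percent-escapes a small, assumed set of reserved characters; check the GFF3 specification for the full escaping rules. The sub names gff3_escape and attributes_to_column9 are hypothetical.

```perl
use strict;
use warnings;

# Percent-escape characters that would break column 9 (assumed minimal set).
sub gff3_escape {
    my ($v) = @_;
    $v =~ s/([%,;=&\t\n\r])/sprintf('%%%02X', ord $1)/ge;
    return $v;
}

# Build column 9 from tag => value or tag => [values].
sub attributes_to_column9 {
    my (%attr) = @_;
    my @first = grep { exists $attr{$_} } qw(Name Parent ID);
    my @rest  = grep { !/^(?:Name|Parent|ID)$/ } sort keys %attr;

    my @fields;
    for my $tag (@first, @rest) {
        my @values = ref $attr{$tag} ? @{ $attr{$tag} } : ($attr{$tag});
        # one tag=value pair per tag; multiple values are comma-separated
        push @fields, gff3_escape($tag) . '='
                    . join(',', map { gff3_escape($_) } @values);
    }
    return join ';', @fields;
}

print attributes_to_column9(foo => [qw(bar blat)], Name => 'mec'), "\n";
# prints Name=mec;foo=bar,blat
```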


From lstein at cshl.edu  Fri Feb 23 12:16:01 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Fri, 23 Feb 2007 12:16:01 -0500
Subject: [Bioperl-l] Bio::DB::SeqFeature to GFF mishandles attributes
	with multiple values
In-Reply-To: <CED81D34E37D5043A1211565277A51E50768F501@exchkc02.stowers-institute.org>
References: <6dce9a0b0701301446w7fc31d6eufe27442fecd0f20e@mail.gmail.com>
	<CED81D34E37D5043A1211565277A51E50768F501@exchkc02.stowers-institute.org>
Message-ID: <6dce9a0b0702230916o684e490cu9d01d3be3e2ae061@mail.gmail.com>

Hi Malcolm,

You're quite right, and I appreciate your work in tracking down and fixing
it. Before you commit the patch, can you confirm that the loader is working
correctly so that comma-separated values are read back into the data
structure as multiple attributes?

Lincoln


-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu


From aaron.j.mackey at gsk.com  Fri Feb 23 09:36:18 2007
From: aaron.j.mackey at gsk.com (aaron.j.mackey at gsk.com)
Date: Fri, 23 Feb 2007 09:36:18 -0500
Subject: [Bioperl-l] Bio::AnnotatableI vs Bio::FeatureHolderI
In-Reply-To: <45DDFED1.1090503@watson.wustl.edu>
Message-ID: <OFC6F4AA9E.AE02C572-ON8525728B.004FF987-8525728B.00503B29@gsk.com>

The fundamental difference (in my mind) between a feature and an 
annotation is that a feature has a location/range, and thus the 
information represented in the feature is applicable only to that 
location/range.  An annotation, on the other hand, is "global", or at 
least non-localizable (note: a feature with a "fuzzy" location of 
"somewhere along this sequence, but I'm not sure where" is still not 
global - if you did/could know the location, you'd describe it as a 
feature, so it shouldn't be represented with an annotation).

-Aaron

bioperl-l-bounces at lists.open-bio.org wrote on 02/22/2007 03:36:33 PM:

> Are Bio::AnnotatableI and Bio::FeatureHolderI competing interfaces?
> 
> I get the impression they are designed to do similar things.  If so is 
> one deprecated and the other preferred?
> 
> If their responsibilities are orthogonal to each other, what sorts of 
> tasks are suited to each?
> 
> Thanks,
> Michael
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 


From MEC at stowers-institute.org  Fri Feb 23 13:46:00 2007
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Fri, 23 Feb 2007 12:46:00 -0600
Subject: [Bioperl-l] Bio::DB::SeqFeature to GFF mishandles attributes
	with multiple values
In-Reply-To: <6dce9a0b0702230916o684e490cu9d01d3be3e2ae061@mail.gmail.com>
Message-ID: <CED81D34E37D5043A1211565277A51E50768F50F@exchkc02.stowers-institute.org>

Lincoln,
 
OK.  I'll do that...
 
...let's see, a quick squiz at Bio/DB/SeqFeature/Store/ .... 
 
...ok - parse_attributes _looks_ right to me
 
...so, let's try it
 
#load a feature into a new database:
 
bp_seqfeature_load.PLS -dsn 'dbi:mysql:database=test;host=mysql-dev' \
  -create -user test -pass test \
  <(echo -e "J\tA\tPH\t1\t2\t.\t.\t.\tfoo=bar,blat;Name=mec\n")
 
#It loaded ok.  Now, let's print it out in GFF3:
 
perl -MBio::DB::SeqFeature::Store -e 'foreach
(Bio::DB::SeqFeature::Store->new(-dsn =>
"dbi:mysql:database=test;host=mysql-dev;user=test;password=test")->features(-type
=> "PH:A")) {print $_->gff3_string . "\n"}'
J A PH 1 2 . . . Name=mec;ID=1;foo=bar,blat

#output looks good to me

Note, I tried loading attributes foo=bar;foo=blat and it came back
foo=bar,blat.  So, you can load either way.
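The round-trip check above relies on the loader treating foo=bar,blat and foo=bar;foo=blat the same way. A minimal core-Perl sketch of such a column 9 parser follows; it is not the Bio::DB::SeqFeature::Store parse_attributes code, and the sub names gff3_unescape and parse_column9 are hypothetical.

```perl
use strict;
use warnings;

# Undo %XX percent-escapes.
sub gff3_unescape {
    my ($v) = @_;
    $v =~ s/%([0-9A-Fa-f]{2})/chr hex $1/ge;
    return $v;
}

# Parse column 9 into { tag => [values] }; repeated tags and
# comma-separated values both accumulate into the same list.
sub parse_column9 {
    my ($col9) = @_;
    my %attr;
    for my $pair (split /;/, $col9) {
        next unless length $pair;
        my ($tag, $value) = split /=/, $pair, 2;
        $value = '' unless defined $value;
        push @{ $attr{ gff3_unescape($tag) } },
             map { gff3_unescape($_) } split /,/, $value;
    }
    return \%attr;
}

my $comma    = parse_column9('foo=bar,blat;Name=mec');
my $repeated = parse_column9('foo=bar;foo=blat;Name=mec');
print join(',', @{ $comma->{foo} }), ' ', join(',', @{ $repeated->{foo} }), "\n";
# prints bar,blat bar,blat
```

Values are split on commas before they are unescaped, so an escaped comma (%2C) stays inside a single value.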

I'll commit later today.

--Malcolm  

 


From cjfields at uiuc.edu  Fri Feb 23 13:49:44 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 23 Feb 2007 12:49:44 -0600
Subject: [Bioperl-l] Bio::AnnotatableI vs Bio::FeatureHolderI
In-Reply-To: <OFC6F4AA9E.AE02C572-ON8525728B.004FF987-8525728B.00503B29@gsk.com>
References: <OFC6F4AA9E.AE02C572-ON8525728B.004FF987-8525728B.00503B29@gsk.com>
Message-ID: <FEDC420E-AE3A-4AD4-A30B-54F8DF904D84@uiuc.edu>

To add to that, there's a HOWTO describing the differences:

http://www.bioperl.org/wiki/HOWTO:Feature-Annotation

I agree with Aaron: if it has a location, it's a feature; otherwise,
it's an annotation.

chris


Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From lstein at cshl.edu  Fri Feb 23 16:20:26 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Fri, 23 Feb 2007 16:20:26 -0500
Subject: [Bioperl-l] Bio::DB::SeqFeature to GFF mishandles attributes
	with multiple values
In-Reply-To: <CED81D34E37D5043A1211565277A51E50768F50F@exchkc02.stowers-institute.org>
References: <6dce9a0b0702230916o684e490cu9d01d3be3e2ae061@mail.gmail.com>
	<CED81D34E37D5043A1211565277A51E50768F50F@exchkc02.stowers-institute.org>
Message-ID: <6dce9a0b0702231320j1f24d4b4oe33bce6d2da96db7@mail.gmail.com>

Excellent!

Lincoln



From enrique_rulz at yahoo.com  Sat Feb 24 16:23:59 2007
From: enrique_rulz at yahoo.com (Kurt Gobain)
Date: Sat, 24 Feb 2007 13:23:59 -0800 (PST)
Subject: [Bioperl-l] Sequence matching problem!
In-Reply-To: <200702231025.39416.heikki@sanbi.ac.za>
References: <9107936.post@talk.nabble.com>
	<200702231025.39416.heikki@sanbi.ac.za>
Message-ID: <9137941.post@talk.nabble.com>


Heikki Lehvaslaiho wrote:
> 
> Kurt,
> 
> There are a few things in your code to note:
> 
> - regexp /C*T/ matches any T preceded by zero or more Cs,
>   not what you meant
> - $- and $+ are among the "expensive" perl variables, best
>   avoided unless you have to use them. Using them once in your 
>   code slows execution down considerably. There is always 
>   another way.
> - Keep in mind what you want to use the match positions for: 
>   Human readable locations usually start counting with 1 but
>   perl code uses 0 as the first location. The code below assumes
>   you want to print the locations out.
> 
> Study my example code below.
> 
> Yours,
> 	-Heikki
> 
> ###################################################################
> #!/usr/bin/perl
> use strict;
> use warnings;
> 
> my $seq = "GATCAAT";
> #my $pattern = 'C*T';
> my $pattern = 'C.*T';
> 
> while ($seq =~ m/($pattern)/gi) {
> 
>     my $match = $1;
>     my $end   = pos($seq);
>     my $start = $end - length($match) + 1;
> 
>     print "$match : $start - $end\n";
> }
> 
> ###################################################################
> 
> 


Thanx for the instant reply!...Sorry cudn reply earlier..

Code works perfectly fine...but...sum time its not givin reqd o/p..For eg.
If I type sequence as "GATCAAGTCAGGAT" & pattern to be matched as T.*A..then
o/p which I am getting frm above prog is TCAAGTCAGGA instead of TCA...
& 1 more thing Is there n e chance by which I can replace T*A to T.*A cos
the code which I need to write says T*A shod be only the input not T.*A..So
Can we use replacment reg ex...sumthing like 
$pattern =~  s/.*/*/...or sumthing else...
But its kinda givin sum error again...Dam! Regex is really hairy!!...:P

N e ways thanx a lot again for the code...Hope to listen frm you soon!

Kurt!


-- 
View this message in context: http://www.nabble.com/Sequence-matching-problem%21-tf3275153.html#a9137941
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From biology0046 at hotmail.com  Sat Feb 24 23:14:51 2007
From: biology0046 at hotmail.com (=?gb2312?B?va0gzsTi/Q==?=)
Date: Sun, 25 Feb 2007 04:14:51 +0000
Subject: [Bioperl-l] how to change align output format
Message-ID: <BAY109-F2409DB6CAA116F289F8F17B48C0@phx.gbl>

Dear all:

I have problems changing the output format of a ClustalW alignment.
I use the Bio::Tools::Run::Alignment::Clustalw module to carry out a 
multiple sequence alignment, then I use the Bio::AlignIO module to write 
out the alignment. The script looks like this:
my $aln_out=Bio::AlignIO->new(-file=>">./clustal/${outfilename}.aln",-format=>'clustalw');

The output:
dana_GLEANR_16071      MSKMKMLPVQLSLNQLNPGIWSDVLWRCPPAPSSQLAELKTQLPPSLPSDPRLWSREDVL
dere_GLEANR_9270       ..............S.............................................
FBgn0000097            ..............S.............................................
dsec_GLEANR_671        ..............S.............................................
dsim_GLEANR_6613       ..............S.............................................
dyak_GLEANR_1669       ..............S.............................................
                                     .


dana_GLEANR_16071      VFLRFCVREFDLPKLDFDLFQMNGKALCLLTRADFGHRCPGAGDVLHNVLQMLIIESHMM
dere_GLEANR_9270       ............................................................
FBgn0000097            ............................................................
dsec_GLEANR_671        ............................................................
dsim_GLEANR_6613       ............................................................
dyak_GLEANR_1669       ............................................................

But I want the output format below, which does not change identical 
residues into the "." character:
dere_GLEANR_9270       MSKMKMLPVQLSLNSLNPGIWSDVLWRCPPAPSSQLAELKTQLPPSLPSDPRLWSREDVL
dyak_GLEANR_1669       MSKMKMLPVQLSLNSLNPGIWSDVLWRCPPAPSSQLAELKTQLPPSLPSDPRLWSREDVL
dsec_GLEANR_671        MSKMKMLPVQLSLNSLNPGIWSDVLWRCPPAPSSQLAELKTQLPPSLPSDPRLWSREDVL
dsim_GLEANR_6613       MSKMKMLPVQLSLNSLNPGIWSDVLWRCPPAPSSQLAELKTQLPPSLPSDPRLWSREDVL
FBgn0000097            MSKMKMLPVQLSLNSLNPGIWSDVLWRCPPAPSSQLAELKTQLPPSLPSDPRLWSREDVL
dana_GLEANR_16071      MSKMKMLPVQLSLNQLNPGIWSDVLWRCPPAPSSQLAELKTQLPPSLPSDPRLWSREDVL
                       **************.*********************************************

dere_GLEANR_9270       VFLRFCVREFDLPKLDFDLFQMNGKALCLLTRADFGHRCPGAGDVLHNVLQMLIIESHMM
dyak_GLEANR_1669       VFLRFCVREFDLPKLDFDLFQMNGKALCLLTRADFGHRCPGAGDVLHNVLQMLIIESHMM
dsec_GLEANR_671        VFLRFCVREFDLPKLDFDLFQMNGKALCLLTRADFGHRCPGAGDVLHNVLQMLIIESHMM
dsim_GLEANR_6613       VFLRFCVREFDLPKLDFDLFQMNGKALCLLTRADFGHRCPGAGDVLHNVLQMLIIESHMM
FBgn0000097            VFLRFCVREFDLPKLDFDLFQMNGKALCLLTRADFGHRCPGAGDVLHNVLQMLIIESHMM
dana_GLEANR_16071      VFLRFCVREFDLPKLDFDLFQMNGKALCLLTRADFGHRCPGAGDVLHNVLQMLIIESHMM
                       ************************************************************

Are there any parameters in the package that I can change to get 
the latter output format? Thank you sincerely!

Jiang
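In the first listing, a "." means "same residue as the first sequence in this column". BioPerl's Bio::SimpleAlign has match() and unmatch() methods for switching between the dotted and full representations, so calling $aln->unmatch() on the alignment before write_aln() may be all that is needed (an untested suggestion; check the documentation for your installed BioPerl version). To show what that operation amounts to, here is a core-Perl sketch that restores the hidden residues, assuming the first row is the reference and "." is the match character; unmatch_rows is a hypothetical name.

```perl
use strict;
use warnings;

# Replace each match character in rows 2..n with the residue found in the
# same column of row 1 (the reference), i.e. undo a "dotted" alignment.
sub unmatch_rows {
    my ($match_char, $ref, @others) = @_;
    my @restored = ($ref);
    for my $row (@others) {
        my $out = '';
        for my $i (0 .. length($row) - 1) {
            my $c = substr($row, $i, 1);
            $out .= $c eq $match_char ? substr($ref, $i, 1) : $c;
        }
        push @restored, $out;
    }
    return @restored;
}

my @rows = unmatch_rows('.', 'MSKMKMLPVQLSLNQLNPGI', '..............S.....');
print "$_\n" for @rows;
# prints:
# MSKMKMLPVQLSLNQLNPGI
# MSKMKMLPVQLSLNSLNPGI
```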



From bix at sendu.me.uk  Sun Feb 25 05:53:48 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Sun, 25 Feb 2007 10:53:48 +0000
Subject: [Bioperl-l] [Fwd: Bio::Graph::SimpleGraph]
Message-ID: <45E16ABC.3060405@sendu.me.uk>

Tels,

I've forwarded this to the author of the module, Nat Goodman, and to the 
Bioperl mailing list 
(http://www.bioperl.org/wiki/Mailing_lists#Main_BioPerl_list).

But actually we have marked Bio::Graph::* as tentatively deprecated:
http://www.bioperl.org/wiki/Deprecated_modules#Bio::Graph_modules
so any further work on it doesn't seem worthwhile.


-------- Original Message --------
Subject: Bio::Graph::SimpleGraph
Date: Sat, 24 Feb 2007 12:07:31 +0100
From: Tels <nospam-abuse at bloodgate.com>

Moin,

I just stumbled over Bio::Graph::SimpleGraph and read this comment:

"This is a simple, hopefully fast undirected graph package. The only reason
this exists is that the standard CPAN Graph pacakge, Graph::Base, is
seriously broken."

Really sad to see people always reinventing the wheel :/

Anyway, I wonder if you would like to make your module support Graph::Easy
(http://search.cpan.org/~tels/Graph-Easy/)? I would be willing to submit
patches and do testing/documentation for that.

All the best,

Tels


From bix at sendu.me.uk  Sun Feb 25 05:45:21 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Sun, 25 Feb 2007 10:45:21 +0000
Subject: [Bioperl-l] Sequence matching problem!
In-Reply-To: <9137941.post@talk.nabble.com>
References: <9107936.post@talk.nabble.com>	<200702231025.39416.heikki@sanbi.ac.za>
	<9137941.post@talk.nabble.com>
Message-ID: <45E168C1.80306@sendu.me.uk>

Kurt Gobain wrote:
> Code works perfectly fine...but...sum time its not givin reqd o/p..For eg.
> If I type sequence as "GATCAAGTCAGGAT" & pattern to be matched as T.*A..then
> o/p which I am getting frm above prog is TCAAGTCAGGA instead of TCA...
> & 1 more thing Is there n e chance by which I can replace T*A to T.*A cos
> the code which I need to write says T*A shod be only the input not T.*A..So
> Can we use replacment reg ex...sumthing like 
> $pattern =~  s/.*/*/...or sumthing else...
> But its kinda givin sum error again...Dam! Regex is really hairy!!...:P

These aren't Bioperl questions. For regular expression help see:
http://perldoc.perl.org/perlretut.html

Basically, you want a non-greedy match, so T.*?A

You can convert T*A by doing s/\*/.*?/
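Both suggestions fit in a few lines of core Perl. This is a minimal sketch using the sequence and the `T*A` pattern from the question; escaping the rest of the pattern with `quotemeta` before swapping `*` for `.*?` is an added precaution, not part of the advice above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $seq  = 'GATCAAGTCAGGAT';
my $user = 'T*A';    # wildcard-style pattern, as the user is required to type it

# Greedy: .* grabs as much as it can, so the match runs to the LAST 'A'.
my ($greedy) = $seq =~ /(T.*A)/;

# Escape everything first, then turn each (now escaped) '*' into a
# non-greedy '.*?' so the match stops at the FIRST following 'A'.
(my $regex = quotemeta $user) =~ s/\\\*/.*?/g;
my ($lazy) = $seq =~ /($regex)/;

print "greedy: $greedy\n";    # greedy: TCAAGTCAGGA
print "lazy:   $lazy\n";      # lazy:   TCA
```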

Here are some more regexs for you:
s/sum/some/g
s/frm/from/g
s/n e/any/g
etc...


From biology0046 at hotmail.com  Sun Feb 25 08:28:34 2007
From: biology0046 at hotmail.com (=?gb2312?B?va0gzsTi/Q==?=)
Date: Sun, 25 Feb 2007 13:28:34 +0000
Subject: [Bioperl-l] AlignIO problems
Message-ID: <BAY109-F38C3F2A7B77BE0A42F9ADEB48C0@phx.gbl>

Hi all,
I use the AlignIO module to convert an alignment file.
My original file is:
CLUSTAL W(1.81) multiple sequence alignment


dana_GLEANR_11249      MEAIAKHDFSATADDELSFRKTQTLKILNMEDDSNWYRAELDGKEGLIPSNYIEMKNHDW
dere_GLEANR_7213       ...V...................I....................................
dgri_GLEANR_6962       .......................I....................................
FBgn0004638            .......................I....................................
dmoj_GLEANR_6118       ...........N...........I....................................
dper_GLEANR_18885      ...V...................I....................................
dpse_GLEANR_14384      ...V...................I....................................
dsec_GLEANR_3096       .................N.....I....................................
dsim_GLEANR_9744       -----------------------------...............................
dvir_GLEANR_4811       .......................I....................................
dwil_GLEANR_10869      .......................I....................................
dyak_GLEANR_13576      .......................I....................................


dana_GLEANR_11249      YYGRITRADAEKLLSNKHEGAFLIRISESSPGDFSLSVKCPDGVQHFKVLRDAQSKFFLW
dere_GLEANR_7213       ............................................................
dgri_GLEANR_6962       ............................................................
FBgn0004638            ............................................................
dmoj_GLEANR_6118       .................L..........................................
dper_GLEANR_18885      ............................................................
dpse_GLEANR_14384      ............................................................
dsec_GLEANR_3096       ............................................................
dsim_GLEANR_9744       ............................................................
dvir_GLEANR_4811       ............................................................
dwil_GLEANR_10869      ............................................................
dyak_GLEANR_13576      ............................................................


dana_GLEANR_11249      VVKFNSLNELVEYHRTASVSRSQDVKLRDMIPEEMLVQALYDFVPQESGELDFRRGDVIT
dere_GLEANR_7213       ............................................................
dgri_GLEANR_6962       ............................................................
FBgn0004638            ............................................................
dmoj_GLEANR_6118       ..............................V.D...........................
dper_GLEANR_18885      .......................E....................................
dpse_GLEANR_14384      .......................E....................................
dsec_GLEANR_3096       ............................................................
dsim_GLEANR_9744       ............................................................
dvir_GLEANR_4811       ............................................................
dwil_GLEANR_10869      ............................................................
dyak_GLEANR_13576      ............................................................


dana_GLEANR_11249      VTDRSDENWWNGEIGNRKGIFPATYVTPYHS
dere_GLEANR_7213       ...............................
dgri_GLEANR_6962       ...............................
FBgn0004638            ...............................
dmoj_GLEANR_6118       ............Q..................
dper_GLEANR_18885      ...............................
dpse_GLEANR_14384      ...............................
dsec_GLEANR_3096       ...............................
dsim_GLEANR_9744       ...............................
dvir_GLEANR_4811       ...............................
dwil_GLEANR_10869      ...............................
dyak_GLEANR_13576      ...............................


I want to change those "." characters back to the actual residue letters, so I
wrote this code:
use strict;
use warnings;
use Bio::AlignIO;

my $in  = Bio::AlignIO->new(-file   => "FBgn0000097.aln",
                            -format => 'clustalw');
my $out = Bio::AlignIO->new(-file   => ">../clustalw/0097.aln",
                            -format => 'clustalw');
while (my $aln = $in->next_aln()) {
    $aln->unmatch();                 # undo match(): restore residues shown as '.'
    $aln->set_displayname_flat();    # display names without /start-end
    $out->write_aln($aln);
}

But when I run the code, I get error messages like:

-------------------- WARNING ---------------------
MSG: Got a sequence with no letters in it cannot guess alphabet []
---------------------------------------------------

------------- EXCEPTION  -------------
MSG: No sequence with name [dsim_GLEANR_9744/1-182]
STACK Bio::SimpleAlign::displayname /home/src/bioperl/bioperl-live/Bio/SimpleAlign.pm:2307
STACK Bio::SimpleAlign::set_displayname_flat /home/src/bioperl/bioperl-live/Bio/SimpleAlign.pm:2374
STACK toplevel aligntest.pl:11

--------------------------------------

I don't know where the problem is.

Jiang



From cjfields at uiuc.edu  Sun Feb 25 14:58:23 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 25 Feb 2007 13:58:23 -0600
Subject: [Bioperl-l] AlignIO problems
In-Reply-To: <BAY109-F38C3F2A7B77BE0A42F9ADEB48C0@phx.gbl>
References: <BAY109-F38C3F2A7B77BE0A42F9ADEB48C0@phx.gbl>
Message-ID: <19EA5F46-D1A4-45B5-B2DB-55194F79215C@uiuc.edu>

Bio::AlignIO::clustalw doesn't work with identity-masked sequences; it
parses the file quite literally, so any '.' or '-' is treated as a gap.
If a sequence is 100% identical to the first one, it ends up as 100%
gaps with no residues, thus giving you the warnings you see.

The best way to accomplish what you want is to not mask the sequence  
alignment to begin with when running clustalw/muscle/whatever.   
Exactly how are you generating these?  When I use clustalw no  
identity masking occurs by default.
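If rerunning the aligner without masking isn't an option, one workaround is to restore the masked residues in the text before Bio::AlignIO sees it. A minimal core-Perl sketch, assuming (as in the alignment above) that the first row is unmasked, '.' means "same residue as the first row in this column" and '-' is a real gap; `unmask_rows` is a hypothetical helper, not a BioPerl method.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl method): takes [name, row] pairs whose
# first row is the unmasked reference and returns new pairs in which every
# '.' is replaced by the reference residue in the same column. '-' gaps and
# real residues are left as they are.
sub unmask_rows {
    my @rows = @_;
    my @ref  = split //, $rows[0][1];
    my @out;
    for my $row (@rows) {
        my ($name, $aligned) = @$row;
        my @res = split //, $aligned;
        die "row '$name' is not the same length as the reference\n"
            unless @res == @ref;
        my $fixed = join '', map { $res[$_] eq '.' ? $ref[$_] : $res[$_] } 0 .. $#res;
        push @out, [ $name, $fixed ];
    }
    return @out;
}

# Last block of the alignment above, trimmed to two rows.
my @block = (
    [ 'dana_GLEANR_11249', 'VTDRSDENWWNGEIGNRKGIFPATYVTPYHS' ],
    [ 'dmoj_GLEANR_6118',  ('.' x 12) . 'Q' . ('.' x 18) ],
);

printf "%-22s %s\n", @$_ for unmask_rows(@block);
```

The restored rows could then be written back out (as CLUSTAL or FASTA) and read with Bio::AlignIO as usual.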

chris


Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cristiangary at gmail.com  Sun Feb 25 16:04:57 2007
From: cristiangary at gmail.com (Cristian Gary)
Date: Sun, 25 Feb 2007 18:04:57 -0300
Subject: [Bioperl-l] problem with blast report to ncbi webpage
Message-ID: <95ef8cd0702251304o45bea6a0tcedc59156cb0cfe4@mail.gmail.com>

I have a problem with BLAST requests to the NCBI server: after the wait time,
the RIDs don't show me any result.
Is the problem the NCBI server or the BioPerl version?
P.S.: the same code worked very well 3 weeks ago.


-- 
"Knowledge belongs to humanity"

"Gnu/linux   -------- free your mind......
www.kubuntu.org


From granjeau at tagc.univ-mrs.fr  Mon Feb 26 04:17:15 2007
From: granjeau at tagc.univ-mrs.fr (Samuel GRANJEAUD - IR/IFR137)
Date: Mon, 26 Feb 2007 10:17:15 +0100
Subject: [Bioperl-l] Reading a XML sequence (UniParc) into a BioSeq object
Message-ID: <45E2A59B.6080300@tagc.univ-mrs.fr>

Hello !

I would like to fill a BioSeq object with the output of a dbfetch
request at EBI on the UniParc database (which only returns XML, and I am
interested in the cross-references). Could somebody tell me which BioPerl
object to use, or a way to convert it to Swiss-Prot format, or share a
piece of code that does it? (Is
http://doc.bioperl.org/bioperl-live/Bio/SeqIO/interpro.html a good
starting point?) I would appreciate it a lot.

Best regards,
--Samuel

<entry accession="UPI00004A0D4A">
<dbReferenceList>
    <dbReference db="EMBL" id="CAI39485" version="1" version_i="1" 
active="Y" created="04-Jan-2005" last="15-Dec-2006"/>
    <dbReference db="UniProtKB/TrEMBL" id="Q5JVT0" version="1" 
version_i="1" active="N" created="15-Feb-2005" last="06-Feb-2007"/>
    <dbReference db="ENSEMBL" id="ENSP00000352958" version_i="2" 
active="Y" created="03-Apr-2006" last="27-Nov-2006"/>
    <dbReference db="IPI" id="IPI00418471" version="4" version_i="4" 
active="N" created="07-Mar-2005" last="07-Mar-2005"/>
    <dbReference db="IPI" id="IPI00646867" version="1" version_i="1" 
active="N" created="06-Sep-2005" last="06-Oct-2006"/>
    <dbReference db="VEGA" id="OTTHUMP00000019225" version_i="1" 
active="N" created="15-Aug-2005" last="02-Dec-2005"/>
</dbReferenceList>
<sequence length="431" crc64="8913D1F04A71CCFB">
MSTRSVSSSSYRRMFGGPGTASRPSSSRSYVTTSTRTYSLGSALRPSTSRSLYASSPGGV
YATRSSAVRLRSSVPGVRLLQDSVDFSLADAINTEFKNTRTNEKVELQELNDRFANYIDK
VRFLEQQNKILLAELEQLKGQGKSRLGDLYEEEMRELRRQVDQLTNDKARVEVERDNLAE
DIMRLREKLQEEMLQREEAENTLQSFRQDVDNASLARLDLERKVESLQEEIAFLKKLHEE
EIQELQAQIQEQHVQIDVDVSKPDLTAALRDVRQQYESVAAKNLQEAEEWYKSKFADLSE
AANRNNDALRQAKQESTEYRRQVQSLTCEVDALKGTNESLERQMREMEENFAVEAANYQD
TIGRLQDEIQNMKEEMARHLREYQDLLNVKMALDIEIATYRKLLEGEESRISLPLPNFSS
LNLRGKHFISL
</sequence>
</entry>
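Pending a proper parser, here is a minimal core-Perl sketch of pulling the accession, cross-references and sequence out of an entry shaped like the one above. It is regex-based and assumes a single flat `<entry>` with exactly this layout; `parse_uniparc_entry` is a hypothetical helper, and a real XML parser would be more robust. The sample is a trimmed excerpt of the entry above (two cross-references, first and last sequence lines only).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: extract accession, <dbReference> attributes and the
# sequence from one UniParc <entry> laid out as in the example above.
sub parse_uniparc_entry {
    my ($xml) = @_;
    my %entry = (xrefs => []);
    ($entry{accession}) = $xml =~ /<entry\s+accession="([^"]+)"/;

    # Each <dbReference .../> is an empty element; attributes may span lines.
    while ($xml =~ /<dbReference\s+([^>]*?)\/>/gs) {
        my $attrs = $1;                              # copy before matching again
        my %attr  = $attrs =~ /(\w+)="([^"]*)"/g;    # name => value pairs
        push @{ $entry{xrefs} }, \%attr;
    }

    if ($xml =~ /<sequence\b[^>]*>(.*?)<\/sequence>/s) {
        ($entry{seq} = $1) =~ s/\s+//g;              # drop line breaks
    }
    return \%entry;
}

my $xml = <<'XML';
<entry accession="UPI00004A0D4A">
<dbReferenceList>
    <dbReference db="EMBL" id="CAI39485" version="1" version_i="1"
active="Y" created="04-Jan-2005" last="15-Dec-2006"/>
    <dbReference db="UniProtKB/TrEMBL" id="Q5JVT0" version="1"
version_i="1" active="N" created="15-Feb-2005" last="06-Feb-2007"/>
</dbReferenceList>
<sequence length="431" crc64="8913D1F04A71CCFB">
MSTRSVSSSSYRRMFGGPGTASRPSSSRSYVTTSTRTYSLGSALRPSTSRSLYASSPGGV
LNLRGKHFISL
</sequence>
</entry>
XML

my $e = parse_uniparc_entry($xml);
print "$e->{accession}\n";
print "$_->{db}:$_->{id}\n" for @{ $e->{xrefs} };
print length($e->{seq}), "\n";
```

From there the fields could be passed to `Bio::Seq->new(-display_id => ..., -seq => ...)`, with each cross-reference wrapped in a `Bio::Annotation::DBLink` (`-database`, `-primary_id`) and added via `$seq->annotation->add_Annotation('dblink', $link)`; that last step is not exercised here.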


From bix at sendu.me.uk  Mon Feb 26 06:46:39 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 26 Feb 2007 11:46:39 +0000
Subject: [Bioperl-l] [Fwd: Bio::Graph::SimpleGraph]
In-Reply-To: <45E16ABC.3060405@sendu.me.uk>
References: <45E16ABC.3060405@sendu.me.uk>
Message-ID: <45E2C89F.1020402@sendu.me.uk>

Nat replied, but I messed up the To: headers, so his reply didn't make it to the
list. Here's what he said:


Nathan (Nat) Goodman wrote:
Hi Tels

I agree it's sad to reinvent the wheel, but I don't think that's what
happened here. Your module seems to be focused on rendering graphs while
my module is concerned with computations on graphs.

In any case, as Sendu notes, SimpleGraph is in the process of being
deprecated. I fully support this move. It was intended to be a stopgap
until the main Perl Graph module was fixed.  Since that has now
happened, it's time for SimpleGraph to retire.

For the benefit of anyone using Graph: last I checked (six months or
more ago), it had serious performance problems on large graphs (probably
not too much of a surprise), and also was inexplicably slow on graphs
with edge attributes.  I see that the latter bug is marked "resolved" in
CPAN, but there's no indication of when or how.  We've moved to Boost
for graphs as large as the human protein interaction network.

Best,
Nat


From sanjib at bic.boseinst.ernet.in  Mon Feb 26 00:23:36 2007
From: sanjib at bic.boseinst.ernet.in (Sanjib Kumar Gupta)
Date: Mon, 26 Feb 2007 10:53:36 +0530
Subject: [Bioperl-l] Remote blast
In-Reply-To: <20070221064743.M54123@bic.boseinst.ernet.in>
References: <mailman.0.1172037646.4756.bioperl-l@lists.open-bio.org>
	<20070221064743.M54123@bic.boseinst.ernet.in>
Message-ID: <20070226052336.M74918@bic.boseinst.ernet.in>

Hi
I have been running this script for some time and it was running fine. I am
using a Linux machine with a public IP (no proxy). But suddenly it has stopped
working with these errors:

waiting...waiting...
-------------------- WARNING ---------------------
MSG:  <HTML>
<HEAD><TITLE>An Error Occurred</TITLE></HEAD>
<BODY>
<H1>An Error Occurred</H1>
500 Can't connect to www.ncbi.nlm.nih.gov:80 (Bad hostname 'www.ncbi.nlm.nih.gov')
</BODY>
</HTML>

---------------------------------------------------
xx.pep

-------------------- WARNING ---------------------
MSG: req was POST http://www.ncbi.nlm.nih.gov/blast/Blast.cgi
User-Agent: bioperl-Bio_Tools_Run_RemoteBlast/1.5
Content-Length: 497
Content-Type: application/x-www-form-urlencoded

DATABASE=nr&QUERY=%3ENM_005277.3%3B_1414961_1+837%0A
MEENMEEGQTQKGCFECCIKCLGGIPYASLIATILLYAGVALFCGCGHEALSGTVNILQTYFEMARTAGDTLDVF
TMIDIFKYVIYGIAAAFFVYGILLMVEGFFTTGAIKDLYGDFKITTCGRCVSAWFIMLTYLFMLAWLGVTAFTSLPV
YMYFNLWTICRNTTLVEGANLCLDLRQFGIVTIGEEKKICTVSENFLRMCESTELNMTFHLFIVALAGAGAAVIAMV
HYLMVLSANWAYVKDACRMQKYEDIKSKEEQELHDIHSTRSKERLNAYT*
&COMPOSITION_BASED_STATISTICS=off&EXPECT=1e-10&SERVICE=plain
&FORMAT_OBJECT=Alignment&CMD=Put&MATRIX_NAME=BLOSUM62
&ENTREZ_QUERY=Xenopus+laevis[Organism]&PROGRAM=blastp

<HTML>
<HEAD><TITLE>An Error Occurred</TITLE></HEAD>
<BODY>
<H1>An Error Occurred</H1>
500 Can't connect to www.ncbi.nlm.nih.gov:80 (Bad hostname 'www.ncbi.nlm.nih.gov')
</BODY>
</HTML>

---------------------------------------------------
waiting...waiting...
-------------------- WARNING ---------------------
MSG:  <HTML>
<HEAD><TITLE>An Error Occurred</TITLE></HEAD>
<BODY>
<H1>An Error Occurred</H1>
500 Internal Server Error
</BODY>
</HTML>

---------------------------------------------------

I am able to see the NCBI page from a browser, but I am unable to ping or
traceroute to the server.

Please help me.


--
Sanjib Kumar Gupta
Bioinformatics Centre
Bose Institute
Kolkata 700054, INDIA
Phone  : +91-33-2355 6626, 2816, 2355 4766
Fax    : +91-33-2355 3886
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: n9.pl
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070226/86a0137c/attachment-0003.pl>

From cjfields at uiuc.edu  Mon Feb 26 09:59:21 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 26 Feb 2007 08:59:21 -0600
Subject: [Bioperl-l] Remote blast
In-Reply-To: <20070226052336.M74918@bic.boseinst.ernet.in>
References: <mailman.0.1172037646.4756.bioperl-l@lists.open-bio.org>
	<20070221064743.M54123@bic.boseinst.ernet.in>
	<20070226052336.M74918@bic.boseinst.ernet.in>
Message-ID: <C668C555-39ED-43A9-8B49-C7D0376D971F@uiuc.edu>

I tested this out and got BLAST to work for my test case (a single
FASTA sequence, since you didn't send any sequences for testing).  It keeps
querying for the RID in what appears to be an infinite loop (i.e. it
doesn't get rid of the RID properly); you can see this if you add
'-verbose => 1' to your parameters.  I don't have time to delve into it,
but from a quick glance it may be due to your looping structure and
how you are saving your RIDs.

As for your particular error, could it be something as simple as the  
server was overloaded or down?  It does happen from time to time...
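The "Bad hostname" part of those 500 errors usually means the name lookup failed on the client machine, before any request reached NCBI (a browser that works may be using a different resolver or a proxy). A minimal core-Perl sketch to check name resolution from the machine the script runs on, using `Socket::inet_aton`; the `resolves` helper and the `/etc/resolv.conf` hint are assumptions for a Linux box.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Socket qw(inet_aton inet_ntoa);

# Return the dotted-quad address for $host, or undef if this machine cannot
# resolve it (the lookup that has to succeed before LWP can connect).
sub resolves {
    my ($host) = @_;
    my $packed = inet_aton($host);    # undef when the lookup fails
    return defined $packed ? inet_ntoa($packed) : undef;
}

my @hosts = @ARGV ? @ARGV : ('www.ncbi.nlm.nih.gov');
for my $host (@hosts) {
    my $ip = resolves($host);
    if (defined $ip) {
        print "$host resolves to $ip\n";
    }
    else {
        print "$host does not resolve here; check the resolver setup (e.g. /etc/resolv.conf)\n";
    }
}
```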

Beyond that I can't make heads or tails of your script.  Was it  
cobbled together from a bunch of others?  If you are doing that you  
can probably expect some bugs to occur.

chris



From cjfields at uiuc.edu  Mon Feb 26 10:05:50 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 26 Feb 2007 09:05:50 -0600
Subject: [Bioperl-l] AlignIO problems
In-Reply-To: <BAY109-F30DD48142FC0984AF9A284B4830@phx.gbl>
References: <BAY109-F30DD48142FC0984AF9A284B4830@phx.gbl>
Message-ID: <082E0708-6B1C-45CE-B387-429B8B6A8D7A@uiuc.edu>

Make sure to keep this on the list, others may have some input.

You should be able to test the various sequence objects you're  
retrieving from Bio::DB::Fasta via Bio::SeqIO to see if they are what  
you're expecting, then track down the problematic sequences.  My  
guess is the odd seqs are due to the way you are using Bio::DB::Fasta  
for each of the files.  I'm wondering if you are having problems with  
indices overwriting one another and are thus getting back blank seq  
objects.

You should probably consider just indexing all of your files  
together; according to the POD you can use a single Bio::DB::Fasta to  
index all of the files in one go (indicate the path and use '-glob')  
and retrieve what you need that way.  Alternatively, put them into
separate directories so the indices are also kept separate.

chris

On Feb 25, 2007, at 9:50 PM, ? ?? wrote:

> Thank you for your help!
> Maybe you are right. I use the following code to create my seq
> object arrays:
>          my $outfilename=$dmel;
>          my $ana_pep_db=Bio::DB::Fasta->new("dana.translation.fasta");
>          my $ana_cdna_db=Bio::DB::Fasta->new("dana.cds.fasta");
>          my $ere_pep_db=Bio::DB::Fasta->new("dere.translation.fasta");
>          my $ere_cdna_db=Bio::DB::Fasta->new("dere.cds.fasta");
>          my $mel_pep_db=Bio::DB::Fasta->new("dmel.translation.fasta");
>          my $mel_cdna_db=Bio::DB::Fasta->new("dmel.cds.fasta");
>          my $sec_pep_db=Bio::DB::Fasta->new("dsec.translation.fasta");
>          my $sec_cdna_db=Bio::DB::Fasta->new("dsec.cds.fasta");
>          my $sim_pep_db=Bio::DB::Fasta->new("dsim.translation.fasta");
>          my $sim_cdna_db=Bio::DB::Fasta->new("dsim.cds.fasta");
>          my $yak_pep_db=Bio::DB::Fasta->new("dyak.translation.fasta");
>          my $yak_cdna_db=Bio::DB::Fasta->new("dyak.cds.fasta");
>          my $ana_pep_obj=$ana_pep_db->get_Seq_by_id($dana);
>          my $ana_nuc_obj=$ana_cdna_db->get_Seq_by_id($dana);
>          my $ere_pep_obj=$ere_pep_db->get_Seq_by_id($dere);
>          my $ere_nuc_obj=$ere_cdna_db->get_Seq_by_id($dere);
>          my $mel_pep_obj=$mel_pep_db->get_Seq_by_id($dmel);
>          my $mel_nuc_obj=$mel_cdna_db->get_Seq_by_id($dmel);
>          my $sec_pep_obj=$sec_pep_db->get_Seq_by_id($dsec);
>          my $sec_nuc_obj=$sec_cdna_db->get_Seq_by_id($dsec);
>          my $sim_pep_obj=$sim_pep_db->get_Seq_by_id($dsim);
>          my $sim_nuc_obj=$sim_cdna_db->get_Seq_by_id($dsim);
>          my $yak_pep_obj=$yak_pep_db->get_Seq_by_id($ddyak);
>          my $yak_nuc_obj=$yak_cdna_db->get_Seq_by_id($ddyak);
>          push @prots, $ana_pep_obj;
>          push @cdna, $ana_nuc_obj;
>          push @prots, $ere_pep_obj;
>          push @cdna, $ere_nuc_obj;
>          push @prots, $mel_pep_obj;
>          push @cdna, $mel_nuc_obj;
>          push @prots, $sec_pep_obj;
>          push @cdna, $sec_nuc_obj;
>          push @prots, $sim_pep_obj;
>          push @cdna, $sim_nuc_obj;
>          push @prots, $yak_pep_obj;
>          push @cdna, $yak_nuc_obj;
>
> Then I use @prots as input for my $aln=$aln_factory->align(\@prots);
> This method creates alignment files with the sequences masked.
>
> But if I use FASTA files (not objects) containing the protein
> sequences as input:
> $inputfile='FBgn0000097.pep';
> @params=('outorder'=>'INPUT');
> $factory=Bio::Tools::Run::Alignment::Clustalw->new(@params);
> $aln=$factory->align($inputfile);
> #$aln->gap_char('-');
> $aln->map_chars('\.','-');
> $aln_out=Bio::AlignIO->new(-file=>">0097.aln",-format=>'clustalw');
> $aln_out->write_aln($aln);
>
> this method creates files without masking.
> I think sequence objects created by "get_Seq_by_id" from sequence  
> databases directly are not appropriate.
>
> Thank you for your suggestion again!
>
> Jiang.
>
>> From: Chris Fields <cjfields at uiuc.edu>
>> To: ????? <biology0046 at hotmail.com>
>> Subject: Re: [Bioperl-l] AlignIO problems
>> Date: Sun, 25 Feb 2007 21:26:34 -0600
>>
>> I ran the same using a local FASTA-formatted file on my system,
>> which works (no masking).
>>
>> Of note, the gaps were all marked as '.'.  Your gaps were both '.'
>> and '-', which may mean that something is wrong with the seq
>> objects themselves.  Maybe SeqIO is misreading them?
>>
>> chris
>>
>> On Feb 25, 2007, at 7:34 PM, ????? wrote:
>>
>>> I use the Bio::Tools::Run::Alignment::Clustalw module to carry
>>> out multiple alignment. My code is:
>>>         my @clustal_param=('outorder'=>'INPUT');
>>>         my $aln_factory=Bio::Tools::Run::Alignment::Clustalw->new(@clustal_param);
>>>         my $aln=$aln_factory->align(\@prots); ###@prots is an array of protein sequence objects
>>>         my $aln_out=Bio::AlignIO->new(-file=>">./dmel_group/clustal/${outfilename}.aln",-format=>'clustalw');
>>>
>>>         $aln_out->write_aln($aln);
>>> This code produces an alignment that masks identical residues.
>>> But if I use ClustalW directly, the output is normal.
>>> Thank you for your help~
>>>
>>> Jiang
>>
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From michael.watson at bbsrc.ac.uk  Mon Feb 26 11:00:31 2007
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Mon, 26 Feb 2007 16:00:31 -0000
Subject: [Bioperl-l] The axis of GC content in Bio::Graphics::glyph:dna
In-Reply-To: <6dce9a0b0702151053v19ab190fh1f752fea3b2ed722@mail.gmail.com>
References: <8975119BCD0AC5419D61A9CF1A923E9503E2EB70@iahce2ksrv1.iah.bbsrc.ac.uk>
	<6dce9a0b0702151053v19ab190fh1f752fea3b2ed722@mail.gmail.com>
Message-ID: <8975119BCD0AC5419D61A9CF1A923E9503E2EBD3@iahce2ksrv1.iah.bbsrc.ac.uk>

Hi Lincoln/List
 
That's great, the axis now appears, but there are no labels.  This in
itself isn't a problem, as long as we can assume that the tick marks are
at 0, 50% and 100%.  Is that right?  If so, we can go with what we have;
otherwise I'm going to have to figure out a way to label the y-axis.
 
Thanks
Mick

________________________________

From: lincoln.stein at gmail.com [mailto:lincoln.stein at gmail.com] On Behalf
Of Lincoln Stein
Sent: 15 February 2007 18:53
To: michael watson (IAH-C)
Cc: BioPerl-List
Subject: Re: [Bioperl-l] The axis of GC content in Bio::Graphics::glyph:dna


Hi Michael,

When you set up the panel, do this:

 Bio::Graphics::Panel->new(
     # ...your other options (-segment or -length, -width, etc.)...
     -pad_left  => 20,
     -pad_right => 20,
 );

This will leave enough room on the left and right for you to see the Y
axis. Otherwise it runs off the edge of the image (OK, this is a
mis-design, but it was the only way to solve a chicken-and-egg problem
about who gets to say how wide the panel is).

Lincoln


On 2/15/07, michael watson (IAH-C) <michael.watson at bbsrc.ac.uk> wrote: 

	Hi
	
	OK, I have some great images out of this glyph, but I can't see the axis,
	nor is it labelled (i.e. does it go from 0 to 100%?), so it isn't great
	for publication.  The docs say:
	
	"NOTE: -gc_window=>'auto' gives nice results and is recommended for
	drawing GC content. The GC content axes draw slightly outside the
	panel, so you may wish to add some extra padding on the right and
	left. "
	
	Any idea how to do this?
	
	Basically, I want a nice GC graph with the axis quite clearly labelled,
	and a nice "%GC" title next to it :)
	
	Thanks
	
	Mick
	
	The information contained in this message may be confidential or
legally
	privileged and is intended solely for the addressee. If you have

	received this message in error please delete it & notify the
originator
	immediately.
	Unauthorised use, disclosure, copying or alteration of this
message is
	forbidden & may be unlawful.
	The contents of this e-mail are the views of the sender and do
not 
	necessarily represent the views of the Institute.
	This email and associated attachments has been checked locally
for
	viruses but we can accept no responsibility once it has left our
	systems.
	Communications on Institute computers are monitored to secure
the 
	effective operation of the systems and for other lawful
purposes.
	
	_______________________________________________
	Bioperl-l mailing list
	Bioperl-l at lists.open-bio.org 
	http://lists.open-bio.org/mailman/listinfo/bioperl-l
	

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory 
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse at cshl.edu 


From cjfields at uiuc.edu  Mon Feb 26 12:18:38 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 26 Feb 2007 11:18:38 -0600
Subject: [Bioperl-l] AlignIO problems
In-Reply-To: <BAY109-F1391C0C6FAEEA3B83565BFB4830@phx.gbl>
References: <BAY109-F1391C0C6FAEEA3B83565BFB4830@phx.gbl>
Message-ID: <7DF958E6-E233-427F-8901-3FE571CD99BD@uiuc.edu>


On Feb 26, 2007, at 9:59 AM, ? ?? wrote:

> Thank you!
> I have checked the sequences retrieved while many Bio::DB
> objects work simultaneously.
> The problems you mentioned do not occur; the sequences are not
> overwritten.

Again, keep this on the list.  I have my hands full this month so I  
will be checking the list only very sporadically; someone else may be  
able to help you.

The only explanation for the clustalw output you get is that you are  
not retrieving the correct sequence in some fundamental way,
which to me indicates the bug originates either in the way the  
sequences are retrieved (i.e. somehow via Bio::DB::Fasta, hence my  
thought about conflicting indices) or in the way they are converted  
via Bio::SeqIO, which is used in Bio::Tools::Run::Alignment::Clustalw.

When I have used Bio::DB::Fasta in the past I have never had a  
problem when indexing multiple files and retrieving sequences, so  
beyond running tests with your data I can't offer much more than the
above conjecturing.

chris


From jason at bioperl.org  Mon Feb 26 13:45:34 2007
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 26 Feb 2007 10:45:34 -0800
Subject: [Bioperl-l] Question to Bio::Tools::Run RemoteBlast
In-Reply-To: <20070226095515.68810@gmx.net>
References: <20070226095515.68810@gmx.net>
Message-ID: <2D2DF6D9-6DAE-4BB7-B31B-8C19CCCA7301@bioperl.org>

Alex -
I am glad to see your interest in the module, but I don't
currently have any time to maintain it and so queries should be sent  
to the BioPerl mailing list.  In general we prefer you don't contact  
developers directly, but use the mailing list so that others can  
learn from questions.

Please note there are several tutorials and other documentation on the
website, you will get a better response from people if you can show  
you have at least tried to use the existing example code to construct  
your program.

-jason
On Feb 26, 2007, at 1:55 AM, Alexander Auner wrote:

> Dear Jason Stajich,
> I hope you can help me.
>
> I am inspired by your module and would like to work with it.
> I am a student at the TFH Wildau.
> I have problems understanding the module.
>
> Could you send me an example?
>
> The example is to process a text file (FASTA) with NCBI-Blast (Web).
>
> Parameter:
> Choose database -> Others -> nr
> Limit by entrez query -> Campylobacter -> or select from: ->  
> Bacteria [ORGN]
> Expect -> 10
> Other advanced -> -q-1
>
> output format
> plain text without Graphical Overview
> Number of: -> Descriptions -> 10000
> Alignment view -> query-anchored with identities
>
> All other parameters remain undef.
>
> Thank you for your help.
>
> Yours faithfully, Alexander Auner


From jason at bioperl.org  Mon Feb 26 14:13:00 2007
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 26 Feb 2007 11:13:00 -0800
Subject: [Bioperl-l] BioPerl leadership additions
Message-ID: <41922044-0BDA-4F29-B045-5E544A4F679F@bioperl.org>

Dear BioPerl Users and Developers,

I want to announce an addition to the leadership of BioPerl.
Christopher Fields and Sendu Bala are now members of the BioPerl
Core developer group to recognize their ongoing leadership in the
project.  Chris and Sendu were instrumental in the 1.5.2 Developer
release and have made a significant commitment and contribution to
the quality of the code and the documentation of the project.  We
have invited them to be part of the core to recognize their work and
to feel comfortable to ask them to do more. ;-)

The Core group was established to ensure that someone was responsible
for making code releases, vetting new developers for CVS write
accounts, and generally dealing with things that might otherwise slip
through the cracks.  We are very excited to have more people
contributing to and maintaining the toolkit.  We look forward to
their help along with all the other developers, as we work towards a
1.6 release this year.

As always, while there is a need for some individuals to lead the
project, we encourage contributions from all levels of expertise to  
improve the code, documentation, and tutorials of the project.

We plan to discuss the progress of the toolkit at this year's  
Bioinformatics Open Source Conference held in Vienna, Austria in  
conjunction with the SIG meetings at ISMB.   We are trying to use  
BOSC 2007 as a chance for the developers of Open Bioinformatics  
Foundation sponsored and related projects to coordinate future  
development and release cycles.

Jason Stajich on behalf of the Core developers


From khan at cshl.edu  Mon Feb 26 15:29:19 2007
From: khan at cshl.edu (Khan, Sohail)
Date: Mon, 26 Feb 2007 15:29:19 -0500
Subject: [Bioperl-l] parsing a list of ids to a fasta file.
Message-ID: <C8696843AE995F4EA4CDC3E2B83482A9018791CA@mailbox02.cshl.edu>

Thanks Michael.  I have the scripts installed.  I can pass an ID to the indexed FASTA file and retrieve the sequence.  However, is it possible to pass a list of IDs from a file and get the sequences for all of them?
Thanks.

-Sohail

-----Original Message-----
From: michael watson (IAH-C) [mailto:michael.watson at bbsrc.ac.uk]
Sent: Tuesday, February 20, 2007 4:33 PM
To: Khan, Sohail; Bioperl-l at lists.open-bio.org
Subject: RE: [Bioperl-l] parsing a list of ids to a fasta file.


Suggest you use Bio::Index::Fasta to create an index for the fasta file and then a simple script to retrieve sequences using that index.  Or just use the pre-written bp_index.PLS and bp_fetch.PLS scripts.
 
http://www.bioperl.org/wiki/Module:Bio::Index::Fasta
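For the follow-up question (a whole list of IDs from a file), one option is simply to loop over the IDs and call `$index->fetch($id)` on the Bio::Index::Fasta object for each. If building an index is not wanted, a plain-Perl alternative (this is NOT Bio::Index::Fasta, just a single streaming pass) is to load the IDs into a hash and copy out the matching records. A minimal sketch, assuming one ID per line and that the ID is the first whitespace-delimited word of each FASTA header; `load_ids` and `filter_fasta` are hypothetical helper names:

```perl
use strict;
use warnings;

# Load one ID per line into a hash, ignoring blank lines and surrounding whitespace.
sub load_ids {
    my ($fh) = @_;
    my %want;
    while (my $line = <$fh>) {
        $line =~ s/^\s+|\s+$//g;
        $want{$line} = 1 if length $line;
    }
    return \%want;
}

# Copy only the FASTA records whose first header word is a wanted ID.
sub filter_fasta {
    my ($in, $out, $want) = @_;
    my $keep = 0;
    while (my $line = <$in>) {
        if ($line =~ /^>/) {
            my ($id) = $line =~ /^>(\S+)/;
            $keep = defined $id && exists $want->{$id};
        }
        print {$out} $line if $keep;
    }
    return;
}

# In-memory demo data; for real files use open my $fh, '<', $path
my $ids   = "seq2\nseq3\n";
my $fasta = ">seq1 first\nAAAA\n>seq2 second\nCCCC\nGG\n>seq3\nTTTT\n";
open my $id_fh, '<', \$ids   or die "Cannot read IDs: $!";
open my $fa_fh, '<', \$fasta or die "Cannot read FASTA: $!";
my $result = '';
open my $out_fh, '>', \$result or die "Cannot open output buffer: $!";
filter_fasta($fa_fh, $out_fh, load_ids($id_fh));
close $out_fh;
print $result;
```

The hash lookup keeps each header check constant-time, so a long ID list costs one pass over the FASTA file; the trade-off against an index is that every run rereads the whole file.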

________________________________

From: bioperl-l-bounces at lists.open-bio.org on behalf of Khan, Sohail
Sent: Tue 20/02/2007 8:42 PM
To: Bioperl-l at lists.open-bio.org
Subject: [Bioperl-l] parsing a list of ids to a fasta file.


Dear list,

I am new to Bio-Perl.  I have the following question:
I have a list of IDs which I would like to look up in a large FASTA file, retrieving the sequence for each ID.  I would appreciate any suggestions.
Thanks.

Khan




From arareko at campus.iztacala.unam.mx  Mon Feb 26 16:44:49 2007
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Mon, 26 Feb 2007 15:44:49 -0600
Subject: [Bioperl-l] BioPerl leadership additions
In-Reply-To: <41922044-0BDA-4F29-B045-5E544A4F679F@bioperl.org>
References: <41922044-0BDA-4F29-B045-5E544A4F679F@bioperl.org>
Message-ID: <45E354D1.4070600@campus.iztacala.unam.mx>

Congrats Chris & Sendu! Very well-deserved. Keep up the great work.

Cheers!
Mauricio.

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM


From lubapardo at gmail.com  Tue Feb 27 08:26:30 2007
From: lubapardo at gmail.com (Luba Pardo)
Date: Tue, 27 Feb 2007 14:26:30 +0100
Subject: [Bioperl-l] parsing blast results
Message-ID: <58ff33550702270526q43ab584ci7baf6a09e2584e38@mail.gmail.com>

Hi,
I am using the module Bio::SearchIO to parse some BLAST results. I would
like to store the IDs of the hits in an array, but I am not sure whether this
is possible with an existing subroutine. Does anyone know
whether there is a method in Bio::SearchIO that does this?
Thanks in advance,
L.Pardo


From cjfields at uiuc.edu  Tue Feb 27 09:11:37 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 27 Feb 2007 08:11:37 -0600
Subject: [Bioperl-l] parsing blast results
In-Reply-To: <58ff33550702270526q43ab584ci7baf6a09e2584e38@mail.gmail.com>
References: <58ff33550702270526q43ab584ci7baf6a09e2584e38@mail.gmail.com>
Message-ID: <E1B6ED22-1120-4333-AA73-19B57D102EA9@uiuc.edu>


On Feb 27, 2007, at 7:26 AM, Luba Pardo wrote:

> Hi,
> I am using the module Bio::SearchIO to parse some blast results. I would
> like to store the ids of the results into an array but I am not sure if this
> is possible to do it with an existing subroutine. Does anyone have an idea
> whether there is a method included within the module Bio::SearchIO to do so?
> Thanks in advance,
> L.Pardo

Bio::SearchIO doesn't currently have a method to retrieve all the  
accessions in a BLAST result.  The best way to do this is to iterate  
through the objects:

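use Bio::SearchIO;

# 'blast_report.txt' is a placeholder for your own BLAST output file
my $searchio = Bio::SearchIO->new(-format => 'blast',
                                  -file   => 'blast_report.txt');
my @accs;

while (my $result = $searchio->next_result) {
     while (my $hit = $result->next_hit) {
         push @accs, $hit->accession;
         # do whatever else here...
     }
}

print join(',', @accs), "\n";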

I don't think all accessions in the description are parsed out at the  
moment, just the first one (or the one in the hit table).  If you  
want all of them or if you want the NCBI GI you'll need to parse them  
out of the description heading ($hit->description).
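A minimal plain-Perl sketch of that extra parsing, assuming the hit name or description uses the legacy NCBI `gi|NUMBER|DB|ACCESSION|` layout (newer report styles will simply not match). `parse_ncbi_ids` is a hypothetical helper, not a BioPerl method; you would pass it `$hit->name` or `$hit->description` inside the loop above:

```perl
use strict;
use warnings;

# Collect every GI number and accession from an NCBI-style string such as
# the one returned by $hit->name or $hit->description. Assumes the legacy
# "gi|NUMBER|DB|ACCESSION|" layout; other header styles will not match.
sub parse_ncbi_ids {
    my ($text) = @_;
    my (@gis, @accs);
    while ($text =~ /gi\|(\d+)\|\w+\|([^|\s]*)/g) {
        push @gis, $1;
        push @accs, $2 if length $2;
    }
    return (\@gis, \@accs);
}

my ($gis, $accs) = parse_ncbi_ids('gi|50292953|ref|XP_448909.1| unnamed protein product');
print "GI: @$gis\n";
print "Accession: @$accs\n";
```

Because the regex runs with `/g`, merged descriptions that list several `gi|...` entries yield all of them, in order.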

chris


From sac at bioperl.org  Tue Feb 27 12:59:22 2007
From: sac at bioperl.org (Steve Chervitz)
Date: Tue, 27 Feb 2007 09:59:22 -0800
Subject: [Bioperl-l] [Bioperl-announce-l] BioPerl leadership additions
In-Reply-To: <41922044-0BDA-4F29-B045-5E544A4F679F@bioperl.org>
References: <41922044-0BDA-4F29-B045-5E544A4F679F@bioperl.org>
Message-ID: <8f200b4c0702270959n7521f693l915bfabe7ccb7ef7@mail.gmail.com>

Welcome to the club, Chris & Sendu. Always good to have an infusion of new
blood and capable, motivated hands.

Steve



From cjfields at uiuc.edu  Tue Feb 27 15:57:40 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 27 Feb 2007 14:57:40 -0600
Subject: [Bioperl-l] Bio::SeqIO::FTHelper
Message-ID: <D6922F04-A349-41C4-B4DC-6763E3195B05@uiuc.edu>

Could anyone tell me what FTHelper is used for?  From what I gather  
it rolls up seqfeature data into a lightweight object but then  
creates a SeqFeature::Generic anyway (at least for GenBank/EMBL/Swiss),
which seems to be a waste of memory and time.  Is there
something I'm missing (besides my sanity of course)?

chris


From Jay at jays.net  Wed Feb 28 04:39:55 2007
From: Jay at jays.net (Jay Hannah)
Date: Wed, 28 Feb 2007 03:39:55 -0600
Subject: [Bioperl-l] "Command-Line Bioinformatics"
Message-ID: <F7C1E903-1712-40A5-B817-8CDAADECEBF4@jays.net>

Reading this article:
http://www.linuxjournal.com/article/6977
Sequencing the SARS Virus - Linux Journal, Nov 2003

This guy needs Perl and/or BioPerl.  :)

> The sequence file is in FASTA format consisting of a header line  
> and the sequence, split into fixed-width lines. The following  
> counts the number of Gs and Cs in the sequence and presents the  
> total as a fraction of the total number of bases:
>
> > grep -v "^>" AY274119.fa | fold -w 1 |
> tr "ATGC" "..xx" | sort | uniq -c |
> sed 's/[^0-9]//g' | tr -s "\012" " " |
> sed 's/\([0-9]*\) \([0-9]*\)/scale = 3;
> \2 \/ (\1+\2)/' |
> bc -i
> scale = 3; 12127 / (17624+12127)
> .407
>
> Out of the 29,751 bases in our sequence, 12,127 are either G or C,  
> giving a GC content of 41%.

BioPerl version:

use strict;
use warnings;
use Bio::SeqIO;

my $io = Bio::SeqIO->new(
   -file   => 'AY274119.fa',
   -format => 'fasta'
);
my $seq = $io->next_seq->seq;
# count G and C in either case, divided by the total length
printf "%.3f\n", ($seq =~ tr/GCgc//) / length($seq);

Command-line Perl:

perl -e '$/ = undef; $_ = <>; s/>.*//; s/\n//g; print tr/GC/GC/ / length($_)' AY274119.fa

I'm sure you can Perl Golf my stabs at it.  :)
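Both one-offs above read only the first record and, in the command-line version, count only uppercase G and C. A slightly longer plain-Perl sketch (no BioPerl) that handles multi-record FASTA files and soft-masked lowercase bases; `read_fasta` and `gc_fraction` are hypothetical helper names, and the denominator is the full sequence length, as in the article:

```perl
use strict;
use warnings;

# Read FASTA records from a filehandle as [header, sequence] pairs.
sub read_fasta {
    my ($fh) = @_;
    my @records;
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^>(.*)/) {
            push @records, [$1, ''];
        }
        elsif (@records) {
            $line =~ s/\s+//g;          # drop line breaks and stray spaces
            $records[-1][1] .= $line;
        }
    }
    return @records;
}

# G+C count (either case) divided by the total sequence length.
sub gc_fraction {
    my ($seq) = @_;
    return 0 unless length $seq;
    my $gc = ($seq =~ tr/GCgc//);       # tr with an empty replacement just counts
    return $gc / length($seq);
}

# Tiny in-memory example; for a real file use: open my $fh, '<', 'AY274119.fa'
my $fasta = ">demo\nGGCCaatt\nGC\n";
open my $fh, '<', \$fasta or die "Cannot open in-memory FASTA: $!";
for my $rec (read_fasta($fh)) {
    printf "%s\t%.3f\n", $rec->[0], gc_fraction($rec->[1]);
}
```

The demo record has 6 G/C out of 10 bases, so it prints `demo` followed by `0.600`.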

j
seqlab.net
http://www.bioperl.org/wiki/User:Jhannah


From n.saunders at uq.edu.au  Wed Feb 28 05:25:08 2007
From: n.saunders at uq.edu.au (Neil Saunders)
Date: Wed, 28 Feb 2007 20:25:08 +1000
Subject: [Bioperl-l] Bio::Factory::EMBOSS, CGI and taint
Message-ID: <45E55884.9010908@uq.edu.au>

Dear Bioperlers,

I'm trying to understand an error that occurs when Bio::Factory::EMBOSS is used 
in a CGI script.  Using BioPerl 1.5.2 on Ubuntu Dapper, Apache 2.0.55, Perl 5.8.7.

If I load this test CGI script (cgi.pl) in a browser:

BEGIN CODE
----------
#!/usr/bin/perl -Tw
use strict;
use CGI;
use Bio::Factory::EMBOSS;

my $cgi = new CGI;
my $f   = new Bio::Factory::EMBOSS;

print $cgi->header,
       $cgi->start_html,
       $cgi->end_html;
--------
END CODE

I get a 500 server error and the Apache error log reads:
[error] [client 192.168.0.3] Premature end of script headers: cgi.pl

I can fix this in 2 ways:

(1) Move the "my $f = new Bio::Factory::EMBOSS" line to the end of the script, 
which isn't a very useful fix.
(2) Remove the -T switch from the shebang line

There seem to be a few old posts on the list regarding "taint-safe" modules.  It 
seems that the new Bio::Factory::EMBOSS object is interfering with the headers 
in some way, but I'm no CGI.pm guru and wondered if anyone could shed light on this.

thanks,
Neil
-- 
  School of Molecular and Microbial Sciences
  University of Queensland
  Brisbane 4072 Australia

http://nsaunders.wordpress.com


From n.saunders at uq.edu.au  Wed Feb 28 05:30:31 2007
From: n.saunders at uq.edu.au (Neil Saunders)
Date: Wed, 28 Feb 2007 20:30:31 +1000
Subject: [Bioperl-l] more on Bio::Factory::EMBOSS, CGI and taint
Message-ID: <45E559C7.1090308@uq.edu.au>

Further to my previous email, adding:

BEGIN {
     $|=1;
     print "Content-type: text/html\n\n";
     use CGI::Carp('fatalsToBrowser');
}

to my CGI script generates:

Insecure $ENV{PATH} while running with -T switch at 
/usr/local/share/perl/5.8.7/Bio/Factory/EMBOSS.pm line 251.


Neil
-- 
  School of Molecular and Microbial Sciences
  University of Queensland
  Brisbane 4072 Australia

http://nsaunders.wordpress.com


From n.saunders at uq.edu.au  Wed Feb 28 05:50:58 2007
From: n.saunders at uq.edu.au (Neil Saunders)
Date: Wed, 28 Feb 2007 20:50:58 +1000
Subject: [Bioperl-l] CGI taint solved
Message-ID: <45E55E92.10608@uq.edu.au>

Apologies for running a one-man thread, but I realised that I've now answered my 
own question regarding errors with CGI, Bio::Factory::EMBOSS and taint.

Given that the EMBOSS binaries are in /usr/local/bin, adding:

$ENV{'PATH'} = '/usr/local/bin';

near the top of the script does the trick.


Neil
-- 
  School of Molecular and Microbial Sciences
  University of Queensland
  Brisbane 4072 Australia

http://nsaunders.wordpress.com


From cjfields at uiuc.edu  Wed Feb 28 08:39:24 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 28 Feb 2007 07:39:24 -0600
Subject: [Bioperl-l] CGI taint solved
In-Reply-To: <45E55E92.10608@uq.edu.au>
References: <45E55E92.10608@uq.edu.au>
Message-ID: <E4959877-011C-4F2C-ABA2-20D3876F6B4C@uiuc.edu>

That could possibly clobber any other program calls from within the  
same script (unless they reside in /usr/local/bin) since you're  
explicitly assigning PATH, not appending:

$ENV{"PATH"} = '/usr/local/bin';

gets me (printing $ENV{"PATH"}):

/usr/local/bin

whereas this:

$ENV{"PATH"} = '/usr/local/bin:' . $ENV{"PATH"};

gets me:

/usr/local/bin:/Users/cjfields/bin:/Users/cjfields/dart/bin:/usr/local/mysql/bin:/usr/local/sbin:/bin:/sbin:/usr/bin:/usr/sbin

There's probably a File::* module that does this safely per OS flavor.

chris


Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From stefan.kirov at bms.com  Wed Feb 28 10:35:31 2007
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Wed, 28 Feb 2007 10:35:31 -0500
Subject: [Bioperl-l] CGI taint solved
In-Reply-To: <E4959877-011C-4F2C-ABA2-20D3876F6B4C@uiuc.edu>
References: <45E55E92.10608@uq.edu.au>
	<E4959877-011C-4F2C-ABA2-20D3876F6B4C@uiuc.edu>
Message-ID: <45E5A143.3080303@bms.com>

Neil, I believe this is your situation:
http://wn.cyberwerks.com/2000/0411.html
My advice: any commands executed from within a CGI script should have their
paths hardcoded whenever possible.
If those commands require a different PATH, try writing a wrapper shell
script that sets the environment (which should be reset to the default
once the shell script terminates). It also depends on the type of
environment you have: if it is not secure, you may wish to think hard about
how to eliminate all security loopholes with CGI. I am definitely not an
expert on this.
Stefan
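One detail worth adding to this thread: under `-T` the inherited `$ENV{PATH}` is itself tainted, and taint propagates through string concatenation, so prepending a directory to it (`'/usr/local/bin:' . $ENV{PATH}`) still leaves PATH tainted and would still trigger the "Insecure $ENV{PATH}" error when a command is run. The perlsec documentation instead suggests assigning a fixed list and clearing a few shell-related variables. A minimal sketch to place near the top of the CGI script; the directory list is an assumption, so use wherever EMBOSS and your other tools actually live:

```perl
use strict;
use warnings;
use Scalar::Util qw(tainted);

# Replace the inherited (tainted) PATH with a fixed, trusted list.
# These directories are an assumption: adjust them to your installation.
$ENV{PATH} = join ':', qw(/usr/local/bin /usr/bin /bin);

# perlsec also recommends removing variables that shells may consult.
delete @ENV{qw(IFS CDPATH ENV BASH_ENV)};

# Built only from string literals, so PATH is untainted even under perl -T.
print tainted($ENV{PATH}) ? "PATH is tainted\n" : "PATH is clean\n";
```

Hardcoding full paths to the executables, as suggested above, is complementary: PATH then only matters for whatever those programs launch themselves.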



From lubapardo at gmail.com  Wed Feb 28 12:21:07 2007
From: lubapardo at gmail.com (Luba Pardo)
Date: Wed, 28 Feb 2007 18:21:07 +0100
Subject: [Bioperl-l] retrieven ids
Message-ID: <58ff33550702280921s25f749aagd6e62b1a5cc34edb@mail.gmail.com>

Hi everyone,
I wonder if someone could give advice on the following:
I want to retrieve the DNA coding sequence of a RefSeq protein ID. I do not
want to translate the protein back to DNA, but rather get the ID of the DNA
coding sequence and then retrieve the DNA sequence from GenBank. Is there any
module that allows me to get all possible IDs for a sequence, given a protein GI?

Thank you very much in advance,
L. Pardo


From johnston at biochem.ucl.ac.uk  Wed Feb 28 12:05:49 2007
From: johnston at biochem.ucl.ac.uk (Caroline Johnston)
Date: Wed, 28 Feb 2007 17:05:49 +0000 (GMT)
Subject: [Bioperl-l] _rearrange
Message-ID: <Pine.LNX.4.58.0702281624380.8207@localhost.localdomain>

hi,

Is there a discussion of the rationale behind the _rearrange method
somewhere? I'm probably just being gormless, but I think I'm missing the
point a bit.

Is it okay for a method just to expect named params like
->foo(arg1=>'stuff', arg2=>'things'); ?

Cxx


From ckuanglim at yahoo.com  Wed Feb 28 10:51:50 2007
From: ckuanglim at yahoo.com (Chan Kuang Lim)
Date: Wed, 28 Feb 2007 07:51:50 -0800 (PST)
Subject: [Bioperl-l] Problem of Installing Bioperl
Message-ID: <459942.77644.qm@web60518.mail.yahoo.com>

I have a problem installing BioPerl on Windows using the command-line installation.
In the cmd window, I ran:
ppm-shell
search bioperl
install 2

Many files were downloaded, but then the following line appeared:
Unpacking bioperl-1.5.2_100...ppm install failed: Can't extract files from C:.............../Bioperl-1.5.2_100.tgz


I hope you can answer my question. Thank you.

Regards,
Chan Kuang Lim
Malaysia

 


From cjfields at uiuc.edu  Wed Feb 28 13:30:45 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 28 Feb 2007 12:30:45 -0600
Subject: [Bioperl-l] _rearrange
In-Reply-To: <Pine.LNX.4.58.0702281624380.8207@localhost.localdomain>
References: <Pine.LNX.4.58.0702281624380.8207@localhost.localdomain>
Message-ID: <25C736A2-2DCA-413B-8F92-D799F583515B@uiuc.edu>

From what I gather it's a convenient utility method that is used for
consistent and enforced parameter checking/setting for any method,
including the constructor.

There are a few modules that don't use _rearrange (Bio::WebAgent::new()
comes to mind).  It's not required that you use it, but the naming
conventions for parameters outlined in _rearrange (in
Bio::Root::RootI POD) are generally enforced for consistency across
classes.

As a note, Sendu has committed a related method (_set_from_args) to  
CVS which works rather well, but I don't think it is in the last  
release.

chris



From dmessina at wustl.edu  Wed Feb 28 14:31:29 2007
From: dmessina at wustl.edu (Dave Messina)
Date: Wed, 28 Feb 2007 13:31:29 -0600 (CST)
Subject: [Bioperl-l] retrieven ids
In-Reply-To: <58ff33550702280921s25f749aagd6e62b1a5cc34edb@mail.gmail.com>
References: <58ff33550702280921s25f749aagd6e62b1a5cc34edb@mail.gmail.com>
Message-ID: <57419.10.0.7.57.1172691089.squirrel@gscmail.wustl.edu>

Whenever I'm unsure of how to do something, I first look to see if one of
the HOWTOs on bioperl.org covers it. In this case, the Features HOWTO has
example code which I think will do what you want.

Genbank records typically have the coding sequence of a protein as a
feature, so I would do something like:

- use the RefSeq protein IDs to query Entrez and get back the Genbank
records.

- read the Features HOWTO to refresh my memory on the syntax for grabbing
features.

That HOWTO is at:
http://www.bioperl.org/wiki/HOWTO:Feature-Annotation

- whip up a little script to loop through the Genbank records one at a
time with SeqIO and pull out the cDNA sequence features.
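A related shortcut for getting the nucleotide ID: protein (GenPept) records for RefSeq proteins typically carry a /coded_by qualifier on their CDS feature, naming the nucleotide accession and the CDS coordinates. In BioPerl that string can be read from the feature with `$feat->get_tag_values('coded_by')`, assuming the record has the qualifier. A minimal plain-Perl sketch that splits the simple `ACCESSION:start..end` form; `parse_coded_by` is a hypothetical helper, and complement(...) or join(...) locations are deliberately not handled:

```perl
use strict;
use warnings;

# Split a simple GenPept /coded_by value ("ACCESSION:start..end") into parts.
# complement(...), join(...) and other location forms are NOT handled here.
sub parse_coded_by {
    my ($coded_by) = @_;
    return unless $coded_by =~ /^([A-Za-z_]+\d+(?:\.\d+)?):<?(\d+)\.\.>?(\d+)$/;
    return { accession => $1, start => $2, end => $3 };
}

# Made-up example value, not taken from a real record.
my $cds = parse_coded_by('NM_123456.1:51..1250');
print "$cds->{accession}\t$cds->{start}..$cds->{end}\n" if $cds;
```

The accession could then be fed back to Entrez (or Bio::DB::GenBank) to fetch the nucleotide record, and the coordinates used to cut out the CDS.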


Dave


From bix at sendu.me.uk  Wed Feb 28 14:38:46 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 28 Feb 2007 19:38:46 +0000
Subject: [Bioperl-l] _rearrange
In-Reply-To: <Pine.LNX.4.58.0702281624380.8207@localhost.localdomain>
References: <Pine.LNX.4.58.0702281624380.8207@localhost.localdomain>
Message-ID: <45E5DA46.3020503@sendu.me.uk>

Caroline Johnston wrote:
> hi,
> 
> Is there a discussion of the rationale behind the _rearrange method
> somewhere? I'm probably just being gormless, but I think I'm missing the
> point a bit.
> 
> Is it okay for a method just to expect named params like
> ->foo(arg1=>'stuff', arg2=>'things'); ?

The Bioperl style for named args is -arg1, and wrong case is allowed as 
well. So, make use of _rearrange; it won't do you any harm.
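Inside a BioPerl class the usual idiom is `my ($file, $format) = $self->_rearrange([qw(FILE FORMAT)], @args);`. To show the idea behind it, here is a simplified stand-alone sketch modelled on that -named convention; it is NOT BioPerl's code, `rearrange_sketch` is a hypothetical name, and the Bio::Root::RootI POD is the authority on the real rules. In this sketch named mode is only used when the first argument starts with a dash, so a dashless `foo(arg1 => 'stuff')` call falls through as positional:

```perl
use strict;
use warnings;

# Simplified sketch of the -named argument convention (NOT BioPerl's code):
# $order lists the expected parameter names, values come back in that order.
sub rearrange_sketch {
    my ($order, @args) = @_;

    # No leading -key: treat the call as positional and return it untouched.
    return @args unless @args && defined $args[0] && $args[0] =~ /^-/;

    my %param;
    for (my $i = 0; $i < @args; $i += 2) {
        (my $key = uc $args[$i]) =~ s/^-//;   # -File, -FILE and -file all become FILE
        $param{$key} = $args[$i + 1];
    }
    return @param{ map { uc($_) } @$order };
}

my ($file, $format) = rearrange_sketch([qw(file format)], -Format => 'fasta', -file => 'in.fa');
print "$file $format\n";
```

The payoff is that callers can pass options in any order and any case, and the method gets back a fixed-order list (with `undef` for anything omitted).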


From johnsonm at gmail.com  Wed Feb 28 14:59:09 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Wed, 28 Feb 2007 13:59:09 -0600
Subject: [Bioperl-l] Issues with new Bio::Tools::Run modules for Genemark
	and Glimmer
Message-ID: <ebf5eb170702281159y7dcedd88n7bdef2c8bb9d3288@mail.gmail.com>

    I happen to need something like Bio::Tools::Run::Genemark, so I'm coding
one up.  When I started on the tests for it, I realized I have a problem.  I
can distribute a fasta file downloaded from GenBank to use as input, but I
can't distribute the model file needed to actually run Genemark
(Genemark.hmm for prokaryotes, gmhmmp, in my case).
    It took *forever* to get a license, and I'm not thrilled with the
prospect of talking them out of a redistributable model file.  I'd love to
distribute the test, but I don't see how I'm going to be able to.
Suggestions?
    Also, I've settled on IPC::Run instead of system().  The docs indicate
the bits of it I'm using should be OK on Windows, except maybe for Win9X.
I don't want to clutter up the console, I don't like embedding stdout/stderr
redirection in command strings, and I don't want to have to worry about
signal handling (What if the child catches a ctrl-c halfway through
parsing?  What if the parent does?).  Anybody object to that?
   One final thing.  I'm lazy, I don't want to deal with parsing arguments
to the constructor, so I'm just calling _rearrange() to deal with it.  The
Bio::Tools:: parsers all take dash options, but it looks like a bunch of the
stuff in Bio::Tools::Run:: takes dashless args.  Objections?


From dmessina at wustl.edu  Wed Feb 28 15:14:56 2007
From: dmessina at wustl.edu (Dave Messina)
Date: Wed, 28 Feb 2007 14:14:56 -0600 (CST)
Subject: [Bioperl-l] Issues with new Bio::Tools::Run modules for
 Genemark and Glimmer
In-Reply-To: <ebf5eb170702281159y7dcedd88n7bdef2c8bb9d3288@mail.gmail.com>
References: <ebf5eb170702281159y7dcedd88n7bdef2c8bb9d3288@mail.gmail.com>
Message-ID: <57538.10.0.7.57.1172693696.squirrel@gscmail.wustl.edu>

> I'm not thrilled with the prospect of talking them out of a
> redistributable model file.

I suppose it's not possible to fake your own, or at least the parts of it
you're testing for?

If not, I'd put the tests in a skip block while waiting to hear from the
Genemark folks.
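A skip block of that kind could look like the following minimal Test::More sketch. The `GENEMARK_MODEL` environment variable is a hypothetical convention for pointing the tests at a locally licensed model file, and the two checks are placeholders for the real wrapper tests:

```perl
use strict;
use warnings;
use Test::More tests => 2;

# Hypothetical environment variable pointing at a locally licensed model file.
my $model = $ENV{GENEMARK_MODEL};

SKIP: {
    skip 'Genemark model file not available (set GENEMARK_MODEL)', 2
        unless defined $model && -e $model;

    # Real checks would go here, e.g. running the wrapper and parsing its output.
    ok(-r $model, 'model file is readable');
    ok(-s $model, 'model file is not empty');
}
```

Users without a license then see the tests reported as skipped rather than failed, and anyone with the file can opt in.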


> The Bio::Tools:: parsers all take dash options, but it looks like a
> bunch of the stuff in Bio::Tools::Run:: takes dashless args.  Objections?

Sendu will chime in, I'm sure, but I think he was planning to switch
everything in Bio::Tools::Run over to dashed args anyway...


Dave


From bix at sendu.me.uk  Wed Feb 28 15:52:23 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 28 Feb 2007 20:52:23 +0000
Subject: [Bioperl-l] Issues with new Bio::Tools::Run modules for
 Genemark and Glimmer
In-Reply-To: <ebf5eb170702281159y7dcedd88n7bdef2c8bb9d3288@mail.gmail.com>
References: <ebf5eb170702281159y7dcedd88n7bdef2c8bb9d3288@mail.gmail.com>
Message-ID: <45E5EB87.9020106@sendu.me.uk>

Mark Johnson wrote:
>    One final thing.  I'm lazy, I don't want to deal with parsing arguments
> to the constructor, so I'm just calling _rearrange() to deal with it.  The
> Bio::Tools:: parsers all take dash options, but it looks like a bunch of the
> stuff in Bio::Tools::Run:: takes dashless args.  Objections?

You can make use of _set_from_args(). See Bio::Tools::Run::Phylo::Gumby 
for an example.


From bix at sendu.me.uk  Wed Feb 28 16:29:32 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 28 Feb 2007 21:29:32 +0000
Subject: [Bioperl-l] SVG output with Bio::Graphics::Panel fails
Message-ID: <45E5F43C.9080902@sendu.me.uk>

I have GD 2.35 and GD::SVG 2.33 installed.

I have a working script in which a Bio::Graphics::Panel object is made 
and output with:

print $panel->png;

This is fine. Changing it to:

print $panel->svg;

Gives the error:

Can't locate object method "svg" via package "GD::Image" at 
/.../Bio/Graphics/Panel.pm line 971, <DATA> line 192.


Am I supposed to do something else to get this to work?


Cheers,
Sendu.


From crabtree at tigr.ORG  Wed Feb 28 16:40:52 2007
From: crabtree at tigr.ORG (Jonathan Crabtree)
Date: Wed, 28 Feb 2007 16:40:52 -0500
Subject: [Bioperl-l] SVG output with Bio::Graphics::Panel fails
In-Reply-To: <45E5F43C.9080902@sendu.me.uk>
References: <45E5F43C.9080902@sendu.me.uk>
Message-ID: <45E5F6E4.80003@tigr.org>


Sendu-

I believe you must set 'image_class' to 'GD::SVG' when you create the 
Panel (and note that older versions of Bio::Graphics::Panel don't know 
anything about this parameter.)  Here's the relevant part of the Panel 
perldoc:

   -image_class To create output in scalable vector
                graphics (SVG), optionally pass the image
                class parameter 'GD::SVG'. Defaults to
                using vanilla GD. See the corresponding
                image_class() method below for details.
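
A minimal sketch of that, assuming a Bio::Graphics::Panel recent enough to
accept -image_class and an installed GD::SVG. The panel length and width, the
feature coordinates and name, and the track options are illustrative values,
not taken from Sendu's script.

```perl
use strict;
use warnings;
use Bio::Graphics;
use Bio::SeqFeature::Generic;

# Request GD::SVG as the image class when the Panel is constructed.
my $panel = Bio::Graphics::Panel->new(
    -length      => 1000,
    -width       => 800,
    -image_class => 'GD::SVG',
);

# One illustrative feature so that there is something to draw.
my $feature = Bio::SeqFeature::Generic->new(
    -start        => 100,
    -end          => 600,
    -display_name => 'example_gene',
);
$panel->add_track($feature, -glyph => 'generic', -label => 1);

# With GD::SVG as the image class, svg() returns the SVG document as a string.
my $svg = $panel->svg;
print $svg;
```

Compared with a PNG-producing script, the difference is the -image_class
argument at construction time and the svg() call at the end.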

Jonathan


Sendu Bala wrote:
> I have GD 2.35 and GD::SVG 2.33 installed.
> 
> I have a working script in which a Bio::Graphics::Panel object is made 
> and output with:
> 
> print $panel->png;
> 
> This is fine. Changing it to:
> 
> print $panel->svg;
> 
> Gives the error:
> 
> Can't locate object method "svg" via package "GD::Image" at 
> /.../Bio/Graphics/Panel.pm line 971, <DATA> line 192.
> 
> 
> Am I supposed to do something else to get this to work?
> 
> 
> Cheers,
> Sendu.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From bix at sendu.me.uk  Wed Feb 28 17:01:17 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 28 Feb 2007 22:01:17 +0000
Subject: [Bioperl-l] SVG output with Bio::Graphics::Panel fails
In-Reply-To: <45E5F6E4.80003@tigr.org>
References: <45E5F43C.9080902@sendu.me.uk> <45E5F6E4.80003@tigr.org>
Message-ID: <45E5FBAD.3030404@sendu.me.uk>

Jonathan Crabtree wrote:
> 
> Sendu-
> 
> I believe you must set 'image_class' to 'GD::SVG' when you create the 
> Panel (and note that older versions of Bio::Graphics::Panel don't know 
> anything about this parameter.)  Here's the relevant part of the Panel 
> perldoc:

... Oh! I had no idea there was any perldoc for these modules, hiding 
down there at the bottom. Does anyone want to intersperse the docs?...


From cjfields at uiuc.edu  Wed Feb 28 17:10:54 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 28 Feb 2007 16:10:54 -0600
Subject: [Bioperl-l] Issues with new Bio::Tools::Run modules for
	Genemark and Glimmer
In-Reply-To: <ebf5eb170702281159y7dcedd88n7bdef2c8bb9d3288@mail.gmail.com>
References: <ebf5eb170702281159y7dcedd88n7bdef2c8bb9d3288@mail.gmail.com>
Message-ID: <894754FB-2B20-415F-A6FC-7ACE839C6540@uiuc.edu>

On Feb 28, 2007, at 1:59 PM, Mark Johnson wrote:

>     I happen to need something like Bio::Tools::Run::Genemark, so I'm coding
> one up.  When I started on the tests for it, I realized I have a problem.  I
> can distribute a fasta file downloaded from GenBank to use as input, but I
> can't distribute the model file needed to actually run Genemark (
> Genemark.hmm for prokaryotes, gmhmmp, in my case).
>     It took *forever* to get a license, and I'm not thrilled with the
> prospect of talking them out of a redistributable model file.  I'd love to
> distribute the test, but I don't see how I'm going to be able to.
> Suggestions?

For bioperl-run tests you have to have the program installed for  
tests to work (otherwise they are passed over).  Therefore one would  
assume if you had the GeneMark program you would have the models as  
well.

You could set up your module to require an env. variable be set (like  
the HMMER module, for instance) which contains the executables and/or  
the models, so that if it isn't set the tests are skipped.
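
A rough sketch of that idea in Perl. GENEMARKDIR is a hypothetical variable
name, and the assumption that the gmhmmp executable and the licensed model file
sit in the same directory is for illustration only.

```perl
use strict;
use warnings;
use File::Spec;

# Look for a GeneMark installation via a (hypothetical) GENEMARKDIR
# environment variable. Returns (executable, model file), or an empty
# list when anything is missing, so callers can skip their tests.
sub genemark_setup {
    my ($model_name) = @_;
    my $dir = $ENV{GENEMARKDIR};
    return unless defined $dir && -d $dir;

    my $exe   = File::Spec->catfile($dir, 'gmhmmp');
    my $model = File::Spec->catfile($dir, $model_name);
    return unless -x $exe && -r $model;
    return ($exe, $model);
}

# In a .t file this would typically be used as:
#
#   use Test::More;
#   my ($exe, $model) = genemark_setup('your_model.mod');
#   plan skip_all => 'set GENEMARKDIR to run the GeneMark tests' unless $exe;
#   plan tests => 3;
```

The model file name ('your_model.mod') is a placeholder; whoever runs the tests
points GENEMARKDIR at their own licensed copy, so nothing proprietary has to be
shipped.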

>     Also, I've settled on IPC::Run instead of system().  The docs indicate
> the bits of it I'm using should be OK on Windows, except maybe for Win9X.
> I don't want to clutter up the console, I don't like embedding stdout/stderr
> redirection in command strings, and I don't want to have to worry about
> signal handling (What if the child catches a ctrl-c halfway through
> parsing?  What if the parent does?).  Anybody object to that?

I wouldn't worry too much about Win9x.  Is IPC::Run in perl core?   
Otherwise we'll need to add it to the optional dependencies for  
bioperl-run.

>    One final thing.  I'm lazy, I don't want to deal with parsing arguments
> to the constructor, so I'm just calling _rearrange() to deal with it.  The
> Bio::Tools:: parsers all take dash options, but it looks like a bunch of the
> stuff in Bio::Tools::Run:: takes dashless args.  Objections?

Sendu's suggestion (_set_from_args() ) is the best.  As mentioned in  
another thread _rearrange() works as well.

chris


From johnsonm at gmail.com  Wed Feb 28 17:29:36 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Wed, 28 Feb 2007 16:29:36 -0600
Subject: [Bioperl-l] Issues with new Bio::Tools::Run modules for
	Genemark and Glimmer
In-Reply-To: <57538.10.0.7.57.1172693696.squirrel@gscmail.wustl.edu>
References: <ebf5eb170702281159y7dcedd88n7bdef2c8bb9d3288@mail.gmail.com>
	<57538.10.0.7.57.1172693696.squirrel@gscmail.wustl.edu>
Message-ID: <ebf5eb170702281429u51e8f7fgb9c0591a410500f8@mail.gmail.com>

On 2/28/07, Dave Messina <dmessina at wustl.edu> wrote:
>
> > I'm not thrilled with the prospect of talking them out of a
> redistributable model file.
>
> I suppose it's not possible to fake your own, or at least the parts of it
> you're testing for?


We got a gzipped tarball with some model files and a precompiled executable
(gmhmmp).  As far as building a model file goes, I don't even have two
sticks to rub together.  I suppose it's possible that it's not actually some
weird proprietary format; I'll go dig for some docs, but I don't hold out a
lot of hope.


From sukhinder.sandhu at osumc.edu  Wed Feb 28 16:49:31 2007
From: sukhinder.sandhu at osumc.edu (Sukhinder Sandhu)
Date: Wed, 28 Feb 2007 16:49:31 -0500
Subject: [Bioperl-l] Problem installing bioperl: plz reply soon. thx
Message-ID: <C20B631B.1E0%sukhinder.sandhu@osumc.edu>

Hi
I am having trouble installing Bundle::BioPerl through CPAN. I don't know if
this has something to do with my having root privileges. Can you please
suggest how I may proceed to get past this? I would really appreciate any
help. I am pasting part of the error it keeps giving after trying to install
every module.
######################
CPAN.pm: Going to build G/GA/GAAS/HTML-Parser-3.56.tar.gz

make: *** No rule to make target
`/System/Library/Perl/5.8.6/darwin-thread-multi-2level/CORE/config.h',
needed by `Makefile'.  Stop.
  /usr/bin/make  -- NOT OK
Running make test
  Can't test without successful make
Running make install
  make had returned bad status, install seems impossible

###############################
Thanks

sukhinder


From sukhinder.sandhu at osumc.edu  Tue Feb 27 23:41:43 2007
From: sukhinder.sandhu at osumc.edu (Sukhinder Sandhu)
Date: Tue, 27 Feb 2007 23:41:43 -0500
Subject: [Bioperl-l] Problem installing bioperl-1.5.2_102
Message-ID: <C20A7237.1DB%sukhinder.sandhu@osumc.edu>

Hi
I am trying to install bioperl on my Mac OS X machine and am having problems. I
tried to follow the instructions both at www.tc.umn.edu..... and in the README
and INSTALL files in the bioperl folder that I downloaded.
The error I get is the following (the end part of the output is copied):
####################
t/versions........ok
t/xs..............skipped
        all skipped: C_support not enabled
Failed Test Stat Wstat Total Fail  Failed  List of Failed
-------------------------------------------------------------------------------
t/compat.t     5  1280    60    5   8.33%  25-28 31
4 tests and 31 subtests skipped.
Failed 1/22 test scripts, 95.45% okay. 5/683 subtests failed, 99.27% okay.
make: *** [test] Error 2
  /usr/bin/make test -- NOT OK
Running make install
  make test had returned bad status, won't install without force
Couldn't install Module::Build, giving up.
BEGIN failed--compilation aborted at ModuleBuildBioperl.pm line 51.
Compilation failed in require at Build.PL line 14.
BEGIN failed--compilation aborted at Build.PL line 14.
###########################################################################
I am not able to figure out what's going wrong.
And when I try to run CPAN, I get the following error. I have no idea how
to fix these. Any help is greatly appreciated.
############################################################################
[Sukhinders-Computer:~/Desktop/bioperl-1.5.2_102] sand60% perl -MCPAN -e shell
Terminal does not support AddHistory.

There seems to be running another CPAN process (pid 7207).  Contacting...
Lockfile /Users/sand60/.cpan/.lock not writeable by you. Cannot proceed.
    On UNIX try:
    rm /Users/sand60/.cpan/.lock
  and then rerun us.
 at -e line 1
###################################################
And doing what it says, removing the lock file, doesn't help. I am wondering
if all this has something to do with having root privileges on the system,
and if so, is there an alternative? Thanks


sukhinder


From stefan.kirov at bms.com  Wed Feb 28 16:44:05 2007
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Wed, 28 Feb 2007 16:44:05 -0500
Subject: [Bioperl-l] SVG output with Bio::Graphics::Panel fails
In-Reply-To: <45E5F43C.9080902@sendu.me.uk>
References: <45E5F43C.9080902@sendu.me.uk>
Message-ID: <45E5F7A5.3090805@bms.com>

I think you should create the object with -image_class => 'GD::SVG'. Can you 
post the code with which you create the object?
Stefan

Sendu Bala wrote:
> I have GD 2.35 and GD::SVG 2.33 installed.
>
> I have a working script in which a Bio::Graphics::Panel object is made 
> and output with:
>
> print $panel->png;
>
> This is fine. Changing it to:
>
> print $panel->svg;
>
> Gives the error:
>
> Can't locate object method "svg" via package "GD::Image" at 
> /.../Bio/Graphics/Panel.pm line 971, <DATA> line 192.
>
>
> Am I supposed to do something else to get this to work?
>
>
> Cheers,
> Sendu.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>   


From johnsonm at gmail.com  Wed Feb 28 17:54:02 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Wed, 28 Feb 2007 16:54:02 -0600
Subject: [Bioperl-l] Issues with new Bio::Tools::Run modules for
	Genemark and Glimmer
In-Reply-To: <894754FB-2B20-415F-A6FC-7ACE839C6540@uiuc.edu>
References: <ebf5eb170702281159y7dcedd88n7bdef2c8bb9d3288@mail.gmail.com>
	<894754FB-2B20-415F-A6FC-7ACE839C6540@uiuc.edu>
Message-ID: <ebf5eb170702281454i52cea9eeqbcaebee5e4ec5e0e@mail.gmail.com>

On 2/28/07, Chris Fields <cjfields at uiuc.edu> wrote:

> For bioperl-run tests you have to have the program installed for
> tests to work (otherwise they are passed over).  Therefore one would
> assume if you had the GeneMark program you would have the models as
> well.
>
> You could set up your module to require an env. variable be set (like
> the HMMER module, for instance) which contains the executables and/or
> the models, so that if it isn't set the tests are skipped.


Sounds like a plan.

> I wouldn't worry too much about Win9x.  Is IPC::Run in perl core?
> Otherwise we'll need to add it to the optional dependencies for
> bioperl-run.


I'd test it, but I don't have access to any Win9x boxes anymore.  IPC::Run
is not a core module, but I think it's worth the dependency.  I considered
IPC::Open3, but it can't be made reliable on Win32, because select() there
works only on sockets, not on filehandles.  I also looked at IPC::Run3, but
under the hood it just layers STDOUT/STDERR redirection on top of
system().  I don't like using system() due to issues with signals (such as
the user hitting ctrl-c and taking out the child).  I feel better knowing
the wrapped executable is in another process disconnected from the console.

> Sendu's suggestion (_set_from_args() ) is the best.  As mentioned in
> another thread _rearrange() works as well.


I'm using _rearrange() now.  I'll look at _set_from_args().  Is either one
preferred to the other?


From bix at sendu.me.uk  Wed Feb 28 19:13:29 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 01 Mar 2007 00:13:29 +0000
Subject: [Bioperl-l] Issues with new Bio::Tools::Run modules
 for	Genemark and Glimmer
In-Reply-To: <ebf5eb170702281454i52cea9eeqbcaebee5e4ec5e0e@mail.gmail.com>
References: <ebf5eb170702281159y7dcedd88n7bdef2c8bb9d3288@mail.gmail.com>	<894754FB-2B20-415F-A6FC-7ACE839C6540@uiuc.edu>
	<ebf5eb170702281454i52cea9eeqbcaebee5e4ec5e0e@mail.gmail.com>
Message-ID: <45E61AA9.9030906@sendu.me.uk>

Mark Johnson wrote:
> I'm using _rearrange() now.  I'll look at _set_from_args().  Is either one
> preferred to the other?

_set_from_args() is implemented using _rearrange() iirc. In any case, 
they do different things but _set_from_args() just makes creating 
wrapper modules a lot simpler. Another example: compare revisions 1.15 
and 1.16 of Bio::Tools::Run::Alignment::Lagan where I reimplemented it 
to use _set_from_args() and _setparams().

http://code.open-bio.org/cgi/viewcvs.cgi/bioperl-run/Bio/Tools/Run/Alignment/Lagan.pm.diff?r1=text&tr1=1.15&r2=text&tr2=1.16&diff_format=h

So, it's new, but I'd recommend that new modules, especially wrappers, make 
use of it.


From bix at sendu.me.uk  Wed Feb 28 19:19:29 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 01 Mar 2007 00:19:29 +0000
Subject: [Bioperl-l] Problem of Installing Bioperl
In-Reply-To: <459942.77644.qm@web60518.mail.yahoo.com>
References: <459942.77644.qm@web60518.mail.yahoo.com>
Message-ID: <45E61C11.90806@sendu.me.uk>

Chan Kuang Lim wrote:
> I have problem of installing bioperl in windows using command-line installation.
> In the cmd windows, after 
> ppm-shell
> search bioperl
> install 2
> 
> many downloading had done, but the next line is:
> Unpacking bioperl-1.5.2_100...ppm install failed: Can't extract files from C:.............../Bioperl-1.5.2_100.tgz

Does that file exist on your system? Is it larger than 0kb? Can you open 
it yourself?


From cjfields at uiuc.edu  Wed Feb 28 20:19:31 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 28 Feb 2007 19:19:31 -0600
Subject: [Bioperl-l] Issues with new Bio::Tools::Run modules
	for	Genemark and Glimmer
In-Reply-To: <45E61AA9.9030906@sendu.me.uk>
References: <ebf5eb170702281159y7dcedd88n7bdef2c8bb9d3288@mail.gmail.com>	<894754FB-2B20-415F-A6FC-7ACE839C6540@uiuc.edu>
	<ebf5eb170702281454i52cea9eeqbcaebee5e4ec5e0e@mail.gmail.com>
	<45E61AA9.9030906@sendu.me.uk>
Message-ID: <93734147-BDDE-4D73-B8F1-FB4A9D073F9B@uiuc.edu>


On Feb 28, 2007, at 6:13 PM, Sendu Bala wrote:

> Mark Johnson wrote:
>> I'm using _rearrange() now.  I'll look at _set_from_args().  Is either one
>> preferred to the other?
>
> _set_from_args() is implemented using _rearrange() iirc. In any case,
> they do different things but _set_from_args() just makes creating
> wrapper modules a lot simpler. Another example: compare revisions 1.15
> and 1.16 of Bio::Tools::Run::Alignment::Lagan where I reimplemented it
> to use _set_from_args() and _setparams().
>
> http://code.open-bio.org/cgi/viewcvs.cgi/bioperl-run/Bio/Tools/Run/Alignment/Lagan.pm.diff?r1=text&tr1=1.15&r2=text&tr2=1.16&diff_format=h
>
> So, it's new, but I'd recommend that new modules, especially wrappers, make
> use of it.

Agreed; I think it allows for parameter variations (dashed, dashless,  
etc.) and can create simple get/setters on the fly, so it is particularly  
suited for wrappers.

_rearrange() will always have use in situations where using named  
parameters helps (long arg lists) but you don't want get/setters,  
just values.
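
To make the distinction concrete, here is a small stand-alone imitation of the
two behaviours in plain Perl. It is not BioPerl's code: the package and method
names (My::Wrapper, rearrange, set_from_args) are hypothetical, and the real
BioPerl _rearrange() and _set_from_args() take more options than this.
rearrange() hands back values in a fixed order; set_from_args() stores each
allowed parameter and creates a get/set method for it.

```perl
use strict;
use warnings;

package My::Wrapper;

# Normalise a parameter name: drop a leading dash and upper-case it,
# so -model, model and -MODEL are all treated the same.
sub _norm { my $k = shift; $k =~ s/^-//; return uc $k }

# _rearrange-style: return the values for @$order (in that order) from a
# list of named arguments; the caller decides what to do with them.
sub rearrange {
    my ($self, $order, @args) = @_;
    my %h;
    while (my ($k, $v) = splice @args, 0, 2) { $h{ _norm($k) } = $v }
    return @h{ map { _norm($_) } @$order };
}

# _set_from_args-style: store every allowed parameter on the object and
# create a simple get/set accessor for it on the fly.
sub set_from_args {
    my ($self, $allowed, @args) = @_;
    my %ok = map { _norm($_) => 1 } @$allowed;
    while (my ($k, $v) = splice @args, 0, 2) {
        my $name = lc _norm($k);
        next unless $ok{ uc $name };    # silently ignore unknown parameters
        no strict 'refs';
        *{"My::Wrapper::$name"} = sub {
            my $s = shift;
            $s->{$name} = shift if @_;
            return $s->{$name};
        } unless defined &{"My::Wrapper::$name"};
        $self->$name($v);
    }
    return $self;
}

sub new {
    my ($class, @args) = @_;
    my $self = bless {}, $class;
    return $self->set_from_args([qw(model program)], @args);
}

package main;
```

With this sketch, My::Wrapper->new(-model => 'm.mod', program => 'gmhmmp')
gives an object with model() and program() accessors, whereas
rearrange([qw(SEQ MODEL)], @args) just returns two values.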


From dmessina at wustl.edu  Wed Feb 28 20:40:39 2007
From: dmessina at wustl.edu (Dave Messina)
Date: Wed, 28 Feb 2007 19:40:39 -0600 (CST)
Subject: [Bioperl-l] Problem installing bioperl-1.5.2_102
In-Reply-To: <C20A7237.1DB%sukhinder.sandhu@osumc.edu>
References: <C20A7237.1DB%sukhinder.sandhu@osumc.edu>
Message-ID: <58485.75.33.119.169.1172713239.squirrel@gscmail.wustl.edu>

> t/compat.t     5  1280    60    5   8.33%  25-28 31

This is the test that failed. I think you snipped the part above where the
actual errors causing the failure were printed.


> There seems to be running another CPAN process (pid 7207). Contacting...
> Lockfile /Users/sand60/.cpan/.lock not writeable by you. Cannot proceed.
>     On UNIX try:
>     rm /Users/sand60/.cpan/.lock
>   and then rerun us.
>  at -e line 1
> ###################################################
> And doing what it says, removing some lock file doesn't help.

Are you sure the lock file is really being removed? If so, what was the
error you got when running it after doing that?


Also, this line is important:
>  /usr/bin/make test -- NOT OK

It looks like you're trying to install on OS X. By default, OS X has perl
but not make. So /usr/bin/make probably doesn't exist on your system,
along with lots of other UNIX tools you'll want. To verify this, type:

which /usr/bin/make

on the command line. If you get:
/usr/bin/make: Command not found.

you'll need to install the OS X developer tools, called Xcode. You'll need
to register first, but you can get the latest version at:
http://developer.apple.com/tools/download/

After you do that, reread the BioPerl install docs and try to install
again. Since you don't have root on your machine, be sure to read the part
of the install instructions that describes what to do in that case.


Dave


From hlapp at gmx.net  Wed Feb 28 23:16:38 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 28 Feb 2007 23:16:38 -0500
Subject: [Bioperl-l] Issues with new Bio::Tools::Run modules for
	Genemark and Glimmer
In-Reply-To: <ebf5eb170702281454i52cea9eeqbcaebee5e4ec5e0e@mail.gmail.com>
References: <ebf5eb170702281159y7dcedd88n7bdef2c8bb9d3288@mail.gmail.com>
	<894754FB-2B20-415F-A6FC-7ACE839C6540@uiuc.edu>
	<ebf5eb170702281454i52cea9eeqbcaebee5e4ec5e0e@mail.gmail.com>
Message-ID: <EE9CB4BA-3C6C-4F38-85DB-E0A21FCD8B07@gmx.net>


On Feb 28, 2007, at 5:54 PM, Mark Johnson wrote:

> I don't like using system() due to issues with
> signals (such as the user hitting ctrl-c and taking out the child).  I feel
> better knowing the wrapped executable is in another process disconnected
> from the console.

I'm not sure how the user would be able to take out the child by hitting  
ctrl-c if you run it through system() (except if the parent  
terminates first, but maybe then terminating a run-away child is in  
good order).

I haven't read the IPC::Run POD in full detail, but you will want to  
make sure that if the parent gets killed the child does get killed  
too; otherwise you'll have a run-away process that novices will  
have trouble understanding or terminating.

Other than that, though, IPC::Run seems like a useful module, so  
incurring this as a dependency should be fine.
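
A minimal sketch of one way to cover that with IPC::Run: start the program as a
harness and, in INT and TERM handlers, call kill_kill() on it before the parent
dies. run_captured() is a hypothetical helper name; start, finish, result and
kill_kill come from the IPC::Run POD, which is worth checking against the
version you depend on.

```perl
use strict;
use warnings;
use IPC::Run qw(start finish);

# Run an external program with stdin empty and stdout/stderr captured in
# scalars, making sure the child is cleaned up if the parent is interrupted.
sub run_captured {
    my (@cmd) = @_;
    my ($in, $out, $err) = ('', '', '');
    my $h = start \@cmd, \$in, \$out, \$err;

    # If the user hits ctrl-c (or the parent is sent TERM) while the child
    # is still running, kill the child before dying ourselves.
    local $SIG{INT}  = sub { $h->kill_kill; die "interrupted\n" };
    local $SIG{TERM} = sub { $h->kill_kill; die "terminated\n" };

    # finish() waits for the child; it returns false on a non-zero exit.
    finish $h or die "@cmd exited with status ", $h->result, ": $err";
    return ($out, $err);
}
```

A wrapper would then call something like run_captured($exe, @options) and hand
$out to its parser, with nothing written to the console.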

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cuiw at ncbi.nlm.nih.gov  Thu Feb  1 14:47:38 2007
From: cuiw at ncbi.nlm.nih.gov (Cui, Wenwu (NIH/NLM/NCBI) [C])
Date: Thu, 1 Feb 2007 09:47:38 -0500
Subject: [Bioperl-l] How to get from gi/ref/gb to genomic coordinates ?
In-Reply-To: <45C1059D.1070100@tbi.univie.ac.at>
References: <45C1059D.1070100@tbi.univie.ac.at>
Message-ID: <18C407FD4FFB424292D769FBD68C1987020BB960@NIHCESMLBX8.nih.gov>

This is a simple test from gene ID 3632373 (protein is 46100068) to
contig coordinates: 

perl -MLWP::Simple -e 'print "$_\n" for grep { /<(Gene-source_src[^>]*)>.*<\/\1>/ }
    split /\n/, get(q{http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=gene&id=3632373&retmode=xml})'

You need to translate protein id to gene id though. 
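
One way to do that translation is NCBI's ELink utility, asking for links from
db=protein to db=gene. A minimal sketch, assuming ELink's XML nests linked IDs
as LinkSetDb/DbTo/Link/Id; the helper names are hypothetical, and the regex
parsing is deliberately simple (an XML parser would be more robust).

```perl
use strict;
use warnings;

# Build an ELink URL asking which Gene records are linked to a protein GI.
sub elink_url {
    my ($protein_gi) = @_;
    return 'http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi'
         . "?dbfrom=protein&db=gene&id=$protein_gi";
}

# Pull the linked Gene IDs out of ELink XML: only <Link><Id> entries inside
# a <LinkSetDb> whose <DbTo> is "gene" are returned.
sub gene_ids_from_elink {
    my ($xml) = @_;
    my @ids;
    while ($xml =~ m{<LinkSetDb>(.*?)</LinkSetDb>}gs) {
        my $block = $1;
        next unless $block =~ m{<DbTo>\s*gene\s*</DbTo>};
        push @ids, $block =~ m{<Link>\s*<Id>(\d+)</Id>}g;
    }
    return @ids;
}

# Fetch and parse; LWP::Simple is loaded only when a network call is made.
sub protein_gi_to_gene_ids {
    my ($protein_gi) = @_;
    require LWP::Simple;
    my $xml = LWP::Simple::get(elink_url($protein_gi))
        or die "ELink request failed for $protein_gi\n";
    return gene_ids_from_elink($xml);
}
```

If the link exists as described above, protein_gi_to_gene_ids(46100068) should
yield gene 3632373, which is the ID the one-liner then fetches.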

If the genome is available at Map Viewer, try (the contig name is
NW_101115 from last step)
http://www.ncbi.nlm.nih.gov/mapview/maps.cgi?taxid=5270&gnl=NW_101115&MAPS=genes&cmd=txt

Wenwu Cui, PhD

-----Original Message-----
From: Rainer Machne [mailto:raim at tbi.univie.ac.at] 
Sent: Wednesday, January 31, 2007 4:10 PM
To: bioperl-l at lists.open-bio.org
Subject: [Bioperl-l] How to get from gi/ref/gb to genomic coordinates ?

Dear Bioperl list,

hoping not be on the wrong email list, i would have a short question:

Is there a standard way or are there nice (Bioperl) tools to come from a
gene id (gi) other ids (see below) to the genomic coordinates of the 
respective gene?

We have Fasta files retrieved from NCBI protein Blast in fungal genomes:

 >gi|46100068|gb|EAK85301.1| hypothetical protein UM04252.1 [Ustilago 
maydis 521]
or
 >gi|50292953|ref|XP_448909.1| unnamed protein product [Candida
glabrata]

(we only have gi, ref and gb in my set).

I retrieved all my fasta files from whole fungal genomes with available 
protein sequences at
http://www.ncbi.nlm.nih.gov/sutils/genom_table.cgi?organism=fungi

As I only searched whole finished genomes (not shotgun), I thought it 
would then be easy to get the genomic coordinates and retrieve upstream 
sequences, but we have failed so far to find a consistent way to do this
automatically. Many of the gi entries refer to mRNAs or partial mRNAs 
and the way to the coordinates seems to differ for each case.

Any suggestions would be appreciated.

with kind regards,
Rainer Machne

University of Vienna
Department for Theoretical Chemistry
Theoretical Biochemistry Group
_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l


From raim at tbi.univie.ac.at  Thu Feb  1 12:54:21 2007
From: raim at tbi.univie.ac.at (Rainer Machne)
Date: Thu, 01 Feb 2007 13:54:21 +0100
Subject: [Bioperl-l] How to get from gi/ref/gb to genomic coordinates ?
In-Reply-To: <CB6F853A-B5BF-43FE-B26B-C203724677A3@genetics.utah.edu>
References: <45C1059D.1070100@tbi.univie.ac.at>
	<CB6F853A-B5BF-43FE-B26B-C203724677A3@genetics.utah.edu>
Message-ID: <45C1E2FD.3070709@tbi.univie.ac.at>

Barry and Jason,

thanks for your quick and very helpful replies.

I guess we should have done (or should repeat) our blast search at 
http://fungal.genome.duke.edu/
to get better mapping from proteins to genomes ?

As I retrieved all my proteins via whole genome blasts we should find 
(most of) them in the genbank files ... a good opportunity for me to 
learn some Bioperl and the other packages you mentioned in case we want 
to do more complex analysis later :-)

Thank you very much!

Rainer


Barry Moore wrote:
> Rainer,
> 
> We use a perl library called CGL written by Mark Yandell and colleagues
> (which in turn uses Chris Mungall's BioChaos and Unflattener.pm referred
> to by Jason) for this type of task.  The basic pipeline is: convert
> GenBank files to Chaos XML, then use CGL with those XML files to get
> nice object-oriented access to exons, transcripts, proteins,
> coordinates and more for those genes.  I am currently using this
> with good success on most GenBank genomes (unfortunately I haven't been
> working with the fungal genomes, but it should work fine).  The Ensembl
> API provides similar functionality for Ensembl genomes, but there are not
> very many fungi there.
> 
> http://www.yandell-lab.org/cgl/
> http://www.ensembl.org/info/software/core/core_tutorial.html
> 
> Feel free to contact Mark or myself directly if you are interested in
> using CGL.
> 
> Barry
> 
> On Jan 31, 2007, at 2:09 PM, Rainer Machne wrote:
> 
>> Dear Bioperl list,
>>
>> hoping not be on the wrong email list, i would have a short question:
>>
>> Is there a standard way or are there nice (Bioperl) tools to come from a
>> gene id (gi) other ids (see below) to the genomic coordinates of the
>> respective gene?
>>
>> We have Fasta files retrieved from NCBI protein Blast in fungal genomes:
>>
>> >gi|46100068|gb|EAK85301.1| hypothetical protein UM04252.1 [Ustilago maydis 521]
>> or
>> >gi|50292953|ref|XP_448909.1| unnamed protein product [Candida glabrata]
>>
>> (we only have gi, ref and gb in my set).
>>
>> I retrieved all my fasta files from whole fungal genomes with available
>> protein sequences at
>> http://www.ncbi.nlm.nih.gov/sutils/genom_table.cgi?organism=fungi
>>
>> As I only searched whole finished genomes (not shotgun), I thought it
>> would then be easy to get the genomic coordinates and retrieve upstream
>> sequences, but we have failed so far to find a consistent way to do this
>> automatically. Many of the gi entries refer to mRNAs or partial mRNAs
>> and the way to the coordinates seems to differ for each case.
>>
>> Any suggestions would be appreciated.
>>
>> with kind regards,
>> Rainer Machne
>>
>> University of Vienna
>> Department for Theoretical Chemistry
>> Theoretical Biochemistry Group
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> 


From cjfields at uiuc.edu  Thu Feb  1 17:55:27 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 1 Feb 2007 11:55:27 -0600
Subject: [Bioperl-l] How to get from gi/ref/gb to genomic coordinates ?
In-Reply-To: <45C1E2FD.3070709@tbi.univie.ac.at>
References: <45C1059D.1070100@tbi.univie.ac.at>
	<CB6F853A-B5BF-43FE-B26B-C203724677A3@genetics.utah.edu>
	<45C1E2FD.3070709@tbi.univie.ac.at>
Message-ID: <E0AE71DF-DF6A-4E8A-866F-172AC5EF7B8E@uiuc.edu>


On Feb 1, 2007, at 6:54 AM, Rainer Machne wrote:

> Barry and Jason,
>
> thanks for your quick and very helpful replies.
>
> I guess we should have done (or repeat) our blast search at
> http://fungal.genome.duke.edu/
> to get better mapping from proteins to genomes ?
>
> As I retrieved all my proteins via whole genome blasts we should find
> (most of) them in the genbank files ... a good opportunity for me to
> learn some Bioperl and the other packages you mentioned in case we want
> to do more complex analysis later :-)
>
> Thank you very much!
>
> Rainer

If the data is available in GenBank you could run the BLAST searches  
at NCBI and limit the search with an Entrez query:

http://www.ncbi.nlm.nih.gov/BLAST/blastcgihelp.shtml#entrez_query

Most (all?) genome files are tagged as complete

I'm not sure but there might be a way of doing this via  
Bio::Tools::Run::RemoteBlast.  Jason, any ideas?

chris


From cjfields at uiuc.edu  Thu Feb  1 18:09:16 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 1 Feb 2007 12:09:16 -0600
Subject: [Bioperl-l] How to get from gi/ref/gb to genomic coordinates ?
In-Reply-To: <E0AE71DF-DF6A-4E8A-866F-172AC5EF7B8E@uiuc.edu>
References: <45C1059D.1070100@tbi.univie.ac.at>
	<CB6F853A-B5BF-43FE-B26B-C203724677A3@genetics.utah.edu>
	<45C1E2FD.3070709@tbi.univie.ac.at>
	<E0AE71DF-DF6A-4E8A-866F-172AC5EF7B8E@uiuc.edu>
Message-ID: <748CC48E-D224-4234-A5C4-E33968F17418@uiuc.edu>

> If the data is available in GenBank you could run the BLAST searches
> at NCBI and limit the search with an Entrez query:
>
> http://www.ncbi.nlm.nih.gov/BLAST/blastcgihelp.shtml#entrez_query
>
> Most (all?) genome files are tagged as complete

sorry, didn't finish that...

"Most (all?) genome files are tagged as complete, wgs, in progress,  
etc. and can be limited by taxonomy using Fungi[ORGN] or similar."

chris


From jason at bioperl.org  Thu Feb  1 18:36:02 2007
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 1 Feb 2007 10:36:02 -0800
Subject: [Bioperl-l] How to get from gi/ref/gb to genomic coordinates ?
In-Reply-To: <E0AE71DF-DF6A-4E8A-866F-172AC5EF7B8E@uiuc.edu>
References: <45C1059D.1070100@tbi.univie.ac.at>
	<CB6F853A-B5BF-43FE-B26B-C203724677A3@genetics.utah.edu>
	<45C1E2FD.3070709@tbi.univie.ac.at>
	<E0AE71DF-DF6A-4E8A-866F-172AC5EF7B8E@uiuc.edu>
Message-ID: <D8E2FDBC-AA2E-4EB9-8CB1-F3610776B41C@bioperl.org>


On Feb 1, 2007, at 9:55 AM, Chris Fields wrote:

>
> On Feb 1, 2007, at 6:54 AM, Rainer Machne wrote:
>
>> Barry and Jason,
>>
>> thanks for your quick and very helpful replies.
>>
>> I guess we should have done (or repeat) our blast search at
>> http://fungal.genome.duke.edu/
>> to get better mapping from proteins to genomes ?
>>

Well I'm not quite sure of your exact goals.  To find upstream  
regions of known genes, or look at upstream regions of orthologous  
genes?

You can first figure out orthologs based on protein similarities,  
then go in and extract upstream regions for the orthologous genes (I  
provide a link to a big all-vs-all FASTA result at the bottom of the  
page if you want those results, as well as some pairwise orthology  
assignments, although you may want more or less stringent parameters).

All the GFF and AA data is freely available for download on the site  
for each genome we've annotated or for annotation we've re-formatted  
so you can do things locally and/or modify it to your liking.


>> As I retrieved all my proteins via whole genome blasts we should find
>> (most of) them in the genbank files ... a good opportunity for me to
>> learn some Bioperl and the other packages you mentioned in case we want
>> to do more complex analysis later :-)
>>
>> Thank you very much!
>>
>> Rainer
>
> If the data is available in GenBank you could run the BLAST  
> searches at NCBI and limit the search with an Entrez query:
>
> http://www.ncbi.nlm.nih.gov/BLAST/blastcgihelp.shtml#entrez_query
>
> Most (all?) genome files are tagged as complete
>
> I'm not sure but there might be a way of doing this via  
> Bio::Tools::Run::RemoteBlast.  Jason, any ideas?
>
> chris

--
Jason Stajich
Miller Research Fellow
University of California, Berkeley
lab: 510.642.8441
http://pmb.berkeley.edu/~taylor/people/js.html
http://fungalgenomes.org/


From reenayadav at gmail.com  Thu Feb  1 18:38:03 2007
From: reenayadav at gmail.com (Reena Yadav)
Date: Fri, 2 Feb 2007 00:08:03 +0530
Subject: [Bioperl-l] pdb parser
Message-ID: <76f897dd0702011038v7afe0207gb05465478e026205@mail.gmail.com>

Hi, I need to extract the atomic coordinates from a PDB file (1ake) and do
certain calculations.
I am going stepwise; the steps involved are:
(1) reading the atomic coordinates
(2) writing the result to a file.

I need to understand how to write the whole xyz line to another file.
Could someone help?
R.
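
A minimal plain-Perl sketch of those two steps, assuming a standard PDB-format
file: it keeps ATOM and HETATM records, takes x, y and z from the fixed columns
31-38, 39-46 and 47-54 of the PDB format, and writes one tab-separated line per
atom. The file names in the usage comment are placeholders; BioPerl's
Bio::Structure::IO is another route if you want objects rather than raw columns.

```perl
use strict;
use warnings;

# Read ATOM/HETATM records from a PDB file and return a list of
# [atom name, residue name, chain, residue number, x, y, z].
sub read_pdb_coords {
    my ($path) = @_;
    open my $in, '<', $path or die "Cannot open $path: $!";
    my @atoms;
    while (my $line = <$in>) {
        next unless $line =~ /^(?:ATOM  |HETATM)/;
        # Trim the padding around each fixed-width field.
        my @f = map { my $s = $_; $s =~ s/^\s+|\s+$//g; $s }
            substr($line, 12, 4),    # atom name, columns 13-16
            substr($line, 17, 3),    # residue name, columns 18-20
            substr($line, 21, 1),    # chain ID, column 22
            substr($line, 22, 4),    # residue number, columns 23-26
            substr($line, 30, 8),    # x, columns 31-38
            substr($line, 38, 8),    # y, columns 39-46
            substr($line, 46, 8);    # z, columns 47-54
        push @atoms, \@f;
    }
    close $in;
    return @atoms;
}

# Write the coordinates, one tab-separated line per atom.
sub write_coords {
    my ($path, @atoms) = @_;
    open my $out, '>', $path or die "Cannot write $path: $!";
    print {$out} join("\t", @$_), "\n" for @atoms;
    close $out or die "Cannot close $path: $!";
}

# Usage (file names are placeholders):
#   write_coords('1ake_xyz.txt', read_pdb_coords('1ake.pdb'));
```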


From jason at bioperl.org  Thu Feb  1 13:06:42 2007
From: jason at bioperl.org (sandhya khatal)
Date: Thu, 1 Feb 2007 13:06:42 +0000
Subject: [Bioperl-l] Regarding Bioperl program
Message-ID: <75899ED1-72C6-4272-8CAC-028CF133A0B4@gmail.com>

Respected Sir,
I want to write a program that produces a dendrogram, as the UPGMA
clustering method does, but I want the dendrogram built using the single
linkage or centroid method. Can you help me with this? You have given the
code for trees, but I want a dendrogram as output using either of the above
methods.

Thanks in anticipation.

Regards,
Sandhya Khatal.
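The sketch below shows agglomerative single-linkage clustering over a precomputed, symmetric distance matrix. It is written in plain Perl rather than with a BioPerl module, and the sub names single_link and single_linkage_newick are hypothetical. It returns the dendrogram as a Newick string, which tree viewers (or Bio::TreeIO with the 'newick' format) can read. Each node's height is the distance at which its two children were merged.

```perl
use strict;
use warnings;

# Single-linkage distance between two clusters: the smallest pairwise
# distance between any member of one and any member of the other.
sub single_link {
    my ($c1, $c2, $d) = @_;
    my $min;
    for my $i (@{ $c1->{members} }) {
        for my $j (@{ $c2->{members} }) {
            $min = $d->[$i][$j] if !defined $min || $d->[$i][$j] < $min;
        }
    }
    return $min;
}

# Agglomerative clustering. $names: array ref of labels;
# $d: array ref of array refs holding the distances.
# Branch length = parent merge height minus child height (leaves are 0).
sub single_linkage_newick {
    my ($names, $d) = @_;
    my @clusters = map { +{ members => [$_], newick => $names->[$_], height => 0 } }
                   0 .. $#$names;
    while (@clusters > 1) {
        # Find the closest pair of clusters
        my ($bi, $bj, $best);
        for my $i (0 .. $#clusters - 1) {
            for my $j ($i + 1 .. $#clusters) {
                my $dist = single_link($clusters[$i], $clusters[$j], $d);
                ($bi, $bj, $best) = ($i, $j, $dist)
                    if !defined $best || $dist < $best;
            }
        }
        my ($left, $right) = @clusters[$bi, $bj];
        my $merged = {
            members => [ @{ $left->{members} }, @{ $right->{members} } ],
            newick  => sprintf('(%s:%g,%s:%g)',
                               $left->{newick},  $best - $left->{height},
                               $right->{newick}, $best - $right->{height}),
            height  => $best,
        };
        splice @clusters, $bj, 1;            # $bj > $bi, so $bi is still valid
        splice @clusters, $bi, 1, $merged;   # replace the left cluster in place
    }
    return $clusters[0]{newick} . ';';
}
```

For the centroid method, single_link would be replaced by a distance between cluster centroids. That distance generally needs the original data vectors, not only a distance matrix.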


From jason at bioperl.org  Fri Feb  2 00:55:26 2007
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 1 Feb 2007 16:55:26 -0800
Subject: [Bioperl-l] Fwd: Regarding Bioperl program
References: <394d31ee0702010506j4bbd79dck41d5ac2162eaafdd@mail.gmail.com>
Message-ID: <40020502-3421-407D-85EB-24F420AB699C@bioperl.org>

re-forwarding Sandhya's email to the list so the email address is  
visible.

The approach that is coded in bioperl is for distance based data such  
as evolutionary distance of DNA or protein sequences - I assume you  
are talking about clustering expression data? You may want to focus  
on the available literature and toolkits that focus on expression  
data - something BioPerl doesn't deliberately focus on right now.

-jason
Begin forwarded message:

> From: "sandhya khatal" <sandhya.khatal at gmail.com>
> Date: February 1, 2007 5:06:42 AM PST
> To: jason at bioperl.org
> Subject: Regarding Bioperl program
>
> Respected Sir,
>                      I want to do a program which gives dendrogram  
> like
> UPGMA a clustering method, but i want this dendrogram by using single
> linkage or centroid method.Can u help me for this.U have given the  
> code for
> tree but i want dendrogram as output by using above any method.
>
> Thanks for anticipating.
>
> Regards,
> Sandhya Khatal.

--
Jason Stajich
Miller Research Fellow
University of California, Berkeley
lab: 510.642.8441
http://pmb.berkeley.edu/~taylor/people/js.html
http://fungalgenomes.org/


From lzhtom at hotmail.com  Fri Feb  2 03:20:10 2007
From: lzhtom at hotmail.com (zhihua li)
Date: Fri, 02 Feb 2007 03:20:10 +0000
Subject: [Bioperl-l] Bio::index::Fasta- where's the indexed file?
Message-ID: <BAY110-F24A936E35D7C6B9059EE3CC79B0@phx.gbl>




From lzhtom at hotmail.com  Fri Feb  2 03:27:39 2007
From: lzhtom at hotmail.com (zhihua li)
Date: Fri, 02 Feb 2007 03:27:39 +0000
Subject: [Bioperl-l] Bio::index::Fasta- where's the indexed file?
Message-ID: <BAY110-F9A27ED904272F0D2D4BB6C79B0@phx.gbl>

Sorry guys, the former empty mail was sent out by mistake.

I'm using Bio::Index::Fasta to index a file containing lots of sequences in
fasta format. All is fine except one thing.

According to the bioperl tutorial and the documents, the following code
will make an indexed file:

my $inx = Bio::Index::Fasta->new(-filename => "test.fasta.idx",
                                     -write_flag => 1);
    $inx->make_index("test.fasta");

And in another script I can access the indexed file by saying

$ENV{BIOPERL_INDEX} = "."; # find index in current directory
 my $inx = Bio::Index::Fasta->new(-filename => "test.fasta.idx");
my $seq=$inx->fetch("ent1001");        #fetch the sequence named ent1001

However, after running the first script, I cannot find a new file 
test.fasta.idx in my current directory. And not surprisingly, when I ran 
the second script, perl told me it couldn't find "test.fasta.idx".

What's going on here?

Thanks a lot!



From jason at bioperl.org  Fri Feb  2 06:24:44 2007
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 1 Feb 2007 22:24:44 -0800
Subject: [Bioperl-l] Bio::index::Fasta- where's the indexed file?
In-Reply-To: <BAY110-F9A27ED904272F0D2D4BB6C79B0@phx.gbl>
References: <BAY110-F9A27ED904272F0D2D4BB6C79B0@phx.gbl>
Message-ID: <CFD213B9-5195-450F-80ED-E956EEF50F59@bioperl.org>

I don't think BIOPERL_INDEX does anything in the module, so that
documentation is not quite right. The env variable is used in the
scripts/index/bp_index and bp_fetch scripts, so maybe a cut+paste job
went bad somewhere.

You need to specify the full path you want with -filename; you can
just prepend BIOPERL_INDEX to the filename, like:
-filename => $ENV{BIOPERL_INDEX}."/$index"
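Here is a small sketch of that suggestion. It assumes the index should go into $ENV{BIOPERL_INDEX} when that variable is set, and into the current directory otherwise. The helper names index_path, build_index and fetch_seq are hypothetical. The Bio::Index::Fasta calls (new with -filename and -write_flag, make_index, fetch) are the ones from the original post, and they are loaded with require only when those helpers are called.

```perl
use strict;
use warnings;
use File::Spec;

# Build the full index path: an explicit directory wins, then
# $ENV{BIOPERL_INDEX}, then the current directory. The module itself does
# not appear to read BIOPERL_INDEX, so the caller assembles the path.
sub index_path {
    my ($name, $dir) = @_;
    $dir = $ENV{BIOPERL_INDEX} unless defined $dir;
    $dir = '.' unless defined $dir && length $dir;
    return File::Spec->catfile($dir, $name);
}

# Create (or overwrite) the index for a FASTA file; requires BioPerl.
sub build_index {
    my ($fasta, $name) = @_;
    require Bio::Index::Fasta;
    my $inx = Bio::Index::Fasta->new(-filename   => index_path($name),
                                     -write_flag => 1);
    $inx->make_index($fasta);
    return $inx;
}

# Reopen an existing index and fetch one sequence by its ID.
sub fetch_seq {
    my ($name, $id) = @_;
    require Bio::Index::Fasta;
    my $inx = Bio::Index::Fasta->new(-filename => index_path($name));
    return $inx->fetch($id);
}
```

Because both scripts resolve the name through the same helper, the writer and the reader can no longer disagree about which directory the index lives in.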

-jason
On Feb 1, 2007, at 7:27 PM, zhihua li wrote:

> Sorry guys, the former empty mail was sent out by mistake.
>
> I'm using Bio::index::Fasta to index a file containing lots of  
> sequences in fasta format. All is fine except one thing.
>
> According to the bioperl tutorial and the documents, the following  
> code will make a indexed file:
>
> my $inx = Bio::Index::Fasta->new(-filename => "test.fasta.idx",
>                                     -write_flag => 1);
>    $inx->make_index("test.fasta");
>
> And in another script I can access the indexed file by sayinig
>
> $ENV{BIOPERL_INDEX} = "."; # find index in current directory
> my $inx = Bio::Index::Fasta->new(-filename => "test.fasta.idx");
> my $seq=$inx->fetch("ent1001");        #fetch the sequence named  
> ent1001
>
> However, after running the first script, I cannot find a new file  
> test.fasta.idx in my current directory. And not surprisingly, when  
> I ran the second script, perl told me it couldn't find  
> "test.fasta.idx".
>
> What's going on here?
>
> Thanks a lot!
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
Miller Research Fellow
University of California, Berkeley
lab: 510.642.8441
http://pmb.berkeley.edu/~taylor/people/js.html
http://fungalgenomes.org/


From marian.thieme at lycos.de  Fri Feb  2 10:06:09 2007
From: marian.thieme at lycos.de (marian thieme)
Date: Fri, 2 Feb 2007 10:06:09 +0000
Subject: [Bioperl-l] seqDiff
Message-ID: <101051013116870@lycos-europe.com>

An HTML attachment was scrubbed...
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070202/cb3feed1/attachment-0004.html>

From marian.thieme at lycos.de  Fri Feb  2 11:37:05 2007
From: marian.thieme at lycos.de (marian thieme)
Date: Fri, 2 Feb 2007 11:37:05 +0000
Subject: [Bioperl-l] susp. header
Message-ID: <188661178024725@lycos-europe.com>

An HTML attachment was scrubbed...
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070202/d3c3535c/attachment-0004.html>

From lubapardo at gmail.com  Fri Feb  2 14:31:06 2007
From: lubapardo at gmail.com (Luba Pardo)
Date: Fri, 2 Feb 2007 15:31:06 +0100
Subject: [Bioperl-l] Problem using Bio::DB::Query::GenBank;
Message-ID: <58ff33550702020631l4e7bc59dmabcf8c72fa67a6d5@mail.gmail.com>

Hello, (I am using bioperl-1.5.2_100, linux machine)
I am trying to get the ids of a list of genes using the module
Bio::DB::Query::GenBank. I have the following code:

use Bio::DB::Query::GenBank;
use strict;
use warnings;

open (READER_1,"list.txt") || die "\n I can't open the file READER_1 !!\n";
my @a1=<READER_1>;
close (READER_1);

for (my $i=0; $i<=$#a1;$i=$i+1 ) {
        my @a1_s=split/\s+/,$a1[$i];

my $query_string = ' Homo Sapiens[Organism] AND $a1[$i] ';
   my $query = Bio::DB::Query::GenBank->new(-db=>'Protein',
                                            -query=>$query_string
                                            );
   my $count = $query->count;
   my @ids   = $query->ids;


print " gene: $a1[$i] first id is $ids[0]  o no? \n";

I want to tell the program to get all the genes contained in the file
list.txt and to retrieve the ids from GenBank. However, the program gives me
the following error:

------------EXCEPTION: Bio::Root::Exception -------------
MSG: Id list has been truncated even after maxids requested
STACK: Error::throw
STACK: Bio::Root::Root::throw
/usr/lib/perl5/site_perl/5.8.1/Bio/Root/Root.pm:359
STACK: Bio::DB::Query::WebQuery::_fetch_ids
/usr/lib/perl5/site_perl/5.8.1/Bio/DB/Query/WebQuery.pm:236
STACK: Bio::DB::Query::WebQuery::ids
/usr/lib/perl5/site_perl/5.8.1/Bio/DB/Query/WebQuery.pm:200
STACK: query.pl:27
------------------
Is it a problem if I use $a1[$i] in place of the name of the
gene?
Thank you in advance for any attention you may pay to this message.
Regards,
Luba Pardo


From hlapp at gmx.net  Fri Feb  2 15:44:02 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Fri, 2 Feb 2007 10:44:02 -0500
Subject: [Bioperl-l] susp. header
In-Reply-To: <188661178024725@lycos-europe.com>
References: <188661178024725@lycos-europe.com>
Message-ID: <EE6A34C7-0579-487E-B529-1F82E714793D@gmx.net>

You are sending HTML emails. You should configure your mailer to  
ideally just send plain text. If you really must have fancy formatted  
emails (i.e., HTML-formatted emails), then configure it such that the  
mailer will send a plain text and a HTML version.

(Many spam filters will flag email the body of which consists of only  
an HTML attachment.)

	-hilmar

On Feb 2, 2007, at 6:37 AM, marian thieme wrote:

> why each message I sent to this list is considered to have a susp.  
> header ?
>
> Marian

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cain.cshl at gmail.com  Fri Feb  2 16:03:16 2007
From: cain.cshl at gmail.com (Scott Cain)
Date: Fri, 02 Feb 2007 11:03:16 -0500
Subject: [Bioperl-l] [Gmod-schema] beginning work on SeqIO::chadoxml
In-Reply-To: <675F525B-D3EB-4C0A-A777-7683F2A8F823@duke.edu>
References: <1170359746.2706.622.camel@localhost.localdomain>
	<675F525B-D3EB-4C0A-A777-7683F2A8F823@duke.edu>
Message-ID: <1170432196.2706.661.camel@localhost.localdomain>

Hi Hilmar,

That is a good idea; when I started down this road, it felt like there
would only be a few things that I might want to allow to be different,
but I think you are right that having one standard implementation that
can be subclassed for legacy systems is a good thing.

Scott


On Fri, 2007-02-02 at 10:09 -0500, Hilmar Lapp wrote:
> On Feb 1, 2007, at 2:55 PM, Scott Cain wrote:
> 
> > The second main change was to introduce a -flybase_compat argument  
> > when
> > initializing the Bio::SeqIO writer, so that 'old style' cv and cvterms
> > (that are compatable with flybase) will be used, but now the default
> > will be to use current standards:
> 
> Just my $0.02 ... obviously, Flybase may be the only organization  
> that uses an 'old style' or any other way not compliant with 'current  
> standards' (presumably SO), but if it's not the only one then this  
> approach won't scale.
> 
> Also, an argument -flybase_compat suggests to the unsuspecting that  
> this is an endorsed flavor of the standard and fine to use for  
> everyone else too.
> 
> If Flybase is idiosyncratic in this way, why not make chadoxml.pm  
> compliant with the standard as we all want it, keep it free from  
> litter caused by usage of old versions of SO, and create a second  
> module fb-chadoxml.pm that inherits from the first and merely  
> overrides a few things so that it works for Flybase. This way, other  
> organizations with similar needs can follow the path and create their  
> own xyz-chadoxml.pm, rather than having to muck around in the  
> chadoxml.pm that comes with the distribution.
> 
> I'm not sure I fully grasp the underlying issue, so I may not make  
> much sense here. Apologies if that's the case ...
> 
> 	-hilmar
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   cain.cshl at gmail.com
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070202/2488afc4/attachment.sig>

From bosborne11 at verizon.net  Fri Feb  2 15:27:44 2007
From: bosborne11 at verizon.net (Brian Osborne)
Date: Fri, 02 Feb 2007 10:27:44 -0500
Subject: [Bioperl-l] [Gmod-schema] beginning work on SeqIO::chadoxml
In-Reply-To: <675F525B-D3EB-4C0A-A777-7683F2A8F823@duke.edu>
Message-ID: <C1E8C2A0.C967%bosborne11@verizon.net>

Hilmar,

I second your motion, good idea. Let's keep the standard module nice and
clean.

Brian O.


On 2/2/07 10:09 AM, "Hilmar Lapp" <hlapp at duke.edu> wrote:

> and create a second
> module fb-chadoxml.pm that inherits from the first and merely
> overrides a few things so that it works for Flybase


From Kevin.M.Brown at asu.edu  Fri Feb  2 15:52:20 2007
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Fri, 2 Feb 2007 08:52:20 -0700
Subject: [Bioperl-l] Problem using Bio::DB::Query::GenBank;
References: <58ff33550702020631l4e7bc59dmabcf8c72fa67a6d5@mail.gmail.com>
Message-ID: <1A4207F8295607498283FE9E93B775B402AABA1C@EX02.asurite.ad.asu.edu>

It looks like you have some problems with the code you posted.

use strict;
use warnings;
use Bio::DB::Query::GenBank;

open(my $reader, '<', 'list.txt') or die "\n I can't open the file list.txt: $!\n";
my @a1 = <$reader>;
close($reader);

for (my $i = 0; $i < @a1; $i++) {
    # strip the trailing newline, otherwise it ends up in the query
    chomp $a1[$i];

    # is this necessary? You don't seem to use @a1_s anywhere later in your code.
    my @a1_s = split /\s+/, $a1[$i];

    # you enclosed the variable in '', which means perl won't interpolate it;
    # changed the query so that perl can evaluate the variable
    my $query_string = 'Homo sapiens[Organism] AND ' . $a1[$i];
    my $query = Bio::DB::Query::GenBank->new(-db    => 'protein',
                                             -query => $query_string);
    my $count = $query->count;
    my @ids   = $query->ids;

    print " gene: $a1[$i] first id is ",
        (defined $ids[0] ? $ids[0] : 'none'), "\n";
}
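The string handling can be isolated and checked without contacting NCBI. This is a hedged sketch, and the sub name build_queries is hypothetical. Lines read from list.txt keep their trailing newline unless it is removed, so the helper trims whitespace, skips blank lines, and concatenates the gene name instead of placing it inside single quotes. Each resulting string could then be passed as -query to Bio::DB::Query::GenBank->new.

```perl
use strict;
use warnings;

# Turn raw lines from a gene list into Entrez query strings.
# $organism is the organism name, e.g. 'Homo sapiens'.
sub build_queries {
    my ($organism, @lines) = @_;
    my @queries;
    for my $line (@lines) {
        (my $term = $line) =~ s/^\s+|\s+$//g;    # drop newline and padding
        next unless length $term;                # skip blank lines
        # concatenation, so $term is really inserted into the query
        push @queries, $organism . '[Organism] AND ' . $term;
    }
    return @queries;
}
```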

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Luba Pardo
Sent: Friday, February 02, 2007 7:31 AM
To: bioperl-l at lists.open-bio.org
Subject: [Bioperl-l] Problem using Bio::DB::Query::GenBank;

Hello, (I am using bioperl-1.5.2_100, linux machine) I am trying to get
the ids of a list of genes using the module Bio::DB::Query:GenBank. I
have the following code:

use Bio::DB::Query::GenBank;
use strict;
use warnings;

open (READER_1,"list.txt") || die "\n I can't open the file READER_1
!!\n"; my @a1=<READER_1>; close (READER_1);

for (my $i=0; $i<=$#a1;$i=$i+1 ) {
        my @a1_s=split/\s+/,$a1[$i];

my $query_string = ' Homo Sapiens[Organism] AND $a1[$i] ';
   my $query = Bio::DB::Query::GenBank->new(-db=>'Protein',

-query=>$query_string
                                            );
   my $count = $query->count;
   my @ids   = $query->ids;


print " gene: $a1[$i] first id is $ids[0]  o no? \n";

I want to tell the program to get all the genes contained in the file
list.txt and to retrieve the ids from GenBank. However the program gives
me the following error:

------------EXCEPTION: Bio::Root::Exception -------------
MSG: Id list has been truncated even after maxids requested
STACK: Error::throw
STACK: Bio::Root::Root::throw
/usr/lib/perl5/site_perl/5.8.1/Bio/Root/Root.pm:359
STACK: Bio::DB::Query::WebQuery::_fetch_ids
/usr/lib/perl5/site_perl/5.8.1/Bio/DB/Query/WebQuery.pm:236
STACK: Bio::DB::Query::WebQuery::ids
/usr/lib/perl5/site_perl/5.8.1/Bio/DB/Query/WebQuery.pm:200
STACK: query.pl:27
------------------
Is that a problem if I try to use the $a1[$i] to replace the name of the
gene?
I thank before hand for the attention you may pay to this message
Regards, Luba Pardo


From cjfields at uiuc.edu  Fri Feb  2 16:37:49 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 2 Feb 2007 10:37:49 -0600
Subject: [Bioperl-l] [Gmod-schema] beginning work on SeqIO::chadoxml
In-Reply-To: <1170432196.2706.661.camel@localhost.localdomain>
References: <1170359746.2706.622.camel@localhost.localdomain>
	<675F525B-D3EB-4C0A-A777-7683F2A8F823@duke.edu>
	<1170432196.2706.661.camel@localhost.localdomain>
Message-ID: <64E2727F-2052-417B-878E-0F7135A72FBC@uiuc.edu>

I was going to suggest allowing one to switch out XML handlers/writers
based on the style (à la XML::SAX), but I see that chadoxml currently
uses XML::Writer and there is no next_seq() implemented.
Oh well...

chris

On Feb 2, 2007, at 10:03 AM, Scott Cain wrote:

> Hi Hilmar,
>
> That is a good idea; when I started down this road, it felt like there
> would only be a few things that I might want to allow to be different,
> but I think you are right that having one standard implementation that
> can be subclassed for legacy systems is a good thing.
>
> Scott
>
>
> On Fri, 2007-02-02 at 10:09 -0500, Hilmar Lapp wrote:
>> On Feb 1, 2007, at 2:55 PM, Scott Cain wrote:
>>
>>> The second main change was to introduce a -flybase_compat argument
>>> when
>>> initializing the Bio::SeqIO writer, so that 'old style' cv and  
>>> cvterms
>>> (that are compatable with flybase) will be used, but now the default
>>> will be to use current standards:
>>
>> Just my $0.02 ... obviously, Flybase may be the only organization
>> that uses an 'old style' or any other way not compliant with 'current
>> standards' (presumably SO), but if it's not the only one then this
>> approach won't scale.
>>
>> Also, an argument -flybase_compat suggests to the unsuspecting that
>> this is an endorsed flavor of the standard and fine to use for
>> everyone else too.
>>
>> If Flybase is idiosyncratic in this way, why not make chadoxml.pm
>> compliant with the standard as we all want it, keep it free from
>> litter caused by usage of old versions of SO, and create a second
>> module fb-chadoxml.pm that inherits from the first and merely
>> overrides a few things so that it works for Flybase. This way, other
>> organizations with similar needs can follow the path and create their
>> own xyz-chadoxml.pm, rather than having to muck around in the
>> chadoxml.pm that comes with the distribution.
>>
>> I'm not sure I fully grasp the underlying issue, so I may not make
>> much sense here. Apologies if that's the case ...
>>
>> 	-hilmar
> -- 
> ---------------------------------------------------------------------- 
> --
> Scott Cain, Ph. D.                                    
> cain.cshl at gmail.com
> GMOD Coordinator (http://www.gmod.org/)                      
> 216-392-3087
> Cold Spring Harbor Laboratory

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From hlapp at gmx.net  Fri Feb  2 16:45:30 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Fri, 2 Feb 2007 11:45:30 -0500
Subject: [Bioperl-l] [Gmod-schema] beginning work on SeqIO::chadoxml
In-Reply-To: <64E2727F-2052-417B-878E-0F7135A72FBC@uiuc.edu>
References: <1170359746.2706.622.camel@localhost.localdomain>
	<675F525B-D3EB-4C0A-A777-7683F2A8F823@duke.edu>
	<1170432196.2706.661.camel@localhost.localdomain>
	<64E2727F-2052-417B-878E-0F7135A72FBC@uiuc.edu>
Message-ID: <3A3FF1B0-129A-410D-9DE2-D1CB28015C9A@gmx.net>

There must be at least a stub for next_seq(). It may throw a
not-implemented exception, but it should not just be absent.

	-hilmar

On Feb 2, 2007, at 11:37 AM, Chris Fields wrote:

> I was going to suggest maybe allowing one to switch out XML  
> handlers/writers based on the style (ala XML::SAX), but I see that  
> chadoxml currently uses XML::Writer and there is no next_seq()  
> implemented.  Oh well...
>
> chris
>
> On Feb 2, 2007, at 10:03 AM, Scott Cain wrote:
>
>> Hi Hilmar,
>>
>> That is a good idea; when I started down this road, it felt like  
>> there
>> would only be a few things that I might want to allow to be  
>> different,
>> but I think you are right that having one standard implementation  
>> that
>> can be subclassed for legacy systems is a good thing.
>>
>> Scott
>>
>>
>> On Fri, 2007-02-02 at 10:09 -0500, Hilmar Lapp wrote:
>>> On Feb 1, 2007, at 2:55 PM, Scott Cain wrote:
>>>
>>>> The second main change was to introduce a -flybase_compat argument
>>>> when
>>>> initializing the Bio::SeqIO writer, so that 'old style' cv and  
>>>> cvterms
>>>> (that are compatable with flybase) will be used, but now the  
>>>> default
>>>> will be to use current standards:
>>>
>>> Just my $0.02 ... obviously, Flybase may be the only organization
>>> that uses an 'old style' or any other way not compliant with  
>>> 'current
>>> standards' (presumably SO), but if it's not the only one then this
>>> approach won't scale.
>>>
>>> Also, an argument -flybase_compat suggests to the unsuspecting that
>>> this is an endorsed flavor of the standard and fine to use for
>>> everyone else too.
>>>
>>> If Flybase is idiosyncratic in this way, why not make chadoxml.pm
>>> compliant with the standard as we all want it, keep it free from
>>> litter caused by usage of old versions of SO, and create a second
>>> module fb-chadoxml.pm that inherits from the first and merely
>>> overrides a few things so that it works for Flybase. This way, other
>>> organizations with similar needs can follow the path and create  
>>> their
>>> own xyz-chadoxml.pm, rather than having to muck around in the
>>> chadoxml.pm that comes with the distribution.
>>>
>>> I'm not sure I fully grasp the underlying issue, so I may not make
>>> much sense here. Apologies if that's the case ...
>>>
>>> 	-hilmar
>> -- 
>> --------------------------------------------------------------------- 
>> ---
>> Scott Cain, Ph. D.                                    
>> cain.cshl at gmail.com
>> GMOD Coordinator (http://www.gmod.org/)                      
>> 216-392-3087
>> Cold Spring Harbor Laboratory
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cain.cshl at gmail.com  Fri Feb  2 17:02:32 2007
From: cain.cshl at gmail.com (Scott Cain)
Date: Fri, 02 Feb 2007 12:02:32 -0500
Subject: [Bioperl-l] [Gmod-schema] beginning work on SeqIO::chadoxml
In-Reply-To: <3A3FF1B0-129A-410D-9DE2-D1CB28015C9A@gmx.net>
References: <1170359746.2706.622.camel@localhost.localdomain>
	<675F525B-D3EB-4C0A-A777-7683F2A8F823@duke.edu>
	<1170432196.2706.661.camel@localhost.localdomain>
	<64E2727F-2052-417B-878E-0F7135A72FBC@uiuc.edu>
	<3A3FF1B0-129A-410D-9DE2-D1CB28015C9A@gmx.net>
Message-ID: <1170435752.2706.676.camel@localhost.localdomain>

Ah, I'll go ahead and add one, though it will just throw an exception
because this is a write-only adapter.
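Here is a minimal sketch of such a stub. It uses a hypothetical class My::WriteOnlySeqIO and Carp::croak so that it runs without BioPerl. Inside a real Bio::SeqIO subclass one would more likely call $self->throw(...) from Bio::Root::Root. The write_seq here is only a placeholder that prints sequence IDs.

```perl
use strict;
use warnings;

package My::WriteOnlySeqIO;    # hypothetical class, not part of BioPerl
use Carp qw(croak);

sub new {
    my ($class, %args) = @_;
    return bless { fh => $args{-fh} }, $class;
}

# Reading is not supported, but the method exists so callers expecting the
# usual next_seq/write_seq pair get a clear error instead of
# "Can't locate object method".
sub next_seq {
    my ($self) = @_;
    croak ref($self) . "::next_seq is not implemented: this is a write-only format";
}

# Placeholder writer: prints the ID of each sequence hash ref, one per line.
sub write_seq {
    my ($self, @seqs) = @_;
    print { $self->{fh} } $_->{id}, "\n" for @seqs;
    return 1;
}

package main;
```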

Scott


On Fri, 2007-02-02 at 11:45 -0500, Hilmar Lapp wrote:
> There must be at least a stub for next_seq(). It may throw a not- 
> implemented exception, but it should not just be absent.
> 
> 	-hilmar
> 
> On Feb 2, 2007, at 11:37 AM, Chris Fields wrote:
> 
> > I was going to suggest maybe allowing one to switch out XML  
> > handlers/writers based on the style (ala XML::SAX), but I see that  
> > chadoxml currently uses XML::Writer and there is no next_seq()  
> > implemented.  Oh well...
> >
> > chris
> >
> > On Feb 2, 2007, at 10:03 AM, Scott Cain wrote:
> >
> >> Hi Hilmar,
> >>
> >> That is a good idea; when I started down this road, it felt like  
> >> there
> >> would only be a few things that I might want to allow to be  
> >> different,
> >> but I think you are right that having one standard implementation  
> >> that
> >> can be subclassed for legacy systems is a good thing.
> >>
> >> Scott
> >>
> >>
> >> On Fri, 2007-02-02 at 10:09 -0500, Hilmar Lapp wrote:
> >>> On Feb 1, 2007, at 2:55 PM, Scott Cain wrote:
> >>>
> >>>> The second main change was to introduce a -flybase_compat argument
> >>>> when
> >>>> initializing the Bio::SeqIO writer, so that 'old style' cv and  
> >>>> cvterms
> >>>> (that are compatable with flybase) will be used, but now the  
> >>>> default
> >>>> will be to use current standards:
> >>>
> >>> Just my $0.02 ... obviously, Flybase may be the only organization
> >>> that uses an 'old style' or any other way not compliant with  
> >>> 'current
> >>> standards' (presumably SO), but if it's not the only one then this
> >>> approach won't scale.
> >>>
> >>> Also, an argument -flybase_compat suggests to the unsuspecting that
> >>> this is an endorsed flavor of the standard and fine to use for
> >>> everyone else too.
> >>>
> >>> If Flybase is idiosyncratic in this way, why not make chadoxml.pm
> >>> compliant with the standard as we all want it, keep it free from
> >>> litter caused by usage of old versions of SO, and create a second
> >>> module fb-chadoxml.pm that inherits from the first and merely
> >>> overrides a few things so that it works for Flybase. This way, other
> >>> organizations with similar needs can follow the path and create  
> >>> their
> >>> own xyz-chadoxml.pm, rather than having to muck around in the
> >>> chadoxml.pm that comes with the distribution.
> >>>
> >>> I'm not sure I fully grasp the underlying issue, so I may not make
> >>> much sense here. Apologies if that's the case ...
> >>>
> >>> 	-hilmar
> >> -- 
> >> --------------------------------------------------------------------- 
> >> ---
> >> Scott Cain, Ph. D.                                    
> >> cain.cshl at gmail.com
> >> GMOD Coordinator (http://www.gmod.org/)                      
> >> 216-392-3087
> >> Cold Spring Harbor Laboratory
> >
> > Christopher Fields
> > Postdoctoral Researcher
> > Lab of Dr. Robert Switzer
> > Dept of Biochemistry
> > University of Illinois Urbana-Champaign
> >
> >
> >
> 
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   cain.cshl at gmail.com
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070202/9acaa3c3/attachment.sig>

From peili at morgan.harvard.edu  Fri Feb  2 15:56:56 2007
From: peili at morgan.harvard.edu (Peili Zhang)
Date: Fri, 02 Feb 2007 10:56:56 -0500
Subject: [Bioperl-l] [Gmod-schema] beginning work on SeqIO::chadoxml
In-Reply-To: <C1E8C2A0.C967%bosborne11@verizon.net>
References: <C1E8C2A0.C967%bosborne11@verizon.net>
Message-ID: <1170431816.6583.47.camel@jacks>

I 'third' Hilmar's opinion. FlyBase's fingerprint shows in the module
because I wrote it for FlyBase's data loading task. There is no need to
worry about FlyBase compatibility in making the module generic; in fact,
at FlyBase I tweak the module frequently to make it work for different
scenarios.

cheers,
peili
 
On Fri, 2007-02-02 at 10:27, Brian Osborne wrote:
> Hilmar,
> 
> I second your motion, good idea. Let's keep the standard module nice and
> clean.
> 
> Brian O.
> 
> 
> On 2/2/07 10:09 AM, "Hilmar Lapp" <hlapp at duke.edu> wrote:
> 
> > and create a second
> > module fb-chadoxml.pm that inherits from the first and merely
> > overrides a few things so that it works for Flybase
> 
> 
> _______________________________________________
> Gmod-schema mailing list
> Gmod-schema at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/gmod-schema
> 


From cain.cshl at gmail.com  Fri Feb  2 18:05:47 2007
From: cain.cshl at gmail.com (Scott Cain)
Date: Fri, 02 Feb 2007 13:05:47 -0500
Subject: [Bioperl-l] [Gmod-schema] beginning work on SeqIO::chadoxml
In-Reply-To: <1170431816.6583.47.camel@jacks>
References: <C1E8C2A0.C967%bosborne11@verizon.net>
	<1170431816.6583.47.camel@jacks>
Message-ID: <1170439549.2706.683.camel@localhost.localdomain>

Hi Peili,

A little while ago I checked in Bio::SeqIO::flybase_chadoxml, which is
fairly simple. My suggestion is that when you make tweaks for different
scenarios, you turn the things you are tweaking into methods in
BSIO::chadoxml, override them in flybase_chadoxml, and commit at least
the chadoxml module, so that it is more flexible when other people have
similar scenarios.
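The plain-Perl sketch below shows that layout. The class names, the cv_name hook and its return values are hypothetical placeholders and do not reproduce the real Bio::SeqIO::chadoxml or Bio::SeqIO::flybase_chadoxml code. The base class calls a small hook method, and the site-specific subclass overrides only that hook.

```perl
use strict;
use warnings;

# Hypothetical base writer: site-dependent choices live in small hook methods.
package My::ChadoXMLWriter;

sub new { my ($class) = @_; return bless {}, $class }

# Hook: name of the controlled vocabulary to use (placeholder value).
sub cv_name { return 'standard-cv' }

# Uses the hook, so subclasses change the output without copying this method.
sub format_term {
    my ($self, $term) = @_;
    return $self->cv_name . ':' . $term;
}

# Hypothetical site-specific subclass that overrides only the hook.
package My::SiteChadoXMLWriter;
our @ISA = ('My::ChadoXMLWriter');

sub cv_name { return 'site-legacy-cv' }

package main;
```

Other sites with similar needs would add their own small subclass rather than editing the base module.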

Scott


On Fri, 2007-02-02 at 10:56 -0500, Peili Zhang wrote:
> i 'third' Hilmar's opinion. flybase's fingerprint is shown in the module
> because i wrote it for fb's data loading task. no need to worry about
> flybase compatibility in making the module generic. in fact, at flybase,
> i tweak the module frequently to make it work for different scenarios.
> 
> cheers,
> peili
>  
> On Fri, 2007-02-02 at 10:27, Brian Osborne wrote:
> > Hilmar,
> > 
> > I second your motion, good idea. Let's keep the standard module nice and
> > clean.
> > 
> > Brian O.
> > 
> > 
> > On 2/2/07 10:09 AM, "Hilmar Lapp" <hlapp at duke.edu> wrote:
> > 
> > > and create a second
> > > module fb-chadoxml.pm that inherits from the first and merely
> > > overrides a few things so that it works for Flybase
> > 
> > 
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   cain.cshl at gmail.com
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070202/a6d23204/attachment.sig>

From cjfields at uiuc.edu  Fri Feb  2 20:33:46 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 2 Feb 2007 14:33:46 -0600
Subject: [Bioperl-l] seqDiff
In-Reply-To: <101051013116870@lycos-europe.com>
References: <101051013116870@lycos-europe.com>
Message-ID: <C752CE9D-61A7-4DF2-958E-7162723D0BA9@uiuc.edu>

Judging by the code you'll have to recreate the SeqDiff while  
iterating through various alleles; there is no method to remove  
particular variants or purge them (at least I couldn't find one).

I also noticed SeqDiff doesn't support deletions/insertions either;  
using a null allele (no seq) or leaving out either the mutant or  
original allele leads to errors.  I'll look into the latter, and I  
may try to add a method to at least purge variants and reset dna_mut().

chris

On Feb 2, 2007, at 4:06 AM, marian thieme wrote:

> HI,
>
> Is there a way to output all mutated sequences of a SeqDiff object?
> Suppose I add some variants via:
>
> $dnamut->add_Allele($a2);
> $dnamut->add_Allele($a3);
> $seqDiff->add_Variant($dnamut);
>
> and afterwards want to access the alternative sequences via
> $seqDiff->dna_mut()
>
> Which allele is chosen when using dna_mut()? And can I control
> whether the first or the second alternate sequence is returned?
> If yes, how can I do this?
>
> Regards,
> Marian
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From MEC at stowers-institute.org  Fri Feb  2 21:47:08 2007
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Fri, 2 Feb 2007 15:47:08 -0600
Subject: [Bioperl-l] Bio::DB::SeqFeature treatment of tags and
	annotations
Message-ID: <CED81D34E37D5043A1211565277A51E50768EDB3@exchkc02.stowers-institute.org>

Lincoln,
 
I don't think that adding this directive is a good idea after all
either.
 
But, I see that you remap the ID= to a load_id attribute which is
preserved in the Bio::DB::SeqFeatureStore database.
 
And then it gets squelched during GFF production by
NormalizedFeature::format_attributes.
 
However, if ID is prone to clashes, then certainly simply renaming the
attribute to be load_id does not preclude clashes from happening, and
only courts disaster.  Don't you think?
 
I'm a little blurry on the GFF3Loader, but it looks like you're using
load_id to facilitate loading parent/child features out of order.  Is
that right?  If so, I suggest you delete all load_id attributes
immediately after performing a load.  They have no further use.
 
Or, instead of a `round-trip-ids` directive, you might consider giving
the GFF3Loader an IDAttribute option, which would let users of the
loader preserve the ID values under a named attribute.
 
In my case, munging Flybase GFF, I would then use it like this:
 
bp_seqfeature_load.PLS --fast --IDAttribute flybaseID
 
which would preserve the ID values in the database but under the
FlybaseID attribute for features so loaded.
 
---------------------------------------------
 
On a related topic:
I just committed this patch to Bio::DB::SeqFeature::NormalizedFeature

_create_subfeatures : ensure that subfeatures get the `source` of their
parent

While doing this I wondered: what is the -class that subfeatures are
getting from their parent? I left it in place. Please advise! Fix
my thinking....

----------------------------------------------

Further, I observe that Bio::Graphics::FeatureBase::new handles the
-segments option by calling add_segment.  So, when I create a new
Bio::DB::SeqFeature with -segments [[100,200],[300,400]], the
-segments option gets handled by Bio::Graphics::FeatureBase::new, which,
as mentioned, calls add_segment. The surprising thing to me when trying
to trace through the class modules and understand what is going on is
that what gets run at this point is not
Bio::Graphics::FeatureBase::add_segment, but rather
Bio::DB::SeqFeature::add_segment, whose semantics are different in at
least one regard, namely, that it does not set the start and stop of the
parent feature from the min and max of the segments.
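The behaviour described above is ordinary Perl method dispatch: inside a base-class constructor, `$self->add_segment(...)` is looked up starting from the class `$self` was blessed into, so a subclass override runs instead of the base version. A minimal sketch of that effect, using two hypothetical packages, `Demo::FeatureBase` and `Demo::SeqFeature`, that loosely stand in for Bio::Graphics::FeatureBase and Bio::DB::SeqFeature (this is not the actual BioPerl code):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical base class whose constructor handles -segments
package Demo::FeatureBase;

sub new {
    my ($class, %args) = @_;
    my $self = bless { segments => [] }, $class;
    # $self is blessed into the calling class, so add_segment is looked up
    # starting there, not in Demo::FeatureBase
    $self->add_segment(@$_) for @{ $args{-segments} || [] };
    return $self;
}

# Base version: records the segment and widens start/end to cover it
sub add_segment {
    my ($self, $start, $end) = @_;
    push @{ $self->{segments} }, [$start, $end];
    $self->{start} = $start if !defined $self->{start} || $start < $self->{start};
    $self->{end}   = $end   if !defined $self->{end}   || $end   > $self->{end};
    return;
}

# Hypothetical subclass whose add_segment does not touch start/end
package Demo::SeqFeature;
our @ISA = ('Demo::FeatureBase');

sub add_segment {
    my ($self, $start, $end) = @_;
    push @{ $self->{segments} }, [$start, $end];
    return;
}

package main;

my $base = Demo::FeatureBase->new(-segments => [[100, 200], [300, 400]]);
my $sub  = Demo::SeqFeature->new(-segments => [[100, 200], [300, 400]]);

printf "base: %s..%s\n", $base->{start}, $base->{end};
printf "sub:  %s..%s\n", $sub->{start} // 'undef', $sub->{end} // 'undef';
```

The base object ends up spanning 100..400, while the subclass object keeps both segments but never gets a start or end. If the base behaviour is wanted, a subclass method can call it explicitly with `$self->SUPER::add_segment(...)`; whether that is the right fix for the real modules is a design question for their maintainers.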

I have committed a patch to Bio::Graphics::FeatureBase with a comment to
this effect, and have also patched its add_segment method to copy the
parent's source into the segment.

I hope my commits and suggestions further the cause.  Let me know if
not!
 
-- Malcolm
 

________________________________

	From: lincoln.stein at gmail.com [mailto:lincoln.stein at gmail.com] On Behalf Of Lincoln Stein
	Sent: Tuesday, January 30, 2007 4:46 PM
	To: Cook, Malcolm
	Cc: bioperl list; lstein at cshl.org
	Subject: Re: Bio::DB::SeqFeature treatment of tags and annotations
	
	
	I've fixed the first issue in CVS. Sorry for the inconsistency. add_tag_value(), delete_tag_value() and get_Annotations() now all work as expected.
	
	The problem with the ID column is that it is supposed to be LOCAL to the GFF3 file and is not intended to be stored in the database. In contrast, Name can survive round-tripping. Perhaps the thing to do is to add a flag to the GFF3 file that turns on ID round-tripping, e.g.
	
	##round-trip-ids: 1
	
	If you like this idea, I can implement it.
	
	Lincoln
	
	
	On 1/29/07, Cook, Malcolm <MEC at stowers-institute.org> wrote: 

		Lincoln,
		 
		Thanks for your suggestions on how to approach my problems augmenting Flybase annotation.  I am trying to follow them and am finding the following oddities:
		 
		The first issue relates to the intermix of 'annotations' and 'tag values'.  I find that Bio::DB::SeqFeature implements some of the 'tag' methods and some of the 'Annotation' methods.  Here is a Perl one-liner that shows that values stored using add_tag_value are not retrieved with get_tag_values, but rather with get_Annotations.
		 
		> perl -MBio::DB::SeqFeature -e 'my $f=Bio::DB::SeqFeature->new; $f->add_tag_value("x",666); print "get_tag_values:\t" . $f->get_tag_values("x") . "\nget_Annotations:\t" . $f->get_Annotations("x");'
		 
		whose output is:
		get_tag_values: 
		get_Annotations:    666
		 
		Tracing this shows me that this results from the fact that:
		 
		
		Bio::DB::SeqFeature uses Bio::Graphics::FeatureBase (via Bio::DB::SeqFeature::NormalizedFeature), which does not support -tags in ->new but rather -attributes, viz:
		 
		
		  -attributes   a hashref of tag value attributes, in which the key is the tag
		                  and the value is an array reference of values
		 
		
		And though Bio::Graphics::FeatureBase purports to implement Bio::SeqFeatureI, it only partially implements the 'tag' methods (now deprecated and relegated to Bio::AnnotatableI).  In particular, the '*' methods of Bio::SeqFeatureI are not implemented in Bio::Graphics::FeatureBase:

		  has_tag
		*  add_tag_value
		  get_tag_values
		  get_all_tags
		*  remove_tag
		  get_tagset_values
		  get_Annotations

		As a result, add_tag_value and remove_tag are inherited from different modules whose understanding of tags is not the same!

		This one-liner :

		>perl -MClass::ISA -MClass::Inspector -MBio::DB::SeqFeature -e 'my @c = Class::ISA::self_and_super_path("Bio::DB::SeqFeature"); foreach my $fn (qw(add_tag_value get_tag_values)) {print "\n$fn:\t", join "\t", (grep {Class::Inspector->function_exists($_, $fn)} @c)}'

		confirms that they are defined in different packages, namely:

		add_tag_value: Bio::AnnotatableI
		get_tag_values: Bio::Graphics::FeatureBase Bio::AnnotatableI
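For readers without Class::ISA or Class::Inspector installed, a rough pure-Perl equivalent of the one-liner above walks the method resolution order with the core `mro::get_linear_isa` function (Perl 5.10 or later) and checks each package's symbol table for the sub. The class names below are hypothetical stand-ins chosen to mirror the situation described, not the real BioPerl hierarchy:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use mro;    # core module since Perl 5.10; provides mro::get_linear_isa

# Hypothetical hierarchy: each tag method is defined at a different level
package Demo::AnnotatableI;
sub add_tag_value  { return 'Demo::AnnotatableI' }
sub get_tag_values { return 'Demo::AnnotatableI' }

package Demo::FeatureBase;
our @ISA = ('Demo::AnnotatableI');
sub get_tag_values { return 'Demo::FeatureBase' }

package Demo::SeqFeature;
our @ISA = ('Demo::FeatureBase');

package main;

# Every package in $class's method resolution order that defines $method itself
sub defining_packages {
    my ($class, $method) = @_;
    no strict 'refs';    # needed for the symbolic &{"Pkg::name"} lookup
    return grep { defined &{"${_}::$method"} } @{ mro::get_linear_isa($class) };
}

for my $fn (qw(add_tag_value get_tag_values)) {
    print "$fn:\t", join("\t", defining_packages('Demo::SeqFeature', $fn)), "\n";
}
```

The first package listed for each method is the one a call on the subclass actually reaches, which is also what `Demo::SeqFeature->can($fn)` returns.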

		
		Proposed solution...  hmmmm ..... I dunno.... maybe the following patch to Bio::Graphics::FeatureBase->add_tag_value:
		 
		sub add_tag_value {
		  my ($self,$tag,@vals) = @_;
		  push @{$self->{attributes}{$tag}}, @vals;
		}
		
		
		It fixes my use case for now but I'm still concerned and confused about this variety of methods.  
		 
		Suggestions?
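The proposed patch works by making add_tag_value append to the same `{attributes}{$tag}` array that the -attributes constructor option fills and that, judging from the patch fixing the one-liner, get_tag_values reads. A minimal sketch of that single-store design follows; `Demo::TaggedFeature` is a hypothetical class, not Bio::Graphics::FeatureBase, and only a few of the Bio::SeqFeatureI tag methods are mimicked:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical feature class: every tag method reads and writes one store,
# $self->{attributes}{$tag} = [ values ], the layout -attributes describes
package Demo::TaggedFeature;

sub new {
    my ($class, %args) = @_;
    my $attrs = $args{-attributes} || {};
    # copy each value list so the caller's arrays are not shared
    my %store = map { $_ => [ @{ $attrs->{$_} } ] } keys %$attrs;
    return bless { attributes => \%store }, $class;
}

sub add_tag_value {
    my ($self, $tag, @vals) = @_;
    push @{ $self->{attributes}{$tag} }, @vals;
    return;
}

sub get_tag_values {
    my ($self, $tag) = @_;
    return @{ $self->{attributes}{$tag} || [] };
}

sub has_tag {
    my ($self, $tag) = @_;
    return exists $self->{attributes}{$tag};
}

# Deletes the tag and returns the values it held
sub remove_tag {
    my ($self, $tag) = @_;
    my $vals = delete $self->{attributes}{$tag};
    return $vals ? @$vals : ();
}

sub get_all_tags {
    my ($self) = @_;
    return sort keys %{ $self->{attributes} };
}

package main;

my $f = Demo::TaggedFeature->new(-attributes => { Name => ['FBgn0017545'] });
$f->add_tag_value(x => 666);
print "get_tag_values:\t", join(',', $f->get_tag_values('x')), "\n";
print "all tags:\t", join(',', $f->get_all_tags), "\n";
```

Because add_tag_value, remove_tag and the getters all share one hash, it no longer matters which package in the inheritance chain supplies each method, as long as they agree on that layout.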
		 

-------------------------------------------------------------------------

		Also, I think that any "ID" in column 9 of a GFF3 flat file should be preserved through a round-trip through a Bio::DB::SeqFeature store, but this is not yet possible since any ID attribute in GFF3 column 9 is being lost by GFF3Loader, causing me to locally patch the GFF3Loader::handle_feature method to add the following:

		  # mec at stowers-institute.org, wondering why not all attributes are
		  # carried forward, adds ID tag in particular in service of
		  # round-tripping ID, which, though present in database as load_id
		  # attribute, was getting lost as itself
		  $unreserved->{ID} = $reserved->{ID} if exists $reserved->{ID};

		Poised to patch.... what d'you think?

		Malcolm Cook
		Stowers Institute for Medical Research - Kansas City, Missouri
		  

________________________________

			From: lincoln.stein at gmail.com [mailto:lincoln.stein at gmail.com] On Behalf Of Lincoln Stein
			Sent: Tuesday, December 19, 2006 3:58 PM
			To: Cook, Malcolm
			Cc: bioperl list; lstein at cshl.org
			Subject: Re: bp_seqfeature_load / Bio::DB::SeqFeature::Store::GFF3Loader problems augmenting Flybase annotation
			
			
			Hi Malcom,
			
			Your second guess was right. The use case of augmenting an existing gene with additional splice forms isn't provided for. You can get the functionality by making direct calls to Bio::DB::SeqFeature::Store methods:
			
			my @genes = $db->get_features_by_name('FBgn0017545');
			@genes == 1 or die "Didn't get exactly one gene";
			my $parent = $genes[0];
			my $chr    = $parent->seq_id;
			my $start  = $parent->start;
			my $end    = $parent->end;
			my $strand = $parent->strand;
			
			# new transcript on the same chromosome and strand as the gene
			my $new_splice_form = $db->new_feature(-primary_tag => 'mRNA',
			                                       -source      => 'added',
			                                       -seq_id      => $chr,
			                                       -strand      => $strand,
			                                       -start       => $start+10,
			                                       -end         => $end,
			                                      );
			$parent->add_SeqFeature($new_splice_form);
			
			# attach two exons to the new splice form
			for my $pos ([$start+10,$start+100],[$start+200,$end]) {
			  my ($e_start,$e_end) = @$pos;
			  my $exon = Bio::DB::SeqFeature->new(-primary_tag => 'exon',
			                                      -store       => $db,
			                                      -seq_id      => $chr,
			                                      -strand      => $strand,
			                                      -start       => $e_start,
			                                      -end         => $e_end);
			  $new_splice_form->add_SeqFeature($exon);
			}
			
			I found a bug in updating the seqfeature database when I wrote this script, so you'll have to get the latest bioperl-live. I think you can use this to write a splice form updating script.
			
			In order to support the idea of adding new splice forms to an existing gene using the GFF3 format, I will have to either modify the loader, or write a separate script (probably better to do the latter). It shouldn't be hard if you'd like to give it a try.
			
			Lincoln
			
			
			On 12/19/06, Cook, Malcolm <MEC at stowers-institute.org> wrote: 

				Lincoln and fellow Bio::DB::SeqFeature travelers,
				
				I find that using bp_seqfeature_load.PLS to load subfeatures of genes already loaded using bp_seqfeature_load.PLS fails with
				
				------------- EXCEPTION  ------------- 
				MSG: FBgn0017545 doesn't have a primary id
				STACK Bio::DB::SeqFeature::Store::GFF3Loader::build_object_tree_in_tables /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:682
				STACK Bio::DB::SeqFeature::Store::GFF3Loader::build_object_tree /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:663
				STACK Bio::DB::SeqFeature::Store::GFF3Loader::finish_load /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:372
				STACK Bio::DB::SeqFeature::Store::GFF3Loader::load_fh /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:345
				STACK Bio::DB::SeqFeature::Store::GFF3Loader::load /home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/GFF3Loader.pm:242
				STACK toplevel /home/mec/cvs/bioperl-live/scripts/Bio-SeqFeature-Store/bp_seqfeature_load.PLS:76
				
				Where FBgn0017545 is the ID of a gene previously loaded.
				
				I am unsure how to remedy my situation and welcome any advice on a correct or improved approach to my problem.
				
				Here's some detail if it helps.  I am developing a pipeline to design microarray probes capable of distinguishing among splice variants in Drosophila (using the latest Flybase dmel_r5.1 annotation).  So I
				
				1) load a filtered selection of Flybase annotation using bp_seqfeature_load.  (For testing purposes, I am using a single gene's worth of annotation, FBgn0017545.gff, attached.)  This is done as follows:
				
				        > bp_seqfeature_load.PLS --create FBgn0017545.gff
				
				2) analyze all the genes in the database, and create GFF3 output, each feature of which has a 'Parent' that is a previously loaded gene (i.e. FBgn0017545).  (These features represent the unique introns, splice sites, and exonic design targets. The output of this analysis, FBgn0017545_matd.gff, is also attached.)
				
				3) load these analysis results into the same database, as follows:
				
				        > bp_seqfeature_load.PLS FBgn0017545_matd.gff
				
				It is at this point that I get the above error.
				
				However, I don't get any error and the data loads fine if I load the two files together, as follows:
				
				        > bp_seqfeature_load.PLS --create <(cat FBgn0017545.gff FBgn0017545_matd.gff)
				
				So, I suspect that either I am misunderstanding when/how to use bp_seqfeature_load.PLS or else this use case has not yet arisen and must be provided for somehow.
				
				I am running against bioperl-live
				
				Thanks for your thoughts and assistance,
				
				Malcolm Cook
				Database Applications Manager - Bioinformatics
				Stowers Institute for Medical Research - Kansas City, Missouri 
				
				
			-- 
			Lincoln D. Stein
			Cold Spring Harbor Laboratory
			1 Bungtown Road
			Cold Spring Harbor, NY 11724
			(516) 367-8380 (voice)
			(516) 367-8389 (fax)
			FOR URGENT MESSAGES & SCHEDULING, 
			PLEASE CONTACT MY ASSISTANT, 
			SANDRA MICHELSEN, AT michelse at cshl.edu


	-- 
	Lincoln D. Stein
	Cold Spring Harbor Laboratory
	1 Bungtown Road
	Cold Spring Harbor, NY 11724
	(516) 367-8380 (voice)
	(516) 367-8389 (fax)
	FOR URGENT MESSAGES & SCHEDULING, 
	PLEASE CONTACT MY ASSISTANT, 
	SANDRA MICHELSEN, AT michelse at cshl.edu 


From neha_bafs at yahoo.co.in  Mon Feb  5 17:59:03 2007
From: neha_bafs at yahoo.co.in (Neha Nahar)
Date: Mon, 5 Feb 2007 17:59:03 +0000 (GMT)
Subject: [Bioperl-l] Convert newick to nexus format
Message-ID: <20070205175903.30379.qmail@web8701.mail.in.yahoo.com>

Hello everyone,

I am trying to convert newick tree to nexus format.
Using the following script (taken from an email from George dated Wed Sep 22 11:52:47 EDT 2004):

/*------------------------------------------------------------*/

$ cat nexus.pl
#!/usr/bin/perl -w

use Bio::TreeIO;

($NEWICKFILE, $NEXUSFILE) = @ARGV;
print "newickfile=$NEWICKFILE, nexusfile=$NEXUSFILE\n";
my $treeio = new Bio::TreeIO(-format => 'newick', -file   => "$NEWICKFILE");
my $treeout = new Bio::TreeIO(-format => 'nexus', -file   => ">$NEXUSFILE");
while(my $tree = $treeio->next_tree) {
        $treeout->write_tree($treeout);
    }

exit 0;


/*------------------------------------------------------------*/

Running the script from the command line gives the following error:

$ ./nexus.pl mrp-input.txt nexus.out
newickfile=mrp-input.txt, nexusfile=nexus.out

------------- EXCEPTION  -------------
MSG: Cannot call method write_tree on Bio::TreeIO object must use a subclass
STACK Bio::TreeIO::nexus::write_tree /usr/lib/perl5/vendor_perl/5.8.8/Bio/TreeIO/nexus.pm:170
STACK toplevel ./nexus.pl:23

--------------------------------------


I am using bioperl-1.5.2_101.tar.gz from http://search.cpan.org/~sendu/bioperl/Bio/TreeIO.pm

Questions:-

1. Please let me know if I am using the correct version.
If not, please point me to the latest one.

2. Provided that the version I am using is the right one, please let me know what is wrong with the script.

Thank you.
Regards,
Neha.


-Neha Nahar
  " Work for cause and not for applause, live to express and not to impress !"
 				


From jason at bioperl.org  Mon Feb  5 18:10:42 2007
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 5 Feb 2007 10:10:42 -0800
Subject: [Bioperl-l] Convert newick to nexus format
In-Reply-To: <20070205175903.30379.qmail@web8701.mail.in.yahoo.com>
References: <20070205175903.30379.qmail@web8701.mail.in.yahoo.com>
Message-ID: <46219DCD-8C6E-4DBE-82F2-D4B58207AD54@bioperl.org>

you want to write out the TREE, not the TREE WRITER.

$treeout->write_tree($tree)

not
$treeout->write_tree($treeout);

On Feb 5, 2007, at 9:59 AM, Neha Nahar wrote:

> Hello everyone,
>
> I am trying to convert newick tree to nexus format.
> Using the script (refered from and email from George dated Wed Sep  
> 22 11:52:47 EDT 2004) :
>
> /*------------------------------------------------------------*/
>
> $ cat nexus.pl
> #!/usr/bin/perl -w
>
> use Bio::TreeIO;
>
> ($NEWICKFILE, $NEXUSFILE) = @ARGV;
> print "newickfile=$NEWICKFILE, nexusfile=$NEXUSFILE\n";
> my $treeio = new Bio::TreeIO(-format => 'newick', -file   =>  
> "$NEWICKFILE");
> my $treeout = new Bio::TreeIO(-format => 'nexus', -file   => "> 
> $NEXUSFILE");
> while(my $tree = $treeio->next_tree) {
>         $treeout->write_tree($treeout);
>     }
>
> exit 0;
>
>
> /*------------------------------------------------------------*/
>
> Running the script through command line:
> Gives the following error:
>
> $ ./nexus.pl mrp-input.txt nexus.out
> newickfile=mrp-input.txt, nexusfile=nexus.out
>
> ------------- EXCEPTION  -------------
> MSG: Cannot call method write_tree on Bio::TreeIO object must use a  
> subclass
> STACK Bio::TreeIO::nexus::write_tree /usr/lib/perl5/vendor_perl/ 
> 5.8.8/Bio/TreeIO/nexus.pm:170
> STACK toplevel ./nexus.pl:23
>
> --------------------------------------
>
>
> Using  bioperl-1.5.2_101.tar.gz module from http://search.cpan.org/ 
> ~sendu/bioperl/Bio/TreeIO.pm
>
> Questions:-
>
> 1. Please let me know if I am using the correct version.
> If not, please point me to the latest one.
>
> 2. Provided that the version I am using is the right one, please  
> let me know what is wrong with the script.
>
> Thank you.
> Regards,
> Neha.
>
>
>
>
>
> -Neha Nahar
>   " Work for cause and not for applause, live to express and not to  
> impress !"
>  				

--
Jason Stajich
Miller Research Fellow
University of California, Berkeley
lab: 510.642.8441
http://pmb.berkeley.edu/~taylor/people/js.html
http://fungalgenomes.org/


From nehadnahar at yahoo.co.in  Mon Feb  5 18:05:26 2007
From: nehadnahar at yahoo.co.in (Neha Nahar)
Date: Mon, 5 Feb 2007 18:05:26 +0000 (GMT)
Subject: [Bioperl-l] Convert newick to nexus format
Message-ID: <288335.22352.qm@web8412.mail.in.yahoo.com>

Hello everyone,

I am trying to convert newick tree to nexus format.
Using the following script (taken from an email from George dated Wed Sep 22 11:52:47 EDT 2004):

/*------------------------------------------------------------*/

$ cat nexus.pl
#!/usr/bin/perl -w

use Bio::TreeIO;

($NEWICKFILE, $NEXUSFILE) = @ARGV;
print "newickfile=$NEWICKFILE, nexusfile=$NEXUSFILE\n";
my $treeio = new Bio::TreeIO(-format => 'newick', -file   => "$NEWICKFILE");
my $treeout = new Bio::TreeIO(-format => 'nexus', -file   => ">$NEXUSFILE");
while(my $tree = $treeio->next_tree) {
        $treeout->write_tree($treeout);
    }

exit 0;


/*------------------------------------------------------------*/

Running the script from the command line gives the following error:

$  ./nexus.pl mrp-input.txt nexus.out
newickfile=mrp-input.txt, nexusfile=nexus.out

------------- EXCEPTION  -------------
MSG: Cannot call method write_tree on Bio::TreeIO object must use a subclass
STACK Bio::TreeIO::nexus::write_tree /usr/lib/perl5/vendor_perl/5.8.8/Bio/TreeIO/nexus.pm:170
STACK toplevel ./nexus.pl:23

--------------------------------------


I am using bioperl-1.5.2_101.tar.gz from http://search.cpan.org/~sendu/bioperl/Bio/TreeIO.pm

Questions:-

1. Please let me know if I am using the correct version.
If not, please point me to the latest one.

2. Provided that the version I am using is the right one, please let me know what is wrong with the script.

Thank  you.
Regards,
Neha.


-Neha Nahar
  " Work for cause and not for applause, live to express and not to impress !"
 				


From hlapp at duke.edu  Fri Feb  2 15:09:57 2007
From: hlapp at duke.edu (Hilmar Lapp)
Date: Fri, 2 Feb 2007 10:09:57 -0500
Subject: [Bioperl-l] [Gmod-schema] beginning work on SeqIO::chadoxml
In-Reply-To: <1170359746.2706.622.camel@localhost.localdomain>
References: <1170359746.2706.622.camel@localhost.localdomain>
Message-ID: <675F525B-D3EB-4C0A-A777-7683F2A8F823@duke.edu>


On Feb 1, 2007, at 2:55 PM, Scott Cain wrote:

> The second main change was to introduce a -flybase_compat argument  
> when
> initializing the Bio::SeqIO writer, so that 'old style' cv and cvterms
> (that are compatible with flybase) will be used, but now the default
> will be to use current standards:

Just my $0.02 ... obviously, Flybase may be the only organization  
that uses an 'old style' or any other way not compliant with 'current  
standards' (presumably SO), but if it's not the only one then this  
approach won't scale.

Also, an argument -flybase_compat suggests to the unsuspecting that  
this is an endorsed flavor of the standard and fine to use for  
everyone else too.

If Flybase is idiosyncratic in this way, why not make chadoxml.pm  
compliant with the standard as we all want it, keep it free from  
litter caused by usage of old versions of SO, and create a second  
module fb-chadoxml.pm that inherits from the first and merely  
overrides a few things so that it works for Flybase. This way, other  
organizations with similar needs can follow the path and create their  
own xyz-chadoxml.pm, rather than having to muck around in the  
chadoxml.pm that comes with the distribution.

I'm not sure I fully grasp the underlying issue, so I may not make  
much sense here. Apologies if that's the case ...

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:- hlapp at duke dot edu :
===========================================================


From jason at bioperl.org  Mon Feb  5 19:43:09 2007
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 5 Feb 2007 11:43:09 -0800
Subject: [Bioperl-l] Convert newick to nexus format
In-Reply-To: <209988.63723.qm@web8715.mail.in.yahoo.com>
References: <209988.63723.qm@web8715.mail.in.yahoo.com>
Message-ID: <9E477447-67F5-46CA-BCC1-47BB4170EC76@bioperl.org>

please cc the mailing list when asking a question or following up.

Sorry, I don't know what you are doing wrong; you didn't resend your
code, so I can't tell whether you still have a typo.

This code works fine for me:

use Bio::TreeIO;
use strict;
my ($filein,$fileout) = @ARGV;
my ($format,$oformat) = qw(newick nexus);
my $in = Bio::TreeIO->new(-file => $filein, -format => $format);
my $out= Bio::TreeIO->new(-format => $oformat, -file => ">$fileout");

while( my $t = $in->next_tree ) {
  $out->write_tree($t);
}


On Feb 5, 2007, at 11:24 AM, Neha Nahar wrote:

> Thank you very much for the reply.
>
> I fixed the code as per your suggestion, but now am getting a
> different error:
>
> $ ./nexus.pl mrp-input.txt nexus.out
> newickfile=mrp-input.txt, nexusfile=nexus.out
>
> ------------- EXCEPTION  -------------
> MSG: Cannot call method write_tree on Bio::TreeIO object must use a  
> subclass
> STACK Bio::TreeIO::nexus::write_tree /usr/lib/perl5/vendor_perl/ 
> 5.8.8/Bio/TreeIO/nexus.pm:170
> STACK toplevel ./nexus.pl:23
>
> --------------------------------------
>
> Please help me out with this script.
>
> Thank you.
> Regards,
> Neha
>
>
>
>
> Jason Stajich <jason at bioperl.org> wrote: you want to write the TREE  
> out not the TREE WRITER.
>
>
> $treeout->write_tree($tree)
>
> not
> $treeout->write_tree($treeout);
>
> On Feb 5, 2007, at 9:59 AM, Neha Nahar wrote:
>
> Hello everyone,
>
>
> I am trying to convert newick tree to nexus format.
> Using the script (refered from and email from George dated Wed Sep  
> 22 11:52:47 EDT 2004) :
>
>
> /*------------------------------------------------------------*/
>
>
> $ cat nexus.pl
> #!/usr/bin/perl -w
>
>
> use Bio::TreeIO;
>
>
> ($NEWICKFILE, $NEXUSFILE) = @ARGV;
> print "newickfile=$NEWICKFILE, nexusfile=$NEXUSFILE\n";
> my $treeio = new Bio::TreeIO(-format => 'newick', -file   =>  
> "$NEWICKFILE");
> my $treeout = new Bio::TreeIO(-format => 'nexus', -file   => "> 
> $NEXUSFILE");
> while(my $tree = $treeio->next_tree) {
>         $treeout->write_tree($treeout);
>     }
>
>
> exit 0;
>
>
>
>
> /*------------------------------------------------------------*/
>
>
> Running the script through command line:
> Gives the following error:
>
>
> $ ./nexus.pl mrp-input.txt nexus.out
> newickfile=mrp-input.txt, nexusfile=nexus.out
>
>
> ------------- EXCEPTION  -------------
> MSG: Cannot call method write_tree on Bio::TreeIO object must use a  
> subclass
> STACK Bio::TreeIO::nexus::write_tree /usr/lib/perl5/vendor_perl/ 
> 5.8.8/Bio/TreeIO/nexus.pm:170
> STACK toplevel ./nexus.pl:23
>
>
> --------------------------------------
>
>
>
>
> Using  bioperl-1.5.2_101.tar.gz module from http://search.cpan.org/ 
> ~sendu/bioperl/Bio/TreeIO.pm
>
>
> Questions:-
>
>
> 1. Please let me know if I am using the correct version.
> If not, please point me to the latest one.
>
>
> 2. Provided that the version I am using is the right one, please  
> let me know what is wrong with the script.
>
>
> Thank you.
> Regards,
> Neha.
>
>
>
>
>
>
>
>
>
>
> -Neha Nahar
>   " Work for cause and not for applause, live to express and not to  
> impress !"
>
>
>
>
>  --
> Jason Stajich
> Miller Research Fellow
> University of California, Berkeley
> lab: 510.642.8441
>
> http://pmb.berkeley.edu/~taylor/people/js.html
> http://fungalgenomes.org/
>
>
>
>
>
>
> -Neha Nahar
>   " Work for cause and not for applause, live to express and not to  
> impress !"
>  				

--
Jason Stajich
Miller Research Fellow
University of California, Berkeley
lab: 510.642.8441
http://pmb.berkeley.edu/~taylor/people/js.html
http://fungalgenomes.org/


From nehadnahar at yahoo.co.in  Mon Feb  5 19:58:08 2007
From: nehadnahar at yahoo.co.in (Neha Nahar)
Date: Mon, 5 Feb 2007 19:58:08 +0000 (GMT)
Subject: [Bioperl-l] Convert newick to nexus format
In-Reply-To: <99196.23114.qm@web8711.mail.in.yahoo.com>
Message-ID: <36024.1212.qm@web8405.mail.in.yahoo.com>


Hi,
Thank you for the code.
I tried it but I still get the same exception.

------------- EXCEPTION  -------------
MSG: Cannot call method write_tree on Bio::TreeIO object must use a subclass
STACK Bio::TreeIO::nexus::write_tree /usr/lib/perl5/vendor_perl/5.8.8/Bio/TreeIO/nexus.pm:170
STACK toplevel ./nexus1.pl:18


Please find attached the Perl file (nexus.pl).


I am using bioperl-1.5.2_101.tar.gz from http://search.cpan.org/~sendu/bioperl/Bio/TreeIO.pm

Please let me know if I am using the correct version. If not, please point me to the latest one.

Thank you.
Regards,
nnahar




-Neha Nahar
  " Work for cause and not for applause, live to express and not to impress !"
 				
-------------- next part --------------
A non-text attachment was scrubbed...
Name: nexus.pl
Type: application/x-perl
Size: 811 bytes
Desc: 1389215665-nexus.pl
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070205/c6453dcf/attachment.pl>

From jason at bioperl.org  Mon Feb  5 22:15:52 2007
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 5 Feb 2007 14:15:52 -0800
Subject: [Bioperl-l] Convert newick to nexus format
In-Reply-To: <36024.1212.qm@web8405.mail.in.yahoo.com>
References: <36024.1212.qm@web8405.mail.in.yahoo.com>
Message-ID: <84703383-600F-42F4-A860-DD0D1C43EE83@bioperl.org>

I'm guessing something is wrong with your install. Can you run the
tests? Go to the bioperl directory and run:
$ perl t/TreeIO.t

Can you describe how you installed bioperl?

On Feb 5, 2007, at 11:58 AM, Neha Nahar wrote:

>
> Hi,
> Thank you for the code.
> I tried it but I still get the same exception.
>
> ------------- EXCEPTION  -------------
> MSG: Cannot call method write_tree on Bio::TreeIO object must use a subclass
> STACK Bio::TreeIO::nexus::write_tree /usr/lib/perl5/vendor_perl/5.8.8/Bio/TreeIO/nexus.pm:170
> STACK toplevel ./nexus1.pl:18
>
>
> Please find attached the Perl file (nexus.pl).
>
> Using the bioperl-1.5.2_101.tar.gz module from http://search.cpan.org/~sendu/bioperl/Bio/TreeIO.pm
>
> Please let me know if I am using the correct version. If not, please
> point me to the latest one.
>
> Thank you.
> Regards,
> nnahar
>
> Jason Stajich <jason at bioperl.org> wrote: Please cc the mailing list
> when asking a question or following up.
>
> Sorry I don't know what you are doing wrong - you didn't resend  
> your code so I don't know if you still have a typo.
>
> This code works fine for me
>
> use Bio::TreeIO;
> use strict;
> my ($filein,$fileout) = @ARGV;
> my ($format,$oformat) = qw(newick nexus);
> my $in = Bio::TreeIO->new(-file  => $filein, -format => $format);
> my $out= Bio::TreeIO->new(-format => $oformat, -file => ">$fileout");
>
> while( my $t = $in->next_tree ) {
>  $out->write_tree($t);
> }
>
>
>
> On Feb 5, 2007, at 11:24 AM, Neha Nahar wrote:
>
> Thank you very much for the reply.
>
> I fixed the code as per your suggestion, but now I am getting a
> different error:
>
> $ ./nexus.pl mrp-input.txt nexus.out
> newickfile=mrp-input.txt, nexusfile=nexus.out
>
> -------------  EXCEPTION  -------------
> MSG: Cannot call method write_tree on Bio::TreeIO object must use a subclass
> STACK Bio::TreeIO::nexus::write_tree /usr/lib/perl5/vendor_perl/5.8.8/Bio/TreeIO/nexus.pm:170
> STACK toplevel ./nexus.pl:23
>
> --------------------------------------
>
> Please help me out with this script.
>
>
> Thank you.
> Regards,
> Neha
>
> Jason Stajich <jason at bioperl.org> wrote: You want to write the TREE
> out, not the TREE WRITER.
>
> $treeout->write_tree($tree);
>
> not
>
> $treeout->write_tree($treeout);
>
>
> On Feb 5, 2007, at 9:59 AM, Neha Nahar wrote:
>
> Hello everyone,
>
> I am trying to convert a newick tree to nexus format,
> using this script (taken from an email from George dated
> Wed Sep 22 11:52:47 EDT 2004):
>
> /*------------------------------------------------------------*/
>
> $ cat nexus.pl
> #!/usr/bin/perl -w
>
> use Bio::TreeIO;
>
> ($NEWICKFILE, $NEXUSFILE) = @ARGV;
> print "newickfile=$NEWICKFILE,  nexusfile=$NEXUSFILE\n";
> my $treeio = new Bio::TreeIO(-format => 'newick', -file   => "$NEWICKFILE");
> my $treeout = new Bio::TreeIO(-format => 'nexus', -file   => ">$NEXUSFILE");
> while(my $tree = $treeio->next_tree) {
>         $treeout->write_tree($treeout);
>     }
>
> exit 0;
>
> /*------------------------------------------------------------*/
>
> Running the script from the command line
> gives the following error:
>
> $ ./nexus.pl mrp-input.txt nexus.out
> newickfile=mrp-input.txt, nexusfile=nexus.out
>
> ------------- EXCEPTION  -------------
> MSG: Cannot call method write_tree on Bio::TreeIO object must use a subclass
> STACK  Bio::TreeIO::nexus::write_tree /usr/lib/perl5/vendor_perl/5.8.8/Bio/TreeIO/nexus.pm:170
> STACK toplevel ./nexus.pl:23
>
> --------------------------------------
> <nexus.pl>


From lzhtom at hotmail.com  Tue Feb  6 03:31:56 2007
From: lzhtom at hotmail.com (zhihua li)
Date: Tue, 06 Feb 2007 03:31:56 +0000
Subject: [Bioperl-l] Bio::index::Fasta- where's the indexed file?
In-Reply-To: <CFD213B9-5195-450F-80ED-E956EEF50F59@bioperl.org>
Message-ID: <BAY110-F28F9C9145AC24F2D0E0D34C79F0@phx.gbl>

Thanks a lot!

After checking out the bp_index script, I changed the syntax to:

  my $inx = Bio::Index::Fasta->new("test.fasta.idx", 'WRITE');
  $inx->make_index("test.fasta");

Now I have an index file, test.fasta.idx, in my current directory, and I can
use it in a later script by saying

  my $inx = Bio::Index::Fasta->new(-filename => "test.fasta.idx");

So now everything is OK. But I don't understand why I have to use that
syntax, and why the syntax given in the documentation didn't work.
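
Jason's reply quoted below suggests building the -filename value by prepending $ENV{BIOPERL_INDEX} to the index name. Here is a minimal sketch of that idea in plain Perl. The helper name index_path is hypothetical. Falling back to the current directory when BIOPERL_INDEX is unset is an assumption. The sketch uses core File::Spec rather than a literal "/" only so that the path separator is handled portably.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: build the -filename value for an index from
# $ENV{BIOPERL_INDEX}, falling back to the current directory if it is unset.
sub index_path {
    my ($index_name) = @_;
    my $dir = defined $ENV{BIOPERL_INDEX} ? $ENV{BIOPERL_INDEX} : File::Spec->curdir;
    return File::Spec->catfile($dir, $index_name);
}
```

You would then pass the returned path as the -filename argument, both when creating the index and when opening it again in a later script.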


>From: Jason Stajich <jason at bioperl.org>
>To: zhihua li <lzhtom at hotmail.com>
>CC: bioperl-l at lists.open-bio.org, arokfl at yahoo.com
>Subject: Re: [Bioperl-l] Bio::index::Fasta- where's the indexed file?
>Date: Thu, 1 Feb 2007 22:24:44 -0800
>
>I don't think BIOPERL_INDEX does anything in the module, so that
>documentation is not quite right.  The env variable is used in the
>scripts/index/bp_index and bp_fetch scripts, so maybe a cut+paste job
>went bad somewhere.
>
>You need to specify the full path you want with -filename; you can
>just prepend BIOPERL_INDEX to the filename, like:
>-filename => $ENV{BIOPERL_INDEX}."/$index"
>
>-jason
>On Feb 1, 2007, at 7:27 PM, zhihua li wrote:
>
> > Sorry guys, the former empty mail was sent out by mistake.
> >
> > I'm using Bio::Index::Fasta to index a file containing lots of
> > sequences in fasta format. All is fine except one thing.
> >
> > According to the bioperl tutorial and the documents, the following
> > code will make an indexed file:
> >
> > my $inx = Bio::Index::Fasta->new(-filename => "test.fasta.idx",
> >                                  -write_flag => 1);
> > $inx->make_index("test.fasta");
> >
> > And in another script I can access the indexed file by saying
> >
> > $ENV{BIOPERL_INDEX} = "."; # find index in current directory
> > my $inx = Bio::Index::Fasta->new(-filename => "test.fasta.idx");
> > my $seq=$inx->fetch("ent1001");        # fetch the sequence named ent1001
> >
> > However, after running the first script, I cannot find a new file
> > test.fasta.idx in my current directory. And not surprisingly, when
> > I ran the second script, perl told me it couldn't find
> > "test.fasta.idx".
> >
> > What's going on here?
> >
> > Thanks a lot!
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l


From johnston at biochem.ucl.ac.uk  Tue Feb  6 11:52:08 2007
From: johnston at biochem.ucl.ac.uk (Caroline Johnston)
Date: Tue, 6 Feb 2007 11:52:08 +0000 (GMT)
Subject: [Bioperl-l] RNA folding
Message-ID: <Pine.LNX.4.58.0702061143420.4383@localhost.localdomain>

Hello,

I've just joined the list - I'm a Bioinformatics PhD student at Essex
University doing transcriptomics-related things. Mainly microarray
analysis and more recently looking at RNA structure prediction.

I was thinking about having a go at writing a bioperl-run wrapper around
some of the structure prediction stuff, but according to the wiki this is
being done already (at least for the Vienna tools). I spoke to Albert
Vilella at the EBI the other day and he said Chris Fields was the man to
speak to. So could he (or anyone) let me know what the current state of
RNA structure prediction tools in bioperl is?

Cheers,
Cass xx


From marian.thieme at lycos.de  Tue Feb  6 13:52:10 2007
From: marian.thieme at lycos.de (Marian Thieme)
Date: Tue, 06 Feb 2007 14:52:10 +0100
Subject: [Bioperl-l] dbSNP
Message-ID: <45C8880A.7030702@lycos.de>

Hello all,

I looked for a method/class/function/script in the documentation that
provides the ability to generate a SNP assay suitable for submission to
dbSNP (http://www.ncbi.nlm.nih.gov/projects/SNP/
http://www.ncbi.nlm.nih.gov/projects/SNP/how_to_submit.html).
I didn't find any such code, but I noticed that there is at least an XML
parser to read dbSNP reports.

Does anybody know if there is also an output class to generate dbSNP
reports? I imagine that at least the SNP assay section is worth
implementing.

This example is given by ncbi:


TYPE:SNPASSAY
HANDLE:WI
BATCH: 1.98
MOLTYPE:Genomic
METHOD:RESEQ
SYN NAMES:WI-SNP,DnaId,MapDna
COMMENT:
Here is where some public comment that applies to the entire
batch of SNPS could be put.
PRIVATE:
Here is where a note to NCBI regarding processing that would
not be seen by the outside, could be put.
Note that these are is not exactly real SNPs, as
the data were modified.
||
SNP:WI|WIAF-1234567
SYNONYM:EST4291092,EST8291092,EST7291092
ACCESSION:H30533
LENGTH:101
5'_ASSAY:GGCAGGGAAGGAAAATCCTAGGGNCAGCATTGGGGAGGGGGGGACTCTG
OBSERVED:C/T
3'_ASSAY:TAAATTTATTGGGCAACAGGCTGCAGGTGAGGGGGCTGACAGGAGGAGGGA
||
SNP:WI|WIAF-1722
SYNONYM:STS-T17494,STS-T17494,STS-T17494
ACCESSION:T17494
LENGTH:269
5'_FLANK:CTTTCCCTCATCCCCTCTTCCACCACACCATCCCGGAACAAGTGCTCCAGGATT
5'_ASSAY:CCCTGCCCACTGGCCATTTTGGAGTGTGTCC
OBSERVED:A/T
3'_ASSAY:GTGGGTAGCAATGTGGAAACCACCAGGGCCTTTGTGGAGAAAA
3'_FLANK:TGGAGGGGGTTGAGGGAGTCCCAGGAGGGGCTTATTTGAGGGCCTTTGCCACTT
    GCTCATAGGCGAGCTCGATCTCCTCATCATCTGGACAGGTGGAAGCGAATTCTT
    CCCGGGCGTAGGCATTGCTCAAGTACCGAT
||
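
An output routine for the per-SNP sections of the example above could be as small as the following sketch. It is a hypothetical helper, not an existing BioPerl class. It only knows the tag names that appear in the NCBI example, and it assumes the tag order shown there. It does not write the batch header (TYPE:SNPASSAY, HANDLE, and so on). It also does not wrap long flanking sequences onto continuation lines the way the 3'_FLANK entry is wrapped.

```perl
use strict;
use warnings;

# Tag order copied from the per-SNP sections of the NCBI example above.
my @SNP_FIELDS = ('SNP', 'SYNONYM', 'ACCESSION', 'LENGTH', "5'_FLANK",
                  "5'_ASSAY", 'OBSERVED', "3'_ASSAY", "3'_FLANK");

# Hypothetical helper: format one SNP as TAG:value lines ending in '||'.
# Missing tags are skipped; an array ref (e.g. SYNONYM) is joined with commas.
sub format_snp_section {
    my ($snp) = @_;
    my @lines;
    for my $field (@SNP_FIELDS) {
        next unless defined $snp->{$field};
        my $value = $snp->{$field};
        $value = join(',', @$value) if ref($value) eq 'ARRAY';
        push @lines, "$field:$value";
    }
    push @lines, '||';
    return join("\n", @lines) . "\n";
}
```

Passing the values of the first SNP above (with SYNONYM as an array ref) should reproduce that block line for line.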


Regards,
Marian

P.S. This does not contradict my first request about the bracket
notation. We need both formats.


From cjfields at uiuc.edu  Tue Feb  6 16:45:36 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 6 Feb 2007 10:45:36 -0600
Subject: [Bioperl-l] RNA folding
In-Reply-To: <Pine.LNX.4.58.0702061143420.4383@localhost.localdomain>
References: <Pine.LNX.4.58.0702061143420.4383@localhost.localdomain>
Message-ID: <C7C6DC30-ACBB-4626-A0D7-CC0248AF847E@uiuc.edu>


Actually, the only RNA tool wrappers I have made are ones for ERPIN,  
RNAMotif, and Infernal (the only one in bioperl-run CVS at this time  
is RNAMotif).  I am planning on writing up wrappers for Vienna,  
UNAFold, and a few others but haven't really started in.  Here's  
where I'm at right now...

I am writing up a new set of AnnotationI classes which positionally  
describe data (Meta) which I hope will help deal with this stuff.   
These would be similar in nature to Heikki's Bio::Seq::Meta classes:

http://bioperl.org/pipermail/bioperl-l/2006-December/024414.html

I would use a regular Bio::SeqI and store the structural data and  
anything else (such as energy calculations, etc) as Annotation  
objects in an AnnotationCollection, and then write up a series of  
SeqIO modules to get data into/out of the designated structure  
formats, like UNAfold ct, RNAML, and so on.  Each sequence would then  
be capable of holding more than one structural Annotation (i.e. could  
represent different folding pathways, alternative RNA folds, and so on).

At this point I represent the data as an array of hashes where $array[0]
is nt 1 and the hash keys indicate the type of interaction, the base
interacted with, etc.  The text representation would be in the simple
Eddy WUSS (Rfam-like) format by default, which is capable of
representing some complex data (pseudoknots, for instance), is
compact, and is documented (via the Infernal manual).  Tags will
probably switch to more ontologically relevant terms (probably from
RNAML or RNA Ontology), but in general it is something like this:

[
  {'interaction' => 'WC',
    'base'  => 24},
  {'interaction' => 'WC',
    'base'  => 23},
  {'interaction' => 'SS'},
...
]

In this implementation every seq position would have some kind of  
interaction designation, though that's open for debate as it could  
just be simple text or undef for single-stranded regions.
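
As a rough illustration of the array-of-hashes layout described above, the sketch below turns a plain dot-bracket string into that structure. It assumes that only '(', ')' and '.' occur, which is a small subset of WUSS (WUSS has further bracket types and annotation characters). It also labels every pair 'WC' as a placeholder, even though real pairs may be wobble or non-canonical. Partner positions are 1-based, as in the example, while array indices are 0-based.

```perl
use strict;
use warnings;

# Convert a dot-bracket string into an array of hashes where $structure[0]
# describes nucleotide 1, $structure[1] nucleotide 2, and so on.
# Assumptions: only '(', ')' and '.' are supported, and every pair is
# labelled 'WC' as a placeholder interaction type.
sub dotbracket_to_structure {
    my ($db) = @_;
    my @structure;
    my @open;    # 1-based positions of '(' still waiting for a partner
    my @chars = split //, $db;
    for my $i (0 .. $#chars) {
        my $pos = $i + 1;
        my $c   = $chars[$i];
        if ($c eq '(') {
            push @open, $pos;
            $structure[$i] = { interaction => 'WC' };    # partner set on ')'
        }
        elsif ($c eq ')') {
            die "Unmatched ')' at position $pos\n" unless @open;
            my $partner = pop @open;
            $structure[$i] = { interaction => 'WC', base => $partner };
            $structure[ $partner - 1 ]{base} = $pos;
        }
        elsif ($c eq '.') {
            $structure[$i] = { interaction => 'SS' };
        }
        else {
            die "Unsupported character '$c' at position $pos\n";
        }
    }
    die "Unmatched '(' at position $open[-1]\n" if @open;
    return \@structure;
}
```

For example, dotbracket_to_structure('((..))') gives { interaction => 'WC', base => 6 } for nucleotide 1 and { interaction => 'SS' } for nucleotides 3 and 4.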

This is also scalable based on the complexity of the data: if one wanted
to add tertiary/quaternary interactions, location, base modifications,
remote sequence interactions, etc., extra key/value pairs could be
used.  Conversely, if one only wanted secondary structure (for drawing RNA
structures, for example), then only that data would be parsed.

If you (or anyone listening) have any suggestions I would greatly  
appreciate them.

chris


From johnsonm at gmail.com  Tue Feb  6 23:53:49 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Tue, 6 Feb 2007 17:53:49 -0600
Subject: [Bioperl-l] Bio::Tools::Glimmer
In-Reply-To: <ebf5eb170609201247r12eb9475v5cb553e0727163c0@mail.gmail.com>
References: <ebf5eb170609190833r5fa138e9gc6c51346e5f7890c@mail.gmail.com>
	<451095C7.3020905@infotech.monash.edu.au>
	<0FBEEE2B-9641-4EFA-A5D2-0DA839556E4C@gmx.net>
	<ebf5eb170609201247r12eb9475v5cb553e0727163c0@mail.gmail.com>
Message-ID: <ebf5eb170702061553t29b0d1f2q402582abba032031@mail.gmail.com>

Okay, I need to get something going for a project I'm working on.  Options:

1) Stick it all in one module:  This can get a bit ugly, as Glimmer, as
opposed to GlimmerM and GlimmerHMM, does not explicitly identify itself in
the prediction report.  You can pick up on some unique things in the output
file, but you don't know what you've got until you're actually parsing it,
unless you require a format argument up front, in which case you can split
the parsing code up into different functions.
2) Two modules, one for GlimmerM/GlimmerHMM and one for Glimmer2/Glimmer3.
With or without an abstract dispatch front end.

I suppose at this point, after getting my hands dirty, I'd prefer 1), with
an explicit -format => Glimmer2/3/M/HMM arg required in the constructor.
Though I'm not opposed to 2) if that is what it takes to get it into
Bioperl.

If we can achieve some sort of consensus without too much bloodshed, I'll
shoot y'all some patches and we can consider this issue checked off the
list.

On 9/20/06, Mark Johnson <johnsonm at gmail.com> wrote:
>
>     I think it's going to be at least two modules, one for the
> prokaryotic stuff and one for the eukaryotic.  And really, the
> prokaryotic stuff is different enough to warrant two modules. So three
> different parsers.  Could do it in one, but it would be ugly and
> nasty.  However, this does not preclude three parsers and one abstract
> interface, which is your excellent suggestion.
>     Oh, and excuse me, but I have a bit of a rant here, after dealing
> with parsers and pipelines for the last few months.  Parsers should
> not load the whole input file into RAM to parse it.  And Pipelines
> using the parsers (Ensembl / biopipe) should not stuff the whole
> result set from the parser into a single array.  When you're trying to
> annotate assemblies, it sucks to have to split up contigs/supercontigs
> because the whole result set won't fit into RAM on a 12 gig blade.
> Sheesh.  Though this doesn't matter for bacterial genomes, as they're
> tiny (by comparison to vertebrates).  There, sorry, been saving up
> that frustration for a while.  No offense meant, hope I didn't tick
> anybody off.  8)
>     Torsten:  You sound like you know what you're doing with respect
> to Bioperl more than I do, and I know I don't have CVS access, so I'll
> defer to you.  I'd be happy to help out, though.
>
>
> On 9/20/06, Hilmar Lapp <hlapp at gmx.net> wrote:
> >
> > On Sep 19, 2006, at 9:13 PM, Torsten Seemann wrote:
> >
> > > I'm not sure whether to
> > >
> > > 1. parse them all under the same module, perhaps with a
> > > -format=>'glimmerXXX' parameter
> > >
> > > 2. create a single new module  Glimmer2 and Glimmer3
> > >
> > > 3. create two new modules, one for Glimmer2 and one for Glimmer3,
> > > given
> > > they are different outputs both in syntax and number of output files
> > >
> > > Any advice from Bioperl 'old timers' appreciated ;-)
> > >
> >
> > If at all possible I'd favor 1), with e.g. Bio::Tools::GFF being an
> > example for how this can work.
> >
> > If this would amount to basically 4 modules stringed together into
> > one file (because the parsing code can't share much if anything
> > between the flavors), it'd still be advantageous to have a single
> > frontend module that would then dispatch.
> >
> >         -hilmar
> >
> > --
> > ===========================================================
> > : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> > ===========================================================
>


From jason at bioperl.org  Wed Feb  7 00:33:11 2007
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 6 Feb 2007 16:33:11 -0800
Subject: [Bioperl-l] Bio::Tools::Glimmer
In-Reply-To: <ebf5eb170702061553t29b0d1f2q402582abba032031@mail.gmail.com>
References: <ebf5eb170609190833r5fa138e9gc6c51346e5f7890c@mail.gmail.com>
	<451095C7.3020905@infotech.monash.edu.au>
	<0FBEEE2B-9641-4EFA-A5D2-0DA839556E4C@gmx.net>
	<ebf5eb170609201247r12eb9475v5cb553e0727163c0@mail.gmail.com>
	<ebf5eb170702061553t29b0d1f2q402582abba032031@mail.gmail.com>
Message-ID: <20F07AFE-BADF-4BA5-83B1-C99EEBE915C2@bioperl.org>

I definitely vote for 1). Worst case, you have 4 separate methods if
there is no good way to condense the parsing for each format, and you
require the user to specify the format.

I have no problem with requiring the user to specify what program she
used; if we can be fancy and guess the format later (as SeqIO can
guess formats), then that's icing.
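
Option 1) with a required format argument could look roughly like the sketch below. One constructor validates -format, and a dispatch table maps each Glimmer flavour to its own parsing routine. The report is read one line at a time rather than loaded into memory. The package name, the -fh argument, the GlimmerM default and the placeholder parser subs are all hypothetical; real per-format parsing code would replace the placeholders.

```perl
package GlimmerDispatchSketch;    # hypothetical name, not a BioPerl module
use strict;
use warnings;

# Placeholder parsers, one per output flavour. Each just records which
# routine handled the line; real format-specific parsing would go here.
sub _parse_eukaryotic { my ($line) = @_; return { parser => 'glimmerm_hmm', raw => $line } }
sub _parse_glimmer2   { my ($line) = @_; return { parser => 'glimmer2',     raw => $line } }
sub _parse_glimmer3   { my ($line) = @_; return { parser => 'glimmer3',     raw => $line } }

# Dispatch table: lower-cased -format value => parsing routine.
my %PARSER_FOR = (
    glimmerm   => \&_parse_eukaryotic,
    glimmerhmm => \&_parse_eukaryotic,
    glimmer2   => \&_parse_glimmer2,
    glimmer3   => \&_parse_glimmer3,
);

sub new {
    my ($class, %args) = @_;
    # Assumption: default to GlimmerM when no -format is given.
    my $format = lc(defined $args{-format} ? $args{-format} : 'GlimmerM');
    die "Unknown Glimmer format '$format'\n" unless exists $PARSER_FOR{$format};
    die "A -fh filehandle is required\n" unless defined $args{-fh};
    return bless { format => $format, fh => $args{-fh} }, $class;
}

# Return the next parsed record, reading one line at a time so the
# whole report never has to be held in memory.
sub next_record {
    my ($self) = @_;
    my $fh = $self->{fh};
    while (defined(my $line = <$fh>)) {
        chomp $line;
        next unless length $line;    # skip blank lines
        return $PARSER_FOR{ $self->{format} }->($line);
    }
    return;    # end of input
}
```

For example, GlimmerDispatchSketch->new(-format => 'Glimmer3', -fh => $fh) would route every non-empty line of $fh to the Glimmer3 routine. An unknown format dies in the constructor rather than part-way through parsing.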

-jason

--
Jason Stajich
Miller Research Fellow
University of California, Berkeley
lab: 510.642.8441
http://pmb.berkeley.edu/~taylor/people/js.html
http://fungalgenomes.org/


From torsten.seemann at infotech.monash.edu.au  Wed Feb  7 02:36:54 2007
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Wed, 7 Feb 2007 13:36:54 +1100
Subject: [Bioperl-l] Bio::Tools::Glimmer
In-Reply-To: <20F07AFE-BADF-4BA5-83B1-C99EEBE915C2@bioperl.org>
References: <ebf5eb170609190833r5fa138e9gc6c51346e5f7890c@mail.gmail.com>
	<451095C7.3020905@infotech.monash.edu.au>
	<0FBEEE2B-9641-4EFA-A5D2-0DA839556E4C@gmx.net>
	<ebf5eb170609201247r12eb9475v5cb553e0727163c0@mail.gmail.com>
	<ebf5eb170702061553t29b0d1f2q402582abba032031@mail.gmail.com>
	<20F07AFE-BADF-4BA5-83B1-C99EEBE915C2@bioperl.org>
Message-ID: <a79f6a4b0702061836l7e63933bs3f065b773054c9c4@mail.gmail.com>

> I definitely vote for 1) - worst case you have 4 separate methods if
> there is no good way to condense the parsing for each format and
> require the user to specify the format.

And make the default -format what it currently parses, i.e.
GlimmerM/GlimmerHMM.

> I have no problem with requiring user to specify what program she
> used - if we can be fancy and guess the format later (i.e. guess
> format in SeqIO) -then that's icing.

Agreed.

>> Okay, I need to get something going for a project I'm working on.

I would normally try to help but I am so swamped with work-work at the
moment. Just a reminder that last year I added examples of the
different Glimmer outputs to the CVS repository:

./t/data/Glimmer3.predict
./t/data/Glimmer3.detail
./t/data/GlimmerHMM.out
./t/data/Glimmer2.out
./t/data/GlimmerM.out
./t/data/glimmer.out (this was the original one)

Thanks for taking this on!

--Torsten


From mitch_skinner at berkeley.edu  Wed Feb  7 04:37:35 2007
From: mitch_skinner at berkeley.edu (Mitch Skinner)
Date: Tue, 06 Feb 2007 20:37:35 -0800
Subject: [Bioperl-l] Bio::Graphics::Panel gridlines and pixels
Message-ID: <45C9578F.2060802@berkeley.edu>

Hello,

I'm working on an AJAX version of GBrowse (http://genome.biowiki.org), 
where we're pre-rendering entire chromosomes by breaking them up into 
tiles.  One of the problems we have is that it takes a long time to 
render all those tiles.  One of the things that's slowing the process 
down (and using lots of RAM) is rendering the gridlines, and it would 
make things a lot easier (and faster) for us if we could assume that the 
gridlines were the same for each tile.  Since we're only rendering at a 
particular set of zoom levels (that we have control over), I think this 
is a reasonable assumption.

Given the right set of zoom levels, the assumption works almost all the 
time, except for one specific case.  It has to do with the way draw_grid 
and map_pt in Bio::Graphics::Panel work for the very first gridline.

Here's how draw_grid (in CVS HEAD) computes the first gridline:

    my $first_tick = $minor * int($self->start/$minor);

$first_tick, $minor and $self->start are in base-pair space, which is 
1-based.  However, if ($self->start < $minor) then $first_tick is 0.  
This might not be a problem, except that $first_tick is translated into 
pixel coordinates in map_pt, which expects 1-based bp coordinates.  Here 
are the relevant lines in map_pt:

    my $val = $flip 
      ? int (0.5 + $pr - ($length - ($_- 1)) * $scale)
      : int (0.5 + ($_-$offset-1) * $scale);

This style of rounding only works for positive numbers; rounding 0.6 by 
doing int(0.5 + 0.6) gives you 1 as expected, but rounding -0.6 by doing 
int(0.5 + -0.6) gives you 0.  So if the first three gridlines are at 0, 
10, and 20 bp, then (assuming $scale is 1, $offset is 0, $flip evaluates 
false, and pad left is 0) they're drawn at pixels 0, 9, and 19.

I think that there should be gridlines at pixels 0, 10, and 20.  The 
fact that currently the first interval is 9 pixels and the second is 10 
pixels is breaking my hopeful assumption about the gridlines.

AFAICT my problems are solved if we make two changes:
change the above line from draw_grid to this:
    my $first_tick = 1 + $minor * int(($start - 1)/$minor);
and change the lines from map_pt to this:

    my $val = $flip 
      ? ($pr - ($length - ($_- 1)) * $scale)
      : (($_-$offset-1) * $scale);
    $val = int($val + .5 * ($val <=> 0));
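
The two proposed changes can be checked in isolation with plain Perl. The sketch below reproduces only the arithmetic from the message. pixel_for is a simplified, hypothetical stand-in for the unflipped branch of map_pt. It assumes $offset = 0, $scale = 1 and no left padding, as in the worked example.

```perl
use strict;
use warnings;

# Rounding currently used in map_pt: int() truncates toward zero,
# so adding 0.5 only rounds correctly for non-negative values.
sub round_old { my ($x) = @_; return int(0.5 + $x) }

# Rounding proposed above: add or subtract 0.5 according to the sign
# (the <=> operator yields -1, 0 or 1) before truncating.
sub round_sym { my ($x) = @_; return int($x + 0.5 * ($x <=> 0)) }

# Simplified stand-in for the unflipped branch of map_pt.
sub pixel_for {
    my ($bp, $round) = @_;
    my ($offset, $scale) = (0, 1);
    return $round->(($bp - $offset - 1) * $scale);
}

my ($start, $minor) = (1, 10);

# First tick as computed in draw_grid today, and as proposed above.
my $first_old = $minor * int($start / $minor);              # 0 when $start < $minor
my $first_new = 1 + $minor * int(($start - 1) / $minor);    # stays 1-based

my @old = map { pixel_for($first_old + $_ * $minor, \&round_old) } 0 .. 2;
my @new = map { pixel_for($first_new + $_ * $minor, \&round_sym) } 0 .. 2;

print "old pixels: @old\nnew pixels: @new\n";
```

Under those assumptions, the old code places the first three gridlines at pixels 0, 9 and 19, and the proposed code places them at 0, 10 and 20. The sign-aware version also rounds -0.6 to -1, whereas int(0.5 + -0.6) gives 0.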

Does this make sense?  If people agree that these changes are right then 
I can also produce a proper patch if y'all would prefer that.

Regards,
Mitch


From lstein at cshl.edu  Wed Feb  7 12:17:22 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Wed, 7 Feb 2007 07:17:22 -0500
Subject: [Bioperl-l] Bio::Graphics::Panel gridlines and pixels
In-Reply-To: <45C9578F.2060802@berkeley.edu>
References: <45C9578F.2060802@berkeley.edu>
Message-ID: <6dce9a0b0702070417k692fedcchfac5b1f0e01f72bd@mail.gmail.com>

Hi Mitch,

Zero is not a forbidden coordinate, since gbrowse also works on genetic maps
which have negative and floating point coordinates. You've simply picked up
a boundary case where the rounding isn't working properly. I will fix this
now.

Lincoln


On 2/6/07, Mitch Skinner <mitch_skinner at berkeley.edu> wrote:
>
> Hello,
>
> I'm working on an AJAX version of GBrowse (http://genome.biowiki.org),
> where we're pre-rendering entire chromosomes by breaking them up into
> tiles.  One of the problems we have is that it takes a long time to
> render all those tiles.  One of the things that's slowing the process
> down (and using lots of RAM) is rendering the gridlines, and it would
> make things a lot easier (and faster) for us if we could assume that the
> gridlines were the same for each tile.  Since we're only rendering at a
> particular set of zoom levels (that we have control over), I think this
> is a reasonable assumption.
>
> Given the right set of zoom levels, the assumption works almost all the
> time, except for one specific case.  It has to do with the way draw_grid
> and map_pt in Bio::Graphics::Panel work for the very first gridline.
>
> Here's how draw_grid (in CVS HEAD) computes the first gridline:
>
>     my $first_tick = $minor * int($self->start/$minor);
>
> $first_tick, $minor and $self->start are in base-pair space, which is
> 1-based.  However, if ($self->start < $minor) then $first_tick is 0.
> This might not be a problem, except that $first_tick is translated into
> pixel coordinates in map_pt, which expects 1-based bp coordinates.  Here
> are the relevant lines in map_pt:
>
>     my $val = $flip
>       ? int (0.5 + $pr - ($length - ($_- 1)) * $scale)
>       : int (0.5 + ($_-$offset-1) * $scale);
>
> This style of rounding only works for positive numbers; rounding 0.6 by
> doing int(0.5 + 0.6) gives you 1 as expected, but rounding -0.6 by doing
> int(0.5 + -0.6) gives you 0.  So if the first three gridlines are at 0,
> 10, and 20 bp, then (assuming $scale is 1, $offset is 0, $flip evaluates
> false, and pad left is 0) they're drawn at pixels 0, 9, and 19.
>
> I think that there should be gridlines at pixels 0, 10, and 20.  The
> fact that currently the first interval is 9 pixels and the second is 10
> pixels is breaking my hopeful assumption about the gridlines.
>
> AFAICT my problems are solved if we make two changes:
> change the above line from draw_grid to this:
>     my $first_tick = 1 + $minor * int(($start - 1)/$minor);
> and change the lines from map_pt to this:
>
>     my $val = $flip
>       ? ($pr - ($length - ($_- 1)) * $scale)
>       : (($_-$offset-1) * $scale);
>     $val = int($val + .5 * ($val <=> 0));
>
> Does this make sense?  If people agree that these changes are right then
> I can also produce a proper patch if y'all would prefer that.
>
> Regards,
> Mitch
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
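The rounding problem Mitch describes is easy to check in isolation. In Perl, int() truncates toward zero, so `int(0.5 + $x)` rounds negative values the wrong way. This standalone sketch compares that idiom with the sign-aware rounding from his proposed map_pt change. The sub names `round_naive` and `round_away` are illustrative only and are not part of Bio::Graphics.

```perl
use strict;
use warnings;

# The idiom currently in map_pt. int() truncates toward zero, so this
# rounds correctly only when $x >= 0.
sub round_naive { my $x = shift; return int(0.5 + $x) }

# The proposed replacement: round half away from zero. ($x <=> 0) is
# -1, 0 or 1, so the 0.5 nudge follows the sign of $x.
sub round_away { my $x = shift; return int($x + 0.5 * ($x <=> 0)) }

for my $x (0.6, -0.6, -0.5) {
    printf "%5.1f  naive=%d  away=%d\n", $x, round_naive($x), round_away($x);
}

# Gridlines at 0, 10 and 20 bp with scale=1, offset=0 (unflipped branch
# of map_pt, i.e. ($bp - $offset - 1) * $scale before rounding).
my @bp       = (0, 10, 20);
my @px_naive = map { round_naive(($_ - 0 - 1) * 1) } @bp;  # 0, 9, 19
my @px_away  = map { round_away(($_ - 0 - 1) * 1) }  @bp;  # -1, 9, 19
print "naive: @px_naive\naway:  @px_away\n";
```

With sign-aware rounding alone the spacing becomes uniform (-1, 9, 19). Moving the ticks to 1, 11, 21 bp, as in the proposed draw_grid change, is what would shift them to 0, 10, 20.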


-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu


From lstein at cshl.edu  Wed Feb  7 12:18:40 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Wed, 7 Feb 2007 07:18:40 -0500
Subject: [Bioperl-l] Bio::Graphics::Panel gridlines and pixels
In-Reply-To: <45C9578F.2060802@berkeley.edu>
References: <45C9578F.2060802@berkeley.edu>
Message-ID: <6dce9a0b0702070418x139b4e5cu396bb629760a860c@mail.gmail.com>

However, I'm also very interested in why grid-drawing takes so long. When
I've profiled drawing, neither grid drawing nor map_pt() consume any
significant amount of time.

Lincoln

On 2/6/07, Mitch Skinner <mitch_skinner at berkeley.edu> wrote:
>
> Hello,
>
> I'm working on an AJAX version of GBrowse (http://genome.biowiki.org),
> where we're pre-rendering entire chromosomes by breaking them up into
> tiles.  One of the problems we have is that it takes a long time to
> render all those tiles.  One of the things that's slowing the process
> down (and using lots of RAM) is rendering the gridlines, and it would
> make things a lot easier (and faster) for us if we could assume that the
> gridlines were the same for each tile.  Since we're only rendering at a
> particular set of zoom levels (that we have control over), I think this
> is a reasonable assumption.
>
> Given the right set of zoom levels, the assumption works almost all the
> time, except for one specific case.  It has to do with the way draw_grid
> and map_pt in Bio::Graphics::Panel work for the very first gridline.
>
> Here's how draw_grid (in CVS HEAD) computes the first gridline:
>
>     my $first_tick = $minor * int($self->start/$minor);
>
> $first_tick, $minor and $self->start are in base-pair space, which is
> 1-based.  However, if ($self->start < $minor) then $first_tick is 0.
> This might not be a problem, except that $first_tick is translated into
> pixel coordinates in map_pt, which expects 1-based bp coordinates.  Here
> are the relevant lines in map_pt:
>
>     my $val = $flip
>       ? int (0.5 + $pr - ($length - ($_- 1)) * $scale)
>       : int (0.5 + ($_-$offset-1) * $scale);
>
> This style of rounding only works for positive numbers; rounding 0.6 by
> doing int(0.5 + 0.6) gives you 1 as expected, but rounding -0.6 by doing
> int(0.5 + -0.6) gives you 0.  So if the first three gridlines are at 0,
> 10, and 20 bp, then (assuming $scale is 1, $offset is 0, $flip evaluates
> false, and pad left is 0) they're drawn at pixels 0, 9, and 19.
>
> I think that there should be gridlines at pixels 0, 10, and 20.  The
> fact that currently the first interval is 9 pixels and the second is 10
> pixels is breaking my hopeful assumption about the gridlines.
>
> AFAICT my problems are solved if we make two changes:
> change the above line from draw_grid to this:
>     my $first_tick = 1 + $minor * int(($start - 1)/$minor);
> and change the lines from map_pt to this:
>
>     my $val = $flip
>       ? ($pr - ($length - ($_- 1)) * $scale)
>       : (($_-$offset-1) * $scale);
>     $val = int($val + .5 * ($val <=> 0));
>
> Does this make sense?  If people agree that these changes are right then
> I can also produce a proper patch if y'all would prefer that.
>
> Regards,
> Mitch
>


-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu


From johnsonm at gmail.com  Wed Feb  7 16:50:05 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Wed, 7 Feb 2007 10:50:05 -0600
Subject: [Bioperl-l] Bio::Tools::Glimmer
In-Reply-To: <20F07AFE-BADF-4BA5-83B1-C99EEBE915C2@bioperl.org>
References: <ebf5eb170609190833r5fa138e9gc6c51346e5f7890c@mail.gmail.com>
	<451095C7.3020905@infotech.monash.edu.au>
	<0FBEEE2B-9641-4EFA-A5D2-0DA839556E4C@gmx.net>
	<ebf5eb170609201247r12eb9475v5cb553e0727163c0@mail.gmail.com>
	<ebf5eb170702061553t29b0d1f2q402582abba032031@mail.gmail.com>
	<20F07AFE-BADF-4BA5-83B1-C99EEBE915C2@bioperl.org>
Message-ID: <ebf5eb170702070850k10f1d24fr2d93d61534eeaaff@mail.gmail.com>

    Well, each format has some unique features.  If the user declines to
specify the format, I can figure it out, but it will probably involve
scanning the input file twice.  I'll take a look.
    I can do all the parsing in one function, in fact I have, just to see
how nasty it would end up being.  I just can't stomach having the code that
tightly coupled and hard to read.  In the end it'll probably be three
functions.  GlimmerM/HMM are pretty close.  Maybe two: Glimmer2 and
Glimmer3 aren't *that* different, either.

On 2/6/07, Jason Stajich <jason at bioperl.org> wrote:
>
> I definitely vote for 1) - worst case you have 4 separate methods if there
> is no good way to condense the parsing for each format and require the user
> to specify the format.
>
> I have no problem with requiring user to specify what program she used -
> if we can be fancy and guess the format later (i.e. guess format in SeqIO)
> -then that's icing.
>
> -jason
>
>


From adsj at novozymes.com  Wed Feb  7 17:11:32 2007
From: adsj at novozymes.com (Adam Sjøgren)
Date: Wed, 07 Feb 2007 18:11:32 +0100
Subject: [Bioperl-l] Bioperl 1.4 to 1.5.2,
	adding qualifiers to Bio::Seq-objects
Message-ID: <8764adoptn.fsf@topper.koldfront.dk>

  Hi.


I am transitioning from Bioperl 1.4 to 1.5.2, and the qualifiers I add
to features in Bio::Seq objects have stopped appearing when I output
them as EMBL or GenBank-files.

Below is a test-script that exercises the problem.

I guess I should be doing something else when adding qualifiers, now
with 1.5.2 (as reading an EMBL-file with Bio::SeqIO and outputting it
again of course works perfectly), but I can't deduce what from perldoc
Bio::SeqFeature::Generic - it still lists the add_tag_value method,
and calling it doesn't croak or warn.

I have found some comments on this in the release notes of 1.5.0¹ on
the Bioperl wiki, but I must admit I wasn't able to extract what
methods I should be calling instead.

If someone could point me to the relevant documentation or tell me
what method to use instead, I would be happy as a clam.


  Best regards,

    Adam

== =
use Test::More tests=>2;

use strict;
use warnings;

use Bio::Seq;
use Bio::SeqFeature::Generic;
use IO::String;
use Bio::SeqIO;

my $seq=Bio::Seq->new(
                      -seq=>'actgactgactg',
                     );

$seq->display_id('D27');
$seq->accession_number('DB:D27');

my $seq_feature=Bio::SeqFeature::Generic->new(
                                              -strand=>1,
                                              -primary=>'source',
                                             );
$seq_feature->set_attributes(-start=>2, -end=>8);
$seq_feature->add_tag_value(note=>'TEST');
$seq_feature->add_tag_value(db_xref=>'DB:D27');

$seq->add_SeqFeature($seq_feature);

my $raw='';
my $fh=IO::String->new($raw);
my $out=Bio::SeqIO->new(-format=>'EMBL', -fh=>$fh);
$out->write_seq($seq);

ok($raw=~m!/note!, 'Qualifier note found');
ok($raw=~m!/db_xref!, 'Qualifier db_xref found');
== =


¹ <http://www.bioperl.org/wiki/Core_1.4.0_1.5.0_delta>

-- 
                                                          Adam Sjøgren
                                                    adsj at novozymes.com


From cjfields at uiuc.edu  Wed Feb  7 17:50:13 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 7 Feb 2007 11:50:13 -0600
Subject: [Bioperl-l] Bioperl 1.4 to 1.5.2,
	adding qualifiers to Bio::Seq-objects
In-Reply-To: <8764adoptn.fsf@topper.koldfront.dk>
References: <8764adoptn.fsf@topper.koldfront.dk>
Message-ID: <C350729C-3964-4685-A89C-D3E5C24A5114@uiuc.edu>


On Feb 7, 2007, at 11:11 AM, Adam Sjøgren wrote:

>   Hi.
>
>
> I am transitioning from Bioperl 1.4 to 1.5.2, and the qualifiers I add
> to features in Bio::Seq objects have stopped appearing when I output
> them as EMBL or GenBank-files.
>
> Below is a test-script that exercises the problem.
>
> I guess I should be doing something else when adding qualifiers, now
> with 1.5.2 (as reading an EMBL-file with Bio::SeqIO and outputting it
> again of course works perfectly), but I can't deduce what from perldoc
> Bio::SeqFeature::Generic - it still lists the add_tag_value method,
> and calling it doesn't croak or warn.
>
> I have found some comments on this in the release notes of 1.5.0¹ on
> the Bioperl wiki, but I must admit I wasn't able to extract what
> methods I should be calling instead.
>
> If someone could point me to the relevant documentation or tell me
> what method to use instead, I would be happy as a clam.
>
>
>   Best regards,
>
>     Adam

...

This works for me using bioperl-live (Mac OS X):

ok 1 - Qualifier note found
ok 2 - Qualifier db_xref found

If I print the string I get:

ID   DB:D27; SV 1; linear; unassigned DNA; STD; UNC; 12 BP.
XX
AC   DB:D27;
XX
XX
FH   Key             Location/Qualifiers
FH
FT   source          2..8
FT                   /db_xref="DB:D27"
FT                   /note="TEST"
XX
SQ   Sequence 12 BP; 3 A; 3 C; 3 G; 3 T; 0 other;
     actgactgac tg                                                            12
//

GenBank also works:

LOCUS       D27                       12 bp    dna     linear   UNK
ACCESSION   DB:D27
FEATURES             Location/Qualifiers
      source          2..8
                      /db_xref="DB:D27"
                      /note="TEST"
BASE COUNT        3 a      3 c      3 g      3 t
ORIGIN
         1 actgactgac tg
//

If you haven't uninstalled 1.4, make sure you aren't running 1.4 or  
mixing the two versions (you can check by using 'perldoc -l  
Bio::Root::Root').
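`perldoc -l` shows the copy Perl will actually load. When a stale or locally patched copy shadows the installed one, it can help to list every copy visible on @INC, in search order. The following is a minimal core-Perl sketch; `find_module_copies` is a hypothetical helper, not a BioPerl function.

```perl
use strict;
use warnings;
use File::Spec;

# Return every directory entry (in @INC order, or in the order of the
# directories passed in) that contains the given module. Perl loads the
# first match; any later ones are shadowed.
sub find_module_copies {
    my ($module, @dirs) = @_;
    @dirs = @INC unless @dirs;
    # Bio::Root::Root -> Bio/Root/Root.pm
    my @parts = split /::/, $module;
    $parts[-1] .= '.pm';
    return grep { -f $_ }
           map  { File::Spec->catfile($_, @parts) }
           grep { !ref $_ } @dirs;    # skip code-ref hooks in @INC
}

my @copies = find_module_copies('Bio::Root::Root');
print @copies ? join("\n", @copies) . "\n" : "Bio::Root::Root not found\n";
```

More than one line of output means that only the first listed file is in use.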

chris


From cjfields at uiuc.edu  Wed Feb  7 18:04:33 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 7 Feb 2007 12:04:33 -0600
Subject: [Bioperl-l] Bio::Tools::Glimmer
In-Reply-To: <ebf5eb170702070850k10f1d24fr2d93d61534eeaaff@mail.gmail.com>
References: <ebf5eb170609190833r5fa138e9gc6c51346e5f7890c@mail.gmail.com>
	<451095C7.3020905@infotech.monash.edu.au>
	<0FBEEE2B-9641-4EFA-A5D2-0DA839556E4C@gmx.net>
	<ebf5eb170609201247r12eb9475v5cb553e0727163c0@mail.gmail.com>
	<ebf5eb170702061553t29b0d1f2q402582abba032031@mail.gmail.com>
	<20F07AFE-BADF-4BA5-83B1-C99EEBE915C2@bioperl.org>
	<ebf5eb170702070850k10f1d24fr2d93d61534eeaaff@mail.gmail.com>
Message-ID: <91A3A651-C0D5-495F-941F-05B8AA0DDA60@uiuc.edu>


On Feb 7, 2007, at 10:50 AM, Mark Johnson wrote:

>     Well, each format has some unique features.  If the user declines to
> specify the format, I can figure it out, but it will probably involve
> scanning the input file twice.  I'll take a look.
>     I can do all the parsing in one function, in fact I have, just to see
> how nasty it would end up being.  I just can't stomach having the code that
> tightly coupled and hard to read.  In the end it'll probably be three
> functions.  GlimmerM/HMM are pretty close.  Maybe two: Glimmer2 and
> Glimmer3 aren't *that* different, either.

I don't see a problem with passing off the parse to a defined class  
method either right off or mid-parse.  I'm doing something like this  
with a revamped GenBank parser:

# declare local to module

my %GLIMMER_METHODS = (
     'GlimmerHMM' => '_parsehmm',
     'Glimmer'    => '_parsenormal',
     # ...others if needed
     '_DEFAULT_'  => '_parseabnormal',
);

...

Then either preparse part of file using _readline() to determine  
format, or use -format and bypass preparsing:

sub next_thingy {
    my $self = shift;
    # ... ($format holds the -format value, if the user gave one)
    if (!$format) {
        while (defined(my $line = $self->_readline())) {
            if ($line =~ m{(something)}) {
                $format = $1; $self->_pushback($line); last;
            }
        }
    }
    my $method = (defined $format && exists $GLIMMER_METHODS{$format})
               ? $GLIMMER_METHODS{$format}
               : $GLIMMER_METHODS{'_DEFAULT_'};   # fallback to this one

    return $self->$method();   # hand off parsing flow to the proper parser
}

# all parser variants would have this structure:

sub _parsehmm {
    my $self = shift;
    # ... init stuff here
    while (defined(my $line = $self->_readline())) {
        # ... do stuff until END of next prediction/report
    }
    # ... return data if any
}
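The dispatch-table idea above can be tried out on its own. In this self-contained sketch, the package name `GlimmerDispatchDemo`, the format-sniffing regex, and the parser subs are placeholders that do not reflect the real Glimmer output formats or Bio::Tools::Glimmer. Lines come from an in-memory array instead of Bio::Root::IO, with simple stand-ins for `_readline`/`_pushback`.

```perl
use strict;
use warnings;

package GlimmerDispatchDemo;

# Format name => parser method name (placeholder names).
my %METHODS = (
    'GlimmerHMM' => '_parsehmm',
    'Glimmer'    => '_parsenormal',
    '_DEFAULT_'  => '_parseabnormal',
);

sub new {
    my ($class, %args) = @_;
    return bless({ lines => $args{-lines} || [], format => $args{-format} }, $class);
}

# Minimal stand-ins for Bio::Root::IO's _readline/_pushback.
sub _readline { my $self = shift; return shift @{ $self->{lines} } }
sub _pushback { my ($self, $line) = @_; unshift @{ $self->{lines} }, $line }

sub next_report {
    my $self   = shift;
    my $format = $self->{format};
    unless ($format) {
        # Preparse: read until a line identifies the program, then push it
        # back so the chosen parser sees it too.
        while (defined(my $line = $self->_readline())) {
            if ($line =~ /^(GlimmerHMM|Glimmer)\b/) {
                $format = $1;
                $self->_pushback($line);
                last;
            }
        }
    }
    my $method = (defined $format && exists $METHODS{$format})
               ? $METHODS{$format}
               : $METHODS{'_DEFAULT_'};
    return $self->$method();    # hand parsing off to the chosen parser
}

sub _parsehmm      { my $self = shift; return 'hmm: '    . ($self->_readline() // '') }
sub _parsenormal   { my $self = shift; return 'normal: ' . ($self->_readline() // '') }
sub _parseabnormal { return 'default parser' }

package main;

my $p = GlimmerDispatchDemo->new(-lines => ["junk\n", "GlimmerHMM 3.0\n"]);
print $p->next_report();    # prints "hmm: GlimmerHMM 3.0"
```

Passing -format skips the preparse entirely. Note that in this sketch, an input with no recognizable line is consumed completely before it falls back to the default parser.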

chris

> On 2/6/07, Jason Stajich <jason at bioperl.org> wrote:
>>
>> I definitely vote for 1) - worst case you have 4 separate methods if there
>> is no good way to condense the parsing for each format and require the user
>> to specify the format.
>>
>> I have no problem with requiring user to specify what program she used -
>> if we can be fancy and guess the format later (i.e. guess format in SeqIO)
>> -then that's icing.
>>
>> -jason
>>
>>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From johnston at biochem.ucl.ac.uk  Wed Feb  7 18:56:52 2007
From: johnston at biochem.ucl.ac.uk (Caroline Johnston)
Date: Wed, 7 Feb 2007 18:56:52 +0000 (GMT)
Subject: [Bioperl-l] RNA folding
In-Reply-To: <C7C6DC30-ACBB-4626-A0D7-CC0248AF847E@uiuc.edu>
References: <Pine.LNX.4.58.0702061143420.4383@localhost.localdomain>
	<C7C6DC30-ACBB-4626-A0D7-CC0248AF847E@uiuc.edu>
Message-ID: <Pine.LNX.4.58.0702071535320.30225@localhost.localdomain>

Thanks Chris.

Storing the interaction data as a hash according to an ontology and using
an extended bracket notation as the string representation seems to make
sense, but I'm still unsure how this is supposed to be
attached to the Seq objects. You reckon it should be an AnnotationI?

I'm not sure I understand the distinction between annotations and
features. From the docs I got the impression that Features were like
annotation on bits of sequences and had a reference to the sequence to
which they belong, whereas annotations don't. If that's the case though,
why would RNA structure be an annotation rather than a feature? If not,
what is the distinction between them? Are the positional Annotation
subclasses you're developing intended to replace features? Have I got the
wrong end of the stick entirely?

Cheers,
Cass


On Tue, 6 Feb 2007, Chris Fields wrote:

> Actually, the only RNA tool wrappers I have made are ones for ERPIN,
> RNAMotif, and Infernal (the only one in bioperl-run CVS at this time
> is RNAMotif).  I am planning on writing up wrappers for Vienna,
> UNAFold, and a few others but haven't really started in.  Here's
> where I'm at right now...
>
> I am writing up a new set of AnnotationI classes which positionally
> describe data (Meta) which I hope will help deal with this stuff.
> These would be similar in nature to Heikki's Bio::Seq::Meta classes:
>
> http://bioperl.org/pipermail/bioperl-l/2006-December/024414.html
>
> I would use a regular Bio::SeqI and store the structural data and
> anything else (such as energy calculations, etc) as Annotation
> objects in an AnnotationCollection, and then write up a series of
> SeqIO modules to get data into/out of the designated structure
> formats, like UNAfold ct, RNAML, and so on.  Each sequence would then
> be capable of holding more than one structural Annotation (i.e. could
> represent different folding pathways, alternative RNA folds, and so on).
>
> At this point I represent the data as an array of hashes where
> $array[0] is nt 1 and the hash keys indicate the type of interaction,
> base interacted with, etc.  The text representation would be the
> simple Eddy WUSS (Rfam-like) format by default, which is capable of
> representing some complex data (pseudoknots, for instance), is
> compact, and is documented (via the Infernal manual).  Tags will
> probably switch to more ontologically relevant terms (probably from
> RNAML or RNA Ontology), but in general it is something like this:
>
> [
>   {'interaction' => 'WC',
>     'base'  => 24},
>   {'interaction' => 'WC',
>     'base'  => 23},
>   {'interaction' => 'SS'},
> ...
> ]
>
> In this implementation every seq position would have some kind of
> interaction designation, though that's open for debate as it could
> just be simple text or undef for single-stranded regions.
>
> This is also scalable based on complexity of the data: if one wanted
> to add tert/quaternary interactions, location, base modifications,
> remote sequence interactions, etc., extra key/value pairs could be
> used.  Conversely, if one only wanted sec structure (for drawing RNA
> structures, for example), then only that data would be parsed.
>
> If you (or anyone listening) have any suggestions I would greatly
> appreciate them.
>
> chris
>
>
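To make the proposed per-position representation concrete, the following minimal sketch turns an array of hashes like the one in the quoted example into a plain dot-bracket string. It assumes only the 'WC' and 'SS' interaction tags and 1-based 'base' partners, as in that example, and treats undef entries as single-stranded. Full WUSS notation distinguishes more bracket types (for pseudoknots, for instance), and this sketch ignores them. `to_dot_bracket` is a hypothetical name.

```perl
use strict;
use warnings;

# $structure->[$i] describes nucleotide $i + 1; 'base' is the 1-based partner.
sub to_dot_bracket {
    my ($structure) = @_;
    my $str = '';
    for my $i (0 .. $#$structure) {
        my $pos  = $i + 1;
        my $site = $structure->[$i];
        if (defined $site
            && ($site->{interaction} // '') eq 'WC'
            && defined $site->{base}) {
            # Opening bracket if the partner lies downstream, closing otherwise.
            $str .= $site->{base} > $pos ? '(' : ')';
        }
        else {
            $str .= '.';    # single-stranded, or undef
        }
    }
    return $str;
}

# A 7-nt hairpin: 1-7 and 2-6 paired, 3-5 unpaired.
my @hairpin = (
    { interaction => 'WC', base => 7 },
    { interaction => 'WC', base => 6 },
    { interaction => 'SS' },
    { interaction => 'SS' },
    { interaction => 'SS' },
    { interaction => 'WC', base => 2 },
    { interaction => 'WC', base => 1 },
);
print to_dot_bracket(\@hairpin), "\n";    # ((...))
```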


From cjfields at uiuc.edu  Wed Feb  7 22:15:44 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 7 Feb 2007 16:15:44 -0600
Subject: [Bioperl-l] RNA folding
In-Reply-To: <Pine.LNX.4.58.0702071535320.30225@localhost.localdomain>
References: <Pine.LNX.4.58.0702061143420.4383@localhost.localdomain>
	<C7C6DC30-ACBB-4626-A0D7-CC0248AF847E@uiuc.edu>
	<Pine.LNX.4.58.0702071535320.30225@localhost.localdomain>
Message-ID: <7360B66F-6AF3-4CB1-8343-0A19E42AD7F8@uiuc.edu>


On Feb 7, 2007, at 12:56 PM, Caroline Johnston wrote:

> Thanks Chris.
>
> Storing the interaction data as a hash according to an ontology and using
> an extended bracket notation as the string representation seems to make
> sense, but I'm still unsure how this is supposed to be
> attached to the Seq objects. You reckon it should be an AnnotationI?

As long as it describes everything in the object and that there is a  
reasonable way of textually representing the data, I think you can  
attach anything as annotation.  A recent example is the addition of  
trees as annotation.  Also, Annotation can be used to describe  
alignments (such as the structure consensus string in Rfam  
alignments), or added to SeqFeatures.  The class just needs to  
implement AnnotatableI.

> I'm not sure I understand the distinction between annotations and
> features. From the docs I got the impression that Features were like
> annotation on bits of sequences and had a reference to the sequence to
> which they belong, whereas annotations don't. If that's the case though,
> why would RNA structure be an annotation rather than a feature? If not,
> what is the distinction between them? Are the positional Annotation
> subclasses you're developing intended to replace features? Have I got the
> wrong end of the stick entirely?
>
> Cheers,
> Cass

The key distinction between seqfeatures and annotations is that  
annotations are normally associated with the entire sequence record,  
while seqfeatures normally describe a part of the sequence (and thus  
have a location on the sequence).  There are a few exceptions, but in  
general that's the case.  The HOWTO gives a bit more background:

http://www.bioperl.org/wiki/HOWTO:Feature-Annotation

Using annotations or seqfeatures in a case like this may be  
completely dependent on one's point of view.  For instance, one  
implementation I had considered was adding an interface to Bio::Seq  
which would allow Seq objects to also have Bio::Structure objects,
since my view is that any sequence could (optionally) have a  
structure associated with it.  However, I reasoned that a sequence  
could actually have multiple structures (RNA, ssDNA, and protein can  
have several alternative folds or different folding pathways, for  
instance).   Instead of splitting up each structure into individual  
seqfeatures (each of which would have to be tagged with the
relevant structure and score info), I could have one class encompass  
all of that data in a reasonable way.  Hence I used Annotation.

BTW, this isn't meant to replace features in any way.  It would be  
primarily used to describe (1) a sequence as a whole, such as a tRNA  
sequence, (2) a seqfeature, such as a tRNA, rRNA, riboswitch, etc in  
a genome sequence, or (3) a conserved structure in an alignment, such  
as Rfam stockholm output.

I'll add that the option of splitting the data into seqfeatures isn't  
ruled out.  It would be a matter of using a helper method, maybe in  
SeqUtils or directly in Annotation::Meta or whatever I end up calling  
it.  I plan on adding something along those lines at some point.

chris


From mitch_skinner at berkeley.edu  Wed Feb  7 23:26:53 2007
From: mitch_skinner at berkeley.edu (Mitch Skinner)
Date: Wed, 07 Feb 2007 15:26:53 -0800
Subject: [Bioperl-l] Bio::Graphics::Panel gridlines and pixels
In-Reply-To: <6dce9a0b0702070417k692fedcchfac5b1f0e01f72bd@mail.gmail.com>
References: <45C9578F.2060802@berkeley.edu>
	<6dce9a0b0702070417k692fedcchfac5b1f0e01f72bd@mail.gmail.com>
Message-ID: <45CA603D.1070901@berkeley.edu>

Lincoln Stein wrote:
> Zero is not a forbidden coordinate, since gbrowse also works on 
> genetic maps which have negative and floating point coordinates. 
> You've simply picked up a boundary case where the rounding isn't 
> working properly. I will fix this now.
Thanks for the fix.  What do you think of the following case?  This is
something I actually ran into.  Suppose you have
the original draw_grid:

    my $first_tick = $minor * int($self->start/$minor);

and my version of map_pt:

    my $val = $flip
      ? ($pr - ($length - ($_- 1)) * $scale)
      : (($_-$offset-1) * $scale);
    $val = int($val + .5 * ($val <=> 0));

and scale=0.5, offset=0, pad_left=0, flip=0, and minor=10.
Our tiles are currently 1000px wide.  So the first gridline will be at 
0bp => -1px and the 200th gridline will be at 2000bp => 1000px.  So the 
first tile will not have a gridline at its 0th pixel but the second
tile will have one there.  Last night I was thinking that this was an 
artifact of having gridlines start at 0bp but now I'm thinking this is 
just because rounding half-pixels leaves an extra space when crossing 
zero.  Which is not unreasonable; it just invalidates the assumption I 
was hoping to make that the gridlines are the same for each tile.  Maybe 
it's just unreasonable to think that floating point calculations will 
give pixel-exact results.
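The -1px and 1000px values above can be reproduced with a few lines of arithmetic. This sketch uses the proposed map_pt rounding on the unflipped branch with the parameters from the message (scale=0.5, offset=0, pad_left=0, minor=10). It contrasts ticks at multiples of 10 bp, as in the original draw_grid, with ticks at 1 + multiples of 10 bp, as in the earlier proposed draw_grid change. `bp_to_px` is an illustrative name, not a Bio::Graphics method.

```perl
use strict;
use warnings;

# Parameters from the message above.
my ($scale, $offset) = (0.5, 0);

# Unflipped branch of the proposed map_pt: 1-based bp -> pixel,
# rounding half away from zero.
sub bp_to_px {
    my $bp  = shift;
    my $val = ($bp - $offset - 1) * $scale;
    return int($val + 0.5 * ($val <=> 0));
}

# Original draw_grid with minor=10: ticks at 0, 10, 20, ... bp.
my @orig = map { bp_to_px($_) } (0, 10, 20, 2000);   # -1, 5, 10, 1000
# Proposed draw_grid: ticks at 1, 11, 21, ... bp.
my @new  = map { bp_to_px($_) } (1, 11, 21, 2001);   # 0, 5, 10, 1000

print "original: @orig\nproposed: @new\n";
```

With 1000px tiles, the original ticks put the first gridline at -1px (off tile 0) and one at 1000px (the first pixel of tile 1), and the first interval is 6px rather than 5px. With ticks at 1 + k*10 bp, every tick here lands exactly on a multiple of 5px, so no half-pixel rounding happens. That only holds when minor * scale is a whole number of pixels.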

Or I may just be barking up the wrong tree entirely.  Perhaps it's time 
to reconsider at a higher level (see my next message).

Mitch


From mitch_skinner at berkeley.edu  Wed Feb  7 23:28:11 2007
From: mitch_skinner at berkeley.edu (Mitch Skinner)
Date: Wed, 07 Feb 2007 15:28:11 -0800
Subject: [Bioperl-l] Bio::Graphics::Panel gridlines and pixels
In-Reply-To: <6dce9a0b0702070418x139b4e5cu396bb629760a860c@mail.gmail.com>
References: <45C9578F.2060802@berkeley.edu>
	<6dce9a0b0702070418x139b4e5cu396bb629760a860c@mail.gmail.com>
Message-ID: <45CA608B.80907@berkeley.edu>

Lincoln Stein wrote:
> However, I'm also very interested in why grid-drawing takes so long. 
> When I've profiled drawing, neither grid drawing nor map_pt() consume 
> any significant amount of time.
Well, the approach that we've been taking is to hand 
Bio::Graphics::Panel a fake GD object that stores all of the graphical 
primitives (line, rectangle, filledRectangle, etc. + their parameters) 
and then draws them later in chunks (for each tile, we draw all the 
primitives that overlap its pixel boundaries).  We're doing this because 
trying to create a real GD object that's hundreds of millions of pixels 
wide takes too much RAM.  But storing all the gridlines (for a whole 
chromosome, at a high zoom level) also takes a lot of RAM, and getting 
the gridlines for the current tile and translating their coordinates 
into the coordinate space of the tile also takes a fair amount of CPU.  
The gridline hack I've been experimenting with (that prompted these 
emails) was motivated by the hope that the gridlines were regular enough 
that we wouldn't have to store them explicitly, but just draw the same 
gridlines over and over again.  It runs almost twice as fast as the 
version that explicitly stores the gridlines.
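The fake-GD approach described above can be sketched as a recorder that accepts drawing calls, stores them, and later returns only the primitives whose x-extent overlaps a given tile, shifted into tile coordinates. This is a simplified illustration, not the GBrowse tiling code. It handles only calls with GD::Image-style (x1, y1, x2, y2, color) arguments, and the package name `RecordingCanvas` and the method `primitives_for_tile` are hypothetical.

```perl
use strict;
use warnings;

package RecordingCanvas;

sub new { return bless({ prims => [] }, shift) }

# Record calls that take (x1, y1, x2, y2, color), like GD::Image's
# line(), rectangle() and filledRectangle().
for my $op (qw(line rectangle filledRectangle)) {
    no strict 'refs';
    *{ __PACKAGE__ . "::$op" } = sub {
        my ($self, @args) = @_;
        push @{ $self->{prims} }, [ $op, @args ];
    };
}

# Primitives overlapping global pixels [$x0, $x0 + $width - 1],
# with x coordinates translated into tile space.
sub primitives_for_tile {
    my ($self, $x0, $width) = @_;
    my $x1 = $x0 + $width - 1;
    my @out;
    for my $p (@{ $self->{prims} }) {
        my ($op, $ax, $ay, $bx, $by, $color) = @$p;
        my ($lo, $hi) = $ax < $bx ? ($ax, $bx) : ($bx, $ax);
        next if $hi < $x0 || $lo > $x1;    # no overlap with this tile
        push @out, [ $op, $ax - $x0, $ay, $bx - $x0, $by, $color ];
    }
    return @out;
}

package main;

my $canvas = RecordingCanvas->new;
$canvas->line(5, 0, 5, 100, 'gray');                  # inside tile 0 only
$canvas->filledRectangle(990, 10, 1010, 20, 'blue');  # spans tiles 0 and 1
my @tile1 = $canvas->primitives_for_tile(1000, 1000);
# @tile1 holds only the rectangle, now at x = -10 .. 10
```

A linear scan over all primitives per tile is the simplest choice here. For a whole chromosome, sorting the primitives by x or bucketing them by tile would avoid rescanning everything for each tile.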

So the main slowdown is not in draw_grid or map_pt, but in our code 
that's storing/retrieving and translating the gridlines.  Which we are 
also looking into speeding up.  But the memory usage is harder to 
reduce; I've experimented with trying to compress the gridline data but 
it seems easier to just have the panel draw the grid directly.

The more I read the Panel code, the more I think it would be nice to 
make more use of it.  One of the reasons that we're trying to fool it 
right now is that there seem to be a number of behaviors in it (and/or 
in the glyphs?) that take the current image boundaries into account 
(drawing an arrow where a feature runs off the edge of the image, 
etc.).  But in our browser each tile is supposed to mesh seamlessly with 
its neighbor, so if there's an easy way to turn off those edge-aware 
behaviors that would be pretty interesting.

Ian has also suggested that it might be better to store less information 
than the full set of graphics primitives.  For example, we could just 
store the Panel's glyph boxes and use their (pixel bound)->feature 
information to decide which features need to be drawn for each tile.

I'm going to be spending some time reading the Bio::Graphics code in 
more depth.  I'd also welcome suggestions from you or anyone on the list.

Thanks,
Mitch


From sdbrown at annular.org  Wed Feb  7 23:41:13 2007
From: sdbrown at annular.org (Steven Brown)
Date: Wed, 7 Feb 2007 15:41:13 -0800
Subject: [Bioperl-l] Bio::Restriction::Analysis cut site problem in 1.5.2
Message-ID: <3828F5B9-BFF8-47F4-8228-3BEA036513A0@annular.org>

The module seems to have trouble handling the cut-site specifiers  
that surround the sequence that the enzyme is specific for.  The error:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Bad end parameter (22). End must be less than the total length  
of sequence (total=6)
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/local/lib/perl5/site_perl/5.8.6/ 
Bio/Root/Root.pm:328
STACK: Bio::PrimarySeq::subseq /usr/local/lib/perl5/site_perl/5.8.6/ 
Bio/PrimarySeq.pm:371
STACK: Bio::Restriction::Analysis::_enzyme_sites /usr/local/lib/perl5/ 
site_perl/5.8.6/Bio/Restriction/Analysis.pm:884
STACK: Bio::Restriction::Analysis::_cuts /usr/local/lib/perl5/ 
site_perl/5.8.6/Bio/Restriction/Analysis.pm:785
STACK: Bio::Restriction::Analysis::cut /usr/local/lib/perl5/site_perl/ 
5.8.6/Bio/Restriction/Analysis.pm:369
STACK: Bio::Restriction::Analysis::cutters /usr/local/lib/perl5/ 
site_perl/5.8.6/Bio/Restriction/Analysis.pm:678
---snip (my script line)---
-----------------------------------------------------------

The offending enzyme:

---snip---
<1>AcuI
<2>Eco57I,Bsp6II,BspD6II,BspKT5I,Eco112I,Eco125I,FsfI
<3>CTGAAG(16/14)
---snip---

If I get rid of the (16/14) the error disappears and the right  
sequence site is matched.  It seems like maybe a decision was made  
not to analyze enzymes with remote cut positions, or the code wouldn't
throw the error...?  Any information on this would be helpful.

Thanks,
Steve


From adsj at novozymes.com  Thu Feb  8 08:55:50 2007
From: adsj at novozymes.com (Adam Sjøgren)
Date: Thu, 08 Feb 2007 09:55:50 +0100
Subject: [Bioperl-l] Bioperl 1.4 to 1.5.2,
	adding qualifiers to Bio::Seq-objects
References: <8764adoptn.fsf@topper.koldfront.dk>
	<C350729C-3964-4685-A89C-D3E5C24A5114@uiuc.edu>
Message-ID: <87fy9hqb8p.fsf@topper.koldfront.dk>

On Wed, 7 Feb 2007 11:50:13 -0600, Chris wrote:

> This works for me using bioperl-live (Mac OS X):

> ok 1 - Qualifier note found
> ok 2 - Qualifier db_xref found

*slaps forehead*

Thanks for the test - your diagnosis was spot on:

> If you haven't uninstalled 1.4, make sure you aren't running 1.4 or  
> mixing the two versions (you can check by using 'perldoc -l  
> Bio::Root::Root').

I had a modified version of Bio::Seq and Bio::SeqFeature::Generic in
my @INC (added, and promptly forgotten, writing the patch mentioned
here: <http://article.gmane.org/gmane.comp.lang.perl.bio.general/13349/>).

Removing those and patching 1.5.2 fixed my self-inflicted problem.


  Thanks again!

     Adam

-- 
                                                          Adam Sjøgren
                                                    adsj at novozymes.com


From heikki at sanbi.ac.za  Thu Feb  8 09:39:47 2007
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Thu, 8 Feb 2007 11:39:47 +0200
Subject: [Bioperl-l] Bio::Restriction::Analysis cut site problem in 1.5.2
In-Reply-To: <3828F5B9-BFF8-47F4-8228-3BEA036513A0@annular.org>
References: <3828F5B9-BFF8-47F4-8228-3BEA036513A0@annular.org>
Message-ID: <200702081139.48125.heikki@sanbi.ac.za>

The error comes from Bio::PrimarySeq::subseq when it tries to cut beyond an 
existing sequence. Maybe your sequence has a restriction site that is near 
the end of your sequence?

This is a special case which has not been taken into account in the
Bio::Restriction::Analysis::_cuts method.

The question is: should the site be detected if its cut site is not within
the studied sequence?

Please submit a bugzilla bug, so this gets solved. I probably do not have time 
to tweak the code myself.
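The question above can be illustrated with plain string arithmetic. This sketch parses a REBASE-style site such as CTGAAG(16/14) and finds forward-strand matches. For each match it reports the top-strand cut, placed 16 nt past the end of the recognition sequence following the usual REBASE (x/y) convention, and flags cuts that fall beyond the end of the sequence. This is not how Bio::Restriction::Analysis works internally. Reverse-strand matches, ambiguity codes, and the bottom-strand offset are ignored, and `top_strand_cuts` is a hypothetical name.

```perl
use strict;
use warnings;

# Returns [match_start, cut_after, in_range] (1-based) for each
# forward-strand occurrence of a site written like 'CTGAAG(16/14)'.
sub top_strand_cuts {
    my ($site_spec, $seq) = @_;
    my ($site, $top) = $site_spec =~ /^([ACGT]+)\((-?\d+)\/-?\d+\)$/
        or die "unsupported site: $site_spec\n";
    $seq = uc $seq;
    my @cuts;
    my $pos = index($seq, $site);
    while ($pos >= 0) {
        my $start     = $pos + 1;                         # 1-based
        my $cut_after = $start + length($site) - 1 + $top;
        push @cuts, [ $start, $cut_after, $cut_after <= length($seq) ? 1 : 0 ];
        $pos = index($seq, $site, $pos + 1);
    }
    return @cuts;
}

# The second AcuI site is too close to the 3' end for its cut to fall inside.
my $seq = 'CTGAAG' . ('A' x 20) . 'CTGAAG' . 'TTTT';     # 36 nt
for my $c (top_strand_cuts('CTGAAG(16/14)', $seq)) {
    printf "site at %d, cut after %d, %s\n", $c->[0], $c->[1],
        $c->[2] ? 'inside sequence' : 'beyond sequence end';
}
```

For a site at position 1, the cut position is 6 + 16 = 22, the same number that appears in the reported exception.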

	-Heikki


On Thursday 08 February 2007 01:41:13 Steven Brown wrote:
> The module seems to have trouble handling the cut-site specifiers
> that surround the sequence that the enzyme is specific for.  The error:
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: Bad end parameter (22). End must be less than the total length
> of sequence (total=6)
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /usr/local/lib/perl5/site_perl/5.8.6/
> Bio/Root/Root.pm:328
> STACK: Bio::PrimarySeq::subseq /usr/local/lib/perl5/site_perl/5.8.6/
> Bio/PrimarySeq.pm:371
> STACK: Bio::Restriction::Analysis::_enzyme_sites /usr/local/lib/perl5/
> site_perl/5.8.6/Bio/Restriction/Analysis.pm:884
> STACK: Bio::Restriction::Analysis::_cuts /usr/local/lib/perl5/
> site_perl/5.8.6/Bio/Restriction/Analysis.pm:785
> STACK: Bio::Restriction::Analysis::cut /usr/local/lib/perl5/site_perl/
> 5.8.6/Bio/Restriction/Analysis.pm:369
> STACK: Bio::Restriction::Analysis::cutters /usr/local/lib/perl5/
> site_perl/5.8.6/Bio/Restriction/Analysis.pm:678
> ---snip (my script line)---
> -----------------------------------------------------------
>
> The offending enzyme:
>
> ---snip---
> <1>AcuI
> <2>Eco57I,Bsp6II,BspD6II,BspKT5I,Eco112I,Eco125I,FsfI
> <3>CTGAAG(16/14)
> ---snip---
>
> If I get rid of the (16/14) the error disappears and the right
> sequence site is matched.  It seems like maybe a decision was made
> not to analyze enzymes with remote cut positions, or the code wouldn't
> throw the error...?  Any information on this would be helpful.
>
> Thanks,
> Steve


-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From cjfields at uiuc.edu  Thu Feb  8 14:20:26 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 8 Feb 2007 08:20:26 -0600
Subject: [Bioperl-l] BLASTXML changes (good this time!)
Message-ID: <EB130F09-5670-4F3E-B481-60A700AEA38F@uiuc.edu>

All,

BLAST XML parsing should now work for any CPAN-based XML::SAX parser!

XML::SAX::PurePerl (comes with XML::SAX, the slowest)
XML::SAX::Expat
XML::SAX::ExpatXS (the fastest)
XML::LibXML::SAX
XML::LibXML::SAX::Parser
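If you want to pin one of these parsers instead of relying on the default, XML::SAX::ParserFactory honours the `$XML::SAX::ParserPackage` variable. The sketch below picks the first installed parser from a preference list. The list puts ExpatXS first and PurePerl last, as above; the order of the middle entries is an assumption, and `pick_sax_parser` is a hypothetical helper.

```perl
use strict;
use warnings;

# Preference order: ExpatXS first and PurePerl last, as in the list above;
# the order of the middle entries is an assumption.
my @PREFERRED_PARSERS = qw(
    XML::SAX::ExpatXS
    XML::LibXML::SAX
    XML::SAX::Expat
    XML::SAX::PurePerl
);

# Hypothetical helper: return the first module in the list that loads, or undef.
sub pick_sax_parser {
    my @candidates = @_ ? @_ : @PREFERRED_PARSERS;
    for my $module (@candidates) {
        (my $file = "$module.pm") =~ s{::}{/}g;
        return $module if eval { require $file; 1 };
    }
    return;
}

# XML::SAX::ParserFactory consults this package variable when creating a parser.
{
    no warnings 'once';
    my $parser = pick_sax_parser();
    $XML::SAX::ParserPackage = $parser if defined $parser;
}
```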

Grant McLean has updated XML::SAX on CPAN to fix an XML::SAX::PurePerl
bug, so using that parser will require an XML::SAX upgrade.  I
also found a bug in the SAX handler that chopped off a large
chunk of the hit descriptions; that is now fixed in CVS.

If Sendu is out there, I think we can safely remove any dependencies  
beyond XML::SAX 0.15 for the next release.  Should I go ahead and  
modify Build.PL?

chris


From lstein at cshl.edu  Thu Feb  8 15:51:49 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Thu, 8 Feb 2007 10:51:49 -0500
Subject: [Bioperl-l] Bio::Graphics::Panel gridlines and pixels
In-Reply-To: <45CA608B.80907@berkeley.edu>
References: <45C9578F.2060802@berkeley.edu>
	<6dce9a0b0702070418x139b4e5cu396bb629760a860c@mail.gmail.com>
	<45CA608B.80907@berkeley.edu>
Message-ID: <6dce9a0b0702080751m210e4d44k3e5c38bfdd3ee9ea@mail.gmail.com>

Hi,

I like the approach you're taking (creating a fake GD object that stores the
graphics primitives). Perhaps the best thing to do is to subclass Panel
itself so that it doesn't draw the gridlines (or turn gridlines off
completely). Then you can draw gridlines after the fact in each tile as
needed.
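The gridlines are evenly spaced, so each tile's gridlines can be computed on demand instead of stored. Here is a sketch of that calculation. It assumes panel pixel 0 corresponds to base 0 (no padding or start offset), and `tile_gridlines` is a hypothetical helper.

```perl
use strict;
use warnings;
use POSIX qw(ceil floor);

# Hypothetical helper: x positions, in tile-local pixels, of gridlines spaced
# every $spacing_bp bases. Assumes panel pixel 0 corresponds to base 0 (no
# padding or start offset); results may be fractional, so round as needed.
sub tile_gridlines {
    my ($spacing_bp, $px_per_bp, $tile_start, $tile_width) = @_;
    my $spacing_px = $spacing_bp * $px_per_bp;
    # First gridline at or after the tile's left edge ...
    my $first = ceil($tile_start / $spacing_px);
    # ... and last gridline at or before its right-most pixel.
    my $last  = floor(($tile_start + $tile_width - 1) / $spacing_px);
    return map { $_ * $spacing_px - $tile_start } $first .. $last;
}
```

Nothing is stored between tiles. The cost per tile is proportional to the number of gridlines that tile actually shows.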

Lincoln

On 2/7/07, Mitch Skinner <mitch_skinner at berkeley.edu> wrote:
>
> Lincoln Stein wrote:
> > However, I'm also very interested in why grid-drawing takes so long.
> When I've profiled drawing, neither grid drawing nor map_pt() consumes
> > any significant amount of time.
> Well, the approach that we've been taking is to hand
> Bio::Graphics::Panel a fake GD object that stores all of the graphical
> primitives (line, rectangle, filledRectangle, etc. + their parameters)
> and then draws them later in chunks (for each tile, we draw all the
> primitives that overlap its pixel boundaries).  We're doing this because
> trying to create a real GD object that's hundreds of millions of pixels
> wide takes too much RAM.  But storing all the gridlines (for a whole
> chromosome, at a high zoom level) also takes a lot of RAM, and getting
> the gridlines for the current tile and translating their coordinates
> into the coordinate space of the tile also takes a fair amount of CPU.
> The gridline hack I've been experimenting with (that prompted these
> emails) was motivated by the hope that the gridlines were regular enough
> that we wouldn't have to store them explicitly, but just draw the same
> gridlines over and over again.  It runs almost twice as fast as the
> version that explicitly stores the gridlines.
>
> So the main slowdown is not in draw_grid or map_pt, but in our code
> that's storing/retrieving and translating the gridlines.  Which we are
> also looking into speeding up.  But the memory usage is harder to
> reduce; I've experimented with trying to compress the gridline data but
> it seems easier to just have the panel draw the grid directly.
>
> The more I read the Panel code, the more I think it would be nice to
> make more use of it.  One of the reasons that we're trying to fool it
> right now is that there seem to be a number of behaviors in it (and/or
> in the glyphs?) that take the current image boundaries into account
> (drawing an arrow where a feature runs off the edge of the image,
> etc.).  But in our browser each tile is supposed to mesh seamlessly with
> its neighbor, so if there's an easy way to turn off those edge-aware
> behaviors that would be pretty interesting.
>
> Ian has also suggested that it might be better to store less information
> than the full set of graphics primitives.  For example, we could just
> store the Panel's glyph boxes and use their (pixel bound)->feature
> information to decide which features need to be drawn for each tile.
>
> I'm going to be spending some time reading the Bio::Graphics code in
> more depth.  I'd also welcome suggestions from you or anyone on the list.
>
> Thanks,
> Mitch
>
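The recording approach Mitch describes (store primitives, then replay the ones that overlap each tile) can be sketched as below. `RecordingCanvas` is a hypothetical class, and it implements only two of the GD::Image drawing calls. A real drop-in for Bio::Graphics::Panel would need colorAllocate(), filledRectangle(), string(), and others as well.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a GD::Image that records drawing calls instead of
# rasterising them. Only line() and rectangle() are shown.
package RecordingCanvas;

sub new { my ($class) = @_; return bless { ops => [] }, $class }

sub line      { my $self = shift; push @{ $self->{ops} }, [ 'line',      @_ ]; return }
sub rectangle { my $self = shift; push @{ $self->{ops} }, [ 'rectangle', @_ ]; return }

# Return the recorded primitives whose x-extent overlaps the tile
# [$start, $start + $width), with x coordinates shifted into tile space.
sub ops_for_tile {
    my ($self, $start, $width) = @_;
    my $end = $start + $width;
    my @out;
    for my $op (@{ $self->{ops} }) {
        my ($name, $x1, $y1, $x2, $y2, @rest) = @$op;
        my ($lo, $hi) = $x1 < $x2 ? ($x1, $x2) : ($x2, $x1);
        next if $hi < $start || $lo >= $end;    # no overlap with this tile
        push @out, [ $name, $x1 - $start, $y1, $x2 - $start, $y2, @rest ];
    }
    return @out;
}

package main;
```

The linear scan in ops_for_tile is where the per-tile CPU cost comes from. One option is to bucket operations by tile as they are recorded, which avoids rescanning everything for every tile.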


-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu


From Kevin.M.Brown at asu.edu  Thu Feb  8 15:28:30 2007
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Thu, 8 Feb 2007 08:28:30 -0700
Subject: [Bioperl-l] Bio::Graphics::Panel gridlines and pixels
References: <45C9578F.2060802@berkeley.edu><6dce9a0b0702070418x139b4e5cu396bb629760a860c@mail.gmail.com>
	<45CA608B.80907@berkeley.edu>
Message-ID: <1A4207F8295607498283FE9E93B775B402AAC1D0@EX02.asurite.ad.asu.edu>

> The more I read the Panel code, the more I think it would be 
> nice to make more use of it.  One of the reasons that we're 
> trying to fool it right now is that there seem to be a number 
> of behaviors in it (and/or in the glyphs?) that take the 
> current image boundaries into account (drawing an arrow where 
> a feature runs off the edge of the image, etc.).  But in our 
> browser each tile is supposed to mesh seamlessly with its 
> neighbor, so if there's an easy way to turn off those 
> edge-aware behaviors that would be pretty interesting.

I think the glyphs try to deal with edges because if they didn't, they
would flow out into whatever left or right padding had been placed
around the image when the panel was created.  Something I've noticed is
that when I create tiles for the chromosomes I'm working on, the panels
don't line up, because the bump position in one panel is not accounted
for when the next panel is drawn.


From sheris at eps.berkeley.edu  Thu Feb  8 17:42:27 2007
From: sheris at eps.berkeley.edu (Sheri Simmons)
Date: Thu, 08 Feb 2007 09:42:27 -0800
Subject: [Bioperl-l] bioperl newbie needs help with extracting cds info
Message-ID: <7.0.1.0.2.20070208093840.0202da90@eps.berkeley.edu>

Hi,
I'm a newbie to BioPerl so apologies if this is a very basic 
question. I am trying to parse GenBank files with the goal of 
creating concatenated gene lists in nucleic acid or amino acid 
format. It is working fine, except for one thing: I need to create 
gene labels incorporating information on whether the gene is on the 
complementary strand or not ("complement" in the CDS tag). How can I 
get Bioperl to tell me whether the CDS tag value includes the word 
"complement"?

Thanks
Sheri


From george.heller at yahoo.com  Thu Feb  8 18:54:41 2007
From: george.heller at yahoo.com (George Heller)
Date: Thu, 8 Feb 2007 10:54:41 -0800 (PST)
Subject: [Bioperl-l] Perl script to extract from ncbi
Message-ID: <178139.85769.qm@web56506.mail.re3.yahoo.com>

Hi all,

I have a question about extracting data from NCBI. I have a database that stores sequence data, but the files I loaded into it don't have proper description lines. Based on the accession number, I need to find the genus and species name (organism name) from NCBI.

I have about 1500 records for which I need to extract the names from NCBI.

Any ideas on how I can go about writing a Perl script to extract this information from NCBI?

Thanks!
George.

 


From Kevin.M.Brown at asu.edu  Thu Feb  8 19:11:50 2007
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Thu, 8 Feb 2007 12:11:50 -0700
Subject: [Bioperl-l] bioperl newbie needs help with extracting cds info
References: <7.0.1.0.2.20070208093840.0202da90@eps.berkeley.edu>
Message-ID: <1A4207F8295607498283FE9E93B775B402AAC29A@EX02.asurite.ad.asu.edu>

When you extract the features, just call the strand method on each
returned feature: 1 means the forward strand and -1 the complementary strand.

my @features = $seq->all_SeqFeatures;
# keep only the CDS features and print each one's strand
for my $f (@features)
{
	if ($f->primary_tag eq 'CDS')
	{
		print $f->strand . "\n";   # 1 = forward, -1 = complement
	}
}

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> Sheri Simmons
> Sent: Thursday, February 08, 2007 10:42 AM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] bioperl newbie needs help with 
> extracting cds info
> 
> Hi,
> I'm a newbie to BioPerl so apologies if this is a very basic 
> question. I am trying to parse GenBank files with the goal of 
> creating concatenated gene lists in nucleic acid or amino 
> acid format. It is working fine, except for one thing: I need 
> to create gene labels incorporating information on whether 
> the gene is on the complementary strand or not ("complement" 
> in the CDS tag). How can I get Bioperl to tell me whether the 
> CDS tag value includes the word "complement"?
> 
> Thanks
> Sheri


From barry.moore at genetics.utah.edu  Thu Feb  8 19:35:03 2007
From: barry.moore at genetics.utah.edu (Barry Moore)
Date: Thu, 8 Feb 2007 12:35:03 -0700
Subject: [Bioperl-l] bioperl newbie needs help with extracting cds info
In-Reply-To: <7.0.1.0.2.20070208093840.0202da90@eps.berkeley.edu>
References: <7.0.1.0.2.20070208093840.0202da90@eps.berkeley.edu>
Message-ID: <E6200600-30F2-4471-9107-29A355F543F9@genetics.utah.edu>

Sheri-

The Bio::SeqFeature::Generic object has a 'strand' method, so you can
just call strand on the CDS (or any other) feature like this:

   my @features = grep { $_->primary_tag eq 'CDS' } $seq->get_SeqFeatures();
   for my $feature (@features) {
       my $strand = $feature->strand;   # 1 = forward, -1 = complement
   }

Barry
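Once the record is parsed, the strand method shown above is the way to go. If you only have the raw location string from the CDS line, the rough pure-Perl sketch below infers the strand from it. `strand_from_location` is a hypothetical helper. It returns 0 for mixed-strand joins, which is this sketch's own convention, and it does not handle nesting beyond the cases shown.

```perl
use strict;
use warnings;

# Hypothetical helper: infer strand from a GenBank location string.
# Returns -1 for complement, 1 for forward, 0 when join()/order() mixes strands.
sub strand_from_location {
    (my $loc = shift) =~ s/\s+//g;            # locations may wrap across lines
    return -1 if $loc =~ /^complement\(/;     # complement(...) around everything
    if ($loc =~ /^(?:join|order)\((.*)\)$/) {
        my @parts = split /,/, $1;
        my $minus = grep { /^complement\(/ } @parts;
        return -1 if $minus == @parts;        # every segment complemented
        return 0  if $minus;                  # mixed strands
    }
    return 1;
}
```

A label could then be built from the result, e.g. `my $label = strand_from_location($loc) == -1 ? "$name (complement)" : $name;`, where `$loc` and `$name` are whatever your script already holds.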

On Feb 8, 2007, at 10:42 AM, Sheri Simmons wrote:

> Hi,
> I'm a newbie to BioPerl so apologies if this is a very basic
> question. I am trying to parse GenBank files with the goal of
> creating concatenated gene lists in nucleic acid or amino acid
> format. It is working fine, except for one thing: I need to create
> gene labels incorporating information on whether the gene is on the
> complementary strand or not ("complement" in the CDS tag). How can I
> get Bioperl to tell me whether the CDS tag value includes the word
> "complement"?
>
> Thanks
> Sheri


From torsten.seemann at infotech.monash.edu.au  Fri Feb  9 04:18:33 2007
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Fri, 9 Feb 2007 15:18:33 +1100
Subject: [Bioperl-l] BLASTXML changes (good this time!)
In-Reply-To: <EB130F09-5670-4F3E-B481-60A700AEA38F@uiuc.edu>
References: <EB130F09-5670-4F3E-B481-60A700AEA38F@uiuc.edu>
Message-ID: <a79f6a4b0702082018n564073fcyb1b886b8fe185b90@mail.gmail.com>

Chris,

> BLAST XML parsing should now work for any CPAN-based XML::SAX parser!
> XML::SAX::PurePerl (comes with XML::SAX, the slowest)
> XML::SAX::Expat
> XML::SAX::ExpatXS (the fastest)
> XML::LibXML::SAX
> XML::LibXML::SAX::Parser

That's excellent news - thanks for all the work you have put in on
this one. I'm impressed.

This is a good opportunity to encourage people who use Bio::SearchIO
for BLAST parsing to switch to the 'blastxml' format instead of 'blast'.
Although the latter is more human-readable, it perennially requires
parser source changes to cope with the variations and new formatting
introduced with each new NCBI BLAST release. Best to use "-m 7" XML
format, and convert as appropriate using one of the
Bio::SearchIO::Writer:: classes.

--Torsten


From cjfields at uiuc.edu  Fri Feb  9 13:58:24 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 9 Feb 2007 07:58:24 -0600
Subject: [Bioperl-l] BLASTXML changes (good this time!)
In-Reply-To: <a79f6a4b0702082018n564073fcyb1b886b8fe185b90@mail.gmail.com>
References: <EB130F09-5670-4F3E-B481-60A700AEA38F@uiuc.edu>
	<a79f6a4b0702082018n564073fcyb1b886b8fe185b90@mail.gmail.com>
Message-ID: <4FC966A7-7A03-41D9-ABF7-6ACD888720FB@uiuc.edu>

On Feb 8, 2007, at 10:18 PM, Torsten Seemann wrote:

> Chris,
>
>> BLAST XML parsing should now work for any CPAN-based XML::SAX parser!
>> XML::SAX::PurePerl (comes with XML::SAX, the slowest)
>> XML::SAX::Expat
>> XML::SAX::ExpatXS (the fastest)
>> XML::LibXML::SAX
>> XML::LibXML::SAX::Parser
>
> That's excellent news - thanks for all the work you have put in on
> this one. I'm impressed.

Jason did most of the hard work; I just tinkered with it until it  
worked (and pestered a few perl XML guys along the way).  Thanks  
Grant and Björn!

> This is a good opportunity to encourage people who use Bio::SearchIO
> for BLAST parsing to switch to 'blastxml' format over 'blast'.
> Although the latter is more human-readable, it perennially requires
> parser source changes to cope with the variations and new formatting
> introduced with each new NCBI BLAST release. Best to use "-m 7" XML
> format, and convert as appropriate using one of the
> Bio::SearchIO::Writer:: classes.
>
> --Torsten

I'll try getting some benchmarks for the different parsers up today  
on the wiki if I have time.

Strangely enough, NCBI changed a few things about BLAST XML a few
releases back without mentioning it to anyone (it caused a silent bug in
BLAST XML parsing, which I fixed recently).  If you sent multiple
queries to older versions of BLAST, you got all of the BLAST XML
reports concatenated together, which required preparsing the reports
to carve up the XML prior to parsing.  Now they treat it like
PSI-BLAST, where multiple queries = multiple iterations, so you get one
long BLAST XML report in which each iteration = one Result.

The current parser should handle both as it just caches the other  
results and returns them one at a time prior to new parses, but I  
wouldn't recommend parsing a huge BLAST XML file with hundreds of  
queries as you'll quickly run out of memory!
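The two layouts can be told apart with a couple of regular expressions, and that check can run before any parsing. A rough sketch follows, using hypothetical helpers. The regex split is safe only because each old-style report starts with its own XML declaration.

```perl
use strict;
use warnings;

# Older BLAST versions: several complete XML reports concatenated in one file.
# Split the text before each <?xml ...?> declaration, dropping blank leftovers.
sub split_xml_reports {
    my ($text) = @_;
    return grep { /\S/ } split /(?=<\?xml\s)/, $text;
}

# Newer versions: one report with one <Iteration> per query. Count them.
sub count_iterations {
    my ($xml) = @_;
    my $n = () = $xml =~ /<Iteration>/g;
    return $n;
}
```

Splitting an old-style file up front also lets you feed each report to the parser separately, which keeps memory use bounded by the largest single report.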

If they get Perl SAX2 up to date with Expat they'll eventually add  
parse_chunk() and pause_parse() for each parser.  Until then...

chris

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cuiw at ncbi.nlm.nih.gov  Fri Feb  9 14:20:10 2007
From: cuiw at ncbi.nlm.nih.gov (Cui, Wenwu (NIH/NLM/NCBI) [C])
Date: Fri, 9 Feb 2007 09:20:10 -0500
Subject: [Bioperl-l] Perl script to extract from ncbi
In-Reply-To: <178139.85769.qm@web56506.mail.re3.yahoo.com>
References: <178139.85769.qm@web56506.mail.re3.yahoo.com>
Message-ID: <18C407FD4FFB424292D769FBD68C1987020BBC58@NIHCESMLBX8.nih.gov>

This is an example for fetching two GenBank records
(id=124504630,110665734) in XML format. Organism names like
'<GBSeq_organism>Rattus norvegicus</GBSeq_organism>' can be parsed from
the XML. 

 
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=124504630,110665734&retmode=xml&rettype=gb

Or you can get TaxIds and translate them into real names:

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=nucleotide&id=124504630,110665734&retmode=xml

 
Wenwu Cui, PhD
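For about 1500 accessions, the efetch URL above can be called in batches, pulling the organism out of each GBSeq record. This sketch rests on several assumptions. The helper names are hypothetical. The batch size of 200 is arbitrary. efetch is assumed to accept accessions in `id` as well as GI numbers. The regex parsing does not decode XML entities. The base URL is the one from this message; NCBI now serves E-utilities over https, which HTTP::Tiny can use only if IO::Socket::SSL is installed.

```perl
use strict;
use warnings;
use HTTP::Tiny;

my $EFETCH = 'http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi';

# Split a long ID list into efetch URLs of at most $size IDs each.
sub batch_urls {
    my ($ids, $size) = @_;
    my @urls;
    for (my $i = 0; $i < @$ids; $i += $size) {
        my $last = $i + $size - 1;
        $last = $#$ids if $last > $#$ids;
        my $id_list = join ',', @{$ids}[ $i .. $last ];
        push @urls, "$EFETCH?db=nucleotide&id=$id_list&retmode=xml&rettype=gb";
    }
    return @urls;
}

# Map accession => organism name from GBSeq XML (regex shortcut, no entity decoding).
sub parse_organisms {
    my ($xml) = @_;
    my %org;
    while ($xml =~ m{<GBSeq>(.*?)</GBSeq>}gs) {
        my $record = $1;
        my ($acc)  = $record =~ m{<GBSeq_primary-accession>([^<]+)</GBSeq_primary-accession>};
        my ($name) = $record =~ m{<GBSeq_organism>([^<]+)</GBSeq_organism>};
        $org{$acc} = $name if defined $acc && defined $name;
    }
    return %org;
}

# Fetch and parse every batch (needs network access; not run automatically).
sub fetch_organisms {
    my ($ids, $size) = @_;
    my $http = HTTP::Tiny->new;
    my %all;
    for my $url (batch_urls($ids, $size // 200)) {
        my $res = $http->get($url);
        die "efetch failed: $res->{status} $res->{reason}\n" unless $res->{success};
        %all = (%all, parse_organisms($res->{content}));
        sleep 1;    # pause between requests to be polite to NCBI
    }
    return %all;
}
```

Usage would be along the lines of `my %organism = fetch_organisms(\@accessions);`, followed by updating each database row from the hash.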

 
-----Original Message-----
From: George Heller [mailto:george.heller at yahoo.com] 
Sent: Thursday, February 08, 2007 1:55 PM
To: bioperl-l at lists.open-bio.org
Subject: [Bioperl-l] Perl script to extract from ncbi

 
Hi all,

I have a question about extracting data from NCBI. I have a database
that stores sequence data, but the files I loaded into it don't have
proper description lines. Based on the accession number, I need to find
the genus and species name (organism name) from NCBI.

I have about 1500 records for which I need to extract the names from
NCBI.

Any ideas on how I can go about writing a Perl script to extract
this information from NCBI?

Thanks!
George.


From bosborne11 at verizon.net  Fri Feb  9 17:51:19 2007
From: bosborne11 at verizon.net (Brian Osborne)
Date: Fri, 09 Feb 2007 12:51:19 -0500
Subject: [Bioperl-l] Perl script to extract from ncbi
In-Reply-To: <178139.85769.qm@web56506.mail.re3.yahoo.com>
Message-ID: <C1F21EC7.CBAA%bosborne11@verizon.net>

George,

http://www.bioperl.org/wiki/HOWTO:Beginners#Retrieving_a_sequence_from_a_database

Brian O.


On 2/8/07 1:54 PM, "George Heller" <george.heller at yahoo.com> wrote:

> Hi all, 
>    
>   I have a question regarding extracting data from Ncbi. I have a database to
> store the sequence data, but the files I have loaded into it, dont have a
> proper description line specified. Based on the accession number, I need to
> find out what is the genus and species name (organism name) from ncbi.
>    
>   I have about 1500 records for which I need to extract the names from ncbi.
>    
>   Any ideas of how I can go about writing a perl script for extracting this
> information from ncbi?
>    
>   Thanks!
>   George.


From johnston at biochem.ucl.ac.uk  Fri Feb  9 19:23:41 2007
From: johnston at biochem.ucl.ac.uk (Caroline Johnston)
Date: Fri, 9 Feb 2007 19:23:41 +0000 (GMT)
Subject: [Bioperl-l] WrapperBase
Message-ID: <Pine.LNX.4.58.0702091905300.3237@localhost.localdomain>

Hi,

Could WrapperBase::executable warn you if it doesn't find the exe in
program_path? At the moment it just silently goes ahead and uses one in
the system path if it exists.

Cass.

I've never used diff, so not sure if this is right, but:

305,308c305,314
<        if( $prog_path && -e $prog_path && -x $prog_path ) {
<            $self->{'_pathtoexe'} = $prog_path;
<        } else {
<            my $exe;
---
>        if($prog_path){
>        if(-e $prog_path && -x $prog_path){
>          $self->{'_pathtoexe'} = $prog_path;
>        }
>        else{
>          $self->warn("executable not found in $prog_path, trying system path...") if $warn;
>        }
>        }
>        unless ($self->{'_pathtoexe'}){
>        my $exe;
335a342


From bix at sendu.me.uk  Fri Feb  9 22:38:59 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 09 Feb 2007 22:38:59 +0000
Subject: [Bioperl-l] WrapperBase
In-Reply-To: <Pine.LNX.4.58.0702091905300.3237@localhost.localdomain>
References: <Pine.LNX.4.58.0702091905300.3237@localhost.localdomain>
Message-ID: <45CCF803.9030004@sendu.me.uk>

Caroline Johnston wrote:
> Hi,
> 
> Could WrapperBase::executable warn you if it doesn't find the exe in
> program_path? At the moment it just silently goes ahead and uses one in
> the system path if it exists.

No, I think not. That would be very annoying when using wrappers for 
programs that you just have in your system path.

What specific problem are you encountering with the current behaviour?


From bix at sendu.me.uk  Fri Feb  9 22:40:33 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 09 Feb 2007 22:40:33 +0000
Subject: [Bioperl-l] BLASTXML changes (good this time!)
In-Reply-To: <EB130F09-5670-4F3E-B481-60A700AEA38F@uiuc.edu>
References: <EB130F09-5670-4F3E-B481-60A700AEA38F@uiuc.edu>
Message-ID: <45CCF861.8030000@sendu.me.uk>

Chris Fields wrote:
> If Sendu is out there, I think we can safely remove any dependencies  
> beyond XML::SAX 0.15 for the next release.  Should I go ahead and  
> modify Build.PL?

Sure, good to hear.


From cjfields at uiuc.edu  Sat Feb 10 03:42:24 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 9 Feb 2007 21:42:24 -0600
Subject: [Bioperl-l] BLASTXML changes (good this time!)
In-Reply-To: <45CCF861.8030000@sendu.me.uk>
References: <EB130F09-5670-4F3E-B481-60A700AEA38F@uiuc.edu>
	<45CCF861.8030000@sendu.me.uk>
Message-ID: <DF989651-7CC0-45A0-BA5E-EEEE88FDD40E@uiuc.edu>


On Feb 9, 2007, at 4:40 PM, Sendu Bala wrote:

> Chris Fields wrote:
>> If Sendu is out there, I think we can safely remove any dependencies
>> beyond XML::SAX 0.15 for the next release.  Should I go ahead and
>> modify Build.PL?
>
> Sure, good to hear.

I added a version dependency for XML::SAX (v. 0.15) for the PurePerl  
fix.  That likely obviates the need for a Bundle for XML::Simple.   
Not too pressing; we can determine that before the next release.

chris


From johnston at biochem.ucl.ac.uk  Sat Feb 10 16:27:53 2007
From: johnston at biochem.ucl.ac.uk (Caroline Johnston)
Date: Sat, 10 Feb 2007 16:27:53 +0000 (GMT)
Subject: [Bioperl-l] WrapperBase
In-Reply-To: <45CCF803.9030004@sendu.me.uk>
References: <Pine.LNX.4.58.0702091905300.3237@localhost.localdomain>
	<45CCF803.9030004@sendu.me.uk>
Message-ID: <Pine.LNX.4.58.0702101558180.16119@localhost.localdomain>

> No, I think not. That would be very annoying when using wrappers for
> programs that you just have in your system path.
>

Hmm, maybe I misunderstood what the program_path was for? The executable
method goes straight to the system path unless program_path is set, so I
assumed you would only set program_path if you specifically wanted it to
look somewhere else. You wouldn't get a warning if you didn't specify a
program_path and just left it to look in the system path.

> What specific problem are you encountering with the current behaviour?

I had one version of an executable in /usr/local and another version,
the one I wanted to use, in my home directory.
The program_path method gets its path from an environment variable,
which was set to ~/.
I didn't realise the executable in my home directory had the wrong
permissions, so it was silently failing to use my version and using
the one in /usr/local instead.


Cass
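The trap here is that a file which exists but lacks execute permission gets the same treatment as a missing one. Below is a standalone sketch of a lookup that says why it is falling back. `resolve_executable` is a hypothetical helper, not part of WrapperBase.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper (not WrapperBase's API): prefer $name inside $dir, but
# warn and say why before falling back to a search of $ENV{PATH}.
sub resolve_executable {
    my ($name, $dir) = @_;
    if (defined $dir) {
        my $candidate = File::Spec->catfile($dir, $name);
        if (-e $candidate) {
            # '_' reuses the stat buffer filled by the -e test above.
            return $candidate if -f _ && -x _;
            warn "$candidate exists but is not executable; searching PATH\n";
        }
        else {
            warn "$candidate not found; searching PATH\n";
        }
    }
    for my $path_dir (File::Spec->path) {
        my $candidate = File::Spec->catfile($path_dir, $name);
        return $candidate if -f $candidate && -x _;
    }
    return;
}
```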


From george.heller at yahoo.com  Sat Feb 10 20:35:18 2007
From: george.heller at yahoo.com (George Heller)
Date: Sat, 10 Feb 2007 12:35:18 -0800 (PST)
Subject: [Bioperl-l] Error while parsing
Message-ID: <162150.76282.qm@web56511.mail.re3.yahoo.com>

Hi all,

I am parsing a few files (BLAST results) and get the following error:

------------- EXCEPTION  -------------
MSG: Can't get HSPs: data not collected.
STACK Bio::Search::Hit::GenericHit::hsp /usr/lib/perl5/site_perl/5.8.5/Bio/Search/Hit/GenericHit.pm:649
STACK toplevel parser.pl:31
--------------------------------------

I am not sure whether this is a bug or something I am doing wrong. Any pointers are appreciated.

Thanks!
George.

 


From cjfields at uiuc.edu  Sat Feb 10 22:56:19 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 10 Feb 2007 16:56:19 -0600
Subject: [Bioperl-l] Error while parsing
In-Reply-To: <162150.76282.qm@web56511.mail.re3.yahoo.com>
References: <162150.76282.qm@web56511.mail.re3.yahoo.com>
Message-ID: <AF76B50B-AF7A-41D1-A64E-D993D8CC4C86@uiuc.edu>

On Feb 10, 2007, at 2:35 PM, George Heller wrote:

> Hi all,
>
>   I am in the process of parsing a few files, actually blast  
> results, but happen to get the following error:
>
>   ------------- EXCEPTION  -------------
> MSG: Can't get HSPs: data not collected.
> STACK Bio::Search::Hit::GenericHit::hsp /usr/lib/perl5/site_perl/ 
> 5.8.5/Bio/Search/Hit/GenericHit.pm:649
> STACK toplevel parser.pl:31
>   --------------------------------------
>
>   I am not sure if this is a bug, or is there something I am doing  
> wrong. Any pointers are appreciated.
>
>   Thanks!
>   George.

We'll need more to go on than that.  If the bioperl version is  
v1.5.2, please file a bug via the bioperl bugzilla:

http://bugzilla.open-bio.org/

Don't forget to attach a test file which triggers the bug using the  
'Create a new attachment' link after the report has been filed.

chris


From sac at bioperl.org  Sun Feb 11 03:56:10 2007
From: sac at bioperl.org (Steve Chervitz)
Date: Sat, 10 Feb 2007 19:56:10 -0800
Subject: [Bioperl-l] Error while parsing
In-Reply-To: <162150.76282.qm@web56511.mail.re3.yahoo.com>
References: <162150.76282.qm@web56511.mail.re3.yahoo.com>
Message-ID: <8f200b4c0702101956h53fea96dm241126c680d64ab4@mail.gmail.com>

Your report may be lacking HSP alignments for the hit you are attempting to
process. Note that by default, blast will report twice as many one-line
descriptions as it will alignments:

  -v  Number of database sequences to show one-line descriptions for (V)
[Integer]
    default = 500
  -b  Number of database sequence to show alignments for (B) [Integer]
    default = 250

Verify that this isn't the case for your error. If not, go ahead and file a
bug report. Attach the report (zipped if big) as well as the relevant
portion of your processing script.
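A quick way to check this is to compare the number of one-line descriptions with the number of alignment blocks in the report. The sketch below does that for a single-query report. It assumes the usual blastall text layout: a "Sequences producing significant alignments" header, a blank line, the descriptions, then a blank line. Alignment blocks are assumed to start with ">". The helper name is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: count one-line descriptions and alignment blocks in a
# single-query NCBI BLAST text report (blastall-style layout assumed).
sub count_descriptions_and_alignments {
    my ($report) = @_;
    my ($in_desc, $seen_desc, $n_desc, $n_align) = (0, 0, 0, 0);
    for my $line (split /\n/, $report) {
        if ($line =~ /^Sequences producing significant alignments/) {
            $in_desc = 1;
            next;
        }
        if ($in_desc) {
            if ($line =~ /\S/) { $n_desc++; $seen_desc = 1 }
            elsif ($seen_desc) { $in_desc = 0 }    # blank line ends the list
            next;
        }
        $n_align++ if $line =~ /^>/;               # each alignment starts with '>'
    }
    return ($n_desc, $n_align);
}
```

If the first number is larger than the second, some hits have descriptions but no alignments. Raising -b to match -v, or skipping such hits in the script, should avoid the exception.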

Steve

On 2/10/07, George Heller <george.heller at yahoo.com> wrote:
>
> Hi all,
>
>   I am in the process of parsing a few files, actually blast results, but
> happen to get the following error:
>
>   ------------- EXCEPTION  -------------
> MSG: Can't get HSPs: data not collected.
> STACK Bio::Search::Hit::GenericHit::hsp
> /usr/lib/perl5/site_perl/5.8.5/Bio/Search/Hit/GenericHit.pm:649
> STACK toplevel parser.pl:31
>   --------------------------------------
>
>   I am not sure if this is a bug, or is there something I am doing wrong.
> Any pointers are appreciated.
>
>   Thanks!
>   George.
>


From jay at jays.net  Sun Feb 11 14:24:55 2007
From: jay at jays.net (Jay Hannah)
Date: Sun, 11 Feb 2007 08:24:55 -0600
Subject: [Bioperl-l] wiki: Database error when attempting to change
	preferences (1054: Unknown column 'user_newpass_time')
Message-ID: <9CEA9E3C-25A6-4DEF-902D-3FD997B81C9C@jays.net>

Just a heads-up --

I wanted to check the "E-mail me when a page I'm watching is changed"  
box in my preferences

http://www.bioperl.org/wiki/Special:Preferences

But I can't. Even if I change nothing and hit the Save button I get  
this:

----------
Database error
A database query syntax error has occurred. This may indicate a bug  
in the software. The last attempted database query was:

     (SQL query hidden)

from within function "User::saveSettings". MySQL returned error  
"1054: Unknown column 'user_newpass_time' in 'field list' (localhost)".
----------

(Yes, it literally says "(SQL query hidden)". That wasn't me for the  
purposes of this email. -grin-)

Thanks,

j
seqlab.net
http://www.bioperl.org/wiki/User:Jhannah


Username:	Jhannah
User ID:	51


From jay at jays.net  Sun Feb 11 15:16:13 2007
From: jay at jays.net (Jay Hannah)
Date: Sun, 11 Feb 2007 09:16:13 -0600
Subject: [Bioperl-l] wiki: Database error when attempting to change
	preferences (1054: Unknown column 'user_newpass_time')
In-Reply-To: <9CEA9E3C-25A6-4DEF-902D-3FD997B81C9C@jays.net>
References: <9CEA9E3C-25A6-4DEF-902D-3FD997B81C9C@jays.net>
Message-ID: <9B38AD87-3EB4-4658-A24B-ACC75F7F8298@jays.net>

Hmm.... The error appears to not be limited to changing preferences.  
I tried to update a couple different pages and got errors like this:

------
Database error
A database query syntax error has occurred. This may indicate a bug  
in the software. The last attempted database query was:

     (SQL query hidden)

from within function "Article::updateRedirectOn". MySQL returned  
error "1146: Table 'perlwikidb.redirect' doesn't exist (localhost)".
------

So no changes to the wiki are working right now?

j
seqlab.net
http://www.bioperl.org/wiki/User:Jhannah


From jason at bioperl.org  Sun Feb 11 20:18:15 2007
From: jason at bioperl.org (Jason Stajich)
Date: Sun, 11 Feb 2007 12:18:15 -0800
Subject: [Bioperl-l] wiki: Database error when attempting to change
	preferences (1054: Unknown column 'user_newpass_time')
In-Reply-To: <9B38AD87-3EB4-4658-A24B-ACC75F7F8298@jays.net>
References: <9CEA9E3C-25A6-4DEF-902D-3FD997B81C9C@jays.net>
	<9B38AD87-3EB4-4658-A24B-ACC75F7F8298@jays.net>
Message-ID: <3B80F494-925F-454A-8249-1565A4D50EF6@bioperl.org>

Should be fine now. I upgraded to MediaWiki 1.9 this weekend
and I think the upgrade script didn't finish.

In the future system support requests should go to support - AT -  
open-bio.org so we can track them.

-jason
On Feb 11, 2007, at 7:16 AM, Jay Hannah wrote:

> Hmm.... The error appears to not be limited to changing preferences.
> I tried to update a couple different pages and got errors like this:
>
> ------
> Database error
> A database query syntax error has occurred. This may indicate a bug
> in the software. The last attempted database query was:
>
>      (SQL query hidden)
>
> from within function "Article::updateRedirectOn". MySQL returned
> error "1146: Table 'perlwikidb.redirect' doesn't exist (localhost)".
> ------
>
> So all changes to the wiki aren't working right now?
>
> j
> seqlab.net
> http://www.bioperl.org/wiki/User:Jhannah

--
Jason Stajich
Miller Research Fellow
University of California, Berkeley
lab: 510.642.8441
http://pmb.berkeley.edu/~taylor/people/js.html
http://fungalgenomes.org/


From cjfields at uiuc.edu  Sun Feb 11 20:51:53 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 11 Feb 2007 14:51:53 -0600
Subject: [Bioperl-l] wiki: Database error when attempting to change
	preferences (1054: Unknown column 'user_newpass_time')
In-Reply-To: <3B80F494-925F-454A-8249-1565A4D50EF6@bioperl.org>
References: <9CEA9E3C-25A6-4DEF-902D-3FD997B81C9C@jays.net>
	<9B38AD87-3EB4-4658-A24B-ACC75F7F8298@jays.net>
	<3B80F494-925F-454A-8249-1565A4D50EF6@bioperl.org>
Message-ID: <E54012AF-CA57-4FA3-B70D-E135D66107ED@uiuc.edu>

Is there a good place on the main wiki page to prominently display  
this?  I wanted to place something at the top of the main page but I  
didn't know if we wanted to post the support email address on the  
page itself.

chris

On Feb 11, 2007, at 2:18 PM, Jason Stajich wrote:

> Should be fine now - I did an upgrade to mediawiki 1.9 this weekend
> and i think the upgrade script didn't finish.
>
> In the future system support requests should go to support - AT -
> open-bio.org so we can track them.
>
> -jason
> On Feb 11, 2007, at 7:16 AM, Jay Hannah wrote:
>
>> Hmm.... The error appears to not be limited to changing preferences.
>> I tried to update a couple different pages and got errors like this:
>>
>> ------
>> Database error
>> A database query syntax error has occurred. This may indicate a bug
>> in the software. The last attempted database query was:
>>
>>      (SQL query hidden)
>>
>> from within function "Article::updateRedirectOn". MySQL returned
>> error "1146: Table 'perlwikidb.redirect' doesn't exist (localhost)".
>> ------
>>
>> So all changes to the wiki aren't working right now?
>>
>> j
>> seqlab.net
>> http://www.bioperl.org/wiki/User:Jhannah
>
> --
> Jason Stajich
> Miller Research Fellow
> University of California, Berkeley
> lab: 510.642.8441
> http://pmb.berkeley.edu/~taylor/people/js.html
> http://fungalgenomes.org/
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From jay at jays.net  Sun Feb 11 20:56:53 2007
From: jay at jays.net (Jay Hannah)
Date: Sun, 11 Feb 2007 14:56:53 -0600
Subject: [Bioperl-l] wiki: Database error when attempting to change
	preferences (1054: Unknown column 'user_newpass_time')
In-Reply-To: <E54012AF-CA57-4FA3-B70D-E135D66107ED@uiuc.edu>
References: <9CEA9E3C-25A6-4DEF-902D-3FD997B81C9C@jays.net>
	<9B38AD87-3EB4-4658-A24B-ACC75F7F8298@jays.net>
	<3B80F494-925F-454A-8249-1565A4D50EF6@bioperl.org>
	<E54012AF-CA57-4FA3-B70D-E135D66107ED@uiuc.edu>
Message-ID: <CAF40EBD-F0E2-434C-91F4-2B766B20E734@jays.net>

On Feb 11, 2007, at 2:51 PM, Chris Fields wrote:
> Is there a good place on the main wiki page to prominently display  
> this?  I wanted to place something at the top of the main page but  
> I didn't know if we wanted to post the support email address on the  
> page itself.

I added it here:

http://www.bioperl.org/wiki/About_site

Which is linked from all pages via the left-hand bar:  community |  
About this site

j
seqlab.net
http://www.bioperl.org/wiki/User:Jhannah


From agd27 at cornell.edu  Sun Feb 11 17:47:03 2007
From: agd27 at cornell.edu (Adam Diehl)
Date: Sun, 11 Feb 2007 12:47:03 -0500
Subject: [Bioperl-l] Getting GFF output in UCSC-specific format
Message-ID: <45CF5697.60703@cornell.edu>

Good morning folks,

I've got sort of a newbie question regarding how to get GFF lines out of
Bio::Tools::GFF objects that are formatted according to the UCSC browser
conventions, described here:

http://genome.ucsc.edu/goldenPath/help/customTrack.html#GFF
(Ignore the custom track headers and whatnot; I just need fields 1-9
to be set up according to the descriptions there.)

The write_feature($feature) method isn't doing it for me, as I get lines 
like the following (newlines excepted):

chr1    EMBL/GenBank/SwissProt  gene    1712    2848    .       +       .       db_xref=GeneID:4063728;gene=dnaN;locus_tag=MGAS10270_Spy0002
chr1    EMBL/GenBank/SwissProt  CDS     1712    2848    .       +       .       EC_number=2.7.7.7;codon_start=1;db_xref=GI:94989511,GeneID:4063728;gene=dnaN;locus_tag=MGAS10270_Spy0002;product=DNA+polymerase+III%2C+beta+chain;protein_id=YP_597611.1;transl_table=11;translation=MIQFSINRTLFIHALNATKRAISTKNAIPILSSIKIEVTSTGVTLTGSNGQISIENTIPVSNENAGLLITSPGAILLEASFFINIISSLPDISINVKEIEQHQVVLTSGKSEITLKGKDVDQYPRLQEVSTENPLILKTKLLKSIIAETAFAASLQESRPILTGVHIVLSNHKDFKAVATDSHRMSQRLITLENTSADFDVVIPSKSLREFSAVFTDDIETVEVFFSPSQILFRSEHISFYTRLLEGNYPDTDRLLMTEFETEVVFNTQSLRHAMERAFLISNATQNGTVKLEITQNHISAHVNSPEVGKVNEDLDIVSQSGSDLTISFNPTYLIESLKAIKSETVKIHFLSPVRPFTLTPGDEEESFIQLITPVRTN

As you can see, field 8, which should be the frame according to UCSC
conventions, is empty ("."), and field 9, which UCSC treats as the group,
holds codon_start (the frame) along with the IDs and everything else. All
this extra stuff causes the UCSC browser to choke. First, it can't tell
which features belong together (it does this by matching the group field),
and second, it can't translate the CDSs into proteins because it lacks
frame data.

Basically, for CDS features I need to extract the frame (or rather
codon_start) from the last field, parse out the integer value and store
it in field 8 as the frame; then parse out locus_tag from the last field,
clear out everything else, and store only the locus_tag value in that
field (preferably without the "locus_tag=" qualifier). For gene features
I just want to do the last step, so that the gene and CDS features for
the same locus have matching group fields that are as simple as
possible. Let me know if this is not clear.

The way I've been trying to do this is by stringifying each gff object, 
splitting into an array, @tmp1, splitting @tmp1[8] into @tmp2 with the 
following code:  my @tmp2 = split /\;\, $tmp1[8]; and finally, trying to 
parse out the bits I need with regular expressions and store back to 
@tmp1[n].  -- This does not work, because perl wants to interpret every 
/ + etc. as a metacharacter!
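As an aside, the pattern in `split /\;\, $tmp1[8];` is never closed (the `\,` is an escaped comma, not a delimiter), so Perl most likely reads on to the next `/` in the script. Inside a pattern `;` needs no escaping, and `+`, `/` or `%2C` in the *data* are harmless; metacharacters only matter when a string is interpolated into a pattern, where `\Q...\E` quotes them. A minimal plain-Perl sketch of the string approach, assuming tab-separated columns and `key=value;...` attributes as in the output above. `ucsc_gff_line` is a hypothetical helper, and frame = codon_start - 1 assumes GenBank's 1-based codon_start versus GFF's 0-based frame:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Rewrite one GFF line from write_feature() into the UCSC-style layout:
# column 8 = frame derived from codon_start, column 9 = bare locus_tag.
sub ucsc_gff_line {
    my ($line) = @_;
    chomp $line;
    my @cols = split /\t/, $line, 9;    # at most 9 fields; column 9 kept whole
    my %attr;
    for my $pair (split /;/, $cols[8] // '') {
        my ($key, $value) = split /=/, $pair, 2;
        $attr{$key} = $value;
    }
    # Assumption: codon_start is 1-based (1..3), GFF frame is 0-based (0..2).
    $cols[7] = defined $attr{codon_start} ? $attr{codon_start} - 1 : '.';
    $cols[8] = defined $attr{locus_tag}   ? $attr{locus_tag}       : '.';
    return join "\t", @cols;
}

# Metacharacters only matter inside a pattern; \Q...\E makes them literal.
my $product = 'DNA+polymerase+III';
print "literal match\n" if 'product=DNA+polymerase+III%2C' =~ /\Q$product\E/;

my $cds = join "\t", 'chr1', 'EMBL/GenBank/SwissProt', 'CDS', 1712, 2848,
    '.', '+', '.', 'codon_start=1;gene=dnaN;locus_tag=MGAS10270_Spy0002';
print ucsc_gff_line($cds), "\n";
```

For the sample CDS line this leaves columns 1-7 unchanged, puts `0` in column 8 and `MGAS10270_Spy0002` alone in column 9; gene lines keep `.` as their frame.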

I am assuming there's a simple way to get at each value in the last 
field of the gff object using methods supplied by Bio::Tools::GFF, but 
the API docs seem a bit lacking in this area. Could anyone steer me 
towards what I need to know to do this? Please let me know if I can 
clarify any details!

Cheers,
Adam Diehl


From jason at bioperl.org  Sun Feb 11 23:29:16 2007
From: jason at bioperl.org (Jason Stajich)
Date: Sun, 11 Feb 2007 15:29:16 -0800
Subject: [Bioperl-l] Getting GFF output in UCSC-specific format
In-Reply-To: <45CF5697.60703@cornell.edu>
References: <45CF5697.60703@cornell.edu>
Message-ID: <F6B017A7-E91F-4739-9688-F1212EC857C8@bioperl.org>

I assume you are getting your features from a Bio::SeqIO parse of a
GenBank file?

You get back Bio::SeqFeature::Generic objects, so you want to look
at the docs for that module to see what the API is.
You will need to set the frame via
$feature->frame($frame)
You are going to have to determine the frame yourself if it isn't
part of the feature; we don't calculate it for you.

For the 9th column, the values are available through the tag methods
has_tag, add_tag_value, get_tag_values, get_all_tags, and remove_tag,
so you can use remove_tag to remove the tags you don't want (or all of
them, saving the value you need first):
my $locus;
for my $tag ( $feature->get_all_tags ) {
    if ( $tag eq 'locus_tag' ) { # save the locus_tag when we see it
        ($locus) = $feature->get_tag_values($tag);
    }
    $feature->remove_tag($tag);
}

You will also want to set the GFF version when you create the
Bio::Tools::GFF object (the -gff_version argument). I think the UCSC site
only supports GFF1, and I don't know exactly how the tags are written then,
since they aren't paired as key=>value; you'll need to set a tag called
'group', so
$feature->add_tag_value('group', $locus);

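If this is all unsatisfactory, you can easily write your own GFF
writer for your flavor of the data, along these lines:
my $strand = $feature->strand || 0;
print join("\t",
           $feature->seq_id,
           $feature->source_tag,
           $feature->primary_tag,
           $feature->start,
           $feature->end,
           defined $feature->score ? $feature->score : '.',
           $strand > 0 ? '+' : $strand < 0 ? '-' : '.',
           defined $feature->frame ? $feature->frame : '.',
           $locus), "\n";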
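To try the column logic without BioPerl installed, here is a plain-Perl variant of that writer that takes a hash instead of a feature object and maps a missing score or frame, and an unset strand, to `.`. The hash keys simply mirror the accessor names used above; `gff_line` and its hash layout are hypothetical, not a BioPerl structure:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Format one 9-column GFF line from a plain hash whose keys mirror the
# Bio::SeqFeature::Generic accessors (hypothetical layout for this sketch).
sub gff_line {
    my (%f) = @_;
    my $strand = $f{strand} || 0;
    return join("\t",
        $f{seq_id}, $f{source_tag}, $f{primary_tag},
        $f{start},  $f{end},
        defined $f{score} ? $f{score} : '.',          # missing score -> '.'
        $strand > 0 ? '+' : $strand < 0 ? '-' : '.',  # unset strand -> '.'
        defined $f{frame} ? $f{frame} : '.',          # missing frame -> '.'
        defined $f{group} ? $f{group} : '.',          # UCSC group field
    );
}

print gff_line(
    seq_id      => 'chr1', source_tag => 'EMBL/GenBank/SwissProt',
    primary_tag => 'CDS',  start      => 1712, end => 2848,
    strand      => 1,      frame      => 0,    group => 'MGAS10270_Spy0002',
), "\n";
```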


-jason

--
Jason Stajich
Miller Research Fellow
University of California, Berkeley
lab: 510.642.8441
http://pmb.berkeley.edu/~taylor/people/js.html
http://fungalgenomes.org/


From bix at sendu.me.uk  Sun Feb 11 23:39:15 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Sun, 11 Feb 2007 23:39:15 +0000
Subject: [Bioperl-l] WrapperBase
In-Reply-To: <Pine.LNX.4.58.0702101558180.16119@localhost.localdomain>
References: <Pine.LNX.4.58.0702091905300.3237@localhost.localdomain>	<45CCF803.9030004@sendu.me.uk>
	<Pine.LNX.4.58.0702101558180.16119@localhost.localdomain>
Message-ID: <45CFA923.8010201@sendu.me.uk>

Caroline Johnston wrote:
>> No, I think not. That would be very annoying when using wrappers for
>> programs that you just have in your system path.
> 
> Hmm, maybe I misunderstood what the program_path was for? The executable
> method goes straight to the system path unless program_path is set, so I
> assumed you would only set program_path if you specifically wanted it to
> look somewhere else. You wouldn't get a warning if you didn't specify a
> program_path and just left it to look in the system path.

Yes, sorry. Having now actually looked at your patch it seems fine. I'll 
commit it unless someone beats me to it.
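The lookup order Caroline describes (use program_path only when it is set, otherwise search the system path, and warn only about an explicit path that is unusable) can be sketched in plain Perl as below. `find_executable` is a hypothetical helper illustrating that logic, not the Bio::Tools::Run::WrapperBase implementation:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# An explicit program_path wins; otherwise search the directories in $PATH.
sub find_executable {
    my ($name, $program_path) = @_;
    if (defined $program_path) {
        return $program_path if -f $program_path && -x _;
        warn "program_path '$program_path' is not an executable file\n";
        return;
    }
    for my $dir (File::Spec->path) {
        my $candidate = File::Spec->catfile($dir, $name);
        return $candidate if -f $candidate && -x _;
    }
    return;    # not found; no warning, since no explicit path was requested
}

my $found = find_executable('sh');
print defined $found ? "sh found at $found\n" : "sh not on PATH\n";
```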


From flope004 at hotmail.com  Mon Feb 12 02:40:08 2007
From: flope004 at hotmail.com (Wolverine Fran)
Date: Mon, 12 Feb 2007 03:40:08 +0100
Subject: [Bioperl-l] TreeIO, how it works?
Message-ID: <BAY125-F39E82B0D8378C73702B91C8B910@phx.gbl>

Hi,

I have a problem. I don't understand how TreeIO reads the trees:
my input: ((dog:0.04,cat:0.08):0.12,(human:0.15,mouse:0.2));

An unrooted tree with 4 tips and 2 internal nodes.
when I asked for:
print "Total number of nodes ",$tree->number_nodes;

I get 6 but when I ask for:
foreach my $node (@nodes) {
	print $node->internal_id,",";
}
I get 6,0,1,2,3,4,5. Total 7.

The root is number 6 and 2 and 5 are my internal nodes.
If I set the root to be number 5 this node 6 is still present.
Why? what is the node 6?

when I try the following:
  $node5 = $tree->find_node(-internal_id => '5');
  $node6 = $tree->find_node(-internal_id => '6');
  $node2 = $tree->find_node(-internal_id => '2');
  $distance1 = $tree->distance(-nodes =>[$node5,$node2]);
  $distance2 = $tree->distance(-nodes =>[$node5,$node6]);
  $distance3 = $tree->distance(-nodes =>[$node2,$node6]);
or any other distance, I get 2 warnings:
  -------------------- WARNING ---------------------
MSG: Must provide a valid array reference for -nodes
---------------------------------------------------

-------------------- WARNING ---------------------
MSG: Could not find distance!
---------------------------------------------------
What am I doing incorrectly?

I am practicing with AlignIO and TreeIO to calculate the maximum likelihood
for a given tree, so any other information about that would be of great
help. I am practicing with this to see how Bioperl can help me with more
complex problems.

Thank you very much for your help!


From jason at bioperl.org  Mon Feb 12 03:05:18 2007
From: jason at bioperl.org (Jason Stajich)
Date: Sun, 11 Feb 2007 19:05:18 -0800
Subject: [Bioperl-l] TreeIO, how (does) it work?
In-Reply-To: <BAY125-F39E82B0D8378C73702B91C8B910@phx.gbl>
References: <BAY125-F39E82B0D8378C73702B91C8B910@phx.gbl>
Message-ID: <60D2DD3B-2ED1-4A92-A9FA-2875FEAA28CC@bioperl.org>


On Feb 11, 2007, at 6:40 PM, Wolverine Fran wrote:

> Hi,
>
> I have a problem. I don't understand how TreeIO reads the trees:
> my input: ((dog:0.04,cat:0.08):0.12,(human:0.15,mouse:0.2));
>
> An unrooted tree with 4 tips and 2 internal nodes.
> when I asked for:
> print "Total number of nodes ",$tree->number_nodes;
>
> I get 6 but when I ask for:
> foreach my $node (@nodes) {
> 	print $node->internal_id,",";
> }
> I get 6,0,1,2,3,4,5. Total 7.
>
> The root is number 6 and 2 and 5 are my internal nodes.
> If I set the root to be number 5 this node 6 is still present.
> Why? what is the node 6?

Node 6 is to hold the root or a fake root with a trifurcation for  
unrooted trees.  Did you actually call the reroot method to set the  
root to node 5?
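A quick way to see where the seventh node comes from: in Newick, every `(` opens one internal node and the outermost one is the root, so `((dog,cat),(human,mouse));` has 4 tips plus 3 internal nodes, while the same unrooted tree written with a trifurcating root needs only 2. A rough plain-Perl counter, assuming unquoted labels with no commas or parentheses in them (`newick_node_counts` is a hypothetical helper, not a TreeIO method):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count tips and internal nodes in a simple Newick string.
sub newick_node_counts {
    my ($newick) = @_;
    # Every '(' opens one internal node; the outermost one is the root.
    my $internal = () = $newick =~ /\(/g;
    # A tip is a label that directly follows '(' or ','.
    my $tips = () = $newick =~ /[(,]\s*[^(),:;\s]/g;
    return ($tips, $internal);
}

for my $tree ('((dog:0.04,cat:0.08):0.12,(human:0.15,mouse:0.2));',
              '(dog:0.04,cat:0.08,(human:0.15,mouse:0.2):0.12);') {
    my ($tips, $internal) = newick_node_counts($tree);
    printf "%s\n  tips=%d internal=%d total=%d\n",
        $tree, $tips, $internal, $tips + $internal;
}
```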

>
> when I try the following:
>   $node5 = $tree->find_node(-internal_id => '5');
>   $node6 = $tree->find_node(-internal_id => '6');
>   $node2 = $tree->find_node(-internal_id => '2');
>   $distance1 = $tree->distance(-nodes =>[$node5,$node2]);
>   $distance2 = $tree->distance(-nodes =>[$node5,$node6]);
>   $distance3 = $tree->distance(-nodes =>[$node2,$node6]);
>   or any other distance I get 2 warnings:
>   -------------------- WARNING ---------------------
> MSG: Must provide a valid array reference for -nodes
> ---------------------------------------------------
>
> -------------------- WARNING ---------------------
> MSG: Could not find distance!
> ---------------------------------------------------
> What am I doing incorrectly?
>
The distance method is just summing branch lengths on the path
between two nodes.  Is that what you are trying to do?
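To make "summing branch lengths on the path" concrete, here is a plain-Perl sketch that walks from each node up to their lowest common ancestor, using a hand-built parent table for the tree in the question. The internal node names `dc`, `hm` and `root` are made up, a missing branch length (the `(human,mouse)` clade has none) is treated as 0, and this is not how Bio::Tree::Tree stores its nodes:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# child => [parent, branch length] for ((dog:0.04,cat:0.08):0.12,(human:0.15,mouse:0.2));
my %parent = (
    dog   => ['dc',   0.04],
    cat   => ['dc',   0.08],
    dc    => ['root', 0.12],
    human => ['hm',   0.15],
    mouse => ['hm',   0.2],
    hm    => ['root', undef],    # no length given in the Newick string
);

# Sum branch lengths on the path between two nodes via their lowest common ancestor.
sub path_distance {
    my ($parent, $x, $y) = @_;
    my %up_from_x;               # each ancestor of $x => its distance from $x
    my ($node, $d) = ($x, 0);
    while (defined $node) {
        $up_from_x{$node} = $d;
        my $link = $parent->{$node} or last;    # the root has no entry
        $d += $link->[1] // 0;                  # missing length counts as 0
        $node = $link->[0];
    }
    ($node, $d) = ($y, 0);
    while (defined $node) {
        return $d + $up_from_x{$node} if exists $up_from_x{$node};
        my $link = $parent->{$node} or last;
        $d += $link->[1] // 0;
        $node = $link->[0];
    }
    return undef;                # the two nodes share no ancestor
}

printf "dog-human: %.2f\n", path_distance(\%parent, 'dog', 'human');
printf "dog-cat:   %.2f\n", path_distance(\%parent, 'dog', 'cat');
```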

The error message you report doesn't make sense as
"Must provide a valid array reference for -nodes"
is only printed when you call is_monophyletic or is_paraphyletic as  
far as I can tell.

What version of BioPerl are you using?

> I am practicing with AlignIO and TreeIO to calculate the maximum  
> likelihood
> for a given tree. So,other information about that would be of great  
> help. I am practicing with
> this to see how Bioperl can help me with more complex problems.
>
Are you trying to calculate the likelihood of a given tree, or are you
trying to generate an ML tree from an alignment?

> Thank you very much for your help!
>

--
Jason Stajich
Miller Research Fellow
University of California, Berkeley
lab: 510.642.8441
http://pmb.berkeley.edu/~taylor/people/js.html
http://fungalgenomes.org/


From er at xs4all.nl  Mon Feb 12 13:03:06 2007
From: er at xs4all.nl (Erik)
Date: Mon, 12 Feb 2007 14:03:06 +0100 (CET)
Subject: [Bioperl-l] bioperl wiki changes rss / atom
In-Reply-To: <AF76B50B-AF7A-41D1-A64E-D993D8CC4C86@uiuc.edu>
References: <162150.76282.qm@web56511.mail.re3.yahoo.com>
	<AF76B50B-AF7A-41D1-A64E-D993D8CC4C86@uiuc.edu>
Message-ID: <20181.156.83.0.124.1171285386.squirrel@webmail.xs4all.nl>

Hi,


The BioPerl wiki recent-changes RSS/Atom feed starts with two empty lines,
which make the XML invalid:

XML Parsing Error: xml declaration not at start of external entity
Location:
http://www.bioperl.org/w/index.php?title=Special:Recentchanges&feed=rss
Line Number 3, Column 1:<?xml version="1.0" encoding="utf-8"?>
^

Could those be removed? (I didn't see a way to do it myself.) It might be
a useful feed :)
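Until the feed itself is fixed, a consumer can work around the error, since an XML declaration is only legal as the very first thing in a document. A minimal sketch that drops leading whitespace before the declaration; `strip_before_xml_decl` is a hypothetical helper, and fetching the feed (for example with LWP::Simple's `get`) is left out so the example runs offline:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Remove any leading whitespace (e.g. the two blank lines) before <?xml ...?>.
sub strip_before_xml_decl {
    my ($xml) = @_;
    $xml =~ s/\A\s+(?=<\?xml)//;
    return $xml;
}

my $feed = "\n\n<?xml version=\"1.0\" encoding=\"utf-8\"?>\n<rss version=\"2.0\"></rss>\n";
print strip_before_xml_decl($feed);
```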


thanks,

Erik


From cjfields at uiuc.edu  Mon Feb 12 14:52:44 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 12 Feb 2007 08:52:44 -0600
Subject: [Bioperl-l] bioperl wiki changes rss / atom
In-Reply-To: <20181.156.83.0.124.1171285386.squirrel@webmail.xs4all.nl>
References: <162150.76282.qm@web56511.mail.re3.yahoo.com>
	<AF76B50B-AF7A-41D1-A64E-D993D8CC4C86@uiuc.edu>
	<20181.156.83.0.124.1171285386.squirrel@webmail.xs4all.nl>
Message-ID: <DA1A57C0-32B5-4095-AB80-318B5F529730@uiuc.edu>

I have forwarded this to support at open-bio.org, which should take  
care of it.

chris


Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From sm8 at sanger.ac.uk  Mon Feb 12 17:12:00 2007
From: sm8 at sanger.ac.uk (Stephen Montgomery)
Date: Mon, 12 Feb 2007 17:12:00 -0000
Subject: [Bioperl-l] subtract for Bio::RangeI.pm
Message-ID: <A8AB69F227E96F4DBED773D3D70A295B02FCF830@exchsrv2.internal.sanger.ac.uk>

Hi -

Here is a subtract function for the Bio::RangeI class, to be added if
there is interest.

All the best!
Stephen Montgomery


//ADD TO BIO::RANGEI


=head2 subtract

  Title   : subtract
  Usage   : my $subtracted = $r1->subtract($r2)
  Function: Subtract range r2 from range r1
  Args    : arg #1 = a range to subtract from this one (mandatory)
            arg #2 = strand option ('strong', 'weak', 'ignore') (optional)
  Returns : undef if they do not overlap or r2 contains this RangeI,
            or an arrayref of Range objects (this is an array since some
            instances where the subtract range is enclosed within this range
            will result in the creation of two new disjoint ranges)

=cut

sub subtract {
    my ($self, $range, $so) = @_;
    $self->throw("missing arg: you need to pass in another feature")
      unless $range;
    # check the type before any RangeI methods are called on $range
    $range->throw("Input a Bio::RangeI object")
      unless $range->isa('Bio::RangeI');
    return unless $self->_testStrand($range, $so);

    if ($self eq "Bio::RangeI") {
        $self = "Bio::Range";
        $self->warn("calling static methods of an interface is deprecated; use $self instead");
    }

    # Nothing to subtract if the ranges do not overlap
    return undef unless $self->overlaps($range);

    # $range covers all of $self, so everything is subtracted
    return undef if $range->contains($self);

    # Subtract the intersection from the $self range
    my ($start, $end, $strand) = $self->intersection($range, $so);

    my @outranges;
    if ($self->start < $start) {    # piece left of the overlap
        push @outranges, $self->new(-start  => $self->start,
                                    -end    => $start - 1,
                                    -strand => $self->strand);
    }
    if ($self->end > $end) {        # piece right of the overlap
        push @outranges, $self->new(-start  => $end + 1,
                                    -end    => $self->end,
                                    -strand => $self->strand);
    }
    return \@outranges;
}


//UNIT TEST

#!/usr/bin/perl
use strict;
use warnings;
use Test;
use Bio::SeqFeature::Generic;

plan tests => 13;

my $feature1 = Bio::SeqFeature::Generic->new(-start => 1,   -end => 1000, -strand => 1);
my $feature2 = Bio::SeqFeature::Generic->new(-start => 100, -end => 900,  -strand => -1);

# feature2 lies inside feature1: two pieces remain (1-99 and 901-1000)
my $subtracted = $feature1->subtract($feature2);
ok(defined($subtracted));
ok(scalar(@$subtracted) == 2);
foreach my $range (@$subtracted) {
    ok($range->start == 1 || $range->start == 901);
    ok($range->end == 99 || $range->end == 1000);
}

# feature1 covers feature2 completely: nothing remains
$subtracted = $feature2->subtract($feature1);
ok(!defined($subtracted));
# opposite strands fail both the 'weak' and the 'strong' strand tests
$subtracted = $feature1->subtract($feature2, 'weak');
ok(!defined($subtracted));
$subtracted = $feature1->subtract($feature2, 'strong');
ok(!defined($subtracted));

# feature3 overlaps the right end of feature1: only 1-499 remains
my $feature3 = Bio::SeqFeature::Generic->new(-start => 500, -end => 1500, -strand => 1);
$subtracted = $feature1->subtract($feature3);
ok(defined($subtracted));
ok(scalar(@$subtracted) == 1);
my $subtracted_i = $subtracted->[0];
ok($subtracted_i->start == 1);
ok($subtracted_i->end == 499);
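For checking the interval arithmetic without BioPerl, a minimal plain-Perl sketch of the same subtraction on closed integer intervals, ignoring strand and following the proposed return convention (undef for no overlap or full cover, otherwise an arrayref of pieces). `subtract_interval` is a hypothetical helper, not part of Bio::RangeI:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Subtract closed interval [$s2,$e2] from [$s1,$e1], ignoring strand.
sub subtract_interval {
    my ($s1, $e1, $s2, $e2) = @_;
    return undef if $e2 < $s1 || $s2 > $e1;      # no overlap
    return undef if $s2 <= $s1 && $e2 >= $e1;    # everything subtracted
    my $is = $s1 > $s2 ? $s1 : $s2;              # intersection start
    my $ie = $e1 < $e2 ? $e1 : $e2;              # intersection end
    my @out;
    push @out, [$s1, $is - 1] if $s1 < $is;      # piece left of the overlap
    push @out, [$ie + 1, $e1] if $e1 > $ie;      # piece right of the overlap
    return \@out;
}

for my $case ([1, 1000, 100, 900], [1, 1000, 500, 1500], [100, 900, 1, 1000]) {
    my $res = subtract_interval(@$case);
    printf "%d-%d minus %d-%d: %s\n", @$case,
        defined $res ? join(', ', map { "$_->[0]-$_->[1]" } @$res) : 'undef';
}
```

The three cases mirror the unit test above and print `1-99, 901-1000`, `1-499` and `undef` respectively.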


From flope004 at hotmail.com  Mon Feb 12 18:07:12 2007
From: flope004 at hotmail.com (Wolverine Fran)
Date: Mon, 12 Feb 2007 19:07:12 +0100
Subject: [Bioperl-l] TreeIO, how (does) it work?
In-Reply-To: <60D2DD3B-2ED1-4A92-A9FA-2875FEAA28CC@bioperl.org>
Message-ID: <BAY125-F384316569877D96C05FDC28B910@phx.gbl>

thanks for your reply!

I am using Bioperl 1.4.

>Node 6 is to hold the root or a fake root with a trifurcation for
>unrooted trees.  Did you actually call the reroot method to set the
>root to node 5?

Yes, I tried the following with the same result:
$tree->reroot($tree->find_node(-internal_id => '5'));
or
$tree->set_root_node($tree->find_node(-internal_id => '5'));

Even if I use a rooted tree: 
(((dog:0.04,cat:0.08):0.12,human:0.15):0.1,mouse:0.1);
I still get node #6. So, is it always present? Am I not representing a
rooted tree properly in Newick format?

>The distance method is just summing branch lengths on the path
>between two nodes.  Is that what are you trying to do?
>
>The error message you report doesn't make sense as
>"Must provide a valid array reference for -nodes"
>is only printed when you call is_monophyletic or is_paraphyletic as
>far as I can tell.

I do not know yet what I was doing incorrectly, but now it works. Yes, I was
using the distance method to find out where node 6 was located. For the
unrooted tree, node 6 was at the same position as node 5 (an internal node),
and for the rooted tree node 6 (the root) was 0.1 from both the mouse leaf
and the internal node.
The error message "Must provide a valid array reference for -nodes" is
shown if I specify a node which is not present in the tree.

>You are trying to calculate the likelihood of a tree or are you
>trying to generate a ML tree from an alignment?

I am trying to calculate the likelihood of a tree, as a practice. Probably 
there are other  bioperl modules, besides AlignIO and TreeIO, which can help 
me in the process and I do not know them.

Again, thank you for your time!


From dmessina at wustl.edu  Mon Feb 12 17:49:49 2007
From: dmessina at wustl.edu (David Messina)
Date: Mon, 12 Feb 2007 11:49:49 -0600
Subject: [Bioperl-l] subtract for Bio::RangeI.pm
In-Reply-To: <A8AB69F227E96F4DBED773D3D70A295B02FCF754@exchsrv2.internal.sanger.ac.uk>
References: <A8AB69F227E96F4DBED773D3D70A295B02FCF754@exchsrv2.internal.sanger.ac.uk>
Message-ID: <1574ACCF-92D5-4DEC-AD04-14EB7767F22A@wustl.edu>

Stephen,

Great, thanks for this. Could you submit it to Bugzilla as an  
enhancement?

http://bugzilla.open-bio.org/


Thanks,
Dave


From jason at bioperl.org  Mon Feb 12 18:38:11 2007
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 12 Feb 2007 10:38:11 -0800
Subject: [Bioperl-l] TreeIO, how (does) it work?
In-Reply-To: <BAY125-F384316569877D96C05FDC28B910@phx.gbl>
References: <BAY125-F384316569877D96C05FDC28B910@phx.gbl>
Message-ID: <BD0EF8B4-69A9-468E-A722-1110B02D0EF7@bioperl.org>

I would definitely suggest getting ahold of BioPerl 1.5.2, as I seem
to remember there are several fixes in the tree module code for
re-rooting a tree.
-jason


--
Jason Stajich
Miller Research Fellow
University of California, Berkeley
lab: 510.642.8441
http://pmb.berkeley.edu/~taylor/people/js.html
http://fungalgenomes.org/


From johnsonm at gmail.com  Mon Feb 12 23:13:09 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Mon, 12 Feb 2007 17:13:09 -0600
Subject: [Bioperl-l] Bio::Tools::Glimmer
In-Reply-To: <ebf5eb170702070850k10f1d24fr2d93d61534eeaaff@mail.gmail.com>
References: <ebf5eb170609190833r5fa138e9gc6c51346e5f7890c@mail.gmail.com>
	<451095C7.3020905@infotech.monash.edu.au>
	<0FBEEE2B-9641-4EFA-A5D2-0DA839556E4C@gmx.net>
	<ebf5eb170609201247r12eb9475v5cb553e0727163c0@mail.gmail.com>
	<ebf5eb170702061553t29b0d1f2q402582abba032031@mail.gmail.com>
	<20F07AFE-BADF-4BA5-83B1-C99EEBE915C2@bioperl.org>
	<ebf5eb170702070850k10f1d24fr2d93d61534eeaaff@mail.gmail.com>
Message-ID: <ebf5eb170702121513r20759845i21e366f916020baa@mail.gmail.com>

On 2/7/07, Mark Johnson <johnsonm at gmail.com> wrote:
>
>     Well, each format has some unique features.  If the user declines to
> specify the format, I can figure it out, but it will probably involve
> scanning the input file twice.  I'll take a look.
>     I can do all the parsing in one function, in fact I have, just to see
> how nasty it would end up being.  I just can't stomach having the code that
> tightly coupled and hard to read.  In the end it'll probably be three
> functions.  GlimmerM/HMM are pretty close.  Maybe two, Glimmer2 and
> Glimmer3 aren't *that* different, either.


    I've got a 4-in-1 parser roughed in per Chris Fields' suggestion, with two
actual parsing routines (prokaryotic and eukaryotic).  You can specify
-format as an arg to the constructor (Glimmer, GlimmerM, GlimmerHMM), or it
will look through the input until it can figure out what it is looking at.
    I've got one main issue to solve, the rest is just stuff like updating
the POD.  Torsten Seemann very helpfully added example output for all 4
formats to t/data.  Looking at GlimmerHMM.out, the first line is
'GlimmerHMM'.  However, I think there is a bug in the existing
_parse_predictions:

Shouldn't this:

} elsif( /^(Glimmer\S*)$/ ) { # GlimmerHMM has no version
            $source = $1;
            next;
        }

be this instead:

} elsif( /^(GlimmerHMM\S*)$/ ) { # GlimmerHMM has no version
            $source = $1;
            next;
        }


I lifted that bit of code to do format detection...we don't have GlimmerHMM
installed locally, so I'm assuming Torsten's output is correct and the above
is a bug.  Guess I'll go check bugzilla...


From torsten.seemann at infotech.monash.edu.au  Tue Feb 13 02:07:40 2007
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Tue, 13 Feb 2007 13:07:40 +1100
Subject: [Bioperl-l] Bio::Tools::Glimmer
In-Reply-To: <ebf5eb170702121513r20759845i21e366f916020baa@mail.gmail.com>
References: <ebf5eb170609190833r5fa138e9gc6c51346e5f7890c@mail.gmail.com>
	<451095C7.3020905@infotech.monash.edu.au>
	<0FBEEE2B-9641-4EFA-A5D2-0DA839556E4C@gmx.net>
	<ebf5eb170609201247r12eb9475v5cb553e0727163c0@mail.gmail.com>
	<ebf5eb170702061553t29b0d1f2q402582abba032031@mail.gmail.com>
	<20F07AFE-BADF-4BA5-83B1-C99EEBE915C2@bioperl.org>
	<ebf5eb170702070850k10f1d24fr2d93d61534eeaaff@mail.gmail.com>
	<ebf5eb170702121513r20759845i21e366f916020baa@mail.gmail.com>
Message-ID: <a79f6a4b0702121807i47d6360cwed1b0276865d6bd7@mail.gmail.com>

Mark,

>     I've got one main issue to solve, the rest is just stuff like updating
> the POD.  Torsten Seemann very helpfully added example output for all 4
> formats to t/data.  Looking at GlimmerHMM.out, the first line is
> 'GlimmerHMM'.  However, I think there is a bug in the existing
> _parse_predictions:
> Shouldn't this:
> } elsif( /^(Glimmer\S*)$/ ) { # GlimmerHMM has no version
> be this instead:
> } elsif( /^(GlimmerHMM\S*)$/ ) { # GlimmerHMM has no version

I think it should be what it says, or perhaps now /^(Glimmer(M|HMM))/.
Here's why:

I came onto the scene at Glimmer.pm rev 1.4. At that stage it only
parsed GlimmerM. I noted that GlimmerHMM had the same output format as
GlimmerM, except for the first line. So in rev 1.5 I modified the
regexp to match both, i.e. \S*. This would also hopefully match any
other Glimmer-clone formats that arose. I also fixed the pdocs to say
this, and added tests to t/Genpred.t.
% cvs diff -r 1.4 -r 1.5 Bio/Tools/Glimmer.pm
% cvs diff -r 1.15 -r 1.16 t/Genpred.t
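You can check this by running the candidate patterns against the two first lines in question. The following is a standalone sketch, not code from Bio::Tools::Glimmer. It assumes the GlimmerM and GlimmerHMM outputs begin with the bare lines "GlimmerM" and "GlimmerHMM". `detect_source` is a hypothetical helper that scans lines until one matches, loosely mimicking the format detection discussed in this thread. The last pattern shows what happens if \S is mistyped as \s.

```perl
use strict;
use warnings;

# The header patterns discussed in this thread
my %pattern = (
    'Glimmer\S*'        => qr/^(Glimmer\S*)$/,     # existing code: GlimmerM, GlimmerHMM, ...
    'GlimmerHMM\S*'     => qr/^(GlimmerHMM\S*)$/,  # proposed change: GlimmerHMM only
    'Glimmer(M|HMM)'    => qr/^(Glimmer(M|HMM))/,  # Torsten's explicit alternative
    'Glimmer\s* (typo)' => qr/^(Glimmer\s*)$/,     # \s (whitespace) instead of \S
);

# Hypothetical helper: return the first captured header, or undef
sub detect_source {
    my ($re, @lines) = @_;
    for my $line (@lines) {
        return $1 if $line =~ $re;
    }
    return;
}

for my $header ('GlimmerM', 'GlimmerHMM') {
    for my $name (sort keys %pattern) {
        my $source = detect_source($pattern{$name}, $header);
        printf "%-10s vs %-18s => %s\n", $header, $name,
            defined $source ? "captures '$source'" : 'no match';
    }
}
```

The existing /^(Glimmer\S*)$/ accepts both headers. The GlimmerHMM-only variant drops GlimmerM, and the \s variant matches neither.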

I then planned to extend support to Glimmer2 and Glimmer3. I added the
4 test files (t/Glimmer*.out) but never wrote the code. This is where
you have come in Mark :-)

> I lifted that bit of code to do format detection...we don't have GlimmerHMM
> installed locally, so I'm assuming Torsten's output is correct and the above
> is a bug.  Guess I'll go check bugzilla...

I'm pretty sure my 4 test files are correct - I spent a lot of time
ensuring they were consistent etc, as I was getting very confused with
the different "glimmer" versions!

Hope this all helps,

--Torsten


From avilella at gmail.com  Tue Feb 13 13:20:15 2007
From: avilella at gmail.com (Albert Vilella)
Date: Tue, 13 Feb 2007 13:20:15 +0000
Subject: [Bioperl-l] number of gaps for the other sequences in an alignment
Message-ID: <358f4d650702130520n269419cfkb9cb6dac8feaaa5c@mail.gmail.com>

Hi,

It would be great if we could have a method that, given one sequence in
an alignment, counts the number of gaps present in the rest of the
sequences of the alignment. That is, for each nucleotide/amino acid
position of the sequence of interest, look at the corresponding column
in the alignment, count the gaps in the other sequences, then sum these
counts over all the columns where the sequence of interest is not gapped.

Has anyone tried this before?

My idea is to end up with a coefficient of indel contribution for
each of the sequences in the alignment, with this coefficient being
high when one sequence forces a lot of gaps to be inserted in the
final alignment in order to accommodate it.

I would say that the best place for this is either using methods
already available in SimpleAlign, or have something new added there.
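Here is a minimal sketch of the count described above, written against plain aligned strings rather than as a SimpleAlign method. The strings are assumed to be equal length. With a Bio::SimpleAlign object they could presumably come from `map { $_->seq } $aln->each_seq`. The choice of '-' and '.' as gap characters is an assumption, and the parameter can be overridden. `gaps_forced_by` is a hypothetical name.

```perl
use strict;
use warnings;

# For the sequence at index $idx, count the gaps in all OTHER sequences
# at every column where sequence $idx has a residue (i.e. is not gapped).
sub gaps_forced_by {
    my ($aligned, $idx, $gap_re) = @_;
    $gap_re //= qr/[-.]/;    # assumption: '-' and '.' are the gap characters
    my $ref   = $aligned->[$idx];
    my $total = 0;
    for my $col (0 .. length($ref) - 1) {
        # skip columns where the sequence of interest itself is gapped
        next if substr($ref, $col, 1) =~ $gap_re;
        for my $j (0 .. $#$aligned) {
            next if $j == $idx;
            $total++ if substr($aligned->[$j], $col, 1) =~ $gap_re;
        }
    }
    return $total;
}

my @aln = (
    'ACGTACGT',
    'AC--ACGT',
    'ACGTAC--',
);

for my $i (0 .. $#aln) {
    printf "seq %d forces %d gaps in the others\n", $i, gaps_forced_by(\@aln, $i);
}
```

To compare alignments of different sizes, you could divide this raw count by the number of residues in the sequence and by the number of other sequences. That normalisation is my own assumption and is not part of the proposal.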

Looking forward to your comments,

Cheers,

    Albert.


From bix at sendu.me.uk  Tue Feb 13 16:09:09 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 13 Feb 2007 16:09:09 +0000
Subject: [Bioperl-l] Bio::DB::SeqFeature::Store problem with transcripts
Message-ID: <45D1E2A5.6060104@sendu.me.uk>

I have some raw sequences in a Bio::DB::SeqFeature::Store mysql database 
and wanted to associate some basic information with them, like exon 
positions. I thought of creating Bio::SeqFeature::Gene::Transcript 
objects and storing them so I could later use features() to see what 
other features overlapped exons. I ran into a fatal error that can be 
replicated with the following simplified one-liner:

perl -MBio::DB::SeqFeature::Store -MBio::SeqFeature::Gene::Transcript -e 
'$db = Bio::DB::SeqFeature::Store->new(-adaptor => "DBI::mysql", -dsn => 
"dbi:mysql:test"); $trans = 
Bio::SeqFeature::Gene::Transcript->new(-start => 1, -end => 2, -seq_id 
=> "test"); $db->store($trans); @trans = $db->features(-seqid => "test", 
-type => "transcript"); print "@trans\n";'

code sub {
     package Bio::SeqFeature::Generic;
     use strict 'refs';
     my $self = shift @_;
     foreach my $f (@{[] unless $$self{'_gsf_sub_array'};}) {
         $f = undef;
     }
     $$self{'_gsf_seq'} = undef;
     foreach my $t (keys %{{} unless $$self{'_gsf_tag_hash'};}) {
         $$self{'_gsf_tag_hash'}{$t} = undef;
         delete $$self{'_gsf_tag_hash'}{$t};
     }
} did not evaluate to a subroutine reference, at 
/.../Bio/DB/SeqFeature/Store.pm line 2280


Is this a bug? Or am I taking the wrong approach?


From johnsonm at gmail.com  Tue Feb 13 20:10:23 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Tue, 13 Feb 2007 14:10:23 -0600
Subject: [Bioperl-l] Bio::Tools::Glimmer
In-Reply-To: <a79f6a4b0702121807i47d6360cwed1b0276865d6bd7@mail.gmail.com>
References: <ebf5eb170609190833r5fa138e9gc6c51346e5f7890c@mail.gmail.com>
	<451095C7.3020905@infotech.monash.edu.au>
	<0FBEEE2B-9641-4EFA-A5D2-0DA839556E4C@gmx.net>
	<ebf5eb170609201247r12eb9475v5cb553e0727163c0@mail.gmail.com>
	<ebf5eb170702061553t29b0d1f2q402582abba032031@mail.gmail.com>
	<20F07AFE-BADF-4BA5-83B1-C99EEBE915C2@bioperl.org>
	<ebf5eb170702070850k10f1d24fr2d93d61534eeaaff@mail.gmail.com>
	<ebf5eb170702121513r20759845i21e366f916020baa@mail.gmail.com>
	<a79f6a4b0702121807i47d6360cwed1b0276865d6bd7@mail.gmail.com>
Message-ID: <ebf5eb170702131210l1fa27bb5n31095e9ca2d4ecf2@mail.gmail.com>

    You're quite correct.  I wasn't paying enough attention.  That does work
just fine.  I fat-fingered something somewhere else, broke my version of the
module for GlimmerHMM, hallucinated and confused \S and \s.  8)
    All I have left now is to fix up the POD documentation and such, and then
I can send the module along and somebody can make whatever tweaks and check
it in.  Shall I open a ticket in Bugzilla for this and attach diffs, or just
send them along to somebody to take care of directly?
    Oh, one thing I have not mentioned.  I also added a -seqname argument.
Glimmer2 does not provide any kind of sequence identifier in the output, and
only processes the first sequence in a fasta file.  It would be tedious to
have to code around this by fixing up the predictions after they are
produced, so I added the option to provide this missing info up front,
hopefully sparing downstream code from needing a special case to fix up
Glimmer2 predictions.

On 2/12/07, Torsten Seemann <torsten.seemann at infotech.monash.edu.au> wrote:

> I think it should be what it says, or perhaps now /^(Glimmer(M|HMM))/.
> Here's why:
>
> I came onto the scene at Glimmer.pm rev 1.4. At that stage it only
> parse GlimmerM. I noted that GlimmerHMM was the same output format as
> GlimmerM, except for the first line. So in rev 1.5 I modified the
> regexp to match both ie. \S* . This would also hopefully match any
> other Glimmer-clone formats that arose. I also fixed the pdocs to say
> this, and added tests to t/Genpred.t.
> % cvs diff -r 1.4 -r 1.5 Bio/Tools/Glimmer.pm
> % cvs diff -r 1.15 -r 1.16 t/Genpred.t
>
> I then planned to extend support to Glimmer2 and Glimmer3. I added the
> 4 test files (t/Glimmer*.out) but never wrote the code. This is where
> you have come in Mark :-)
>
> > I lifted that bit of code to do format detection...we don't have GlimmerHMM
> > installed locally, so I'm assuming Torsten's output is correct and the above
> > is a bug.  Guess I'll go check bugzilla...
>
> I'm pretty sure my 4 test files are correct - I spent a lot of time
> ensuring they were consistent etc, as I was getting very confused with
> the different "glimmer" versions!
>
> Hope this all helps,
>
> --Torsten
>


From cjfields at uiuc.edu  Tue Feb 13 20:47:19 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 13 Feb 2007 14:47:19 -0600
Subject: [Bioperl-l] Bio::Tools::Glimmer
In-Reply-To: <ebf5eb170702131210l1fa27bb5n31095e9ca2d4ecf2@mail.gmail.com>
References: <ebf5eb170609190833r5fa138e9gc6c51346e5f7890c@mail.gmail.com>
	<451095C7.3020905@infotech.monash.edu.au>
	<0FBEEE2B-9641-4EFA-A5D2-0DA839556E4C@gmx.net>
	<ebf5eb170609201247r12eb9475v5cb553e0727163c0@mail.gmail.com>
	<ebf5eb170702061553t29b0d1f2q402582abba032031@mail.gmail.com>
	<20F07AFE-BADF-4BA5-83B1-C99EEBE915C2@bioperl.org>
	<ebf5eb170702070850k10f1d24fr2d93d61534eeaaff@mail.gmail.com>
	<ebf5eb170702121513r20759845i21e366f916020baa@mail.gmail.com>
	<a79f6a4b0702121807i47d6360cwed1b0276865d6bd7@mail.gmail.com>
	<ebf5eb170702131210l1fa27bb5n31095e9ca2d4ecf2@mail.gmail.com>
Message-ID: <DB4D80A7-1D29-4809-A7D2-FCE5DB2502B4@uiuc.edu>

You'll also want to update whatever relevant tests there are for
Glimmer; looks like they are in t/Genpred.t.

chris

On Feb 13, 2007, at 2:10 PM, Mark Johnson wrote:

>     You're quite correct.  I wasn't paying enough attention.  That does work
> just fine.  I fat-fingered something somewhere else, broke my version of the
> module for GlimmerHMM, hallucinated and confused \S and \s.  8)
>     All I have left now is to fixup the POD documentation and such and then
> I can send the module along and somebody can make whatever tweaks and check
> it in.  Shall I open a ticket in Bugzilla for this and attach diffs, or just
> send them along to somebody to take care of directly?
>     Oh, one thing I have not mentioned.  I also added a -seqname argument.
> Glimmer2 does not provide any kind of sequence identifier in the output, and
> only processes the first sequence in a fasta file.  It would be tedious to
> have to code around this by fixing up the predictions after they are
> produced, so I added the option to provide this missing info up front,
> hopefully allowing downstream code to not have to care as much and have a
> special case for fixing up Glimmer2 predictions.
>
> On 2/12/07, Torsten Seemann <torsten.seemann at infotech.monash.edu.au> wrote:
>
>> I think it should be what it says, or perhaps now /^(Glimmer(M|HMM))/.
>> Here's why:
>>
>> I came onto the scene at Glimmer.pm rev 1.4. At that stage it only
>> parse GlimmerM. I noted that GlimmerHMM was the same output format as
>> GlimmerM, except for the first line. So in rev 1.5 I modified the
>> regexp to match both ie. \S* . This would also hopefully match any
>> other Glimmer-clone formats that arose. I also fixed the pdocs to say
>> this, and added tests to t/Genpred.t.
>> % cvs diff -r 1.4 -r 1.5 Bio/Tools/Glimmer.pm
>> % cvs diff -r 1.15 -r 1.16 t/Genpred.t
>>
>> I then planned to extend support to Glimmer2 and Glimmer3. I added the
>> 4 test files (t/Glimmer*.out) but never wrote the code. This is where
>> you have come in Mark :-)
>>
>>> I lifted that bit of code to do format detection...we don't have GlimmerHMM
>>> installed locally, so I'm assuming Torsten's output is correct and the above
>>> is a bug.  Guess I'll go check bugzilla...
>>
>> I'm pretty sure my 4 test files are correct - I spent a lot of time
>> ensuring they were consistent etc, as I was getting very confused with
>> the different "glimmer" versions!
>>
>> Hope this all helps,
>>
>> --Torsten
>>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From thokeller at gmail.com  Tue Feb 13 22:00:06 2007
From: thokeller at gmail.com (Thomas Keller)
Date: Tue, 13 Feb 2007 14:00:06 -0800
Subject: [Bioperl-l] update/install problem
Message-ID: <baa466330702131400q30a65bb4ib5037756fc59a9b7@mail.gmail.com>

Could someone suggest a workaround or fix for this error?

$ sudo fink update bioperl-pm586
Information about 5850 packages read in 2 seconds.
The package 'bioperl-pm586' will be built and installed.
The package 'xml-sax-pm586' will be installed.
The package 'xml-sax-writer-pm586' will be built and installed.
The package 'xml-filter-buffertext-pm586' will be built and installed.
The following package will be installed or updated:
 bioperl-pm586
The following 3 additional packages will be installed:
 xml-filter-buffertext-pm586 xml-sax-pm586 xml-sax-writer-pm586
Do you want to continue? [Y/n] Y
/sw/bin/dpkg-lockwait -i
/sw/fink/dists/unstable/main/binary-darwin-powerpc/libs/perlmods/xml-sax-pm586_0.13-2_darwin-powerpc.deb
(Reading database ... 48029 files and directories currently installed.)
Preparing to replace xml-sax-pm586 0.13-2 (using
.../xml-sax-pm586_0.13-2_darwin-powerpc.deb) ...
Unpacking replacement xml-sax-pm586 ...
Setting up xml-sax-pm586 (0.13-2) ...
update-perl586-sax-parsers: adding Perl SAX parser module info file of
XML::SAX::PurePerl...
Can't locate object method "save_parsers_debian" via package "XML::SAX" at
/sw/sbin/update-perl586-sax-parsers line 96.
/sw/bin/dpkg: error processing xml-sax-pm586 (--install):
 subprocess post-installation script returned error exit status 22
Errors were encountered while processing:
 xml-sax-pm586
### execution of /sw/bin/dpkg-lockwait failed, exit code 1
Failed: can't install package xml-sax-pm586-0.13-2


-- 
Tom Keller
"Ecrasez l'Infame!" -- Voltaire


From sac at bioperl.org  Tue Feb 13 23:00:46 2007
From: sac at bioperl.org (Steve Chervitz)
Date: Tue, 13 Feb 2007 15:00:46 -0800
Subject: [Bioperl-l] Bio::Root::Utilities.pm
Message-ID: <8f200b4c0702131500m2f3790nceace3c3e3dffd4f@mail.gmail.com>

I noticed that Bio::Root::Utilities was purged from bioperl-live for the
1.5.2 release, but I'd like us to consider adding it back. I agree that the
other purged Root modules were ancient relics of the past, but
Bio::Root::Utilities.pm still has signs of life (at least I still find occasion to use
it, or refer to code in it).

I know that it's not currently used by any other modules in Bioperl, but
there are likely some legacy scripts out there that rely on it. Probably
most of those scripts are ones I've written, but there have been substantive
commits by others in the not-too-distant past (Dec 2005), so at least some
folks besides myself are using it and may hesitate to upgrade their bioperl
installation if it's absent.

I'm all for avoiding bloat in the codebase and am eager to see Bioperl be
more lean and mean, but I'd like to keep this module around. I'll agree to
add some tests for it as well as clean some things up (e.g., use
Bio::Root::IO to get temp file name).

Cheers,
Steve
--
Steve Chervitz
sac at bioperl.org


From cjfields at uiuc.edu  Wed Feb 14 01:29:03 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 13 Feb 2007 19:29:03 -0600
Subject: [Bioperl-l] update/install problem
In-Reply-To: <baa466330702131400q30a65bb4ib5037756fc59a9b7@mail.gmail.com>
References: <baa466330702131400q30a65bb4ib5037756fc59a9b7@mail.gmail.com>
Message-ID: <C8B7325D-0554-4E37-9B06-9A8344BF5F7E@uiuc.edu>

On Feb 13, 2007, at 4:00 PM, Thomas Keller wrote:

> Could someone suggest a workaround or fix for this error?
>
> $ sudo fink update bioperl-pm586
> Information about 5850 packages read in 2 seconds.
> The package 'bioperl-pm586' will be built and installed.
> The package 'xml-sax-pm586' will be installed.
> The package 'xml-sax-writer-pm586' will be built and installed.
> The package 'xml-filter-buffertext-pm586' will be built and installed.
> The following package will be installed or updated:
>  bioperl-pm586
> The following 3 additional packages will be installed:
>  xml-filter-buffertext-pm586 xml-sax-pm586 xml-sax-writer-pm586
> Do you want to continue? [Y/n] Y
> /sw/bin/dpkg-lockwait -i
> /sw/fink/dists/unstable/main/binary-darwin-powerpc/libs/perlmods/xml-sax-pm586_0.13-2_darwin-powerpc.deb
> (Reading database ... 48029 files and directories currently installed.)
> Preparing to replace xml-sax-pm586 0.13-2 (using
> .../xml-sax-pm586_0.13-2_darwin-powerpc.deb) ...
> Unpacking replacement xml-sax-pm586 ...
> Setting up xml-sax-pm586 (0.13-2) ...
> update-perl586-sax-parsers: adding Perl SAX parser module info file of
> XML::SAX::PurePerl...
> Can't locate object method "save_parsers_debian" via package "XML::SAX" at
> /sw/sbin/update-perl586-sax-parsers line 96.
> /sw/bin/dpkg: error processing xml-sax-pm586 (--install):
>  subprocess post-installation script returned error exit status 22
> Errors were encountered while processing:
>  xml-sax-pm586
> ### execution of /sw/bin/dpkg-lockwait failed, exit code 1
> Failed: can't install package xml-sax-pm586-0.13-2

The fink installation seems to be failing on XML::SAX, not bioperl.
You could try installing XML::SAX (now at v. 0.15) via CPAN using  
'sudo cpan'; I updated just recently w/o problems.

As an aside, you could similarly install bioperl directly from CPAN  
(which I also haven't had any problems with).  The installation  
allows for installing optional modules.

chris


From cjfields at uiuc.edu  Wed Feb 14 03:41:31 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 13 Feb 2007 21:41:31 -0600
Subject: [Bioperl-l] Bio::Root::Utilities.pm
In-Reply-To: <8f200b4c0702131500m2f3790nceace3c3e3dffd4f@mail.gmail.com>
References: <8f200b4c0702131500m2f3790nceace3c3e3dffd4f@mail.gmail.com>
Message-ID: <1D04BA5E-359E-4AC5-B674-8ECDC4F81E83@uiuc.edu>


On Feb 13, 2007, at 5:00 PM, Steve Chervitz wrote:

> I noticed that Bio::Root::Utilities was purged from bioperl-live for the
> 1.5.2 release, but I'd like us to consider adding it back. I agree that the
> other purged Root modules were ancient relics of the past, but
> Bio::Root::Utilities.pm still has signs of life (at least I still find occasion to use
> it, or refer to code in it).
>
> I know that it's not currently used by any other modules in Bioperl, but
> there are likely some legacy scripts out there that rely on it. Probably
> most of those scripts are ones I've written, but there have been substantive
> commits by others in the not-to-distant past (Dec 2005), so at least some
> folks besides myself are using it and may hesitate to upgrade their bioperl
> installation if it's absent.
>
> I'm all for avoiding bloat in the codebase and am eager to see Bioperl be
> more lean and mean, but I'd like to keep this module around. I'll agree to
> add some tests for it as well as clean some things up (e.g., use
> Bio::Root::IO to get temp file name).
>
> Cheers,
> Steve
> --
> Steve Chervitz
> sac at bioperl.org

I don't have a problem with adding it back, esp. if tests are added.   
Everything in Bio::Root* not tied to a module was yanked out when no  
one spoke up about cleaning up Bio::Root* modules:

http://thread.gmane.org/gmane.comp.lang.perl.bio.general/12792/focus=12839

Maybe others disagree?

chris


From bix at sendu.me.uk  Wed Feb 14 08:00:35 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 14 Feb 2007 08:00:35 +0000
Subject: [Bioperl-l] update/install problem
In-Reply-To: <C8B7325D-0554-4E37-9B06-9A8344BF5F7E@uiuc.edu>
References: <baa466330702131400q30a65bb4ib5037756fc59a9b7@mail.gmail.com>
	<C8B7325D-0554-4E37-9B06-9A8344BF5F7E@uiuc.edu>
Message-ID: <45D2C1A3.9060300@sendu.me.uk>

Chris Fields wrote:
> As an aside, you could similarly install bioperl directly from CPAN  
> (which I also haven't had any problems with).

Indeed. If you follow the unix instructions at 
http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix you should have 
a problem-free complete install under Mac OS X.


From bix at sendu.me.uk  Wed Feb 14 14:08:22 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 14 Feb 2007 14:08:22 +0000
Subject: [Bioperl-l] BLASTXML changes (good this time!)
In-Reply-To: <DF989651-7CC0-45A0-BA5E-EEEE88FDD40E@uiuc.edu>
References: <EB130F09-5670-4F3E-B481-60A700AEA38F@uiuc.edu>
	<45CCF861.8030000@sendu.me.uk>
	<DF989651-7CC0-45A0-BA5E-EEEE88FDD40E@uiuc.edu>
Message-ID: <45D317D6.5070903@sendu.me.uk>

Chris Fields wrote:
> 
> On Feb 9, 2007, at 4:40 PM, Sendu Bala wrote:
> 
>> Chris Fields wrote:
>>> If Sendu is out there, I think we can safely remove any dependencies
>>> beyond XML::SAX 0.15 for the next release.  Should I go ahead and
>>> modify Build.PL?
>>
>> Sure, good to hear.
> 
> I added a version dependency for XML::SAX (v. 0.15) for the PurePerl 
> fix.  That likely obviates the need for a Bundle for XML::Simple.  Not 
> too pressing; we can determine that before the next release.

The bundle is now obsolete. Does anything in Bioperl, or any of its 
dependencies, now make use of the expat library? If not, I can remove 
mention of it from the install documentation.


From bix at sendu.me.uk  Wed Feb 14 14:02:39 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 14 Feb 2007 14:02:39 +0000
Subject: [Bioperl-l] DB.t failures
Message-ID: <45D3167F.2000608@sendu.me.uk>

DB.t is failing with BIOPERLDEBUG set. Apparently, we are no longer 
getting sequences back from NCBI in the order we requested them in batch 
mode.

Is this a change at NCBI? Is there some way we can make sure to return 
the sequences in the expected order? Or shouldn't the order be expected 
(should the test script be altered)?
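If the return order can't be relied on, one option is for the test script to reorder what comes back before comparing. Here is a minimal sketch, assuming each returned record carries the same identifier it was requested by. `in_request_order` and the [id, sequence] record layout are hypothetical, and the IDs below are placeholders.

```perl
use strict;
use warnings;

# Hypothetical helper: return records in the order their IDs were
# requested. Records are [id, sequence] pairs (an assumed layout);
# IDs that did not come back at all are skipped.
sub in_request_order {
    my ($requested, $returned) = @_;
    my %by_id = map { ($_->[0] => $_) } @$returned;
    return grep { defined } map { $by_id{$_} } @$requested;
}

my @requested = qw(ACC1 ACC2 ACC3);
my @returned  = (                  # placeholder data in "server" order
    [ 'ACC2', 'seq-two'   ],
    [ 'ACC3', 'seq-three' ],
    [ 'ACC1', 'seq-one'   ],
);

my @ordered = in_request_order(\@requested, \@returned);
print join(' ', map { $_->[0] } @ordered), "\n";
```

With Bio::Seq objects, the hash key would be something like `$seq->accession_number` or `$seq->display_id`. Which one matches the requested IDs depends on what DB.t actually asks for.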


From cjfields at uiuc.edu  Wed Feb 14 14:37:07 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 14 Feb 2007 08:37:07 -0600
Subject: [Bioperl-l] DB.t failures
In-Reply-To: <45D3167F.2000608@sendu.me.uk>
References: <45D3167F.2000608@sendu.me.uk>
Message-ID: <49A5C7D3-8D63-452C-B0EA-6F7144F85E35@uiuc.edu>

Confirmed on this end.

It's possible that the default sort order from eutils is different  
now though I haven't seen anything on the eutils mail list.  There  
may be a way to set the sort order via the base URL; I'll check into  
it later today; I'm still digging myself out from the midwest blizzard.

chris

On Feb 14, 2007, at 8:02 AM, Sendu Bala wrote:

> DB.t is failing with BIOPERLDEBUG set. Apparently, we are no longer
> getting sequences back from NCBI in the order we requested them in batch
> mode.
>
> Is this a change at NCBI? Is there some way we can make sure to return
> the sequences in the expected order? Or shouldn't the order be expected
> (should the test script be altered)?

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Wed Feb 14 14:42:05 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 14 Feb 2007 08:42:05 -0600
Subject: [Bioperl-l] BLASTXML changes (good this time!)
In-Reply-To: <45D317D6.5070903@sendu.me.uk>
References: <EB130F09-5670-4F3E-B481-60A700AEA38F@uiuc.edu>
	<45CCF861.8030000@sendu.me.uk>
	<DF989651-7CC0-45A0-BA5E-EEEE88FDD40E@uiuc.edu>
	<45D317D6.5070903@sendu.me.uk>
Message-ID: <E9611B3C-658E-4CBC-A2ED-1990F929A130@uiuc.edu>


On Feb 14, 2007, at 8:08 AM, Sendu Bala wrote:

> Chris Fields wrote:
>>
>> On Feb 9, 2007, at 4:40 PM, Sendu Bala wrote:
>>
>>> Chris Fields wrote:
>>>> If Sendu is out there, I think we can safely remove any dependencies
>>>> beyond XML::SAX 0.15 for the next release.  Should I go ahead and
>>>> modify Build.PL?
>>>
>>> Sure, good to hear.
>>
>> I added a version dependency for XML::SAX (v. 0.15) for the PurePerl
>> fix.  That likely obviates the need for a Bundle for XML::Simple.  Not
>> too pressing; we can determine that before the next release.
>
> The bundle is now obsolete. Does anything in Bioperl, or any of its
> dependencies, now make use of the expat library? If not, I can remove
> mention of it from the install documentation.

I'll try getting something up about XML::SAX on the wiki today.   
XML::Parser, though, still requires expat AFAIK:

http://www.bioperl.org/wiki/BioPerl_Dependencies

chris


From kellert at ohsu.edu  Tue Feb 13 22:43:24 2007
From: kellert at ohsu.edu (Thomas J Keller)
Date: Tue, 13 Feb 2007 14:43:24 -0800
Subject: [Bioperl-l] HowTo:SearchIO
Message-ID: <CF656232-DEF2-47C4-9855-DAF748AEE862@ohsu.edu>

Greetings,
I've been away from programming and informatics for many months.  
Hoping to get back into it, I thought it would be good to review the  
tutorials.
I tried the code in the tutorial on the sample blast report in the  
tutorial and it worked fine. So I ran a blastx search and saved the  
results and tried to parse them. It printed the "... parsing" message,
but reported no other results.

Any suggestions?

Thanks,
Tom

Tom Keller, Ph.D.
kellert at ohsu.edu
503-494-2442
6339b Basic Science Bldg
http://www.ohsu.edu/research/core


From mrouard at gmail.com  Wed Feb 14 11:23:47 2007
From: mrouard at gmail.com (Mathieu Rouard)
Date: Wed, 14 Feb 2007 12:23:47 +0100
Subject: [Bioperl-l] get the sequence of a column in a multiple alignment
Message-ID: <ab1628190702140323m2d6f1e00h9cc85ffaf84fcafd@mail.gmail.com>

Dear all,

I am starting to use the bioperl API to parse multiple alignments and I am
wondering what is the most efficient way to extract all the columns from an
alignment (all the AAs at position 1, position 2, etc.). I quickly
implemented this simple code, but it becomes quite slow as the length of the
sequences increases.

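use strict;
use warnings;
use Bio::AlignIO;

my $inputfilename = shift;    # alignment file given on the command line
my $stream = Bio::AlignIO->new(-file   => $inputfilename,
                               -format => 'stockholm');

my $aln    = $stream->next_aln();
my $length = $aln->length();
my @column;    # $column[$i - 1] holds the residues of column $i

for (my $i = 1; $i <= $length; $i++) {
    my $aa = '';
    foreach my $seq ($aln->each_seq()) {
        # trunc($i, $i) builds a new one-residue sequence object
        my $obj = $seq->trunc($i, $i);
        $aa .= $obj->seq;
    }
    # the array index tracks the column number
    push @column, $aa;
}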

Would you have any other suggestion?

thanks
Mathieu
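Calling trunc() once per residue builds a new sequence object for every cell of the alignment, which is probably where most of the time goes. Here is a sketch of an alternative, assuming the aligned strings (gaps included) are available. With Bio::SimpleAlign, `map { $_->seq } $aln->each_seq` should provide them. The sketch splits each row once and builds all columns in a single pass. `columns_of` is a hypothetical helper, not a BioPerl method.

```perl
use strict;
use warnings;

# Hypothetical helper: build all alignment columns in one pass.
# Each aligned string is split into characters once, and character $i
# of every row is appended to column $i.
sub columns_of {
    my @rows = @_;
    my @column;    # $column[$i] holds the residues at alignment position $i + 1
    for my $row (@rows) {
        my @chars = split //, $row;
        $column[$_] .= $chars[$_] for 0 .. $#chars;
    }
    return @column;
}

my @cols = columns_of('AC-GT', 'ACTGT', 'A--GA');
printf "column %d: %s\n", $_ + 1, $cols[$_] for 0 .. $#cols;
```

Used with an alignment object, this would look something like `my @column = columns_of(map { $_->seq } $aln->each_seq);`. That call is untested against a real Stockholm file.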


From avilella at gmail.com  Wed Feb 14 15:29:02 2007
From: avilella at gmail.com (Albert Vilella)
Date: Wed, 14 Feb 2007 15:29:02 +0000
Subject: [Bioperl-l] get the sequence of a column in a multiple alignment
In-Reply-To: <ab1628190702140323m2d6f1e00h9cc85ffaf84fcafd@mail.gmail.com>
References: <ab1628190702140323m2d6f1e00h9cc85ffaf84fcafd@mail.gmail.com>
Message-ID: <358f4d650702140729u4dae2847qc8eeeb45b20faca4@mail.gmail.com>

there is a slice method:

  $mini_aln = $aln->slice(20,30);  # get a block of columns

 Title     : slice
 Usage     : $aln2 = $aln->slice(20,30)
 Function  : Creates a slice from the alignment inclusive of start and
             end columns, and the first column in the alignment is denoted 1.
             Sequences with no residues in the slice are excluded from the
             new alignment and a warning is printed. Slice beyond the length of
             the sequence does not do padding.
 Returns   : A Bio::SimpleAlign object
 Args      : Positive integer for start column, positive integer for end column,
             optional boolean which if true will keep gap-only columns in the
             newly created slice. Example:

             $aln2 = $aln->slice(20,30,1)

but I don't know how well it behaves for lots of sequences :)


On 2/14/07, Mathieu Rouard <mrouard at gmail.com> wrote:
> Dear all,
>
> I am starting to use the bioperl API to parse multiple alignments and I am
> wondering what is the most effective way to splice all the columns from an
> alignment (all the AA at the postion 1, position 2 etc.). I quickly
> implemented this simple code but it becomes quite slow when the length of
> sequences increases.
>
> my $stream  = Bio::AlignIO->new(-file => $inputfilename,
>                         '-format' => 'stockholm');
>
> my $aln = $stream->next_aln();
>
> my $length = $aln->length();
> my %column;
>
> for (my $i=1;$i<=$length;$i++) {
>        my $aa;
>         foreach my $seq ($aln->each_seq()) {
>           my $obj = $seq->trunc($i,$i);
>           $aa .=$obj->seq;
>         }
>      # need to track the column number and the sequence of the column
>      push $column,  $aa;
> }
>
> Would you have any other suggestion?
>
> thanks
> Mathieu
>


From jason at bioperl.org  Wed Feb 14 16:59:49 2007
From: jason at bioperl.org (Jason Stajich)
Date: Wed, 14 Feb 2007 08:59:49 -0800
Subject: [Bioperl-l] HowTo:SearchIO
In-Reply-To: <CF656232-DEF2-47C4-9855-DAF748AEE862@ohsu.edu>
References: <CF656232-DEF2-47C4-9855-DAF748AEE862@ohsu.edu>
Message-ID: <FAAF2A93-97CA-4E84-9178-0561C9BC7B5B@bioperl.org>

As always, reporting the version of BLAST and Bioperl you have
installed will help someone diagnose whether this is a fixed problem or
not.  If you trawl through the list archives you'll see Chris and others
have been playing cat and mouse with the text output from
NCBI BLAST, which appears to be an ever-evolving beast.

So the best advice right now is to get the latest bioperl from CVS
to ensure you have all the patches that might parse this version.  If
it still fails, then the standard response will be to submit the
report as an attachment to a new bug report on the bugzilla.

thanks,
-jason


On Feb 13, 2007, at 2:43 PM, Thomas J Keller wrote:

> Greetings,
> I've been away from programming and informatics for many months.
> Hoping to get back into it, I thought it would be good to review the
> tutorials.
> I tried the code in the tutorial on the sample blast report in the
> tutorial and it worked fine. So I ran a blastx search and saved the
> results and tried to parse them: It gave the "... parsing" message,
> but no other results get reported.
>
> Any suggestions?
>
> Thanks,
> Tom
>
> Tom Keller, Ph.D.
> kellert at ohsu.edu
> 503-494-2442
> 6339b Basic Science Bldg
> http://www.ohsu.edu/research/core
>
>
>

--
Jason Stajich
Miller Research Fellow
University of California, Berkeley
lab: 510.642.8441
http://pmb.berkeley.edu/~taylor/people/js.html
http://fungalgenomes.org/


From dmessina at wustl.edu  Wed Feb 14 16:58:45 2007
From: dmessina at wustl.edu (David Messina)
Date: Wed, 14 Feb 2007 10:58:45 -0600
Subject: [Bioperl-l] HowTo:SearchIO
In-Reply-To: <CF656232-DEF2-47C4-9855-DAF748AEE862@ohsu.edu>
References: <CF656232-DEF2-47C4-9855-DAF748AEE862@ohsu.edu>
Message-ID: <6E3CAB6B-9F9E-46FD-9021-50D7FE011860@wustl.edu>

Hi Tom,

Could you tell us what version of BioPerl you are using, and what  
specific example is failing for  you? And could you post your code?

That would make it easier to diagnose the problem.

Thanks,
Dave

-- 
Dave Messina
Senior Programmer/Analyst, Assembly Group
WashU Genome Sequencing Center
dmessina a t  wustl.edu
314-286-1415


From cjfields at uiuc.edu  Wed Feb 14 17:28:24 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 14 Feb 2007 11:28:24 -0600
Subject: [Bioperl-l] HowTo:SearchIO
In-Reply-To: <FAAF2A93-97CA-4E84-9178-0561C9BC7B5B@bioperl.org>
References: <CF656232-DEF2-47C4-9855-DAF748AEE862@ohsu.edu>
	<FAAF2A93-97CA-4E84-9178-0561C9BC7B5B@bioperl.org>
Message-ID: <3A7FFE76-808B-4249-B588-8709567B870F@uiuc.edu>

I would also strongly encourage switching to using XML-based parsing,  
which is much more stable now.  Here's the link to the NCBI response  
re: BLAST report parsing:

http://bioperl.org/wiki/NCBI_Blast_email

chris (taking a break from shoveling snow...)

On Feb 14, 2007, at 10:59 AM, Jason Stajich wrote:

> As always, reporting the version of BLAST and Bioperl you have
> installed will help someone diagnose whether this is a fixed problem or
> not.  If you trawl through the list archives you'll see that Chris and others
> have been playing cat and mouse with the text output from
> NCBI BLAST, which appears to be an ever-evolving beast.
>
> So the best advice right now is to get the latest bioperl from CVS
> to ensure you have all the patches that might be needed to parse this version.  If
> it still fails, then the standard response will be to submit the
> report as an attachment to a new bug report on the bugzilla.
>
> thanks,
> -jason
>
>
> On Feb 13, 2007, at 2:43 PM, Thomas J Keller wrote:
>
>> Greetings,
>> I've been away from programming and informatics for many months.
>> Hoping to get back into it, I thought it would be good to review the
>> tutorials.
>> I tried the code in the tutorial on the sample blast report in the
>> tutorial and it worked fine. So I ran a blastx search and saved the
>> results and tried to parse them: It gave the "... parsing" message,
>> but no other results get reported.
>>
>> Any suggestions?
>>
>> Thanks,
>> Tom
>>
>> Tom Keller, Ph.D.
>> kellert at ohsu.edu
>> 503-494-2442
>> 6339b Basic Science Bldg
>> http://www.ohsu.edu/research/core
>
> --
> Jason Stajich
> Miller Research Fellow
> University of California, Berkeley
> lab: 510.642.8441
> http://pmb.berkeley.edu/~taylor/people/js.html
> http://fungalgenomes.org/

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From sac at bioperl.org  Wed Feb 14 18:20:17 2007
From: sac at bioperl.org (Steve Chervitz)
Date: Wed, 14 Feb 2007 10:20:17 -0800
Subject: [Bioperl-l] Bio::Root::Utilities.pm
In-Reply-To: <1D04BA5E-359E-4AC5-B674-8ECDC4F81E83@uiuc.edu>
References: <8f200b4c0702131500m2f3790nceace3c3e3dffd4f@mail.gmail.com>
	<1D04BA5E-359E-4AC5-B674-8ECDC4F81E83@uiuc.edu>
Message-ID: <8f200b4c0702141020x5815018bvd69873222582d9d5@mail.gmail.com>

On 2/13/07, Chris Fields <cjfields at uiuc.edu> wrote:
>
>
> On Feb 13, 2007, at 5:00 PM, Steve Chervitz wrote:
>
> > I noticed that Bio::Root::Utilities was purged from bioperl-live for the
> > 1.5.2 release, but I'd like us to consider adding it back. I agree that the
> > other purged Root modules were ancient relics of the past, but
> > Bio::Root::Utilities.pm still has signs of life (at least I still find
> > occasion to use it, or refer to code in it).
> >
> > I know that it's not currently used by any other modules in Bioperl, but
> > there are likely some legacy scripts out there that rely on it. Probably
> > most of those scripts are ones I've written, but there have been substantive
> > commits by others in the not-too-distant past (Dec 2005), so at least some
> > folks besides myself are using it and may hesitate to upgrade their bioperl
> > installation if it's absent.
> >
> > I'm all for avoiding bloat in the codebase and am eager to see Bioperl be
> > more lean and mean, but I'd like to keep this module around. I'll agree to
> > add some tests for it as well as clean some things up (e.g., use
> > Bio::Root::IO to get a temp file name).
> >
> > Cheers,
> > Steve
> > --
> > Steve Chervitz
> > sac at bioperl.org
>
> I don't have a problem with adding it back, esp. if tests are added.
> Everything in Bio::Root* not tied to a module was yanked out when no
> one spoke up about cleaning up Bio::Root* modules:
>
> http://thread.gmane.org/gmane.comp.lang.perl.bio.general/12792/focus=12839
>
> Maybe others disagree?
>
> chris
>

Sorry I missed out on that thread. I had some trouble with my bioperl-l
email delivery getting disabled due to excessive bounces, and it took me a
while to catch it.

Bio::Root::Utilities is quite a grab bag of miscellaneous general functions
that are occasionally useful for perl scripting (e.g., determining
end-of-line characters, sending email, etc.). The code could definitely use
a review, and maybe an example script to advertise it. I can look into this,
and suggestions are welcome.

Steve


From dmessina at wustl.edu  Wed Feb 14 18:55:18 2007
From: dmessina at wustl.edu (David Messina)
Date: Wed, 14 Feb 2007 12:55:18 -0600
Subject: [Bioperl-l] HowTo:SearchIO
In-Reply-To: <3A7FFE76-808B-4249-B588-8709567B870F@uiuc.edu>
References: <CF656232-DEF2-47C4-9855-DAF748AEE862@ohsu.edu>
	<FAAF2A93-97CA-4E84-9178-0561C9BC7B5B@bioperl.org>
	<3A7FFE76-808B-4249-B588-8709567B870F@uiuc.edu>
Message-ID: <49A80E44-EB7F-4E2A-86E8-2379065F0FC3@wustl.edu>


On Feb 14, 2007, at 11:28 AM, Chris Fields wrote:

> I would also strongly encourage switching to using XML-based parsing,

Unless anyone objects, I would be happy to update the HOWTO to  
suggest people make the switch and give an example of XML parsing.

The Bio::SearchIO synopsis is already an XML example. However,
there's no warning about text-based parsing, nor a suggestion to use
XML, that I can see -- perhaps one should be added?

Dave


From cjfields at uiuc.edu  Wed Feb 14 20:12:21 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 14 Feb 2007 14:12:21 -0600
Subject: [Bioperl-l] HowTo:SearchIO
In-Reply-To: <49A80E44-EB7F-4E2A-86E8-2379065F0FC3@wustl.edu>
References: <CF656232-DEF2-47C4-9855-DAF748AEE862@ohsu.edu>
	<FAAF2A93-97CA-4E84-9178-0561C9BC7B5B@bioperl.org>
	<3A7FFE76-808B-4249-B588-8709567B870F@uiuc.edu>
	<49A80E44-EB7F-4E2A-86E8-2379065F0FC3@wustl.edu>
Message-ID: <C9FDCBA5-5BDE-41FE-9281-F5C73B0A700B@uiuc.edu>


On Feb 14, 2007, at 12:55 PM, David Messina wrote:

>
> On Feb 14, 2007, at 11:28 AM, Chris Fields wrote:
>
>> I would also strongly encourage switching to using XML-based parsing,
>
> Unless anyone objects, I would be happy to update the HOWTO to
> suggest people make the switch and give an example of XML parsing.
>
> The Bio::SearchIO synopsis is already an XML example. However,
> there's no warning about text-based parsing nor a suggestion to use
> XML that I can see -- perhaps should be added?
>
> Dave

We should probably add something specifically for BLAST, yes.  Other  
text parsers should be fine.

Personally, I use XML or tabular output parsing simply b/c they are  
faster and do what I need.  I think we'll need to retain the  
capability for text-based BLAST parsing, but it will become extremely  
bloated long-term if we plan on continuing support for parsing all  
versions and flavors of BLAST, particularly if NCBI continues to  
change the output.
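For comparison, tabular output is simple enough to read with plain Perl. A minimal sketch, assuming the 12-column layout written by `blastall -m 8` (or `-m 9`, which adds `#` comment lines); `parse_blast_table` is a hypothetical name and the sample row is made up. Within BioPerl itself, Bio::SearchIO with `-format => 'blasttable'` is the route to use if your installed version provides it.

```perl
use strict;
use warnings;

# Column names for NCBI BLAST tabular output (blastall -m 8, or -m 9
# which adds '#' comment lines); the order follows that format.
my @COLS = qw(query subject pct_identity aln_length mismatches gap_opens
              q_start q_end s_start s_end evalue bit_score);

# Read tabular lines from a filehandle into a list of hashrefs,
# skipping comment and blank lines and rows with the wrong column count.
sub parse_blast_table {
    my ($fh) = @_;
    my @hits;
    while (my $line = <$fh>) {
        next if $line =~ /^\s*(?:#|$)/;
        chomp $line;
        my @fields = split /\t/, $line;
        next unless @fields == @COLS;
        my %row;
        @row{@COLS} = @fields;
        push @hits, \%row;
    }
    return @hits;
}

# Made-up example input: one comment line and one hit row.
my $data = "# comment line, as written by -m 9\n"
         . "q1\tsp|P12345\t87.50\t40\t5\t0\t1\t120\t10\t49\t1e-15\t80.1\n";
open my $fh, '<', \$data or die "cannot open in-memory data: $!";
for my $hit (parse_blast_table($fh)) {
    printf "%s -> %s (E = %s)\n", @{$hit}{qw(query subject evalue)};
}
close $fh;
```

Because every row has a fixed column count, a format change tends to show up as skipped rows rather than as silently misparsed alignments, which is part of why tabular output is less fragile than the text report.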

chris


From dmessina at wustl.edu  Wed Feb 14 20:46:31 2007
From: dmessina at wustl.edu (David Messina)
Date: Wed, 14 Feb 2007 14:46:31 -0600
Subject: [Bioperl-l] HowTo:SearchIO
In-Reply-To: <C9FDCBA5-5BDE-41FE-9281-F5C73B0A700B@uiuc.edu>
References: <CF656232-DEF2-47C4-9855-DAF748AEE862@ohsu.edu>
	<FAAF2A93-97CA-4E84-9178-0561C9BC7B5B@bioperl.org>
	<3A7FFE76-808B-4249-B588-8709567B870F@uiuc.edu>
	<49A80E44-EB7F-4E2A-86E8-2379065F0FC3@wustl.edu>
	<C9FDCBA5-5BDE-41FE-9281-F5C73B0A700B@uiuc.edu>
Message-ID: <136DA052-B9FD-4547-B262-EC6E38B47392@wustl.edu>

On Feb 14, 2007, at 2:12 PM, Chris Fields wrote:

> We should probably add something specifically for BLAST, yes.   
> Other text parsers should be fine.

Good point -- I'll make it clear it's only pertinent to BLAST.


> I think we'll need to retain the capability for text-based BLAST  
> parsing,

Agreed. Through the 1.6 release at least, I would think.


> particularly if NCBI continues to change the output.

Well, clearly the solution is not to use the NCBI flavor of BLAST. :)


Dave
(look at my email address)


From jay at jays.net  Thu Feb 15 13:08:56 2007
From: jay at jays.net (Jay Hannah)
Date: Thu, 15 Feb 2007 07:08:56 -0600
Subject: [Bioperl-l] DB.t failures
In-Reply-To: <45D3167F.2000608@sendu.me.uk>
References: <45D3167F.2000608@sendu.me.uk>
Message-ID: <AEABC2CF-AABF-4BC9-9E66-17F2E3A51B37@jays.net>

On Feb 14, 2007, at 8:02 AM, Sendu Bala wrote:
> DB.t is failing with BIOPERLDEBUG set. Apparently, we are no longer
> getting sequences back from NCBI in the order we requested them in
> batch mode.

Is this the same result you get?


DIED. FAILED tests 59-60, 63-64, 67-68, 71-72
         Failed 8/113 tests, 92.92% okay (less 8 skipped tests: 97 okay, 85.84%)
Failed Test Stat Wstat Total Fail  Failed  List of Failed
-------------------------------------------------------------------------------
DB.t           8  2048   113    8   7.08%  59-60 63-64 67-68 71-72
8 subtests skipped.


Thanks,

j
seqlab.net
http://www.bioperl.org/wiki/User:Jhannah


From bix at sendu.me.uk  Thu Feb 15 13:37:32 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 15 Feb 2007 13:37:32 +0000
Subject: [Bioperl-l] DB.t failures
In-Reply-To: <AEABC2CF-AABF-4BC9-9E66-17F2E3A51B37@jays.net>
References: <45D3167F.2000608@sendu.me.uk>
	<AEABC2CF-AABF-4BC9-9E66-17F2E3A51B37@jays.net>
Message-ID: <45D4621C.6040309@sendu.me.uk>

Jay Hannah wrote:
> On Feb 14, 2007, at 8:02 AM, Sendu Bala wrote:
>> DB.t is failing with BIOPERLDEBUG set. Apparently, we are no longer
>> getting sequences back from NCBI in the order we requested them in
>> batch mode.
> 
> Is this the same result you get?
> 
> 
> DIED. FAILED tests 59-60, 63-64, 67-68, 71-72
>          Failed 8/113 tests, 92.92% okay (less 8 skipped tests: 97 okay, 85.84%)
> Failed Test Stat Wstat Total Fail  Failed  List of Failed
> -------------------------------------------------------------------------------
> DB.t           8  2048   113    8   7.08%  59-60 63-64 67-68 71-72
> 8 subtests skipped.

Yes, those fails are all caused by results in the wrong order (I believe).


From cjfields at uiuc.edu  Thu Feb 15 14:22:09 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 15 Feb 2007 08:22:09 -0600
Subject: [Bioperl-l] DB.t failures
In-Reply-To: <45D4621C.6040309@sendu.me.uk>
References: <45D3167F.2000608@sendu.me.uk>
	<AEABC2CF-AABF-4BC9-9E66-17F2E3A51B37@jays.net>
	<45D4621C.6040309@sendu.me.uk>
Message-ID: <CF92D281-CAC2-415C-91A9-CBA0893336B9@uiuc.edu>


On Feb 15, 2007, at 7:37 AM, Sendu Bala wrote:

> Jay Hannah wrote:
>> On Feb 14, 2007, at 8:02 AM, Sendu Bala wrote:
>>> DB.t is failing with BIOPERLDEBUG set. Apparently, we are no longer
>>> getting sequences back from NCBI in the order we requested them in
>>> batch mode.
>>
>> Is this the same result you get?
>>
>>
>> DIED. FAILED tests 59-60, 63-64, 67-68, 71-72
>>          Failed 8/113 tests, 92.92% okay (less 8 skipped tests: 97 okay, 85.84%)
>> Failed Test Stat Wstat Total Fail  Failed  List of Failed
>> -------------------------------------------------------------------------------
>> DB.t           8  2048   113    8   7.08%  59-60 63-64 67-68 71-72
>> 8 subtests skipped.
>
> Yes, those fails are all caused by results in the wrong order (I  
> believe).

I'm fixing those now so it doesn't depend on order and will commit in  
the next few minutes.
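A minimal sketch of an order-independent check in the spirit of that fix. It is not the actual DB.t change; `same_ids_any_order` is a hypothetical name and the IDs are placeholders. In a test script it could be wrapped as `ok(same_ids_any_order(\@requested, \@got), 'all sequences returned')` using Test::More.

```perl
use strict;
use warnings;

# Order-independent check: true if both lists hold exactly the same
# IDs (including duplicates), whatever order they arrived in.
sub same_ids_any_order {
    my ($requested, $returned) = @_;
    return 0 unless @$requested == @$returned;
    my @want = sort @$requested;
    my @got  = sort @$returned;
    for my $i (0 .. $#want) {
        return 0 unless $want[$i] eq $got[$i];
    }
    return 1;
}

# Placeholder IDs, returned in a different order than requested.
my $ok = same_ids_any_order([qw(ID1 ID2 ID3)], [qw(ID3 ID1 ID2)]);
print "same set: $ok\n";   # prints same set: 1
```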

chris


From bix at sendu.me.uk  Thu Feb 15 14:37:00 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 15 Feb 2007 14:37:00 +0000
Subject: [Bioperl-l] DB.t failures
In-Reply-To: <586AAA5B-DFA6-4746-BA70-F3C16CA97EE3@uiuc.edu>
References: <45D3167F.2000608@sendu.me.uk>
	<AEABC2CF-AABF-4BC9-9E66-17F2E3A51B37@jays.net>
	<45D4621C.6040309@sendu.me.uk>
	<586AAA5B-DFA6-4746-BA70-F3C16CA97EE3@uiuc.edu>
Message-ID: <45D4700C.8020305@sendu.me.uk>

Chris Fields wrote:
> 
> On Feb 15, 2007, at 7:37 AM, Sendu Bala wrote:
> 
>> Jay Hannah wrote:
>>> On Feb 14, 2007, at 8:02 AM, Sendu Bala wrote:
>>>> DB.t is failing with BIOPERLDEBUG set. Apparently, we are no longer
>>>> getting sequences back from NCBI in the order we requested them in
>>>> batch mode.
>
> Okay, I committed a fix for that.  I hope there are many users who 
> depend on the returned sequence order for anything!

s/are/aren't/ ?

I suspect there might be, and it's certainly a reasonable assumption to
make. Did you not see an easy way of maintaining the order?


From cjfields at uiuc.edu  Thu Feb 15 14:28:46 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 15 Feb 2007 08:28:46 -0600
Subject: [Bioperl-l] DB.t failures
In-Reply-To: <45D4621C.6040309@sendu.me.uk>
References: <45D3167F.2000608@sendu.me.uk>
	<AEABC2CF-AABF-4BC9-9E66-17F2E3A51B37@jays.net>
	<45D4621C.6040309@sendu.me.uk>
Message-ID: <586AAA5B-DFA6-4746-BA70-F3C16CA97EE3@uiuc.edu>


On Feb 15, 2007, at 7:37 AM, Sendu Bala wrote:

> Jay Hannah wrote:
>> On Feb 14, 2007, at 8:02 AM, Sendu Bala wrote:
>>> DB.t is failing with BIOPERLDEBUG set. Apparently, we are no longer
>>> getting sequences back from NCBI in the order we requested them in
>>> batch mode.
>>
>> Is this the same result you get?
>>
>>
>> DIED. FAILED tests 59-60, 63-64, 67-68, 71-72
>>          Failed 8/113 tests, 92.92% okay (less 8 skipped tests: 97 okay, 85.84%)
>> Failed Test Stat Wstat Total Fail  Failed  List of Failed
>> -------------------------------------------------------------------------------
>> DB.t           8  2048   113    8   7.08%  59-60 63-64 67-68 71-72
>> 8 subtests skipped.
>
> Yes, those fails are all caused by results in the wrong order (I  
> believe).

Okay, I committed a fix for that.  I hope there are many users who  
depend on the returned sequence order for anything!

chris


From michael.watson at bbsrc.ac.uk  Thu Feb 15 14:44:27 2007
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Thu, 15 Feb 2007 14:44:27 -0000
Subject: [Bioperl-l] The axis of GC content in Bio::Graphics::glyph:dna
Message-ID: <8975119BCD0AC5419D61A9CF1A923E9503E2EB70@iahce2ksrv1.iah.bbsrc.ac.uk>

Hi

OK, I have some great images out of this glyph, but I can't see the axis,
nor is it labelled (i.e. does it go from 0-100%?), so it isn't great for
publication.  The docs say:

"NOTE: -gc_window=>'auto' gives nice results and is recommended for
drawing GC content. The GC content axes draw slightly outside the
panel, so you may wish to add some extra padding on the right and
left. "

Any idea how to do this?

Basically, I want a nice GC graph with the axis quite clearly labelled,
and a nice "%GC" title next to it :)
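As a point of reference for what a 0-100% GC axis would represent, here is a minimal pure-Perl sketch of per-window GC percentage. It is not Bio::Graphics code: the glyph's own 'auto' window choice and any smoothing may differ, and the fixed, non-overlapping window used here is an assumption. `gc_percent_windows` is a hypothetical name.

```perl
use strict;
use warnings;

# Percent GC in consecutive, non-overlapping windows of $win bases.
# Every character (including N) counts toward a window's length here;
# the final window may be shorter than $win.
sub gc_percent_windows {
    my ($seq, $win) = @_;
    die "window size must be positive\n" unless $win > 0;
    my @pct;
    for (my $i = 0; $i < length($seq); $i += $win) {
        my $chunk = substr($seq, $i, $win);
        my $gc    = ($chunk =~ tr/GCgc//);   # count G/C without modifying
        push @pct, 100 * $gc / length($chunk);
    }
    return @pct;
}

print join(' ', gc_percent_windows('GGCCAATT', 4)), "\n";   # prints 100 0
```

Computing the values independently like this can help confirm the range an axis should be labelled with before adding a "%GC" title to the figure.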

Thanks

Mick

The information contained in this message may be confidential or legally
privileged and is intended solely for the addressee. If you have
received this message in error please delete it & notify the originator
immediately.
Unauthorised use, disclosure, copying or alteration of this message is
forbidden & may be unlawful. 
The contents of this e-mail are the views of the sender and do not
necessarily represent the views of the Institute. 
This email and associated attachments has been checked locally for
viruses but we can accept no responsibility once it has left our
systems.
Communications on Institute computers are monitored to secure the
effective operation of the systems and for other lawful purposes. 


From nehadnahar at yahoo.co.in  Thu Feb 15 15:28:42 2007
From: nehadnahar at yahoo.co.in (Neha Nahar)
Date: Thu, 15 Feb 2007 15:28:42 +0000 (GMT)
Subject: [Bioperl-l] Convert newick to nexus format
In-Reply-To: <84703383-600F-42F4-A860-DD0D1C43EE83@bioperl.org>
Message-ID: <777943.33252.qm@web8404.mail.in.yahoo.com>

Thank you Jason. I ran the tests and they failed, so I re-installed the bioperl module and now it works fine.

Regards,
Neha.

Jason Stajich <jason at bioperl.org> wrote:
Something is wrong with your install, I am guessing - can you run the
tests?
Go to the bioperl directory:
$ perl t/TreeIO.t

Can you describe how you installed bioperl?

On Feb 5, 2007, at 11:58 AM, Neha Nahar wrote:

>
> Hi,
> Thank you for the code.
> I tried it but I still get the same exception.
>
> ------------- EXCEPTION  -------------
> MSG: Cannot call method write_tree on Bio::TreeIO object must use a subclass
> STACK Bio::TreeIO::nexus::write_tree /usr/lib/perl5/vendor_perl/5.8.8/Bio/TreeIO/nexus.pm:170
> STACK toplevel ./nexus1.pl:18
>
> Please find the Perl file (nexus.pl) attached.
>
> Using the bioperl-1.5.2_101.tar.gz module from http://search.cpan.org/~sendu/bioperl/Bio/TreeIO.pm
>
> Please let me know if I am using the correct version. If not, please
> point me to the latest one.
>
> Thank you.
> Regards,
> nnahar
>
> Jason Stajich wrote: please cc the mailing list when asking a question
> or following up.
>
> Sorry I don't know what you are doing wrong - you didn't resend  
> your code so I don't know if you still have a typo.
>
> This code works fine for me
>
> use Bio::TreeIO;
> use strict;
> my ($filein,$fileout) = @ARGV;
> my ($format,$oformat) = qw(newick nexus);
> my $in  = Bio::TreeIO->new(-file => $filein, -format => $format);
> my $out = Bio::TreeIO->new(-format => $oformat, -file => ">$fileout");
>
> while( my $t = $in->next_tree ) {
>  $out->write_tree($t);
> }
>
> On Feb 5, 2007, at 11:24 AM, Neha Nahar wrote:
>
> Thank you very much for the reply.
>
> I fixed the code as per your suggestion, but now I am getting a
> different error:
>
> $ ./nexus.pl mrp-input.txt nexus.out
> newickfile=mrp-input.txt, nexusfile=nexus.out
>
> -------------  EXCEPTION  -------------
> MSG: Cannot call method write_tree on Bio::TreeIO object must use a subclass
> STACK Bio::TreeIO::nexus::write_tree /usr/lib/perl5/vendor_perl/5.8.8/Bio/TreeIO/nexus.pm:170
> STACK toplevel ./nexus.pl:23
> --------------------------------------
>
> Please help me out with this script.
>
> Thank you.
> Regards,
> Neha
>
> Jason Stajich wrote: you want to write the TREE
> out, not the TREE WRITER.
>
> $treeout->write_tree($tree);
>
> not
>
> $treeout->write_tree($treeout);
>
> On Feb 5, 2007, at 9:59 AM, Neha Nahar wrote:
>
>
> Hello everyone,
>
> I am trying to convert a newick tree to nexus format,
> using this script (referred to in an email from George dated Wed Sep
> 22 11:52:47 EDT 2004):
>
> /*------------------------------------------------------------*/
>
> $ cat nexus.pl
> #!/usr/bin/perl -w
>
> use Bio::TreeIO;
>
> ($NEWICKFILE, $NEXUSFILE) = @ARGV;
> print "newickfile=$NEWICKFILE,  nexusfile=$NEXUSFILE\n";
> my $treeio = new Bio::TreeIO(-format => 'newick', -file   => "$NEWICKFILE");
> my $treeout = new Bio::TreeIO(-format => 'nexus', -file   => ">$NEXUSFILE");
> while(my $tree = $treeio->next_tree) {
>         $treeout->write_tree($treeout);
>     }
>
> exit 0;
>
> /*------------------------------------------------------------*/
>
> Running the script from the command line
> gives the following error:
>
> $ ./nexus.pl mrp-input.txt nexus.out
> newickfile=mrp-input.txt, nexusfile=nexus.out
>
> ------------- EXCEPTION  -------------
> MSG: Cannot call method write_tree on Bio::TreeIO object must use a subclass
> STACK  Bio::TreeIO::nexus::write_tree /usr/lib/perl5/vendor_perl/5.8.8/Bio/TreeIO/nexus.pm:170
> STACK toplevel ./nexus.pl:23
> --------------------------------------
>
> Using the bioperl-1.5.2_101.tar.gz module from http://search.cpan.org/~sendu/bioperl/Bio/TreeIO.pm
>
> Questions:
>
> 1. Please let me know if I am using the correct version.
> If not, please point me to the latest one.
>
> 2. Provided that the version I am using is the right one, please
> let me know what is wrong with the script.
>
> Thank you.
> Regards,
> Neha.
>
> -Neha Nahar
>   " Work for cause and not for applause, live to express and not to impress !"
>
> ---------------------------------
>  Here's a new way to find what you're looking for - Yahoo! Answers
>




From cjfields at uiuc.edu  Thu Feb 15 15:44:23 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 15 Feb 2007 09:44:23 -0600
Subject: [Bioperl-l] DB.t failures
In-Reply-To: <45D4700C.8020305@sendu.me.uk>
References: <45D3167F.2000608@sendu.me.uk>
	<AEABC2CF-AABF-4BC9-9E66-17F2E3A51B37@jays.net>
	<45D4621C.6040309@sendu.me.uk>
	<586AAA5B-DFA6-4746-BA70-F3C16CA97EE3@uiuc.edu>
	<45D4700C.8020305@sendu.me.uk>
Message-ID: <809A1A9D-0344-412D-98D7-EA852EFED63D@uiuc.edu>


On Feb 15, 2007, at 8:37 AM, Sendu Bala wrote:

> Chris Fields wrote:
>>
>> On Feb 15, 2007, at 7:37 AM, Sendu Bala wrote:
>>
>>> Jay Hannah wrote:
>>>> On Feb 14, 2007, at 8:02 AM, Sendu Bala wrote:
>>>>> DB.t is failing with BIOPERLDEBUG set. Apparently, we are no longer
>>>>> getting sequences back from NCBI in the order we requested them in
>>>>> batch mode.
>>
>> Okay, I committed a fix for that.  I hope there are many users who
>> depend on the returned sequence order for anything!
>
> s/are/aren't/ ?

Yes, my oops.

> I suspect there might be, and it's certainly a reasonable assumption to
> make. Did you not see an easy way of maintaining the order?

I haven't looked (been busy the last few days), but I think there is  
a way via efetch.

We could add a parameter to the default base URL if efetch supports one,
or (probably better) add a sort_order() method to designate a particular
sort order, defaulting to the old order if not set.
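Another option, which does not depend on what the server supports, is to restore the requested order on the client side. A minimal sketch; `restore_request_order` is a hypothetical helper, not a Bio::DB method. It assumes each returned record exposes the same identifier that was requested (with Bio::Seq objects one might pass `sub { $_[0]->accession_number }`, but whether that matches depends on which IDs were queried, e.g. GIs versus accessions). Records whose IDs were not requested are dropped.

```perl
use strict;
use warnings;

# Put records back into the order their IDs were requested.
# $get_id is a code ref that extracts an ID from a record, so the
# same helper works for sequence objects or plain hashes.
sub restore_request_order {
    my ($requested, $records, $get_id) = @_;
    my %by_id;
    for my $rec (@$records) {
        push @{ $by_id{ $get_id->($rec) } }, $rec;
    }
    my @ordered;
    for my $id (@$requested) {
        next unless $by_id{$id} && @{ $by_id{$id} };   # not returned
        push @ordered, shift @{ $by_id{$id} };
    }
    return @ordered;
}

# Usage with plain hashrefs standing in for sequence objects.
my @got    = ({ id => 'C' }, { id => 'A' }, { id => 'B' });
my @sorted = restore_request_order([qw(A B C)], \@got, sub { $_[0]{id} });
print join(',', map { $_->{id} } @sorted), "\n";   # prints A,B,C
```

Keeping a list per ID means duplicate requests come back in the order they were returned, rather than collapsing to a single record.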

chris


From lstein at cshl.edu  Thu Feb 15 18:53:13 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Thu, 15 Feb 2007 13:53:13 -0500
Subject: [Bioperl-l] The axis of GC content in Bio::Graphics::glyph:dna
In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E9503E2EB70@iahce2ksrv1.iah.bbsrc.ac.uk>
References: <8975119BCD0AC5419D61A9CF1A923E9503E2EB70@iahce2ksrv1.iah.bbsrc.ac.uk>
Message-ID: <6dce9a0b0702151053v19ab190fh1f752fea3b2ed722@mail.gmail.com>

Hi Michael,

When you set up the panel, do this:

 Bio::Graphics::Panel->new(-blah -blah,
                           -pad_left  => 20,
                           -pad_right => 20);

This will leave enough room on the left and right for you to see the Y axis.
Otherwise it runs off the edge of the image (OK, this is a mis-design, but
it was the only way to solve a chicken-and-egg problem about who gets to say
how wide the panel is).

Lincoln

On 2/15/07, michael watson (IAH-C) <michael.watson at bbsrc.ac.uk> wrote:
>
> Hi
>
> OK I have some great images out of this glyph, but I can't see the axis,
> and nor is it labelled (ie does it go from 0 - 100%?) so isn't great for
> publication.  The docs say:
>
> "NOTE: -gc_window=>'auto' gives nice results and is recommended for
> drawing GC content. The GC content axes draw slightly outside the
> panel, so you may wish to add some extra padding on the right and
> left. "
>
> Any idea how to do this?
>
> Basically, I want a nice GC graph with the axis quite clearly labelled,
> and a nice "%GC" title next to it :)
>
> Thanks
>
> Mick
>


-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu


From johnsonm at gmail.com  Thu Feb 15 19:24:08 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Thu, 15 Feb 2007 13:24:08 -0600
Subject: [Bioperl-l] Bio::Tools::Glimmer
In-Reply-To: <DB4D80A7-1D29-4809-A7D2-FCE5DB2502B4@uiuc.edu>
References: <ebf5eb170609190833r5fa138e9gc6c51346e5f7890c@mail.gmail.com>
	<0FBEEE2B-9641-4EFA-A5D2-0DA839556E4C@gmx.net>
	<ebf5eb170609201247r12eb9475v5cb553e0727163c0@mail.gmail.com>
	<ebf5eb170702061553t29b0d1f2q402582abba032031@mail.gmail.com>
	<20F07AFE-BADF-4BA5-83B1-C99EEBE915C2@bioperl.org>
	<ebf5eb170702070850k10f1d24fr2d93d61534eeaaff@mail.gmail.com>
	<ebf5eb170702121513r20759845i21e366f916020baa@mail.gmail.com>
	<a79f6a4b0702121807i47d6360cwed1b0276865d6bd7@mail.gmail.com>
	<ebf5eb170702131210l1fa27bb5n31095e9ca2d4ecf2@mail.gmail.com>
	<DB4D80A7-1D29-4809-A7D2-FCE5DB2502B4@uiuc.edu>
Message-ID: <ebf5eb170702151124l52492a51mbbca2b2e346d5ee8@mail.gmail.com>

Done.  Bug opened in Bugzilla, diffs attached including new/updated tests:

http://bugzilla.open-bio.org/show_bug.cgi?id=2206

Can somebody grab that, take a look, tweak to taste, test and commit?  Tests
pass on my end presently.

On 2/13/07, Chris Fields <cjfields at uiuc.edu> wrote:
>
> You'll also want to update whatever relevant tests there are for
> Glimmer; looks like they are in GenPred.t.
>
> chris
>


From cjfields at uiuc.edu  Thu Feb 15 19:37:22 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 15 Feb 2007 13:37:22 -0600
Subject: [Bioperl-l] Bio::Tools::Glimmer
In-Reply-To: <ebf5eb170702151124l52492a51mbbca2b2e346d5ee8@mail.gmail.com>
References: <ebf5eb170609190833r5fa138e9gc6c51346e5f7890c@mail.gmail.com>
	<0FBEEE2B-9641-4EFA-A5D2-0DA839556E4C@gmx.net>
	<ebf5eb170609201247r12eb9475v5cb553e0727163c0@mail.gmail.com>
	<ebf5eb170702061553t29b0d1f2q402582abba032031@mail.gmail.com>
	<20F07AFE-BADF-4BA5-83B1-C99EEBE915C2@bioperl.org>
	<ebf5eb170702070850k10f1d24fr2d93d61534eeaaff@mail.gmail.com>
	<ebf5eb170702121513r20759845i21e366f916020baa@mail.gmail.com>
	<a79f6a4b0702121807i47d6360cwed1b0276865d6bd7@mail.gmail.com>
	<ebf5eb170702131210l1fa27bb5n31095e9ca2d4ecf2@mail.gmail.com>
	<DB4D80A7-1D29-4809-A7D2-FCE5DB2502B4@uiuc.edu>
	<ebf5eb170702151124l52492a51mbbca2b2e346d5ee8@mail.gmail.com>
Message-ID: <4C15214E-AE4B-4D85-A710-60536B08BE86@uiuc.edu>


On Feb 15, 2007, at 1:24 PM, Mark Johnson wrote:

> Done.  Bug opened in Bugzilla, diffs attached including new/updated  
> tests:
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=2206
>
> Can somebody grab that, take a look, tweak to taste, test and  
> commit?  Tests
> pass on my end presently.
>
> On 2/13/07, Chris Fields <cjfields at uiuc.edu> wrote:
>>
>> You'll also want to update whatever relevant tests there are for
>> Glimmer; looks like they are in GenPred.t.
>>
>> chris

Done; everything passed on this end as well, no tweaking necessary.   
If there are problems we'll definitely hear about it down the road  
(Glimmer is a popular tool), but I think you'll be fine.

Thanks Mark!

chris


From cjfields at uiuc.edu  Thu Feb 15 19:46:07 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 15 Feb 2007 13:46:07 -0600
Subject: [Bioperl-l] DB.t failures
In-Reply-To: <809A1A9D-0344-412D-98D7-EA852EFED63D@uiuc.edu>
References: <45D3167F.2000608@sendu.me.uk>
	<AEABC2CF-AABF-4BC9-9E66-17F2E3A51B37@jays.net>
	<45D4621C.6040309@sendu.me.uk>
	<586AAA5B-DFA6-4746-BA70-F3C16CA97EE3@uiuc.edu>
	<45D4700C.8020305@sendu.me.uk>
	<809A1A9D-0344-412D-98D7-EA852EFED63D@uiuc.edu>
Message-ID: <FA9F2E96-064B-4C8F-87BB-D72A7D6F6910@uiuc.edu>


On Feb 15, 2007, at 9:44 AM, Chris Fields wrote:

>
> On Feb 15, 2007, at 8:37 AM, Sendu Bala wrote:
>
>> Chris Fields wrote:
>>>
>>> On Feb 15, 2007, at 7:37 AM, Sendu Bala wrote:
>>>
>>>> Jay Hannah wrote:
>>>>> On Feb 14, 2007, at 8:02 AM, Sendu Bala wrote:
>>>>>> DB.t is failing with BIOPERLDEBUG set. Apparently, we are no longer
>>>>>> getting sequences back from NCBI in the order we requested them in
>>>>>> batch mode.
>>>
>>> Okay, I committed a fix for that.  I hope there are many users who
>>> depend on the returned sequence order for anything!
>>
>> s/are/aren't/ ?
>
> Yes, my oops.
>
>> I suspect there might be, and it's certainly a reasonable assumption to
>> make. Did you not see an easy way of maintaining the order?
>
> I haven't looked (been busy the last few days), but I think there is
> a way via efetch.
>
> We could add in something to the default base URL if there is
> something or (probably better) add a sort_order() method to designate
> a particular sort order, defaulting to the old order if not set.
>
> chris

Delving into it further, the problem only occurs when using
get_seq_stream() directly in batch mode, which is likely only used by
developers for testing.  The sort issue only pops up when eposting
IDs using that mode; the retrieved seqs are returned in a different order
than through a direct efetch query (the default with the get_Stream* or
get_Seq* methods).  No value of the 'sort' parameter gets around that
problem, which is not a complete surprise since it is supposed to only
work for PubMed.  Since the method is rarely used, I'll just leave the
bullet-proofed tests alone.

chris


From letondal at pasteur.fr  Thu Feb 15 20:23:55 2007
From: letondal at pasteur.fr (Catherine Letondal)
Date: Thu, 15 Feb 2007 21:23:55 +0100
Subject: [Bioperl-l] Problem parsing clustalw format ni AlignIO
Message-ID: <ab16cb62f58278aaf565a0fd73ba4009@pasteur.fr>

Hi bioperlers,

I have a script called protal2dna
(http://bioweb.pasteur.fr/seqanal/interfaces/protal2dna.html, see
attachment #1) that realigns DNA sequences given those sequences plus the
corresponding protein alignment (the sequences have to be in the same order
or named equivalently). Users have reported a parsing problem from the
AlignIO class when they enter certain clustalw files (see attachment #2
for an example):

% protal2dna alig-protal2dna.dat dna-protal2dna.data
no alignment available in 'clustalw' format from file 
'alig-protal2dna.dat'
%

I have tried this with BioPerl 1.4. I have looked in the archive and in
the BUGS but found nothing. Is there a bug fix for this? I have also
attached the DNA sequences file in case you want to test.

Thanks a lot in advance,

--
Catherine Letondal -- Institut Pasteur
www.pasteur.fr/~letondal

-------------- next part --------------
A non-text attachment was scrubbed...
Name: protal2dna
Type: application/octet-stream
Size: 11093 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070215/42a0ba43/attachment-0012.obj>
-------------- next part --------------

-------------- next part --------------
A non-text attachment was scrubbed...
Name: alig-protal2dna.dat
Type: application/octet-stream
Size: 12022 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070215/42a0ba43/attachment-0013.obj>
-------------- next part --------------

-------------- next part --------------
A non-text attachment was scrubbed...
Name: dna-protal2dna.data
Type: application/octet-stream
Size: 7739 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070215/42a0ba43/attachment-0014.obj>

From Kevin.M.Brown at asu.edu  Thu Feb 15 21:38:25 2007
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Thu, 15 Feb 2007 14:38:25 -0700
Subject: [Bioperl-l] Problem parsing clustalw format ni AlignIO
In-Reply-To: <ab16cb62f58278aaf565a0fd73ba4009@pasteur.fr>
References: <ab16cb62f58278aaf565a0fd73ba4009@pasteur.fr>
Message-ID: <1A4207F8295607498283FE9E93B775B402BA7764@EX02.asurite.ad.asu.edu>

Did you try Bioperl 1.5.2 to see if updates to it might fix the issue?
IIRC 1.4 is nearly 2 years old now.  1.5.2 was released within the last
few months.
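Alongside upgrading, one quick thing worth ruling out is whether the file actually starts the way a ClustalW alignment normally does. A minimal sketch in plain Perl; `looks_like_clustalw` is a hypothetical helper, not a BioPerl function, and a missing or unusual header is only one possible cause of the error:

```perl
use strict;
use warnings;

# Hypothetical pre-flight check (not a BioPerl function). ClustalW
# alignment files normally begin with a header line such as
# "CLUSTAL W (1.83) multiple sequence alignment"; this reports whether the
# first non-blank line of a file starts with "CLUSTAL".
sub looks_like_clustalw {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    while (my $line = <$fh>) {
        next if $line =~ /^\s*$/;    # skip leading blank lines
        close $fh;
        return $line =~ /^CLUSTAL/ ? 1 : 0;
    }
    close $fh;
    return 0;                        # file contained only blank lines
}

# Check every file named on the command line.
for my $file (@ARGV) {
    printf "%s: %s\n", $file,
        looks_like_clustalw($file) ? 'CLUSTAL header found' : 'no CLUSTAL header';
}
```

If the header is present, the problem lies elsewhere in the file and the 1.5.2 upgrade is the next thing to try.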

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> Catherine Letondal
> Sent: Thursday, February 15, 2007 1:24 PM
> To: bioperl-l
> Cc: Catherine Letondal; Katja Schuerer
> Subject: [Bioperl-l] Problem parsing clustalw format ni AlignIO
> 
> Hi bioperlers,
> 
> I have a script called protal2dna
> (http://bioweb.pasteur.fr/seqanal/interfaces/protal2dna.html, 
> see attachment #1) that realign DNA sequences giving their 
> sequences + the corresponding protein alignment (sequences 
> have to be in the same order or named equivalently). We have 
> a parsing problem reported from the AlignIO class when users 
> enter some clustalw file (see attachment #2 for an example):
> 
> % protal2dna alig-protal2dna.dat dna-protal2dna.data
> no alignment available in 'clustalw' format from file
> 'alig-protal2dna.dat'
> %
> 
> I have tried with bioperl 1.4. I have looked in the archive 
> and in the BUGS, but found nothing?
> Is there any bug fix for this? I also provide the DNA 
> sequences file if you want to test.
> 
> Thanks a lot in advance,
> 
> --
> Catherine Letondal -- Institut Pasteur
> www.pasteur.fr/~letondal
> 
> 


From cjfields at uiuc.edu  Thu Feb 15 21:50:54 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 15 Feb 2007 15:50:54 -0600
Subject: [Bioperl-l] Bio::Root::Utilities.pm
In-Reply-To: <8f200b4c0702141020x5815018bvd69873222582d9d5@mail.gmail.com>
References: <8f200b4c0702131500m2f3790nceace3c3e3dffd4f@mail.gmail.com>
	<1D04BA5E-359E-4AC5-B674-8ECDC4F81E83@uiuc.edu>
	<8f200b4c0702141020x5815018bvd69873222582d9d5@mail.gmail.com>
Message-ID: <C53B465C-8BBA-4DE7-92BC-FFC5DDBEB4AA@uiuc.edu>


On Feb 14, 2007, at 12:20 PM, Steve Chervitz wrote:
...

>>
>> I don't have a problem with adding it back, esp. if tests are added.
>> Everything in Bio::Root* not tied to a module was yanked out when no
>> one spoke up about cleaning up Bio::Root* modules:
>>
>> http://thread.gmane.org/gmane.comp.lang.perl.bio.general/12792/focus=12839
>>
>> Maybe others disagree?
>>
>> chris
>>
>
> Sorry I missed out on that thread. I had some trouble with my bioperl-l
> email delivery getting disabled due to excessive bounces, and it took
> me a while to catch it.
>
> Bio::Root::Utilities is quite a grab bag of miscellaneous general
> functions that are occasionally useful for perl scripting (e.g.,
> determining end-of-line characters, sending email, etc.). The code
> could definitely use a review, and maybe an example script to advertise
> it. I can look into this, and suggestions are welcome.
>
> Steve

Steve,

I have added Root::Utilities back to CVS but I didn't know if I  
should add back the other related Root modules (didn't know what your  
future plans were for them).  Could the Bio::Root::Global and  
Bio::Root::Object stuff be consolidated into Bio::Root::Utilities or  
would that be too problematic?  None of the other Bio* modules  
currently use them.

Personally, I use Date::Manip for anything that requires date/time  
manipulation (updating seq records based on dates, for instance).   
Some of the other utilities could come in handy, though.  Don't know  
if that helps...
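To illustrate the sort of small helper Steve describes (determining end-of-line characters), here is a plain-Perl sketch. `detect_eol` is hypothetical and is not the Bio::Root::Utilities implementation or API:

```perl
use strict;
use warnings;

# Hypothetical helper: classify the line endings in a chunk of text.
# Returns 'CRLF' (DOS/Windows), 'LF' (Unix), 'CR' (classic Mac OS),
# 'mixed' if more than one style occurs, or 'none' if there are none.
sub detect_eol {
    my ($text) = @_;
    my $crlf = () = $text =~ /\r\n/g;         # CR immediately followed by LF
    my $cr   = () = $text =~ /\r(?!\n)/g;     # lone CR
    my $lf   = () = $text =~ /(?<!\r)\n/g;    # lone LF
    my @found = grep { $_->[1] } (['CRLF', $crlf], ['LF', $lf], ['CR', $cr]);
    return 'none'  unless @found;
    return 'mixed' if @found > 1;
    return $found[0][0];
}

print detect_eol("a\r\nb\r\n"), "\n";    # CRLF
print detect_eol("a\nb\n"), "\n";        # LF
print detect_eol("a\rb\r"), "\n";        # CR
```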

chris


From cjfields at uiuc.edu  Thu Feb 15 21:51:58 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 15 Feb 2007 15:51:58 -0600
Subject: [Bioperl-l] XEMBL deprecation
Message-ID: <72AD3CD9-4D42-4E65-A54E-56D01741CDCD@uiuc.edu>

I have gone ahead and marked Bio::DB::XEMBL and Bio::DB::XEMBLService  
both for deprecation in the wiki and in CVS (though I haven't set any  
timeline):

http://www.bioperl.org/wiki/Deprecated_modules

The XEMBL web services are no longer available, and it looks like  
everything is running through DBFetch now.  The XEMBL tests are  
skipped if no server is detected, so they shouldn't cause any  
problems with Bioperl installations.

Lincoln, was there anything to salvage from these?  I noticed they  
used SOAP::Lite, so maybe we could convert these over to a SOAP-based  
interface to DBFetch web services?

chris


From johnsonm at gmail.com  Thu Feb 15 22:29:37 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Thu, 15 Feb 2007 16:29:37 -0600
Subject: [Bioperl-l] Bio::Tools::Run::Glimmer?
Message-ID: <ebf5eb170702151429w233ec66dkfb89743a4b8e687e@mail.gmail.com>

    Now that I've got Bio::Tools::Glimmer parsing Glimmer2 and Glimmer3
output, I suppose I might as well go and write Bio::Tools::Run::Glimmer.  I
suspect another 4-in-1 module may be possible.  Now that I think about it,
I'll need one for GeneMark, too.
    Comments?  Suggestions on a good module to use as a template?


From hlapp at gmx.net  Fri Feb 16 01:18:56 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 15 Feb 2007 20:18:56 -0500
Subject: [Bioperl-l] XEMBL deprecation
In-Reply-To: <72AD3CD9-4D42-4E65-A54E-56D01741CDCD@uiuc.edu>
References: <72AD3CD9-4D42-4E65-A54E-56D01741CDCD@uiuc.edu>
Message-ID: <989C9C17-9D85-4B3F-A9FF-C9B9372D38AD@gmx.net>


On Feb 15, 2007, at 4:51 PM, Chris Fields wrote:

> The XEMBL web services are no longer available

What happens if someone invokes the module? Should it maybe return  
nothing and warn()? I don't think it's a good idea if the module just  
silently does not function because its backend is no more.

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cjfields at uiuc.edu  Fri Feb 16 01:48:12 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 15 Feb 2007 19:48:12 -0600
Subject: [Bioperl-l] XEMBL deprecation
In-Reply-To: <989C9C17-9D85-4B3F-A9FF-C9B9372D38AD@gmx.net>
References: <72AD3CD9-4D42-4E65-A54E-56D01741CDCD@uiuc.edu>
	<989C9C17-9D85-4B3F-A9FF-C9B9372D38AD@gmx.net>
Message-ID: <00A97EC9-A066-4848-A2A4-8E0773797783@uiuc.edu>

On Feb 15, 2007, at 7:18 PM, Hilmar Lapp wrote:

> On Feb 15, 2007, at 4:51 PM, Chris Fields wrote:
>
>> The XEMBL web services are no longer available
>
> What happens if someone invokes the module? Should it maybe return  
> nothing and warn()? I don't think it's a good idea if the module  
> just silently does not function because its backend is no more.
>
> 	-hilmar

Yes, I thought the same.  I have added a warn() noting the  
deprecation to the XEMBL constructor and removed XEMBL tests from  
CVS.  The modules are still there for the time being.

I actually worry more about the internals; it would be a shame to  
toss them altogether.  Would it be worth it to shift this towards a  
SOAP-based interface to DBFetch?  Or, more precisely, how much  
trouble would it be to do so?

chris


From hlapp at gmx.net  Fri Feb 16 01:54:29 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 15 Feb 2007 20:54:29 -0500
Subject: [Bioperl-l] XEMBL deprecation
In-Reply-To: <00A97EC9-A066-4848-A2A4-8E0773797783@uiuc.edu>
References: <72AD3CD9-4D42-4E65-A54E-56D01741CDCD@uiuc.edu>
	<989C9C17-9D85-4B3F-A9FF-C9B9372D38AD@gmx.net>
	<00A97EC9-A066-4848-A2A4-8E0773797783@uiuc.edu>
Message-ID: <FB0CC994-F2C2-407C-9EFB-6B8083B9545C@gmx.net>

Well, if dbFetch doesn't have a SOAP based interface, how would you
want to do this?

	-hilmar

On Feb 15, 2007, at 8:48 PM, Chris Fields wrote:

> On Feb 15, 2007, at 7:18 PM, Hilmar Lapp wrote:
>
>> On Feb 15, 2007, at 4:51 PM, Chris Fields wrote:
>>
>>> The XEMBL web services are no longer available
>>
>> What happens if someone invokes the module? Should it maybe return  
>> nothing and warn()? I don't think it's a good idea if the module  
>> just silently does not function because its backend is no more.
>>
>> 	-hilmar
>
> Yes, I thought the same.  I have added a warn() noting the  
> deprecation to the XEMBL constructor and removed XEMBL tests from  
> CVS.  The modules are still there for the time being.
>
> I actually worry more about the internals; it would be a shame to  
> toss them altogether.  Would it be worth it to shift this towards a  
> SOAP-based interface to DBFetch?  Or, more precisely, how much  
> trouble would it be to do so?
>
> chris

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cjfields at uiuc.edu  Fri Feb 16 01:59:46 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 15 Feb 2007 19:59:46 -0600
Subject: [Bioperl-l] XEMBL deprecation
In-Reply-To: <FB0CC994-F2C2-407C-9EFB-6B8083B9545C@gmx.net>
References: <72AD3CD9-4D42-4E65-A54E-56D01741CDCD@uiuc.edu>
	<989C9C17-9D85-4B3F-A9FF-C9B9372D38AD@gmx.net>
	<00A97EC9-A066-4848-A2A4-8E0773797783@uiuc.edu>
	<FB0CC994-F2C2-407C-9EFB-6B8083B9545C@gmx.net>
Message-ID: <8C7E18C6-B38D-4E15-BE9C-84256B09C312@uiuc.edu>


On Feb 15, 2007, at 7:54 PM, Hilmar Lapp wrote:

> Well, if dbFetch doesn't have a SOAP based interface, how would you
> want to do this?
>
> 	-hilmar

DBfetch has a SOAP-based interface:

http://www.ebi.ac.uk/Tools/webservices/services/dbfetch

Just not sure how easy it would be to switch XEMBL code over to using  
it.  We already have Bio::DB::DBFetch so it may be redundant, but I  
don't recall any other SOAP-based tools in BioPerl beyond some stuff  
in bioperl-run (and I'm not sure how up-to-date the DBFetch module is).

chris


From jimhu at tamu.edu  Fri Feb 16 05:20:09 2007
From: jimhu at tamu.edu (Jim Hu)
Date: Thu, 15 Feb 2007 23:20:09 -0600
Subject: [Bioperl-l] Pathway tools output parser
In-Reply-To: <Pine.LNX.4.44.0702062205510.13338-100000@sos.lbl.gov>
References: <Pine.LNX.4.44.0702062205510.13338-100000@sos.lbl.gov>
Message-ID: <1632E2BF-4402-47DE-B750-9763E02711D2@tamu.edu>

Hi Chris,

I need to check the list more often!  I never got an answer here, but  
Eric Just pointed out a perl api at TAIR that's linked from the  
BioCyc site.  I've used the lisp parser functions from that to move  
the data to a perl array of arrays, and I'm working on creating  
object classes for BioCyc objects, starting with genes and products.

I need to look at the appropriate ways to link this up to the  
existing codebase for interconverting to Chado and other BioPerl data  
types.
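For anyone taking the same route, the core step of turning Lisp-style s-expressions into a Perl array of arrays can be sketched as below. This is a hypothetical, deliberately small parser (lists, bare atoms and double-quoted strings without escapes only), not the TAIR/BioCyc Perl API mentioned above, and the sample input is made up:

```perl
use strict;
use warnings;

# Hypothetical minimal s-expression reader: parses '(a (b c) "d e")'
# into nested array refs: ['a', ['b', 'c'], 'd e'].
# No support for quote/backquote, comments or escaped quotes.
sub parse_sexp {
    my ($text) = @_;
    my @tokens = $text =~ /\(|\)|"[^"]*"|[^\s()"]+/g;
    my $pos = 0;
    my $tree = _read_form(\@tokens, \$pos);
    die "Trailing input after expression\n" if $pos < @tokens;
    return $tree;
}

# Read one form starting at token index $$pos, advancing $$pos past it.
sub _read_form {
    my ($tokens, $pos) = @_;
    my $tok = $tokens->[$$pos++];
    die "Unexpected end of input\n" unless defined $tok;
    die "Unexpected ')'\n" if $tok eq ')';
    if ($tok eq '(') {
        my @list;
        while (1) {
            die "Unbalanced parentheses\n" if $$pos >= @$tokens;
            if ($tokens->[$$pos] eq ')') { $$pos++; last }
            push @list, _read_form($tokens, $pos);
        }
        return \@list;
    }
    $tok =~ s/^"(.*)"$/$1/s;    # strip surrounding double quotes
    return $tok;
}

my $tree = parse_sexp('(gene (name "abcA") (product P1))');
print $tree->[1][1], "\n";    # abcA
```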

Jim
=====================================
Jim Hu
Associate Professor
Dept. of Biochemistry and Biophysics
2128 TAMU
Texas A&M Univ.
College Station, TX 77843-2128
979-862-4054


On Feb 7, 2007, at 12:07 AM, Chris Mungall wrote:

>
> Hi Jim
>
> Did you ever get an answer to this? I'm interested in storing pathway
> data in Chado & I remember enough lisp to get it into something
> perl-manageable like XML
>
> On Thu, 25 Jan 2007, Jim Hu wrote:
>
>> Is there a module to parse the lisp object files from Peter Karp's
>> Pathway Tools?   I need a parser to convert the gene and protein
>> objects in EcoCyc releases into something that can be imported into
>> Chado.
>> =====================================
>> Jim Hu
>> Associate Professor
>> Dept. of Biochemistry and Biophysics
>> 2128 TAMU
>> Texas A&M Univ.
>> College Station, TX 77843-2128
>> 979-862-4054
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
>


From lstein at cshl.edu  Fri Feb 16 13:35:19 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Fri, 16 Feb 2007 08:35:19 -0500
Subject: [Bioperl-l] Bio::DB::SeqFeature::Store problem with transcripts
In-Reply-To: <45D1E2A5.6060104@sendu.me.uk>
References: <45D1E2A5.6060104@sendu.me.uk>
Message-ID: <6dce9a0b0702160535x946d6e5q4eda94ad174f1bac@mail.gmail.com>

Hi,

Older versions of Storable can't deal with features that contain subroutine
refs. You should get the current version from CPAN. Note that there is a
slight security problem here if you don't trust the objects stored in the
database. If they contain code refs, the code will be evaluated during
deserialization.
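For reference, Storable only serializes code references when `$Storable::Deparse` is true (it uses B::Deparse), and only turns them back into code on retrieval when `$Storable::Eval` is true. The second flag is where the security caveat applies, because the deparsed source is passed to eval. A minimal sketch independent of Bio::DB::SeqFeature::Store, whose own settings are not shown here:

```perl
use strict;
use warnings;
no warnings 'once';    # the Storable config variables are mentioned only once
use Storable qw(freeze thaw);

# A feature-like hash that holds a code reference.
my $feature = { name => 'demo', on_cleanup => sub { return "cleaned $_[0]" } };

# Storing a CODE ref requires $Storable::Deparse (it uses B::Deparse).
my $frozen = do {
    local $Storable::Deparse = 1;
    freeze($feature);
};

# Rebuilding the CODE ref requires $Storable::Eval: the deparsed source
# is passed to eval, so enable this only for data you trust.
my $copy = do {
    local $Storable::Eval = 1;
    thaw($frozen);
};

print $copy->{on_cleanup}->('demo'), "\n";    # cleaned demo
```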

Lincoln

On 2/13/07, Sendu Bala <bix at sendu.me.uk> wrote:
>
> I have some raw sequences in a Bio::DB::SeqFeature::Store mysql database
> and wanted to associate some basic information with them, like exon
> positions. I thought of creating Bio::SeqFeature::Gene::Transcript
> objects and storing them so I could later use features() to see what
> other features overlapped exons. I ran into a fatal error that can be
> replicated with the following simplified one-liner:
>
> perl -MBio::DB::SeqFeature::Store -MBio::SeqFeature::Gene::Transcript -e
> '$db = Bio::DB::SeqFeature::Store->new(-adaptor => "DBI::mysql", -dsn =>
> "dbi:mysql:test"); $trans =
> Bio::SeqFeature::Gene::Transcript->new(-start => 1, -end => 2, -seq_id
> => "test"); $db->store($trans); @trans = $db->features(-seqid => "test",
> -type => "transcript"); print "@trans\n";'
>
> code sub {
>      package Bio::SeqFeature::Generic;
>      use strict 'refs';
>      my $self = shift @_;
>      foreach my $f (@{[] unless $$self{'_gsf_sub_array'};}) {
>          $f = undef;
>      }
>      $$self{'_gsf_seq'} = undef;
>      foreach my $t (keys %{{} unless $$self{'_gsf_tag_hash'};}) {
>          $$self{'_gsf_tag_hash'}{$t} = undef;
>          delete $$self{'_gsf_tag_hash'}{$t};
>      }
> } did not evaluate to a subroutine reference, at
> /.../Bio/DB/SeqFeature/Store.pm line 2280
>
>
> Is this a bug? Or am I taking the wrong approach?
>


-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu


From lstein at cshl.edu  Fri Feb 16 13:47:29 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Fri, 16 Feb 2007 08:47:29 -0500
Subject: [Bioperl-l] Bio::DB::SeqFeature::Store problem with transcripts
In-Reply-To: <45D5B42A.1080303@sendu.me.uk>
References: <45D1E2A5.6060104@sendu.me.uk>
	<6dce9a0b0702160535x946d6e5q4eda94ad174f1bac@mail.gmail.com>
	<45D5B42A.1080303@sendu.me.uk>
Message-ID: <6dce9a0b0702160547s5873cd2bg2c5cf09779138249@mail.gmail.com>

Hi Sendu,

I'll do a little digging and let you know.

Lincoln

On 2/16/07, Sendu Bala <bix at sendu.me.uk> wrote:
>
> Lincoln Stein wrote:
> > Hi,
> >
> > Older versions of Storable can't deal with features that contain
> > subroutine refs. You should get the current version from CPAN.
>
> Do you have any idea which version of Storable first supported this? I
> can specify that version in Bioperl's Build.PL.
>
> (otherwise I'll just specify the latest version)
>


-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu


From lstein at cshl.edu  Fri Feb 16 13:52:30 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Fri, 16 Feb 2007 08:52:30 -0500
Subject: [Bioperl-l] Bio::DB::SeqFeature::Store problem with transcripts
In-Reply-To: <45D5B42A.1080303@sendu.me.uk>
References: <45D1E2A5.6060104@sendu.me.uk>
	<6dce9a0b0702160535x946d6e5q4eda94ad174f1bac@mail.gmail.com>
	<45D5B42A.1080303@sendu.me.uk>
Message-ID: <6dce9a0b0702160552l11417675kec7b5b4537cbfa70@mail.gmail.com>

It looks like 2.05 or higher is the Storable version to use. It requires
B::Deparse, which is (I think) standard on perl 5.6 or higher.
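A script that relies on this can check the installed Storable version at startup. A minimal sketch using the `VERSION` method every package inherits from UNIVERSAL, with the 2.05 threshold suggested above:

```perl
use strict;
use warnings;
use Storable ();

# Every package inherits VERSION from UNIVERSAL; called with an argument it
# dies if the loaded module is older than the requested version.
my $min = '2.05';
if (eval { Storable->VERSION($min); 1 }) {
    print "Storable $Storable::VERSION is new enough (>= $min)\n";
}
else {
    warn "Storable $Storable::VERSION is older than $min; "
       . "features holding code refs may not store or thaw correctly\n";
}
```

The compile-time equivalent is `use Storable 2.05;`, which dies at load time instead of warning.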

Lincoln

On 2/16/07, Sendu Bala <bix at sendu.me.uk> wrote:
>
> Lincoln Stein wrote:
> > Hi,
> >
> > Older versions of Storable can't deal with features that contain
> > subroutine refs. You should get the current version from CPAN.
>
> Do you have any idea which version of Storable first supported this? I
> can specify that version in Bioperl's Build.PL.
>
> (otherwise I'll just specify the latest version)
>


-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu


From lstein at cshl.edu  Fri Feb 16 13:55:06 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Fri, 16 Feb 2007 08:55:06 -0500
Subject: [Bioperl-l] XEMBL deprecation
In-Reply-To: <72AD3CD9-4D42-4E65-A54E-56D01741CDCD@uiuc.edu>
References: <72AD3CD9-4D42-4E65-A54E-56D01741CDCD@uiuc.edu>
Message-ID: <6dce9a0b0702160555i7eae057dt1c3cd46b1f732bd1@mail.gmail.com>

I like the idea of converting these over to use DBFetch's SOAP services. On
the other hand, it isn't likely that I'm going to have time to do this
anytime soon.

Probably the best thing to do is to issue a warning and return undef if
someone tries to use the XEMBL module. I'll make that change.
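The warn-and-return-undef pattern can be sketched as follows. `My::Deprecated::XEMBL` is a hypothetical package, not the code that went into CVS:

```perl
use strict;
use warnings;

package My::Deprecated::XEMBL;    # hypothetical stand-in, not Bio::DB::XEMBL

# Constructor for a module whose web-service backend no longer exists:
# warn so the failure is not silent, then return undef instead of an object.
sub new {
    my ($class, %args) = @_;
    warn "$class is deprecated: the XEMBL web service is no longer available\n";
    return;
}

package main;

my $db = My::Deprecated::XEMBL->new();
print defined $db ? "got an object\n" : "constructor returned undef\n";
```

Callers should check that the constructor returned a defined value before using it.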

Lincoln

On 2/15/07, Chris Fields <cjfields at uiuc.edu> wrote:
>
> I have gone ahead and marked Bio::DB::XEMBL and Bio::DB::XEMBLService
> both for deprecation in the wiki and in CVS (though I haven't set any
> timeline):
>
> http://www.bioperl.org/wiki/Deprecated_modules
>
> The XEMBL web services are no longer available, and it looks like
> everything is running through DBFetch now.  The XEMBL tests are
> skipped if no server is detected, so they shouldn't cause any
> problems with Bioperl installations.
>
> Lincoln, was there anything to salvage from these?  I noticed they
> used SOAP::Lite, so maybe we could convert these over to a SOAP-based
> interface to DBFetch web services?
>
> chris
>


-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu


From lstein at cshl.edu  Fri Feb 16 13:55:47 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Fri, 16 Feb 2007 08:55:47 -0500
Subject: [Bioperl-l] XEMBL deprecation
In-Reply-To: <6dce9a0b0702160555i7eae057dt1c3cd46b1f732bd1@mail.gmail.com>
References: <72AD3CD9-4D42-4E65-A54E-56D01741CDCD@uiuc.edu>
	<6dce9a0b0702160555i7eae057dt1c3cd46b1f732bd1@mail.gmail.com>
Message-ID: <6dce9a0b0702160555s5b4d7d70n936c603bc79dd40d@mail.gmail.com>

Oh, looks like someone has inserted the warnings already. Good.

Lincoln

On 2/16/07, Lincoln Stein <lstein at cshl.edu> wrote:
>
> I like the idea of converting these over to use DBFetch's SOAP services.
> On the other hand, it isn't likely that I'm going to have time to do this
> anytime soon.
>
> Probably the best thing to do is to issue a warning and return undef if
> someone tries to use the XEMBL module. I'll make that change.
>
> Lincoln
>
> On 2/15/07, Chris Fields <cjfields at uiuc.edu> wrote:
> >
> > I have gone ahead and marked Bio::DB::XEMBL and Bio::DB::XEMBLService
> > both for deprecation in the wiki and in CVS (though I haven't set any
> > timeline):
> >
> > http://www.bioperl.org/wiki/Deprecated_modules
> >
> > The XEMBL web services are no longer available, and it looks like
> > everything is running through DBFetch now.  The XEMBL tests are
> > skipped if no server is detected, so they shouldn't cause any
> > problems with Bioperl installations.
> >
> > Lincoln, was there anything to salvage from these?  I noticed they
> > used SOAP::Lite, so maybe we could convert these over to a SOAP-based
> > interface to DBFetch web services?
> >
> > chris
> >
>


-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu


From bix at sendu.me.uk  Fri Feb 16 13:56:50 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 16 Feb 2007 13:56:50 +0000
Subject: [Bioperl-l] Bio::DB::SeqFeature::Store problem with transcripts
In-Reply-To: <6dce9a0b0702160552l11417675kec7b5b4537cbfa70@mail.gmail.com>
References: <45D1E2A5.6060104@sendu.me.uk>	
	<6dce9a0b0702160535x946d6e5q4eda94ad174f1bac@mail.gmail.com>	
	<45D5B42A.1080303@sendu.me.uk>
	<6dce9a0b0702160552l11417675kec7b5b4537cbfa70@mail.gmail.com>
Message-ID: <45D5B822.6080908@sendu.me.uk>

Lincoln Stein wrote:
> It looks like 2.05 or higher is the Storable version to use. It requires 
> B::Deparse, which is (I think) standard on perl 5.6 or higher.

Thanks, now recommended in Build.PL


From cjfields at uiuc.edu  Fri Feb 16 14:05:08 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 16 Feb 2007 08:05:08 -0600
Subject: [Bioperl-l] XEMBL deprecation
In-Reply-To: <6dce9a0b0702160555s5b4d7d70n936c603bc79dd40d@mail.gmail.com>
References: <72AD3CD9-4D42-4E65-A54E-56D01741CDCD@uiuc.edu>
	<6dce9a0b0702160555i7eae057dt1c3cd46b1f732bd1@mail.gmail.com>
	<6dce9a0b0702160555s5b4d7d70n936c603bc79dd40d@mail.gmail.com>
Message-ID: <ACAF9E26-CBDD-43AC-8D3E-0CADFF5B9576@uiuc.edu>

I added the warning yesterday.

We can add something to the project priority list on modifying XEMBL  
to use DBFetch instead; I like the SOAP-based interface.  I am  
thinking of a similar interface for NCBI eutils but I haven't had  
time to work on it.

chris

On Feb 16, 2007, at 7:55 AM, Lincoln Stein wrote:

> Oh, looks like someone has inserted the warnings already. Good.
>
> Lincoln
>
> On 2/16/07, Lincoln Stein <lstein at cshl.edu> wrote:
>
> I like the idea of converting these over to use DBFetch's SOAP
> services. On the other hand, it isn't likely that I'm going to have
> time to do this anytime soon.
>
> Probably the best thing to do is to issue a warning and return
> undef if someone tries to use the XEMBL module. I'll make that
> change.
>
> Lincoln
>
>
> On 2/15/07, Chris Fields <cjfields at uiuc.edu> wrote:
>
> I have gone ahead and marked Bio::DB::XEMBL and Bio::DB::XEMBLService
> both for deprecation in the wiki and in CVS (though I haven't set any
> timeline):
>
> http://www.bioperl.org/wiki/Deprecated_modules
>
> The XEMBL web services are no longer available, and it looks like
> everything is running through DBFetch now.  The XEMBL tests are
> skipped if no server is detected, so they shouldn't cause any
> problems with Bioperl installations.
>
> Lincoln, was there anything to salvage from these?  I noticed they
> used SOAP::Lite, so maybe we could convert these over to a SOAP-based
> interface to DBFetch web services?
>
> chris
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From bix at sendu.me.uk  Fri Feb 16 13:39:54 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 16 Feb 2007 13:39:54 +0000
Subject: [Bioperl-l] Bio::DB::SeqFeature::Store problem with transcripts
In-Reply-To: <6dce9a0b0702160535x946d6e5q4eda94ad174f1bac@mail.gmail.com>
References: <45D1E2A5.6060104@sendu.me.uk>
	<6dce9a0b0702160535x946d6e5q4eda94ad174f1bac@mail.gmail.com>
Message-ID: <45D5B42A.1080303@sendu.me.uk>

Lincoln Stein wrote:
> Hi,
> 
> Older versions of Storable can't deal with features that contain 
> subroutine refs. You should get the current version from CPAN.

Do you have any idea which version of Storable first supported this? I 
can specify that version in Bioperl's Build.PL.

(otherwise I'll just specify the latest version)


From eu at otelo-online.de  Sat Feb 17 12:55:08 2007
From: eu at otelo-online.de (eu at otelo-online.de)
Date: Sat, 17 Feb 2007 13:55:08 +0100 (CET)
Subject: [Bioperl-l] Bioperl Module OddCodes(help)
Message-ID: <29037001.1171716908969.JavaMail.ngmail@webmail18>

Hello all,

I want to translate a sequence in FASTA format into just acidic, basic
and polar classes, depending on the pH. The OddCodes module can only do
acidic, basic, polar and hydrophobic, and I think only at a default pH.

Can somebody help me? I don't know whether this is possible.
I need, for each amino acid, whether it carries a positive charge, a
negative charge or no charge.
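One way to make such a charge code depend on pH is to apply the Henderson-Hasselbalch relation to the ionizable side chains and call a residue charged when more than half of it is ionized. A minimal sketch, not part of Bio::Tools::OddCodes: `charge_code` is hypothetical, only D, E, H, K and R are treated as ionizable (C, Y and the termini are ignored), and the pKa values are approximate textbook figures that differ between tables and with protein context:

```perl
use strict;
use warnings;

# Approximate side-chain pKa values (assumption: textbook figures; real
# values vary between reference tables and with protein environment).
my %ACIDIC_PKA = (D => 3.9, E => 4.1);             # negative when deprotonated
my %BASIC_PKA  = (H => 6.0, K => 10.5, R => 12.5); # positive when protonated

# Hypothetical function: recode a protein sequence as '-' (mostly
# negatively charged), '+' (mostly positively charged) or '0' (mostly
# uncharged) at the given pH, using Henderson-Hasselbalch fractions.
sub charge_code {
    my ($seq, $ph) = @_;
    my $code = '';
    for my $aa (split //, uc $seq) {
        if (exists $ACIDIC_PKA{$aa}) {
            my $neg = 1 / (1 + 10**($ACIDIC_PKA{$aa} - $ph));
            $code .= $neg > 0.5 ? '-' : '0';
        }
        elsif (exists $BASIC_PKA{$aa}) {
            my $pos = 1 / (1 + 10**($ph - $BASIC_PKA{$aa}));
            $code .= $pos > 0.5 ? '+' : '0';
        }
        else {
            $code .= '0';
        }
    }
    return $code;
}

print charge_code('MDEHKR', 7.0), "\n";    # 0--0++
print charge_code('MDEHKR', 5.0), "\n";    # 0--+++
```

The sequence string would come from each FASTA record, for example via `$seq->seq` on a Bio::Seq object.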

thx
 



From The_Polymorph at rocketmail.com  Sun Feb 18 19:08:34 2007
From: The_Polymorph at rocketmail.com (Caitlin)
Date: Sun, 18 Feb 2007 11:08:34 -0800 (PST)
Subject: [Bioperl-l] Missing ppm for Bioperl 1.5.2_100(?)
Message-ID: <148421.50501.qm@web50801.mail.yahoo.com>

Hi.

In an attempt to upgrade my Bioperl install from 1.5.2 RC5 to
1.5.2_100, I noticed the ppm was not found on the activestate
repositories. 

Thanks,

~Caitlin




From bix at sendu.me.uk  Sun Feb 18 20:36:03 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Sun, 18 Feb 2007 20:36:03 +0000
Subject: [Bioperl-l] Missing ppm for Bioperl 1.5.2_100(?)
In-Reply-To: <148421.50501.qm@web50801.mail.yahoo.com>
References: <148421.50501.qm@web50801.mail.yahoo.com>
Message-ID: <45D8B8B3.4000408@sendu.me.uk>

Caitlin wrote:
> Hi.
> 
> In an attempt to upgrade my Bioperl install from 1.5.2 RC5 to
> 1.5.2_100, I noticed the ppm was not found on the activestate
> repositories. 

Follow the install instructions:
http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows

It's not in the default ActiveState repository but in the one hosted on bioperl.org.


From t.nugent at cs.ucl.ac.uk  Mon Feb 19 17:29:48 2007
From: t.nugent at cs.ucl.ac.uk (Tim Nugent)
Date: Mon, 19 Feb 2007 17:29:48 +0000
Subject: [Bioperl-l] Module to draw transmembrane protein toplogoy
Message-ID: <45D9DE8C.2010301@cs.ucl.ac.uk>

Hi everyone,

I've written a perl module to display transmembrane protein topology 
using GD. There are various options, including labels, helix/loop 
dimensions, colour schemes, etc., but it only requires a string or array
containing the protein topology (e.g. transmembrane helix start/stop 
points). It produces output like this:

http://www.cs.ucl.ac.uk/staff/T.Nugent/images/blue.png
http://www.cs.ucl.ac.uk/staff/T.Nugent/images/yellow.png

using the code at the bottom.

Here is the module:
http://www.cs.ucl.ac.uk/staff/T.Nugent/source/DrawTransmembrane.pm

I've never submitted anything to Bioperl before - is this sort of thing 
likely to be of use to others? I imagine it would sit alongside some of 
the Bio::Graphics stuff.

Best wishes,

Tim

#!/usr/bin/perl

use strict;
use warnings;
use lib '/scratch0/NOT_BACKED_UP/tnugent/perl5lib'; ## path to module
use DrawTransmembrane;

my @topology = (20,45,59,70,86,109,145,168,194,220);

my %labels = ('5' => '5 - Sulphation Site',
               '21' => '1st Helix',
               '47' => '40 - Mutation',
               '60' => 'Voltage Sensor',
               '72' => '72 - Mutation 2',
               '73' => '73 - Mutation 3',
               '138' => '138 - Glycosylation Site',
               '170' => '170 - Phosphorylation Site',
               '200' => 'Last Helix');

my $im = DrawTransmembrane->draw_transmembrane(-title => 'This is a cartoon displaying transmembrane helices.',
                                                -topology => \@topology,
                                                -n_terminal => 'out',
                                                -helix_width => 48,
                                                -helix_height => 125,
                                                -short_loop_limit => 10,
                                                -long_loop_limit => 35,
                                                -loop_width => 25,
                                                -colour_scheme => 'yellow',
                                                -labels => \%labels,
                                                -text_offset => -10);

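## print the .png file
my $output = 'test.png';
open(my $out_fh, '>', $output) or die "Cannot open $output: $!";
binmode $out_fh;
print {$out_fh} $im->png;
close $out_fh or die "Cannot close $output: $!";

## view the image (assumes ImageMagick's 'display' is installed)
system('display', $output) == 0
    or warn "Could not run display on $output: $?\n";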
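The `-topology` array above is a flat list of positions. Reading it as consecutive helix start/end pairs (an assumption based on the description of "transmembrane helix start/stop points"), a small check like this could validate the input before drawing. `topology_to_helices` is hypothetical, not part of DrawTransmembrane.pm:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of DrawTransmembrane.pm): split a flat list
# of helix boundaries (start1, end1, start2, end2, ...) into [start, end]
# pairs, after checking the list has an even length and strictly increases.
sub topology_to_helices {
    my @points = @_;
    die "Topology needs an even number of positions\n" if @points % 2;
    for my $i (1 .. $#points) {
        die "Positions must increase: $points[$i-1] then $points[$i]\n"
            unless $points[$i] > $points[$i-1];
    }
    my @helices;
    while (my ($start, $end) = splice(@points, 0, 2)) {
        push @helices, [$start, $end];
    }
    return @helices;
}

my @helices = topology_to_helices(20, 45, 59, 70, 86, 109, 145, 168, 194, 220);
printf "%d helices, the first spanning %d-%d\n", scalar @helices, @{ $helices[0] };
# prints: 5 helices, the first spanning 20-45
```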

-- 
Tim Nugent (MRes)
Research Student
Bioinformatics Unit
Department of Computer Science
University College London
Gower Street
London WC1E 6BT
Tel: 020-7679-0410
t.nugent at ucl.ac.uk


From bix at sendu.me.uk  Mon Feb 19 17:42:23 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 19 Feb 2007 17:42:23 +0000
Subject: [Bioperl-l] t/FeatureHolder.x
Message-ID: <45D9E17F.4030302@sendu.me.uk>

Is this supposed to work? It doesn't get run in the test suite normally 
because of its name.

With a live checkout I get:
./Build test --test_files t/FeatureHolder.x --verbose
t/FeatureHolder....1..6
ok 1
ok 2
Set group tag to: locus_tag
GROUPS:
   GROUP [?]:source

[snip]

   resolved pair Bio::SeqFeature::Generic=HASH(0x1375dc0) 
Bio::SeqFeature::Generic=HASH(0x1362830)
UNFLATTENING GROUP:
   GROUP [?]:gene
UNFLATTENING GROUP:
   GROUP [?]:repeat_region
UNFLATTENING GROUP:
   GROUP [?]:gene
UNFLATTENING GROUP:
   GROUP [?]:repeat_region
UNFLATTENING GROUP:
   GROUP [BG:DS07721.3]:gene mRNA CDS
UNFLATTENING GROUP:
   GROUP [BG:DS07721.6]:gene mRNA CDS

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: DUPLICATE ID: AAF53399.1
STACK: Error::throw
STACK: Bio::Root::Root::throw /home/sendu/src/bioperl/core/blib/lib/Bio/Root/Root.pm:359
STACK: Bio::SeqFeature::Tools::IDHandler::create_hierarchy_from_ParentIDs /home/sendu/src/bioperl/core/blib/lib/Bio/SeqFeature/Tools/IDHandler.pm:175
STACK: Bio::FeatureHolderI::create_hierarchy_from_ParentIDs /home/sendu/src/bioperl/core/blib/lib/Bio/FeatureHolderI.pm:245
STACK: t/FeatureHolder.x:68
-----------------------------------------------------------
dubious
         Test returned status 255 (wstat 65280, 0xff00)
DIED. FAILED tests 3-6
         Failed 4/6 tests, 33.33% okay
Failed Test       Stat Wstat Total Fail  List of Failed
-------------------------------------------------------------------------------
t/FeatureHolder.x  255 65280     6    8  3-6
Failed 1/1 test scripts. 4/6 subtests failed.
Files=1, Tests=6,  1 wallclock secs ( 0.55 cusr +  0.04 csys =  0.59 CPU)
Failed 1/1 test programs. 4/6 subtests failed.


It also fails quite differently with 1.5.2.


From cjfields at uiuc.edu  Mon Feb 19 20:04:20 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 19 Feb 2007 14:04:20 -0600
Subject: [Bioperl-l] t/FeatureHolder.x
In-Reply-To: <45D9E17F.4030302@sendu.me.uk>
References: <45D9E17F.4030302@sendu.me.uk>
Message-ID: <534215C5-11BB-48FA-B6BF-D8B3E2BFE20A@uiuc.edu>

Looks like that's some of Chris Mungall's stuff for GFF3.  Don't know  
if he's stalking the mail list.

Wonder if this has anything to do with the feature/annotation changes
around rel 1.5.

(the other) chris

On Feb 19, 2007, at 11:42 AM, Sendu Bala wrote:

> Is this supposed to work? It doesn't get run in the test suite
> normally because of its name.
>
> With a live checkout I get:
> ./Build test --test_files t/FeatureHolder.x --verbose
> t/FeatureHolder....1..6
...


From cjfields at uiuc.edu  Mon Feb 19 21:24:04 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 19 Feb 2007 15:24:04 -0600
Subject: [Bioperl-l] Module to draw transmembrane protein toplogoy
In-Reply-To: <45D9DE8C.2010301@cs.ucl.ac.uk>
References: <45D9DE8C.2010301@cs.ucl.ac.uk>
Message-ID: <29861210-1C15-4CFD-BA6F-1FF8EAAF8A62@uiuc.edu>

I think this is pretty nice!  We can add the code and test script to  
bugzilla and (if someone has time) try to see where it might fit in,  
though Bio::Graphics sounds like a good spot.

Anyone else have ideas on where this could go?

chris

On Feb 19, 2007, at 11:29 AM, Tim Nugent wrote:

> Hi everyone,
>
> I've written a perl module to display transmembrane protein topology
> using GD. There are various options, including labels, helix/loop
> dimensions, colour schemes etc but it only requires a string or array
> containing the protein topology (e.g. transmembrane helix start/stop
> points). It produces output like this:
>
> http://www.cs.ucl.ac.uk/staff/T.Nugent/images/blue.png
> http://www.cs.ucl.ac.uk/staff/T.Nugent/images/yellow.png
>
> using the code at the bottom.
>
> Here is the module:
> http://www.cs.ucl.ac.uk/staff/T.Nugent/source/DrawTransmembrane.pm
>
> I've never submitted anything to Bioperl before - is this sort of thing
> likely to be of use to others? I imagine it would sit alongside some of
> the Bio::Graphics stuff.
>
> Best wishes,
>
> Tim
>
> #!/usr/bin/perl
>
> use strict;
> use warnings;
> use lib '/scratch0/NOT_BACKED_UP/tnugent/perl5lib'; ## path to module
> use DrawTransmembrane;
>
> my @topology = (20,45,59,70,86,109,145,168,194,220);
>
> my %labels = ('5' => '5 - Sulphation Site',
>                '21' => '1st Helix',
>                '47' => '40 - Mutation',
>                '60' => 'Voltage Sensor',
>                '72' => '72 - Mutation 2',
>                '73' => '73 - Mutation 3',
>                '138' => '138 - Glycosylation Site',
>                '170' => '170 - Phosphorylation Site',
>                '200' => 'Last Helix');
>
> my $im = DrawTransmembrane->draw_transmembrane(-title => 'This is a cartoon displaying transmembrane helices.',
>                                                 -topology => \@topology,
>                                                 -n_terminal => 'out',
>                                                 -helix_width => 48,
>                                                 -helix_height => 125,
>                                                 -short_loop_limit => 10,
>                                                 -long_loop_limit => 35,
>                                                 -loop_width => 25,
>                                                 -colour_scheme => 'yellow',
>                                                 -labels => \%labels,
>                                                 -text_offset => -10);
>
> ## print the .png file
> my $output = 'test.png';
> open(OUTPUT, ">$output");
> binmode OUTPUT;
> print OUTPUT $im->png;
> close OUTPUT;
>
> my $system = `display $output`;
>
> -- 
> Tim Nugent (MRes)
> Research Student
> Bioinformatics Unit
> Department of Computer Science
> University College London
> Gower Street
> London WC1E 6BT
> Tel: 020-7679-0410
> t.nugent at ucl.ac.uk
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjm at fruitfly.org  Mon Feb 19 22:23:56 2007
From: cjm at fruitfly.org (Chris Mungall)
Date: Mon, 19 Feb 2007 14:23:56 -0800
Subject: [Bioperl-l] t/FeatureHolder.x
In-Reply-To: <534215C5-11BB-48FA-B6BF-D8B3E2BFE20A@uiuc.edu>
References: <45D9E17F.4030302@sendu.me.uk>
	<534215C5-11BB-48FA-B6BF-D8B3E2BFE20A@uiuc.edu>
Message-ID: <F54A07D7-B546-46E5-AFB2-54251ACE9182@fruitfly.org>


On Feb 19, 2007, at 12:04 PM, Chris Fields wrote:

> Looks like that's some of Chris Mungall's stuff for GFF3.  Don't know
> if he's stalking the mail list.

occasionally..

> Wonder if this has anything to do with the feature/annotation changes
> around rel 1.5.

possibly even before then.

there was a reason for the .x suffix... I think it was intended to
denote requirements; tests that don't pass yet but should in the future

anyway, this file can go

> (the other) chris
>
> On Feb 19, 2007, at 11:42 AM, Sendu Bala wrote:
>
>> Is this supposed to work? It doesn't get run in the test suite
>> normally because of its name.
>>
>> With a live checkout I get:
>> ./Build test --test_files t/FeatureHolder.x --verbose
>> t/FeatureHolder....1..6
> ...


From torsten.seemann at infotech.monash.edu.au  Mon Feb 19 23:20:48 2007
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Tue, 20 Feb 2007 10:20:48 +1100
Subject: [Bioperl-l] Bioperl Module OddCodes(help)
In-Reply-To: <29037001.1171716908969.JavaMail.ngmail@webmail18>
References: <29037001.1171716908969.JavaMail.ngmail@webmail18>
Message-ID: <a79f6a4b0702191520l55625d6dif027df04b9841587@mail.gmail.com>

> I want to translate a sequence in FASTA format into just acidic, basic and polar classes, depending on the pH.
> The OddCodes module can only do acidic, basic, polar and hydrophobic, and I think only at the default pH.
> Can somebody help me? I don't know whether it is possible.
> I need to classify each amino acid as positively charged, negatively charged or uncharged.

The latest released Bioperl 1.5.x has a charge() function which does
what you want:

http://doc.bioperl.org/releases/bioperl-1.5.2/Bio/Tools/OddCodes.html

It returns A, N, C for the charges.
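
For the pH-dependent part of the question, here is a minimal core-Perl sketch; it is not the OddCodes implementation. It assumes approximate textbook side-chain pKa values and calls a residue charged when the pH is past its pKa (more than half of the groups ionised); terminal amino/carboxyl groups are ignored. The output uses the A/N/C letters mentioned above, here assumed to mean A = negative, C = positive, N = neutral, and charge_code is a hypothetical helper name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Approximate side-chain pKa values (textbook figures; an assumption,
# real pKa values shift with the local protein environment).
my %ACIDIC_PKA = (D => 3.65, E => 4.25, C => 8.18, Y => 10.07);
my %BASIC_PKA  = (H => 6.00, K => 10.53, R => 12.48);

# charge_code($seq, $ph): hypothetical helper returning one letter per residue
#   A = side chain mostly negatively charged at this pH
#   C = side chain mostly positively charged at this pH
#   N = side chain mostly uncharged
# Terminal amino/carboxyl groups are ignored.
sub charge_code {
    my ($seq, $ph) = @_;
    my $code = '';
    for my $aa (split //, uc $seq) {
        if (exists $ACIDIC_PKA{$aa} && $ph > $ACIDIC_PKA{$aa}) {
            $code .= 'A';    # acidic group mostly deprotonated
        }
        elsif (exists $BASIC_PKA{$aa} && $ph < $BASIC_PKA{$aa}) {
            $code .= 'C';    # basic group mostly protonated
        }
        else {
            $code .= 'N';
        }
    }
    return $code;
}

# Histidine (pKa ~6) flips from positive to neutral between pH 5 and 7.
print charge_code('DEKRHACY', 5.0), "\n";    # AACCCNNN
print charge_code('DEKRHACY', 7.0), "\n";    # AACCNNNN
```

To process a FASTA file, the sequence strings read by Bio::SeqIO (via $seq->seq) could be passed to charge_code one at a time.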

--Torsten


From bix at sendu.me.uk  Tue Feb 20 11:18:14 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 20 Feb 2007 11:18:14 +0000
Subject: [Bioperl-l] Bio::Graphics::FeatureBase seq_id question
Message-ID: <45DAD8F6.1030409@sendu.me.uk>

Bio::Graphics::FeatureBase::seq_id is currently implemented as a 
read-only alias to ref():
sub seq_id          { shift->ref()         }


What is the reasoning behind this? Can it be made to handle setting of 
the value as well?:
sub seq_id          { shift->ref(@_)       }
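
The only difference between the two one-liners is whether the caller's arguments reach ref(). A minimal, self-contained sketch, using a hypothetical Demo::Feature class whose ref() is assumed to be a combined getter/setter (this is not the actual Bio::Graphics::FeatureBase code):

```perl
#!/usr/bin/perl
use strict;
use warnings;

package Demo::Feature;    # hypothetical stand-in class

sub new {
    my ($class, $value) = @_;
    return bless { _ref => $value }, $class;
}

# Combined getter/setter: sets the value when an argument is given.
sub ref {
    my $self = shift;
    $self->{_ref} = shift if @_;
    return $self->{_ref};
}

# Read-only alias: any argument passed in is silently dropped.
sub seq_id_readonly { shift->ref() }

# Read/write alias: arguments are forwarded, so seq_id('chr2') sets the value.
sub seq_id { shift->ref(@_) }

package main;

my $f = Demo::Feature->new('chr1');
$f->seq_id_readonly('chrX');    # argument ignored
print $f->seq_id, "\n";         # chr1
$f->seq_id('chr2');
print $f->seq_id, "\n";         # chr2
```

With the read-only form, a call like $f->seq_id('chr2') returns without error but does nothing, which is easy to miss.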


Cheers,
Sendu.


From cjfields at uiuc.edu  Tue Feb 20 13:39:11 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 20 Feb 2007 07:39:11 -0600
Subject: [Bioperl-l] t/FeatureHolder.x
In-Reply-To: <F54A07D7-B546-46E5-AFB2-54251ACE9182@fruitfly.org>
References: <45D9E17F.4030302@sendu.me.uk>
	<534215C5-11BB-48FA-B6BF-D8B3E2BFE20A@uiuc.edu>
	<F54A07D7-B546-46E5-AFB2-54251ACE9182@fruitfly.org>
Message-ID: <67E26F10-67D5-405E-A00E-826EF51C476F@uiuc.edu>


On Feb 19, 2007, at 4:23 PM, Chris Mungall wrote:

> On Feb 19, 2007, at 12:04 PM, Chris Fields wrote:
>
>> Looks like that's some of Chris Mungall's stuff for GFF3.  Don't know
>> if he's stalking the mail list.
>
> occasionally..
>
>> Wonder if this has anything to do with the feature/annotation changes
>> around rel 1.5.
>
> possibly even before then.
>
> there was a reason for the .x suffix... I think it was intended to
> denote requirements; tests that don't pass yet but should in the future
>
> anyway, this file can go

Chris,

I removed it from CVS.  Thanks!

(the other) chris besides chris D.

P.S. I may have some Data::Stag questions for you at some point.  I'm  
guessing you're still at fruitfly.org?


From cjfields at uiuc.edu  Tue Feb 20 13:29:20 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 20 Feb 2007 07:29:20 -0600
Subject: [Bioperl-l] Fwd: help on remote blast
References: <20070220073200.M42567@bic.boseinst.ernet.in>
Message-ID: <6CC54E14-0581-45AF-8F12-E500A2FFDE86@uiuc.edu>

Sanjib,

You shouldn't email the developers directly.  Questions like this  
should go to the bioperl mail list in case I (or others) can't answer  
them immediately.

chris

Begin forwarded message:

> From: "Sanjib Kumar Gupta" <sanjib at bic.boseinst.ernet.in>
> Date: February 20, 2007 1:32:00 AM CST
> To: cjfields at uiuc.edu
> Subject: help on remote blast
>
> Dear Dr. Chris
> I am a very new user of BioPerl, and have been using the script for
> retrieving some BLAST sequences. But suddenly it has stopped
> retrieving:
> #perl n9.pl
> te.pep
> waiting........
> and it stays like that for a long time.
>
> I am attaching the file. Can you please tell me what I should do so
> that it runs again?
>
>
> --
> Sanjib Kumar Gupta
> Bioinformatics Centre
> Bose Institute
> Kolkata 700054, INDIA
> Phone  : +91-33-2355 6626, 2816, 2355 4766
> Fax    : +91-33-2355 3886
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: n9.pl
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070220/02f96eab/attachment-0004.pl>
-------------- next part --------------

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From t.nugent at cs.ucl.ac.uk  Tue Feb 20 14:31:20 2007
From: t.nugent at cs.ucl.ac.uk (Tim Nugent)
Date: Tue, 20 Feb 2007 14:31:20 +0000
Subject: [Bioperl-l] Module to draw transmembrane protein topology
In-Reply-To: <29861210-1C15-4CFD-BA6F-1FF8EAAF8A62@uiuc.edu>
References: <45D9DE8C.2010301@cs.ucl.ac.uk>
	<29861210-1C15-4CFD-BA6F-1FF8EAAF8A62@uiuc.edu>
Message-ID: <45DB0638.1030001@cs.ucl.ac.uk>

Thanks Chris, glad it's appreciated.

Is there anything else I can do? If anyone has any requests/suggestions 
please let me know too.

Best wishes,

Tim

Chris Fields wrote:
> I think this is pretty nice!  We can add the code and test script to  
> bugzilla and (if someone has time) try to see where it might fit in,  
> though Bio::Graphics sounds like a good spot.
> 
> Anyone else have ideas on where this could go?
> 
> chris
> 
> On Feb 19, 2007, at 11:29 AM, Tim Nugent wrote:
> 
>> Hi everyone,
>>
>> I've written a perl module to display transmembrane protein topology
>> using GD. There are various options, including labels, helix/loop
>> dimensions, colour schemes etc but it only requires a string or array
>> containing the protein topology (e.g. transmembrane helix start/stop
>> points). It produces output like this:
>>
>> http://www.cs.ucl.ac.uk/staff/T.Nugent/images/blue.png
>> http://www.cs.ucl.ac.uk/staff/T.Nugent/images/yellow.png
>>
>> using the code at the bottom.
>>
>> Here is the module:
>> http://www.cs.ucl.ac.uk/staff/T.Nugent/source/DrawTransmembrane.pm
>>
>> I've never submitted anything to Bioperl before - is this sort of thing
>> likely to be of use to others? I imagine it would sit alongside some of
>> the Bio::Graphics stuff.
>>
>> Best wishes,
>>
>> Tim
>>
>> ...
> 
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
> 

-- 
Tim Nugent (MRes)
Research Student
Bioinformatics Unit
Department of Computer Science
University College London
Gower Street
London WC1E 6BT
Tel: 020-7679-0410
t.nugent at ucl.ac.uk


From marian.thieme at lycos.de  Tue Feb 20 13:34:24 2007
From: marian.thieme at lycos.de (marian thieme)
Date: Tue, 20 Feb 2007 13:34:24 +0000
Subject: [Bioperl-l] Alignment
Message-ID: <188661178021328@lycos-europe.com>

Hi all,

Perhaps somebody can comment on the following:

I have a series of sequences which should be aligned against a reference sequence.
In this special case we don't need to calculate anything; we only need to represent the sequences and extract, for instance, some columns of interest.
The problem is that some sequences have gaps, and we need to represent gaps in the reference sequence as well as in some individual sequences.

Question: Can I use LocatableSeq to describe sequences with gaps and add them to the alignment?
If yes, how should I understand the example in the docs:
use Bio::LocatableSeq;
my $seq = new Bio::LocatableSeq(-seq => "CAGT-GGT",-id  => "seq1", -start => 1,-end   => 7);

Does the "-" sign represent a gap? If this sequence starts at position 1,
why does it end at position 7, when there are 8 positions if you count the gap?
Can the SimpleAlign object handle the gap?


Thanks for your attention,
Marian


From cjfields at uiuc.edu  Tue Feb 20 14:40:38 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 20 Feb 2007 08:40:38 -0600
Subject: [Bioperl-l] Module to draw transmembrane protein topology
In-Reply-To: <45DB0638.1030001@cs.ucl.ac.uk>
References: <45D9DE8C.2010301@cs.ucl.ac.uk>
	<29861210-1C15-4CFD-BA6F-1FF8EAAF8A62@uiuc.edu>
	<45DB0638.1030001@cs.ucl.ac.uk>
Message-ID: <E1D718F1-E0FA-496B-9798-7EC84E2D4439@uiuc.edu>

You can add the module and test code (the script) to bugzilla:

http://www.bioperl.org/wiki/Bugs
http://bugzilla.open-bio.org/

Basically file a new bug report but note that it is an enhancement
request when filling it out.  Attach the code and test script to the
report after it is generated (note that it may be easier to add all  
of the files together as a zipped archive).  I think you could also  
add the graphical output as a binary file if they are huge files.

chris

On Feb 20, 2007, at 8:31 AM, Tim Nugent wrote:

> Thanks Chris, glad it's appreciated.
>
> Is there anything else I can do? If anyone has any requests/suggestions
> please let me know too.
>
> Best wishes,
>
> Tim
>
> Chris Fields wrote:
>> I think this is pretty nice!  We can add the code and test script
>> to bugzilla and (if someone has time) try to see where it might
>> fit in, though Bio::Graphics sounds like a good spot.
>> Anyone else have ideas on where this could go?
>> chris
>> On Feb 19, 2007, at 11:29 AM, Tim Nugent wrote:
>>> Hi everyone,
>>>
>>> I've written a perl module to display transmembrane protein topology
>>> using GD. There are various options, including labels, helix/loop
>>> dimensions, colour schemes etc but it only requires a string or array
>>> containing the protein topology (e.g. transmembrane helix start/stop
>>> points). It produces output like this:
>>>
>>> http://www.cs.ucl.ac.uk/staff/T.Nugent/images/blue.png
>>> http://www.cs.ucl.ac.uk/staff/T.Nugent/images/yellow.png
>>>
>>> using the code at the bottom.
>>>
>>> Here is the module:
>>> http://www.cs.ucl.ac.uk/staff/T.Nugent/source/DrawTransmembrane.pm
>>>
>>> I've never submitted anything to Bioperl before - is this sort of thing
>>> likely to be of use to others? I imagine it would sit alongside some of
>>> the Bio::Graphics stuff.
>>>
>>> Best wishes,
>>>
>>> Tim
>>>
>>> ...
>> Christopher Fields
>> Postdoctoral Researcher
>> Lab of Dr. Robert Switzer
>> Dept of Biochemistry
>> University of Illinois Urbana-Champaign
>
> -- 
> Tim Nugent (MRes)
> Research Student
> Bioinformatics Unit
> Department of Computer Science
> University College London
> Gower Street
> London WC1E 6BT
> Tel: 020-7679-0410
> t.nugent at ucl.ac.uk

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From avilella at gmail.com  Tue Feb 20 15:30:17 2007
From: avilella at gmail.com (Albert Vilella)
Date: Tue, 20 Feb 2007 15:30:17 +0000
Subject: [Bioperl-l] Alignment
In-Reply-To: <188661178021328@lycos-europe.com>
References: <188661178021328@lycos-europe.com>
Message-ID: <358f4d650702200730x5fd09bb8p7dec8ce740553378@mail.gmail.com>

I think the SimpleAlign object contains a set of sequences, each of
which is a LocatableSeq object.

These LocatableSeq objects will have gaps, represented by '-' or
whatever other symbol is specified (I think there are methods for it),
and then one can use methods like column_from_residue_number to map
the coordinates between the primary sequence and the aligned sequence.
The perldoc for LocatableSeq has some examples on how to use these
methods.
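
On the 1..7 question: gap characters occupy alignment columns but are not residues, so "CAGT-GGT" starting at 1 holds seven residues and ends at 7. A minimal core-Perl sketch of the residue-to-column mapping that column_from_residue_number performs on a Bio::LocatableSeq (an illustration, not BioPerl's implementation; the helper names and the assumed gap set '-' and '.' are hypothetical):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helpers illustrating LocatableSeq-style coordinates.
my $GAP_CHARS = '-.';    # assumed gap symbols

# End coordinate: start plus the number of non-gap residues, minus one.
sub end_coordinate {
    my ($aligned, $start) = @_;
    (my $residues = $aligned) =~ s/[\Q$GAP_CHARS\E]//g;
    return $start + length($residues) - 1;
}

# Alignment column (1-based) holding residue number $resnum, or undef.
sub column_for_residue {
    my ($aligned, $start, $resnum) = @_;
    my $pos = $start - 1;    # residue number of the last residue seen
    my $col = 0;
    for my $char (split //, $aligned) {
        $col++;
        next if index($GAP_CHARS, $char) >= 0;    # gaps advance the column only
        $pos++;
        return $col if $pos == $resnum;
    }
    return;    # residue number outside this sequence
}

my $aligned = 'CAGT-GGT';
print end_coordinate($aligned, 1), "\n";           # 7
print column_for_residue($aligned, 1, 5), "\n";    # 6 (the G after the gap)
```

In BioPerl itself you would build the LocatableSeq as in the docs and call its column_from_residue_number method rather than hand-rolling this.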

[Hopefully I haven't written any lies in this message],

Cheers,

    Albert.



From cjfields at uiuc.edu  Tue Feb 20 15:30:15 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 20 Feb 2007 09:30:15 -0600
Subject: [Bioperl-l] Module to draw transmembrane protein topology
In-Reply-To: <45DB0638.1030001@cs.ucl.ac.uk>
References: <45D9DE8C.2010301@cs.ucl.ac.uk>
	<29861210-1C15-4CFD-BA6F-1FF8EAAF8A62@uiuc.edu>
	<45DB0638.1030001@cs.ucl.ac.uk>
Message-ID: <4BC43404-FFB2-4350-81DF-812E31AD7CB3@uiuc.edu>

Sorry, I sent that last one off prematurely.

I could see this being used as a very useful utility if a Bioperl  
object had SeqFeatures which described transmembrane regions, or if  
output from something like TMHMM were parsed and used for input.   
Don't know if it's included, but if not you probably should allow  
labeling of the intracellular/extracellular space to designate  
periplasmic space, mitochondrial matrix, thylakoid, etc.

I think Bio::Graphics namespace is definitely the place to go.  If I  
ever get around to writing up the RNA structural stuff I may put  
something there myself.

chris

On Feb 20, 2007, at 8:31 AM, Tim Nugent wrote:

> Thanks Chris, glad it's appreciated.
>
> Is there anything else I can do? If anyone has any requests/suggestions
> please let me know too.
>
> Best wishes,
>
> Tim


From cjfields at uiuc.edu  Tue Feb 20 15:49:56 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 20 Feb 2007 09:49:56 -0600
Subject: [Bioperl-l] Alignment
In-Reply-To: <358f4d650702200730x5fd09bb8p7dec8ce740553378@mail.gmail.com>
References: <188661178021328@lycos-europe.com>
	<358f4d650702200730x5fd09bb8p7dec8ce740553378@mail.gmail.com>
Message-ID: <97E36074-1CF4-4348-85AB-DF23F1048727@uiuc.edu>


On Feb 20, 2007, at 9:30 AM, Albert Vilella wrote:

> I think the SimpleAlign object contains a set of sequences, each of
> which is a LocatableSeq object.
>
> These LocatableSeq objects will have gaps, represented by '-' or
> whatever other symbol is specified (I think there are methods for it),
> and then one can use methods like column_from_residue_number to map
> the coordinates between the primary sequence and the aligned sequence.
> The perldoc for LocatableSeq has some examples on how to use these
> methods.
>
> [Hopefully I haven't written any lies in this message],
>
> Cheers,
>
>     Albert.

No lies.  The comparison methods are in SimpleAlign; if you look in
SimpleAlign.t you'll see several demos on how to go about adding
LocatableSeqs to a SimpleAlign object and then use SimpleAlign
methods for them.

chris

PS (to marian): I'm a bit behind this week, so the bracket_strings  
stuff is lagging behind; I'm writing up some stuff on a deadline.


From t.nugent at cs.ucl.ac.uk  Tue Feb 20 15:50:10 2007
From: t.nugent at cs.ucl.ac.uk (Tim Nugent)
Date: Tue, 20 Feb 2007 15:50:10 +0000
Subject: [Bioperl-l] Module to draw transmembrane protein topology
In-Reply-To: <4BC43404-FFB2-4350-81DF-812E31AD7CB3@uiuc.edu>
References: <45D9DE8C.2010301@cs.ucl.ac.uk>
	<29861210-1C15-4CFD-BA6F-1FF8EAAF8A62@uiuc.edu>
	<45DB0638.1030001@cs.ucl.ac.uk>
	<4BC43404-FFB2-4350-81DF-812E31AD7CB3@uiuc.edu>
Message-ID: <45DB18B2.8070004@cs.ucl.ac.uk>

Labeling of inside/outside and membrane is already possible via the
-inside_label, -outside_label and -membrane_label tags; the defaults are
intracellular, extracellular and plasma membrane.

I was definitely going to add an input parser for MEMSAT, developed here
at UCL, and probably for a few other popular TM predictors too, e.g.
PHOBIUS, TMHMM etc. It can already accept topology in the string format
used by OPM (http://opm.phar.umich.edu/).

Tim


Chris Fields wrote:
> Sorry, I sent that last one off prematurely.
> 
> I could see this being used as a very useful utility if a Bioperl object 
> had SeqFeatures which described transmembrane regions, or if output from 
> something like TMHMM were parsed and used for input.  Don't know if it's 
> included, but if not you probably should allow labeling of the 
> intracellular/extracellular space to designate periplasmic space, 
> mitochondrial matrix, thylakoid, etc.
> 
> I think Bio::Graphics namespace is definitely the place to go.  If I 
> ever get around to writing up the RNA structural stuff I may put 
> something there myself.
> 
> chris
> 
> On Feb 20, 2007, at 8:31 AM, Tim Nugent wrote:
> 
>> Thanks Chris, glad it's appreciated.
>>
>> Is there anything else I can do? If anyone has any requests/suggestions
>> please let me know too.
>>
>> Best wishes,
>>
>> Tim
> 
> 

-- 
Tim Nugent (MRes)
Research Student
Bioinformatics Unit
Department of Computer Science
University College London
Gower Street
London WC1E 6BT
Tel: 020-7679-0410
t.nugent at ucl.ac.uk


From cjfields at uiuc.edu  Tue Feb 20 16:09:00 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 20 Feb 2007 10:09:00 -0600
Subject: [Bioperl-l] Module to draw transmembrane protein topology
In-Reply-To: <45DB18B2.8070004@cs.ucl.ac.uk>
References: <45D9DE8C.2010301@cs.ucl.ac.uk>
	<29861210-1C15-4CFD-BA6F-1FF8EAAF8A62@uiuc.edu>
	<45DB0638.1030001@cs.ucl.ac.uk>
	<4BC43404-FFB2-4350-81DF-812E31AD7CB3@uiuc.edu>
	<45DB18B2.8070004@cs.ucl.ac.uk>
Message-ID: <FF7B4076-FA5A-4F44-ADE7-A44D2FCF4599@uiuc.edu>


On Feb 20, 2007, at 9:50 AM, Tim Nugent wrote:

> Labeling of inside/outside and membrane is already possible via
> -inside_label, -outside_label and -membrane_label tags, defaults are
> intracellular, extracellular and plasma membrane.
>
> Was definitely going to add an input/parser for MEMSAT, developed  
> here at UCL, and probably a few other popular TM predictors too,  
> e.g. PHOBIUS, TMHMM etc. Can already accept topology in the string  
> format used by OPM (http://opm.phar.umich.edu/).
>
> Tim

I'll definitely have to take a closer look at it when I have time.
My guess is the best fit for the data would be SeqFeatures, either in a
collection or in a Bio::Seq.  As for the parsers, you can look at the
Bio::Tools::Tmhmm module, which scans TMHMM output and converts
everything to SeqFeatures.
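
As a sketch of that bridge, assuming the TM helices are available as start/end pairs (for example from $feature->start and $feature->end on SeqFeature objects), a hypothetical helper could flatten them into the sorted start/stop list that the draw_transmembrane example passes as -topology:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# helices_to_topology([start, end], ...): hypothetical helper that sorts helix
# ranges and flattens them into (start1, end1, start2, end2, ...).
sub helices_to_topology {
    my @helices = sort { $a->[0] <=> $b->[0] } @_;
    my @topology;
    my $prev_end = 0;
    for my $h (@helices) {
        my ($start, $end) = @$h;
        die "Helix $start-$end: start is after end\n" if $start > $end;
        die "Helix $start-$end overlaps the previous helix\n" if $start <= $prev_end;
        push @topology, $start, $end;
        $prev_end = $end;
    }
    return @topology;
}

# Unsorted ranges, as a predictor or feature list might return them.
my @topology = helices_to_topology([59, 70], [20, 45], [86, 109]);
print "@topology\n";    # 20 45 59 70 86 109
```

Validating order and overlaps up front keeps malformed predictor output from reaching the drawing code.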

chris


From lstein at cshl.edu  Tue Feb 20 17:25:24 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Tue, 20 Feb 2007 12:25:24 -0500
Subject: [Bioperl-l] Bio::Graphics::FeatureBase seq_id question
In-Reply-To: <45DAD8F6.1030409@sendu.me.uk>
References: <45DAD8F6.1030409@sendu.me.uk>
Message-ID: <6dce9a0b0702200925g74d2db53j3252cca8a41765b@mail.gmail.com>

Just an oversight. I'll fix it.

Lincoln

On 2/20/07, Sendu Bala <bix at sendu.me.uk> wrote:
>
> Bio::Graphics::FeatureBase::seq_id is currently implemented as a
> read-only alias to ref():
> sub seq_id          { shift->ref()         }
>
>
> What is the reasoning behind this? Can it be made to handle setting of
> the value as well?:
> sub seq_id          { shift->ref(@_)       }
>
>
> Cheers,
> Sendu.
>


-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu


From khan at cshl.edu  Tue Feb 20 20:42:12 2007
From: khan at cshl.edu (Khan, Sohail)
Date: Tue, 20 Feb 2007 15:42:12 -0500
Subject: [Bioperl-l] parsing a list of ids to a fasta file.
Message-ID: <C8696843AE995F4EA4CDC3E2B83482A9018791C1@mailbox02.cshl.edu>

Dear list,

I am new to BioPerl.  I have the following question:
I have a list of IDs, and I would like to look them up in a large FASTA file to retrieve the sequences for those IDs.  I would appreciate any suggestions.
Thanks.

Khan


From michael.watson at bbsrc.ac.uk  Tue Feb 20 21:33:19 2007
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Tue, 20 Feb 2007 21:33:19 -0000
Subject: [Bioperl-l] parsing a list of ids to a fasta file.
References: <C8696843AE995F4EA4CDC3E2B83482A9018791C1@mailbox02.cshl.edu>
Message-ID: <8975119BCD0AC5419D61A9CF1A923E95020680FD@iahce2ksrv1.iah.bbsrc.ac.uk>

Suggest you use Bio::Index::Fasta to create an index for the fasta file and then a simple script to retrieve sequences using that index.  Or just use the pre-written bp_index.PLS and bp_fetch.PLS scripts.
 
http://www.bioperl.org/wiki/Module:Bio::Index::Fasta
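
An index pays off when you fetch from the same large file repeatedly. For a one-off extraction, a single streaming pass over the file also works. The following is a minimal core-Perl sketch of that alternative. It is not a BioPerl API, and the script and file names are hypothetical. It assumes the ID file holds one ID per line and that a record's ID is the first whitespace-delimited word after '>'. Headers such as 'gi|46100068|gb|...' would need a different extraction rule.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Copy every FASTA record whose ID is a key of %$wanted from $in_fh to $out_fh.
# Assumption: the ID is the first whitespace-delimited word after '>'.
# Returns the number of records written.
sub filter_fasta {
    my ($in_fh, $out_fh, $wanted) = @_;
    my ($keep, $n) = (0, 0);
    while (my $line = <$in_fh>) {
        if ($line =~ /^>(\S+)/) {          # header line starts a new record
            $keep = exists $wanted->{$1} ? 1 : 0;
            $n++ if $keep;
        }
        print {$out_fh} $line if $keep;    # header and sequence lines
    }
    return $n;
}

# Hypothetical usage: perl filter_ids.pl ids.txt big.fasta > subset.fasta
if (@ARGV == 2) {
    my ($id_file, $fasta_file) = @ARGV;
    open my $ids, '<', $id_file or die "Cannot open $id_file: $!\n";
    my %wanted;
    while (my $id = <$ids>) {
        $id =~ s/^\s+|\s+$//g;             # trim newline and stray spaces
        $wanted{$id} = 1 if length $id;
    }
    close $ids;
    open my $fa, '<', $fasta_file or die "Cannot open $fasta_file: $!\n";
    my $found = filter_fasta($fa, \*STDOUT, \%wanted);
    close $fa;
    warn "Wrote $found records for ", scalar(keys %wanted), " requested IDs\n";
}
```

Only the ID set is held in memory, so the FASTA file can be arbitrarily large. Each run still reads the whole file, though. That cost is what Bio::Index::Fasta avoids.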

________________________________

From: bioperl-l-bounces at lists.open-bio.org on behalf of Khan, Sohail
Sent: Tue 20/02/2007 8:42 PM
To: Bioperl-l at lists.open-bio.org
Subject: [Bioperl-l] parsing a list of ids to a fasta file.


Dear list,

I am new to BioPerl.  I have the following question:
I have a list of IDs, and I would like to retrieve the corresponding sequences from a large FASTA file.  I would appreciate any suggestions.
Thanks.

Khan


_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l


From neetisomaiya at gmail.com  Wed Feb 21 08:19:14 2007
From: neetisomaiya at gmail.com (neeti somaiya)
Date: Wed, 21 Feb 2007 13:49:14 +0530
Subject: [Bioperl-l] need help in Bio-SCF
Message-ID: <764978cf0702210019ncfa2b17l8d57088eb5ae278a@mail.gmail.com>

Hi All,

I downloaded the module
Bio-SCF-1.01 <http://search.cpan.org/%7Elds/Bio-SCF-1.01/> from CPAN.
When I tried to install it, I got the following error. Can someone
please guide me?

[root at ps2288 Bio-SCF-1.01]# perl Makefile.PL
Checking if your kit is complete...
Looks good
Note (probably harmless): No library found for -lread
Writing Makefile for Bio::SCF

[root at ps2288 Bio-SCF-1.01]# make
cp SCF.pm blib/lib/Bio/SCF.pm
cp SCF/Arrays.pm blib/lib/Bio/SCF/Arrays.pm
/usr/bin/perl /usr/lib/perl5/5.8.5/ExtUtils/xsubpp  -typemap
/usr/lib/perl5/5.8.5/ExtUtils/typemap  SCF.xs > SCF.xsc && mv SCF.xsc SCF.c
Please specify prototyping behavior for SCF.xs (see perlxs manual)
gcc -c   -D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBUGGING
-fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE
-D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm -O2 -g -pipe -m32 -march=i386
-mtune=pentium4   -DVERSION=\"1.01\" -DXS_VERSION=\"1.01\" -fPIC
"-I/usr/lib/perl5/5.8.5/i386-linux-thread-multi/CORE"  -DLITTLE_ENDIAN SCF.c
SCF.xs:12:24: io_lib/scf.h: No such file or directory
SCF.xs:13:26: io_lib/mFILE.h: No such file or directory
SCF.xs: In function `XS_Bio__SCF_get_scf_pointer':
SCF.xs:27: error: `Scf' undeclared (first use in this function)
SCF.xs:27: error: (Each undeclared identifier is reported only once
SCF.xs:27: error: for each function it appears in.)
SCF.xs:27: error: `scf_data' undeclared (first use in this function)
SCF.xs: In function `XS_Bio__SCF_get_scf_fpointer':
SCF.xs:66: error: `Scf' undeclared (first use in this function)
SCF.xs:66: error: `scf_data' undeclared (first use in this function)
SCF.xs:68: error: `mFILE' undeclared (first use in this function)
SCF.xs:68: error: `mf' undeclared (first use in this function)
SCF.xs: In function `XS_Bio__SCF_scf_free':
SCF.xs:89: error: `Scf' undeclared (first use in this function)
SCF.xs:89: error: syntax error before ')' token
SCF.xs: In function `XS_Bio__SCF_get_comments':
SCF.xs:95: error: `Scf' undeclared (first use in this function)
SCF.xs:95: error: `scf_data' undeclared (first use in this function)
SCF.xs:95: error: syntax error before ')' token
SCF.xs: In function `XS_Bio__SCF_set_comments':
SCF.xs:108: error: `Scf' undeclared (first use in this function)
SCF.xs:108: error: `scf_data' undeclared (first use in this function)
SCF.xs:108: error: syntax error before ')' token
SCF.xs: In function `XS_Bio__SCF_scf_write':
SCF.xs:121: error: `Scf' undeclared (first use in this function)
SCF.xs:121: error: `scf_data' undeclared (first use in this function)
SCF.xs:121: error: syntax error before ')' token
SCF.xs: In function `XS_Bio__SCF_scf_fwrite':
SCF.xs:135: error: `mFILE' undeclared (first use in this function)
SCF.xs:135: error: `mf' undeclared (first use in this function)
SCF.xs:137: error: `Scf' undeclared (first use in this function)
SCF.xs:137: error: `scf_data' undeclared (first use in this function)
SCF.xs:137: error: syntax error before ')' token
SCF.xs: In function `XS_Bio__SCF_get_from_header':
SCF.xs:159: error: `Scf' undeclared (first use in this function)
SCF.xs:159: error: `scf_data' undeclared (first use in this function)
SCF.xs:159: error: syntax error before ')' token
SCF.xs: In function `XS_Bio__SCF_get_at':
SCF.xs:186: error: `Scf' undeclared (first use in this function)
SCF.xs:186: error: `scf_data' undeclared (first use in this function)
SCF.xs:186: error: syntax error before ')' token
SCF.xs: In function `XS_Bio__SCF_set_base_at':
SCF.xs:242: error: `Scf' undeclared (first use in this function)
SCF.xs:242: error: `scf_data' undeclared (first use in this function)
SCF.xs:242: error: syntax error before ')' token
SCF.xs: In function `XS_Bio__SCF_set_at':
SCF.xs:255: error: `Scf' undeclared (first use in this function)
SCF.xs:255: error: `scf_data' undeclared (first use in this function)
SCF.xs:255: error: syntax error before ')' token
make: *** [SCF.o] Error 1


-- 
-Neeti
Even my blood says, B positive


From sdavis2 at mail.nih.gov  Wed Feb 21 11:17:50 2007
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Wed, 21 Feb 2007 06:17:50 -0500
Subject: [Bioperl-l] need help in Bio-SCF
In-Reply-To: <764978cf0702210019ncfa2b17l8d57088eb5ae278a@mail.gmail.com>
References: <764978cf0702210019ncfa2b17l8d57088eb5ae278a@mail.gmail.com>
Message-ID: <200702210617.50616.sdavis2@mail.nih.gov>

On Wednesday 21 February 2007 03:19, neeti somaiya wrote:
> Hi All,
>
> I downloaded module
> Bio-SCF-1.01<http://search.cpan.org/%7Elds/Bio-SCF-1.01/>from CPAN.
> And I am trying to install it when I got the following error. Can someone
> please guide me.

You will probably need to read the INSTALL document.  You need to install a 
couple of libraries first.  It looks like you don't have the Staden io_lib 
installed.


> [root at ps2288 Bio-SCF-1.01]# perl Makefile.PL
> Checking if your kit is complete...
> Looks good
> Note (probably harmless): No library found for -lread
> Writing Makefile for Bio::SCF
>
> [root at ps2288 Bio-SCF-1.01]# make
> cp SCF.pm blib/lib/Bio/SCF.pm
> cp SCF/Arrays.pm blib/lib/Bio/SCF/Arrays.pm
> /usr/bin/perl /usr/lib/perl5/5.8.5/ExtUtils/xsubpp  -typemap
> /usr/lib/perl5/5.8.5/ExtUtils/typemap  SCF.xs > SCF.xsc && mv SCF.xsc SCF.c
> Please specify prototyping behavior for SCF.xs (see perlxs manual)
> gcc -c   -D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBUGGING
> -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE
> -D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm -O2 -g -pipe -m32 -march=i386
> -mtune=pentium4   -DVERSION=\"1.01\" -DXS_VERSION=\"1.01\" -fPIC
> "-I/usr/lib/perl5/5.8.5/i386-linux-thread-multi/CORE"  -DLITTLE_ENDIAN
> SCF.c SCF.xs:12:24: io_lib/scf.h: No such file or directory
> SCF.xs:13:26: io_lib/mFILE.h: No such file or directory
> SCF.xs: In function `XS_Bio__SCF_get_scf_pointer':
> SCF.xs:27: error: `Scf' undeclared (first use in this function)
> SCF.xs:27: error: (Each undeclared identifier is reported only once
> SCF.xs:27: error: for each function it appears in.)
> SCF.xs:27: error: `scf_data' undeclared (first use in this function)
> SCF.xs: In function `XS_Bio__SCF_get_scf_fpointer':
> SCF.xs:66: error: `Scf' undeclared (first use in this function)
> SCF.xs:66: error: `scf_data' undeclared (first use in this function)
> SCF.xs:68: error: `mFILE' undeclared (first use in this function)
> SCF.xs:68: error: `mf' undeclared (first use in this function)
> SCF.xs: In function `XS_Bio__SCF_scf_free':
> SCF.xs:89: error: `Scf' undeclared (first use in this function)
> SCF.xs:89: error: syntax error before ')' token
> SCF.xs: In function `XS_Bio__SCF_get_comments':
> SCF.xs:95: error: `Scf' undeclared (first use in this function)
> SCF.xs:95: error: `scf_data' undeclared (first use in this function)
> SCF.xs:95: error: syntax error before ')' token
> SCF.xs: In function `XS_Bio__SCF_set_comments':
> SCF.xs:108: error: `Scf' undeclared (first use in this function)
> SCF.xs:108: error: `scf_data' undeclared (first use in this function)
> SCF.xs:108: error: syntax error before ')' token
> SCF.xs: In function `XS_Bio__SCF_scf_write':
> SCF.xs:121: error: `Scf' undeclared (first use in this function)
> SCF.xs:121: error: `scf_data' undeclared (first use in this function)
> SCF.xs:121: error: syntax error before ')' token
> SCF.xs: In function `XS_Bio__SCF_scf_fwrite':
> SCF.xs:135: error: `mFILE' undeclared (first use in this function)
> SCF.xs:135: error: `mf' undeclared (first use in this function)
> SCF.xs:137: error: `Scf' undeclared (first use in this function)
> SCF.xs:137: error: `scf_data' undeclared (first use in this function)
> SCF.xs:137: error: syntax error before ')' token
> SCF.xs: In function `XS_Bio__SCF_get_from_header':
> SCF.xs:159: error: `Scf' undeclared (first use in this function)
> SCF.xs:159: error: `scf_data' undeclared (first use in this function)
> SCF.xs:159: error: syntax error before ')' token
> SCF.xs: In function `XS_Bio__SCF_get_at':
> SCF.xs:186: error: `Scf' undeclared (first use in this function)
> SCF.xs:186: error: `scf_data' undeclared (first use in this function)
> SCF.xs:186: error: syntax error before ')' token
> SCF.xs: In function `XS_Bio__SCF_set_base_at':
> SCF.xs:242: error: `Scf' undeclared (first use in this function)
> SCF.xs:242: error: `scf_data' undeclared (first use in this function)
> SCF.xs:242: error: syntax error before ')' token
> SCF.xs: In function `XS_Bio__SCF_set_at':
> SCF.xs:255: error: `Scf' undeclared (first use in this function)
> SCF.xs:255: error: `scf_data' undeclared (first use in this function)
> SCF.xs:255: error: syntax error before ')' token
> make: *** [SCF.o] Error 1
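
Every gcc error in the log follows from the first two lines. The compiler cannot find io_lib/scf.h and io_lib/mFILE.h, so the io_lib types (Scf, mFILE) are all undeclared. One way to confirm that diagnosis before re-running make is a small core-Perl check like the hypothetical script below. Its list of include directories is an assumption and should match where io_lib was, or will be, installed. It only looks for the headers. It does not check for the libread library that Makefile.PL also reported as missing.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Return the headers from @headers that exist under none of the directories in @$dirs.
sub missing_headers {
    my ($dirs, @headers) = @_;
    my @missing;
    HEADER: for my $h (@headers) {
        for my $d (@$dirs) {
            # Found in this include dir: move on to the next header.
            next HEADER if -f File::Spec->catfile($d, split m{/}, $h);
        }
        push @missing, $h;
    }
    return @missing;
}

# The two headers gcc could not find. The search path is an assumption:
# extra dirs given on the command line plus two common defaults.
my @dirs    = (@ARGV, '/usr/include', '/usr/local/include');
my @missing = missing_headers(\@dirs, 'io_lib/scf.h', 'io_lib/mFILE.h');
if (@missing) {
    print "Not found: @missing\n";
    print "Install Staden io_lib (see the Bio-SCF INSTALL file) first.\n";
} else {
    print "Found the io_lib headers in one of: @dirs\n";
}
```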


From cjfields at uiuc.edu  Wed Feb 21 12:08:57 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 21 Feb 2007 06:08:57 -0600
Subject: [Bioperl-l] need help in Bio-SCF
In-Reply-To: <200702210617.50616.sdavis2@mail.nih.gov>
References: <764978cf0702210019ncfa2b17l8d57088eb5ae278a@mail.gmail.com>
	<200702210617.50616.sdavis2@mail.nih.gov>
Message-ID: <40C288FE-C74C-4B3F-A835-1A5C563B2B8E@uiuc.edu>


On Feb 21, 2007, at 5:17 AM, Sean Davis wrote:

> On Wednesday 21 February 2007 03:19, neeti somaiya wrote:
>> Hi All,
>>
>> I downloaded module
>> Bio-SCF-1.01<http://search.cpan.org/%7Elds/Bio-SCF-1.01/>from CPAN.
>> And I am trying to install it when I got the following error. Can  
>> someone
>> please guide me.
>
> You will probably need to read the INSTALL document.  You need to  
> install a
> couple of libraries first.  Looks like you don't have the staden io- 
> lib
> installed.

Just to note, this module isn't part of BioPerl (I don't even think  
it has a Bioperl interface).  You'll probably need to contact Lincoln  
for details on using this module.

One thing you may run into is errors with the version of io_lib  
installed (a problem I've encountered with bioperl-ext), probably  
from API changes.  If you run into problems with newer versions of  
io_lib you should try downgrading to io_lib 1.8.11 or 1.8.12.


From neetisomaiya at gmail.com  Wed Feb 21 12:25:26 2007
From: neetisomaiya at gmail.com (neeti somaiya)
Date: Wed, 21 Feb 2007 17:55:26 +0530
Subject: [Bioperl-l] need help in Bio-SCF
In-Reply-To: <200702210617.50616.sdavis2@mail.nih.gov>
References: <764978cf0702210019ncfa2b17l8d57088eb5ae278a@mail.gmail.com>
	<200702210617.50616.sdavis2@mail.nih.gov>
Message-ID: <764978cf0702210425j544330bbr12c86a89960dbb66@mail.gmail.com>

Thanks. It resolved my problem.

On 2/21/07, Sean Davis <sdavis2 at mail.nih.gov> wrote:
>
> On Wednesday 21 February 2007 03:19, neeti somaiya wrote:
> > Hi All,
> >
> > I downloaded module
> > Bio-SCF-1.01<http://search.cpan.org/%7Elds/Bio-SCF-1.01/>from CPAN.
> > And I am trying to install it when I got the following error. Can
> someone
> > please guide me.
>
> You will probably need to read the INSTALL document.  You need to install
> a
> couple of libraries first.  Looks like you don't have the staden io-lib
> installed.
>
>
> > [root at ps2288 Bio-SCF-1.01]# perl Makefile.PL
> > Checking if your kit is complete...
> > Looks good
> > Note (probably harmless): No library found for -lread
> > Writing Makefile for Bio::SCF
> >
> > [root at ps2288 Bio-SCF-1.01]# make
> > cp SCF.pm blib/lib/Bio/SCF.pm
> > cp SCF/Arrays.pm blib/lib/Bio/SCF/Arrays.pm
> > /usr/bin/perl /usr/lib/perl5/5.8.5/ExtUtils/xsubpp  -typemap
> > /usr/lib/perl5/5.8.5/ExtUtils/typemap  SCF.xs > SCF.xsc && mv SCF.xsc
> SCF.c
> > Please specify prototyping behavior for SCF.xs (see perlxs manual)
> > gcc -c   -D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBUGGING
> > -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE
> > -D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm -O2 -g -pipe -m32 -march=i386
> > -mtune=pentium4   -DVERSION=\"1.01\" -DXS_VERSION=\"1.01\" -fPIC
> > "-I/usr/lib/perl5/5.8.5/i386-linux-thread-multi/CORE"  -DLITTLE_ENDIAN
> > SCF.c SCF.xs:12:24: io_lib/scf.h: No such file or directory
> > SCF.xs:13:26: io_lib/mFILE.h: No such file or directory
> > SCF.xs: In function `XS_Bio__SCF_get_scf_pointer':
> > SCF.xs:27: error: `Scf' undeclared (first use in this function)
> > SCF.xs:27: error: (Each undeclared identifier is reported only once
> > SCF.xs:27: error: for each function it appears in.)
> > SCF.xs:27: error: `scf_data' undeclared (first use in this function)
> > SCF.xs: In function `XS_Bio__SCF_get_scf_fpointer':
> > SCF.xs:66: error: `Scf' undeclared (first use in this function)
> > SCF.xs:66: error: `scf_data' undeclared (first use in this function)
> > SCF.xs:68: error: `mFILE' undeclared (first use in this function)
> > SCF.xs:68: error: `mf' undeclared (first use in this function)
> > SCF.xs: In function `XS_Bio__SCF_scf_free':
> > SCF.xs:89: error: `Scf' undeclared (first use in this function)
> > SCF.xs:89: error: syntax error before ')' token
> > SCF.xs: In function `XS_Bio__SCF_get_comments':
> > SCF.xs:95: error: `Scf' undeclared (first use in this function)
> > SCF.xs:95: error: `scf_data' undeclared (first use in this function)
> > SCF.xs:95: error: syntax error before ')' token
> > SCF.xs: In function `XS_Bio__SCF_set_comments':
> > SCF.xs:108: error: `Scf' undeclared (first use in this function)
> > SCF.xs:108: error: `scf_data' undeclared (first use in this function)
> > SCF.xs:108: error: syntax error before ')' token
> > SCF.xs: In function `XS_Bio__SCF_scf_write':
> > SCF.xs:121: error: `Scf' undeclared (first use in this function)
> > SCF.xs:121: error: `scf_data' undeclared (first use in this function)
> > SCF.xs:121: error: syntax error before ')' token
> > SCF.xs: In function `XS_Bio__SCF_scf_fwrite':
> > SCF.xs:135: error: `mFILE' undeclared (first use in this function)
> > SCF.xs:135: error: `mf' undeclared (first use in this function)
> > SCF.xs:137: error: `Scf' undeclared (first use in this function)
> > SCF.xs:137: error: `scf_data' undeclared (first use in this function)
> > SCF.xs:137: error: syntax error before ')' token
> > SCF.xs: In function `XS_Bio__SCF_get_from_header':
> > SCF.xs:159: error: `Scf' undeclared (first use in this function)
> > SCF.xs:159: error: `scf_data' undeclared (first use in this function)
> > SCF.xs:159: error: syntax error before ')' token
> > SCF.xs: In function `XS_Bio__SCF_get_at':
> > SCF.xs:186: error: `Scf' undeclared (first use in this function)
> > SCF.xs:186: error: `scf_data' undeclared (first use in this function)
> > SCF.xs:186: error: syntax error before ')' token
> > SCF.xs: In function `XS_Bio__SCF_set_base_at':
> > SCF.xs:242: error: `Scf' undeclared (first use in this function)
> > SCF.xs:242: error: `scf_data' undeclared (first use in this function)
> > SCF.xs:242: error: syntax error before ')' token
> > SCF.xs: In function `XS_Bio__SCF_set_at':
> > SCF.xs:255: error: `Scf' undeclared (first use in this function)
> > SCF.xs:255: error: `scf_data' undeclared (first use in this function)
> > SCF.xs:255: error: syntax error before ')' token
> > make: *** [SCF.o] Error 1
>


-- 
-Neeti
Even my blood says, B positive


From jay at jays.net  Wed Feb 21 00:27:01 2007
From: jay at jays.net (Jay Hannah)
Date: Tue, 20 Feb 2007 18:27:01 -0600
Subject: [Bioperl-l] Alignment
In-Reply-To: <358f4d650702200730x5fd09bb8p7dec8ce740553378@mail.gmail.com>
References: <188661178021328@lycos-europe.com>
	<358f4d650702200730x5fd09bb8p7dec8ce740553378@mail.gmail.com>
Message-ID: <cdcd3b0549caf3cf9818bdf3d14f0796@jays.net>

> On 2/20/07, marian thieme <marian.thieme at lycos.de> wrote:
>> I have a series of sequences which should be aligned against a 
>> reference sequence.
>> In this special case we don't need to calculate anything, we only need 
>> to represent the sequences and get for instance some columns of 
>> interest.
>> The problem now is that some sequences have gaps and we need to 
>> represent gaps in the reference sequence as well as in some 
>> individual sequences.

On Feb 20, 2007, at 9:30 AM, Albert Vilella wrote:
> I think the SimpleAlign object contains a set of sequences, each of
> which is a LocatableSeq object.

Fascinating. In my BLAST-centric universe I went and rolled my own 
solution for SeqLab where I hold onto the Bio::Seq from the reference 
sequences and then hold onto the Bio::Search::HSP::GenericHSP objects 
for all my BLAST hits. From that dataset I can write whatever reports I 
want and/or perform any subsequent actions. I wonder if I should have 
done that differently...

What typically creates .pfam files?

j
seqlab.net
http://www.bioperl.org/wiki/User:Jhannah


From cjfields at uiuc.edu  Wed Feb 21 13:36:02 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 21 Feb 2007 07:36:02 -0600
Subject: [Bioperl-l] Alignment
In-Reply-To: <cdcd3b0549caf3cf9818bdf3d14f0796@jays.net>
References: <188661178021328@lycos-europe.com>
	<358f4d650702200730x5fd09bb8p7dec8ce740553378@mail.gmail.com>
	<cdcd3b0549caf3cf9818bdf3d14f0796@jays.net>
Message-ID: <2233F0EE-94FE-42F0-B8E5-1BE14A25C0D4@uiuc.edu>


On Feb 20, 2007, at 6:27 PM, Jay Hannah wrote:
...
> What typically creates .pfam files?

Pfam alignments come in two formats (pfam and stockholm) that can  
both be parsed into SimpleAlign objects via Bio::AlignIO:

use strict;
use warnings;
use Bio::AlignIO;

my $alnin = Bio::AlignIO->new(-format => 'stockholm',
                               -file   => 'dho.sto');

while (my $aln = $alnin->next_aln) {
    # do stuff to $aln, a Bio::SimpleAlign object
}

Personally I stick with Stockholm as it's a richer format (with  
annotations and so on), but the parser was rewritten recently (by  
moi!) so may have some bugs still.

I'm a bit confused as to what you do with BLAST files.  You can  
generate a SimpleAlign right from the HSP for most SearchIO parsers:

http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_the_methods

chris


From sanjib at bic.boseinst.ernet.in  Wed Feb 21 06:12:06 2007
From: sanjib at bic.boseinst.ernet.in (Sanjib Kumar Gupta)
Date: Wed, 21 Feb 2007 11:42:06 +0530
Subject: [Bioperl-l] help on remote blast
In-Reply-To: <20070220073200.M42567@bic.boseinst.ernet.in>
References: <20070220073200.M42567@bic.boseinst.ernet.in>
Message-ID: <20070221061206.M37845@bic.boseinst.ernet.in>

Hi,
I have been running this script for some time and it was working fine. I am
using a Linux machine with a public IP (no proxy), but suddenly it has stopped
working with these errors:


waiting...waiting...
-------------------- WARNING ---------------------
MSG: <HTML>
<HEAD><TITLE>An Error Occurred</TITLE></HEAD>
<BODY>
<H1>An Error Occurred</H1>
500 Can't connect to www.ncbi.nlm.nih.gov:80 (Bad 
hostname 'www.ncbi.nlm.nih.gov')
</BODY>
</HTML>
 
---------------------------------------------------
xx.pep
 
-------------------- WARNING ---------------------
MSG: req was POST http://www.ncbi.nlm.nih.gov/blast/Blast.cgi
User-Agent: bioperl-Bio_Tools_Run_RemoteBlast/1.5
Content-Length: 497
Content-Type: application/x-www-form-urlencoded
 
DATABASE=nr&QUERY=%3ENM_005277.3%3B_1414961_1+837%
0AMEENMEEGQTQKGCFECCIKCLGGIPYASLIATILLYAGVALFCGCGHEALSGTVNILQTYFEMARTAGDTLDVF
TMIDIFKYVIYGIAAAFFVYGILLMVEGFFTTGAIKDLYGDFKITTCGRCVSAWFIMLTYLFMLAWLGVTAFTSLPV
YMYFNLWTICRNTTLVEGANLCLDLRQFGIVTIGEEKKICTVSENFLRMCESTELNMTFHLFIVALAGAGAAVIAMV
HYLMVLSANWAYVKDACRMQKYEDIKSKEEQELHDIHSTRSKERLNAYT*&COMPOSITION_BASED_STATISTI
CS=off&EXPECT=1e-
10&SERVICE=plain&FORMAT_OBJECT=Alignment&CMD=Put&MATRIX_NAME=BLOSUM62&ENTREZ_
QUERY=Xenopus+laevis[Organism]&PROGRAM=blastp
 
<HTML>
<HEAD><TITLE>An Error Occurred</TITLE></HEAD>
<BODY>
<H1>An Error Occurred</H1>
500 Can't connect to www.ncbi.nlm.nih.gov:80 (Bad 
hostname 'www.ncbi.nlm.nih.gov')
</BODY>
</HTML>
 
---------------------------------------------------
waiting...waiting...
-------------------- WARNING ---------------------
MSG: <HTML>
<HEAD><TITLE>An Error Occurred</TITLE></HEAD>
<BODY>
<H1>An Error Occurred</H1>
500 Internal Server Error
</BODY>
</HTML>
 
---------------------------------------------------

I am able to see the NCBI page from a browser, but I am unable to ping or
traceroute to the server.

Please help me.
--
Sanjib Kumar Gupta
Bioinformatics Centre
Bose Institute
Kolkata 700054, INDIA
Phone  : +91-33-2355 6626, 2816, 2355 4766
Fax    : +91-33-2355 3886
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: n9.pl
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070221/5a3382d6/attachment-0004.pl>
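A quick way to narrow this down, assuming the "Bad hostname" text comes from IO::Socket::INET (which LWP used for its connections at the time) failing to resolve the name, is to check whether Perl on the same machine can resolve www.ncbi.nlm.nih.gov at all. The sketch below uses only the core Socket module; the script name check_host.pl and the resolve_host() helper are hypothetical. If the name does not resolve here while a browser still loads the page, that would suggest the browser is reaching NCBI through a different resolver or proxy path than Perl.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Socket qw(inet_aton inet_ntoa);

# Resolve a host name with inet_aton(), which goes through the system
# resolver. Returns the dotted-quad address, or undef if it fails.
sub resolve_host {
    my ($host) = @_;
    my $packed = inet_aton($host);
    return defined $packed ? inet_ntoa($packed) : undef;
}

# Usage: perl check_host.pl www.ncbi.nlm.nih.gov
if (@ARGV) {
    my $host = $ARGV[0];
    my $addr = resolve_host($host);
    if (defined $addr) {
        print "$host resolves to $addr\n";
    }
    else {
        print "$host could not be resolved from this machine\n";
    }
}
```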

From granjeau at tagc.univ-mrs.fr  Wed Feb 21 13:50:39 2007
From: granjeau at tagc.univ-mrs.fr (Samuel GRANJEAUD - IR/IFR137)
Date: Wed, 21 Feb 2007 14:50:39 +0100
Subject: [Bioperl-l] Adding empty member list in Bio::Cluster::SequenceFamily
Message-ID: <45DC4E2F.4060804@tagc.univ-mrs.fr>

Hello!

Not clear to me, but I found a workaround by checking for an empty list
before adding. Here is what I noticed. Adding an empty list () as members
is not the same as adding a reference to an empty list [], of course,
but the two could be thought to be the same. Calling get_members, in the
second case I got a list of 0 members, but in the first case I got 1
member, which is not an object at all. I am warned now, but maybe the
documentation should emphasize calling it with a reference.

Best regards,
--Samuel


use strict;
use warnings;
use Bio::Cluster::SequenceFamily;

my $f = Bio::Cluster::SequenceFamily->new( -id => 'aa' );
$f->add_members( () );
print scalar($f->get_members()), "\n";
# 1
my $g = Bio::Cluster::SequenceFamily->new( -id => 'aa' );
$g->add_members( [] );
print scalar($g->get_members()), "\n";
# 0
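The cause inside Bio::Cluster::SequenceFamily isn't shown in this thread, but the difference starts with how Perl passes arguments: add_members(()) passes no arguments at all, because the empty list flattens away, while add_members([]) passes exactly one argument, a reference to an empty array. Below is a minimal sketch of argument handling that treats both the same way; normalise_members() is a hypothetical helper, not BioPerl's implementation.

```perl
use strict;
use warnings;

# Hypothetical helper, not BioPerl code: normalise the arguments of a
# method that accepts either a list of members or one array reference.
#   f(())  passes no arguments at all (the empty list flattens away)
#   f([])  passes exactly one argument, a reference to an empty array
sub normalise_members {
    my @args = @_;
    # A lone array reference is unpacked; anything else is taken as a list.
    my @members = (@args == 1 && ref $args[0] eq 'ARRAY') ? @{ $args[0] } : @args;
    # Skip undefined values so nothing that is not a member gets stored.
    return grep { defined } @members;
}

my @none_list = normalise_members(());
my @none_ref  = normalise_members([]);
my @two       = normalise_members([qw(seq1 seq2)]);
print scalar(@none_list), ' ', scalar(@none_ref), ' ', scalar(@two), "\n";   # prints "0 0 2"
```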


From stephen.marshall at novartis.com  Wed Feb 21 17:01:00 2007
From: stephen.marshall at novartis.com (stephen.marshall at novartis.com)
Date: Wed, 21 Feb 2007 12:01:00 -0500
Subject: [Bioperl-l] Parsing kegg files
Message-ID: <OFA3726097.8019A09E-ON85257289.005D64E3-85257289.005D7997@ah.novartis.com>

Hello,
I'm trying to parse a KEGG file and I can't seem to get at the pathway
information. Here's a snippet of my code. I only see dblink and
description as annotations:

use strict;
use warnings;
use Bio::SeqIO;

my $filename = shift or die "Usage: $0 file.kegg\n";
my $stream = Bio::SeqIO->new(-file => $filename, -format => 'KEGG');

while ( my $seq = $stream->next_seq() ) {
        # do something with $seq
        my $id = $seq->display_id();
        print "$id:";
        my $ann = $seq->annotation();
        foreach my $key ( $ann->get_all_annotation_keys() ) {
                my @values = $ann->get_Annotations($key);
                foreach my $value ( @values ) {
                        print "Annotation: ", $key, " value: ", $value->as_text, "\n";
                }
        }
}


From prateek.vit at gmail.com  Wed Feb 21 17:40:25 2007
From: prateek.vit at gmail.com (prateek singh yadav)
Date: Wed, 21 Feb 2007 23:10:25 +0530
Subject: [Bioperl-l] Problem in BioPerl Installation
Message-ID: <e7e0a2da0702210940j60a67f37t9bfddda7d760a277@mail.gmail.com>

Hello all,

I was trying to install BioPerl on my Red Hat Linux (EL) machine using CPAN,
but CPAN shows this problem:


[root at HX342SBC054 Desktop]# cpan
Terminal does not support AddHistory.

cpan shell -- CPAN exploration and modules installation (v1.7601)
ReadLine support available (try 'install Bundle::CPAN')

cpan> get bioperl
CPAN: Storable loaded ok
Going to read /root/.cpan/Metadata
Warning: Found only 25 objects in /root/.cpan/Metadata
Going to read /root/.cpan/sources/authors/01mailrc.txt.gz
Going to read /root/.cpan/sources/modules/02packages.details.txt.gz
Warning: Your /root/.cpan/sources/modules/02packages.details.txt.gz does not
contain a Line-Count header.
Please check the validity of the index file by comparing it to more
than one CPAN mirror. I'll continue but problems seem likely to
happen.
Warning: Your /root/.cpan/sources/modules/02packages.details.txt.gz does not
contain a Last-Updated header.
Please check the validity of the index file by comparing it to more
than one CPAN mirror. I'll continue but problems seem likely to
happen.
Going to read /root/.cpan/sources/modules/03modlist.data.gz
Can't locate object method "data" via package "CPAN::Modulelist" (perhaps
you forgot to load "CPAN::Modulelist"?) at (eval 13) line 1.
 at /usr/lib/perl5/5.8.5/CPAN.pm line 3406
        CPAN::Index::rd_modlist('CPAN::Index',
'/root/.cpan/sources/modules/03modlist.data.gz') called at
/usr/lib/perl5/5.8.5/CPAN.pm line 3129
        CPAN::Index::reload('CPAN::Index') called at
/usr/lib/perl5/5.8.5/CPAN.pm line 675
        CPAN::exists('CPAN=HASH(0x8548f20)', 'CPAN::Module', 'bioperl')
called at /usr/lib/perl5/5.8.5/CPAN.pm line 1842
        CPAN::Shell::expandany('CPAN::Shell', 'bioperl') called at
/usr/lib/perl5/5.8.5/CPAN.pm line 2078
        CPAN::Shell::rematein('CPAN::Shell', 'get', 'bioperl') called at
/usr/lib/perl5/5.8.5/CPAN.pm line 2157
        CPAN::Shell::get('CPAN::Shell', 'bioperl') called at
/usr/lib/perl5/5.8.5/CPAN.pm line 201
        eval {...} called at /usr/lib/perl5/5.8.5/CPAN.pm line 201
        CPAN::shell() called at /usr/bin/cpan line 193

cpan>

Can anyone give me directions on how to reconfigure CPAN, or how to install
BioPerl on Linux with its complete dependencies? I think I have a
problem in my CPAN configuration.

Regards,
Prateek

-- 
Prateek Singh
3rd year Bioinformatics(BTech)
Vellore Institute Of Technology
Vellore-632014
prateek.vit at gmail.com


From bosborne11 at verizon.net  Wed Feb 21 17:29:40 2007
From: bosborne11 at verizon.net (Brian Osborne)
Date: Wed, 21 Feb 2007 12:29:40 -0500
Subject: [Bioperl-l] Parsing kegg files
In-Reply-To: <OFA3726097.8019A09E-ON85257289.005D64E3-85257289.005D7997@ah.novartis.com>
Message-ID: <C201EBB4.CEE7%bosborne11@verizon.net>

Stephen,

I don't know what your eventual goals are but you might want to take a look
at bioperl-network. However, there are problems with this package. One, it
only parses DIP tab-delimited and PSI-MI and it does this last one only
partially (you will get the graph though). Two, it seems to have only a
single developer interested in it, that's me, and few users. In my Bioperl
experience projects like this tend to fade away.

http://www.bioperl.org/wiki/Network_package


Brian O.




From arareko at campus.iztacala.unam.mx  Wed Feb 21 18:18:37 2007
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Wed, 21 Feb 2007 12:18:37 -0600
Subject: [Bioperl-l] Problem in BioPerl Installation
In-Reply-To: <e7e0a2da0702210940j60a67f37t9bfddda7d760a277@mail.gmail.com>
References: <e7e0a2da0702210940j60a67f37t9bfddda7d760a277@mail.gmail.com>
Message-ID: <45DC8CFD.1060108@campus.iztacala.unam.mx>

You can always rebuild your CPAN configuration by deleting the existing 
.cpan/ directory in root's $HOME dir (quick & dirty trick), then invoke 
CPAN again from root's shell to rebuild the config:

# perl -MCPAN -e shell

Hope this helps.

Regards,
Mauricio.

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM


From hlapp at gmx.net  Wed Feb 21 18:33:17 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 21 Feb 2007 13:33:17 -0500
Subject: [Bioperl-l] Adding empty member list in
	Bio::Cluster::SequenceFamily
In-Reply-To: <45DC4E2F.4060804@tagc.univ-mrs.fr>
References: <45DC4E2F.4060804@tagc.univ-mrs.fr>
Message-ID: <5B31EEBD-FFE5-4A0F-BB05-DF7297103BBD@gmx.net>

Fixed in CVS HEAD. -hilmar

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cjfields at uiuc.edu  Wed Feb 21 19:12:57 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 21 Feb 2007 13:12:57 -0600
Subject: [Bioperl-l] GenBank accession bug?
Message-ID: <11B83C2C-7BFE-48B5-A20C-7EAF282F39BD@uiuc.edu>

Dmitry,

I'm forwarding this to the mail list.  In the future please post/respond
to the regular mail list so other BioPerl developers/users can comment.
You'll get feedback much faster here (and maybe even some support!).

The issue at hand is whether we can support GenBank
accessions/display_id/version with your naming scheme.  My feeling is
that support for nonalphanumerics was removed to be compliant with the
GenBank standard for accessions, though I may be wrong.  Maybe someone
who was around during bioperl 1.2 can elaborate more?

From http://bugzilla.open-bio.org/show_bug.cgi?id=2214
--------------------------------------------------
....
Thanks for the detailed explanation. It seems that I would need to apply
my local patches to the BioPerl module(s). With BioPerl-1.2 there was
no problem with '-' in sequence names.

The problem is that in the project we participate in (the Vizier project),
the following sequence naming convention was adopted:

VZ##<virus_ICTV>-(<GenBank LOCUS ID>or<strain designation>)-<$$>

VZ stands for Vizier

## Your 2-digit Partner ID within the VIZIER consortium

<virus_ICTV> Virus name according to the ICTV nomenclature;

<GenBank LOCUS ID>,
<strain designation> If the sequence has not been assigned a GenBank
LOCUS ID, the available strain designation, as short as possible,
should be used

<$$> Unique 2-digit number, at your discretion, to label the sequence variant
--------------------------------------------------

chris


From gabriel.cardona at uib.es  Thu Feb 22 09:33:14 2007
From: gabriel.cardona at uib.es (gcardona)
Date: Thu, 22 Feb 2007 01:33:14 -0800 (PST)
Subject: [Bioperl-l] bioperl-1.5.2_100 ... not found
Message-ID: <9096740.post@talk.nabble.com>


Hello,

I am trying to install Bioperl on a Windows system, following the
installation notes in 
http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows
When the Perl Package Manager tries to download bioperl-1.5.2_100, it cannot
find the package and answers:
Downloading bioperl-1.5.2_100 ... not found

I've looked at the contents of
http://bioperl.org/DIST
and in package.xml the version for bioperl is bioperl-1.5.2_100, but in that
folder the available version is bioperl-1.5.2_102.
Is this a bug, or should I download and install it manually?

Thank you in advance,

Gabriel Cardona


From bix at sendu.me.uk  Thu Feb 22 12:35:14 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 22 Feb 2007 12:35:14 +0000
Subject: [Bioperl-l] bioperl-1.5.2_100 ... not found
In-Reply-To: <9096740.post@talk.nabble.com>
References: <9096740.post@talk.nabble.com>
Message-ID: <45DD8E02.1070404@sendu.me.uk>

Sorry, my mistake. I accidentally moved the ppm to a different folder. 
It should work now though.

I may make a 1.5.2_102 ppm at some point, but there are no relevant 
differences between _102 and _100 as far as Windows users are concerned.


From enrique_rulz at yahoo.com  Thu Feb 22 20:41:37 2007
From: enrique_rulz at yahoo.com (Kurt Gobain)
Date: Thu, 22 Feb 2007 12:41:37 -0800 (PST)
Subject: [Bioperl-l] Sequence matching problem!
Message-ID: <9107936.post@talk.nabble.com>


Hi everyone,
I'm having a great deal of trouble with simple pattern matching between a
sequence and a pattern. The program should be able to do two things:
1) normal matching: e.g. for GATCAAT, if TC is entered, the output should be 2;
2) matching using a special character: in the same example, if C*T is
entered, it should give the output 3, and the sequence displayed should be
CAAT. I'm easily getting the first part, but the second part is giving some
problems: the output I'm getting is 1 instead of 3. The code is really simple:

#!/usr/bin/perl
$alphabet = "GATCAAT";
$pattern=  "C*T ";

$alphabet =~ /($pattern)/i;

print "The entire '$pattern' match began at $-[0] and ended at $+[0]\n";

====================
OUTPUT!
The entire C*T match began at 1 and ended at 2
====================

but the output should be 3!
And is there any chance I can get the sequence too? I mean, instead of
'C*T' I need 'CAAT'.

Well, it's not compulsory to use a regex, but I find it quite simple. Could
there be any other method?

Thanks in advance!
Kurt


From cjfields at uiuc.edu  Thu Feb 22 21:01:03 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 22 Feb 2007 15:01:03 -0600
Subject: [Bioperl-l] GenBank accession bug?
In-Reply-To: <51879.10.0.7.57.1172176272.squirrel@gscmail.wustl.edu>
References: <11B83C2C-7BFE-48B5-A20C-7EAF282F39BD@uiuc.edu>
	<51879.10.0.7.57.1172176272.squirrel@gscmail.wustl.edu>
Message-ID: <028E16D7-036A-44DA-BECD-F910BEA58E53@uiuc.edu>


On Feb 22, 2007, at 2:31 PM, dmessina at watson.wustl.edu wrote:

>> The issue at hand is whether we can support GenBank accessions/
>> display_id/version with your naming scheme.
>
> Chris, I'm a little unsure of what you're saying here (which might  
> mean
> that you're already saying what I'm about to...say). Do you mean it  
> might
> be tricky to support both the Genbank standard and Dmitry's
> simultaneously?
>
> I would argue any arbitrary ID should be supported as long as that  
> ID is a
> contiguous non-space word (\S+).
>
> Actually the existing accession regex looks like it already  
> supports IDs
> with '-':
>
> /^ACCESSION\s+(\S.*\S)/
>
> It's only the version regex which doesn't (\w doesn't include '-'):
>
> /^\w+\.(\d+)/
>
>
> Anyone else have thoughts or comments on this? Off the top of my  
> head, I
> can't think of any issues that might arise from doing so (apart from
> having to modify all of the SeqIO modules to support it).
>
> Dave

You're right; the argument comes down simply to whether we would  
support \S+ or just \w+.  I'm neutral on this myself, but I wonder  
how allowing \S+ would affect other modules (for instance, indexing  
for a flat db), where one might just use \w+ for accessions,  
expecting them to be GenBank- or EMBL-like alphanumerics.  The fact  
that \S+ was supported in the past (as indicated in the bug report)  
and then wasn't post 1.2 makes me think there was a reason for  
someone going in and modifying it, but that was before my time on the  
group.

I'll have a look at the CVS history when I have time to see what I  
can dig up.

chris


From mkiwala at watson.wustl.edu  Thu Feb 22 20:36:33 2007
From: mkiwala at watson.wustl.edu (Michael Kiwala)
Date: Thu, 22 Feb 2007 14:36:33 -0600
Subject: [Bioperl-l] Bio::AnnotatableI vs Bio::FeatureHolderI
Message-ID: <45DDFED1.1090503@watson.wustl.edu>

Are Bio::AnnotatableI and Bio::FeatureHolderI competing interfaces?

I get the impression they are designed to do similar things.  If so, is
one deprecated and the other preferred?

If their responsibilities are orthogonal to each other, what sorts of 
tasks are suited to each?

Thanks,
Michael


From dmessina at wustl.edu  Thu Feb 22 20:53:01 2007
From: dmessina at wustl.edu (Dave Messina)
Date: Thu, 22 Feb 2007 14:53:01 -0600 (CST)
Subject: [Bioperl-l] GenBank accession bug?
Message-ID: <51923.10.0.7.57.1172177581.squirrel@gscmail.wustl.edu>

> The issue at hand is whether we can support GenBank accessions/
> display_id/version with your naming scheme.

Chris, I'm a little unsure of what you're saying here (which might mean
that you're already saying what I'm about to...say). Do you mean it might
be tricky to support both the Genbank standard and Dmitry's
simultaneously?

I would argue any arbitrary ID should be supported as long as that ID is a
contiguous non-space word (\S+).

Actually the existing accession regex looks like it already supports IDs
with '-':

/^ACCESSION\s+(\S.*\S)/

It's only the version regex which doesn't (\w doesn't include '-'):

/^\w+\.(\d+)/
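To make the point concrete, the sketch below applies the two patterns quoted above to a made-up ID in the Vizier style (the ID is hypothetical, and this is not the actual Bio::SeqIO::genbank code path). The accession pattern keeps the hyphens, the \w+ version pattern finds no match, and relaxing it to \S+ recovers the version number.

```perl
use strict;
use warnings;

# Hypothetical ID following the Vizier scheme
# (VZ + partner ID + virus name + '-' + locus/strain + '-' + variant).
my $id = 'VZ01Dengue-AB123456-01';

# The accession pattern (\S.*\S) keeps the hyphens intact.
my ($acc) = "ACCESSION   $id" =~ /^ACCESSION\s+(\S.*\S)/;

# The version pattern fails: \w+ stops at the first '-', which is not a '.'.
my ($ver_w) = "$id.1" =~ /^\w+\.(\d+)/;

# Relaxing \w+ to \S+ recovers the version number.
my ($ver_s) = "$id.1" =~ /^\S+\.(\d+)/;

print "accession: $acc\n";
print 'version with \w+: ', (defined $ver_w ? $ver_w : 'no match'), "\n";
print "version with \\S+: $ver_s\n";
```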


Anyone else have thoughts or comments on this? Off the top of my head, I
can't think of any issues that might arise from doing so (apart from
having to modify all of the SeqIO modules to support it).

Dave


From heikki at sanbi.ac.za  Fri Feb 23 08:25:39 2007
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Fri, 23 Feb 2007 10:25:39 +0200
Subject: [Bioperl-l] Sequence matching problem!
In-Reply-To: <9107936.post@talk.nabble.com>
References: <9107936.post@talk.nabble.com>
Message-ID: <200702231025.39416.heikki@sanbi.ac.za>

Kurt,

There are a few things in your code to note:

- regexp /C*T/ matches any T preceded by zero or more Cs,
  not what you meant
- $&, $` and $' are among the "expensive" perl variables worth
  avoiding unless you have to: using any of them even once in your
  code slows down every match considerably. (@- and @+, which your
  $-[0] and $+[0] read from, do not carry that penalty.) There is
  always another way.
- Keep in mind what you want to use the match positions for: 
  Human readable locations usually start counting with 1 but
  perl code uses 0 as the first location. The code below assumes
  you want to print the locations out.

Study my example code below.

Yours,
	-Heikki

###################################################################
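#!/usr/bin/perl
use strict;
use warnings;

my $seq = "GATCAAT";
#my $pattern = 'C*T';   # any T preceded by zero or more Cs
my $pattern = 'C.*T';   # a C, then anything, then a T

while ($seq =~ m/($pattern)/gi) {

    my $match = $1;
    my $end   = pos($seq);                    # just past the match = 1-based end
    my $start = $end - length($match) + 1;    # 1-based start position

    print "$match : $start - $end\n";
}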

###################################################################
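Building on the points above, here is a minimal sketch covering both behaviours Kurt asked for. It assumes '*' in his input means "any run of characters" and that he wants 0-based offsets, which is what his expected answers of 2 and 3 correspond to. wildcard_to_regex() and find_motif() are hypothetical helpers. Every character except '*' goes through quotemeta so it is matched literally, and '*' becomes the non-greedy '.*?', which stops at the first T after the C. Heikki's greedy 'C.*T' runs to the last T instead; on GATCAAT both give CAAT.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: convert a motif in which '*' means "any run of
# characters" into a case-insensitive regex. Every other character is
# matched literally (quotemeta) and each '*' becomes the non-greedy '.*?'.
sub wildcard_to_regex {
    my ($motif) = @_;
    my $re = join '.*?', map { quotemeta($_) } split /\*/, $motif, -1;
    return qr/$re/i;
}

# Return (0-based start, matched text) for the first hit, or an empty list.
sub find_motif {
    my ($seq, $motif) = @_;
    my $re = wildcard_to_regex($motif);
    return () unless $seq =~ /($re)/g;
    my $match = $1;
    return (pos($seq) - length($match), $match);
}

for my $motif ('TC', 'C*T') {
    my ($start, $hit) = find_motif('GATCAAT', $motif);
    print "$motif: $hit starts at offset $start\n";
}
# prints:
# TC: TC starts at offset 2
# C*T: CAAT starts at offset 3
```

Add 1 to the offset for human-readable, 1-based positions, as in Heikki's version.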




-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From avilella at gmail.com  Fri Feb 23 09:59:49 2007
From: avilella at gmail.com (Albert Vilella)
Date: Fri, 23 Feb 2007 09:59:49 +0000
Subject: [Bioperl-l] Sequence matching problem!
In-Reply-To: <200702231025.39416.heikki@sanbi.ac.za>
References: <9107936.post@talk.nabble.com>
	<200702231025.39416.heikki@sanbi.ac.za>
Message-ID: <358f4d650702230159n3a28fb03k6ecd850fccfed796@mail.gmail.com>

Now that we are on this pattern matching thread, I was wondering if
any Perl guru could enlighten me on matching exact
sequence patterns against a gapped target sequence. E.g.:

my $seq = "CGATCAACGAATCGTACGTACTC";
my $gapped_seq =
"GGGGGGCG-------A---TC---AACGA-----ATC---GTA---CGTACTCTACTCGGGGG";

and one would like to get as a result:

"CG-------A---TC---AACGA-----ATC---GTA---CGTACTCTACTC"

which is the match of $seq but in $gapped_seq.

Cheers,

    Albert.




From js5 at sanger.ac.uk  Fri Feb 23 11:34:37 2007
From: js5 at sanger.ac.uk (James Smith)
Date: Fri, 23 Feb 2007 11:34:37 +0000 (GMT)
Subject: [Bioperl-l] Sequence matching problem!
In-Reply-To: <358f4d650702230159n3a28fb03k6ecd850fccfed796@mail.gmail.com>
References: <9107936.post@talk.nabble.com>
	<200702231025.39416.heikki@sanbi.ac.za>
	<358f4d650702230159n3a28fb03k6ecd850fccfed796@mail.gmail.com>
Message-ID: <Pine.OSF.4.58.0702231131230.1583589@cbi1c.internal.sanger.ac.uk>

On Fri, 23 Feb 2007, Albert Vilella wrote:

> now that we are at this pattern matching thread, I was wondering if
> any perl guru could enlighten me on the issue of matching exact
> sequence patterns on a gapped target sequence. E.g.:
>
> my $seq = "CGATCAACGAATCGTACGTACTC";
> my $gapped_seq =
> "GGGGGGCG-------A---TC---AACGA-----ATC---GTA---CGTACTCTACTCGGGGG";
>
> and one would like to get as a result:
>
> "CG-------A---TC---AACGA-----ATC---GTA---CGTACTCTACTC"
>
> which is the match of $seq but in $gapped_seq.

Try...

 my $seq = "CGATCAACGAATCGTACGTACTC";
 my $gapped_seq =
   "GGGGGGCG-------A---TC---AACGA-----ATC---GTA---CGTACTCTACTCGGGGG";

 my $regexp = '('.join('-*?',split//,$seq).')';

 if( $gapped_seq =~ /$regexp/ ) {
   print "Match is $1\n";
 } else {
   print "No match\n";
 }

 (not sure about the efficiency if $seq is long, though)
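A variant of the approach above, sketched under the assumption that '-' is the only gap character: the query letters are passed through quotemeta as a safeguard against regex metacharacters, a greedy `-*` is used between residues (it can only swallow the same gap run here, because the next character must be a residue), and the 0-based match offsets are read from the `@-` and `@+` arrays. The helper name `gapped_regex` is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build a regex that matches $query while allowing
# any run of '-' gap characters between consecutive residues.
sub gapped_regex {
    my ($query) = @_;
    my $re = join '-*', map { quotemeta($_) } split //, $query;
    return qr/$re/;
}

my $seq        = "CGATCAACGAATCGTACGTACTC";
my $gapped_seq =
  "GGGGGGCG-------A---TC---AACGA-----ATC---GTA---CGTACTCTACTCGGGGG";
my $re = gapped_regex($seq);

if ($gapped_seq =~ $re) {
    # $-[0] is the 0-based start of the match, $+[0] the offset just past it
    my ($start, $end) = ($-[0], $+[0]);
    my $hit = substr($gapped_seq, $start, $end - $start);
    print "Match is $hit (0-based ", $start, "..", $end - 1, ")\n";
} else {
    print "No match\n";
}
```

Removing the gaps from the printed hit gives back $seq exactly, which is a quick sanity check on the result.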
James

>
> Cheers,


From khoueiry at ibdm.univ-mrs.fr  Fri Feb 23 13:09:33 2007
From: khoueiry at ibdm.univ-mrs.fr (pierre)
Date: Fri, 23 Feb 2007 14:09:33 +0100
Subject: [Bioperl-l] Sequence matching problem!
In-Reply-To: <358f4d650702230159n3a28fb03k6ecd850fccfed796@mail.gmail.com>
References: <9107936.post@talk.nabble.com>
	<200702231025.39416.heikki@sanbi.ac.za>
	<358f4d650702230159n3a28fb03k6ecd850fccfed796@mail.gmail.com>
Message-ID: <1172236173.4309.6.camel@ciona-pierre>

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070223/0e08ebe6/attachment-0001.ksh>

From neetisomaiya at gmail.com  Fri Feb 23 12:27:28 2007
From: neetisomaiya at gmail.com (neeti somaiya)
Date: Fri, 23 Feb 2007 17:57:28 +0530
Subject: [Bioperl-l] need help urgently - needle output parsing
Message-ID: <764978cf0702230427x5b5acf73y6538527ade3fd453@mail.gmail.com>

Hi,

I am using the needle alignment tool (standalone, on a Linux machine), and then
I am using Bioperl to parse the output.
All data (sequence files and alignment outputs) are attached to this mail.

I have 2 small sequences :- 693.seq and revcomp693.seq
I have 2 big sequences :- 80768-4291-5639.84809_84810_84809_1.scf.seq and
80768-4291-5639.84809_84810_84810_1.scf.seq
All these are in fasta format

Now I am doing the following :-
1) Aligning 80768-4291-5639.84809_84810_84809_1.scf.seq and 693.seq - output
file is 80768-4291-5639.84809_84810_84809_1.scf.out
parsing the output gives me the alignment start in 'traceseq' as 97
2) Aligning 80768-4291-5639.84809_84810_84809_1.scf.seq and revcomp693.seq -
output file is 80768-4291-5639.84809_84810_84809_1.scf.comp.out
parsing the output gives me the alignment start in 'traceseq' as 91

All this is correct.

Now I am doing the following :-
1) Aligning 80768-4291-5639.84809_84810_84810_1.scf.seq and 693.seq - output
file is 80768-4291-5639.84809_84810_84810_1.scf.out
parsing the output gives me the alignment start in 'traceseq' as 341 (this
is correct)
2) Aligning 80768-4291-5639.84809_84810_84810_1.scf.seq and revcomp693.seq -
output file is 80768-4291-5639.84809_84810_84810_1.scf.comp.out
parsing the output gives me the alignment start in 'traceseq' as 341 (this
is incorrect, correct position is 330)


Part of my code is as follows :-
---------------------------------------------

# running needle
`$needle_path./needle $trace.seq $snp_position_on_con.seq -gapopen 10.0 -gapextend 0.5 $output`;

# parsing needle output
my $str = Bio::AlignIO->new(-format => 'emboss',-file => $output);
my $aln = $str->next_aln();
my $pos = $aln->column_from_residue_number('original',1);

$logger->info("Alignment pos is $pos");

####################################

# running needle
`$needle_path./needle $trace.seq revcomp$snp_position_on_con.seq -gapopen 10.0 -gapextend 0.5 $comp_output`;

# parsing needle output
my $comp_str = Bio::AlignIO->new(-format => 'emboss',-file => $comp_output);
my $comp_aln = $comp_str->next_aln();
my $comp_pos = $comp_aln->column_from_residue_number('revcomp',1);

$logger->info("Alignment pos is $comp_pos");


Can someone please tell me what is going wrong here?


-- 
-Neeti
Even my blood says, B positive
-------------- next part --------------
A non-text attachment was scrubbed...
Name: data.zip
Type: application/zip
Size: 4456 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070223/21658b7d/attachment-0004.zip>

From bix at sendu.me.uk  Fri Feb 23 13:55:24 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 23 Feb 2007 13:55:24 +0000
Subject: [Bioperl-l] Sequence matching problem!
In-Reply-To: <Pine.OSF.4.58.0702231131230.1583589@cbi1c.internal.sanger.ac.uk>
References: <9107936.post@talk.nabble.com>	<200702231025.39416.heikki@sanbi.ac.za>	<358f4d650702230159n3a28fb03k6ecd850fccfed796@mail.gmail.com>
	<Pine.OSF.4.58.0702231131230.1583589@cbi1c.internal.sanger.ac.uk>
Message-ID: <45DEF24C.1010303@sendu.me.uk>

James Smith wrote:
> On Fri, 23 Feb 2007, Albert Vilella wrote:
> 
>> now that we are at this pattern matching thread, I was wondering if
>> any perl guru could enlighten me on the issue of matching exact
>> sequence patterns on a gapped target sequence. E.g.:
>>
>> my $seq = "CGATCAACGAATCGTACGTACTC";
>> my $gapped_seq =
>> "GGGGGGCG-------A---TC---AACGA-----ATC---GTA---CGTACTCTACTCGGGGG";
>>
>> and one would like to get as a result:
>>
>> "CG-------A---TC---AACGA-----ATC---GTA---CGTACTCTACTC"
>>
>> which is the match of $seq but in $gapped_seq.
> 
> Try...
> 
>  my $seq = "CGATCAACGAATCGTACGTACTC";
>  my $gapped_seq =
>    "GGGGGGCG-------A---TC---AACGA-----ATC---GTA---CGTACTCTACTCGGGGG";
> 
>  my $regexp = '('.join('-*?',split//,$seq).')';
> 
>  if( $gapped_seq =~ /$regexp/ ) {
>    print "Match is $1\n";
>  } else {
>    print "No match\n";
>  }

That's great stuff. If you were matching thousands of different $seq 
against the same very large $gapped_seq, and only needed the first match 
of $seq in $gapped_seq, the alternative to the above approach (remove 
the gaps from $gapped_seq and do index() matching) will be faster.

Here's one (overly long-winded) way of implementing it, that I found to 
take ~2s vs ~22s for the above regex approach when doing the job on 
999999 copies of $seq:

#!/usr/bin/perl -w
use strict;
use warnings;

my $gapped_seq = 
"GGGGGGCG-------A---TC---AACGA-----ATC---GTA---CGTACTCTACTCGGGGG";

# note the total gap-length at position in gapless 0-based coords
my @gap_lengths;
my $gap_length = 0;
while ($gapped_seq =~ /(-+)/g) {
   my $match = $1;
   my $prev_length = $gap_length;
   $gap_length += length($match);
   my $end = pos($gapped_seq) - $gap_length - 1;
   push(@gap_lengths, $prev_length) for (1..$end-$#gap_lengths);
}
push(@gap_lengths, $gap_length) for (1..(length($gapped_seq) - 
@gap_lengths - $gap_length));

# remove the gaps
my $gapless_seq = $gapped_seq;
$gapless_seq =~ s/-//g;

# now for each of thousands of seqs...
my $seq = 'CGATCAACGAATCGTACGTACTC';
my @seqs;
for (1..999999) {
   push(@seqs, $seq);
}
foreach my $seq (@seqs) {
   my $start = index($gapless_seq, $seq);
   if ($start == -1) {
     print "No match found for seq '$seq'\n";
     next;
   }
   my $end = $start + length($seq) - 1;

   # calculate the coords in $gapped_seq
   $start = $start + $gap_lengths[$start];
   $end = $end + $gap_lengths[$end];

   my $result = substr($gapped_seq, $start, ($end - $start + 1));
   #print $result, "\n";
}

exit;


From MEC at stowers-institute.org  Fri Feb 23 15:54:57 2007
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Fri, 23 Feb 2007 09:54:57 -0600
Subject: [Bioperl-l] Bio::DB::SeqFeature to GFF mishandles attributes with
	multiple values
In-Reply-To: <6dce9a0b0701301446w7fc31d6eufe27442fecd0f20e@mail.gmail.com>
Message-ID: <CED81D34E37D5043A1211565277A51E50768F501@exchkc02.stowers-institute.org>

Lincoln, and other Bio::DB::SeqFeature wanderers:

I find that generating GFF from a Bio::DB::SeqFeature using gff3_string
does not respect the following:
 
"Multiple attributes of the same type are indicated by separating the
values with the comma "," character"  (c.f.
http://www.sequenceontology.org/gff3.shtml)
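
To make the rule concrete, here is a standalone sketch (not the NormalizedFeature.pm code) of building column 9 from a hash whose values may be array references: each tag appears once, multiple values are joined with commas, and characters with a special meaning in column 9 are percent-encoded. The helper names are hypothetical, and putting ID, Name and Parent first is an ordering choice made here for readability.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Percent-encode characters that have a special meaning in GFF3 column 9
# (; = % & , and control characters such as tab and newline).
sub gff3_escape {
    my ($s) = @_;
    $s =~ s/([;=%&,[:cntrl:]])/sprintf('%%%02X', ord $1)/ge;
    return $s;
}

# Hypothetical encoder: values may be plain scalars or array references.
sub gff3_attributes {
    my (%attr) = @_;
    my %first = map { $_ => 1 } qw(ID Name Parent);
    my @tags  = ((grep { exists $attr{$_} } qw(ID Name Parent)),
                 (sort { $a cmp $b } grep { !$first{$_} } keys %attr));
    my @pairs;
    for my $tag (@tags) {
        my $v      = $attr{$tag};
        my @values = ref($v) eq 'ARRAY' ? @$v : ($v);
        # one tag=value pair per tag, multiple values separated by commas
        push @pairs, gff3_escape($tag) . '='
                   . join(',', map { gff3_escape($_) } @values);
    }
    return join ';', @pairs;
}

print gff3_attributes(Name => 'mec', foo => [qw(bar blat)]), "\n";
```

With the attributes from the one-liner, this prints Name=mec;foo=bar,blat.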
 
This one-liner demonstrates the problem:
 
perl -MBio::DB::SeqFeature -e 'print Bio::DB::SeqFeature->new(-seq_id =>
"J", -start => 1, -end => 2, -primary_tag => 'PH', -source => 'A',
-name => 'mec', -attributes => {foo =>  [qw(bar blat)]})->gff3_string'
J	A	PH	1	2	.	.	.	foo=bar;foo=blat;Name=mec

Do you agree this is a problem? 
 
The fix is in the post-sig patch to
/Bio/DB/SeqFeature/NormalizedFeature.pm, in which I also took the
stylistic privilege of promoting any ID, Parent, or Name attribute to
the front of column 9, so output is now:

J	A	PH	1	2	.	.	.	Name=mec;foo=bar,blat

Do you agree this is better?

I am poised to commit it, as well as the functionally identical patch to the
equivalent function in Bio/Graphics/FeatureBase.pm

All clear?

-- Malcolm Cook

  
*** NormalizedFeature.pm 2 Feb 2007 21:05:42 -0000 1.25
--- NormalizedFeature.pm 23 Feb 2007 15:37:01 -0000
***************
*** 481,494 ****
      next if $t eq 'load_id';
      next if $t eq 'parent_id';
      foreach (@values) { s/\s+$// } # get rid of trailing whitespace
! 
!     push @result,join '=',$self->escape($t),$self->escape($_) foreach @values;
    }
    my $id   = $self->primary_id;
    my $name = $self->display_name;
!   push @result,"ID=".$self->escape($id)                     if defined $id;
!   push @result,"Parent=".$self->escape($parent->primary_id) if defined $parent;
!   push @result,"Name=".$self->escape($name)                   if defined $name;
    return join ';',@result;
  }
  
--- 481,498 ----
      next if $t eq 'load_id';
      next if $t eq 'parent_id';
      foreach (@values) { s/\s+$// } # get rid of trailing whitespace
!     
!      push @result,join '=',$self->escape($t),$self->escape($_) foreach @values; 
!     # NO! Multiple attributes of the same type are indicated by
!     # separating the values with the comma "," character - per
!     # http://www.sequenceontology.org/gff3.shtml.  Do it this way:
!     #push @result,join '=',$self->escape($t),join(',', map {$self->escape($_)} @values);
    }
    my $id   = $self->primary_id;
    my $name = $self->display_name;
!   unshift @result,"ID=".$self->escape($id)                     if defined $id;
!   unshift @result,"Parent=".$self->escape($parent->primary_id) if defined $parent;
!   unshift @result,"Name=".$self->escape($name)                   if defined $name;
    return join ';',@result;
  }
 

From MEC at stowers-institute.org  Fri Feb 23 17:08:11 2007
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Fri, 23 Feb 2007 11:08:11 -0600
Subject: [Bioperl-l] Bio::DB::SeqFeature to GFF mishandles attributes
	withmultiple values
In-Reply-To: <CED81D34E37D5043A1211565277A51E50768F501@exchkc02.stowers-institute.org>
Message-ID: <CED81D34E37D5043A1211565277A51E50768F509@exchkc02.stowers-institute.org>

Oy,

I hit send too soon.  The patch I sent had my new attribute encoder
commented out.  It should have been: 


*** NormalizedFeature.pm	2 Feb 2007 21:05:42 -0000	1.25
--- NormalizedFeature.pm	23 Feb 2007 17:06:37 -0000
***************
*** 481,494 ****
      next if $t eq 'load_id';
      next if $t eq 'parent_id';
      foreach (@values) { s/\s+$// } # get rid of trailing whitespace
! 
!     push @result,join '=',$self->escape($t),$self->escape($_) foreach @values;
    }
    my $id   = $self->primary_id;
    my $name = $self->display_name;
!   push @result,"ID=".$self->escape($id)                     if defined $id;
!   push @result,"Parent=".$self->escape($parent->primary_id) if defined $parent;
!   push @result,"Name=".$self->escape($name)                   if defined $name;
    return join ';',@result;
  }
  
--- 481,497 ----
      next if $t eq 'load_id';
      next if $t eq 'parent_id';
      foreach (@values) { s/\s+$// } # get rid of trailing whitespace
!     # push @result,join '=',$self->escape($t),$self->escape($_) foreach @values; 
!     # NO! Multiple attributes of the same type are indicated by
!     # separating the values with the comma "," character - per
!     # http://www.sequenceontology.org/gff3.shtml.  Do it this way:
!     push @result,join '=',$self->escape($t),join(',', map {$self->escape($_)} @values);
    }
    my $id   = $self->primary_id;
    my $name = $self->display_name;
!   unshift @result,"ID=".$self->escape($id)                     if defined $id;
!   unshift @result,"Parent=".$self->escape($parent->primary_id) if defined $parent;
!   unshift @result,"Name=".$self->escape($name)                   if defined $name;
    return join ';',@result;
  }
  

Malcolm


From lstein at cshl.edu  Fri Feb 23 17:16:01 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Fri, 23 Feb 2007 12:16:01 -0500
Subject: [Bioperl-l] Bio::DB::SeqFeature to GFF mishandles attributes
	with multiple values
In-Reply-To: <CED81D34E37D5043A1211565277A51E50768F501@exchkc02.stowers-institute.org>
References: <6dce9a0b0701301446w7fc31d6eufe27442fecd0f20e@mail.gmail.com>
	<CED81D34E37D5043A1211565277A51E50768F501@exchkc02.stowers-institute.org>
Message-ID: <6dce9a0b0702230916o684e490cu9d01d3be3e2ae061@mail.gmail.com>

Hi Malcolm,

You're quite right, and I appreciate your work in tracking down and fixing
it. Before you commit the patch, can you confirm that the loader is working
correctly so that comma-separated values are read back into the data
structure as multiple attributes?

Lincoln



-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu


From aaron.j.mackey at gsk.com  Fri Feb 23 14:36:18 2007
From: aaron.j.mackey at gsk.com (aaron.j.mackey at gsk.com)
Date: Fri, 23 Feb 2007 09:36:18 -0500
Subject: [Bioperl-l] Bio::AnnotatableI vs Bio::FeatureHolderI
In-Reply-To: <45DDFED1.1090503@watson.wustl.edu>
Message-ID: <OFC6F4AA9E.AE02C572-ON8525728B.004FF987-8525728B.00503B29@gsk.com>

The fundamental difference (in my mind) between a feature and an 
annotation, is that a feature has a location/range, and thus the 
information represented in the feature is applicable only to that 
location/range.  An annotation, on the other hand, is "global", or at 
least non-localizable (note: a feature with a "fuzzy" location of 
"somewhere along this sequence, but I'm not sure where" is still not 
global - if you did/could know the location, you'd describe it as a 
feature, so it shouldn't be represented with an annotation).
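
In BioPerl terms, a minimal sketch of that distinction, assuming a standard BioPerl installation (the sequence, coordinates and comment text are made-up illustration data): the located information is a Bio::SeqFeature::Generic attached with add_SeqFeature, and the sequence-wide information is a Bio::Annotation::Comment stored in the sequence's annotation collection.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::Seq;
use Bio::SeqFeature::Generic;
use Bio::Annotation::Comment;

my $seq = Bio::Seq->new(-seq => 'ATGGCGTACGTTAGCTAGCTAA', -display_id => 'demo');

# A feature: the information applies to a specific range of the sequence.
my $exon = Bio::SeqFeature::Generic->new(
    -start       => 1,
    -end         => 12,
    -strand      => 1,
    -primary_tag => 'exon',
);
$seq->add_SeqFeature($exon);

# An annotation: information about the sequence as a whole, with no location.
my $note = Bio::Annotation::Comment->new(-text => 'hypothetical example record');
$seq->annotation->add_Annotation('comment', $note);

for my $f ($seq->get_SeqFeatures) {
    printf "feature %s at %d..%d\n", $f->primary_tag, $f->start, $f->end;
}
for my $c ($seq->annotation->get_Annotations('comment')) {
    print "annotation: ", $c->text, "\n";
}
```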

-Aaron

bioperl-l-bounces at lists.open-bio.org wrote on 02/22/2007 03:36:33 PM:

> Are Bio::AnnotatableI and Bio::FeatureHolderI competing interfaces?
> 
> I get the impression they are designed to do similar things.  If so is 
> one deprecated and the other preferred?
> 
> If their responsibilities are orthogonal to each other, what sorts of 
> tasks are suited to each?
> 
> Thanks,
> Michael


From MEC at stowers-institute.org  Fri Feb 23 18:46:00 2007
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Fri, 23 Feb 2007 12:46:00 -0600
Subject: [Bioperl-l] Bio::DB::SeqFeature to GFF mishandles attributes
	with multiple values
In-Reply-To: <6dce9a0b0702230916o684e490cu9d01d3be3e2ae061@mail.gmail.com>
Message-ID: <CED81D34E37D5043A1211565277A51E50768F50F@exchkc02.stowers-institute.org>

Lincoln,
 
OK.  I'll do that...
 
...let's see, a quick squiz at Bio/DB/SeqFeature/Store/ .... 
 
...ok - parse_attributes _looks_ right to me
 
...so, let's try it
 
#load a feature into a new database:
 
bp_seqfeature_load.PLS -dsn 'dbi:mysql:database=test;host=mysql-dev' \
  -create -user test -pass test \
  <(echo -e "J\tA\tPH\t1\t2\t.\t.\t.\tfoo=bar,blat;Name=mec\n")
 
#It loaded ok.  Now, let's print it out in GFF3:
 
perl -MBio::DB::SeqFeature::Store -e 'foreach
(Bio::DB::SeqFeature::Store->new(-dsn =>
"dbi:mysql:database=test;host=mysql-dev;user=test;password=test")->features(-type => "PH:A")) {print $_->gff3_string . "\n"}'
J A PH 1 2 . . . Name=mec;ID=1;foo=bar,blat

#output looks good to me

Note, I tried loading attributes foo=bar;foo=blat and it came back
foo=bar,blat.  So, you can load either way.
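
The reading side of that round trip can be sketched as follows. This is a hypothetical standalone parser, not the parse_attributes code in Bio::DB::SeqFeature::Store: it assumes special characters in column 9 are percent-encoded, splits pairs on ';' and values on ',', and appends values for repeated tags, so both spellings end up as the same structure.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical parser: column 9 string -> { tag => [values] }
sub parse_gff3_attributes {
    my ($col9) = @_;
    my %attr;
    for my $pair (split /;/, $col9) {
        my ($tag, $value) = split /=/, $pair, 2;
        next unless defined $value;        # skip empty or malformed pairs
        for my $v (split /,/, $value) {
            $v =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;   # undo %XX escapes
            push @{ $attr{$tag} }, $v;     # repeated tags accumulate values
        }
    }
    return \%attr;
}

for my $col9 ('foo=bar,blat;Name=mec', 'foo=bar;foo=blat;Name=mec') {
    my $attr = parse_gff3_attributes($col9);
    my @parts;
    for my $tag (sort keys %$attr) {
        push @parts, "$tag=" . join(',', @{ $attr->{$tag} });
    }
    print join(';', @parts), "\n";
}
```

Both input strings print as Name=mec;foo=bar,blat.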

I'll commit later today.

--Malcolm  

 


From cjfields at uiuc.edu  Fri Feb 23 18:49:44 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 23 Feb 2007 12:49:44 -0600
Subject: [Bioperl-l] Bio::AnnotatableI vs Bio::FeatureHolderI
In-Reply-To: <OFC6F4AA9E.AE02C572-ON8525728B.004FF987-8525728B.00503B29@gsk.com>
References: <OFC6F4AA9E.AE02C572-ON8525728B.004FF987-8525728B.00503B29@gsk.com>
Message-ID: <FEDC420E-AE3A-4AD4-A30B-54F8DF904D84@uiuc.edu>

To add to that, there's a HOWTO describing the differences:

http://www.bioperl.org/wiki/HOWTO:Feature-Annotation

I agree with Aaron: if it has a location, it's a feature; otherwise
it's an annotation.

chris


Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From lstein at cshl.edu  Fri Feb 23 21:20:26 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Fri, 23 Feb 2007 16:20:26 -0500
Subject: [Bioperl-l] Bio::DB::SeqFeature to GFF mishandles attributes
	with multiple values
In-Reply-To: <CED81D34E37D5043A1211565277A51E50768F50F@exchkc02.stowers-institute.org>
References: <6dce9a0b0702230916o684e490cu9d01d3be3e2ae061@mail.gmail.com>
	<CED81D34E37D5043A1211565277A51E50768F50F@exchkc02.stowers-institute.org>
Message-ID: <6dce9a0b0702231320j1f24d4b4oe33bce6d2da96db7@mail.gmail.com>

Excellent!

Lincoln

On 2/23/07, Cook, Malcolm <MEC at stowers-institute.org> wrote:
>
>  Lincoln,
>
> OK.  I'll do that...
>
> ...let's see, a quick squiz at Bio/DB/SeqFeature/Store/ ....
>
> ...ok - parse_attributes _looks_ right to me
>
> ...so, let's try it
>
> #load a feature into a new database:
>
> bp_seqfeature_load.PLS -dsn 'dbi:mysql:database=test;host=mysql-dev'
> -create -user test -pass test <(echo -e "J\tA\tPH\t1\t2\t.\t.\t.\tfoo=bar,
> blat;Name=mec\n")
>
> #It loaded ok.  Now, let's print it out in GFF3:
>
> perl -MBio::DB::SeqFeature::Store -e 'foreach
> (Bio::DB::SeqFeature::Store->new(-dsn =>
> "dbi:mysql:database=test;host=mysql-dev;user=test;password=test")->features(-type
> => "PH:A")) {print $_->gff3_string . "\n"}'
> J A PH 1 2 . . . Name=mec;ID=1;foo=bar,blat
>
> #output looks good to me
>
> Note, I tried loading attributes foo=bar;foo=blat and it came back
> foo=bar,blat.  So, you can load either way.
>
> I'll commit later today.
>
> --Malcolm
>
>
>  ------------------------------
> *From:* lincoln.stein at gmail.com [mailto:lincoln.stein at gmail.com] *On
> Behalf Of *Lincoln Stein
> *Sent:* Friday, February 23, 2007 11:16 AM
> *To:* Cook, Malcolm
> *Cc:* bioperl list; lstein at cshl.org
> *Subject:* Re: Bio::DB::SeqFeature to GFF mishandles attributes with
> multiple values
>
> Hi Malcom,
>
> You're quite right, and I appreciate your work in tracking down and fixing
> it. Before you commit the patch, can you confirm that the loader is working
> correctly so that comma-separated values are read back into the data
> structure as multiple attributes?
>
> Lincoln
>
> On 2/23/07, Cook, Malcolm <MEC at stowers-institute.org> wrote:
> >
> > Lincoln, and other Bio::DB::SeqFeature wanderers:
> >
> > I find that generating GFF from a Bio::DB::SeqFeature using gff3_string
> > does not respect the following:
> >
> > "Multiple attributes of the same type are indicated by separating the
> > values with the comma "," character"  (c.f.
> > http://www.sequenceontology.org/gff3.shtml)
> >
> > This one-liner demonstrates the problem:
> >
> > perl -MBio::DB::SeqFeature -e 'print Bio::DB::SeqFeature->new(-seq_id =>
> > "J", -start => 1, -end => 2, -primary_tag => 'PH', -source => 'A',
> > -name => 'mec', -attributes => {foo =>  [qw(bar blat)]})->gff3_string'
> > J       A       PH      1       2       .       .       .
> > foo=bar;foo=blat;Name=mec
> >
> > Do you agree this is a problem?
> >
> > The fix is in the post-sig patch to
> > /Bio/DB/SeqFeature/NormalizedFeature.pm, in which I also took the
> > stylistic privilege of promoting any ID, Parent, or Name attribute to
> > the front of column 9, so output is now:
> >
> > J       A       PH      1       2       .       .       .
> > Name=mec;foo=bar,blat
> >
> > Do you agree this is better?
> >
> > I am poised to commit it, as well as the functionally same patch to the
> > equivilent function in Bio/Graphics/FeatureBase.pm
> >
> > All clear?
> >
> > -- Malcolm Cook
> >
> >
> >
> > *** NormalizedFeature.pm 2 Feb 2007 21:05:42 -0000 1.25
> > --- NormalizedFeature.pm 23 Feb 2007 15:37:01 -0000
> > ***************
> > *** 481,494 ****
> >       next if $t eq 'load_id';
> >       next if $t eq 'parent_id';
> >       foreach (@values) { s/\s+$// } # get rid of trailing whitespace
> > !
> > !     push @result,join '=',$self->escape($t),$self->escape($_) foreach
> > @values;
> >     }
> >     my $id   = $self->primary_id;
> >     my $name = $self->display_name;
> > !   push @result,"ID=".$self->escape($id)                     if defined
> >
> > $id;
> > !   push @result,"Parent=".$self->escape($parent->primary_id) if defined
> > $parent;
> > !   push @result,"Name=".$self->escape($name)                   if
> > defined $name;
> >     return join ';', at result;
> >   }
> >
> > --- 481,498 ----
> >       next if $t eq 'load_id';
> >       next if $t eq 'parent_id';
> >       foreach (@values) { s/\s+$// } # get rid of trailing whitespace
> > !
> > !      push @result,join '=',$self->escape($t),$self->escape($_) foreach
> >
> > @values;
> > !     # NO! Multiple attributes of the same type are indicated by
> > !     # separating the values with the comma "," character - per
> > !     # http://www.sequenceontology.org/gff3.shtml.  Do it this way:
> > !     #push @result,join '=',$self->escape($t),join(',', map
> > {$self->escape($_)} @values);
> >     }
> >     my $id   = $self->primary_id;
> >     my $name = $self->display_name;
> > !   unshift @result,"ID=".$self->escape($id)                     if
> > defined $id;
> > !   unshift @result,"Parent=".$self->escape($parent->primary_id) if
> > defined $parent;
> > !   unshift @result,"Name=".$self->escape($name)                   if
> > defined $name;
> >     return join ';', at result;
> >   }
> >
> >
> >
> >
>
>
> --
> Lincoln D. Stein
> Cold Spring Harbor Laboratory
> 1 Bungtown Road
> Cold Spring Harbor, NY 11724
> (516) 367-8380 (voice)
> (516) 367-8389 (fax)
> FOR URGENT MESSAGES & SCHEDULING,
> PLEASE CONTACT MY ASSISTANT,
> SANDRA MICHELSEN, AT michelse at cshl.edu
>
>


-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu
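
The GFF3 rule cited in the patch comment (several values for one tag go into a single tag=value pair, separated by commas, rather than repeating the tag) can be sketched in plain Perl as below. This is a hypothetical stand-alone helper, not the BioPerl method being patched: the names gff3_escape and gff3_attributes are made up for illustration, the escaped character set (; = % & , plus control characters) is taken from the GFF3 column 9 rules, and putting ID, Parent and Name first mirrors the intent of the unshift calls, although the order among those three is an arbitrary choice here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Percent-encode characters that are reserved in GFF3 column 9
# (; = % & , and ASCII control characters, including tab and newline).
sub gff3_escape {
    my ($s) = @_;
    $s =~ s/([;=%&,\x00-\x1f\x7f])/sprintf('%%%02X', ord $1)/ge;
    return $s;
}

# Hypothetical helper: build a column-9 string from ID/Parent/Name plus a
# hash of tag => [values]. All values of one tag are joined with commas
# into a single tag=value pair instead of repeating the tag.
sub gff3_attributes {
    my (%arg) = @_;
    my @result;
    push @result, 'ID='     . gff3_escape($arg{id})     if defined $arg{id};
    push @result, 'Parent=' . gff3_escape($arg{parent}) if defined $arg{parent};
    push @result, 'Name='   . gff3_escape($arg{name})   if defined $arg{name};

    my $tags = $arg{tags} || {};
    for my $tag (sort keys %$tags) {
        # trim trailing whitespace from each value, as the original code does
        my @values = map { my $v = $_; $v =~ s/\s+$//; $v } @{ $tags->{$tag} };
        push @result, gff3_escape($tag) . '='
                    . join(',', map { gff3_escape($_) } @values);
    }
    return join ';', @result;
}

my $col9 = gff3_attributes(
    id   => 'gene1',
    name => 'abc',
    tags => { Note => [ 'first note ', 'a,b' ] },
);
print "$col9\n";    # ID=gene1;Name=abc;Note=first note,a%2Cb
```

Note that a literal comma inside a value must be escaped (%2C), otherwise a parser would split it into two values.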


From enrique_rulz at yahoo.com  Sat Feb 24 21:23:59 2007
From: enrique_rulz at yahoo.com (Kurt Gobain)
Date: Sat, 24 Feb 2007 13:23:59 -0800 (PST)
Subject: [Bioperl-l] Sequence matching problem!
In-Reply-To: <200702231025.39416.heikki@sanbi.ac.za>
References: <9107936.post@talk.nabble.com>
	<200702231025.39416.heikki@sanbi.ac.za>
Message-ID: <9137941.post@talk.nabble.com>


Heikki Lehvaslaiho wrote:
> 
> Kurt,
> 
> There are a few things in your code to note:
> 
> - regexp /C*T/ matches any T preceded by zero or more Cs,
>   not what you meant
> - $- and $+ are among the "expensive" perl variables, best
>   avoided unless you have to use them. Using them once in your
>   code slows execution down considerably. There is always
>   another way.
> - Keep in mind what you want to use the match positions for: 
>   Human readable locations usually start counting with 1 but
>   perl code uses 0 as the first location. The code below assumes
>   you want to print the locations out.
> 
> Study my example code below.
> 
> Yours,
> 	-Heikki
> 
> ###################################################################
> #!/usr/bin/perl
> $seq = "GATCAAT";
> #$pattern=  'C*T';
> $pattern=  'C.*T';
> 
> while ($seq =~ m/($pattern)/gi) {
> 
>     $match = $1;
>     $end = pos($seq);
>     $start = $end - length($match) +1;
> 
>     print "$match : $start - $end\n";
> }
> 
> ###################################################################
> 
> 


Thanks for the instant reply! Sorry I couldn't reply earlier.

The code works perfectly fine, but sometimes it doesn't give the required output. For example,
if I type the sequence as "GATCAAGTCAGGAT" and the pattern to be matched as T.*A, then
the output I get from the above program is TCAAGTCAGGA instead of TCA.
One more thing: is there any way I can replace T*A with T.*A? The
code I need to write says T*A should be the only input, not T.*A. So
can we use a replacement regex, something like
$pattern =~  s/.*/*/ ... or something else?
But it gives some error again. Damn! Regex is really hairy! :P

Anyway, thanks a lot again for the code. Hope to hear from you soon!

Kurt!


-- 
View this message in context: http://www.nabble.com/Sequence-matching-problem%21-tf3275153.html#a9137941
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From biology0046 at hotmail.com  Sun Feb 25 04:14:51 2007
From: biology0046 at hotmail.com (=?gb2312?B?va0gzsTi/Q==?=)
Date: Sun, 25 Feb 2007 04:14:51 +0000
Subject: [Bioperl-l] how to change align output format
Message-ID: <BAY109-F2409DB6CAA116F289F8F17B48C0@phx.gbl>

Dear all:

I have problems changing the output format of a Clustal alignment.
I use the Bio::Tools::Run::Alignment::Clustalw module to carry out a
multiple sequence alignment, then I use the Bio::AlignIO module to write
out the alignment. The script looks like this:
my $aln_out=Bio::AlignIO->new(-file=>">./clustal/${outfilename}.aln",-format=>'clustalw');

The output:
dana_GLEANR_16071      MSKMKMLPVQLSLNQLNPGIWSDVLWRCPPAPSSQLAELKTQLPPSLPSDPRLWSREDVL
dere_GLEANR_9270       ..............S.............................................
FBgn0000097            ..............S.............................................
dsec_GLEANR_671        ..............S.............................................
dsim_GLEANR_6613       ..............S.............................................
dyak_GLEANR_1669       ..............S.............................................
                                     .


dana_GLEANR_16071      VFLRFCVREFDLPKLDFDLFQMNGKALCLLTRADFGHRCPGAGDVLHNVLQMLIIESHMM
dere_GLEANR_9270       ............................................................
FBgn0000097            ............................................................
dsec_GLEANR_671        ............................................................
dsim_GLEANR_6613       ............................................................
dyak_GLEANR_1669       ............................................................

But I want to change the output format to the one below, which does not change
identical residues into the "." character.
dere_GLEANR_9270       MSKMKMLPVQLSLNSLNPGIWSDVLWRCPPAPSSQLAELKTQLPPSLPSDPRLWSREDVL
dyak_GLEANR_1669       MSKMKMLPVQLSLNSLNPGIWSDVLWRCPPAPSSQLAELKTQLPPSLPSDPRLWSREDVL
dsec_GLEANR_671        MSKMKMLPVQLSLNSLNPGIWSDVLWRCPPAPSSQLAELKTQLPPSLPSDPRLWSREDVL
dsim_GLEANR_6613       MSKMKMLPVQLSLNSLNPGIWSDVLWRCPPAPSSQLAELKTQLPPSLPSDPRLWSREDVL
FBgn0000097            MSKMKMLPVQLSLNSLNPGIWSDVLWRCPPAPSSQLAELKTQLPPSLPSDPRLWSREDVL
dana_GLEANR_16071      MSKMKMLPVQLSLNQLNPGIWSDVLWRCPPAPSSQLAELKTQLPPSLPSDPRLWSREDVL
                       **************.*********************************************

dere_GLEANR_9270       VFLRFCVREFDLPKLDFDLFQMNGKALCLLTRADFGHRCPGAGDVLHNVLQMLIIESHMM
dyak_GLEANR_1669       VFLRFCVREFDLPKLDFDLFQMNGKALCLLTRADFGHRCPGAGDVLHNVLQMLIIESHMM
dsec_GLEANR_671        VFLRFCVREFDLPKLDFDLFQMNGKALCLLTRADFGHRCPGAGDVLHNVLQMLIIESHMM
dsim_GLEANR_6613       VFLRFCVREFDLPKLDFDLFQMNGKALCLLTRADFGHRCPGAGDVLHNVLQMLIIESHMM
FBgn0000097            VFLRFCVREFDLPKLDFDLFQMNGKALCLLTRADFGHRCPGAGDVLHNVLQMLIIESHMM
dana_GLEANR_16071      VFLRFCVREFDLPKLDFDLFQMNGKALCLLTRADFGHRCPGAGDVLHNVLQMLIIESHMM
                       ************************************************************

Are there any parameters in the package that can be changed so that I can
get the latter output format? Thank you sincerely!

Jiang



From bix at sendu.me.uk  Sun Feb 25 10:53:48 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Sun, 25 Feb 2007 10:53:48 +0000
Subject: [Bioperl-l] [Fwd: Bio::Graph::SimpleGraph]
Message-ID: <45E16ABC.3060405@sendu.me.uk>

Tels,

I've forwarded this to the author of the module, Nat Goodman, and to the 
Bioperl mailing list 
(http://www.bioperl.org/wiki/Mailing_lists#Main_BioPerl_list).

But actually we have Bio::Graph::* as tentatively deprecated:
http://www.bioperl.org/wiki/Deprecated_modules#Bio::Graph_modules
so any further work on it doesn't seem worthwhile.


-------- Original Message --------
Subject: Bio::Graph::SimpleGraph
Date: Sat, 24 Feb 2007 12:07:31 +0100
From: Tels <nospam-abuse at bloodgate.com>

Moin,

I just stumbled over Bio::Graph::SimpleGraph and read this comment:

"This is a simple, hopefully fast undirected graph package. The only reason
this exists is that the standard CPAN Graph package, Graph::Base, is
seriously broken."

Really sad to see people always reinventing the wheel :/

Anyway, I wonder if you would like to make your module support Graph::Easy
(http://search.cpan.org/~tels/Graph-Easy/)? I would be willing to submit
patches and do testing/documentation for that.

All the best,

Tels


From bix at sendu.me.uk  Sun Feb 25 10:45:21 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Sun, 25 Feb 2007 10:45:21 +0000
Subject: [Bioperl-l] Sequence matching problem!
In-Reply-To: <9137941.post@talk.nabble.com>
References: <9107936.post@talk.nabble.com>	<200702231025.39416.heikki@sanbi.ac.za>
	<9137941.post@talk.nabble.com>
Message-ID: <45E168C1.80306@sendu.me.uk>

Kurt Gobain wrote:
> Code works perfectly fine...but...sum time its not givin reqd o/p..For eg.
> If I type sequence as "GATCAAGTCAGGAT" & pattern to be matched as T.*A..then
> o/p which I am getting frm above prog is TCAAGTCAGGA instead of TCA...
> & 1 more thing Is there n e chance by which I can replace T*A to T.*A cos
> the code which I need to write says T*A shod be only the input not T.*A..So
> Can we use replacment reg ex...sumthing like 
> $pattern =~  s/.*/*/...or sumthing else...
> But its kinda givin sum error again...Dam! Regex is really hairy!!...:P

These aren't Bioperl questions. For regular expression help see:
http://perldoc.perl.org/perlretut.html

Basically, you want a non-greedy match, so T.*?A

You can convert T*A by doing s/\*/.*?/
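
Putting the two suggestions together, here is a minimal sketch that reuses the pos()-based loop from Heikki's example: it turns the user's T*A into the non-greedy T.*?A and reports 1-based match positions. The variable names ($input, @hits) are illustrative only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $seq   = 'GATCAAGTCAGGAT';
my $input = 'T*A';                 # pattern as typed by the user

# Turn every '*' into a non-greedy "anything" so T*A becomes T.*?A.
(my $pattern = $input) =~ s/\*/.*?/g;

my @hits;
while ($seq =~ m/($pattern)/gi) {
    my $match = $1;
    my $end   = pos($seq);                  # 1-based end of the match
    my $start = $end - length($match) + 1;  # 1-based start
    push @hits, "$match : $start - $end";
}
print "$_\n" for @hits;
# TCA : 3 - 5
# TCA : 8 - 10
```

With the greedy T.*A the first match would run to the last A (TCAAGTCAGGA); the trailing ? makes .* stop at the first A it can. If the user input may contain other regex metacharacters, quoting the pieces around each '*' with quotemeta would be safer.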

Here are some more regexs for you:
s/sum/some/g
s/frm/from/g
s/n e/any/g
etc...


From biology0046 at hotmail.com  Sun Feb 25 13:28:34 2007
From: biology0046 at hotmail.com (=?gb2312?B?va0gzsTi/Q==?=)
Date: Sun, 25 Feb 2007 13:28:34 +0000
Subject: [Bioperl-l] AlignIO problems
Message-ID: <BAY109-F38C3F2A7B77BE0A42F9ADEB48C0@phx.gbl>

Hi all,
I use the AlignIO module to convert an alignment file.
My original file is:
CLUSTAL W(1.81) multiple sequence alignment


dana_GLEANR_11249      MEAIAKHDFSATADDELSFRKTQTLKILNMEDDSNWYRAELDGKEGLIPSNYIEMKNHDW
dere_GLEANR_7213       ...V...................I....................................
dgri_GLEANR_6962       .......................I....................................
FBgn0004638            .......................I....................................
dmoj_GLEANR_6118       ...........N...........I....................................
dper_GLEANR_18885      ...V...................I....................................
dpse_GLEANR_14384      ...V...................I....................................
dsec_GLEANR_3096       .................N.....I....................................
dsim_GLEANR_9744       -----------------------------...............................
dvir_GLEANR_4811       .......................I....................................
dwil_GLEANR_10869      .......................I....................................
dyak_GLEANR_13576      .......................I....................................


dana_GLEANR_11249      YYGRITRADAEKLLSNKHEGAFLIRISESSPGDFSLSVKCPDGVQHFKVLRDAQSKFFLW
dere_GLEANR_7213       ............................................................
dgri_GLEANR_6962       ............................................................
FBgn0004638            ............................................................
dmoj_GLEANR_6118       .................L..........................................
dper_GLEANR_18885      ............................................................
dpse_GLEANR_14384      ............................................................
dsec_GLEANR_3096       ............................................................
dsim_GLEANR_9744       ............................................................
dvir_GLEANR_4811       ............................................................
dwil_GLEANR_10869      ............................................................
dyak_GLEANR_13576      ............................................................


dana_GLEANR_11249      VVKFNSLNELVEYHRTASVSRSQDVKLRDMIPEEMLVQALYDFVPQESGELDFRRGDVIT
dere_GLEANR_7213       ............................................................
dgri_GLEANR_6962       ............................................................
FBgn0004638            ............................................................
dmoj_GLEANR_6118       ..............................V.D...........................
dper_GLEANR_18885      .......................E....................................
dpse_GLEANR_14384      .......................E....................................
dsec_GLEANR_3096       ............................................................
dsim_GLEANR_9744       ............................................................
dvir_GLEANR_4811       ............................................................
dwil_GLEANR_10869      ............................................................
dyak_GLEANR_13576      ............................................................


dana_GLEANR_11249      VTDRSDENWWNGEIGNRKGIFPATYVTPYHS
dere_GLEANR_7213       ...............................
dgri_GLEANR_6962       ...............................
FBgn0004638            ...............................
dmoj_GLEANR_6118       ............Q..................
dper_GLEANR_18885      ...............................
dpse_GLEANR_14384      ...............................
dsec_GLEANR_3096       ...............................
dsim_GLEANR_9744       ...............................
dvir_GLEANR_4811       ...............................
dwil_GLEANR_10869      ...............................
dyak_GLEANR_13576      ...............................


I want to change those "." characters back to the residue letters, so I
wrote code like this:
use Bio::AlignIO;
my $in=Bio::AlignIO->new(-file =>"FBgn0000097.aln",
                      -format => 'clustalw');
my $out=Bio::AlignIO->new(-file=>">../clustalw/0097.aln",
                       -format =>'clustalw');
while (my $aln=$in->next_aln() ){
    $aln->unmatch();
    $aln->set_displayname_flat();
    $out->write_aln($aln);
}

But when I run the code, I get error messages like this:

-------------------- WARNING ---------------------
MSG: Got a sequence with no letters in it cannot guess alphabet []
---------------------------------------------------

------------- EXCEPTION  -------------
MSG: No sequence with name [dsim_GLEANR_9744/1-182]
STACK Bio::SimpleAlign::displayname /home/src/bioperl/bioperl-live/Bio/SimpleAlign.pm:2307
STACK Bio::SimpleAlign::set_displayname_flat /home/src/bioperl/bioperl-live/Bio/SimpleAlign.pm:2374
STACK toplevel aligntest.pl:11

--------------------------------------

I don't know where the problem is.

Jiang



From cjfields at uiuc.edu  Sun Feb 25 19:58:23 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 25 Feb 2007 13:58:23 -0600
Subject: [Bioperl-l] AlignIO problems
In-Reply-To: <BAY109-F38C3F2A7B77BE0A42F9ADEB48C0@phx.gbl>
References: <BAY109-F38C3F2A7B77BE0A42F9ADEB48C0@phx.gbl>
Message-ID: <19EA5F46-D1A4-45B5-B2DB-55194F79215C@uiuc.edu>

Bio::AlignIO::clustalw doesn't work with masked sequences; it parses  
the output quite literally as is, so any [.-] are treated as gaps.   
If the seqs are 100% identical then you will have a seq with 100%  
gaps and no sequence, thus giving you the warnings you see.

The best way to accomplish what you want is to not mask the sequence  
alignment to begin with when running clustalw/muscle/whatever.   
Exactly how are you generating these?  When I use clustalw no  
identity masking occurs by default.
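
If only dot-masked output is available, one workaround is to restore the residues yourself before handing the alignment to Bio::AlignIO. The sketch below assumes that '.' means "same residue as the reference row", that the reference is the first row (here dana_GLEANR_11249), and that '-' is a real gap to keep; unmask_rows is a hypothetical helper, not a BioPerl method.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Restore a dot-masked alignment block: in every non-reference row, a '.'
# is replaced by the residue at the same column of the reference row.
# Gaps ('-') and real residues are left untouched.
sub unmask_rows {
    my ($ref, @rows) = @_;
    my @ref_chars = split //, $ref;
    my @out = ($ref);
    for my $row (@rows) {
        my @chars = split //, $row;
        for my $i (0 .. $#chars) {
            $chars[$i] = $ref_chars[$i] if $chars[$i] eq '.';
        }
        push @out, join '', @chars;
    }
    return @out;
}

# Last block of the alignment above (dana_GLEANR_11249 is the reference).
my @restored = unmask_rows(
    'VTDRSDENWWNGEIGNRKGIFPATYVTPYHS',   # dana_GLEANR_11249
    ('.' x 12) . 'Q' . ('.' x 18),       # dmoj_GLEANR_6118
    '.' x 31,                            # e.g. dere_GLEANR_7213
);
print "$_\n" for @restored;
```

Because '-' is preserved, a row such as dsim_GLEANR_9744, which starts with real gaps, keeps them and only gets its dots filled in.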

chris


Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cristiangary at gmail.com  Sun Feb 25 21:04:57 2007
From: cristiangary at gmail.com (Cristian Gary)
Date: Sun, 25 Feb 2007 18:04:57 -0300
Subject: [Bioperl-l] problem with blast report to ncbi webpage
Message-ID: <95ef8cd0702251304o45bea6a0tcedc59156cb0cfe4@mail.gmail.com>

I have a problem with BLAST reports from the NCBI server: while waiting on
the RIDs, it doesn't show me any results.
Is the problem the NCBI server or the BioPerl version?
P.S.: the same code worked very well three weeks ago.


-- 
"Knowledge belongs to humanity"

"Gnu/linux   -------- free your mind......
www.kubuntu.org


From granjeau at tagc.univ-mrs.fr  Mon Feb 26 09:17:15 2007
From: granjeau at tagc.univ-mrs.fr (Samuel GRANJEAUD - IR/IFR137)
Date: Mon, 26 Feb 2007 10:17:15 +0100
Subject: [Bioperl-l] Reading a XML sequence (UniParc) into a BioSeq object
Message-ID: <45E2A59B.6080300@tagc.univ-mrs.fr>

Hello !

I would like to fill a BioSeq object with the output from a dbfetch
request to the UniParc database at EBI (which returns only XML, and I am
interested in the references). If somebody could tell me which BioPerl object
to use, or a way to convert it to Swiss format, or has a piece of code
(is http://doc.bioperl.org/bioperl-live/Bio/SeqIO/interpro.html a good
starting point?), I would appreciate it a lot.

Best regards,
--Samuel

<entry accession="UPI00004A0D4A">
<dbReferenceList>
    <dbReference db="EMBL" id="CAI39485" version="1" version_i="1" 
active="Y" created="04-Jan-2005" last="15-Dec-2006"/>
    <dbReference db="UniProtKB/TrEMBL" id="Q5JVT0" version="1" 
version_i="1" active="N" created="15-Feb-2005" last="06-Feb-2007"/>
    <dbReference db="ENSEMBL" id="ENSP00000352958" version_i="2" 
active="Y" created="03-Apr-2006" last="27-Nov-2006"/>
    <dbReference db="IPI" id="IPI00418471" version="4" version_i="4" 
active="N" created="07-Mar-2005" last="07-Mar-2005"/>
    <dbReference db="IPI" id="IPI00646867" version="1" version_i="1" 
active="N" created="06-Sep-2005" last="06-Oct-2006"/>
    <dbReference db="VEGA" id="OTTHUMP00000019225" version_i="1" 
active="N" created="15-Aug-2005" last="02-Dec-2005"/>
</dbReferenceList>
<sequence length="431" crc64="8913D1F04A71CCFB">
MSTRSVSSSSYRRMFGGPGTASRPSSSRSYVTTSTRTYSLGSALRPSTSRSLYASSPGGV
YATRSSAVRLRSSVPGVRLLQDSVDFSLADAINTEFKNTRTNEKVELQELNDRFANYIDK
VRFLEQQNKILLAELEQLKGQGKSRLGDLYEEEMRELRRQVDQLTNDKARVEVERDNLAE
DIMRLREKLQEEMLQREEAENTLQSFRQDVDNASLARLDLERKVESLQEEIAFLKKLHEE
EIQELQAQIQEQHVQIDVDVSKPDLTAALRDVRQQYESVAAKNLQEAEEWYKSKFADLSE
AANRNNDALRQAKQESTEYRRQVQSLTCEVDALKGTNESLERQMREMEENFAVEAANYQD
TIGRLQDEIQNMKEEMARHLREYQDLLNVKMALDIEIATYRKLLEGEESRISLPLPNFSS
LNLRGKHFISL
</sequence>
</entry>
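
As a rough starting point, a record like this can be pulled apart with plain regular expressions. This is a minimal sketch that assumes the exact layout shown (double-quoted attributes, one <sequence> element per <entry>); a real XML parser such as XML::Twig would be more robust. parse_uniparc_entry is a hypothetical helper, and the sample below is a truncated copy of the record above. The resulting accession and sequence could then be passed to Bio::Seq->new(-display_id => ..., -seq => ...), with the cross-references stored however suits your use.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Pull accession, sequence and cross-references out of one UniParc
# <entry>. Regex-based, so it only handles records laid out like the
# sample (attributes in double quotes, a single <sequence> element).
sub parse_uniparc_entry {
    my ($xml) = @_;
    my ($acc) = $xml =~ /<entry\s+accession="([^"]+)"/;
    my ($seq) = $xml =~ m{<sequence\b[^>]*>(.*?)</sequence>}s;
    $seq =~ s/\s+//g if defined $seq;        # drop line breaks

    my @xrefs;
    while ($xml =~ /<dbReference\s([^>]*)>/g) {   # \s skips <dbReferenceList>
        my $attrs = $1;                           # copy before matching again
        my %attr  = $attrs =~ /(\w+)="([^"]*)"/g;
        push @xrefs, { db => $attr{db}, id => $attr{id}, active => $attr{active} };
    }
    return { accession => $acc, seq => $seq, xrefs => \@xrefs };
}

# Truncated copy of the record above (two references, two sequence lines).
my $xml = <<'XML';
<entry accession="UPI00004A0D4A">
<dbReferenceList>
    <dbReference db="EMBL" id="CAI39485" version="1" version_i="1"
active="Y" created="04-Jan-2005" last="15-Dec-2006"/>
    <dbReference db="UniProtKB/TrEMBL" id="Q5JVT0" version="1"
version_i="1" active="N" created="15-Feb-2005" last="06-Feb-2007"/>
</dbReferenceList>
<sequence length="431" crc64="8913D1F04A71CCFB">
MSTRSVSSSSYRRMFGGPGTASRPSSSRSYVTTSTRTYSLGSALRPSTSRSLYASSPGGV
YATRSSAVRLRSSVPGVRLLQDSVDFSLADAINTEFKNTRTNEKVELQELNDRFANYIDK
</sequence>
</entry>
XML

my $entry = parse_uniparc_entry($xml);
printf "%s %d aa\n", $entry->{accession}, length $entry->{seq};
printf "  %s:%s (active=%s)\n", @{$_}{qw(db id active)} for @{ $entry->{xrefs} };
```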


From bix at sendu.me.uk  Mon Feb 26 11:46:39 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 26 Feb 2007 11:46:39 +0000
Subject: [Bioperl-l] [Fwd: Bio::Graph::SimpleGraph]
In-Reply-To: <45E16ABC.3060405@sendu.me.uk>
References: <45E16ABC.3060405@sendu.me.uk>
Message-ID: <45E2C89F.1020402@sendu.me.uk>

Nat replied, but I messed up the To:s so his reply didn't make it to the
list. Here's what he said:


Nathan (Nat) Goodman wrote:
Hi Tels

I agree it's sad to reinvent the wheel, but I don't think that's what
happened here. Your module seems to be focused on rendering graphs while
my module is concerned with computations on graphs.

In any case, as Sendu notes, SimpleGraph is in the process of being
deprecated. I fully support this move. It was intended to be a stopgap
until the main Perl Graph module was fixed.  Since that has now
happened, it's time for SimpleGraph to retire.

For the benefit of anyone using Graph: last I checked (six months or
more ago), it had serious performance problems on large graphs (probably
not too much of a surprise), and also was inexplicably slow on graphs
with edge attributes.  I see that the latter bug is marked "resolved" in
CPAN, but there's no indication of when or how.  We've moved to Boost
for graphs as large as the human protein interaction network.

Best,
Nat


From sanjib at bic.boseinst.ernet.in  Mon Feb 26 05:23:36 2007
From: sanjib at bic.boseinst.ernet.in (Sanjib Kumar Gupta)
Date: Mon, 26 Feb 2007 10:53:36 +0530
Subject: [Bioperl-l] Remote blast
In-Reply-To: <20070221064743.M54123@bic.boseinst.ernet.in>
References: <mailman.0.1172037646.4756.bioperl-l@lists.open-bio.org>
	<20070221064743.M54123@bic.boseinst.ernet.in>
Message-ID: <20070226052336.M74918@bic.boseinst.ernet.in>

Hi
I have been running this script for some time and it was running fine. I am
using this Linux machine with a live IP (no proxy). But suddenly it has stopped
working with these errors:

waiting...waiting...
-------------------- WARNING ---------------------
MSG:  <HTML>
<HEAD><TITLE>An Error Occurred</TITLE></HEAD>
<BODY>
<H1>An Error Occurred</H1>
500 Can't connect to www.ncbi.nlm.nih.gov:80 (Bad 
hostname 'www.ncbi.nlm.nih.gov')
</BODY>
</HTML>

---------------------------------------------------
xx.pep

-------------------- WARNING ---------------------
MSG: req was POST http://www.ncbi.nlm.nih.gov/blast/Blast.cgi
User-Agent: bioperl-Bio_Tools_Run_RemoteBlast/1.5
Content-Length: 497
Content-Type: application/x-www-form-urlencoded

DATABASE=nr&QUERY=%3ENM_005277.3%3B_1414961_1+837%0A
MEENMEEGQTQKGCFECCIKCLGGIPYASLIATILLYAGVALFCGCGHEALSGTVNILQTYFEMARTAGDTLDVF
TMIDIFKYVIYGIAAAFFVYGILLMVEGFFTTGAIKDLYGDFKITTCGRCVSAWFIMLTYLFMLAWLGVTAFTSLPV
YMYFNLWTICRNTTLVEGANLCLDLRQFGIVTIGEEKKICTVSENFLRMCESTELNMTFHLFIVALAGAGAAVIAMV
HYLMVLSANWAYVKDACRMQKYEDIKSKEEQELHDIHSTRSKERLNAYT*
&COMPOSITION_BASED_STATISTICS=off&EXPECT=1e-10&SERVICE=plain
&FORMAT_OBJECT=Alignment&CMD=Put&MATRIX_NAME=BLOSUM62
&ENTREZ_QUERY=Xenopus+laevis[Organism]&PROGRAM=blastp

<HTML>
<HEAD><TITLE>An Error Occurred</TITLE></HEAD>
<BODY>
<H1>An Error Occurred</H1>
500 Can't connect to www.ncbi.nlm.nih.gov:80 (Bad 
hostname 'www.ncbi.nlm.nih.gov')
</BODY>
</HTML>

---------------------------------------------------
waiting...waiting...
-------------------- WARNING ---------------------
MSG:  <HTML>
<HEAD><TITLE>An Error Occurred</TITLE></HEAD>
<BODY>
<H1>An Error Occurred</H1>
500 Internal Server Error
</BODY>
</HTML>

---------------------------------------------------

Although I am able to see the NCBI page from a browser, I am unable to ping or
traceroute to the server.

Please help me.

--
Sanjib Kumar Gupta
Bioinformatics Centre
Bose Institute
Kolkata 700054, INDIA
Phone  : +91-33-2355 6626, 2816, 2355 4766
Fax    : +91-33-2355 3886
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: n9.pl
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070226/86a0137c/attachment-0004.pl>

From cjfields at uiuc.edu  Mon Feb 26 14:59:21 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 26 Feb 2007 08:59:21 -0600
Subject: [Bioperl-l] Remote blast
In-Reply-To: <20070226052336.M74918@bic.boseinst.ernet.in>
References: <mailman.0.1172037646.4756.bioperl-l@lists.open-bio.org>
	<20070221064743.M54123@bic.boseinst.ernet.in>
	<20070226052336.M74918@bic.boseinst.ernet.in>
Message-ID: <C668C555-39ED-43A9-8B49-C7D0376D971F@uiuc.edu>

I tested this out and got BLAST to work for my test case (single
fasta seq, since you didn't send any seqs for testing).  It keeps
querying for the RID in what appears to be an infinite loop (i.e. it
doesn't get rid of the RID properly); you can see this if you add
'-verbose => 1' to your parameters.  I don't have time to delve into it,
but from a quick glance it may be due to your looping structure and
how you are saving your RIDs.

As for your particular error, could it be something as simple as the  
server was overloaded or down?  It does happen from time to time...

Beyond that I can't make heads or tails of your script.  Was it  
cobbled together from a bunch of others?  If you are doing that you  
can probably expect some bugs to occur.

chris



From cjfields at uiuc.edu  Mon Feb 26 15:05:50 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 26 Feb 2007 09:05:50 -0600
Subject: [Bioperl-l] AlignIO problems
In-Reply-To: <BAY109-F30DD48142FC0984AF9A284B4830@phx.gbl>
References: <BAY109-F30DD48142FC0984AF9A284B4830@phx.gbl>
Message-ID: <082E0708-6B1C-45CE-B387-429B8B6A8D7A@uiuc.edu>

Make sure to keep this on the list, others may have some input.

You should be able to test the various sequence objects you're  
retrieving from Bio::DB::Fasta via Bio::SeqIO to see if they are what  
you're expecting, then track down the problematic sequences.  My  
guess is the odd seqs are due to the way you are using Bio::DB::Fasta  
for each of the files.  I'm wondering if you are having problems with  
indices overwriting one another and are thus getting back blank seq  
objects.

You should probably consider just indexing all of your files
together; according to the POD you can use a single Bio::DB::Fasta to
index all of the files in one go (indicate the path and use '-glob')
and retrieve what you need that way.  Either that, or put them
into separate directories so the indices are also separate.
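To check the "one index over all files" idea without Bio::DB::Fasta in the way, a rough core-Perl sketch is below. It is not Bio::DB::Fasta: it simply reads every FASTA file into one in-memory hash keyed by ID, warns when an ID turns up in more than one file (one way entries could silently replace each other), and lists requested IDs that come back missing or empty. The sub names `index_fasta_files` and `missing_ids` are hypothetical, and it assumes the ID is the first word after '>'.

```perl
use strict;
use warnings;

# Hypothetical helper: build ONE in-memory index over several FASTA files.
# Returns a hashref of id => sequence string. The id is assumed to be the
# first whitespace-delimited word after '>'.
sub index_fasta_files {
    my @files = @_;
    my (%seq, %source);
    for my $file (@files) {
        open my $fh, '<', $file or die "Cannot open $file: $!";
        my $id;
        while (my $line = <$fh>) {
            $line =~ s/\r?\n\z//;    # strip Unix or DOS line ending
            if ($line =~ /^>(\S+)/) {
                $id = $1;
                # Report the kind of collision that could make one file's
                # entry silently replace another's.
                warn "ID $id in $file was already seen in $source{$id}\n"
                    if exists $source{$id};
                $source{$id} = $file;
                $seq{$id}    = '';
            }
            elsif (defined $id) {
                $line =~ s/\s+//g;   # drop any stray whitespace
                $seq{$id} .= $line;
            }
        }
        close $fh;
    }
    return \%seq;
}

# Return the requested ids that are absent from the index or empty.
sub missing_ids {
    my ($index, @ids) = @_;
    return grep { !defined $index->{$_} || $index->{$_} eq '' } @ids;
}
```

For example, `my $idx = index_fasta_files(glob '*.fasta');` followed by `missing_ids($idx, @ids_to_align)` (where `@ids_to_align` is a placeholder) would show which IDs would yield empty sequences before anything is passed to ClustalW. For large files an on-disk index such as Bio::DB::Fasta remains the better fit, since this sketch holds everything in memory.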

chris

On Feb 25, 2007, at 9:50 PM, ? ?? wrote:

> Thank you for your help!
> Maybe you are right. I use the following code to create my arrays of
> sequence objects:
>          my $outfilename=$dmel;
>          my $ana_pep_db=Bio::DB::Fasta->new("dana.translation.fasta");
>          my $ana_cdna_db=Bio::DB::Fasta->new("dana.cds.fasta");
>          my $ere_pep_db=Bio::DB::Fasta->new("dere.translation.fasta");
>          my $ere_cdna_db=Bio::DB::Fasta->new("dere.cds.fasta");
>          my $mel_pep_db=Bio::DB::Fasta->new("dmel.translation.fasta");
>          my $mel_cdna_db=Bio::DB::Fasta->new("dmel.cds.fasta");
>          my $sec_pep_db=Bio::DB::Fasta->new("dsec.translation.fasta");
>          my $sec_cdna_db=Bio::DB::Fasta->new("dsec.cds.fasta");
>          my $sim_pep_db=Bio::DB::Fasta->new("dsim.translation.fasta");
>          my $sim_cdna_db=Bio::DB::Fasta->new("dsim.cds.fasta");
>          my $yak_pep_db=Bio::DB::Fasta->new("dyak.translation.fasta");
>          my $yak_cdna_db=Bio::DB::Fasta->new("dyak.cds.fasta");
>          my $ana_pep_obj=$ana_pep_db->get_Seq_by_id($dana);
>          my $ana_nuc_obj=$ana_cdna_db->get_Seq_by_id($dana);
>          my $ere_pep_obj=$ere_pep_db->get_Seq_by_id($dere);
>          my $ere_nuc_obj=$ere_cdna_db->get_Seq_by_id($dere);
>          my $mel_pep_obj=$mel_pep_db->get_Seq_by_id($dmel);
>          my $mel_nuc_obj=$mel_cdna_db->get_Seq_by_id($dmel);
>          my $sec_pep_obj=$sec_pep_db->get_Seq_by_id($dsec);
>          my $sec_nuc_obj=$sec_cdna_db->get_Seq_by_id($dsec);
>          my $sim_pep_obj=$sim_pep_db->get_Seq_by_id($dsim);
>          my $sim_nuc_obj=$sim_cdna_db->get_Seq_by_id($dsim);
>          my $yak_pep_obj=$yak_pep_db->get_Seq_by_id($ddyak);
>          my $yak_nuc_obj=$yak_cdna_db->get_Seq_by_id($ddyak);
>          push @prots, $ana_pep_obj;
>          push @cdna, $ana_nuc_obj;
>          push @prots, $ere_pep_obj;
>          push @cdna, $ere_nuc_obj;
>          push @prots, $mel_pep_obj;
>          push @cdna, $mel_nuc_obj;
>          push @prots, $sec_pep_obj;
>          push @cdna, $sec_nuc_obj;
>          push @prots, $sim_pep_obj;
>          push @cdna, $sim_nuc_obj;
>          push @prots, $yak_pep_obj;
>          push @cdna, $yak_nuc_obj;
>
> Then I use @prots as input for my $aln=$aln_factory->align(\@prots);
> This creates alignment files in which the sequences are masked.
>
> But if I use FASTA files (not objects) containing the protein
> sequences as input:
> $inputfile='FBgn0000097.pep';
> @params=('outorder'=>'INPUT');
> $factory=Bio::Tools::Run::Alignment::Clustalw->new(@params);
> $aln=$factory->align($inputfile);
> #$aln->gap_char('-');
> $aln->map_chars('\.','-');
> $aln_out=Bio::AlignIO->new(-file=>">0097.aln",-format=>'clustalw');
> $aln_out->write_aln($aln);
>
> This method creates files without masking.
> I think sequence objects created directly by "get_Seq_by_id" from the
> sequence databases are not appropriate.
>
> Thank you for your suggestion again!
>
> Jiang.
>
>> From: Chris Fields <cjfields at uiuc.edu>
>> To: ????? <biology0046 at hotmail.com>
>> Subject: Re: [Bioperl-l] AlignIO problems
>> Date: Sun, 25 Feb 2007 21:26:34 -0600
>>
>> I ran the same code using a local FASTA-formatted file on my system,
>> which works (no masking).
>>
>> Of note, the gaps were all marked as '.'.  Your gaps were both
>> '.' and '-', which may mean that something is wrong with the seq
>> objects themselves.  Maybe SeqIO is misreading them?
>>
>> chris
>>
>> On Feb 25, 2007, at 7:34 PM, ????? wrote:
>>
>>> I use the Bio::Tools::Run::Alignment::Clustalw module to carry
>>> out a multiple alignment.
>>> My code is:
>>>         my @clustal_param=('outorder'=>'INPUT');
>>>         my $aln_factory=Bio::Tools::Run::Alignment::Clustalw->new(@clustal_param);
>>>         my $aln=$aln_factory->align(\@prots); ### @prots is an array of protein sequence objects
>>>         my $aln_out=Bio::AlignIO->new(-file=>">./dmel_group/clustal/${outfilename}.aln",-format=>'clustalw');
>>>
>>>         $aln_out->write_aln($aln);
>>> This code produces an alignment in which identical residues are masked.
>>> But if I use ClustalW directly, the output is normal.
>>> Thank you for your help~
>>>
>>> Jiang
>>
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From michael.watson at bbsrc.ac.uk  Mon Feb 26 16:00:31 2007
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Mon, 26 Feb 2007 16:00:31 -0000
Subject: [Bioperl-l] The axis of GC content in Bio::Graphics::glyph:dna
In-Reply-To: <6dce9a0b0702151053v19ab190fh1f752fea3b2ed722@mail.gmail.com>
References: <8975119BCD0AC5419D61A9CF1A923E9503E2EB70@iahce2ksrv1.iah.bbsrc.ac.uk>
	<6dce9a0b0702151053v19ab190fh1f752fea3b2ed722@mail.gmail.com>
Message-ID: <8975119BCD0AC5419D61A9CF1A923E9503E2EBD3@iahce2ksrv1.iah.bbsrc.ac.uk>

Hi Lincoln/List
 
That's great, the axis now appears, but there are no labels.  This in
itself isn't a problem, as long as we can assume that the tick marks are
at 0, 50% and 100%.  Can we?  If so, we can go with what we have;
otherwise I'm going to have to figure out a way to label the y-axis.
 
Thanks
Mick

________________________________

From: lincoln.stein at gmail.com [mailto:lincoln.stein at gmail.com] On Behalf Of Lincoln Stein
Sent: 15 February 2007 18:53
To: michael watson (IAH-C)
Cc: BioPerl-List
Subject: Re: [Bioperl-l] The axis of GC content in Bio::Graphics::glyph:dna


Hi Michael,

When you set up the panel, do this:

 Bio::Graphics::Panel->new(
     # ...your existing options (e.g. -length, -width) go here...
     -pad_left  => 20,
     -pad_right => 20,
 );

This will leave enough room on the left and right for you to see the Y
axis. Otherwise it runs off the edge of the image (ok, this is a
mis-design, but it was the only way to solve a chicken-and-egg problem
about who gets to say how wide the panel is) 

Lincoln


On 2/15/07, michael watson (IAH-C) <michael.watson at bbsrc.ac.uk> wrote: 

	Hi
	
	OK, I have some great images out of this glyph, but I can't see the axis,
	nor is it labelled (i.e. does it go from 0 to 100%?), so it isn't great for
	publication.  The docs say:
	
	"NOTE: -gc_window=>'auto' gives nice results and is recommended for
	drawing GC content. The GC content axes draw slightly outside the
	panel, so you may wish to add some extra padding on the right and
	left."
	
	Any idea how to do this?
	
	Basically, I want a nice GC graph with the axis quite clearly labelled,
	and a nice "%GC" title next to it :)
	
	Thanks
	
	Mick
	
	
	_______________________________________________
	Bioperl-l mailing list
	Bioperl-l at lists.open-bio.org 
	http://lists.open-bio.org/mailman/listinfo/bioperl-l
	

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory 
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse at cshl.edu 


From cjfields at uiuc.edu  Mon Feb 26 17:18:38 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 26 Feb 2007 11:18:38 -0600
Subject: [Bioperl-l] AlignIO problems
In-Reply-To: <BAY109-F1391C0C6FAEEA3B83565BFB4830@phx.gbl>
References: <BAY109-F1391C0C6FAEEA3B83565BFB4830@phx.gbl>
Message-ID: <7DF958E6-E233-427F-8901-3FE571CD99BD@uiuc.edu>


On Feb 26, 2007, at 9:59 AM, ? ?? wrote:

> Thank you!
> I have checked the sequences retrieved through several Bio::DB
> objects working simultaneously.
> There are none of the problems you mentioned; the sequences are not
> overwritten.

Again, keep this on the list.  I have my hands full this month so I  
will be checking the list only very sporadically; someone else may be  
able to help you.

The only explanation for the clustalw output you get is that you are
not retrieving the correct sequence in some fundamental way,
which to me indicates the bug originates either in the way the
sequences are retrieved (i.e. somehow via Bio::DB::Fasta, hence my
thought about conflicting indices) or in the way they are converted
via Bio::SeqIO, which is used in Bio::Tools::Run::Alignment::Clustalw.

When I have used Bio::DB::Fasta in the past I have never had a
problem indexing multiple files and retrieving sequences, so
short of running tests with your data I can't help you much beyond the
above conjecture.

chris


From jason at bioperl.org  Mon Feb 26 18:45:34 2007
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 26 Feb 2007 10:45:34 -0800
Subject: [Bioperl-l] Question to Bio::Tools::Run RemoteBlast
In-Reply-To: <20070226095515.68810@gmx.net>
References: <20070226095515.68810@gmx.net>
Message-ID: <2D2DF6D9-6DAE-4BB7-B31B-8C19CCCA7301@bioperl.org>

Alex -
I am glad to see your interest in the module, but I don't
currently have any time to maintain it, so queries should be sent
to the BioPerl mailing list.  In general we prefer that you don't contact
developers directly, but use the mailing list so that others can
learn from the questions.

Please note there are several tutorials and other documentation on the
website; you will get a better response from people if you can show
you have at least tried to use the existing example code to construct
your program.

-jason
On Feb 26, 2007, at 1:55 AM, Alexander Auner wrote:

> Dear Jason Stajich,
> I hope you can help me.
>
> I am inspired by your module and would like to work with it.
> I am a student at the TFH Wildau.
> I have problems understanding the module.
>
> Could you send me an example?
>
> The example should process a text file (FASTA) with NCBI BLAST (Web).
>
> Parameter:
> Choose database -> Others -> nr
> Limit by entrez query -> Campylobacter -> or select from: ->  
> Bacteria [ORGN]
> Expect -> 10
> Other advanced -> -q-1
>
> output format
> plain text without Graphical Overview
> Number of: -> Descriptions -> 10000
> Alignment view -> query-anchored with identities
>
> All other parameters remain undef.
>
> Thank you for your help.
>
> Yours faithfully, Alexander Auner


From jason at bioperl.org  Mon Feb 26 19:13:00 2007
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 26 Feb 2007 11:13:00 -0800
Subject: [Bioperl-l] BioPerl leadership additions
Message-ID: <41922044-0BDA-4F29-B045-5E544A4F679F@bioperl.org>

Dear BioPerl Users and Developers,

I want to announce an addition to the leadership of BioPerl.
Christopher Fields and Sendu Bala are now members of the BioPerl
Core developer group to recognize their ongoing leadership in the  
project.  Chris and Sendu were instrumental in the 1.5.2 Developer  
release and have made a significant commitment and contribution to  
the quality of the code and the documentation of the project.  We  
have invited them to be part of the core to recognize their work and  
to feel comfortable to ask them to do more. ;-)

The Core group was established to ensure that someone was responsible
for making code releases, vetting new developers for CVS write  
accounts, and generally dealing with things that might otherwise slip  
through the cracks.  We are very excited to have more people  
contributing to and maintaining the toolkit.  We look forward to  
their help along with all the other developers, as we work towards a  
1.6 release this year.

As always, while there is a need for some individuals to lead the
project, we encourage contributions from all levels of expertise to  
improve the code, documentation, and tutorials of the project.

We plan to discuss the progress of the toolkit at this year's  
Bioinformatics Open Source Conference held in Vienna, Austria in  
conjunction with the SIG meetings at ISMB.   We are trying to use  
BOSC 2007 as a chance for the developers of Open Bioinformatics  
Foundation sponsored and related projects to coordinate future  
development and release cycles.

Jason Stajich on behalf of the Core developers


From khan at cshl.edu  Mon Feb 26 20:29:19 2007
From: khan at cshl.edu (Khan, Sohail)
Date: Mon, 26 Feb 2007 15:29:19 -0500
Subject: [Bioperl-l] parsing a list of ids to a fasta file.
Message-ID: <C8696843AE995F4EA4CDC3E2B83482A9018791CA@mailbox02.cshl.edu>

Thanks Michael.  I have the scripts installed.  I can pass an id to the indexed fasta file and retrieve the seq.  However, is it possible to pass a list of ids from a file and get the sequences for all of them?
Thanks.

-Sohail

-----Original Message-----
From: michael watson (IAH-C) [mailto:michael.watson at bbsrc.ac.uk]
Sent: Tuesday, February 20, 2007 4:33 PM
To: Khan, Sohail; Bioperl-l at lists.open-bio.org
Subject: RE: [Bioperl-l] parsing a list of ids to a fasta file.


Suggest you use Bio::Index::Fasta to create an index for the fasta file and then a simple script to retrieve sequences using that index.  Or just use the pre-written bp_index.PLS and bp_fetch.PLS scripts.
 
http://www.bioperl.org/wiki/Module:Bio::Index::Fasta
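For the list-of-IDs case, an alternative that skips indexing altogether is to load the wanted IDs into a hash and stream through the FASTA file once, printing only the matching records. Below is a minimal core-Perl sketch of that approach (it does not use Bio::Index::Fasta); the sub names are hypothetical, and it assumes one ID per line in the list file and that each ID is the first word after '>' in the FASTA headers.

```perl
use strict;
use warnings;

# Hypothetical helper: read one ID per line (blank lines are ignored).
sub read_id_list {
    my ($file) = @_;
    open my $fh, '<', $file or die "Cannot open $file: $!";
    my %wanted;
    while (my $line = <$fh>) {
        $line =~ s/\s+$//;    # strip newline and trailing whitespace
        $wanted{$line} = 1 if length $line;
    }
    close $fh;
    return \%wanted;
}

# Stream a FASTA file once; print every record whose ID is in $wanted
# (header and sequence lines) to the filehandle $out.
sub filter_fasta {
    my ($fasta, $wanted, $out) = @_;
    open my $in, '<', $fasta or die "Cannot open $fasta: $!";
    my $keep = 0;
    while (my $line = <$in>) {
        if ($line =~ /^>(\S+)/) {
            $keep = exists $wanted->{$1};    # decide once per header
        }
        print {$out} $line if $keep;
    }
    close $in;
}
```

A call such as `filter_fasta('big.fasta', read_id_list('ids.txt'), \*STDOUT);` (file names are placeholders) writes the selected records to standard output. A single pass like this suits a one-off extraction; an index is worth building when the same large file will be queried repeatedly.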

________________________________

From: bioperl-l-bounces at lists.open-bio.org on behalf of Khan, Sohail
Sent: Tue 20/02/2007 8:42 PM
To: Bioperl-l at lists.open-bio.org
Subject: [Bioperl-l] parsing a list of ids to a fasta file.


Dear list,

I am new to BioPerl.  I have the following question:
I have a list of ids, which I would like to parse against a large fasta file to retrieve the sequences for those ids.  I would appreciate any suggestions.
Thanks.

Khan



From arareko at campus.iztacala.unam.mx  Mon Feb 26 21:44:49 2007
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Mon, 26 Feb 2007 15:44:49 -0600
Subject: [Bioperl-l] BioPerl leadership additions
In-Reply-To: <41922044-0BDA-4F29-B045-5E544A4F679F@bioperl.org>
References: <41922044-0BDA-4F29-B045-5E544A4F679F@bioperl.org>
Message-ID: <45E354D1.4070600@campus.iztacala.unam.mx>

Congrats Chris & Sendu! Very well-deserved. Keep up the great work.

Cheers!
Mauricio.

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM


From lubapardo at gmail.com  Tue Feb 27 13:26:30 2007
From: lubapardo at gmail.com (Luba Pardo)
Date: Tue, 27 Feb 2007 14:26:30 +0100
Subject: [Bioperl-l] parsing blast results
Message-ID: <58ff33550702270526q43ab584ci7baf6a09e2584e38@mail.gmail.com>

Hi,
I am using the module Bio::SearchIO to parse some BLAST results. I would
like to store the ids of the results in an array, but I am not sure whether
this is possible with an existing subroutine. Does anyone know whether
there is a method in Bio::SearchIO that does this?
Thanks in advance,
L.Pardo


From cjfields at uiuc.edu  Tue Feb 27 14:11:37 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 27 Feb 2007 08:11:37 -0600
Subject: [Bioperl-l] parsing blast results
In-Reply-To: <58ff33550702270526q43ab584ci7baf6a09e2584e38@mail.gmail.com>
References: <58ff33550702270526q43ab584ci7baf6a09e2584e38@mail.gmail.com>
Message-ID: <E1B6ED22-1120-4333-AA73-19B57D102EA9@uiuc.edu>


Bio::SearchIO doesn't currently have a method to retrieve all the  
accessions in a BLAST result.  The best way to do this is to iterate  
through the objects:

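use Bio::SearchIO;

# 'blast_report.txt' is a placeholder for your own BLAST output file
my $searchio = Bio::SearchIO->new(-format => 'blast',
                                  -file   => 'blast_report.txt');
my @accs;

while (my $result = $searchio->next_result) {
     while (my $hit = $result->next_hit) {
         push @accs, $hit->accession;
         # do whatever else here...
     }
}

print join(',', @accs), "\n";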

I don't think all accessions in the description are parsed out at the  
moment, just the first one (or the one in the hit table).  If you  
want all of them or if you want the NCBI GI you'll need to parse them  
out of the description heading ($hit->description).
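For NCBI-style identifiers, a regular expression over the hit's text is usually enough. The helper below is a hypothetical core-Perl sketch that assumes identifiers appear in the form `gi|NUMBER|db|ACCESSION|` (for example `gi|46100068|gb|EAK85301.1|`). Depending on the report, the first of these may sit in the hit name rather than the description, so it is safer to search both.

```perl
use strict;
use warnings;

# Hypothetical helper: pull every (gi, db, accession) triple out of an
# NCBI-style defline fragment such as "gi|46100068|gb|EAK85301.1| ...".
# Returns a list of array refs: [gi, db, accession].
sub extract_gi_triples {
    my ($text) = @_;
    my @triples;
    while ($text =~ /\bgi\|(\d+)\|(\w+)\|([^|\s]*)\|?/g) {
        push @triples, [ $1, $2, $3 ];
    }
    return @triples;
}
```

Inside the loop above, something like `push @gis, map { $_->[0] } extract_gi_triples(join ' ', $hit->name, $hit->description // '');` would collect every GI found for that hit (`@gis` being a new array you declare alongside `@accs`).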

chris


From sac at bioperl.org  Tue Feb 27 17:59:22 2007
From: sac at bioperl.org (Steve Chervitz)
Date: Tue, 27 Feb 2007 09:59:22 -0800
Subject: [Bioperl-l] [Bioperl-announce-l] BioPerl leadership additions
In-Reply-To: <41922044-0BDA-4F29-B045-5E544A4F679F@bioperl.org>
References: <41922044-0BDA-4F29-B045-5E544A4F679F@bioperl.org>
Message-ID: <8f200b4c0702270959n7521f693l915bfabe7ccb7ef7@mail.gmail.com>

Welcome to the club, Chris & Sendu. Always good to have an infusion of new
blood and capable, motivated hands.

Steve


From cjfields at uiuc.edu  Tue Feb 27 20:57:40 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 27 Feb 2007 14:57:40 -0600
Subject: [Bioperl-l] Bio::SeqIO::FTHelper
Message-ID: <D6922F04-A349-41C4-B4DC-6763E3195B05@uiuc.edu>

Could anyone tell me what FTHelper is used for?  From what I gather  
it rolls up seqfeature data into a lightweight object but then  
creates a SeqFeature::Generic anyway (at least for GenBank/EMBL/Swiss),
which seems to be a waste of memory and time.  Is there
something I'm missing (besides my sanity of course)?

chris


From Jay at jays.net  Wed Feb 28 09:39:55 2007
From: Jay at jays.net (Jay Hannah)
Date: Wed, 28 Feb 2007 03:39:55 -0600
Subject: [Bioperl-l] "Command-Line Bioinformatics"
Message-ID: <F7C1E903-1712-40A5-B817-8CDAADECEBF4@jays.net>

Reading this article:
http://www.linuxjournal.com/article/6977
Sequencing the SARS Virus - Linux Journal, Nov 2003

This guy needs Perl and/or BioPerl.  :)

> The sequence file is in FASTA format consisting of a header line  
> and the sequence, split into fixed-width lines. The following  
> counts the number of Gs and Cs in the sequence and presents the  
> total as a fraction of the total number of bases:
>
> > grep -v "^>" AY274119.fa | fold -w 1 |
> tr "ATGC" "..xx" | sort | uniq -c |
> sed 's/[^0-9]//g' | tr -s "\012" " " |
> sed 's/\([0-9]*\) \([0-9]*\)/scale = 3; \2 \/ (\1+\2)/' |
> bc -i
> scale = 3; 12127 / (17624+12127)
> .407
>
> Out of the 29,751 bases in our sequence, 12,127 are either G or C,  
> giving a GC content of 41%.

BioPerl version:

use Bio::SeqIO;
my $io = Bio::SeqIO->new(
   -file   => 'AY274119.fa',
   -format => 'Fasta'
);
my $seq = $io->next_seq->seq;
print ( ($seq =~ tr/GC/GC/) / length ($seq) );

Command-line Perl:

perl -e '$/ = undef; $_ = <>; s/>.*//; s/\n//g; print tr/GC/GC/ / length($_)' AY274119.fa
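Both snippets above count only uppercase G and C (`tr/GC/GC/`) and treat the whole file as one sequence. If a file might hold several records or soft-masked (lowercase) bases, a per-record, case-insensitive count is a small extension; here is a core-Perl sketch under those assumptions. The sub name is hypothetical, and N and other ambiguity codes still count toward each record's length.

```perl
use strict;
use warnings;

# Hypothetical helper: GC fraction for each record of a FASTA file.
# Returns a list of [id, gc_fraction] pairs, in file order.
sub gc_by_record {
    my ($file) = @_;
    open my $fh, '<', $file or die "Cannot open $file: $!";
    my (@order, %seq);
    my $id;
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^>(\S+)/) {
            $id = $1;
            push @order, $id;
            $seq{$id} = '';
        }
        elsif (defined $id) {
            $line =~ s/\s+//g;
            $seq{$id} .= $line;
        }
    }
    close $fh;
    return map {
        my $s  = $seq{$_};
        my $gc = ($s =~ tr/GCgc//);    # counts without modifying $s
        [ $_, length($s) ? $gc / length($s) : 0 ]
    } @order;
}
```

For example, `printf "%s\t%.3f\n", @$_ for gc_by_record('AY274119.fa');` prints one ID and GC fraction per line.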

I'm sure you can Perl Golf my stabs at it.  :)

j
seqlab.net
http://www.bioperl.org/wiki/User:Jhannah


From n.saunders at uq.edu.au  Wed Feb 28 10:25:08 2007
From: n.saunders at uq.edu.au (Neil Saunders)
Date: Wed, 28 Feb 2007 20:25:08 +1000
Subject: [Bioperl-l] Bio::Factory::EMBOSS, CGI and taint
Message-ID: <45E55884.9010908@uq.edu.au>

Dear Bioperlers,

I'm trying to understand an error that occurs when Bio::Factory::EMBOSS is used 
in a CGI script.  Using BioPerl 1.5.2 on Ubuntu Dapper, Apache 2.0.55, Perl 5.8.7.

If I load this test CGI script (cgi.pl) in a browser:

BEGIN CODE
----------
#!/usr/bin/perl -Tw
use strict;
use CGI;
use Bio::Factory::EMBOSS;

my $cgi = new CGI;
my $f   = new Bio::Factory::EMBOSS;

print $cgi->header,
       $cgi->start_html,
       $cgi->end_html;
--------
END CODE

I get a 500 server error and the Apache error log reads:
[error] [client 192.168.0.3] Premature end of script headers: cgi.pl

I can fix this in 2 ways:

(1) Move the "my $f = new Bio::Factory::EMBOSS" line to the end of the script, 
which isn't a very useful fix.
(2) Remove the -T switch from the shebang line

There seem to be a few old posts on the list regarding "taint-safe" modules.  It 
seems that the new Bio::Factory::EMBOSS object is interfering with the headers 
in some way, but I'm no CGI.pm guru and wondered if anyone could shed light on this.

thanks,
Neil
-- 
  School of Molecular and Microbial Sciences
  University of Queensland
  Brisbane 4072 Australia

http://nsaunders.wordpress.com


From n.saunders at uq.edu.au  Wed Feb 28 10:30:31 2007
From: n.saunders at uq.edu.au (Neil Saunders)
Date: Wed, 28 Feb 2007 20:30:31 +1000
Subject: [Bioperl-l] more on Bio::Factory::EMBOSS, CGI and taint
Message-ID: <45E559C7.1090308@uq.edu.au>

Further to my previous email, adding:

BEGIN {
     $|=1;
     print "Content-type: text/html\n\n";
     use CGI::Carp('fatalsToBrowser');
}

to my CGI script generates:

Insecure $ENV{PATH} while running with -T switch at 
/usr/local/share/perl/5.8.7/Bio/Factory/EMBOSS.pm line 251.


Neil
-- 
  School of Molecular and Microbial Sciences
  University of Queensland
  Brisbane 4072 Australia

http://nsaunders.wordpress.com


From n.saunders at uq.edu.au  Wed Feb 28 10:50:58 2007
From: n.saunders at uq.edu.au (Neil Saunders)
Date: Wed, 28 Feb 2007 20:50:58 +1000
Subject: [Bioperl-l] CGI taint solved
Message-ID: <45E55E92.10608@uq.edu.au>

Apologies for running a one-man thread, but I realised that I've now answered my 
own question regarding errors with CGI, Bio::Factory::EMBOSS and taint.

Given that the EMBOSS binaries are in /usr/local/bin, adding:

$ENV{'PATH'} = '/usr/local/bin';

near the top of the script does the trick.


Neil
-- 
  School of Molecular and Microbial Sciences
  University of Queensland
  Brisbane 4072 Australia

http://nsaunders.wordpress.com


From cjfields at uiuc.edu  Wed Feb 28 13:39:24 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 28 Feb 2007 07:39:24 -0600
Subject: [Bioperl-l] CGI taint solved
In-Reply-To: <45E55E92.10608@uq.edu.au>
References: <45E55E92.10608@uq.edu.au>
Message-ID: <E4959877-011C-4F2C-ABA2-20D3876F6B4C@uiuc.edu>

That could possibly clobber any other program calls from within the  
same script (unless they reside in /usr/local/bin) since you're  
explicitly assigning PATH, not appending:

$ENV{"PATH"} = '/usr/local/bin';

gets me (printing $ENV{"PATH"}):

/usr/local/bin

whereas this:

$ENV{"PATH"} = '/usr/local/bin:' . $ENV{"PATH"};

gets me:

/usr/local/bin:/Users/cjfields/bin:/Users/cjfields/dart/bin:/usr/ 
local/mysql/bin:/usr/local/sbin:/bin:/sbin:/usr/bin:/usr/sbin

There's probably a File::* module that does this safely per OS flavor.
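One caveat for the -T case: under taint mode, a string built by concatenating the inherited `$ENV{PATH}` is still tainted (see perlsec), so prepending a directory to the existing PATH would not by itself clear the "Insecure $ENV{PATH}" error. The usual pattern is to build PATH only from a fixed list of trusted directories and to delete the other environment variables perlsec flags. A core-Perl sketch of that, with a hypothetical sub name and an assumed directory list, might look like this:

```perl
use strict;
use warnings;

# Hypothetical helper: build a PATH string from a fixed list of trusted
# directories. Each entry must be an absolute path made of "safe"
# characters, must exist, and must not be world-writable. The regex
# capture is what untaints each entry when running under -T.
sub trusted_path {
    my @dirs = @_;
    my @ok;
    for my $dir (@dirs) {
        next unless defined $dir && $dir =~ m{^(/[\w./-]*)\z};
        my $clean = $1;                 # untainted copy of the entry
        next unless -d $clean;          # must exist as a directory
        my $mode = (stat _)[2];         # reuse the stat from -d
        next if $mode & 0002;           # skip world-writable directories
        push @ok, $clean;
    }
    return join ':', @ok;
}
```

Near the top of the CGI script this would be used as `$ENV{PATH} = trusted_path(qw(/usr/local/bin /usr/bin /bin)); delete @ENV{qw(IFS CDPATH ENV BASH_ENV)};`. Missing, relative, or world-writable directories are dropped rather than passed through, since taint mode also complains about world-writable directories in PATH.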

chris

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From stefan.kirov at bms.com  Wed Feb 28 15:35:31 2007
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Wed, 28 Feb 2007 10:35:31 -0500
Subject: [Bioperl-l] CGI taint solved
In-Reply-To: <E4959877-011C-4F2C-ABA2-20D3876F6B4C@uiuc.edu>
References: <45E55E92.10608@uq.edu.au>
	<E4959877-011C-4F2C-ABA2-20D3876F6B4C@uiuc.edu>
Message-ID: <45E5A143.3080303@bms.com>

Neil, I believe this is your situation:
http://wn.cyberwerks.com/2000/0411.html
My advice: any commands executed from within a CGI script should have
their path hardcoded whenever possible.
If those commands require a different PATH, try writing a wrapper shell
script that sets the environment (which should revert to the default
once the shell script terminates). It also depends on the type of
environment you have: if it is not secure, you may wish to think hard about
how to eliminate all security loopholes with CGI. I am definitely not an
expert on this.
Stefan


From lubapardo at gmail.com  Wed Feb 28 17:21:07 2007
From: lubapardo at gmail.com (Luba Pardo)
Date: Wed, 28 Feb 2007 18:21:07 +0100
Subject: [Bioperl-l] retrieven ids
Message-ID: <58ff33550702280921s25f749aagd6e62b1a5cc34edb@mail.gmail.com>

Hi everyone,
I wonder if someone could give me some advice on the following:
I want to retrieve the DNA coding sequence of a RefSeq protein id. I do not
want to back-translate the protein to DNA, but rather get the ID of the DNA
coding sequence and then retrieve the DNA sequence from GenBank. Is there any
module that allows me to get all possible ids for a sequence, given a protein gi?

Thank you very much in advance,
L. Pardo


From johnston at biochem.ucl.ac.uk  Wed Feb 28 17:05:49 2007
From: johnston at biochem.ucl.ac.uk (Caroline Johnston)
Date: Wed, 28 Feb 2007 17:05:49 +0000 (GMT)
Subject: [Bioperl-l] _rearrange
Message-ID: <Pine.LNX.4.58.0702281624380.8207@localhost.localdomain>

hi,

Is there a discussion of the rationale behind the _rearrange method
somewhere? I'm probably just being gormless, but I think I'm missing the
point a bit.

Is it okay for a method just to expect named params like
->foo(arg1=>'stuff', arg2=>'things'); ?

Cxx


From ckuanglim at yahoo.com  Wed Feb 28 15:51:50 2007
From: ckuanglim at yahoo.com (Chan Kuang Lim)
Date: Wed, 28 Feb 2007 07:51:50 -0800 (PST)
Subject: [Bioperl-l] Problem of Installing Bioperl
Message-ID: <459942.77644.qm@web60518.mail.yahoo.com>

I have a problem installing BioPerl on Windows using the command-line installation.
In the cmd window, after
ppm-shell
search bioperl
install 2

a lot of files were downloaded, but then the next line is:
Unpacking bioperl-1.5.2_100...ppm install failed: Can't extract files from C:.............../Bioperl-1.5.2_100.tgz


Hope you can answer my question. Thank you.

Regards,
Chan Kuang Lim
Malaysia

 


From cjfields at uiuc.edu  Wed Feb 28 18:30:45 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 28 Feb 2007 12:30:45 -0600
Subject: [Bioperl-l] _rearrange
In-Reply-To: <Pine.LNX.4.58.0702281624380.8207@localhost.localdomain>
References: <Pine.LNX.4.58.0702281624380.8207@localhost.localdomain>
Message-ID: <25C736A2-2DCA-413B-8F92-D799F583515B@uiuc.edu>

From what I gather it's a convenient utility method that is used for
consistent and enforced parameter checking/setting for any method,
including the constructor.

There are a few modules that don't use _rearrange (Bio::WebAgent::new()
comes to mind).  It's not required that you use it, but the naming
conventions for parameters outlined in _rearrange (in the
Bio::Root::RootI POD) are generally enforced for consistency across
classes.

As a note, Sendu has committed a related method (_set_from_args) to  
CVS which works rather well, but I don't think it is in the last  
release.

chris

On Feb 28, 2007, at 11:05 AM, Caroline Johnston wrote:

> hi,
>
> Is there a discussion of the rationale behind the _rearrange method
> somewhere? I'm probably just being gormless, but I think I'm missing the
> point a bit.
>
> Is it okay for a method just to expect named params like
> ->foo(arg1=>'stuff', arg2=>'things'); ?
>
> Cxx


From dmessina at wustl.edu  Wed Feb 28 19:31:29 2007
From: dmessina at wustl.edu (Dave Messina)
Date: Wed, 28 Feb 2007 13:31:29 -0600 (CST)
Subject: [Bioperl-l] retrieven ids
In-Reply-To: <58ff33550702280921s25f749aagd6e62b1a5cc34edb@mail.gmail.com>
References: <58ff33550702280921s25f749aagd6e62b1a5cc34edb@mail.gmail.com>
Message-ID: <57419.10.0.7.57.1172691089.squirrel@gscmail.wustl.edu>

Whenever I'm unsure of how to do something, I first look to see if one of
the HOWTOs on bioperl.org covers it. In this case, the Features HOWTO has
example code which I think will do what you want.

Genbank records typically have the coding sequence of a protein as a
feature, so I would do something like:

- use the RefSeq protein IDs to query Entrez and get back the Genbank
records.

- read the Features HOWTO to refresh my memory on the syntax for grabbing
features.

That HOWTO is at:
http://www.bioperl.org/wiki/HOWTO:Feature-Annotation

- whip up a little script to loop through the Genbank records one at a
time with SeqIO and pull out the CDS (coding sequence) features.
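For RefSeq proteins specifically, the CDS feature of the protein (GenPept) record typically carries a /coded_by qualifier that names the nucleotide accession and coordinates of the coding sequence, which is the DNA ID being asked about. With Bio::SeqIO the value can be read from the CDS feature (for example with $feat->get_tag_values('coded_by')). The core-Perl sketch below only shows how such a value might be split up; parse_coded_by() is a hypothetical helper, the example value is illustrative, and only the plain and complement() location forms are handled.

```perl
use strict;
use warnings;

# Hypothetical helper: split a /coded_by value such as
# "NM_000518.5:51..494" or "complement(NC_000001.11:100..200)"
# into (accession, start, end, strand). Compound locations such as
# join(...) are not handled and yield an empty list.
sub parse_coded_by {
    my ($coded_by) = @_;
    my $strand = 1;
    if ($coded_by =~ /^complement\((.+)\)$/) {
        $coded_by = $1;
        $strand   = -1;
    }
    # Accession (no colon or parentheses), then start..end; optional
    # < and > mark partial ends and are simply skipped here.
    return unless $coded_by =~ /^([^:()]+):<?(\d+)\.\.>?(\d+)$/;
    return ($1, $2, $3, $strand);
}

my ($acc, $start, $end, $strand) = parse_coded_by('NM_000518.5:51..494');
print "$acc $start-$end strand $strand\n";   # prints "NM_000518.5 51-494 strand 1"
```

The accession returned this way could then be fetched from GenBank and the start..end region extracted.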


Dave


From bix at sendu.me.uk  Wed Feb 28 19:38:46 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 28 Feb 2007 19:38:46 +0000
Subject: [Bioperl-l] _rearrange
In-Reply-To: <Pine.LNX.4.58.0702281624380.8207@localhost.localdomain>
References: <Pine.LNX.4.58.0702281624380.8207@localhost.localdomain>
Message-ID: <45E5DA46.3020503@sendu.me.uk>

Caroline Johnston wrote:
> hi,
> 
> Is there a discussion of the rationale behind the _rearrange method
> somewhere? I'm probably just being gormless, but I think I'm missing the
> point a bit.
> 
> Is it okay for a method just to expect named params like
> ->foo(arg1=>'stuff', arg2=>'things'); ?

The Bioperl style for named args is -arg1 (note the leading dash), and the
wrong case is allowed as well (e.g. -ARG1). So, make use of _rearrange; it
won't do you any harm.
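To make the convention concrete, here is a simplified, hypothetical stand-in for _rearrange written in plain Perl; it is not the Bio::Root::RootI code. It strips the leading dash, ignores case, and returns values in the order requested. Inside a BioPerl class the real call usually looks like my ($seq, $id) = $self->_rearrange([qw(SEQ ID)], @args); the positional passthrough below mirrors what I understand the real method does when the first argument has no dash, and should be treated as an assumption.

```perl
use strict;
use warnings;

# Hypothetical, simplified stand-in for Bio::Root::RootI::_rearrange.
# $order lists the expected parameter names; @args holds the caller's
# "-name => value" pairs in any order and any case.
sub rearrange {
    my ($order, @args) = @_;

    # Assumption: pass positional arguments straight through when the
    # first one does not start with a dash.
    return @args unless defined $args[0] && $args[0] =~ /^-/;

    my %given;
    while (my ($key, $value) = splice(@args, 0, 2)) {
        (my $name = uc $key) =~ s/^-//;    # drop the dash, ignore case
        $given{$name} = $value;
    }
    # Values come back in the requested order; missing ones are undef.
    return @given{ map { uc($_) } @$order };
}

my ($seq, $id) = rearrange([qw(SEQ ID)], -id => 'abc', -Seq => 'ACGT');
print "$seq $id\n";    # prints "ACGT abc"
```

So ->foo(arg1 => 'stuff') would still work as a plain hash, but going through _rearrange gets you the dash and case handling for free.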


From johnsonm at gmail.com  Wed Feb 28 19:59:09 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Wed, 28 Feb 2007 13:59:09 -0600
Subject: [Bioperl-l] Issues with new Bio::Tools::Run modules for Genemark
	and Glimmer
Message-ID: <ebf5eb170702281159y7dcedd88n7bdef2c8bb9d3288@mail.gmail.com>

    I happen to need something like Bio::Tools::Run::Genemark, so I'm coding
one up.  When I started on the tests for it, I realized I have a problem.  I
can distribute a fasta file downloaded from GenBank to use as input, but I
can't distribute the model file needed to actually run Genemark
(Genemark.hmm for prokaryotes, gmhmmp, in my case).
    It took *forever* to get a license, and I'm not thrilled with the
prospect of talking them out of a redistributable model file.  I'd love to
distribute the test, but I don't see how I'm going to be able to.
Suggestions?
    Also, I've settled on IPC::Run instead of system().  The docs indicate
the bits of it I'm using should be OK on Windows, except maybe for Win9X.
I don't want to clutter up the console, I don't like embedding stdout/stderr
redirection in command strings, and I don't want to have to worry about
signal handling (What if the child catches a ctrl-c halfway through
parsing?  What if the parent does?).  Anybody object to that?
   One final thing.  I'm lazy, I don't want to deal with parsing arguments
to the constructor, so I'm just calling _rearrange() to deal with it.  The
Bio::Tools:: parsers all take dash options, but it looks like a bunch of the
stuff in Bio::Tools::Run:: takes dashless args.  Objections?


From dmessina at wustl.edu  Wed Feb 28 20:14:56 2007
From: dmessina at wustl.edu (Dave Messina)
Date: Wed, 28 Feb 2007 14:14:56 -0600 (CST)
Subject: [Bioperl-l] Issues with new Bio::Tools::Run modules for
 Genemark and Glimmer
In-Reply-To: <ebf5eb170702281159y7dcedd88n7bdef2c8bb9d3288@mail.gmail.com>
References: <ebf5eb170702281159y7dcedd88n7bdef2c8bb9d3288@mail.gmail.com>
Message-ID: <57538.10.0.7.57.1172693696.squirrel@gscmail.wustl.edu>

> I'm not thrilled with the prospect of talking them out of a
> redistributable model file.

I suppose it's not possible to fake your own, or at least the parts of it
you're testing for?

If not, I'd put the tests in a skip block while waiting to hear from the
Genemark folks.


> The Bio::Tools:: parsers all take dash options, but it looks like a
> bunch of the stuff in Bio::Tools::Run:: takes dashless args.  Objections?

Sendu will chime in I'm sure, but I think he was planning to switch
everything in Bio::Tools::Run over to dashed args anyway...


Dave


From bix at sendu.me.uk  Wed Feb 28 20:52:23 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 28 Feb 2007 20:52:23 +0000
Subject: [Bioperl-l] Issues with new Bio::Tools::Run modules for
 Genemark and Glimmer
In-Reply-To: <ebf5eb170702281159y7dcedd88n7bdef2c8bb9d3288@mail.gmail.com>
References: <ebf5eb170702281159y7dcedd88n7bdef2c8bb9d3288@mail.gmail.com>
Message-ID: <45E5EB87.9020106@sendu.me.uk>

Mark Johnson wrote:
>    One final thing.  I'm lazy, I don't want to deal with parsing arguments
> to the constructor, so I'm just calling _rearrange() to deal with it.  The
> Bio::Tools:: parsers all take dash options, but it looks like a bunch of the
> stuff in Bio::Tools::Run:: takes dashless args.  Objections?

You can make use of _set_from_args(). See Bio::Tools::Run::Phylo::Gumby 
for an example.
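As I understand it, the idea behind _set_from_args() is to go one step further than _rearrange and call an accessor method for each recognised named argument, so the constructor needs no per-parameter code. The sketch below is a simplified, hypothetical version of that idea in plain Perl; My::Runner, set_from_args() and the parameter names are made up for illustration, and the real method (see Bio::Tools::Run::Phylo::Gumby for actual use) has more options.

```perl
use strict;
use warnings;

package My::Runner;    # hypothetical wrapper class, for illustration only

# Generate simple get/set accessors for the parameters this class accepts.
my @PARAMS = qw(program_dir quiet);
for my $field (@PARAMS) {
    no strict 'refs';
    *{"My::Runner::$field"} = sub {
        my $self = shift;
        $self->{$field} = shift if @_;
        return $self->{$field};
    };
}

sub new {
    my ($class, @args) = @_;
    my $self = bless {}, $class;
    $self->set_from_args(\@args, \@PARAMS);
    return $self;
}

# Simplified take on the _set_from_args() idea: for every "-name => value"
# pair whose name (dash stripped, case ignored) is an allowed parameter,
# call the accessor of that name. Unknown names are silently ignored here.
sub set_from_args {
    my ($self, $args, $allowed) = @_;
    my %method_for = map { lc($_) => $_ } @$allowed;
    my @pairs = @$args;
    while (my ($key, $value) = splice(@pairs, 0, 2)) {
        (my $name = lc $key) =~ s/^-//;
        my $method = $method_for{$name} or next;
        $self->$method($value);
    }
    return $self;
}

package main;

my $runner = My::Runner->new(-program_dir => '/opt/genemark', -QUIET => 1);
print $runner->program_dir, ' ', $runner->quiet, "\n";
```

Whether unknown arguments should be ignored, warned about, or turned into new accessors is a design choice; this sketch simply ignores them.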


From bix at sendu.me.uk  Wed Feb 28 21:29:32 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 28 Feb 2007 21:29:32 +0000
Subject: [Bioperl-l] SVG output with Bio::Graphics::Panel fails
Message-ID: <45E5F43C.9080902@sendu.me.uk>

I have GD 2.35 and GD::SVG 2.33 installed.

I have a working script in which a Bio::Graphics::Panel object is made 
and output with:

print $panel->png;

This is fine. Changing it to:

print $panel->svg;

Gives the error:

Can't locate object method "svg" via package "GD:Image" at 
/.../Bio/Graphics/Panel.pm line 971, <DATA> line 192.


Am I supposed to do something else to get this to work?


Cheers,
Sendu.


From crabtree at tigr.ORG  Wed Feb 28 21:40:52 2007
From: crabtree at tigr.ORG (Jonathan Crabtree)
Date: Wed, 28 Feb 2007 16:40:52 -0500
Subject: [Bioperl-l] SVG output with Bio::Graphics::Panel fails
In-Reply-To: <45E5F43C.9080902@sendu.me.uk>
References: <45E5F43C.9080902@sendu.me.uk>
Message-ID: <45E5F6E4.80003@tigr.org>


Sendu-

I believe you must set 'image_class' to 'GD::SVG' when you create the 
Panel (and note that older versions of Bio::Graphics::Panel don't know 
anything about this parameter.)  Here's the relevant part of the Panel 
perldoc:

   -image_class To create output in scalable vector
                graphics (SVG), optionally pass the image
                class parameter 'GD::SVG'. Defaults to
                using vanilla GD. See the corresponding
                image_class() method below for details.

Jonathan


Sendu Bala wrote:
> I have GD 2.35 and GD::SVG 2.33 installed.
> 
> I have a working script in which a Bio::Graphics::Panel object is made 
> and output with:
> 
> print $panel->png;
> 
> This is fine. Changing it to:
> 
> print $panel->svg;
> 
> Gives the error:
> 
> Can't locate object method "svg" via package "GD:Image" at 
> /.../Bio/Graphics/Panel.pm line 971, <DATA> line 192.
> 
> 
> Am I supposed to do something else to get this to work?
> 
> 
> Cheers,
> Sendu.


From bix at sendu.me.uk  Wed Feb 28 22:01:17 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 28 Feb 2007 22:01:17 +0000
Subject: [Bioperl-l] SVG output with Bio::Graphics::Panel fails
In-Reply-To: <45E5F6E4.80003@tigr.org>
References: <45E5F43C.9080902@sendu.me.uk> <45E5F6E4.80003@tigr.org>
Message-ID: <45E5FBAD.3030404@sendu.me.uk>

Jonathan Crabtree wrote:
> 
> Sendu-
> 
> I believe you must set 'image_class' to 'GD::SVG' when you create the 
> Panel (and note that older versions of Bio::Graphics::Panel don't know 
> anything about this parameter.)  Here's the relevant part of the Panel 
> perldoc:

... Oh! I had no idea there was any perldoc for these modules, hiding 
down there at the bottom. Does anyone want to intersperse the docs?...


From cjfields at uiuc.edu  Wed Feb 28 22:10:54 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 28 Feb 2007 16:10:54 -0600
Subject: [Bioperl-l] Issues with new Bio::Tools::Run modules for
	Genemark and Glimmer
In-Reply-To: <ebf5eb170702281159y7dcedd88n7bdef2c8bb9d3288@mail.gmail.com>
References: <ebf5eb170702281159y7dcedd88n7bdef2c8bb9d3288@mail.gmail.com>
Message-ID: <894754FB-2B20-415F-A6FC-7ACE839C6540@uiuc.edu>

On Feb 28, 2007, at 1:59 PM, Mark Johnson wrote:

>     I happen to need something like Bio::Tools::Run::Genemark, so I'm coding
> one up.  When I started on the tests for it, I realized I have a problem.  I
> can distribute a fasta file downloaded from GenBank to use as input, but I
> can't distribute the model file needed to actually run Genemark
> (Genemark.hmm for prokaryotes, gmhmmp, in my case).
>     It took *forever* to get a license, and I'm not thrilled with the
> prospect of talking them out of a redistributable model file.  I'd love to
> distribute the test, but I don't see how I'm going to be able to.
> Suggestions?

For bioperl-run tests you have to have the program installed for  
tests to work (otherwise they are passed over).  Therefore one would  
assume if you had the GeneMark program you would have the models as  
well.

You could set up your module to require an env. variable be set (like  
the HMMER module, for instance) which contains the executables and/or  
the models, so that if it isn't set the tests are skipped.
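That approach could look roughly like the following Test::More sketch, assuming a hypothetical GENEMARK_DIR environment variable that points at the directory holding gmhmmp and its model files (the variable name is illustrative, not an existing bioperl-run setting). When the variable is unset or does not name a directory, the GeneMark tests are reported as skipped rather than failed.

```perl
use strict;
use warnings;
use Test::More;

# Return the GeneMark directory named by the (hypothetical) GENEMARK_DIR
# environment variable, or undef if it is unset or not a directory.
sub genemark_dir {
    my $dir = $ENV{GENEMARK_DIR};
    return (defined $dir && -d $dir) ? $dir : undef;
}

SKIP: {
    my $dir = genemark_dir();
    skip 'GENEMARK_DIR not set; skipping GeneMark tests', 1
        unless defined $dir;

    # Only reached when the program (and, presumably, its models) is installed.
    ok(-x "$dir/gmhmmp", 'gmhmmp executable found in GENEMARK_DIR');
    # ... run the wrapper on the bundled FASTA file here ...
}

done_testing();
```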

>     Also, I've settled on IPC::Run instead of system().  The docs indicate
> the bits of it I'm using should be OK on Windows, except maybe for Win9X.
> I don't want to clutter up the console, I don't like embedding stdout/stderr
> redirection in command strings, and I don't want to have to worry about
> signal handling (What if the child catches a ctrl-c halfway through
> parsing?  What if the parent does?).  Anybody object to that?

I wouldn't worry too much about Win9x.  Is IPC::Run in perl core?   
Otherwise we'll need to add it to the optional dependencies for  
bioperl-run.
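One way to answer that from a script is Module::CoreList, which ships with perl itself on reasonably recent versions. Its first_release() class method returns the first perl release that included a module, or undef if the module was never in core. A small sketch comparing IPC::Open3 with IPC::Run:

```perl
use strict;
use warnings;
use Module::CoreList;

# Report, for each module, whether (and since when) it ships with perl.
for my $module (qw(IPC::Open3 IPC::Run)) {
    my $first = Module::CoreList->first_release($module);
    printf "%-10s %s\n", $module,
        defined $first ? "core since perl $first" : 'not in core';
}
```

The corelist utility installed alongside the module gives the same answer from the shell, e.g. corelist IPC::Run.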

>    One final thing.  I'm lazy, I don't want to deal with parsing arguments
> to the constructor, so I'm just calling _rearrange() to deal with it.  The
> Bio::Tools:: parsers all take dash options, but it looks like a bunch of the
> stuff in Bio::Tools::Run:: takes dashless args.  Objections?

Sendu's suggestion (_set_from_args() ) is the best.  As mentioned in  
another thread _rearrange() works as well.

chris


From johnsonm at gmail.com  Wed Feb 28 22:29:36 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Wed, 28 Feb 2007 16:29:36 -0600
Subject: [Bioperl-l] Issues with new Bio::Tools::Run modules for
	Genemark and Glimmer
In-Reply-To: <57538.10.0.7.57.1172693696.squirrel@gscmail.wustl.edu>
References: <ebf5eb170702281159y7dcedd88n7bdef2c8bb9d3288@mail.gmail.com>
	<57538.10.0.7.57.1172693696.squirrel@gscmail.wustl.edu>
Message-ID: <ebf5eb170702281429u51e8f7fgb9c0591a410500f8@mail.gmail.com>

On 2/28/07, Dave Messina <dmessina at wustl.edu> wrote:
>
> > I'm not thrilled with the prospect of talking them out of a
> redistributable model file.
>
> I suppose it's not possible to fake your own, or at least the parts of it
> you're testing for?


We got a gzipped tarball with some model files and a precompiled executable
(gmhmmp).  As far as building a model file goes, I don't even have two
sticks to rub together.  I suppose it's possible that it's not actually some
weird proprietary format, I'll go dig for some docs...but I don't hold out a
lot of hope.


From sukhinder.sandhu at osumc.edu  Wed Feb 28 21:49:31 2007
From: sukhinder.sandhu at osumc.edu (Sukhinder Sandhu)
Date: Wed, 28 Feb 2007 16:49:31 -0500
Subject: [Bioperl-l] Problem installing bioperl: plz reply soon. thx
Message-ID: <C20B631B.1E0%sukhinder.sandhu@osumc.edu>

Hi
I am having trouble installing Bundle::BioPerl through CPAN. I don't know if
this has something to do with my having root privileges. Can you please
suggest how I may proceed to get past this? I would really appreciate any
help. I am pasting part of the error it keeps giving after trying to install
every module.
######################
CPAN.pm: Going to build G/GA/GAAS/HTML-Parser-3.56.tar.gz

make: *** No rule to make target
`/System/Library/Perl/5.8.6/darwin-thread-multi-2level/CORE/config.h',
needed by `Makefile'.  Stop.
  /usr/bin/make  -- NOT OK
Running make test
  Can't test without successful make
Running make install
  make had returned bad status, install seems impossible

###############################
Thanks

sukhinder


From sukhinder.sandhu at osumc.edu  Wed Feb 28 04:41:43 2007
From: sukhinder.sandhu at osumc.edu (Sukhinder Sandhu)
Date: Tue, 27 Feb 2007 23:41:43 -0500
Subject: [Bioperl-l] Problem installing bioperl-1.5.2_102
Message-ID: <C20A7237.1DB%sukhinder.sandhu@osumc.edu>

Hi
I am trying to install BioPerl on my Mac OS X machine and am having problems. I
tried following the instructions both at www.tc.umn.edu..... and in the README
and INSTALL files in the bioperl folder that I downloaded.
The error I get is the following (the end of the output is copied):
####################
t/versions........ok
t/xs..............skipped
        all skipped: C_support not enabled
Failed Test Stat Wstat Total Fail  Failed  List of Failed
-------------------------------------------------------------------------------
t/compat.t     5  1280    60    5   8.33%  25-28 31
4 tests and 31 subtests skipped.
Failed 1/22 test scripts, 95.45% okay. 5/683 subtests failed, 99.27% okay.
make: *** [test] Error 2
  /usr/bin/make test -- NOT OK
Running make install
  make test had returned bad status, won't install without force
Couldn't install Module::Build, giving up.
BEGIN failed--compilation aborted at ModuleBuildBioperl.pm line 51.
Compilation failed in require at Build.PL line 14.
BEGIN failed--compilation aborted at Build.PL line 14.
###########################################################################
I am not able to figure out what's going wrong.
And when I try to run CPAN, I get the following error. I have no idea how
to fix these. Any help is greatly appreciated.
############################################################################
[Sukhinders-Computer:~/Desktop/bioperl-1.5.2_102] sand60% perl -MCPAN -e shell
Terminal does not support AddHistory.

There seems to be running another CPAN process (pid 7207).  Contacting...
Lockfile /Users/sand60/.cpan/.lock not writeable by you. Cannot proceed.
    On UNIX try:
    rm /Users/sand60/.cpan/.lock
  and then rerun us.
 at -e line 1
###################################################
And doing what it says, removing the lock file, doesn't help. I am wondering
if all this has something to do with having root privileges on the system
and, if so, is there an alternative? Thanks


sukhinder


From stefan.kirov at bms.com  Wed Feb 28 21:44:05 2007
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Wed, 28 Feb 2007 16:44:05 -0500
Subject: [Bioperl-l] SVG output with Bio::Graphics::Panel fails
In-Reply-To: <45E5F43C.9080902@sendu.me.uk>
References: <45E5F43C.9080902@sendu.me.uk>
Message-ID: <45E5F7A5.3090805@bms.com>

I think you should create the object with -image_class='svg'. Can you 
post the code with which you create the object?
Stefan

Sendu Bala wrote:
> I have GD 2.35 and GD::SVG 2.33 installed.
>
> I have a working script in which a Bio::Graphics::Panel object is made 
> and output with:
>
> print $panel->png;
>
> This is fine. Changing it to:
>
> print $panel->svg;
>
> Gives the error:
>
> Can't locate object method "svg" via package "GD:Image" at 
> /.../Bio/Graphics/Panel.pm line 971, <DATA> line 192.
>
>
> Am I supposed to do something else to get this to work?
>
>
> Cheers,
> Sendu.


From johnsonm at gmail.com  Wed Feb 28 22:54:02 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Wed, 28 Feb 2007 16:54:02 -0600
Subject: [Bioperl-l] Issues with new Bio::Tools::Run modules for
	Genemark and Glimmer
In-Reply-To: <894754FB-2B20-415F-A6FC-7ACE839C6540@uiuc.edu>
References: <ebf5eb170702281159y7dcedd88n7bdef2c8bb9d3288@mail.gmail.com>
	<894754FB-2B20-415F-A6FC-7ACE839C6540@uiuc.edu>
Message-ID: <ebf5eb170702281454i52cea9eeqbcaebee5e4ec5e0e@mail.gmail.com>

On 2/28/07, Chris Fields <cjfields at uiuc.edu> wrote:

> For bioperl-run tests you have to have the program installed for
> tests to work (otherwise they are passed over).  Therefore one would
> assume if you had the GeneMark program you would have the models as
> well.
>
> You could set up your module to require an env. variable be set (like
> the HMMER module, for instance) which contains the executables and/or
> the models, so that if it isn't set the tests are skipped.


Sounds like a plan.

> I wouldn't worry too much about Win9x.  Is IPC::Run in perl core?
> Otherwise we'll need to add it to the optional dependencies for
> bioperl-run.


I'd test it, but I don't have access to any Win9x boxes anymore.  IPC::Run
is not a core module, but I think it's worth the dependency.  I considered
IPC::Open3, but it can't be made reliable on Win32, something about not
being able to select() on filehandles, only sockets.  I also looked at
IPC::Run3, but under the hood, it's just got STDOUT/STDERR redirection
layered on top of system().  I don't like using system() due to issues with
signals (Such as the user hitting ctrl-c and taking out the child).  I feel
better knowing the wrapped executable is in another process disconnected
from the console.

> Sendu's suggestion (_set_from_args() ) is the best.  As mentioned in
> another thread _rearrange() works as well.


I'm using _rearrange() now.  I'll look at _set_from_args().  Is either one
preferred to the other?